# Numpy
- A linear algebra library for Python
- NumPy is the basis of many Python libraries
- Built on compiled C code, so it is fast

NumPy should already be installed with our Anaconda install, so we just need to import it.

If it isn't, we can install it:
- If we have an Anaconda build
  > conda install numpy
- With any Python build
  > pip install numpy


In [2]:
import numpy as np

# Numpy Arrays

NumPy arrays come in two main flavors (higher dimensions are also possible)
- Vectors (1D)
- Matrices (2D), i.e. rows and columns

We can cast a list to a 1D numpy array

In [9]:
#array
my_list = [0,1,2]
arr = np.array(my_list)
type(arr)
arr

array([0, 1, 2])

or a matrix

In [12]:
my_mat = [[1,2,3],[4,5,6],[7,8,9]]
type(my_mat)

list

In [13]:
np.array(my_mat)

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

**Double** brackets indicate it is a 2D array. **Single** brackets indicate 1D array.
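The bracket depth can also be checked programmatically. A minimal sketch, using the `ndim` and `shape` attributes (the names `vec` and `mat` are illustrative, not from the cells above):

```python
import numpy as np

vec = np.array([0, 1, 2])                 # one level of brackets -> 1D
mat = np.array([[1, 2, 3], [4, 5, 6]])    # two levels of brackets -> 2D

print(vec.ndim, vec.shape)  # 1 (3,)
print(mat.ndim, mat.shape)  # 2 (2, 3)
```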

To create a NumPy array from a range of values, use **arange**. The stop value is not included.
> arange(start,stop,step)

In [16]:
np.arange(0,10) # 1 is the default step size

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [17]:
np.arange(0,10,2)

array([0, 2, 4, 6, 8])

We can also generate arrays of 0's or 1's easily

In [19]:
np.zeros(3)

array([0., 0., 0.])

In [23]:
np.zeros((3,5)) # row, col

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [24]:
np.ones(3)

array([1., 1., 1.])

Another useful method is **linspace**, which creates a specified number of equally spaced points, including both the start and stop values
> linspace(start,stop,npoints)

In [25]:
np.linspace(0,5,10) 

array([0.        , 0.55555556, 1.11111111, 1.66666667, 2.22222222,
       2.77777778, 3.33333333, 3.88888889, 4.44444444, 5.        ])

### What is the difference between the arange and linspace methods?
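One way to explore this question is to ask both methods for points between 0 and 1. This is only a sketch with illustrative values: `arange` takes a step size and leaves out the stop value, while `linspace` takes a number of points and includes the stop value by default.

```python
import numpy as np

# arange: fix the step size; the stop value is excluded
a = np.arange(0, 1, 0.25)
print(a)  # [0.   0.25 0.5  0.75]

# linspace: fix the number of points; the stop value is included
b = np.linspace(0, 1, 5)
print(b)  # [0.   0.25 0.5  0.75 1.  ]
```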

In linear algebra, the identity matrix is important. NumPy can create it with **eye**, along with much more that we won't cover.

In [26]:
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
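As a quick check of why the identity matrix matters, multiplying any square matrix by it leaves the matrix unchanged. A small sketch, assuming the `@` matrix-multiplication operator and an arbitrary example matrix `A`:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
I = np.eye(2)

# Matrix product with the identity returns the same values
product = A @ I
print(np.array_equal(product, A))  # True
```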

# Random Numbers

NumPy has many methods for generating random numbers. The methods are accessible from numpy.random

- rand: uniform distribution over [0, 1)
- randn: standard normal distribution (mean 0, standard deviation 1)
- randint: random integers from [start,stop), *stop value not included*
- many more

In [27]:
np.random.rand(5)

array([0.44561722, 0.04856497, 0.86238955, 0.16575437, 0.53901013])

In [28]:
np.random.rand(5,5)

array([[0.83496881, 0.19574695, 0.18788935, 0.05012382, 0.48671393],
       [0.45142282, 0.1235921 , 0.77097698, 0.85203922, 0.11174791],
       [0.32805848, 0.03259499, 0.21565559, 0.00786832, 0.89443875],
       [0.32224149, 0.82665696, 0.67035539, 0.09461771, 0.07334421],
       [0.14434513, 0.14824388, 0.57652356, 0.95340034, 0.50140387]])

In [29]:
np.random.randn(5)

array([ 1.22876045, -0.19585896, -2.95206082, -0.3069403 ,  0.95951346])

In [4]:
np.random.randn(5,5)

array([[-1.10482163,  0.65053944,  0.53454238,  0.23966709,  0.8680239 ],
       [ 1.24642423,  1.19916597, -1.12816648, -0.67411787,  0.40705119],
       [-0.18820111,  1.06150163, -0.07287321, -0.48524395,  0.32635882],
       [-1.29879   , -1.70439819,  0.83969219,  1.07144719,  0.82062611],
       [ 0.8906226 ,  0.90804277,  2.1886475 , -1.44802826, -0.8021044 ]])

In [32]:
np.random.randint(1,100,3) #100 not included

array([86, 52, 32])

## Reshape

We can reshape our arrays, i.e. change the number of rows and columns. We just need to keep the product of rows and columns equal to the original array size.

In [34]:
arr = np.arange(25)
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24])

In [37]:
arr.reshape(5,5)

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

In [38]:
arr.reshape(4,5) 

ValueError: cannot reshape array of size 25 into shape (4,5)
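The size rule can be checked directly. The sketch below also uses `-1`, which asks NumPy to infer one dimension from the array size; the variable name `inferred` is illustrative.

```python
import numpy as np

arr = np.arange(25)

# The requested shape must account for every element: 5 * 5 == 25
square = arr.reshape(5, 5)
print(square.size == arr.size)  # True

# -1 asks NumPy to infer that dimension from the array size
inferred = arr.reshape(5, -1)
print(inferred.shape)  # (5, 5)

# A shape whose product is not 25 raises ValueError
try:
    arr.reshape(4, 5)
except ValueError as err:
    print("reshape failed:", err)
```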

Other helpful methods are **max** and **min**, which return the largest and smallest values in the array

In [43]:
arr = np.random.randint(5,10,10)
arr

array([5, 7, 8, 5, 8, 5, 6, 7, 9, 5])

In [44]:
arr.max()

9

In [45]:
arr.min()

5

We can also find the location of those values using argmin and argmax. These return the index location (starting at 0)

In [47]:
arr.argmax()

8

In [49]:
arr.argmin()

0
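Two details are worth seeing in a small example: when the extreme value occurs more than once, the index of the first occurrence is returned, and on a 2D array the index refers to the flattened array. The sketch below assumes `np.unravel_index` to convert that flat index back to a row and column; the arrays are illustrative.

```python
import numpy as np

arr = np.array([5, 7, 8, 5, 8, 5, 6, 7, 9, 5])
print(arr.argmin())  # 0, the first of several 5s

mat = np.array([[1, 9], [4, 2]])
flat_idx = mat.argmax()                           # index into the flattened array
row, col = np.unravel_index(flat_idx, mat.shape)  # convert to (row, col)
print(flat_idx, row, col)  # 1 0 1
```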

When working with random numbers, it is sometimes convenient to generate the same random numbers every time the code runs. To do this we can specify a *seed* for the random number generator.

In [55]:
np.random.randint(0,10,10)

array([9, 7, 3, 5, 7, 7, 1, 7, 1, 2])

In [62]:
np.random.seed(10) #any int
np.random.randint(0,10,10)

array([9, 4, 0, 1, 9, 0, 1, 8, 9, 0])
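A short sketch of what the seed buys us: resetting the same seed reproduces the same draw. The last lines use `np.random.default_rng`, the Generator interface in newer NumPy versions, as an alternative to the global seed; it is shown here as an option, not as what the cells above use.

```python
import numpy as np

np.random.seed(10)
first = np.random.randint(0, 10, 10)

np.random.seed(10)          # reset to the same seed
second = np.random.randint(0, 10, 10)

print(np.array_equal(first, second))  # True

# Alternative: a Generator object carries its own seed
rng = np.random.default_rng(10)
draw = rng.integers(0, 10, 10)  # stop value not included, like randint
print(draw)
```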

# Attributes
**dtype** and **shape** are array attributes (not methods, so no parentheses) that return the data type and the shape of the array

In [103]:
arr1 = np.arange(25)
arr2 = arr1.reshape(5,5)

In [104]:
arr1.shape

(25,)

In [105]:
arr2.shape

(5, 5)

In [106]:
arr1.dtype

dtype('int64')

In [107]:
arr2.dtype

dtype('int64')

In [108]:
arr3 = np.arange(25.0)
arr3.dtype

dtype('float64')
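If an array has the wrong data type, it can be converted. A minimal sketch using the `astype` method, with illustrative variable names:

```python
import numpy as np

ints = np.arange(5)
floats = ints.astype(float)      # returns a new array; ints is unchanged
print(floats.dtype)              # float64

mixed = np.array([1, 2.5])       # an int and a float are upcast to float64
print(mixed.dtype)               # float64
```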

## Indexing
Indexing works a lot like it does with lists, i.e. brackets and slices

In [112]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [113]:
arr[3]

3

In [114]:
arr[0:5]

array([0, 1, 2, 3, 4])

In [115]:
arr[0:3] = -1

In [116]:
arr

array([-1, -1, -1,  3,  4,  5,  6,  7,  8,  9])

In [117]:
arr_new = arr

In [118]:
arr_new

array([-1, -1, -1,  3,  4,  5,  6,  7,  8,  9])

In [119]:
arr_new[0:3] = 1

In [120]:
arr_new

array([1, 1, 1, 3, 4, 5, 6, 7, 8, 9])

In [121]:
arr

array([1, 1, 1, 3, 4, 5, 6, 7, 8, 9])

Assigning an array to a new name does not copy it: both names refer to the same data, which saves memory with large arrays. Any modification made through the new name therefore also applies to the original array. To make a second, independent array, use the **copy** method.

In [122]:
arr

array([1, 1, 1, 3, 4, 5, 6, 7, 8, 9])

In [123]:
new_arr = arr.copy()

In [124]:
new_arr[0:3] = 0

In [125]:
arr

array([1, 1, 1, 3, 4, 5, 6, 7, 8, 9])

In [126]:
new_arr

array([0, 0, 0, 3, 4, 5, 6, 7, 8, 9])
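The same sharing applies to slices, which are views onto the original data. The sketch below uses `np.shares_memory` to check whether two arrays overlap in memory; the names `view` and `independent` are illustrative.

```python
import numpy as np

arr = np.arange(10)

view = arr[0:3]            # a slice is a view onto the same data
view[:] = -1
print(arr[:3])                             # [-1 -1 -1]
print(np.shares_memory(arr, view))         # True

independent = arr[0:3].copy()
independent[:] = 99
print(arr[:3])                             # [-1 -1 -1], unchanged
print(np.shares_memory(arr, independent))  # False
```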

# 2D Arrays

Bracket notation corresponds to 
>array[row,col]

In [139]:
arr_2d = np.array([[0,1,2],[3,4,5],[6,7,8]])
arr_2d

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [140]:
arr_2d[0,1] 

1

In [141]:
arr_2d[2]

array([6, 7, 8])

In [142]:
arr_2d[:,:]

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [143]:
arr_2d[:,1] #col 1

array([1, 4, 7])

In [144]:
arr_2d[:,0] #col 0

array([0, 3, 6])

In [147]:
arr_2d[:2,1:] # rows up to (not including) 2, columns from 1 onward

array([[1, 2],
       [4, 5]])

In [149]:
arr_2d[1:2,1:2]

array([[4]])

In [150]:
arr_2d[1:2,0:2]

array([[3, 4]])
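Note the double brackets in the last two outputs: a slice such as `1:2` keeps the dimension, while a plain integer index drops it. A small sketch comparing the resulting shapes:

```python
import numpy as np

arr_2d = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

element = arr_2d[1, 1]       # integer indices on both axes -> a scalar
row_1d = arr_2d[1, :]        # integer row index -> 1D array
row_2d = arr_2d[1:2, :]      # slice for the row -> still 2D

print(element)        # 4
print(row_1d.shape)   # (3,)
print(row_2d.shape)   # (1, 3)
```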

# Comparison Operators
Applied to an array, comparison operators return a boolean array, which we can use to conditionally select elements

In [153]:
arr = np.arange(1,11)
arr

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [155]:
bool_arr = arr > 5
bool_arr

array([False, False, False, False, False,  True,  True,  True,  True,
        True])

In [156]:
arr[bool_arr]

array([ 6,  7,  8,  9, 10])

This can be done in one line

In [157]:
arr

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [158]:
arr[arr>5]

array([ 6,  7,  8,  9, 10])
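Conditions can also be combined. A sketch, assuming the elementwise operators `&` (and), `|` (or), and `~` (not); each comparison needs its own parentheses because these operators bind more tightly than `>` and `<`.

```python
import numpy as np

arr = np.arange(1, 11)

between = arr[(arr > 3) & (arr < 8)]   # both conditions true
outside = arr[(arr < 3) | (arr > 8)]   # either condition true
not_big = arr[~(arr > 5)]              # condition false

print(between)  # [4 5 6 7]
print(outside)  # [ 1  2  9 10]
print(not_big)  # [1 2 3 4 5]
```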

# Operations
- Array with Array 
- Array with scalars
- Universal array functions 

In [160]:
arr = np.arange(0,11)
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

Array with Array

In [161]:
arr + arr

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20])

In [162]:
arr * arr

array([  0,   1,   4,   9,  16,  25,  36,  49,  64,  81, 100])

Array with scalar: the scalar is broadcast to every element of the array

In [163]:
arr +10

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])

In [164]:
arr *10

array([  0,  10,  20,  30,  40,  50,  60,  70,  80,  90, 100])

In [165]:
arr**2

array([  0,   1,   4,   9,  16,  25,  36,  49,  64,  81, 100])

Native Python and NumPy handle division by zero differently: Python raises an error, while NumPy issues a warning and returns nan or inf

In [169]:
0/0

ZeroDivisionError: division by zero

In [168]:
arr/arr # 0/0 gives nan and a RuntimeWarning instead of an error

array([nan,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])

In [170]:
1/0

ZeroDivisionError: division by zero

In [167]:
(arr+1)/arr # 1/0 gives inf and a RuntimeWarning instead of an error

array([       inf, 2.        , 1.5       , 1.33333333, 1.25      ,
       1.2       , 1.16666667, 1.14285714, 1.125     , 1.11111111,
       1.1       ])
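How NumPy reacts to these cases can be adjusted. The sketch below assumes the `np.errstate` context manager, first to silence the warnings and then to turn division by zero into an exception, which is closer to native Python behavior.

```python
import numpy as np

arr = np.arange(0, 11)

# Silence the warnings inside this block only
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = arr / arr          # 0/0 -> nan
    shifted = (arr + 1) / arr  # 1/0 -> inf

print(np.isnan(ratio[0]), np.isinf(shifted[0]))  # True True

# Or raise an exception on division by zero
raised = False
try:
    with np.errstate(divide="raise"):
        (arr + 1) / arr
except FloatingPointError as err:
    raised = True
    print("raised:", err)
```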

Universal Functions
- Many universal functions: https://numpy.org/doc/stable/reference/ufuncs.html
- $log$ and $ln$ are notations with no set standard. In NumPy
  > np.log()

  is the natural log, $ln$
- There are also a lot of functions in the Python math module: https://docs.python.org/3/library/math.html
  - older NumPy versions exposed it as numpy.math, but that alias has been removed in recent releases
  - so import the math module and use that
  > import math \
  > math.log()
  - the native Python math module cannot directly broadcast across NumPy arrays

In [171]:
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [176]:
np.log(np.exp(arr))

array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])

In [172]:
np.sqrt(arr)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ,
       3.16227766])

In [173]:
np.exp(arr)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03, 2.20264658e+04])

In [174]:
np.log(arr) # log(0) gives -inf and a RuntimeWarning

array([      -inf, 0.        , 0.69314718, 1.09861229, 1.38629436,
       1.60943791, 1.79175947, 1.94591015, 2.07944154, 2.19722458,
       2.30258509])

In [182]:
import math
math.log(arr)

TypeError: only size-1 arrays can be converted to Python scalars

In [183]:
math.log(arr[2])

0.6931471805599453
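To make the contrast concrete, here is a sketch that computes the natural log both ways and also shows `np.log10` and `np.log2` for other bases. The array starts at 1 to avoid log(0); variable names are illustrative.

```python
import math
import numpy as np

arr = np.arange(1, 6)

natural = np.log(arr)      # natural log, applied elementwise
base10 = np.log10(arr)     # base-10 log
base2 = np.log2(arr)       # base-2 log

# math.log only accepts scalars, so it has to be applied one element at a time
looped = np.array([math.log(x) for x in arr])
print(np.allclose(natural, looped))  # True
```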

# More Methods
- There are various sums that can be done on arrays
- And basic statistics 

In [184]:
my_mat = np.random.randint(0,100,100)

In [185]:
my_mat

array([88, 11, 17, 46,  7, 75, 28, 33, 84, 96, 88, 44,  5,  4, 71, 88, 88,
       50, 54, 34, 15, 77, 88, 15,  6, 85, 22, 11, 12, 92, 96, 62, 57, 79,
       42, 57, 97, 50, 45, 40, 89, 73, 37,  0, 18, 23,  3, 29, 16, 84, 82,
       14, 51, 79, 17, 50, 53, 25, 48, 17, 32, 81, 80, 41, 90, 12, 30, 81,
       17, 16,  0, 31, 73, 64, 38, 22, 96, 66, 67, 62, 95, 99, 27, 82, 62,
       77, 48, 93, 75, 86, 37, 11, 21, 33, 95, 43, 88, 96, 73, 40])

In [189]:
my_mat = my_mat.reshape(10,10)

In [190]:
my_mat

array([[88, 11, 17, 46,  7, 75, 28, 33, 84, 96],
       [88, 44,  5,  4, 71, 88, 88, 50, 54, 34],
       [15, 77, 88, 15,  6, 85, 22, 11, 12, 92],
       [96, 62, 57, 79, 42, 57, 97, 50, 45, 40],
       [89, 73, 37,  0, 18, 23,  3, 29, 16, 84],
       [82, 14, 51, 79, 17, 50, 53, 25, 48, 17],
       [32, 81, 80, 41, 90, 12, 30, 81, 17, 16],
       [ 0, 31, 73, 64, 38, 22, 96, 66, 67, 62],
       [95, 99, 27, 82, 62, 77, 48, 93, 75, 86],
       [37, 11, 21, 33, 95, 43, 88, 96, 73, 40]])

In [191]:
my_mat.sum()

5147

In [192]:
my_mat.std()

30.305925163241593

In [193]:
my_mat.cumsum()

array([  88,   99,  116,  162,  169,  244,  272,  305,  389,  485,  573,
        617,  622,  626,  697,  785,  873,  923,  977, 1011, 1026, 1103,
       1191, 1206, 1212, 1297, 1319, 1330, 1342, 1434, 1530, 1592, 1649,
       1728, 1770, 1827, 1924, 1974, 2019, 2059, 2148, 2221, 2258, 2258,
       2276, 2299, 2302, 2331, 2347, 2431, 2513, 2527, 2578, 2657, 2674,
       2724, 2777, 2802, 2850, 2867, 2899, 2980, 3060, 3101, 3191, 3203,
       3233, 3314, 3331, 3347, 3347, 3378, 3451, 3515, 3553, 3575, 3671,
       3737, 3804, 3866, 3961, 4060, 4087, 4169, 4231, 4308, 4356, 4449,
       4524, 4610, 4647, 4658, 4679, 4712, 4807, 4850, 4938, 5034, 5107,
       5147])

We can also sum over only rows or columns by specifying the axis: axis=0 sums down the rows (one total per column), and axis=1 sums across the columns (one total per row)
![axis.png](attachment:axis.png)

In [198]:
my_arr = np.arange(1,7)
my_arr = my_arr.reshape(3,2)
my_arr

array([[1, 2],
       [3, 4],
       [5, 6]])

In [199]:
my_arr.sum(axis=0)

array([ 9, 12])

In [200]:
my_arr.sum(axis=1)

array([ 3,  7, 11])

In [201]:
my_mat.sum(axis=0)

array([622, 503, 456, 443, 446, 532, 553, 534, 491, 567])

In [202]:
my_mat.sum(axis=1)

array([485, 526, 423, 625, 372, 436, 480, 519, 744, 537])

In [203]:
my_mat.std(axis=0)

array([35.07648785, 30.81898765, 27.17057232, 29.67170369, 31.66133288,
       26.41136119, 32.89696035, 28.18226393, 25.14935387, 29.54335797])
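One way to keep the axis convention straight is to compare the result with explicit loops. A sketch using the small 3x2 array from above; `col_totals`, `row_totals`, and the loop-based lists are illustrative names.

```python
import numpy as np

my_arr = np.arange(1, 7).reshape(3, 2)

col_totals = my_arr.sum(axis=0)   # collapse the rows -> one value per column
row_totals = my_arr.sum(axis=1)   # collapse the columns -> one value per row
print(col_totals)  # [ 9 12]
print(row_totals)  # [ 3  7 11]

# The same totals written out with explicit loops
n_rows, n_cols = my_arr.shape
loop_cols = [int(sum(my_arr[:, j])) for j in range(n_cols)]
loop_rows = [int(sum(my_arr[i, :])) for i in range(n_rows)]
print(loop_cols, loop_rows)  # [9, 12] [3, 7, 11]
```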