<font size = 6> <b> Numpy! </b> </font>

In [1]:
import numpy as np
from numpy import random

<font size = 5> <b> Creating Numpy Arrays </b> </font> <br>
<font size = 3> <b> Method 1: Numpy Array from Python List </b></font>

In [2]:
list1=[1,2,3,4,5]
npArr1 = np.array(list1, dtype=np.int16) #type specification is optional
print("npArr1: \n", npArr1)
list2=[[1,2,3],[4,5,6],[7,8,9]]
npArr2=np.array(list2)
print("npArr2: \n", npArr2)

npArr1: 
 [1 2 3 4 5]
npArr2: 
 [[1 2 3]
 [4 5 6]
 [7 8 9]]


<font size = 3><b> Method 2: Using arange and linspace </b> </font> <br><br>
The arange(start, stop, step) function constructs a NumPy array from the start value (inclusive) to the stop value (exclusive) in increments of the step size. The default step size is 1. Compared to linspace, arange gives the user direct control over the step size. <br><br>
The linspace(start, stop, num) function constructs a NumPy array of num equally spaced values from the start value (inclusive) to the stop value (inclusive). Compared to arange, linspace gives the user direct control over the number of elements and guarantees that the stop value is included.

In [3]:
print("arange:", np.arange(1,10, 2))
print("linspace:", np.linspace(0,10,4))

arange: [1 3 5 7 9]
linspace: [ 0.          3.33333333  6.66666667 10.        ]
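
The contrast between the two functions can be sketched on the same interval. This is a minimal illustration; the variable names are made up for the example, and retstep=True is an optional linspace argument that also returns the computed spacing.

```python
import numpy as np

# arange fixes the step; the stop value (1) is excluded.
by_step = np.arange(0, 1, 0.25)

# linspace fixes the number of points; the stop value (1) is included.
by_count = np.linspace(0, 1, 5)

# retstep=True additionally returns the spacing that linspace computed.
_, spacing = np.linspace(0, 1, 5, retstep=True)

print(by_step)
print(by_count)
print(spacing)
```

Here arange yields four values and linspace yields five, because only linspace includes the endpoint.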


<font size = 3><b> Method 3: Using ones and zeros </b> </font>

In [4]:
print(np.zeros(2))
print(np.zeros((2,2)))
print(np.ones((3,4)).size)
print(np.ones((3,4)).shape)
print(np.ones((3,4)).dtype)

[0. 0.]
[[0. 0.]
 [0. 0.]]
12
(3, 4)
float64


<font size = 3><b> Method 4: Creating random arrays</b> </font><br><br>
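The np.random.randint(low, high, size) function generates random integers from low (inclusive) to high (exclusive). The size argument sets the shape of the output. If only one bound is given, values are drawn from 0 (inclusive) up to that bound (exclusive); if size is omitted, a single integer is returned.<br><br>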

In [5]:
print(np.random.randint(10,21,10)) #from (inclusive), to (exclusive), n
print(np.random.randint(10,21, size=(2,2)))

[15 18 10 17 20 14 20 14 20 15]
[[16 10]
 [16 14]]


The np.random.rand(d0, d1, ...) function creates an array of the given shape with random values drawn from a uniform distribution over [0, 1). With no arguments, it returns a single float.

In [6]:
print(np.random.rand())
print(np.random.rand(2,2))

0.7064390165996903
[[0.39935716 0.32137261]
 [0.42840504 0.54052851]]


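The np.random.choice(a, size=None, replace=True, p=None) function randomly chooses elements from a given input a. The size argument sets the output shape, replace controls whether an element can be drawn more than once, and p gives the probability of drawing each element.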

In [7]:
values = [1,2,3,4,5] #named "values" to avoid shadowing the built-in input()
print(np.random.choice(values, size=(2,2), replace=True, p=[0.1,0.1,0.1,0.1,0.6]))

[[5 3]
 [3 3]]
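
As a small sketch of the replace argument, the example below draws without replacement, so no value can appear twice. The variable names are chosen for illustration only, and the printed values vary from run to run.

```python
import numpy as np

values = [1, 2, 3, 4, 5]

# replace=False draws without replacement: each element is used at most once.
sample = np.random.choice(values, size=3, replace=False)
print(sample)
```

With replace=False, size cannot exceed the number of available elements.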


The np.random.shuffle(arr) function shuffles the elements of an array in place and returns None.

In [8]:
arr=np.array([1,2,3,4,5])
np.random.shuffle(arr)
print(arr)

[3 1 4 5 2]


The np.random.permutation(arr) function returns a permutation of the elements in the input array. Unlike the shuffle function, the permutation function does not affect the original array.

In [9]:
arr=np.array([1,2,3,4,5])
print(np.random.permutation(arr))

[4 1 5 3 2]
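
The difference between the two functions can be checked directly. This is a minimal sketch with illustrative variable names; the actual orderings are random.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
original = arr.copy()

# permutation returns a new, reordered array and leaves arr untouched.
shuffled_copy = np.random.permutation(arr)
unchanged = np.array_equal(arr, original)

# shuffle reorders arr itself and returns None.
result = np.random.shuffle(arr)

print(unchanged, result)
```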


The np.random.normal(loc=0, scale=1, size=None) function returns random values from a normal distribution with mean = loc and standard deviation = scale.

In [10]:
print(np.random.normal(2,0.1, size=(3,3)))

[[1.84217188 1.91588264 1.98874021]
 [1.99113559 1.98770448 2.02349518]
 [2.00258734 2.05462299 1.95375071]]
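
One way to sanity-check the roles of loc and scale is to draw a large sample and compare its statistics with the requested ones. The sample size and the seed below are arbitrary choices for this sketch.

```python
import numpy as np

np.random.seed(0)  # arbitrary fixed seed so the run is repeatable
samples = np.random.normal(loc=2, scale=0.1, size=100_000)

sample_mean = samples.mean()  # should be close to loc
sample_std = samples.std()    # should be close to scale
print(sample_mean, sample_std)
```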


<font size = 5> <b> Slicing, Indexing and Reshaping Arrays </b> </font> <br>

There are two ways to access an item in a NumPy array. We can use standard indexing, or we can use the array's .item() method. Note that indexing allows us to extract a whole row or column, whereas .item() only returns a single element, not a row.

In [11]:
arr=np.array([[1,2],[3,4]])
print(arr[0,0])
print(arr[0])
print(arr.item(0,1))

1
[1 2]
2


We can also index an array with another array.

In [86]:
arr1=np.arange(10)+5
arr2=np.array([4,1,2])
print("arr1:", arr1)
print("arr2:", arr2)
print("arr1[arr2]:", arr1[arr2])

arr1: [ 5  6  7  8  9 10 11 12 13 14]
arr2: [4 1 2]
arr1[arr2]: [9 6 7]


We can slice a matrix using the from:to:by syntax as follows.

In [159]:
arr=np.array([[1,2,3], [4,5,6], [7,8,9]])
print("Original array")
print(arr)
print("Only columns 1 and 2: arr[:, 0:2]")
print(arr[:, 0:2]) #all rows, only cols 0 and 1
print("Reverse the order of the rows, not the columns: arr[::-1]")
print(arr[::-1]) #reverses the order of the rows, not the columns
print("Reverse the order of the columns, not the rows: arr[:, ::-1]")
print(arr[:, ::-1]) #reverses the order of the columns, not the rows
print("Reverse both: arr[::-1, ::-1]")
print(arr[::-1, ::-1]) #reverses both

Original array
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Only columns 1 and 2: arr[:, 0:2]
[[1 2]
 [4 5]
 [7 8]]
Reverse the order of the rows, not the columns: arr[::-1]
[[7 8 9]
 [4 5 6]
 [1 2 3]]
Reverse the order of the columns, not the rows: arr[:, ::-1]
[[3 2 1]
 [6 5 4]
 [9 8 7]]
Reverse both: arr[::-1, ::-1]
[[9 8 7]
 [6 5 4]
 [3 2 1]]


We can also index an array by using a boolean matrix which will filter the elements. Note that this will flatten the array.

In [160]:
arr=np.array([[1,2,3], [4,5,6], [7,8,9]])
arr[arr%2==0]

array([2, 4, 6, 8])

We can also filter out items by using a boolean mask. To do this, we create an array of True values with np.ones(..., dtype=bool), set the positions we want to drop to False, and index our desired array with the mask.

In [161]:
arr=np.arange(12) + 1
print(arr)
mask=np.ones(len(arr), dtype=bool)
print(mask)
mask[2::3]=False
print(mask)
print(arr[mask])

[ 1  2  3  4  5  6  7  8  9 10 11 12]
[ True  True  True  True  True  True  True  True  True  True  True  True]
[ True  True False  True  True False  True  True False  True  True False]
[ 1  2  4  5  7  8 10 11]
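
Masks can also be built from conditions and combined. The sketch below uses the element-wise operators & (and) and ~ (not); the parentheses around each condition are required because of operator precedence.

```python
import numpy as np

arr = np.arange(12) + 1

# Keep values that are even AND greater than 4.
mask = (arr % 2 == 0) & (arr > 4)

selected = arr[mask]   # elements where the mask is True
rejected = arr[~mask]  # elements where the mask is False

print(selected)
print(rejected)
```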


There are two ways to set an item in a NumPy array. We can use standard assignment with the '=' operator, or we can use the array's .itemset() method. Note that .itemset() was removed in NumPy 2.0, so index assignment is the portable choice.

In [12]:
arr=np.zeros((3,3), dtype=int)
arr[0,0] = 1
arr.itemset(0,1,1) #(row, col, val)
print(arr)

[[1 1 0]
 [0 0 0]
 [0 0 0]]


The np.take function allows us to extract multiple items at once. The array is "flattened", and each element is indexed with a single value.

In [13]:
arr=np.array([[1,2],[3,4]])
arr.take([1,2])

array([2, 3])

The np.put function allows us to set multiple items at once. The array is treated as "flattened", and we provide first the indices of the elements to change, and then the values to change them to.

In [14]:
arr=np.zeros((3,3))
arr.put([1,5,8], [10,10,10])
print(arr)

[[ 0. 10.  0.]
 [ 0.  0. 10.]
 [ 0.  0. 10.]]


We can observe the .shape and .size attributes of an array.

In [15]:
print(arr.size)
print(arr.shape)

9
(3, 3)


We can slice a numpy array with the \[from:to:by\] syntax. Note that this _includes_ the starting point and _excludes_ the ending point. The default is \[beginning:end:1\].

In [16]:
arr2=np.array([1,2,3,4,5,6,7,8,9,0])
print("arr2: ", arr2)
print(arr2[:5:2])

arr2:  [1 2 3 4 5 6 7 8 9 0]
[1 3 5]


We can slice a multidimensional array as follows. If only one slice is given, such as in arr\[1:3\], then the slicing applies to rows (the outermost dimension) only. If we wish to select specific columns and all rows, we must say so explicitly, as in arr\[:, 1:3\], where the first ":" indicates that we keep all rows and the "1:3" indicates that we keep the second and third columns.

In [17]:
arr3=np.array([[1,2,3], [4,5,6], [7,8,9]])
print("arr3\n",arr3)
print("arr3[:, 0:2]\n", arr3[:, 0:2]) #all rows, only cols 0 and 1
print("arr3[::-1]\n", arr3[::-1]) #reverses the order of the rows, not the columns
print("arr3[::-1, ::-1]\n", arr3[::-1, ::-1]) #reverses both

arr3
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
arr3[:, 0:2]
 [[1 2]
 [4 5]
 [7 8]]
arr3[::-1]
 [[7 8 9]
 [4 5 6]
 [1 2 3]]
arr3[::-1, ::-1]
 [[9 8 7]
 [6 5 4]
 [3 2 1]]


We can also index an array by passing a boolean array. This returns a _flattened_ array with values from the original array that correspond to True elements in the indexing array.

In [18]:
print("Evens only")
evens=arr3[arr3%2 == 0]
print(evens)

Evens only
[2 4 6 8]


We can extract the sorted unique elements of an array with the np.unique() function.

In [19]:
np.unique(arr3)

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

We can reshape or resize arrays with the .reshape() and .resize() methods. The .reshape() method returns a reshaped array and leaves the shape of the original unchanged, whereas the .resize() method modifies the original array in place. Values are filled into the new shape in the order of the flattened array. Note that with reshape, the new shape must preserve the total number of elements. With resize, however, we can truncate the array by passing a shape with fewer elements than the original.

In [20]:
arr3=np.array([[1,2,3], [4,5,6], [7,8,9]])
print(arr3.reshape((3,3))) #returns reshaped array but does not modify original values
print(arr3)
arr3.resize(1,8)
print(arr3)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[1 2 3 4 5 6 7 8]]
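
Two details of reshape are worth a short sketch. First, one dimension can be given as -1 and NumPy infers it from the total size. Second, for a contiguous array like the one below, reshape returns a view that shares memory with the original, so writing through the reshaped array also changes the original data.

```python
import numpy as np

arr = np.arange(6)

# -1 asks NumPy to infer this dimension: 6 elements / 2 rows = 3 columns.
grid = arr.reshape(2, -1)

# grid is a view of arr here, so this write is visible through arr as well.
grid[0, 0] = 99

print(grid.shape)
print(arr)
print(np.shares_memory(arr, grid))
```

If an independent array is needed, calling .copy() on the reshaped result avoids the shared memory.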


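We can swap axes with the .swapaxes() method. For 2D arrays, this is equivalent to taking the transpose of the matrix. Note that swapaxes returns a new array (a view of the same data) rather than modifying the original, so arr3 itself is unchanged when printed afterwards.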

In [21]:
arr3.swapaxes(0,1)
print(arr3)

[[1 2 3 4 5 6 7 8]]
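
To see the swapped result, the return value has to be captured or printed. A minimal sketch with an illustrative 2x3 array:

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])

# swapaxes returns the swapped array; m keeps its shape (2, 3).
swapped = m.swapaxes(0, 1)

print(swapped)
print(np.array_equal(swapped, m.T))  # same as the transpose for 2D arrays
```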


We can flatten a multidimensional array with the .flatten() function. By default, we flatten rows first, so the output 1D array contains the first row, then the second row, and so on until the last row. We can also flatten by columns by passing 'F' to the flatten function. Note that this function does not modify the original array.

In [22]:
arr3=np.array([[1,2,3], [4,5,6], [7,8,9]])
print(arr3.flatten())
print(arr3.flatten("F"))

[1 2 3 4 5 6 7 8 9]
[1 4 7 2 5 8 3 6 9]


We can sort the items along an axis using the .sort() method, which sorts the array in place. The axis argument determines the direction of the sort. If axis = 0, the items within each column are sorted (moving down the rows). If axis = 1, the items within each row are sorted.

In [28]:
arr4 = np.array([[5,1,3], [2,4,1], [6,0,3]])
arr5 = arr4.copy()
print(arr5)
arr5.sort(axis=0)
print(arr5)
arr4.sort(axis=1)
print(arr4)

[[5 1 3]
 [2 4 1]
 [6 0 3]]
[[2 0 1]
 [5 1 3]
 [6 4 3]]
[[1 3 5]
 [1 2 4]
 [0 3 6]]
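
When the original array should be kept, the np.sort function is an alternative: it returns a sorted copy instead of sorting in place. A small sketch using the same values as above:

```python
import numpy as np

arr = np.array([[5, 1, 3], [2, 4, 1], [6, 0, 3]])

by_column = np.sort(arr, axis=0)  # sorts within each column
by_row = np.sort(arr, axis=1)     # sorts within each row

print(by_column)
print(by_row)
print(arr)  # unchanged, because np.sort works on a copy
```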


<font size = 5> <b> Stacking and Splitting Arrays </b> </font> <br>

We can "stack" two matrices together using np.hstack and np.vstack.

In [29]:
arr1 = np.random.randint(10, size=(2,2)) #ints [0,10)
print("arr1\n", arr1)
arr2 = np.random.randint(10, size=(2,2))
print("arr2\n", arr2)
print("vertical stack:\n", np.vstack((arr1, arr2)))
print("horizontal stack:\n", np.hstack((arr1, arr2)))

arr1
 [[4 9]
 [2 0]]
arr2
 [[5 4]
 [5 9]]
vertical stack:
 [[4 9]
 [2 0]
 [5 4]
 [5 9]]
horizontal stack:
 [[4 9 5 4]
 [2 0 5 9]]


We can also stack arrays using the column_stack and row_stack functions. Each takes a tuple of 1D arrays as input and returns a 2D array in which the inputs are stacked as columns or as rows, respectively. Note that row_stack is an alias of vstack and is deprecated as of NumPy 2.0.

In [43]:
print(np.row_stack(([1,2,3],[4,5,6])))
print(np.column_stack(([1,2,3],[4,5,6])))

[[1 2 3]
 [4 5 6]]
[[1 4]
 [2 5]
 [3 6]]
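
For 2D inputs, the stacking functions can be related to np.concatenate with an explicit axis, and column_stack of 1D inputs matches the transpose of vstack. The sketch below checks these relationships on small illustrative arrays.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# vstack joins along axis 0 (rows); hstack joins 2D arrays along axis 1 (columns).
same_v = np.array_equal(np.vstack((a, b)), np.concatenate((a, b), axis=0))
same_h = np.array_equal(np.hstack((a, b)), np.concatenate((a, b), axis=1))

# Stacking 1D arrays as columns is the transpose of stacking them as rows.
cols = np.column_stack(([1, 2, 3], [4, 5, 6]))
rows = np.vstack(([1, 2, 3], [4, 5, 6]))
same_t = np.array_equal(cols, rows.T)

print(same_v, same_h, same_t)
```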


We can delete a row or column using the np.delete function. The np.delete(arr, obj, axis) function returns a copy of the array with sub-arrays along an axis removed. If axis=0, we delete rows, whereas if axis=1, we delete columns. The obj argument gives the index or indices of the rows or columns to delete. Note that this function does not affect the original array.

In [36]:
print("arr1:\n ", arr1)
arr2=np.delete(arr1, 0, 0)
print("delete(arr1,0,0):\n", arr2)
arr3=np.delete(arr1, 1, 1)
print("delete(arr1,1,1):\n", arr3)

arr1:
  [[4 9]
 [2 0]]
delete(arr1,0,0):
 [[2 0]]
delete(arr1,1,1):
 [[4]
 [2]]


We can split a matrix by columns or rows using the hsplit(arr, indices_or_sections) and vsplit(arr, indices_or_sections) functions. The second argument can be either the number of equally sized subarrays to split the array into, or a list of the indices at which to split.

In [52]:
arr = np.random.randint(10, size=(2,10))
print(arr)
print("hsplit:",np.hsplit(arr, 5))
print("hsplit:", np.hsplit(arr, [2,8]))
print("vsplit:", np.vsplit(arr,2))

[[4 6 9 1 0 4 5 2 7 1]
 [2 0 5 1 9 3 0 6 9 4]]
hsplit: [array([[4, 6],
       [2, 0]]), array([[9, 1],
       [5, 1]]), array([[0, 4],
       [9, 3]]), array([[5, 2],
       [0, 6]]), array([[7, 1],
       [9, 4]])]
hsplit: [array([[4, 6],
       [2, 0]]), array([[9, 1, 0, 4, 5, 2],
       [5, 1, 9, 3, 0, 6]]), array([[7, 1],
       [9, 4]])]
vsplit: [array([[4, 6, 9, 1, 0, 4, 5, 2, 7, 1]]), array([[2, 0, 5, 1, 9, 3, 0, 6, 9, 4]])]
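
When the number of sections does not divide the array evenly, hsplit raises an error. The np.array_split function is a related alternative that allows uneven pieces. A minimal sketch with an illustrative 2x10 array:

```python
import numpy as np

arr = np.arange(20).reshape(2, 10)

# 10 columns cannot be divided into 3 equal parts, so hsplit fails.
try:
    np.hsplit(arr, 3)
    hsplit_failed = False
except ValueError:
    hsplit_failed = True

# array_split accepts the uneven division and makes the first pieces larger.
parts = np.array_split(arr, 3, axis=1)
widths = [p.shape[1] for p in parts]

print(hsplit_failed, widths)
```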


<font size = 5> <b> Basic Math </b> </font> <br>

We can perform the following basic mathematical functions on numpy arrays.

In [56]:
arr1=np.array([1,2,3,4])
arr2=np.array([2,4,6,8])
arr3=np.arange(16).reshape(4,4)

Addition, subtraction, multiplication and division

In [55]:
print(arr1+arr2)
print(arr1-arr2)
print(arr1*arr2)
print(arr1/arr2)

[ 3  6  9 12]
[-1 -2 -3 -4]
[ 2  8 18 32]
[0.5 0.5 0.5 0.5]


Summation

In [65]:
print("arr1:", arr1)
print("arr1.sum():", arr1.sum())
print("arr3:\n", arr3)
print("arr3.sum(axis=0)", arr3.sum(axis=0))
print("arr3.sum(axis=1)", arr3.sum(axis=1))

arr1: [1 2 3 4]
arr1.sum(): 10
arr3:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
arr3.sum(axis=0) [24 28 32 36]
arr3.sum(axis=1) [ 6 22 38 54]


Cumulative sum

In [67]:
print("arr1:\n", arr1)
print("arr1.cumsum()\n", arr1.cumsum())
print("arr3:\n", arr3)
print("arr3.cumsum(axis=0)\n", arr3.cumsum(axis=0))

arr1:
 [1 2 3 4]
arr1.cumsum()
 [ 1  3  6 10]
arr3:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
arr3.cumsum(axis=0)
 [[ 0  1  2  3]
 [ 4  6  8 10]
 [12 15 18 21]
 [24 28 32 36]]


Minimum and maximum.

Heuristic to remember axis directionality: axis = 0 means the function is applied to each column separately, moving down the rows. axis = 1 means the function is applied to each row separately, moving across the columns. <b>With axis = 0 the output has one entry per column (the shape of a row); with axis = 1 it has one entry per row (the shape of a column).</b> 

In [62]:
print(arr1.min())
print(arr3.max(axis=1))

1
[ 3  7 11 15]
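
The heuristic is easier to see on a non-square array, where the two output lengths differ. A small sketch with an illustrative 2x3 array:

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)  # 2 rows, 3 columns

col_sums = arr.sum(axis=0)  # one value per column, so length 3
row_sums = arr.sum(axis=1)  # one value per row, so length 2

print(col_sums, col_sums.shape)
print(row_sums, row_sums.shape)
```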


Arithmetic with np functions

In [82]:
print("arr1:", arr1, "\narr2:", arr2)
print("np.add(arr1, arr2):", np.add(arr1, arr2))
print("np.subtract(arr1, arr2):", np.subtract(arr1, arr2))
print("np.multiply(arr1, arr2):", np.multiply(arr1, arr2))
print("np.divide(arr1, arr2):", np.divide(arr1, arr2))
print("np.remainder(arr1, arr2):", np.remainder(arr1, arr2)) #divide first input by second input and take remainder
print("np.power(arr1, arr2):", np.power(arr1, arr2))
print("np.sqrt(arr1):", np.sqrt(arr1))
print("np.cbrt(arr1):", np.cbrt(arr1))
print("np.absolute(-arr1):", np.absolute(-arr1))
print("np.exp(arr1):", np.exp(arr1))
print("np.log(arr1):", np.log(arr1)) #default base e
print("np.log2(arr1):", np.log2(arr1))
print("np.log10(arr1):", np.log10(arr1))
print("np.gcd.reduce([9,15,12]):", np.gcd.reduce([9,15,12]))
print("np.lcm.reduce([9,15,12]):", np.lcm.reduce([9,15,12]))
print("np.floor([0.2,1.2]):", np.floor([0.2,1.2]))
print("np.ceil([0.2,1.2]):", np.ceil([0.2,1.2]))

arr1: [1 2 3 4] 
arr2: [2 4 6 8]
np.add(arr1, arr2): [ 3  6  9 12]
np.subtract(arr1, arr2): [-1 -2 -3 -4]
np.multiply(arr1, arr2): [ 2  8 18 32]
np.divide(arr1, arr2): [0.5 0.5 0.5 0.5]
np.remainder(arr1, arr2): [1 2 3 4]
np.power(arr1, arr2): [    1    16   729 65536]
np.sqrt(arr1): [1.         1.41421356 1.73205081 2.        ]
np.cbrt(arr1): [1.         1.25992105 1.44224957 1.58740105]
np.absolute(-arr1): [1 2 3 4]
np.exp(arr1): [ 2.71828183  7.3890561  20.08553692 54.59815003]
np.log(arr1): [0.         0.69314718 1.09861229 1.38629436]
np.log2(arr1): [0.        1.        1.5849625 2.       ]
np.log10(arr1): [0.         0.30103    0.47712125 0.60205999]
np.gcd.reduce([9,15,12]): 3
np.lcm.reduce([9,15,12]): 180
np.floor([0.2,1.2]): [0. 1.]
np.ceil([0.2,1.2]): [1. 2.]


Argmax and argmin

In [136]:
np.random.seed(0)
arr1 = np.random.randint(10, size=(5,3))
print("arr1:", arr1)
print("arr1.argmax(axis=0):", arr1.argmax(axis=0))
rowIndicies=arr1.argmax(axis=0)
print("rowIndicies:", rowIndicies)
colIndicies=arr1.argmax(axis=1)
print("colIndicies:", colIndicies)
print("Highest values in each column:", arr1[rowIndicies,np.arange(arr1.shape[1])]) #number of columns == length of a row
print("Highest values in each row:", arr1[np.arange(arr1.shape[0]), colIndicies]) #number of rows == length of a column

arr1: [[5 0 3]
 [3 7 9]
 [3 5 2]
 [4 7 6]
 [8 8 1]]
arr1.argmax(axis=0): [4 4 1]
rowIndicies: [4 4 1]
colIndicies: [0 2 1 1 0]
Highest values in each column: [8 8 9]
Highest values in each row: [5 9 5 7 8]
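
The index-based lookup above can be cross-checked against max along the same axis. The sketch below hard-codes the array printed above rather than relying on the random seed.

```python
import numpy as np

arr = np.array([[5, 0, 3],
                [3, 7, 9],
                [3, 5, 2],
                [4, 7, 6],
                [8, 8, 1]])

# Pair each column index with the row index of that column's maximum.
col_max = arr[arr.argmax(axis=0), np.arange(arr.shape[1])]
# Pair each row index with the column index of that row's maximum.
row_max = arr[np.arange(arr.shape[0]), arr.argmax(axis=1)]

print(np.array_equal(col_max, arr.max(axis=0)))
print(np.array_equal(row_max, arr.max(axis=1)))
```

When several entries tie for the maximum, argmax returns the index of the first one.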


<font size = 5> <b> Stats functions </b> </font> <br>

We can use the following basic statistics functions.

In [142]:
arr=np.arange(3,10,2)
print("arr:", arr)
print("np.mean(arr):", np.mean(arr))
print("np.median(arr):", np.median(arr))
print("np.std(arr):", np.std(arr))
print("np.var(arr):", np.var(arr))

arr: [3 5 7 9]
np.mean(arr): 6.0
np.median(arr): 6.0
np.std(arr): 2.23606797749979
np.var(arr): 5.0
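
By default, np.std and np.var divide by N and therefore give the population statistics. As a sketch, passing the ddof=1 argument divides by N - 1 instead, which gives the sample estimates:

```python
import numpy as np

arr = np.arange(3, 10, 2)  # [3 5 7 9]

population_var = np.var(arr)      # divides by N = 4
sample_var = np.var(arr, ddof=1)  # divides by N - 1 = 3
sample_std = np.std(arr, ddof=1)

print(population_var, sample_var, sample_std)
```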


If our array contains missing data (NaN values), we can use the following NaN-aware variations of the above functions.

In [144]:
arr=np.array([1,2,3,np.nan])
print("arr:", arr)
print("np.nanmean(arr):", np.nanmean(arr))
print("np.nanmedian(arr):", np.nanmedian(arr))
print("np.nanstd(arr):", np.nanstd(arr))
print("np.nanvar(arr):", np.nanvar(arr))

arr: [ 1.  2.  3. nan]
np.nanmean(arr): 2.0
np.nanmedian(arr): 2.0
np.nanstd(arr): 0.816496580927726
np.nanvar(arr): 0.6666666666666666
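
The reason for the separate functions is that a single NaN propagates through the ordinary ones. A minimal sketch of the difference:

```python
import numpy as np

arr = np.array([1, 2, 3, np.nan])

plain_mean = np.mean(arr)     # nan, because NaN propagates
nan_mean = np.nanmean(arr)    # 2.0, NaN entries are ignored

# Number of entries that actually contributed to nanmean.
valid_count = np.count_nonzero(~np.isnan(arr))

print(plain_mean, nan_mean, valid_count)
```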


Percentile

In [147]:
arr=np.arange(20)
print("arr:", arr)
print("50th percentile:", np.percentile(arr, 50))
print("20th percentile:", np.percentile(arr, 20))

arr: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
50th percentile: 9.5
20th percentile: 3.8000000000000003
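
The fractional result comes from interpolation between neighbouring values. Assuming the default linear interpolation method, the 20th percentile can be reproduced by hand as sketched below; the helper variable names are illustrative.

```python
import numpy as np

arr = np.arange(20)  # already sorted
q = 20

# Fractional position of the q-th percentile among indices 0 .. N-1.
position = q / 100 * (len(arr) - 1)
lower = int(np.floor(position))
upper = int(np.ceil(position))

# Linear interpolation between the two neighbouring values.
manual = arr[lower] + (position - lower) * (arr[upper] - arr[lower])

print(manual, np.percentile(arr, q))
```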


Correlation coefficient with two input arrays

In [149]:
x=np.arange(10)
y=np.array([1,2,5,3,4,7,5,6,8,9])
np.corrcoef(x,y)[0,1]

0.9096563897839379

Correlation coefficient with multiple input arrays. Note that by default, the rows are treated as the variables, and rows are compared with each other. By setting rowvar=False, we treat the columns as the variables and compare columns with each other. For a standard dataset where each row is an observation and each column is a variable, we would want to set rowvar to False.

In [157]:
x=np.random.rand(3,4)
print("Input array:\n", x)
np.corrcoef(x, rowvar=False)

Input array:
 [[0.97645947 0.4686512  0.97676109 0.60484552]
 [0.73926358 0.03918779 0.28280696 0.12019656]
 [0.2961402  0.11872772 0.31798318 0.41426299]]


array([[1.        , 0.64362772, 0.73737479, 0.22595909],
       [0.64362772, 1.        , 0.99156879, 0.89097814],
       [0.73737479, 0.99156879, 1.        , 0.8246302 ],
       [0.22595909, 0.89097814, 0.8246302 , 1.        ]])
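
The effect of rowvar shows up in the shape of the result. In the sketch below, a 3x4 input gives a 3x3 matrix by default and a 4x4 matrix with rowvar=False, which matches calling corrcoef on the transpose. The seed is an arbitrary choice for repeatability.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed
x = np.random.rand(3, 4)

by_rows = np.corrcoef(x)                   # 3 variables (rows)
by_columns = np.corrcoef(x, rowvar=False)  # 4 variables (columns)

# Treating columns as variables is the same as transposing first.
same = np.allclose(by_columns, np.corrcoef(x.T))

print(by_rows.shape, by_columns.shape, same)
```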