# Numpy

## Introduction

Numpy stands for Numerical Python. The package comes with a lot of functionality, but its core is the n-dimensional array type, `np.ndarray`, which you usually create with `np.array(...)`.

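You can think of `np.array` objects as being similar to lists, but focused on maths rather than on being general collections of objects. Because of this, every element of an array MUST have the same datatype (unlike a list). If you mix types, Numpy converts everything to one common type: a single string among numbers turns the whole array into strings.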
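
A quick way to see this rule in action is the `dtype` attribute, which every array carries. In this small sketch, mixing integers and floats upcasts everything to floats, and one string turns every element into a string (the exact string dtype, e.g. `<U21`, can vary, so only its kind is checked).

```python
import numpy as np

# All integers: Numpy picks an integer dtype
ints = np.array([7, 16, 4, 3])
print(ints.dtype.kind)  # 'i' (signed integer)

# Mixing ints and floats: everything is upcast to float
mixed = np.array([7, 16, 4.5])
print(mixed.dtype)  # float64

# One string in the list: every element becomes a string
strings = np.array([2, 7, 3, '5'])
print(strings.dtype.kind)  # 'U' (unicode string)
print(strings)
```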

## Basic Numpy Arrays

In [None]:
import numpy as np

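vec1 = np.array([7, 16, 4, 3])
vec2 = np.array([2, 7, 3, '5'])  # one string forces every element to become a string
print(vec1)
print(vec2)
print(vec1 + vec2.astype(float))  # convert the strings to numbers first, then add
print(vec1 + vec2)  # fails: Numpy cannot add integers and strings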

[ 7 16  4  3]
['2' '7' '3' '5']
[ 9. 23.  7.  8.]


UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('int64'), dtype('<U21')) -> None

## Arrays from Lists - syntax comparison

Converting lists to `np.array` objects is easy!

In [None]:
costs = [800.00, 250.50, 101.90]
np_costs = np.array(costs)
print(np_costs)
print(type(np_costs))

[800.  250.5 101.9]
<class 'numpy.ndarray'>


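Maths works element by element, as you would expect for mathematical vectors: an operation with a single number is applied to every element, and two arrays of the same length are combined position by position. Note that `+` on plain lists does something different: it concatenates them.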

In [None]:
usd_costs = np_costs*4.12
print(usd_costs)
with_fixed_cost = np_costs + 3000.00
print(with_fixed_cost)

print(np_costs + np_costs)  # Numpy array
print(costs + costs)  # Normal list

[3296.    1032.06   419.828]
[3800.  3250.5 3101.9]
[1600.   501.   203.8]
[800.0, 250.5, 101.9, 800.0, 250.5, 101.9]
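
To get element-wise results from plain lists you have to write the loop yourself. This sketch reuses the `costs` values from above and compares a list comprehension with the equivalent array expressions.

```python
import numpy as np

costs = [800.00, 250.50, 101.90]

# With plain lists, element-wise addition needs an explicit loop (here via zip)
list_sum = [a + b for a, b in zip(costs, costs)]
print(list_sum)  # [1600.0, 501.0, 203.8]

# With an ndarray, the + operator already works element by element
array_sum = np.array(costs) + np.array(costs)
print(array_sum)

# Scaling every element by a constant works the same way
list_scaled = [c * 4.12 for c in costs]
array_scaled = np.array(costs) * 4.12
print(np.allclose(list_scaled, array_scaled))  # True
```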


Indexing works similarly to list indexing.

`random.randrange(48, 96)` generates a random integer from 48 to 95: 48 is included, 96 is excluded. The list comprehension in the cell below calls it once per iteration of `range(11)`, so it builds a list of 11 randomly chosen integers.
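
If you'd rather stay inside Numpy, a generator from `np.random.default_rng` can draw the same kind of values in a single call. `Generator.integers(low, high, size)` excludes `high` by default, so this sketch mirrors `random.randrange(48, 96)`; the seed of 42 is an arbitrary choice that makes runs repeatable.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, so the same "random" weights appear every run

# 11 integers from 48 up to, but not including, 96 (same range as random.randrange(48, 96))
weights = rng.integers(48, 96, size=11)
print(weights)
print(weights.min() >= 48, weights.max() <= 95)  # both should be True
```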

If performance matters, or if you are doing numerical computations on your data (matrix operations, statistical analysis, etc.), Numpy arrays are usually the better choice. For general purposes such as printing elements or basic manipulations like slicing, Python lists are perfectly adequate and more straightforward.
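
One rough way to check the performance point on your own machine is `timeit`. This sketch doubles 100,000 numbers with a list comprehension and with a single array operation; the actual timings depend on your hardware and Numpy version, so no numbers are claimed here.

```python
import timeit
import numpy as np

values = list(range(100_000))
array = np.array(values)

# Doubling every element: list comprehension vs. one vectorised Numpy operation
list_time = timeit.timeit(lambda: [v * 2 for v in values], number=20)
array_time = timeit.timeit(lambda: array * 2, number=20)

print(f"list:  {list_time:.4f} s")
print(f"numpy: {array_time:.4f} s")
```

Both approaches produce the same numbers; only the speed differs.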

In [None]:
import random
weights = [random.randrange(48, 96) for _ in range(11)]
weights = np.array(weights)
print(weights)
print(weights[0])      # first element
print(weights[-1])     # last element
print(weights[4])      # fifth element (index 4)
print(weights[:4])     # from the start up to, but not including, index 4
print(weights[3:-6])   # from index 3 up to, but not including, index -6 (negative indices count from the end)
print(weights[:-1])    # everything except the last element
print(weights[8:])     # from index 8 to the end
print(weights[::2])    # every second element, starting from the first
print(weights[1:3])    # indices 1 and 2: the start is included, the stop (3) is excluded
print(weights[2:8:3])  # every third element from index 2 up to, but not including, 8 (indices 2 and 5)
print(weights[2:9:3])  # every third element from index 2 up to, but not including, 9 (indices 2, 5 and 8)

[57 88 53 48 72 68 53 55 77 80 69]
57
69
72
[57 88 53 48]
[48 72]
[57 88 53 48 72 68 53 55 77 80]
[77 80 69]
[57 53 72 53 77 69]
[88 53]
[53 68]
[53 68 77]
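
Because the weights are random, it can be hard to tell which positions a slice picked. An array whose values equal their own indices, made with `np.arange`, makes the `start:stop:step` rule visible.

```python
import numpy as np

# Each value equals its own index, so the output shows exactly which positions a slice selects
idx = np.arange(11)
print(idx[1:3])    # [1 2]: start included, stop excluded
print(idx[3:-6])   # [3 4]: -6 counts back from the end, i.e. 11 - 6 = 5
print(idx[2:8:3])  # [2 5]: the next step would be 8, which is the (excluded) stop
print(idx[2:9:3])  # [2 5 8]
print(idx[::2])    # [ 0  2  4  6  8 10]
```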


You can also use boolean indexing: you provide a list/array of boolean values with the same length as the array, and the elements at positions where the value is `True` are returned as the result.

In [None]:
heavy = weights > 80  # element-wise comparison gives a boolean array
print(heavy)
print(weights[heavy])
print(weights[np.invert(heavy)])
# Use help(np.invert) to see what it does: it computes the bit-wise NOT of each element, which for booleans flips True and False.

[False  True False False False False False False False False False]
[88]
[57 53 48 72 68 53 55 77 80 69]
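
Masks can also be combined. This sketch copies the weights printed above into a fixed array (so the results are reproducible) and uses `~` (equivalent to `np.invert` on booleans), `&` for "and" and `|` for "or". Each comparison needs its own parentheses because `&` and `|` bind more tightly than `>` and `<`.

```python
import numpy as np

weights = np.array([57, 88, 53, 48, 72, 68, 53, 55, 77, 80, 69])  # copied from the output above

# ~ on a boolean array does the same as np.invert: True <-> False
heavy = weights > 80
print(np.array_equal(~heavy, np.invert(heavy)))  # True

# Combine conditions with & (and) and | (or); each condition needs its own parentheses
mid_range = (weights >= 55) & (weights <= 75)
print(weights[mid_range])  # [57 72 68 55 69]

extreme = (weights < 50) | (weights > 85)
print(weights[extreme])    # [88 48]
```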


## Multi-dimensional Numpy arrays

Numpy arrays can have more than one dimension (hence the name `ndarray`: an n-dimensional array, where n can be any number).

In [None]:
weights = [random.randrange(48, 96) for value in range(11)]
heights = [random.randrange(140, 190) for value in range(11)]
print(weights)
print(heights)
combined = list(zip(heights, weights))
print(combined)
combined = np.array(combined)
print(combined)
print(combined.shape)

[54, 89, 75, 55, 58, 75, 76, 61, 59, 92, 81]
[175, 167, 188, 187, 160, 187, 148, 184, 181, 162, 189]
[(175, 54), (167, 89), (188, 75), (187, 55), (160, 58), (187, 75), (148, 76), (184, 61), (181, 59), (162, 92), (189, 81)]
[[175  54]
 [167  89]
 [188  75]
 [187  55]
 [160  58]
 [187  75]
 [148  76]
 [184  61]
 [181  59]
 [162  92]
 [189  81]]
(11, 2)


In [None]:
pairs = list(zip(heights, weights))  # zip alone gives a plain list of tuples
print(type(pairs))
print(type(combined))  # only np.array() turned them into a 2-D ndarray

<class 'list'>
<class 'numpy.ndarray'>
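
`np.column_stack` is another way to build the same 2-D array straight from the two lists, without going through `zip`. A sketch with a few made-up values:

```python
import numpy as np

heights = [175, 167, 188]  # made-up example values
weights = [54, 89, 75]

via_zip = np.array(list(zip(heights, weights)))
via_stack = np.column_stack((heights, weights))  # each list becomes one column

print(via_stack)
print(via_stack.shape)                     # (3, 2)
print(np.array_equal(via_zip, via_stack))  # True
```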


With multiple dimensions we need multiple indices (plural of index): one per dimension, separated by commas, as in `array[row, column]`.

In [None]:
print(combined[3])       # row at index 3 (the fourth row)
print(combined[3, :])    # the same row, written explicitly as "row 3, all columns"
print(combined[3, 0])    # element at row 3, column 0 (array[row, column])
print(combined[3, 1])    # element at row 3, column 1
print(combined[3:7])     # rows 3 to 6 (the stop index 7 is excluded)
print(combined[3:7, :])  # rows 3 to 6, all columns
print(combined[3:7, 0])  # rows 3 to 6, first column only
print(combined[:, 0])    # first column of every row
print(heights)           # the original Python list, for comparison with combined[:, 0]

[187  55]
[187  55]
187
55
[[187  55]
 [160  58]
 [187  75]
 [148  76]]
[[187  55]
 [160  58]
 [187  75]
 [148  76]]
[187 160 187 148]
[175 167 188 187 160 187 148 184 181 162 189]
[175, 167, 188, 187, 160, 187, 148, 184, 181, 162, 189]
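
The same rules on a tiny array with made-up values, where every result is easy to check by eye. `grid[1][0]` also works, but it indexes twice (first the row, then the column), whereas `grid[1, 0]` does it in one step.

```python
import numpy as np

# A small 3 x 2 array of (height, weight) rows with made-up values
grid = np.array([[175, 54],
                 [167, 89],
                 [188, 75]])

print(grid[1, 0])    # 167: row 1, column 0
print(grid[1][0])    # 167: same element, selected in two steps
print(grid[-1, -1])  # 75: negative indices work on each axis separately
print(grid[:, 1])    # [54 89 75]: every row, column 1
print(grid[0:2, 1])  # [54 89]: rows 0 and 1, column 1
```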


We can also do maths on these multi-dimensional arrays: an operation with a single number is applied to every element.

In [None]:
print(combined/1.2)
print(combined**2)
print(combined - 50)

[[145.83333333  45.        ]
 [139.16666667  74.16666667]
 [156.66666667  62.5       ]
 [155.83333333  45.83333333]
 [133.33333333  48.33333333]
 [155.83333333  62.5       ]
 [123.33333333  63.33333333]
 [153.33333333  50.83333333]
 [150.83333333  49.16666667]
 [135.          76.66666667]
 [157.5         67.5       ]]
[[30625  2916]
 [27889  7921]
 [35344  5625]
 [34969  3025]
 [25600  3364]
 [34969  5625]
 [21904  5776]
 [33856  3721]
 [32761  3481]
 [26244  8464]
 [35721  6561]]
[[125   4]
 [117  39]
 [138  25]
 [137   5]
 [110   8]
 [137  25]
 [ 98  26]
 [134  11]
 [131   9]
 [112  42]
 [139  31]]


In [None]:
conversion = [0.0328084, 2.20462262]  # cm->feet, kg->lb
print(combined)
print(combined*conversion)

[[175  54]
 [167  89]
 [188  75]
 [187  55]
 [160  58]
 [187  75]
 [148  76]
 [184  61]
 [181  59]
 [162  92]
 [189  81]]
[[  5.74147    119.04962148]
 [  5.4790028  196.21141318]
 [  6.1679792  165.3466965 ]
 [  6.1351708  121.2542441 ]
 [  5.249344   127.86811196]
 [  6.1351708  165.3466965 ]
 [  4.8556432  167.55131912]
 [  6.0367456  134.48197982]
 [  5.9383204  130.07273458]
 [  5.3149608  202.82528104]
 [  6.2007876  178.57443222]]
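
The last cell works because of broadcasting: `conversion` has 2 values and each row of `combined` has 2 columns, so Numpy applies the first factor to every height and the second to every weight. A sketch with made-up numbers; a factor array whose length does not match the number of columns raises a `ValueError` instead.

```python
import numpy as np

data = np.array([[100, 10],
                 [200, 20],
                 [300, 30]])    # shape (3, 2)
factors = np.array([2, 0.5])    # shape (2,): one factor per column

print(data * factors)           # column 0 doubled, column 1 halved

try:
    data * np.array([1, 2, 3])  # shape (3,) does not line up with the 2 columns
except ValueError as err:
    print("ValueError:", err)
```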


## Basic Stats with Numpy

Obviously, one of the more interesting mathematical things to do when you have arrays is calculate some basic stats.

In [None]:
print(combined)
print(np.mean(combined))  # Does this make sense? It averages heights and weights together into a single number.

[[175  54]
 [167  89]
 [188  75]
 [187  55]
 [160  58]
 [187  75]
 [148  76]
 [184  61]
 [181  59]
 [162  92]
 [189  81]]
122.86363636363636
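
Besides slicing out a column, most Numpy statistics functions accept an `axis` argument: `axis=0` collapses the rows and gives one result per column, `axis=1` gives one result per row. A sketch with made-up values chosen so the means are round numbers:

```python
import numpy as np

# Made-up (height, weight) rows
data = np.array([[170, 60],
                 [160, 80],
                 [180, 70]])

print(np.mean(data))          # 120.0: every value averaged together
print(np.mean(data, axis=0))  # [170.  70.]: one mean per column (heights, weights)
print(np.mean(data, axis=1))  # one mean per row, which is rarely meaningful here
```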


Just like with anything else to do with `ndarray` objects, stats are more useful when you are specific about which values they use, for example a single column.

In [None]:
print(combined)
print(np.mean(combined[:,0]))
print(np.mean(combined[:,1]))
heights = combined[:,0]
print(np.max(heights), np.median(heights), np.min(heights), np.std(heights))
weights = combined[:,1]
print(np.corrcoef(heights, weights)) #computes the Pearson correlation coefficient between two arrays, heights and weights

[[175  54]
 [167  89]
 [188  75]
 [187  55]
 [160  58]
 [187  75]
 [148  76]
 [184  61]
 [181  59]
 [162  92]
 [189  81]]
175.27272727272728
70.45454545454545
189 181.0 148 13.33546814865536
[[ 1.         -0.24496512]
 [-0.24496512  1.        ]]
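
`np.corrcoef` returns a 2 x 2 matrix: the diagonal holds each array correlated with itself (always 1), and the off-diagonal entries hold the correlation between heights and weights. As a sanity check, the Pearson coefficient can be computed by hand as the sum of products of deviations from the mean, divided by the square root of the product of the sums of squared deviations. A sketch with made-up values:

```python
import numpy as np

heights = np.array([175, 167, 188, 187, 160])  # made-up sample values
weights = np.array([54, 89, 75, 55, 58])

# Deviations from each mean
dx = heights - heights.mean()
dy = weights - weights.mean()

# Pearson r = sum(dx * dy) / sqrt(sum(dx**2) * sum(dy**2))
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

r_numpy = np.corrcoef(heights, weights)[0, 1]  # off-diagonal entry of the 2 x 2 matrix
print(r_manual, r_numpy)
print(np.isclose(r_manual, r_numpy))  # True
```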


You can even filter values with a boolean mask and do your simple stats calculations on the result.

In [None]:
heavy = weights > 80
print(weights)
print(weights[heavy])
print(np.mean(weights))
print(np.mean(weights[heavy]))

[54 89 75 55 58 75 76 61 59 92 81]
[89 92 81]
70.45454545454545
87.33333333333333
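
A mask built from one column can also select values from another column, as long as both have the same length. This sketch uses made-up rows to get the heights of the heavy people, and `np.sum` on the mask counts the matches because `True` counts as 1.

```python
import numpy as np

# Made-up (height, weight) rows
data = np.array([[175, 54],
                 [167, 89],
                 [188, 75],
                 [162, 92]])
heights = data[:, 0]
weights = data[:, 1]

heavy = weights > 80            # mask from the weight column
print(heights[heavy])           # [167 162]: heights of the heavy rows
print(np.mean(heights[heavy]))  # 164.5
print(np.sum(heavy))            # 2: number of rows that matched
```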
