# Numpy

## Introduction

Numpy stands for Numerical Python. The package comes with a lot of functionality, but its core is the array datatype, `numpy.ndarray`, which you create with `np.array`.

You can think of Numpy arrays as being similar to lists, but focused on maths rather than on being general collections of objects. Because of this, all elements of an array MUST share the same datatype (unlike the elements of a list).
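The single-datatype rule can be seen by building arrays from mixed inputs. This is a minimal sketch; the variable names are made up for illustration, and the exact integer width depends on the platform.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed_numbers = np.array([1, 2.5, 3])   # the ints are promoted to float
mixed_text = np.array([1, 2.5, "3"])    # one string turns every element into a string

print(ints.dtype)              # an integer dtype (width is platform dependent)
print(mixed_numbers.dtype)     # float64
print(mixed_text.dtype.kind)   # 'U' marks a Unicode string dtype
```

Numpy picks one datatype that can hold every element, which is why a single stray string changes the whole array.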

## Basic Numpy Arrays

In [6]:
import numpy as np

vec1 = np.array([7, 16, 4, 3])
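vec2 = np.array([2, 7, 3, '5'])  # the string '5' turns every element into a string
print(vec1)
print(vec2)
print(vec1 + vec2.astype(float))  # convert the strings to floats before adding
# print(vec1 + vec2)  # raises an error: vec2 holds strings, not numbers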

[ 7 16  4  3]
['2' '7' '3' '5']
[ 9. 23.  7.  8.]


## Arrays from Lists - syntax comparison

Converting lists to `np.array` objects is easy!

In [8]:
costs = [800.00, 250.50, 101.90]
np_costs = np.array(costs)
print(np_costs)
print(type(np_costs))

[800.  250.5 101.9]
<class 'numpy.ndarray'>


Maths works as you would expect it to for mathematical vectors.

In [10]:
usd_costs = np_costs*4.12
print(usd_costs)
with_fixed_cost = np_costs + 3000.00
print(with_fixed_cost)

print(np_costs + np_costs)  # Numpy array
print(costs + costs)  # Normal list

[3296.    1032.06   419.828]
[3800.  3250.5 3101.9]
[1600.   501.   203.8]
[800.0, 250.5, 101.9, 800.0, 250.5, 101.9]
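To make the syntax comparison concrete, here is a small sketch of what the same element-wise maths looks like with a plain list. The names `list_doubled` and `list_with_fixed` are illustrative only.

```python
import numpy as np

costs = [800.00, 250.50, 101.90]
np_costs = np.array(costs)

# With a plain list, element-wise maths needs an explicit loop or comprehension.
list_doubled = [c + c for c in costs]
list_with_fixed = [c + 3000.00 for c in costs]

# With an array, the same operations are written directly.
array_doubled = np_costs + np_costs
array_with_fixed = np_costs + 3000.00

print(list_doubled)
print(array_doubled)
# costs + 3000.00 would raise a TypeError: a list cannot be added to a float.
```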


Indexing works much like it does for lists.

In [22]:
import random
weights = [random.randrange(48, 96) for value in range(11)]
# range(11) makes the comprehension loop 11 times
# random.randrange(48, 96) returns a random integer from 48 to 95
weights = np.array(weights)
# convert the Python list into a NumPy array
print(weights)
print(weights[4])
print(weights[:4])
print(weights[3:-6])  # from index 3 up to, but not including, index -6
print(weights[8:])
print(weights[::2]) # has a step of 2

[55 77 57 66 51 49 63 87 61 56 77]
51
[55 77 57 66]
[66 51]
[61 56 77]
[55 57 51 63 61 77]
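Because the weights above are random, the output changes on every run. The sketch below repeats the slicing on a fixed copy of the values printed above (the name `sample_weights` is an assumption for illustration), so each result can be checked by hand.

```python
import numpy as np

sample_weights = np.array([55, 77, 57, 66, 51, 49, 63, 87, 61, 56, 77])

print(sample_weights[-1])     # last element
print(sample_weights[3:-6])   # with 11 elements, index -6 is index 5, so this is [3:5]
print(sample_weights[3:5])    # same result as the line above
print(sample_weights[1::3])   # start at index 1, then every third element
```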


However, you can also use boolean indexing. This means you provide a list/array of boolean values, and the elements at the positions where that value is `True` are returned.

In [26]:
heavy = weights > 80
print(heavy)
print(weights[heavy])
print(weights[np.invert(heavy)])
# Use 'help' to figure out what np.invert does!

help(np.invert)
# For integers: np.invert() flips all the bits of each element in the array
# For booleans: np.invert() turns True into False and False into True

[False False False False False False False  True False False False]
[87]
[55 77 57 66 51 49 63 61 56 77]
Help on ufunc:

invert = <ufunc 'invert'>
    invert(x, /, out=None, *, where=True, casting='same_kind', order='K', dtype=None, subok=True[, signature, extobj])
    
    Compute bit-wise inversion, or bit-wise NOT, element-wise.
    
    Computes the bit-wise NOT of the underlying binary representation of
    the integers in the input arrays. This ufunc implements the C/Python
    operator ``~``.
    
    For signed integer inputs, the two's complement is returned.  In a
    two's-complement system negative numbers are represented by the two's
    complement of the absolute value. This is the most common method of
    representing signed integers on computers [1]_. A N-bit
    two's-complement system can represent every integer in the range
    :math:`-2^{N-1}` to :math:`+2^{N-1}-1`.
    
    Parameters
    ----------
    x : array_like
        Only integer and boolean types are han
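The help text above says that this ufunc implements the `~` operator. As a short sketch on fixed values (the names `sample` and `small_ints` are illustrative), the two spellings can be compared on a boolean mask and on an integer array.

```python
import numpy as np

sample = np.array([55, 77, 57, 66, 51, 49, 63, 87, 61, 56, 77])
heavy = sample > 80

# On a boolean array, ~ and np.invert both swap True and False.
print(np.array_equal(~heavy, np.invert(heavy)))
print(sample[~heavy])

# On an integer array, the bits of each value are flipped,
# which for signed integers works out to -x - 1.
small_ints = np.array([0, 1, 5])
print(np.invert(small_ints))
```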

## Multi-dimensional Numpy arrays

Numpy arrays can have more than one dimension (that's why the type is called `ndarray`, short for n-dimensional array, where n can be any number).

In [28]:
weights = [random.randrange(48, 96) for value in range(11)]
heights = [random.randrange(140, 190) for value in range(11)]
combined = list(zip(heights, weights))
combined = np.array(combined)
print(combined)
print(combined.shape)

[[159  53]
 [153  63]
 [146  83]
 [142  83]
 [185  53]
 [169  72]
 [179  48]
 [144  74]
 [174  56]
 [159  95]
 [172  69]]
(11, 2)
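The shape `(11, 2)` reads as 11 rows by 2 columns. The following sketch uses only the first three rows printed above to show how `shape`, `ndim` and `size` relate; `np.column_stack` is shown as one possible alternative to `zip`, not as the method used in this notebook.

```python
import numpy as np

heights = [159, 153, 146]
weights = [53, 63, 83]

# Pair the values up row by row, as in the cell above.
combined = np.array(list(zip(heights, weights)))

# An alternative that builds the same array without zip.
stacked = np.column_stack((heights, weights))

print(combined.shape)   # (rows, columns)
print(combined.ndim)    # number of dimensions
print(combined.size)    # total number of elements
print(np.array_equal(combined, stacked))
```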


With multiple dimensions we need multiple indices (plural of index).

In [40]:
print(combined[5])
print('\n')
print(combined[5,:])  # row at index 5 (the sixth row), all columns
print('\n')
print(combined[5:7,:])
print('\n')
print(combined[5:7,0])  # rows at index 5 and 6, first column only (column index 0)
print('\n')
print(combined[:,0])  # all rows, first column
print('\n')
print(heights)

[169  72]


[169  72]


[[169  72]
 [179  48]]


[169 179]


[159 153 146 142 185 169 179 144 174 159 172]


[159, 153, 146, 142, 185, 169, 179, 144, 174, 159, 172]
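The pattern is always `array[row_selection, column_selection]`. Here is a small sketch on the first four rows of the data above (the name `table` is illustrative), including picking out a single value.

```python
import numpy as np

table = np.array([[159, 53],
                  [153, 63],
                  [146, 83],
                  [142, 83]])

print(table[1])         # one index selects a whole row
print(table[1, 0])      # row index 1, column index 0: a single value
print(table[1:3, :])    # rows at index 1 and 2, all columns
print(table[:, 1])      # all rows, second column
print(table[-1, -1])    # negative indices count from the end on each axis
```

Selecting a single column returns a one-dimensional array, which is why the column output above prints on one line.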


We can also do maths on these multi-dimensional arrays.

In [46]:
print(combined/1.2)
print('\n')
print(combined**2)
print('\n')
print(combined - 50)

[[132.5         44.16666667]
 [127.5         52.5       ]
 [121.66666667  69.16666667]
 [118.33333333  69.16666667]
 [154.16666667  44.16666667]
 [140.83333333  60.        ]
 [149.16666667  40.        ]
 [120.          61.66666667]
 [145.          46.66666667]
 [132.5         79.16666667]
 [143.33333333  57.5       ]]


[[25281  2809]
 [23409  3969]
 [21316  6889]
 [20164  6889]
 [34225  2809]
 [28561  5184]
 [32041  2304]
 [20736  5476]
 [30276  3136]
 [25281  9025]
 [29584  4761]]


[[109   3]
 [103  13]
 [ 96  33]
 [ 92  33]
 [135   3]
 [119  22]
 [129  -2]
 [ 94  24]
 [124   6]
 [109  45]
 [122  19]]


In [64]:
import numpy as np

conversion = [0.0328084, 2.20462262]  # cm->feet, kg->lb
print(combined)
print('\n')

new_value = combined*conversion
round_value = np.around(new_value,2)
print(round_value)


[[159  53]
 [153  63]
 [146  83]
 [142  83]
 [185  53]
 [169  72]
 [179  48]
 [144  74]
 [174  56]
 [159  95]
 [172  69]]


[[  5.22 116.84]
 [  5.02 138.89]
 [  4.79 182.98]
 [  4.66 182.98]
 [  6.07 116.84]
 [  5.54 158.73]
 [  5.87 105.82]
 [  4.72 163.14]
 [  5.71 123.46]
 [  5.22 209.44]
 [  5.64 152.12]]
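Multiplying the `(11, 2)` array by a two-element list works because Numpy broadcasts the list across every row: the first factor is applied to the first column and the second factor to the second column. A minimal sketch on three rows of the data above, checking that against an explicit column-by-column calculation:

```python
import numpy as np

small = np.array([[159, 53],
                  [153, 63],
                  [146, 83]])
conversion = [0.0328084, 2.20462262]  # cm->feet, kg->lb

converted = small * conversion  # shape (3, 2) times shape (2,)

# The same result, computed one column at a time.
feet = small[:, 0] * conversion[0]
pounds = small[:, 1] * conversion[1]

print(converted.shape)
print(np.allclose(converted[:, 0], feet))
print(np.allclose(converted[:, 1], pounds))
```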


## Basic Stats with Numpy

One of the more interesting things to do with arrays of numbers is to calculate some basic stats.

In [66]:
print(combined)
print(np.mean(combined))  # mean of every value, heights and weights mixed together. Does this make sense?

[[159  53]
 [153  63]
 [146  83]
 [142  83]
 [185  53]
 [169  72]
 [179  48]
 [144  74]
 [174  56]
 [159  95]
 [172  69]]
115.04545454545455


As with anything else involving `ndarray` objects, stats benefit from being more specific about which values they cover.

In [70]:
print(combined)
print('\n')
print(np.mean(combined[:,0]))  # column index 0: heights only
print('\n')
print(np.mean(combined[:,1]))  # column index 1: weights only
print('\n')
heights = combined[:,0]
print(np.max(heights), np.median(heights), np.min(heights), np.std(heights))
print('\n')
weights = combined[:,1]
print(np.corrcoef(heights, weights))

[[159  53]
 [153  63]
 [146  83]
 [142  83]
 [185  53]
 [169  72]
 [179  48]
 [144  74]
 [174  56]
 [159  95]
 [172  69]]


162.0


68.0909090909091


185 159.0 142 14.109957799047773


[[ 1.         -0.64407286]
 [-0.64407286  1.        ]]
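As an alternative to slicing out each column first, `np.mean` also accepts an `axis` argument. The sketch below rebuilds the array from the values printed above and computes both column means in one call; it is one possible approach, not the one used in the cell above.

```python
import numpy as np

heights = [159, 153, 146, 142, 185, 169, 179, 144, 174, 159, 172]
weights = [53, 63, 83, 83, 53, 72, 48, 74, 56, 95, 69]
combined = np.column_stack((heights, weights))

# axis=0 averages down the rows, giving one mean per column.
column_means = np.mean(combined, axis=0)

# axis=1 averages across the columns, giving one mean per row
# (rarely meaningful here, since it mixes cm and kg).
row_means = np.mean(combined, axis=1)

print(column_means)
print(row_means.shape)
```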


You can even filter out certain values and do your simple stats calculations on the results.

In [72]:
heavy = weights > 80
print(weights)
print(weights[heavy])
print(np.mean(weights))
print(np.mean(weights[heavy]))

[53 63 83 83 53 72 48 74 56 95 69]
[83 83 95]
68.0909090909091
87.0
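A mask built from one column can also filter another column, or whole rows. As a sketch using the values printed above (the names `heavy_rows` and `heavy_heights` are illustrative), this finds the heights of everyone weighing more than 80:

```python
import numpy as np

heights = [159, 153, 146, 142, 185, 169, 179, 144, 174, 159, 172]
weights = [53, 63, 83, 83, 53, 72, 48, 74, 56, 95, 69]
combined = np.column_stack((heights, weights))

heavy = combined[:, 1] > 80          # mask built from the weight column

heavy_rows = combined[heavy]         # whole rows where the mask is True
heavy_heights = combined[heavy, 0]   # only the heights from those rows

print(heavy_rows)
print(heavy_heights)
print(np.mean(heavy_heights))
```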
