# Numpy is Fast


NumPy is a Python package that speeds up computation on list-like objects called arrays. With a plain Python list you typically iterate through the elements one at a time, performing the operation on each. A NumPy array stores elements of a single type, so the operation runs in compiled code instead of the Python interpreter and can take advantage of [CPU vectorization](https://superuser.com/questions/1170062/whats-the-difference-between-a-superscalar-and-a-vector-processor), which applies the operation to several elements of the array at the same time.

Reference: https://jakevdp.github.io/PythonDataScienceHandbook/02.03-computation-on-arrays-ufuncs.html#:~:text=For%20many%20types,much%20faster%20execution.

"Why is going fast important? I don't care if it takes me 1 second vs 5 seconds to perform an operation on a data set." 

A difference of a few seconds on one data set may not seem like a big deal. However, with the rise of "big data", massive amounts of data are collected and then cleaned, augmented, or manipulated, and larger data sets usually need several operations, not just one. Speed becomes even more important for the more complex computations we will discuss in the machine learning course.

## Prove it

In [1]:
some_values = list(range(1, 1001)) # list looks like [1, 2, ... 1000]
print(f"{some_values=}")

some_values=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 

In [2]:

def compute_reciprocals(values):
    output = []
    # for each element in values, find the reciprocal
    for i in range(len(values)):
        output.append( 1.0 / values[i] )
    return output

%timeit compute_reciprocals(some_values)

101 μs ± 19.3 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [3]:
some_values_reciprocals = compute_reciprocals(some_values)
print(f"{some_values_reciprocals[:5]=}")

some_values_reciprocals[:5]=[1.0, 0.5, 0.3333333333333333, 0.25, 0.2]


In [7]:
import numpy as np
numpy_array = np.array(some_values)

%timeit 1/numpy_array



In [5]:
numpy_array_reciprocals = 1/numpy_array
print(numpy_array_reciprocals[:5]) # Verify we have the same output

[1.         0.5        0.33333333 0.25       0.2       ]
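
The `%timeit` magic only exists inside IPython and Jupyter. As a rough sketch, the same comparison can be run as a plain script with the standard-library `timeit` module. The helper name `loop_reciprocals` and the run count of 1000 are illustrative choices, not part of the original notebook, and no timings are quoted because they depend on the machine.

```python
import timeit

import numpy as np

values = list(range(1, 1001))
array = np.array(values)


def loop_reciprocals(items):
    """Pure-Python baseline: one division per element."""
    return [1.0 / item for item in items]


# Total seconds for 1000 calls of each approach
loop_seconds = timeit.timeit(lambda: loop_reciprocals(values), number=1000)
numpy_seconds = timeit.timeit(lambda: 1 / array, number=1000)

print(f"Python loop: {loop_seconds:.4f} s for 1000 runs")
print(f"NumPy:       {numpy_seconds:.4f} s for 1000 runs")

# Both approaches should produce the same values
same_result = bool(np.allclose(loop_reciprocals(values), 1 / array))
print(f"{same_result=}")
```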


# Popular Operations

Below we cover some popular NumPy operations. There are far more than we can cover in our limited time, so it is good practice to skim the NumPy documentation to see which operations could be useful for your data sets.

Numpy Website: https://numpy.org/doc/stable/index.html
Numpy User Guide: https://numpy.org/doc/stable/user/index.html
Numpy API Reference: https://numpy.org/doc/stable/reference/index.html#reference


In [6]:
# Fill an array with random numbers
rand_arry = np.random.random(5)
print(rand_arry)

[0.49583673 0.57001672 0.62134419 0.08479882 0.83682817]
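
Random arrays change on every run, which makes results hard to reproduce. One possible approach, sketched here, is NumPy's `np.random.default_rng` generator with a fixed seed. The seed value 42 is arbitrary and the variable names are illustrative.

```python
import numpy as np

# default_rng is NumPy's newer random-number interface; the seed 42 is arbitrary
rng = np.random.default_rng(seed=42)
first_draw = rng.random(5)

# Re-creating the generator with the same seed reproduces the same numbers
rng_again = np.random.default_rng(seed=42)
second_draw = rng_again.random(5)

print(np.array_equal(first_draw, second_draw))  # True
```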


In [7]:
# Fill an array with 1s
ones_arry = np.ones([5])
print(ones_arry)
print(f"Type {type(ones_arry[0])}")

# Change the type of elements in the array
ones_int_arry = np.ones([5], dtype=int)
print(ones_int_arry)
print(f"Type {type(ones_int_arry[0])}")

[1. 1. 1. 1. 1.]
Type <class 'numpy.float64'>
[1 1 1 1 1]
Type <class 'numpy.int64'>
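
The `dtype` argument sets the element type when an array is created. For an array that already exists, `astype` is one way to convert it. The sketch below uses made-up values to show that converting floats to integers drops the fractional part.

```python
import numpy as np

float_arry = np.array([1.7, 2.2, 3.9])

# astype returns a new array; converting float to int truncates toward zero
int_arry = float_arry.astype(int)

print(float_arry.dtype)   # float64
print(int_arry.tolist())  # [1, 2, 3]
```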


In [8]:
# Create an array from a range of numbers (start, stop, step); the stop value is excluded
range_arry = np.arange(0, 10, 2)
print(f"{range_arry=}")

# Fill an array with a specific number of evenly spaced values; both endpoints are included
lin_arry = np.linspace(0, 10, 7)

# Temporarily print only 3 decimal places; the previous print options are restored afterwards
with np.printoptions(precision=3):
    print(f"{lin_arry=}")

range_arry=array([0, 2, 4, 6, 8])
lin_arry=array([ 0.   ,  1.667,  3.333,  5.   ,  6.667,  8.333, 10.   ])
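
The difference between the two functions is easy to mix up: `np.arange` takes a step size and excludes the stop value, while `np.linspace` takes a number of points and includes the stop value by default. A small sketch with illustrative variable names:

```python
import numpy as np

# arange: fixed step size, stop value excluded
step_arry = np.arange(0, 10, 2)

# linspace: fixed number of points, stop value included by default
count_arry = np.linspace(0, 10, 5)

# endpoint=False drops the stop value, which matches arange's spacing here
open_arry = np.linspace(0, 10, 5, endpoint=False)

print(step_arry.tolist())   # [0, 2, 4, 6, 8]
print(count_arry.tolist())  # [0.0, 2.5, 5.0, 7.5, 10.0]
print(open_arry.tolist())   # [0.0, 2.0, 4.0, 6.0, 8.0]
```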



In [9]:
# Operations
print(f"{ones_arry=}")
print(f"{range_arry=}")

add_arry = ones_arry + range_arry
sub_array = ones_arry - range_arry
mul_arry = ones_arry * range_arry
div_arry = ones_arry / range_arry

print("Add:", add_arry)
print("Subtract:", sub_array)
print("Multiply:", mul_arry)
print("Divide:", div_arry)


ones_arry=array([1., 1., 1., 1., 1.])
range_arry=array([0, 2, 4, 6, 8])
Add: [1. 3. 5. 7. 9.]
Subtract: [ 1. -1. -3. -5. -7.]
Multiply: [0. 2. 4. 6. 8.]
Divide: [       inf 0.5        0.25       0.16666667 0.125     ]


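Note: the first element of `range_arry` is 0, so the line `div_arry = ones_arry / range_arry` divides by zero. NumPy emits a `RuntimeWarning` for that line and stores `inf` in the result instead of raising an exception.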
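
The `inf` in the Divide output comes from dividing by the 0 in `range_arry`. Two possible ways of handling this are sketched below: silencing the warning with `np.errstate`, or dividing only where the denominator is non-zero using the `where` and `out` arguments of `np.divide`. Using 0 as the fill value is an assumption made for illustration; the right placeholder depends on your data.

```python
import numpy as np

numerator = np.ones(5)
denominator = np.arange(0, 10, 2)  # first element is 0

# Option 1: silence the warning locally and accept inf in the result
with np.errstate(divide="ignore"):
    with_inf = numerator / denominator

# Option 2: divide only where the denominator is non-zero;
# skipped positions keep the value already in `out` (0 here)
safe = np.divide(
    numerator,
    denominator,
    out=np.zeros_like(numerator),
    where=denominator != 0,
)

print(np.isinf(with_inf[0]))  # True
print(safe.tolist())          # [0.0, 0.5, 0.25, 0.16666666666666666, 0.125]
```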


# Multi-dimensional arrays
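A two-dimensional array is often called a matrix. A one-dimensional array is a vector, and arrays with three or more dimensions are usually called tensors or simply N-dimensional arrays.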

In [10]:
multidim_list = [
    [0, 1, 2],
    [3, 4, 5], 
    [6, 7, 8]
]
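# Note: wrapping multidim_list in another [] adds a leading axis, so the shape is (1, 3, 3) rather than (3, 3)
multidim_array = np.array([multidim_list])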
print(f"multidim_array\n {multidim_array}")

# We can check the size and shape too

print(f"{multidim_array.size=}")
print(f"{multidim_array.shape=}")


multidim_array
 [[[0 1 2]
  [3 4 5]
  [6 7 8]]]
multidim_array.size=9
multidim_array.shape=(1, 3, 3)
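
The shape `(1, 3, 3)` has three axes because the nested list was wrapped in an extra pair of brackets before being passed to `np.array`. The sketch below compares the two forms; the names `nested_list`, `matrix`, and `wrapped` are illustrative.

```python
import numpy as np

nested_list = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

matrix = np.array(nested_list)     # 2-D: 3 rows, 3 columns
wrapped = np.array([nested_list])  # 3-D: one block of 3 rows and 3 columns

print(matrix.ndim, matrix.shape)    # 2 (3, 3)
print(wrapped.ndim, wrapped.shape)  # 3 (1, 3, 3)

# Indexing the first axis of the 3-D array recovers the 2-D matrix
print(np.array_equal(wrapped[0], matrix))  # True
```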


In [11]:
# We can also combine an array with a scalar; the operation is applied to every element
print(f"Adding: \n{multidim_array + 4}")
print(f"Subtracting: \n{multidim_array - 4}")
print(f"Multiplying: \n{multidim_array * 4}")
print(f"Dividing: \n{multidim_array / 4}")

Adding: 
[[[ 4  5  6]
  [ 7  8  9]
  [10 11 12]]]
Subtracting: 
[[[-4 -3 -2]
  [-1  0  1]
  [ 2  3  4]]]
Multiplying: 
[[[ 0  4  8]
  [12 16 20]
  [24 28 32]]]
Dividing: 
[[[0.   0.25 0.5 ]
  [0.75 1.   1.25]
  [1.5  1.75 2.  ]]]


In [12]:
# We can also do operations between arrays of the same shape
print(f"Element-wise multiply:\n {multidim_array * multidim_array} ")

# NumPy also provides named functions for these and other operations
print(f"Element-wise multiply:\n {np.multiply(multidim_array, multidim_array)}")
print(f"Matrix Multiplication:\n {np.matmul(multidim_array, multidim_array)}")
# Each row is crossed with itself; parallel vectors have a zero cross product
print(f"Cross Product:\n {np.cross(multidim_array, multidim_array)}")



Element-wise multiply:
 [[[ 0  1  4]
  [ 9 16 25]
  [36 49 64]]] 
Element-wise multiply:
 [[[ 0  1  4]
  [ 9 16 25]
  [36 49 64]]]
Matrix Multiplication:
 [[[ 15  18  21]
  [ 42  54  66]
  [ 69  90 111]]]
Cross Product:
 [[[0 0 0]
  [0 0 0]
  [0 0 0]]]
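
The difference between element-wise and matrix multiplication is easier to check by hand on a small case. The sketch below uses two made-up 2x2 matrices and the `@` operator, which is Python's infix form of matrix multiplication.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

elementwise = a * b     # multiplies matching positions
matrix_product = a @ b  # rows of a combined with columns of b

print(elementwise.tolist())     # [[5, 12], [21, 32]]
print(matrix_product.tolist())  # [[19, 22], [43, 50]]
```

For example, the top-left entry of the matrix product is 1*5 + 2*7 = 19, while the element-wise result there is simply 1*5 = 5.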


In [13]:
# Sometimes we want to access the array
print(f"{multidim_array.shape=}")
print(f"{multidim_array[0]}")
print(f"{multidim_array[0].shape=}")

# Grab the row at index 1 of the matrix
print(f"Row Index 1: {multidim_array[0][1]}") # In the first 3x3 block, grab the row at index 1

# Or grab a column
print(f"Column Index 2: {multidim_array[0][:, 2]}") # In the first 3x3 block, take the value at column index 2 from every row

multidim_array.shape=(1, 3, 3)
[[0 1 2]
 [3 4 5]
 [6 7 8]]
multidim_array[0].shape=(3, 3)
Row Index 1: [3 4 5]
Column Index 2: [2 5 8]
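
Chained brackets such as `[0][1]` work, but NumPy also accepts all the indices in one pair of brackets, separated by commas. The sketch below rebuilds an array with the same values and shape (the name `cube` is illustrative) and shows a few equivalent selections.

```python
import numpy as np

cube = np.arange(9).reshape(1, 3, 3)  # same values and shape as multidim_array

# Chained indexing and a single comma-separated index give the same row
row_chained = cube[0][1]
row_tuple = cube[0, 1]

# Column 2 of the first (and only) 3x3 block
column = cube[0, :, 2]

# Slicing both axes selects a sub-matrix: rows 0-1, columns 1-2
block = cube[0, :2, 1:]

print(row_tuple.tolist())  # [3, 4, 5]
print(column.tolist())     # [2, 5, 8]
print(block.tolist())      # [[1, 2], [4, 5]]
```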


In [16]:
# Write to a file

exp_base_array = np.arange(0, 50, 1).reshape(50, 1)
exp_transformed_array = exp_base_array ** 3
combined_array = np.concatenate((exp_base_array, exp_transformed_array), axis=1)
# Separate the values in each row with a comma; fmt="%.f" writes each number with no decimal places
np.savetxt("np_exp.csv", combined_array, delimiter=',', fmt="%.f")
print(combined_array)


[[     0      0]
 [     1      1]
 [     2      8]
 [     3     27]
 [     4     64]
 [     5    125]
 [     6    216]
 [     7    343]
 [     8    512]
 [     9    729]
 [    10   1000]
 [    11   1331]
 [    12   1728]
 [    13   2197]
 [    14   2744]
 [    15   3375]
 [    16   4096]
 [    17   4913]
 [    18   5832]
 [    19   6859]
 [    20   8000]
 [    21   9261]
 [    22  10648]
 [    23  12167]
 [    24  13824]
 [    25  15625]
 [    26  17576]
 [    27  19683]
 [    28  21952]
 [    29  24389]
 [    30  27000]
 [    31  29791]
 [    32  32768]
 [    33  35937]
 [    34  39304]
 [    35  42875]
 [    36  46656]
 [    37  50653]
 [    38  54872]
 [    39  59319]
 [    40  64000]
 [    41  68921]
 [    42  74088]
 [    43  79507]
 [    44  85184]
 [    45  91125]
 [    46  97336]
 [    47 103823]
 [    48 110592]
 [    49 117649]]


In [15]:
# read from a file

read_data = np.loadtxt("np_exp.csv", delimiter=',', dtype=np.uint32)
print(read_data)

[[     0      0]
 [     1      1]
 [     2      8]
 [     3     27]
 [     4     64]
 [     5    125]
 [     6    216]
 [     7    343]
 [     8    512]
 [     9    729]
 [    10   1000]
 [    11   1331]
 [    12   1728]
 [    13   2197]
 [    14   2744]
 [    15   3375]
 [    16   4096]
 [    17   4913]
 [    18   5832]
 [    19   6859]
 [    20   8000]
 [    21   9261]
 [    22  10648]
 [    23  12167]
 [    24  13824]
 [    25  15625]
 [    26  17576]
 [    27  19683]
 [    28  21952]
 [    29  24389]
 [    30  27000]
 [    31  29791]
 [    32  32768]
 [    33  35937]
 [    34  39304]
 [    35  42875]
 [    36  46656]
 [    37  50653]
 [    38  54872]
 [    39  59319]
 [    40  64000]
 [    41  68921]
 [    42  74088]
 [    43  79507]
 [    44  85184]
 [    45  91125]
 [    46  97336]
 [    47 103823]
 [    48 110592]
 [    49 117649]]
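
Comparing the two printed tables by eye is error-prone. A minimal sketch of an automated round-trip check follows, using a smaller array and a temporary directory so that no file is left behind. The file name `cubes.csv` and the `%d` format are assumptions made for this example, not the notebook's own choices.

```python
import os
import tempfile

import numpy as np

base = np.arange(0, 5).reshape(5, 1)
table = np.concatenate((base, base ** 3), axis=1)

# Write to a temporary directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "cubes.csv")  # hypothetical file name
    np.savetxt(csv_path, table, delimiter=",", fmt="%d")
    loaded = np.loadtxt(csv_path, delimiter=",", dtype=np.uint32)

# The data read back should match what was written
round_trip_ok = bool(np.array_equal(table, loaded))
print(f"{round_trip_ok=}")  # round_trip_ok=True
```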


References: https://jakevdp.github.io/PythonDataScienceHandbook/02.03-computation-on-arrays-ufuncs.html

Numpy Website: https://numpy.org/doc/stable/index.html

Numpy API: https://numpy.org/doc/stable/reference/index.html

Topics to explore: 
- Masking Numpy Arrays
- Broadcasting Arrays
- Structured Arrays
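
As a starting point for exploring these topics, here is a brief sketch of each one. All values and names are made up for illustration, and each topic goes much deeper than this.

```python
import numpy as np

values = np.array([3, 8, 1, 9, 4])

# Masking: a boolean array selects the elements where the condition is True
mask = values > 3
print(values[mask].tolist())  # [8, 9, 4]

# Broadcasting: a (3, 1) column and a (3,) row stretch to a (3, 3) grid
column = np.array([[0], [10], [20]])
row = np.array([1, 2, 3])
grid = column + row
print(grid.tolist())  # [[1, 2, 3], [11, 12, 13], [21, 22, 23]]

# Structured arrays: each element has named fields with their own types
people = np.array(
    [("Ada", 36), ("Linus", 28)],
    dtype=[("name", "U10"), ("age", "i4")],
)
print(people["age"].tolist())  # [36, 28]
```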