# Introduction to NumPy

## What is NumPy?

NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.

## Why NumPy?

Put simply, NumPy is fast at numeric operations. This is because its core routines are written in C.

To speed up calculations, NumPy uses vectorisation: an operation is applied to a whole array at once, with broadcasting handling arrays of different shapes. In plain English, it avoids explicit Python loops, which slow down processing, especially with large datasets.
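As a rough illustration of the difference, the sketch below doubles every value in an array twice: once with an explicit Python loop and once with a single vectorised expression. The array and the variable names are made up for this example.

```python
import numpy as np

# Hypothetical data for illustration: the integers 0 to 99,999.
values = np.arange(100_000)

# Loop version: Python handles one element at a time.
doubled_loop = np.empty_like(values)
for i in range(values.size):
    doubled_loop[i] = values[i] * 2

# Vectorised version: the whole array is processed in compiled code.
doubled_vectorised = values * 2

# Both approaches give the same result.
print(np.array_equal(doubled_loop, doubled_vectorised))  # True
```

Wrapping each version in `%timeit` in a notebook should show the vectorised line finishing far sooner, although the exact numbers depend on the machine.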

Finally, NumPy is also the backbone for other Python scientific packages, such as pandas, SciPy and scikit-learn.

## NumPy DataTypes and Attributes

In [1]:
# --- Import NumPy and pandas (needed for later on)
import numpy as np
import pandas as pd

In [2]:
# --- NumPy uses ndarray (n-dimensional array) for its main datatype.
# --- Create a simple one-dimensional array, also called a vector.
# --- Note: This has a shape of (3,): a single axis holding three elements:
sample_array_1 = np.array([1,2,3])
sample_array_1

array([1, 2, 3])

In [3]:
# --- Create a two-dimensional array.
# --- Note 1: This has a shape of 2,3 (two rows, three columns).
# --- Note 2: As there is a float in the array, all the numbers will be converted to float:
sample_array_2 = np.array([[1, 2.0, 3.3],
                           [4, 5, 6.5]])
sample_array_2

array([[1. , 2. , 3.3],
       [4. , 5. , 6.5]])
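The conversion to float can be confirmed through the `dtype` attribute. The following is a minimal sketch using two small arrays invented for this example; `astype` is shown as one way to convert back to integers, and it drops the fractional part when doing so.

```python
import numpy as np

# Hypothetical arrays for illustration.
int_array = np.array([1, 2, 3])        # all integers
mixed_array = np.array([1, 2.0, 3.3])  # one or more floats

print(int_array.dtype)    # an integer dtype (int64 on most platforms)
print(mixed_array.dtype)  # float64

# astype returns a converted copy; float -> int drops the fractional part.
as_int = mixed_array.astype(int)
print(as_int)  # [1 2 3]
```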

In [4]:
# --- Create a multi-dimensional array.
# --- Note: This has a shape of 2, 3, 3 (two matrices deep, three rows and three columns per matrix):
sample_array_3 = np.array([[[1, 2, 3],
                           [4, 5, 6],
                           [7, 8, 9]],
                          [[10, 11, 12],
                           [13, 14, 15],
                           [16, 17, 18]]])
sample_array_3

array([[[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9]],

       [[10, 11, 12],
        [13, 14, 15],
        [16, 17, 18]]])

In [5]:
# --- Show the shape and size of each sample array:
print(f"sample array 1. shape: {sample_array_1.shape}, size: {sample_array_1.size}")
print(f"sample array 2. shape: {sample_array_2.shape}, size: {sample_array_2.size}")
print(f"sample array 3. shape: {sample_array_3.shape}, size: {sample_array_3.size}")

sample array 1. shape: (3,), size: 3
sample array 2. shape: (2, 3), size: 6
sample array 3. shape: (2, 3, 3), size: 18


In [6]:
# --- Show the number of dimensions for each sample array:
sample_array_1.ndim, sample_array_2.ndim, sample_array_3.ndim

(1, 2, 3)

In [7]:
# --- Create a pandas dataframe from an ndarray:
sample_df_2 = pd.DataFrame(sample_array_2)
sample_df_2


     0    1    2
0  1.0  2.0  3.3
1  4.0  5.0  6.5


## Creating NumPy Arrays

In [8]:
# --- Create a 1 x 3 ndarray with values of 1 using the ones function.
# --- Note: The default datatype for each 1 is float64 so they will be 1. instead.
# --- You can change that with dtype = int:
ones = np.ones(shape=(1, 3))
ones

array([[1., 1., 1.]])

In [9]:
# --- Create a 2 x 3 ndarray with values of 0 using the zeros function.
# --- Note: The default datatype for each 0 is float64 so they will be 0. instead.
# --- You can change that with dtype = int:
zeros = np.zeros(shape=(2, 3))
zeros

array([[0., 0., 0.],
       [0., 0., 0.]])
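The comments above mention `dtype = int`. As a short sketch of what that looks like (the variable names are invented for this example):

```python
import numpy as np

# Passing dtype=int produces integer arrays instead of the float64 default.
int_ones = np.ones(shape=(2, 3), dtype=int)
int_zeros = np.zeros(shape=(2, 3), dtype=int)

print(int_ones)
# [[1 1 1]
#  [1 1 1]]
print(int_zeros)
# [[0 0 0]
#  [0 0 0]]
```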

In [10]:
# --- Create an ndarray with a range starting at 0, up to (but not including) 10, in increments of 2:
range_array = np.arange(0, 10, 2)
range_array

array([0, 2, 4, 6, 8])

In [11]:
# --- Create an ndarray with 3 rows and five random integers per row:
random_array = np.random.randint(low = 0, high = 100, size = (3, 5))
print(random_array)
print(f"\nrandom_array size: {random_array.size}\nrandom_array shape: {random_array.shape}")


[[21 28 50 57 42]
 [56 97 92 29 20]
 [51 71 53 17  3]]

random_array size: 15
random_array shape: (3, 5)


Note: NumPy random numbers are pseudo-random numbers. In short, they look random to us but are generated deterministically by the computer.

You can give NumPy's random number generator a fixed starting point with the `np.random.seed()` function.

By default, the seed is `None`. This means the generator is initialised from an unpredictable source (such as operating system entropy), so each run produces different numbers.

If you pass a value to the seed function, the generator always starts from the same point, so the same sequence of calls generates the same numbers each time:

In [12]:
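# --- Note: np.random.seed() reseeds the generator and returns None;
# --- it does not report the current seed:
print(np.random.seed())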

None


In [13]:
# --- Set the seed value to None and generate a random array of ints.
np.random.seed(seed=None)
random_array_seed_1 = np.random.randint(0, 10, size=(3, 5))
print(f"random_array_seed_1\n{random_array_seed_1}")

# --- Set the seed value to 1 and generate an array with random ints:
np.random.seed(seed=1)
random_array_seed_2 = np.random.randint(0, 10, size=(3, 5))
print(f"\nrandom_array_seed_2\n{random_array_seed_2}")

# --- The result should be this each time:
# [[5 8 9 5 0]
# [0 1 7 6 9]
# [2 4 5 2 4]]

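# --- Note: The seed sets the global generator state, which persists across Jupyter cells,
# --- but every random call advances that state. Call np.random.seed() again right before
# --- the random call whenever you need the same numbers again.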

random_array_seed_1
[[0 5 5 3 5]
 [9 5 0 3 9]
 [7 0 2 0 6]]

random_array_seed_2
[[5 8 9 5 0]
 [0 1 7 6 9]
 [2 4 5 2 4]]
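To check the reproducibility claim directly, the sketch below reseeds with the same value and compares the two results. It also shows `np.random.default_rng`, the `Generator`-based interface in newer NumPy versions, as an alternative; it is not used elsewhere in this notebook, and the variable names are invented for this example.

```python
import numpy as np

# Seed, then draw.
np.random.seed(1)
first_run = np.random.randint(0, 10, size=(3, 5))

# Reseed with the same value, then draw again.
np.random.seed(1)
second_run = np.random.randint(0, 10, size=(3, 5))

print(np.array_equal(first_run, second_run))  # True

# Alternative: independent generator objects that share a seed.
rng_a = np.random.default_rng(seed=1)
rng_b = np.random.default_rng(seed=1)
draw_a = rng_a.integers(0, 10, size=5)
draw_b = rng_b.integers(0, 10, size=5)

print(np.array_equal(draw_a, draw_b))  # True
```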


## Viewing Arrays and Matrices

In [14]:
# --- Show the unique numbers in an ndarray:
np.unique(random_array_seed_2)

array([0, 1, 2, 4, 5, 6, 7, 8, 9])

In [15]:
# --- Show the first item in a 1-D array:
sample_array_1[0]

1

In [16]:
# --- Show the first item of the first row in a 2-D matrix:
sample_array_2[0,0]

1.0

In [17]:
# --- Show the first item of the first row of the first matrix in a 3-D array:
# --- The indices 0, 0, 0 are in the order depth (matrix), row, column:
sample_array_3[0,0,0]

1

In [18]:
# --- From both matrices (:2), take all three rows (:3) and then the first two
# --- numbers in each row (:2).
# --- This uses Python slices (:):
sample_array_3[:2, :3, :2]

array([[[ 1,  2],
        [ 4,  5],
        [ 7,  8]],

       [[10, 11],
        [13, 14],
        [16, 17]]])
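The same indexing and slicing syntax can pick out whole rows or columns. A small sketch on a made-up 2 x 3 array (the name `grid` is invented for this example):

```python
import numpy as np

# Hypothetical 2 x 3 array for illustration.
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

print(grid[0])       # first row: [1 2 3]
print(grid[:, 1])    # second column: [2 5]
print(grid[-1, -1])  # last item of the last row: 6
print(grid[:, ::2])  # every other column
# [[1 3]
#  [4 6]]
```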

## Manipulating Arrays

### Arithmetic

In [19]:
# --- View the arrays we will work with
print(f"Ones Array:     {ones}")
print(f"Sample Array 1: {sample_array_1}\n")
print(f"Sample Array 2:\n{sample_array_2}\n")
print(f"Sample Array 3:\n{sample_array_3}")


Ones Array:     [[1. 1. 1.]]
Sample Array 1: [1 2 3]

Sample Array 2:
[[1.  2.  3.3]
 [4.  5.  6.5]]

Sample Array 3:
[[[ 1  2  3]
  [ 4  5  6]
  [ 7  8  9]]

 [[10 11 12]
  [13 14 15]
  [16 17 18]]]


In [20]:
# --- Perform some basic maths with a 1-D array and the 1 x 3 ones array:
print(f"Addition:    {sample_array_1 + ones}")
print(f"Subtraction: {sample_array_1 - ones}")
print(f"Multiplied:  {sample_array_1 * ones}")
print(f"Squared By:  {sample_array_1 ** 2}")
print(f"Divided By:  {sample_array_1 / ones}")

Addition:    [[2. 3. 4.]]
Subtraction: [[0. 1. 2.]]
Multiplied:  [[1. 2. 3.]]
Squared By:  [1 4 9]
Divided By:  [[1. 2. 3.]]


In [21]:
# --- Perform some basic maths with a 2-D array and the 1 x 3 ones array.
# --- To mix things up, use the built-in NumPy functions for arithmetic this time.
# --- What will happen is that all of the rows in sample_array_2 will be acted on by the values
# --- in ones where each column position is matched. This is called broadcasting:
print(f"Addition:    {np.add(sample_array_2, ones)}")
print(f"Subtraction: {np.subtract(sample_array_2, ones)}")
print(f"Multiplied:  {np.multiply(sample_array_2, ones)}")
print(f"Squared By:  {np.square(sample_array_2)}")
print(f"Divided By:  {np.divide(sample_array_2, ones)}")

Addition:    [[2.  3.  4.3]
 [5.  6.  7.5]]
Subtraction: [[0.  1.  2.3]
 [3.  4.  5.5]]
Multiplied:  [[1.  2.  3.3]
 [4.  5.  6.5]]
Squared By:  [[ 1.    4.   10.89]
 [16.   25.   42.25]]
Divided By:  [[1.  2.  3.3]
 [4.  5.  6.5]]


In [22]:
# --- Try adding a 2-D array to a 3-D array:
# --- This will not work: broadcasting compares shapes from the last axis backwards, and
# --- (2, 3, 3) against (2, 3) ends up pairing a 3 with a 2. The fix is to make the shapes compatible.
print(f"Addition:    {sample_array_3 + sample_array_2}")

ValueError: operands could not be broadcast together with shapes (2,3,3) (2,3) 

In [None]:
# --- To get the array shapes to match, you can either recreate one to match the other
# --- or use the reshape function. For example, reshape sample_array_2 as a new array:
sample_array_4 = sample_array_2.reshape(2,1,3)

# --- Let's try adding array 4 to array 3:
print(f"Addition:\n{sample_array_3 + sample_array_4}")


Addition:
[[[ 2.   4.   6.3]
  [ 5.   7.   9.3]
  [ 8.  10.  12.3]]

 [[14.  16.  18.5]
  [17.  19.  21.5]
  [20.  22.  24.5]]]
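Shape compatibility can also be checked without creating any arrays. The sketch below assumes NumPy 1.20 or later, where `np.broadcast_shapes` is available; `can_broadcast` is a hypothetical helper written for this example, not part of NumPy.

```python
import numpy as np


def can_broadcast(shape_a, shape_b):
    """Hypothetical helper: True if the two shapes are broadcast-compatible."""
    try:
        np.broadcast_shapes(shape_a, shape_b)
        return True
    except ValueError:
        return False


# Shapes are compared from the last axis backwards; each pair of sizes
# must be equal, or one of them must be 1.
print(can_broadcast((2, 3, 3), (2, 3)))     # False: a 3 is paired with a 2
print(can_broadcast((2, 3, 3), (2, 1, 3)))  # True: the 1 is stretched to 3

# The resulting shape when broadcasting succeeds:
result_shape = np.broadcast_shapes((2, 3, 3), (2, 1, 3))
print(result_shape)  # (2, 3, 3)
```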


### Aggregation

Aggregation reduces a group of values to a single summary value, such as a sum or a mean.

In [31]:
# --- Sum the values of sample_array_1 with Python's sum() and with NumPy's np.sum():
print(sum(sample_array_1))
print(np.sum(sample_array_1))

6
6


Whilst you can use sum() on NumPy ndarrays, it is recommended to do the following:
- For Python datatypes (dicts, lists, sets and tuples), use Python functions (`sum()`).
- For NumPy datatypes (ndarrays), use NumPy functions (`np.sum()`).

In [58]:
# --- Create an array with 10,000 random numbers and show the first 10 items:
sample_array_4 = np.random.randint(low = 1, high = 100, size = 10000)
sample_array_4[:10]

array([37, 37, 93, 16, 34, 18, 37, 13, 38, 66])

In [59]:
# --- Check the time (%timeit) it takes each method to run the same aggregation function.
# --- Spoilers: NumPy is WAAAAAAY faster:
%timeit sum(sample_array_4)
%timeit np.sum(sample_array_4)

612 µs ± 4.12 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
7.49 µs ± 60.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


In [60]:
# --- Additional examples of NumPy aggregate methods:
print(f"Mean:    {np.mean(sample_array_4)}")
print(f"Median:  {np.median(sample_array_4)}")
print(f"Min:     {np.min(sample_array_4)}")
print(f"Max:     {np.max(sample_array_4)}")
print(f"Var:     {np.var(sample_array_4)}")
print(f"Std Dev: {np.std(sample_array_4)}")


Mean:    49.5591
Median:  49.0
Min:     1
Max:     99
Var:     826.13350719
Std Dev: 28.74253828717986
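On arrays with more than one dimension, the same functions accept an `axis` argument to aggregate along rows or columns instead of over the whole array. A minimal sketch on a made-up 2 x 3 array (the name `table` is invented for this example):

```python
import numpy as np

# Hypothetical 2 x 3 array for illustration.
table = np.array([[1, 2, 3],
                  [4, 5, 6]])

print(np.sum(table))           # all values: 21
print(np.sum(table, axis=0))   # down each column: [5 7 9]
print(np.sum(table, axis=1))   # across each row: [ 6 15]
print(np.mean(table, axis=1))  # mean of each row: [2. 5.]
```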


### Variance (np.var)

Variance is the average of the squared differences between each number and the mean.

- Higher variance = wider spread of numbers
- Lower variance = narrower spread of numbers
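To make the definition concrete, the sketch below works the variance out by hand on a small made-up array and compares it with `np.var`. It relies on the default behaviour of `np.var`, which divides by the number of items (population variance).

```python
import numpy as np

# Hypothetical data for illustration.
numbers = np.array([2, 4, 4, 4, 5, 5, 7, 9])

mean = numbers.mean()                        # 5.0
squared_differences = (numbers - mean) ** 2  # [9. 1. 1. 1. 0. 0. 4. 16.]
manual_variance = squared_differences.mean()

print(manual_variance)  # 4.0
print(np.var(numbers))  # 4.0
```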

### Standard Deviation (np.std)

Standard deviation is a measure of how spread out a group of numbers is from the mean.

Put another way, the standard deviation is the square root of the variance.

In [61]:
# --- Just to show that the above is correct, show the std dev and the sqrt of var for
# --- sample_array_4:
print(f"Std Dev:  {np.std(sample_array_4)}")
print(f"Sqrt Var: {np.sqrt(np.var(sample_array_4))}")

Std Dev:  28.74253828717986
Sqrt Var: 28.74253828717986


### Reshape

Reshape allows you to change the shape of an existing array to whatever shape you need, as long as the total number of elements stays the same.
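A quick sketch of that constraint, using a made-up six-element array: any shape whose dimensions multiply to 6 works, `-1` asks NumPy to work out one dimension for you, and anything else raises a `ValueError`.

```python
import numpy as np

# Hypothetical six-element array for illustration.
flat = np.arange(6)  # [0 1 2 3 4 5]

print(flat.reshape(2, 3).shape)   # (2, 3)
print(flat.reshape(3, -1).shape)  # (3, 2): -1 is inferred as 2

# 4 x 2 needs 8 elements, so this cannot work with 6.
try:
    flat.reshape(4, 2)
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(reshape_failed)  # True
```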

In [62]:
# --- Try adding a 2-D array to a 3-D array:
# --- This will not work: broadcasting compares shapes from the last axis backwards, and
# --- (2, 3, 3) against (2, 3) ends up pairing a 3 with a 2. The fix is to make the shapes compatible.
print(f"Addition:    {sample_array_3 + sample_array_2}")

ValueError: operands could not be broadcast together with shapes (2,3,3) (2,3) 

In [68]:
# --- To get the array shapes to match, you can either recreate one to match the other
# --- or use the reshape function. For example, reshape sample_array_2 as a new array:
sample_array_5 = sample_array_2.reshape(2,1,3)
print(f"Array 5 Shape: {sample_array_5.shape}\n")
print(f"Array 5 Reshaped from Array 2:\n{sample_array_5}\n\n")

# --- Let's try adding array 5 to array 3:
print(f"Add Array 3 to Array 5:\n{sample_array_3 + sample_array_5}")

Array 5 Shape: (2, 1, 3)

Array 5 Reshaped from Array 2:
[[[1.  2.  3.3]]

 [[4.  5.  6.5]]]


Add Array 3 to Array 5:
[[[ 2.   4.   6.3]
  [ 5.   7.   9.3]
  [ 8.  10.  12.3]]

 [[14.  16.  18.5]
  [17.  19.  21.5]
  [20.  22.  24.5]]]


### Transpose

The transpose method reverses the order of an array's axes. For example, if an array has a shape of 2, 3, running transpose on it will change the shape to 3, 2.

In [75]:
# --- Have a look at sample array 3:
print(sample_array_3)
print(sample_array_3.shape)

# --- To transpose, you can use either transpose() or simply T.
# --- The array's shape will now be reversed:
print(sample_array_3.transpose())
print(sample_array_3.T.shape)

[[[ 1  2  3]
  [ 4  5  6]
  [ 7  8  9]]

 [[10 11 12]
  [13 14 15]
  [16 17 18]]]
(2, 3, 3)
[[[ 1 10]
  [ 4 13]
  [ 7 16]]

 [[ 2 11]
  [ 5 14]
  [ 8 17]]

 [[ 3 12]
  [ 6 15]
  [ 9 18]]]
(3, 3, 2)
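The 2 x 3 case mentioned above is easier to follow by eye. The sketch below transposes a made-up 2-D array, then shows that `transpose()` also accepts an explicit axis order if you only want to swap some of the axes of a 3-D array. The names `matrix` and `cube` are invented for this example.

```python
import numpy as np

# Hypothetical 2 x 3 array: rows become columns.
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
print(matrix.T.shape)  # (3, 2)
print(matrix.T)
# [[1 4]
#  [2 5]
#  [3 6]]

# Hypothetical 3-D array: keep the depth axis, swap rows and columns.
cube = np.arange(18).reshape(2, 3, 3)
swapped = cube.transpose(0, 2, 1)
print(swapped.shape)  # (2, 3, 3)
print(cube.T.shape)   # (3, 3, 2): all axes reversed
```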
