# Fundamentals of NumPy

NumPy is a numerical computing library whose core is written in C, and it is widely used in data science and machine learning. It is the backbone of many other data and machine learning libraries. For instance, pandas DataFrames are built on NumPy arrays.

* https://numpy.org/doc/stable/reference/index.html

The main reason NumPy is so widely used is performance. NumPy arrays support vectorization: an operation is applied to an entire array at once, with the loop over elements running in compiled code, rather than to one data point at a time in Python.

* https://www.scaler.com/topics/np-vectorize/
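
To make the vectorization point concrete, here is a minimal sketch comparing an explicit Python loop with the equivalent array expression. The array name `values` and the doubling operation are illustrative choices, not something from the links above.

```python
import numpy as np

values = np.arange(100_000, dtype=np.float64)

# Element by element: the Python interpreter handles every multiplication.
looped = np.array([v * 2.0 for v in values])

# Vectorized: one expression, and the loop over elements runs in compiled code.
vectorized = values * 2.0

# Both approaches produce the same numbers; only the speed differs.
print(np.array_equal(looped, vectorized))  # True
```

Timing both lines with `%timeit` should show the vectorized form finishing much faster, though the exact ratio depends on the machine.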

The purpose of this notebook is to dig into the fundamentals of NumPy.

## Imports

In [1]:
from matplotlib.image import imread
import numpy as np
import pandas as pd

## DataTypes & Attributes

NumPy's main data type is the `ndarray` (n-dimensional array).

In [2]:
a1 = np.array([1, 2, 3])
a1, type(a1)

(array([1, 2, 3]), numpy.ndarray)

In [3]:
# Shape = (2, 3)
a2 = np.array([[1, 2.0, 3.3],
               [4, 5, 6.5]])

# Shape = (2, 3, 3)
a3 = np.array([[[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]],
               
               [[10, 11, 12],
                [13, 14, 15],
                [16, 17, 18]]])
a2, a3

(array([[1. , 2. , 3.3],
        [4. , 5. , 6.5]]),
 array([[[ 1,  2,  3],
         [ 4,  5,  6],
         [ 7,  8,  9]],
 
        [[10, 11, 12],
         [13, 14, 15],
         [16, 17, 18]]]))

In [4]:
# Looking at shapes
a1.shape, a2.shape, a3.shape

((3,), (2, 3), (2, 3, 3))

In [5]:
# Looking at dims
a1.ndim, a2.ndim, a3.ndim

(1, 2, 3)

In [6]:
# Looking at data types
a1.dtype, a2.dtype, a3.dtype

(dtype('int64'), dtype('float64'), dtype('int64'))

In [7]:
# Looking at size
a1.size, a2.size, a3.size

(3, 6, 18)
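
The attributes above are related to one another. The following sketch, which rebuilds an array like `a2` under the hypothetical name `mixed`, shows how `dtype`, `size`, and `ndim` follow from the data and the shape.

```python
import numpy as np

mixed = np.array([[1, 2.0, 3.3],
                  [4, 5, 6.5]])

# Mixing ints and floats upcasts the entire array to a float dtype.
print(mixed.dtype)  # float64

# size is the product of the entries in shape (2 * 3 = 6).
print(mixed.size == np.prod(mixed.shape))  # True

# ndim is the length of the shape tuple.
print(mixed.ndim == len(mixed.shape))  # True
```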

## Creating NumPy Arrays

NumPy arrays can be created in several ways: from a Python list with `np.array`, filled with a constant using `np.ones` or `np.zeros`, as an evenly spaced range with `np.arange`, or from random integers or floats with the `np.random` functions.

In [8]:
# From a Python list
sample_array = np.array([1, 2, 3])
sample_array, sample_array.dtype

(array([1, 2, 3]), dtype('int64'))

In [9]:
# ones array
ones = np.ones((2, 3))
ones, ones.dtype

(array([[1., 1., 1.],
        [1., 1., 1.]]),
 dtype('float64'))

In [10]:
# zeros array
zeros = np.zeros((2, 3))
zeros, zeros.dtype

(array([[0., 0., 0.],
        [0., 0., 0.]]),
 dtype('float64'))

In [11]:
# Range array
range_array = np.arange(0, 10, 2)
range_array, range_array.dtype

(array([0, 2, 4, 6, 8]), dtype('int64'))

In [12]:
# Random array of integers
random_array = np.random.randint(0, 10, size=(3, 5))
random_array

array([[2, 5, 8, 9, 6],
       [9, 0, 6, 1, 9],
       [8, 4, 9, 9, 5]])

In [13]:
# Random array of floats in the half-open interval [0, 1); the shape is passed as a tuple
random_array_2 = np.random.random((5, 3))
random_array_2

array([[0.82364352, 0.65896297, 0.87164273],
       [0.12933642, 0.45519985, 0.4832369 ],
       [0.4799328 , 0.25805752, 0.75278828],
       [0.85523284, 0.22318622, 0.84809305],
       [0.11176497, 0.33330878, 0.31411775]])

In [14]:
# Random floats in [0, 1) again; np.random.rand takes the dimensions as separate arguments
random_array_3 = np.random.rand(5, 3)
random_array_3

array([[0.89545396, 0.93234378, 0.15541472],
       [0.45324928, 0.07021571, 0.02837654],
       [0.33309628, 0.21927237, 0.28311   ],
       [0.02564652, 0.85574769, 0.37875308],
       [0.1354138 , 0.06420949, 0.90081021]])

In [15]:
# Setting a random seed (the "random" numbers are actually pseudo-random)
# Setting a seed makes the generated numbers reproducible.
np.random.seed(5)
random_array_4 = np.random.randint(10, size=(5, 3))
random_array_4

array([[3, 6, 6],
       [0, 9, 8],
       [4, 7, 0],
       [0, 7, 1],
       [5, 7, 0]])
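
As a quick check of what "reproducible" means here, the sketch below seeds twice and compares the results. The second half uses `np.random.default_rng`, the newer Generator interface; it is shown as an alternative and is not what this notebook uses.

```python
import numpy as np

np.random.seed(5)
first = np.random.randint(10, size=(5, 3))

np.random.seed(5)  # re-seeding restarts the same pseudo-random sequence
second = np.random.randint(10, size=(5, 3))

print(np.array_equal(first, second))  # True

# Alternative: independent Generator objects created from the same seed
# also produce identical draws, without touching global state.
rng_a = np.random.default_rng(5)
rng_b = np.random.default_rng(5)
draw_a = rng_a.integers(10, size=(5, 3))
draw_b = rng_b.integers(10, size=(5, 3))
print(np.array_equal(draw_a, draw_b))  # True
```

Note that the legacy functions and the Generator use different algorithms, so the same seed does not give the same numbers across the two interfaces.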

## Viewing Arrays and Matrices

In [16]:
# Drawing a new random_array_4 (continues the sequence started by the seed above)
random_array_4 = np.random.randint(10, size=(5, 3))
random_array_4

array([[1, 4, 6],
       [2, 9, 9],
       [9, 9, 1],
       [2, 7, 0],
       [5, 0, 0]])

In [17]:
# Listing all unique values in an array/matrix
np.unique(random_array_4)

array([0, 1, 2, 4, 5, 6, 7, 9])

In [18]:
a1, a2, a3

(array([1, 2, 3]),
 array([[1. , 2. , 3.3],
        [4. , 5. , 6.5]]),
 array([[[ 1,  2,  3],
         [ 4,  5,  6],
         [ 7,  8,  9]],
 
        [[10, 11, 12],
         [13, 14, 15],
         [16, 17, 18]]]))

In [19]:
# Indexing
a1[0], a2[0], a3[0]

(1,
 array([1. , 2. , 3.3]),
 array([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]))

In [20]:
a1[1], a2[1,2], a3[1,2,2]

(2, 6.5, 18)

In [21]:
# Slicing
a1[:2], a2[:2, :2], a3[:2, :2, :2]

(array([1, 2]),
 array([[1., 2.],
        [4., 5.]]),
 array([[[ 1,  2],
         [ 4,  5]],
 
        [[10, 11],
         [13, 14]]]))

## Manipulating & Comparing Arrays

### Arithmetic

In [22]:
a1, ones

(array([1, 2, 3]),
 array([[1., 1., 1.],
        [1., 1., 1.]]))

In [23]:
# Summing arrays
a1 + ones

array([[2., 3., 4.],
       [2., 3., 4.]])

In [24]:
# Subtracting
a1 - ones

array([[0., 1., 2.],
       [0., 1., 2.]])

In [25]:
# Multiplying (element wise)
a1 * ones

array([[1., 2., 3.],
       [1., 2., 3.]])

In [26]:
# Dividing (element wise)
a1 / ones

array([[1., 2., 3.],
       [1., 2., 3.]])

In [27]:
# Broadcasting error (incompatible shapes in arithmetic)
# a2 + a3

# The above code results in the following error:
# ValueError: operands could not be broadcast together with shapes (2,3) (2,3,3)

# Shapes are compared from the right: (2, 3) is treated as (1, 2, 3) against (2, 3, 3).
# The last axes match (3 and 3), but the middle axes are 2 and 3: they differ and neither is 1.
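
The broadcasting rule can be checked on shapes alone, without building any arrays. This is a sketch that assumes NumPy 1.20 or newer, where `np.broadcast_shapes` is available; the variable names are illustrative.

```python
import numpy as np

a2_shape = (2, 3)
a3_shape = (2, 3, 3)

# Shapes are aligned from the right, so (2, 3) is compared as (1, 2, 3).
# Axis by axis: 3 vs 3 matches, but 2 vs 3 differs and neither is 1.
try:
    np.broadcast_shapes(a2_shape, a3_shape)
    compatible = True
except ValueError:
    compatible = False
print(compatible)  # False

# An axis of length 1 can stretch: (2, 3, 1) against (2, 3, 3) works.
print(np.broadcast_shapes((2, 3, 1), a3_shape))  # (2, 3, 3)

# a1 + ones worked earlier for the same reason: (3,) against (2, 3).
print(np.broadcast_shapes((3,), (2, 3)))  # (2, 3)
```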

In [28]:
# Powers (element wise)
a1 ** 2

array([1, 4, 9])

In [29]:
# Modulo (element wise)
a1 % 2

array([1, 0, 1])

In [30]:
# Exponential (element wise)
np.exp(a1)

array([ 2.71828183,  7.3890561 , 20.08553692])

In [31]:
# log (element wise)
np.log(a1)

array([0.        , 0.69314718, 1.09861229])

### Aggregation

In [32]:
# Summing the elements of an array
# NOTE: sum(a1) and np.sum(a1) return the same value, but use NumPy functions on NumPy arrays
#       and Python built-ins on plain Python data.
# The reason: np.sum is significantly faster on NumPy arrays (roughly 190x in the timing below).

massive_array = np.random.random(100000)
%timeit sum(massive_array)  # Python's sum()
%timeit np.sum(massive_array)  # Numpy's sum()

3.91 ms ± 2.55 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
20.2 µs ± 118 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


In [33]:
# Mean
np.mean(a2)

3.6333333333333333

In [34]:
# Max & min
np.max(a2),  np.min(a2)

(6.5, 1.0)

In [35]:
# Standard Deviation & Variance
np.std(a2), np.var(a2)

(1.8226964152656422, 3.3222222222222224)
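
The aggregations above reduce the whole array to one number. Most of them also accept an `axis` argument to reduce along rows or columns only; the following sketch rebuilds `a2` to show the idea.

```python
import numpy as np

a2 = np.array([[1, 2.0, 3.3],
               [4, 5, 6.5]])

# axis=0 collapses the rows, leaving one value per column.
column_sums = np.sum(a2, axis=0)
print(column_sums.shape)  # (3,)

# axis=1 collapses the columns, leaving one value per row.
row_maxes = np.max(a2, axis=1)
print(row_maxes)  # [3.3 6.5]
```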

### Reshaping & Transposing

In [36]:
a2, a2.shape

(array([[1. , 2. , 3.3],
        [4. , 5. , 6.5]]),
 (2, 3))

In [37]:
# Reshape
a2_reshape = a2.reshape((2, 3, 1))
a2_reshape

array([[[1. ],
        [2. ],
        [3.3]],

       [[4. ],
        [5. ],
        [6.5]]])

In [38]:
# Refer back to the broadcasting error above: once a2 is reshaped to (2, 3, 1), it broadcasts against a3's shape (2, 3, 3)
a2_reshape + a3

array([[[ 2. ,  3. ,  4. ],
        [ 6. ,  7. ,  8. ],
        [10.3, 11.3, 12.3]],

       [[14. , 15. , 16. ],
        [18. , 19. , 20. ],
        [22.5, 23.5, 24.5]]])

In [39]:
# Transpose
# Same as the matrix transpose from linear algebra (rows become columns)
a2, a2.transpose()

(array([[1. , 2. , 3.3],
        [4. , 5. , 6.5]]),
 array([[1. , 4. ],
        [2. , 5. ],
        [3.3, 6.5]]))
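
Reshaping and transposing can both turn a (2, 3) array into a (3, 2) array, but they are not the same operation. A small sketch, again rebuilding `a2`:

```python
import numpy as np

a2 = np.array([[1, 2.0, 3.3],
               [4, 5, 6.5]])

# Any target shape works as long as the element count (6) is unchanged.
print(a2.reshape(3, 2).shape)      # (3, 2)

# -1 asks NumPy to infer that dimension from the others.
print(a2.reshape(-1).shape)        # (6,)
print(a2.reshape(2, 3, -1).shape)  # (2, 3, 1)

# reshape keeps the row-by-row reading order of the elements,
# while transpose swaps rows and columns, so the results differ.
print(np.array_equal(a2.reshape(3, 2), a2.T))  # False
```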

### Matrix Multiplication (Dot Product vs. Element wise)

In [40]:
np.random.seed(0)
mat1 = np.random.randint(10, size=(5,3))
mat2 = np.random.randint(10, size=(5,3))
mat1, mat2

(array([[5, 0, 3],
        [3, 7, 9],
        [3, 5, 2],
        [4, 7, 6],
        [8, 8, 1]]),
 array([[6, 7, 7],
        [8, 1, 5],
        [9, 8, 9],
        [4, 3, 0],
        [3, 5, 0]]))

In [41]:
# Matrix multiplication requires the inner dimensions to match
# mat1.shape = (5, 3) & mat2.shape = (5, 3)
# To make the inner dimensions match, transpose mat2 -> mat1.shape = (5, 3) & mat2.transpose().shape = (3, 5)
# (5, 3) @ (3, 5) -> (5, 5)
np.dot(mat1, mat2.transpose())

array([[ 51,  55,  72,  20,  15],
       [130,  76, 164,  33,  44],
       [ 67,  39,  85,  27,  34],
       [115,  69, 146,  37,  47],
       [111,  77, 145,  56,  64]])

In [42]:
# The @ operator performs matrix multiplication (same result as np.dot for 2-D arrays)
mat1 @ mat2.transpose()

array([[ 51,  55,  72,  20,  15],
       [130,  76, 164,  33,  44],
       [ 67,  39,  85,  27,  34],
       [115,  69, 146,  37,  47],
       [111,  77, 145,  56,  64]])
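
For contrast with the element-wise arithmetic earlier in the notebook, this sketch applies both `*` and `@` to the same pair of small matrices. The names `left` and `right` are illustrative.

```python
import numpy as np

left = np.array([[1, 2],
                 [3, 4]])
right = np.array([[5, 6],
                  [7, 8]])

# Element-wise: multiplies matching positions; shapes must be broadcastable.
elementwise = left * right
print(elementwise)
# [[ 5 12]
#  [21 32]]

# Matrix product: rows of `left` dotted with columns of `right`.
matrix_product = left @ right
print(matrix_product)
# [[19 22]
#  [43 50]]
```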

### Exercise: Nut Butter Store Sales

In [43]:
# Generating the number of units sold for 3 different nut butters over 5 days
np.random.seed(0)
total_sales = np.random.randint(20, size=(5,3))
pd.DataFrame(total_sales, columns=['Almond Butter', 'Peanut Butter', 'Cashew Butter'])

|   | Almond Butter | Peanut Butter | Cashew Butter |
|---|---|---|---|
| 0 | 12 | 15 | 0 |
| 1 | 3 | 3 | 7 |
| 2 | 9 | 19 | 18 |
| 3 | 4 | 6 | 12 |
| 4 | 1 | 6 | 7 |


In [44]:
# Setting the prices for the butters
prices = np.array([10, 8, 12])

In [45]:
# Getting the total sales amount for each day: (5, 3) @ (3,) -> (5,)
# NOTE: .T is shorthand for .transpose(), but prices is 1-D, so prices.T is identical to prices
total_sales @ prices.T

array([240, 138, 458, 232, 142])
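
The matrix-vector product above can be broken into two steps to see where each daily total comes from. This sketch hard-codes the sales figures from the table so it runs on its own; `revenue` is a hypothetical intermediate name.

```python
import numpy as np

# Units sold per day (rows) and per product (columns), copied from the table.
total_sales = np.array([[12, 15,  0],
                        [ 3,  3,  7],
                        [ 9, 19, 18],
                        [ 4,  6, 12],
                        [ 1,  6,  7]])
prices = np.array([10, 8, 12])

# Broadcasting prices across the rows gives revenue per product per day.
revenue = total_sales * prices

# Summing each row reproduces the matrix-vector product.
print(revenue.sum(axis=1))   # [240 138 458 232 142]
print(total_sales @ prices)  # [240 138 458 232 142]
```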

### Comparison Operators

In [46]:
a1, a2

(array([1, 2, 3]),
 array([[1. , 2. , 3.3],
        [4. , 5. , 6.5]]))

In [47]:
# Element-wise comparisons; a1 (shape (3,)) is broadcast against each row of a2 (shape (2, 3))
a1 > a2, a1 >= a2, a1 == a2, a1 != a2, a1 < a2, a1 <= a2

(array([[False, False, False],
        [False, False, False]]),
 array([[ True,  True, False],
        [False, False, False]]),
 array([[ True,  True, False],
        [False, False, False]]),
 array([[False, False,  True],
        [ True,  True,  True]]),
 array([[False, False,  True],
        [ True,  True,  True]]),
 array([[ True,  True,  True],
        [ True,  True,  True]]))
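
The boolean arrays returned by comparisons are useful in their own right. As a sketch of two common follow-ups, the code below counts the `True` values and uses the result as a mask to select elements; `mask` is an illustrative name.

```python
import numpy as np

a1 = np.array([1, 2, 3])
a2 = np.array([[1, 2.0, 3.3],
               [4, 5, 6.5]])

mask = a1 < a2                 # a1 is broadcast against each row of a2

print(mask.sum())              # 4 (each True counts as 1)
print(a2[mask])                # [3.3 4.  5.  6.5]
print(mask.any(), mask.all())  # True False
```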

### Sorting Arrays

In [48]:
random_array = np.random.randint(10, size=(3, 5))
random_array

array([[7, 8, 1, 5, 9],
       [8, 9, 4, 3, 0],
       [3, 5, 0, 2, 3]])

In [49]:
# Sorts the values within each row (np.sort sorts along the last axis by default)
np.sort(random_array)

array([[1, 5, 7, 8, 9],
       [0, 3, 4, 8, 9],
       [0, 2, 3, 3, 5]])

In [50]:
# Returns the indices that would sort each row
np.argsort(random_array)

array([[2, 3, 0, 1, 4],
       [4, 3, 2, 0, 1],
       [2, 3, 0, 4, 1]])

In [51]:
# Get the index of the minimum value and the maximum value
np.argmin(a1), np.argmax(a1)

(0, 2)

In [52]:
# What about multi-dimensional arrays?

# NOTE: without an axis argument, argmax returns a single int: the index of the max value in the
#       flattened (1-D) version of the array.
np.argmax(random_array)

4
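
A flat index is often less useful than row and column coordinates. The sketch below, which hard-codes the same `random_array` values, shows two ways to get them: `np.unravel_index` and the `axis` argument.

```python
import numpy as np

random_array = np.array([[7, 8, 1, 5, 9],
                         [8, 9, 4, 3, 0],
                         [3, 5, 0, 2, 3]])

flat_index = np.argmax(random_array)
print(flat_index)  # 4

# Convert the flat index back into (row, column) coordinates.
row, col = np.unravel_index(flat_index, random_array.shape)
print(row, col)  # 0 4

# With an axis, argmax returns one index per column (axis=0) or per row (axis=1).
print(np.argmax(random_array, axis=0))  # [1 1 1 0 0]
print(np.argmax(random_array, axis=1))  # [4 1 1]
```

When the maximum appears more than once, as 9 does here, `argmax` reports the first occurrence.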

## Practical Example - NumPy in Action

In [53]:
image_data_dir_path = './data/numpy-images'

<img src='./data/numpy-images/panda.png'/>

In [54]:
# Turn an image into a NumPy array (each pixel's color values become array entries)
# The shape is (height_in_pixels, width_in_pixels, color_channels)
panda = imread(f'{image_data_dir_path}/panda.png')
panda[:1]

array([[[0.05490196, 0.10588235, 0.06666667],
        [0.05490196, 0.10588235, 0.06666667],
        [0.05490196, 0.10588235, 0.06666667],
        ...,
        [0.16470589, 0.12941177, 0.09411765],
        [0.16470589, 0.12941177, 0.09411765],
        [0.16470589, 0.12941177, 0.09411765]]], dtype=float32)

In [55]:
panda.size, panda.shape, panda.ndim

(24465000, (2330, 3500, 3), 3)

<img src='./data/numpy-images/car-photo.png'/>

In [56]:
# Getting the NumPy array of the car photo (4 values per pixel here: red, green, blue, alpha)
car = imread(f'{image_data_dir_path}/car-photo.png')
car[:1]

array([[[0.5019608 , 0.50980395, 0.4862745 , 1.        ],
        [0.3372549 , 0.34509805, 0.30588236, 1.        ],
        [0.20392157, 0.21568628, 0.14901961, 1.        ],
        ...,
        [0.64705884, 0.7058824 , 0.54901963, 1.        ],
        [0.59607846, 0.63529414, 0.45882353, 1.        ],
        [0.44705883, 0.47058824, 0.3372549 , 1.        ]]], dtype=float32)

<img src='./data/numpy-images/dog-photo.png'/>

In [57]:
# Getting numpy array of dog photo
dog = imread(f'{image_data_dir_path}/dog-photo.png')
dog[:1]

array([[[0.70980394, 0.80784315, 0.88235295, 1.        ],
        [0.72156864, 0.8117647 , 0.8862745 , 1.        ],
        [0.7411765 , 0.8156863 , 0.8862745 , 1.        ],
        ...,
        [0.49803922, 0.6862745 , 0.8392157 , 1.        ],
        [0.49411765, 0.68235296, 0.8392157 , 1.        ],
        [0.49411765, 0.68235296, 0.8352941 , 1.        ]]], dtype=float32)
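
Once an image is an array, the indexing and aggregation tools from earlier sections apply directly. The following sketch uses a small randomly generated stand-in (`fake_image`, a hypothetical name) instead of the photos, so it runs without the image files; the unweighted channel average is just one simple way to get a grayscale image.

```python
import numpy as np

# Stand-in for an RGBA image returned by imread: 4 rows, 6 columns, 4 channels.
rng = np.random.default_rng(0)
fake_image = rng.random((4, 6, 4), dtype=np.float32)

height, width, channels = fake_image.shape
print(height, width, channels)  # 4 6 4

# Drop the alpha channel by slicing the last axis.
rgb = fake_image[:, :, :3]
print(rgb.shape)  # (4, 6, 3)

# A simple unweighted grayscale: average the color channels of each pixel.
gray = rgb.mean(axis=2)
print(gray.shape)  # (4, 6)
```

The same slicing would apply to `car` and `dog` above, which carry a fourth (alpha) value per pixel, whereas `panda` has only three.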