# NumPy: Numerical Data Manipulation

## What is NumPy?

[NumPy](https://numpy.org/doc/stable/index.html) stands for numerical Python. It's the backbone of all kinds of scientific and numerical computing in Python.

And since machine learning is all about turning data into numbers and then figuring out the patterns, NumPy often comes into play.

## Why NumPy?

You can do numerical calculations using pure Python. At first, Python may seem fast enough, but once your data gets large you'll start to notice slowdowns.

One of the main reasons to use NumPy is that it's fast. Behind the scenes, its core routines are implemented in optimized C, a compiled language that runs numerical loops much faster than Python can.

The benefit of this being behind the scenes is you don't need to know any C to take advantage of it. You can write your numerical computations in Python using NumPy and get the added speed benefits.

If you're curious as to what causes this speed benefit, it's a process called vectorization. [Vectorization](https://en.wikipedia.org/wiki/Vectorization) aims to do calculations by avoiding explicit Python loops, as loops can create potential bottlenecks.

NumPy combines vectorization with [broadcasting](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html#module-numpy.doc.broadcasting), which lets a single operation work across arrays of different but compatible shapes.
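
To make the idea concrete, here is a minimal sketch comparing an explicit Python loop with the equivalent vectorized expression. The variable names (`values`, `squares_loop`, `squares_vectorized`) are illustrative and not part of the original notebook.

```python
import numpy as np

values = np.arange(5)  # array([0, 1, 2, 3, 4])

# Loop version: Python handles one element at a time
squares_loop = np.array([v ** 2 for v in values])

# Vectorized version: one expression, the looping happens in compiled code
squares_vectorized = values ** 2

print(squares_vectorized)                                 # [ 0  1  4  9 16]
print(np.array_equal(squares_loop, squares_vectorized))   # True
```

Both versions give the same result; the vectorized one is shorter and, on large arrays, usually much faster.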

## 0. Importing NumPy

To get started using NumPy, the first step is to import it. 

The most common way (and method you should use) is to import NumPy as the abbreviation `np`.

If you see the letters `np` used anywhere in machine learning or data science, it's probably referring to the NumPy library.

In [1]:
import numpy as np

# Check the version
print(np.__version__)

2.2.4


## 1. DataTypes and attributes

> **Note:** The main type in NumPy is `ndarray`. Even seemingly different kinds of arrays are still `ndarray`s, so an operation that works on one array will generally work on another.

In [2]:
# 1-dimensional array, also referred to as a vector
a1 = np.array([1, 2, 3])
a1

array([1, 2, 3])

In [3]:
a1.shape

(3,)

In [4]:
a1.ndim

1

In [5]:
a1.dtype

dtype('int64')

In [6]:
a1.size

3

In [7]:
type(a1)

numpy.ndarray

In [8]:
# 2-dimensional array, also referred to as matrix
a2 = np.array([[1, 2.0, 3.3],
               [4, 5, 6.5]])
a2

array([[1. , 2. , 3.3],
       [4. , 5. , 6.5]])

In [9]:
a2.shape

(2, 3)

In [10]:
a2.ndim

2

In [11]:
a2.dtype

dtype('float64')

In [12]:
a2.size

6

In [13]:
type(a2)

numpy.ndarray

In [14]:
# 3-dimensional array, also referred to as a matrix
a3 = np.array([[[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]],
                [[10, 11, 12],
                 [13, 14, 15],
                 [16, 17, 18]]])
a3

array([[[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9]],

       [[10, 11, 12],
        [13, 14, 15],
        [16, 17, 18]]])

In [15]:
a3.shape

(2, 3, 3)

In [16]:
a3.ndim

3

In [17]:
a3.dtype

dtype('int64')

In [18]:
a3.size

18

In [19]:
type(a3)

numpy.ndarray

### Anatomy of an array
Key terms:
* **Array** - A list of numbers, can be multi-dimensional.
* **Scalar** - A single number (e.g. `7`).
* **Vector** - A list of numbers with 1 dimension (e.g. `np.array([1, 2, 3])`).
* **Matrix** - A (usually) multi-dimensional list of numbers (e.g. `np.array([[1, 2, 3], [4, 5, 6]])`).
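
As a quick check of these terms, the sketch below builds one array of each kind and prints its number of dimensions and shape. Wrapping the scalar in `np.array(7)` is an illustrative choice made here so that it has `ndim` and `shape` attributes.

```python
import numpy as np

scalar = np.array(7)                       # a single number, 0 dimensions
vector = np.array([1, 2, 3])               # 1 dimension
matrix = np.array([[1, 2, 3], [4, 5, 6]])  # 2 dimensions

for name, arr in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    # ndim is the number of axes, shape is the size along each axis
    print(name, arr.ndim, arr.shape)
```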

### pandas DataFrame out of NumPy arrays

This example shows how NumPy is the backbone of many other libraries.

In [20]:
np.random.randint(10, size=(5, 3))

array([[8, 1, 6],
       [4, 0, 5],
       [7, 6, 4],
       [3, 9, 7],
       [8, 3, 0]], dtype=int32)

In [21]:
import pandas as pd
df = pd.DataFrame(np.random.randint(10, size=(5, 3)), 
                                    columns=['a', 'b', 'c'])
df

   a  b  c
0  7  7  2
1  4  9  5
2  4  9  3
3  7  3  8
4  8  9  3


In [22]:
a = np.array([[1,2.9,3.3],
              [4,5,6.5]])
a

array([[1. , 2.9, 3.3],
       [4. , 5. , 6.5]])

In [23]:
df = pd.DataFrame(a, columns = ['a', 'b', 'c'])
df

     a    b    c
0  1.0  2.9  3.3
1  4.0  5.0  6.5


## 2. Creating arrays

* `np.array()`
* `np.ones()`
* `np.zeros()`
* `np.arange()`
* `np.random.rand(5, 3)`
* `np.random.randint(10, size=5)`
* `np.random.seed()` - pseudo random numbers
* Searching the documentation example (finding `np.unique()` and using it)

### np.array()

In [24]:
simple_array = np.array([1,2,3])
simple_array

array([1, 2, 3])

In [25]:
simple_array.dtype

dtype('int64')

### np.ones()
* `np.ones(shape, dtype=None, order='C', *, like=None)`
* Returns a new array of the given shape and type, filled with ones.

In [26]:
ones = np.ones((10,2))
ones

array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])

`np.ones()` returns floats by default (`float64`). Use `.astype(int)` to convert the values to integers.

In [27]:
ones.astype(int)

array([[1, 1],
       [1, 1],
       [1, 1],
       [1, 1],
       [1, 1],
       [1, 1],
       [1, 1],
       [1, 1],
       [1, 1],
       [1, 1]])
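
As an alternative to converting afterwards, the `dtype` parameter shown in the signature above can be set when the array is created. A minimal sketch (the name `int_ones` is illustrative):

```python
import numpy as np

# Ask for integers up front instead of converting with astype()
int_ones = np.ones((2, 3), dtype=int)

print(int_ones)
print(int_ones.dtype)  # an integer dtype; the exact width depends on the platform
```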

### np.zeros()
* `zeros(shape, dtype=float, order='C', *, like=None)`
* Returns a new array of the given shape and type, filled with zeros.

In [28]:
zeros = np.zeros((3,2,3))
zeros

array([[[0., 0., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [0., 0., 0.]]])

In [29]:
zeros.dtype

dtype('float64')

In [30]:
zeros.astype(int)

array([[[0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0]]])

### np.arange()
* `arange([start,] stop[, step,], dtype=None, *, like=None)`
* Returns evenly spaced values within a given interval.

In [31]:
range_array = np.arange(0,10,2)
range_array

array([0, 2, 4, 6, 8])

### np.random.rand()
* `rand(d0, d1, ..., dn)`
* Random values in a given shape.
* Creates an array of the given shape and populates it with random samples from a uniform distribution over `[0, 1)`.

In [32]:
random_array = np.random.rand(5,3)
random_array

array([[0.1321418 , 0.08514691, 0.00580593],
       [0.96738656, 0.77166382, 0.45866095],
       [0.32568666, 0.79747258, 0.67999189],
       [0.95043905, 0.61818531, 0.56668677],
       [0.67735529, 0.95823987, 0.41881862]])

### np.random.randint(10, size=5)
* `randint(low, high=None, size=None, dtype=int)`
* Returns random integers from `low` (inclusive) to `high` (exclusive).
* The integers are drawn from the "discrete uniform" distribution of the specified dtype in the "half-open" interval [`low`, `high`). If `high` is None (the default), then results are from [0, `low`).

In [33]:
random_int_array = np.random.randint(10, size = 5) 
random_int_array

array([7, 1, 2, 4, 1], dtype=int32)

In [34]:
random_int_array = np.random.randint(10, size = (2,3)) 
random_int_array

array([[9, 1, 7],
       [2, 7, 1]], dtype=int32)

### np.random.seed() 
* NumPy generates pseudo-random numbers: they look random, but they are fully determined by a starting value called the seed.
* For consistency, you might want the random numbers you generate to stay the same from one experiment run to the next.
* `np.random.seed()` tells NumPy, "Hey, I want you to create random numbers but keep them aligned with the seed."

In [35]:
np.random.seed(0)
np.random.randint(10, size=(5,3))

array([[5, 0, 3],
       [3, 7, 9],
       [3, 5, 2],
       [4, 7, 6],
       [8, 8, 1]], dtype=int32)

With np.random.seed() set, every time you run the cell above, the same random numbers will be generated.

What if np.random.seed() wasn't set?

Every time you run the cell below, a new set of numbers will appear.

In [36]:
np.random.randint(10, size=(5, 3))

array([[6, 7, 7],
       [8, 1, 5],
       [9, 8, 9],
       [4, 3, 0],
       [3, 5, 0]], dtype=int32)

If you call `np.random.seed(0)` again before generating numbers, you get exactly the same numbers as in the earlier cell that set `np.random.seed(0)`.

Setting np.random.seed() is not 100% necessary but it's helpful to keep numbers the same throughout your experiments.

For example, say you wanted to split your data randomly into training and test sets.

Every time you randomly split, you might get different rows in each set.

If you shared your work with someone else, they'd get different rows in each set too.

Setting `np.random.seed()` keeps the randomness but makes it repeatable, which is why the numbers are called 'pseudo-random'.

In [37]:
np.random.seed(0)
df = pd.DataFrame(np.random.randint(10, size=(5, 3)))
df

   0  1  2
0  5  0  3
1  3  7  9
2  3  5  2
3  4  7  6
4  8  8  1
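
The notebook uses the global `np.random.seed()`. As a side note, NumPy also offers `np.random.default_rng()`, which returns a separate generator object and is what the NumPy documentation recommends for new code. The sketch below is not part of the original notebook, and the seed value `42` is arbitrary. Because it uses a different algorithm, its numbers will not match the seeded outputs shown above.

```python
import numpy as np

# Two generators created with the same seed yield the same sequence
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

# integers(low, high) draws from [low, high), like np.random.randint
sample_a = rng_a.integers(0, 10, size=(5, 3))
sample_b = rng_b.integers(0, 10, size=(5, 3))

print(np.array_equal(sample_a, sample_b))  # True
```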


### np.unique()
* syntax: `np.unique(ar, return_index=False, return_inverse=False, return_counts=False, axis=None)`
* find the unique elements of an array
* returns the sorted unique elements of an array.

In [38]:
a = np.array([[1, 2, 3],
               [4, 2, 6],
               [1, 2, 9]])

unique_values = np.unique(a)
unique_values

array([1, 2, 3, 4, 6, 9])
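
The `return_counts` parameter from the signature above is worth a quick look. A small sketch on the same array, with `values` and `counts` as illustrative names:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 2, 6],
              [1, 2, 9]])

# With return_counts=True, np.unique also reports how often each value occurs
values, counts = np.unique(a, return_counts=True)

print(values)  # [1 2 3 4 6 9]
print(counts)  # [2 3 1 1 1 1]
```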

## 3. Viewing arrays and matrices (indexing)

Remember, because arrays and matrices are both `ndarray`s, they can be viewed in similar ways.
For a 2-dimensional array the shape is `(rows, columns)`. For higher-dimensional arrays the extra dimensions come first, for example `a3` with shape `(2, 3, 3)` holds 2 blocks, each with 3 rows and 3 columns.

Let's check out our 3 arrays again.

In [39]:
a1 = np.array([1,2,3])

a2 = np.array([[1, 2.0, 3.3],
              [4, 5, 6.5]])

a3 = np.array([[[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]],
               [[10, 11, 12],
                [13, 14, 15],
                [16, 17, 18]]])


In [40]:
a1

array([1, 2, 3])

In [41]:
a1[0]

np.int64(1)

In [42]:
a2

array([[1. , 2. , 3.3],
       [4. , 5. , 6.5]])

In [43]:
a2[0]

array([1. , 2. , 3.3])

In [44]:
a2[0][1]

np.float64(2.0)

In [45]:
a2[0,1]

np.float64(2.0)

In [46]:
a2[0, :2]

array([1., 2.])

In [47]:
a3

array([[[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9]],

       [[10, 11, 12],
        [13, 14, 15],
        [16, 17, 18]]])

In [48]:
a3[0]

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [49]:
a3[0][2]

array([7, 8, 9])

In [50]:
a3[0][2][1]

np.int64(8)

In [51]:
a3[0, 2, 1]

np.int64(8)

In [52]:
a3[0, 0, :2]

array([1, 2])

In [53]:
a3[0, 2, :2]

array([7, 8])

In [54]:
a3[0, :2, :2]

array([[1, 2],
       [4, 5]])

In [55]:
a3[:2, :2, :2]

array([[[ 1,  2],
        [ 4,  5]],

       [[10, 11],
        [13, 14]]])

This takes a bit of practice, especially when the dimensions get higher. Usually, it takes me a little trial and error of trying to get certain values, viewing the output in the notebook and trying again.

NumPy prints arrays from the outermost axis to the innermost. The first number in the shape counts the outermost blocks (separated by blank lines in the output), and the last number counts the values in each innermost row.
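
One way to practise is to index every axis explicitly and check the result against the printed array. The sketch below rebuilds an array with the same values as `a3` using `np.arange(...).reshape(...)`, which is a shortcut chosen here rather than the notebook's own construction.

```python
import numpy as np

a3 = np.arange(1, 19).reshape(2, 3, 3)  # same values as a3 above

# One integer per axis selects a single element: block 1, row 0, column 2
print(a3[1, 0, 2])      # 12

# Negative indices count from the end of each axis
print(a3[-1, -1, -1])   # 18

# A colon keeps an axis whole: column 0 of every row, in both blocks
first_columns = a3[:, :, 0]
print(first_columns.shape)  # (2, 3)
print(first_columns)
```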

## 4. Manipulating and comparing arrays
* Arithmetic
    * `+`, `-`, `*`, `/`, `//`, `**`, `%`
    * `np.exp()`
    * `np.log()`
    * [Dot product](https://www.mathsisfun.com/algebra/matrix-multiplying.html) - `np.dot()`
    * Broadcasting
* Reshaping
    * `np.reshape()`
* Transposing
    * `a3.T`
* Aggregation
    * `np.sum()` - faster than Python's built-in `sum()` for NumPy arrays
    * `np.mean()`
    * `np.std()` - measure of how spread out a group of numbers is from the mean (it is just the square root of the variance)
    * `np.var()` - measure of the average degree to which each number differs from the mean (higher variance = wider range of numbers, lower variance = narrower range of numbers)
    * `np.min()`
    * `np.max()`
    * `np.argmin()` - find index of minimum value
    * `np.argmax()` - find index of maximum value
    * These work on all `ndarray`s
        * `a4.min(axis=0)` - you can also pass `axis` to aggregate along a single dimension
* Comparison operators
    * `>`
    * `<`
    * `<=`
    * `>=`
    * `x != 3`
    * `x == 3`
    * `np.sum(x > 3)`

### Arithmetic

In [56]:
a1 = np.array([1, 2, 3])
a1

array([1, 2, 3])

In [57]:
ones = np.ones(3)
ones

array([1., 1., 1.])

In [58]:
a1 + ones

array([2., 3., 4.])

In [59]:
np.add(a1, ones)

array([2., 3., 4.])

In [60]:
a1 - ones

array([0., 1., 2.])

In [61]:
a2 = np.array([3, 4, 9])
a2

array([3, 4, 9])

In [62]:
a1 * a2

array([ 3,  8, 27])

In [63]:
a2 / a1

array([3., 2., 3.])

In [64]:
a2 // a1

array([3, 2, 3])

In [65]:
a2 % a1

array([0, 0, 0])

In [66]:
a1 ** 2

array([1, 4, 9])

In [67]:
np.square(a1)

array([1, 4, 9])

In [68]:
np.exp(a1)

array([ 2.71828183,  7.3890561 , 20.08553692])

In [69]:
np.log(a1)

array([0.        , 0.69314718, 1.09861229])

**Note:** `np.dot()` computes the dot product of two arrays. Specifically:
* If both `a` and `b` are 1-D arrays, it is the inner product of vectors (without complex conjugation).
* If both `a` and `b` are 2-D arrays, it is matrix multiplication, but using `np.matmul()` or `a @ b` is preferred.
* If either `a` or `b` is 0-D (scalar), it is equivalent to `np.multiply()`, and using `numpy.multiply(a, b)` or `a * b` is preferred.

In [70]:
d1 = np.array([1, 2, 3])
d2 = np.array([4, 5, 6])
np.dot(d1, d2)

np.int64(32)

In [71]:
d1 = np.array([[1, 2, 3],
               [4, 5, 6]])
d2 = np.array([[7, 8, 9],
               [10, 11, 12]])
np.dot(d1, d2)

ValueError: shapes (2,3) and (2,3) not aligned: 3 (dim 1) != 2 (dim 0)

In [73]:
d1 * d2

array([[ 7, 16, 27],
       [40, 55, 72]])

In [74]:
d1 = np.array([[1, 4],
               [2, 5],
               [3, 6]])
np.dot(d1, d2)

array([[47, 52, 57],
       [64, 71, 78],
       [81, 90, 99]])

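### Note:

Element-wise arithmetic also works between arrays of different shapes, as long as the shapes are compatible. The cells below show one combination that works and two that raise an error.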

In [75]:
a3 = np.array([[4, 5, 6],
               [7, 8, 9]])
a3

array([[4, 5, 6],
       [7, 8, 9]])

In [76]:
a2 * a3

array([[12, 20, 54],
       [21, 32, 81]])

In [77]:
a2.shape, a3.shape

((3,), (2, 3))

In [78]:
a4 = np.array([[[1, 2, 3],
               [4, 6, 9],
               [7, 8, 12]],
              [[7, 8, 12],
               [10, 12, 15],
               [4, 6, 10]]])
a4

array([[[ 1,  2,  3],
        [ 4,  6,  9],
        [ 7,  8, 12]],

       [[ 7,  8, 12],
        [10, 12, 15],
        [ 4,  6, 10]]])

In [79]:
a3 * a4

ValueError: operands could not be broadcast together with shapes (2,3) (2,3,3) 

In [80]:
a3.shape, a4.shape

((2, 3), (2, 3, 3))

In [81]:
a5 = np.array([[1,2],
              [3, 4],
              [5, 6]])
a5

array([[1, 2],
       [3, 4],
       [5, 6]])

In [82]:
a3 * a5

ValueError: operands could not be broadcast together with shapes (2,3) (3,2) 

In [83]:
a3.shape, a5.shape

((2, 3), (3, 2))

### Broadcasting

- What is broadcasting?
    - Broadcasting is a feature of NumPy which performs an operation across multiple dimensions of data without replicating the data. This saves time and space. For example, if you have a 3x3 array (A) and want to add a 1x3 array (B), NumPy will add the row of (B) to every row of (A).

- Rules of Broadcasting
    1. If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.
    2. If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.
    3. If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
    
    
**The broadcasting rule:**
For two arrays to broadcast, each pair of trailing axes (compared from right to left) must either have the same size, or one of them must have size 1.
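
The rules can be checked without doing any arithmetic. The sketch below uses `np.broadcast_shapes()` (available in NumPy 1.20 and later) on the shapes from the cells above, plus the 3x3 and 1x3 example from the description. The arrays `A` and `B` are made up for illustration.

```python
import numpy as np

A = np.arange(9).reshape(3, 3)   # shape (3, 3)
B = np.array([[10, 20, 30]])     # shape (1, 3)

# B's single row is stretched across all three rows of A
print(A + B)

# np.broadcast_shapes reports the result shape without touching any data
print(np.broadcast_shapes((3, 3), (1, 3)))        # (3, 3)
print(np.broadcast_shapes((2, 3, 1), (2, 3, 3)))  # (2, 3, 3)

# Sizes that disagree and are not 1 raise a ValueError (rule 3)
try:
    np.broadcast_shapes((2, 3), (3, 2))
except ValueError as err:
    print("cannot broadcast:", err)
```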

### Reshaping

In [84]:
a3

array([[4, 5, 6],
       [7, 8, 9]])

In [85]:
a4

array([[[ 1,  2,  3],
        [ 4,  6,  9],
        [ 7,  8, 12]],

       [[ 7,  8, 12],
        [10, 12, 15],
        [ 4,  6, 10]]])

In [86]:
a3.shape, a4.shape

((2, 3), (2, 3, 3))

In [87]:
a3.reshape(2,3,1)

array([[[4],
        [5],
        [6]],

       [[7],
        [8],
        [9]]])

In [88]:
a3.reshape(2,3,1) * a4

array([[[  4,   8,  12],
        [ 20,  30,  45],
        [ 42,  48,  72]],

       [[ 49,  56,  84],
        [ 80,  96, 120],
        [ 36,  54,  90]]])

### Transpose

A transpose reverses the order of the axes.

For example, an array with shape `(2, 3)` becomes `(3, 2)`.

In [89]:
a3.shape, a5.shape

((2, 3), (3, 2))

In [90]:
a3.T

array([[4, 7],
       [5, 8],
       [6, 9]])

In [91]:
a3.T.shape

(3, 2)

In [92]:
a3.T * a5

array([[ 4, 14],
       [15, 32],
       [30, 54]])

For arrays with more than two dimensions, `.T` reverses the order of all the axes, so the first and last axes swap places.

For example, `(5, 3, 2)` -> `(2, 3, 5)`. 

In [93]:
matrix = np.random.randint(10, size=(5, 3, 2))
matrix, matrix.shape

(array([[[6, 7],
         [7, 8],
         [1, 5]],
 
        [[9, 8],
         [9, 4],
         [3, 0]],
 
        [[3, 5],
         [0, 2],
         [3, 8]],
 
        [[1, 3],
         [3, 3],
         [7, 0]],
 
        [[1, 9],
         [9, 0],
         [4, 7]]], dtype=int32),
 (5, 3, 2))

In [94]:
matrix.T, matrix.T.shape

(array([[[6, 9, 3, 1, 1],
         [7, 9, 0, 3, 9],
         [1, 3, 3, 7, 4]],
 
        [[7, 8, 5, 3, 9],
         [8, 4, 2, 3, 0],
         [5, 0, 8, 0, 7]]], dtype=int32),
 (2, 3, 5))
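
If reversing every axis is not what you want, `np.transpose()` accepts an explicit axis order and `np.swapaxes()` exchanges just two axes. A small sketch using a zero-filled stand-in with the same shape as `matrix`:

```python
import numpy as np

matrix = np.zeros((5, 3, 2))  # stand-in with the same shape as above

# .T reverses all axes
print(matrix.T.shape)                          # (2, 3, 5)

# Explicit order: keep axis 0, swap the last two
print(np.transpose(matrix, (0, 2, 1)).shape)   # (5, 2, 3)

# Exchange only axes 0 and 1
print(np.swapaxes(matrix, 0, 1).shape)         # (3, 5, 2)
```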

### Aggregation

Aggregation means bringing things together: applying the same operation across many values to get a summary, such as a sum, mean or maximum.

In [95]:
list_1 = [1, 2, 3]
type(list_1)

list

In [96]:
sum(list_1)

6

In [97]:
a1 = np.array([1, 2, 3])
type(a1)

numpy.ndarray

In [98]:
sum(a1)

np.int64(6)

In [99]:
np.sum(a1)

np.int64(6)

**Tip:** Use NumPy's functions (`np.sum()`) on NumPy arrays and Python's built-in functions (`sum()`) on Python datatypes (such as `list`s).

In [100]:
massive_array = np.random.random(100000)
massive_array.size

100000

In [101]:
massive_array[:10]

array([0.26455561, 0.77423369, 0.45615033, 0.56843395, 0.0187898 ,
       0.6176355 , 0.61209572, 0.616934  , 0.94374808, 0.6818203 ])

Note: `%timeit` times how long a particular line of code takes to run.

In [102]:
%timeit sum(massive_array)
%timeit np.sum(massive_array)

14.7 ms ± 2.97 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
35.4 μs ± 3.64 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


Notice `np.sum()` is much faster on the NumPy array (`numpy.ndarray`) than Python's `sum()`.

In [103]:
a2 = np.array([[1, 2.0, 3.3],
               [4, 5, 6.5]])
a2

array([[1. , 2. , 3.3],
       [4. , 5. , 6.5]])

In [104]:
np.max(a2)

np.float64(6.5)

In [105]:
np.argmax(a2)

np.int64(5)

In [106]:
np.min(a2)

np.float64(1.0)

In [107]:
np.argmin(a2)

np.int64(0)
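
The calls above aggregate over the whole array. Passing `axis` aggregates along one dimension instead. A minimal sketch on the same `a2`:

```python
import numpy as np

a2 = np.array([[1, 2.0, 3.3],
               [4, 5, 6.5]])

# axis=0 collapses the rows, giving one result per column
column_max = a2.max(axis=0)   # values 4.0, 5.0, 6.5

# axis=1 collapses the columns, giving one result per row
row_max = a2.max(axis=1)      # values 3.3, 6.5

# argmax along an axis returns positions within that axis
row_argmax = a2.argmax(axis=1)

print(column_max, row_max, row_argmax)
```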

**What's mean?**

Mean is the same as average. You can find the average of a set of numbers by adding them up and dividing them by how many there are.

In [108]:
a2.mean()

np.float64(3.6333333333333333)

In [109]:
high_var_array = np.array([1, 100, 200, 300, 4000, 5000])
low_var_array = np.array([2, 4, 6, 8, 10])

In [110]:
np.mean(high_var_array), np.mean(low_var_array)

(np.float64(1600.1666666666667), np.float64(6.0))

**What's standard deviation?**

[Standard deviation](https://www.mathsisfun.com/data/standard-deviation.html) is a measure of how spread out numbers are.

In [111]:
np.std(a2)

np.float64(1.8226964152656422)

In [112]:
high_var_array = np.array([1, 100, 200, 300, 4000, 5000])
low_var_array = np.array([2, 4, 6, 8, 10])

In [113]:
np.std(high_var_array), np.std(low_var_array)

(np.float64(2072.711623024829), np.float64(2.8284271247461903))

**What's variance?**

The [variance](https://www.mathsisfun.com/data/standard-deviation.html) is the average of the squared differences from the mean.

To work it out, you:
1. Work out the mean
2. For each number, subtract the mean and square the result
3. Find the average of the squared differences
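
The three steps can be written out directly and compared with NumPy's result. This is a sketch of the population variance (dividing by the number of values), which is what `np.var()` computes by default. The intermediate names are illustrative.

```python
import numpy as np

a2 = np.array([[1, 2.0, 3.3],
               [4, 5, 6.5]])

mean = a2.mean()                    # step 1: work out the mean
squared_diffs = (a2 - mean) ** 2    # step 2: subtract the mean, square the result
variance = squared_diffs.mean()     # step 3: average the squared differences

# Standard deviation is the square root of the variance
std = np.sqrt(variance)

print(np.isclose(variance, np.var(a2)))  # True
print(np.isclose(std, np.std(a2)))       # True
```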

In [114]:
np.var(a2)

np.float64(3.3222222222222224)

In [115]:
high_var_array = np.array([1, 100, 200, 300, 4000, 5000])
low_var_array = np.array([2, 4, 6, 8, 10])

In [116]:
np.var(high_var_array), np.var(low_var_array)

(np.float64(4296133.472222221), np.float64(8.0))

In [117]:
np.sqrt(np.var(a2)) # = np.std(a2)

np.float64(1.8226964152656422)

**What's sqrt?**

Return the non-negative square-root of an array, element-wise.

In [118]:
a = np.array([16, 4, 9])
np.sqrt(a)

array([4., 2., 3.])

### Dot product

The two main rules to remember for the dot product are:

1. The **inner dimensions** must match:
  * `(3, 2) @ (3, 2)` won't work
  * `(2, 3) @ (3, 2)` will work
  * `(3, 2) @ (2, 3)` will work
  
2. The resulting matrix has the shape of the **outer dimensions**:
 * `(2, 3) @ (3, 2)` -> `(2, 2)`
 * `(3, 2) @ (2, 3)` -> `(3, 3)`
 
**Note:** In NumPy, `np.dot()` and `@` can be used to achieve the same result for 1-D and 2-D arrays. However, their behaviour begins to differ for arrays with 3 or more dimensions.
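
Both rules, and the 3-D difference mentioned in the note, can be seen from shapes alone. The sketch below uses arrays of ones purely as placeholders; only the shapes matter here.

```python
import numpy as np

a = np.ones((2, 3))
b = np.ones((3, 2))

# Inner dimensions match, result takes the outer dimensions
print((a @ b).shape)   # (2, 2)
print((b @ a).shape)   # (3, 3)

# With 3-D inputs the two functions disagree
x = np.ones((2, 3, 4))
y = np.ones((2, 4, 5))

# @ treats the leading axis as a batch of matrices
print((x @ y).shape)        # (2, 3, 5)

# np.dot pairs the last axis of x with the second-to-last axis of y
print(np.dot(x, y).shape)   # (2, 3, 2, 5)
```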

In [119]:
np.random.seed(0)
mat1 = np.random.randint(10, size=(3,3))
mat2 = np.random.randint(10, size=(3,2))

In [120]:
mat1, mat1.shape

(array([[5, 0, 3],
        [3, 7, 9],
        [3, 5, 2]], dtype=int32),
 (3, 3))

In [121]:
mat2, mat2.shape

(array([[4, 7],
        [6, 8],
        [8, 1]], dtype=int32),
 (3, 2))

In [122]:
np.dot(mat1, mat2)

array([[ 44,  38],
       [126,  86],
       [ 58,  63]], dtype=int32)

In [123]:
# Can also achieve np.dot() with "@" 
# (however, they may behave differently at 3D+ arrays)
mat1 @ mat2

array([[ 44,  38],
       [126,  86],
       [ 58,  63]], dtype=int32)

In [124]:
np.random.seed(0)
mat3 = np.random.randint(10, size=(4,3))
mat4 = np.random.randint(10, size=(4,3))

In [125]:
mat3, mat3.shape

(array([[5, 0, 3],
        [3, 7, 9],
        [3, 5, 2],
        [4, 7, 6]], dtype=int32),
 (4, 3))

In [126]:
mat4, mat4.shape

(array([[8, 8, 1],
        [6, 7, 7],
        [8, 1, 5],
        [9, 8, 9]], dtype=int32),
 (4, 3))

In [127]:
np.dot(mat3, mat4)

ValueError: shapes (4,3) and (4,3) not aligned: 3 (dim 1) != 4 (dim 0)

In [128]:
mat3.T, mat3.T.shape

(array([[5, 3, 3, 4],
        [0, 7, 5, 7],
        [3, 9, 2, 6]], dtype=int32),
 (3, 4))

In [129]:
np.dot(mat3.T, mat4)

array([[118,  96,  77],
       [145, 110, 137],
       [148, 137, 130]], dtype=int32)

### Element wise / Hadamard product
The two main rules to remember for the Hadamard product are:
1. The **dimensions** must match exactly (or be compatible under the broadcasting rules above):
  * `(3, 2) * (3, 2)` will work
  * `(2, 3) * (2, 3)` will work
  * `(3, 2) * (2, 3)` won't work
  
2. The resulting matrix has the shape of the **exact dimensions**:
 * `(3, 2) * (3, 2)` -> `(3, 2)`
 * `(2, 3) * (2, 3)` -> `(2, 3)`

In [130]:
mat3, mat3.shape

(array([[5, 0, 3],
        [3, 7, 9],
        [3, 5, 2],
        [4, 7, 6]], dtype=int32),
 (4, 3))

In [131]:
mat4, mat4.shape

(array([[8, 8, 1],
        [6, 7, 7],
        [8, 1, 5],
        [9, 8, 9]], dtype=int32),
 (4, 3))

In [132]:
mat3 * mat4

array([[40,  0,  3],
       [18, 49, 63],
       [24,  5, 10],
       [36, 56, 54]], dtype=int32)

### Comparison operators 

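Comparison operators check, element by element, whether the values in one array are larger than, smaller than or equal to the values in another. The result is an array of booleans.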

In [133]:
a1 = np.array([1, 2, 3])
a1

array([1, 2, 3])

In [134]:
a2 = np.array([[1, 2.0, 3.3],
               [4, 5, 6.5]])
a2

array([[1. , 2. , 3.3],
       [4. , 5. , 6.5]])

In [135]:
a1>a2

array([[False, False, False],
       [False, False, False]])

In [136]:
a1>=a2

array([[ True,  True, False],
       [False, False, False]])

In [137]:
a1<a2

array([[False, False,  True],
       [ True,  True,  True]])

In [138]:
a1<=a2

array([[ True,  True,  True],
       [ True,  True,  True]])

In [139]:
a1 == a2

array([[ True,  True, False],
       [False, False, False]])

In [140]:
a1 != a2

array([[False, False,  True],
       [ True,  True,  True]])
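
Boolean arrays like these are useful for counting and filtering. The sketch below shows `np.sum(x > 3)` from the overview list, along with boolean indexing. The array `x` is made up for illustration.

```python
import numpy as np

x = np.array([1, 5, 2, 8, 3, 7])

mask = x > 3
print(mask)          # [False  True False  True False  True]

# True counts as 1, so summing the mask counts the matches
print(np.sum(mask))  # 3

# Indexing with the mask keeps only the matching values
print(x[mask])       # [5 8 7]
```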

## 5. Sorting arrays

In [142]:
np.random.seed(0)
random_array = np.random.randint(10, size = (3,5))
random_array

array([[5, 0, 3, 3, 7],
       [9, 3, 5, 2, 4],
       [7, 6, 8, 8, 1]], dtype=int32)

### [`np.sort()`](https://numpy.org/doc/stable/reference/generated/numpy.sort.html) 
Sorts values along a specified axis of an array (the last axis by default).


In [143]:
np.sort(random_array)

array([[0, 3, 3, 5, 7],
       [2, 3, 4, 5, 9],
       [1, 6, 7, 8, 8]], dtype=int32)

### [`np.argsort()`](https://numpy.org/doc/stable/reference/generated/numpy.argsort.html) 
Returns the indices that would sort the array along a given axis.

In [144]:
np.argsort(random_array)

array([[1, 2, 3, 0, 4],
       [3, 1, 4, 2, 0],
       [4, 1, 0, 2, 3]])
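
To see what those indices mean, you can apply them back to the array. The sketch below uses `np.take_along_axis()` to do so and confirms the result matches `np.sort()`. The array is typed out by hand so the example does not depend on the random seed.

```python
import numpy as np

random_array = np.array([[5, 0, 3, 3, 7],
                         [9, 3, 5, 2, 4],
                         [7, 6, 8, 8, 1]])

# Indices that would sort each row
order = np.argsort(random_array, axis=1)

# Picking values in that order reproduces the sorted rows
sorted_rows = np.take_along_axis(random_array, order, axis=1)

print(sorted_rows)
print(np.array_equal(sorted_rows, np.sort(random_array, axis=1)))  # True
```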

### [`np.argmax()`](https://numpy.org/doc/stable/reference/generated/numpy.argmax.html) 
Returns the index/indices of the highest value(s) along an axis. Without `axis`, the index refers to the flattened array.

In [145]:
np.argmax(random_array)

np.int64(5)

In [149]:
np.argmax(random_array, axis = 0)

array([1, 2, 2, 2, 0])

In [150]:
np.argmax(random_array, axis=1)

array([4, 0, 2])

### [`np.argmin()`](https://numpy.org/doc/stable/reference/generated/numpy.argmin.html) 
Returns the index/indices of the lowest value(s) along an axis. Without `axis`, the index refers to the flattened array.

In [146]:
np.argmin(random_array)

np.int64(1)

In [151]:
np.argmin(random_array, axis = 0)

array([0, 0, 0, 1, 2])

In [152]:
np.argmin(random_array, axis = 1)

array([1, 3, 4])
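
When `axis` is omitted, the returned index counts through the flattened array. If you need row and column coordinates instead, `np.unravel_index()` can convert it. A sketch with the same values typed out by hand:

```python
import numpy as np

random_array = np.array([[5, 0, 3, 3, 7],
                         [9, 3, 5, 2, 4],
                         [7, 6, 8, 8, 1]])

flat_index = np.argmax(random_array)   # position in the flattened array

# Convert the flat position into (row, column) for this shape
row, col = np.unravel_index(flat_index, random_array.shape)

print(flat_index)               # 5
print(row, col)                 # 1 0
print(random_array[row, col])   # 9
```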

## 6. Use case

Turning an image into a NumPy array.

Why?

Because computers can use the numbers in the NumPy array to find patterns in the image and in turn use those patterns to figure out what's in the image.

This is what happens in modern computer vision algorithms.

Let's start with this beautiful image of a panda:

<img src="../images/panda.jpg" alt="photo of a panda waving" width=450/>

In [163]:
# turn image into NumPy array
from matplotlib.image import imread

panda = imread('images/panda.jpg')
print(type(panda))

<class 'numpy.ndarray'>


In [166]:
panda.shape, panda.size

((148, 148, 3), 65712)
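
The shape reads as height, width and colour channels. The sketch below uses a randomly generated stand-in with the same shape rather than the actual panda file, and assumes 8-bit values with the channels in red, green, blue order, which is typical for a JPEG loaded this way but is not stated in the notebook.

```python
import numpy as np

# Stand-in for the panda image: random pixels with the same shape
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(148, 148, 3), dtype=np.uint8)

# Size is height * width * channels
print(fake_image.size)        # 65712

# Slice out one colour channel (assumed to be red)
red = fake_image[:, :, 0]
print(red.shape)              # (148, 148)

# Averaging over the channel axis gives a simple grayscale version
grayscale = fake_image.mean(axis=2)
print(grayscale.shape)        # (148, 148)
```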

In [165]:
panda

array([[[ 66,  74,  53],
        [ 74,  84,  60],
        [ 76,  86,  59],
        ...,
        [ 85, 154,  74],
        [ 77, 145,  68],
        [ 76, 144,  67]],

       [[ 53,  60,  42],
        [ 56,  64,  43],
        [ 53,  63,  38],
        ...,
        [ 84, 152,  79],
        [ 76, 144,  71],
        [ 74, 142,  69]],

       [[ 55,  59,  44],
        [ 50,  54,  37],
        [ 42,  50,  27],
        ...,
        [ 97, 162,  98],
        [ 90, 155,  91],
        [ 85, 150,  86]],

       ...,

       [[ 69, 112,  58],
        [ 75, 115,  62],
        [ 75, 112,  60],
        ...,
        [ 39,  84,  45],
        [ 36,  81,  42],
        [ 33,  77,  41]],

       [[ 74, 117,  63],
        [ 82, 122,  69],
        [ 84, 121,  69],
        ...,
        [ 40,  82,  44],
        [ 41,  83,  45],
        [ 41,  83,  47]],

       [[ 82, 125,  71],
        [ 90, 130,  77],
        [ 95, 132,  78],
        ...,
        [ 40,  80,  43],
        [ 45,  85,  48],
        [ 48,  88,  53]]