
## Code Dependencies

In [1]:
import numpy as np

from IPython.display import Image

## Matrix Multiplication is all you need

Let's start with a quote from the famous/infamous [Mark Saroufim](https://marksaroufim.substack.com/) in his post [Machine Learning: The Great Stagnation](https://towardsdatascience.com/machine-learning-the-great-stagnation-3a0f044e17e0):

> I often get asked by young students new to Machine Learning, what math do I need to know for Deep Learning and my answer is Matrix Multiplication and Derivatives of square functions. 
>
> LSTMs a bunch of matrix multiplications, Transformers a whole bunch of matrix multiplications, CNNs use convolutions which are a generalization of matrix multiplication.
> 
> Deep Neural Networks are a composition of matrix multiplications with the occasional non-linearity in between.

![](https://drive.google.com/uc?id=1CMOpPOEPSqHL3N_CCjD4Ms94xxQJdEeP)

 

## Dot Product and Vector Addition

When multiplying **vectors**, you either perform a **dot product** or a **cross product**. 

A **cross product** results in a **vector** while a **dot product** results in a **scalar** (a single value/number).

### Cross Product

The cross product is defined only for two 3-element vectors, and the result is another 3-element vector. 

That's as far as we're going to go with the cross product; it is very rarely used in machine learning (I have never seen it used).

### Dot Product


Dot product of two vectors:

![](https://drive.google.com/uc?id=1QrMzksXFeQvkCO5ksHXm7N6kwoj1xcIf)

In [2]:
a = [1, 2, 3]
b = [2, 3, 4]

In [3]:
dot_product = (a[0] * b[0]) + (a[1] * b[1]) + (a[2] * b[2])

dot_product

20
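
The same sum of element-wise products works for vectors of any length. A minimal sketch in plain Python, where `dot` is a hypothetical helper name chosen for illustration and not something defined elsewhere in this notebook:

```python
def dot(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))

a = [1, 2, 3]
b = [2, 3, 4]

result = dot(a, b)  # 1*2 + 2*3 + 3*4
print(result)
```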


### Dot Product using Numpy

![](https://drive.google.com/uc?id=1g5r9kZpdgzbcInSwi88VEONi-O0wIjnl)

In [4]:
inputs  = [1.0, 2.0,  3.0]
weights = [0.2, 0.8, -0.5]
bias = 2.0

In [5]:
dot_product = np.dot(weights, inputs)
dot_product

0.30000000000000004

In [6]:
output = dot_product + bias
output

2.3
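
The two steps above (dot product, then add the bias) are the whole computation of a single neuron. As a rough sketch, they can be wrapped in one function; `neuron_output` is a hypothetical name used only for this example. The `0.30000000000000004` seen earlier is ordinary floating-point rounding, so the comparison below uses `np.isclose` rather than `==`.

```python
import numpy as np

def neuron_output(inputs, weights, bias):
    # Weighted sum of the inputs, shifted by the bias.
    return np.dot(weights, inputs) + bias

output = neuron_output(
    inputs=[1.0, 2.0, 3.0],
    weights=[0.2, 0.8, -0.5],
    bias=2.0,
)

# Compare floats with a tolerance instead of exact equality.
print(np.isclose(output, 2.3))
```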

### Vector Addition

The **addition** of the two **vectors**: 
- is performed **element-wise**
- both vectors have to be of the same size
- the result is a vector of this size as well

![](https://drive.google.com/uc?id=1eZ-KV1T62fJci_7een4iqcBliuNOlsmG)

In [7]:
a = np.array([1, 2, 3])
b = np.array([2, 3, 4])

In [8]:
added_vectors = a + b

added_vectors

array([3, 5, 7])
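
One thing to watch for: `+` only adds element-wise on NumPy arrays. On plain Python lists it concatenates instead. A short sketch of the difference (the variable names here are illustrative only):

```python
import numpy as np

a_list = [1, 2, 3]
b_list = [2, 3, 4]

# On plain lists, + joins the two lists end to end.
concatenated = a_list + b_list

# Element-wise addition in plain Python needs an explicit loop.
elementwise = [x + y for x, y in zip(a_list, b_list)]

# Converting to arrays makes + element-wise.
with_numpy = np.array(a_list) + np.array(b_list)

print(concatenated)
print(elementwise)
print(with_numpy)
```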

## A Layer of Neurons with NumPy

Let's calculate the output of a layer of 3 neurons, which means the weights will be a matrix, that is, a list of weight vectors.

NumPy makes this very easy for us: it treats this **matrix** as a list of vectors and performs the dot product of each one, **one by one**, with the vector of inputs, returning an array of dot products.

In [9]:
Image(url='https://drive.google.com/uc?id=1kEKnaz_iPO5jfhijnrYSCbFsH0oBAR_-')

![](https://drive.google.com/uc?id=1t8aEIpA1liIbOJd61fp1uwFOWZW06Kq8)

In [10]:
inputs = [1.0, 2.0, 3.0, 2.5]

In [11]:
weights = [
    [0.2, 0.8, -0.5, 1.0],
    [0.5, -0.91, 0.26, -0.5],
    [-0.26, -0.27, 0.17, 0.87]
]


In [12]:
biases = [2.0, 3.0, 0.5]

In [13]:
dot_product = np.dot(weights, inputs)
dot_product

array([ 2.8  , -1.79 ,  1.885])

In [14]:
output = dot_product + biases
output 

array([4.8  , 1.21 , 2.385])
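
To make the "one by one" behaviour concrete, here is a sketch that computes each neuron separately in a loop and checks that it agrees with the single `np.dot` call. The loop is for illustration only; the vectorised call is the one to use in practice.

```python
import numpy as np

inputs = [1.0, 2.0, 3.0, 2.5]
weights = [
    [0.2, 0.8, -0.5, 1.0],
    [0.5, -0.91, 0.26, -0.5],
    [-0.26, -0.27, 0.17, 0.87],
]
biases = [2.0, 3.0, 0.5]

# One dot product per neuron: each weight vector against the same inputs.
layer_outputs = []
for neuron_weights, neuron_bias in zip(weights, biases):
    layer_outputs.append(np.dot(neuron_weights, inputs) + neuron_bias)

# The vectorised form computes all three neurons in one call.
vectorised = np.dot(weights, inputs) + biases

print(np.allclose(layer_outputs, vectorised))
```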

## A Batch of Data

During training, neural networks usually receive data in **batches**.

In [15]:
inputs = [
    [1.0, 2.0, 3.0, 2.5],
    [2.0, 5.0, -1.0, 2.0],
    [-1.5, 2.7, 3.3, -0.8]
]

weights = [
    [0.2, 0.8, -0.5, 1.0],
    [0.5, -0.91, 0.26, -0.5],
    [-0.26, -0.27, 0.17, 0.87]
]

biases = [2.0, 3.0, 0.5]

We now have a **matrix** of inputs and a **matrix** of weights, and we need to perform the **dot product** on them somehow, but how?

In this example, we treat both **matrices** as lists of vectors and take the dot product of every vector in one with every vector in the other. The result is a list of lists of outputs, that is, a matrix; this operation is called the **matrix product**.

### Matrix Product

The **matrix product**:
- is an operation on **2 matrices**
- performs **dot products** of all combinations of **rows of the 1st matrix** and **columns of the 2nd matrix**
- results in a matrix of those atomic dot products
- requires the size of the 2nd dimension of the left matrix to match the size of the 1st dimension of the right matrix

In [16]:
Image(url='https://drive.google.com/uc?id=1IXQkoZKctjqavIKl7S_CiJkpVE_eD5m-')
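
The row-by-column rule can be written out with two nested loops. This is a minimal sketch for illustration: `matrix_product` is a hypothetical helper, and the small `left` and `right` matrices are made-up values, not data from this notebook.

```python
import numpy as np

def matrix_product(left, right):
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    # 2nd dimension of the left matrix must match 1st dimension of the right.
    if left.shape[1] != right.shape[0]:
        raise ValueError("inner dimensions must match")
    result = np.zeros((left.shape[0], right.shape[1]))
    for i in range(left.shape[0]):        # each row of the left matrix
        for j in range(right.shape[1]):   # each column of the right matrix
            result[i, j] = np.dot(left[i, :], right[:, j])
    return result

left = [[1, 2, 3], [4, 5, 6]]      # shape (2, 3)
right = [[1, 0], [0, 1], [2, 2]]   # shape (3, 2)

product = matrix_product(left, right)  # shape (2, 2)
print(product)
```

The result has as many rows as the left matrix and as many columns as the right one.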

### Transposition for the Matrix Product

**Transposition** simply modifies a matrix in a way that its rows become columns and columns become rows.

![](https://drive.google.com/uc?id=1g__BDqNR2iw2IGrW5hoRZTXtKiIcYofj)
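
A short sketch of why transposition matters here, using the batch of inputs and the weights from above. Both are `(3, 4)`, so their inner dimensions (4 and 3) do not line up until the weights are transposed to `(4, 3)`.

```python
import numpy as np

inputs = np.array([
    [1.0, 2.0, 3.0, 2.5],
    [2.0, 5.0, -1.0, 2.0],
    [-1.5, 2.7, 3.3, -0.8],
])
weights = np.array([
    [0.2, 0.8, -0.5, 1.0],
    [0.5, -0.91, 0.26, -0.5],
    [-0.26, -0.27, 0.17, 0.87],
])

print(weights.shape)    # rows are neurons, columns are inputs
print(weights.T.shape)  # rows and columns swapped

try:
    np.dot(inputs, weights)  # (3, 4) x (3, 4): inner sizes 4 and 3 differ
    shapes_compatible = True
except ValueError:
    shapes_compatible = False

# (3, 4) x (4, 3) works and gives one row per sample.
result = np.dot(inputs, weights.T)
print(shapes_compatible, result.shape)
```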

### Behind the scenes

#### row-vector-matrix

A **row-vector-matrix** is a matrix whose first dimension's size equals 1 and whose second dimension's size equals `n`, the vector size.

In [17]:
# Note the double brackets:
a = np.array(
    [
        [1, 2, 3]
    ]
)

a

array([[1, 2, 3]])

In [18]:
a.shape

(1, 3)

In [19]:
# Or, alternatively:
a = np.array(
    [1, 2, 3]
)

np.expand_dims(a, axis=0)

array([[1, 2, 3]])

#### column-vector-matrix

A **column-vector-matrix** is a matrix where the 2nd dimension’s size equals 1, in other words, it’s an array of shape `(n, 1)`

In [20]:
a = np.array([[1, 2, 3]]).T

a

array([[1],
       [2],
       [3]])

In [21]:
a.shape

(3, 1)
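
`reshape` is another way to get either form from a plain 1-D array. Note that `.T` only has an effect once the array is 2-D; on a 1-D array it leaves the shape unchanged. A small sketch (variable names are illustrative):

```python
import numpy as np

a = np.array([1, 2, 3])

row = a.reshape(1, -1)   # row-vector-matrix; -1 lets NumPy infer the size
col = a.reshape(-1, 1)   # column-vector-matrix

print(a.T.shape)   # transposing a 1-D array does nothing
print(row.shape)
print(col.shape)
```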

#### Dot product of row-vector-matrix and column-vector-matrix

In [22]:
a = np.array([[1, 2, 3]])
b = np.array([[2, 3, 4]]).T

np.dot(a, b)

array([[20]])

> We have achieved the same result as the dot product of two vectors, but performed on matrices and returning a matrix.

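It's worth mentioning that `np.dot()` performs both the dot product and the matrix product: given two vectors it returns a scalar, and given two matrices it returns their matrix product. NumPy also provides `np.matmul()` and the `@` operator for matrix products, but this notebook uses `np.dot()` throughout.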
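
For 2-D arrays such as the ones in this notebook, `np.dot()` gives the same result as `np.matmul()` and the `@` operator, which NumPy also provides. A quick check on the row and column vectors from above:

```python
import numpy as np

a = np.array([[1, 2, 3]])      # shape (1, 3)
b = np.array([[2, 3, 4]]).T    # shape (3, 1)

via_dot = np.dot(a, b)
via_matmul = np.matmul(a, b)
via_operator = a @ b

print(via_dot, via_matmul, via_operator)
```

The three functions do differ for arrays with more than two dimensions, which this notebook does not use.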

## Final Code - Single Layer with Input Batch

![](https://drive.google.com/uc?id=1uRg9EgXNh1hhUvMk6IXZaoVxkSL8C7p4)

In [23]:
Image(url='https://drive.google.com/uc?id=1FbELp4cIr94IcgrdQVqX9s6CZ4BmvkPd')

In [24]:
Image(url='https://drive.google.com/uc?id=1U6Lr6JmtFZfVtDIDxSNWOV6rjToRtTjq')

In [25]:
inputs = [
    [1.0, 2.0, 3.0, 2.5],
    [2.0, 5.0, -1.0, 2.0],
    [-1.5, 2.7, 3.3, -0.8]
]

weights = [
    [0.2, 0.8, -0.5, 1.0],
    [0.5, -0.91, 0.26, -0.5],
    [-0.26, -0.27, 0.17, 0.87]
]

biases = [2.0, 3.0, 0.5]

In [26]:
dot_product = np.dot(
    inputs, 
    np.array(weights).T
)

dot_product

array([[ 2.8  , -1.79 ,  1.885],
       [ 6.9  , -4.81 , -0.3  ],
       [-0.59 , -1.949, -0.474]])

In [27]:
layer_outputs = dot_product + biases

layer_outputs

array([[ 4.8  ,  1.21 ,  2.385],
       [ 8.9  , -1.81 ,  0.2  ],
       [ 1.41 ,  1.051,  0.026]])
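
Adding a length-3 `biases` vector to a `(3, 3)` matrix works because NumPy broadcasts the vector across every row, so each sample gets the same per-neuron biases. A sketch that spells the broadcast out with `np.tile` and confirms both forms agree (the values are copied from the dot product output above):

```python
import numpy as np

dot_product = np.array([
    [2.8, -1.79, 1.885],
    [6.9, -4.81, -0.3],
    [-0.59, -1.949, -0.474],
])
biases = np.array([2.0, 3.0, 0.5])

# Explicit version: repeat the bias row once per sample, then add.
stacked_biases = np.tile(biases, (dot_product.shape[0], 1))
explicit = dot_product + stacked_biases

# Broadcast version: NumPy does the repetition implicitly.
broadcast = dot_product + biases

print(np.array_equal(explicit, broadcast))
```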

> **Important Note:** 
> 
> The 2nd argument to `np.dot()` is now our transposed weights. Previously it was the inputs.
> 
> As we'll soon learn, it's more useful to have a result consisting of a list of layer **outputs for each sample** than **outputs for each neuron**.
>
> We want the resulting array to be **sample-related** and not **neuron-related**, because as we pass those samples further through the network, the next layer(s) will expect a batch of inputs.
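
To see what the note above means in practice, the sketch below computes the product in both orders. Swapping the arguments gives the transposed (neuron-related) result, and adding `biases` to it then silently pairs each bias with a sample instead of a neuron. The names `sample_major` and `neuron_major` are illustrative only.

```python
import numpy as np

inputs = np.array([
    [1.0, 2.0, 3.0, 2.5],
    [2.0, 5.0, -1.0, 2.0],
    [-1.5, 2.7, 3.3, -0.8],
])
weights = np.array([
    [0.2, 0.8, -0.5, 1.0],
    [0.5, -0.91, 0.26, -0.5],
    [-0.26, -0.27, 0.17, 0.87],
])
biases = np.array([2.0, 3.0, 0.5])

# Rows are samples, columns are neurons (the order used in this notebook).
sample_major = np.dot(inputs, weights.T)

# Rows are neurons, columns are samples.
neuron_major = np.dot(weights, inputs.T)

# Same numbers, transposed layout.
same_values = np.allclose(sample_major, neuron_major.T)

# Broadcasting adds biases along the last axis, which is only the
# neuron axis in the sample-major layout.
correct = sample_major + biases
mismatched = neuron_major + biases
bias_misapplied = not np.allclose(correct, mismatched.T)

print(same_values, bias_misapplied)
```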

