# Week 1: Matrices & Vectors 

# 1. Linear Algebra

## 1.1. What is linear algebra?

### 1.1.1. High level idea

"<b>Algebra</b>" means "relationships" and "<b>linear</b>" means "line-like", therefore linear algebra is all about <b>line-like relationships</b>.  

### 1.1.2. What is the difference between 'linear' and 'non-linear' algebra?

Linear relationships are <b>straight</b> whereas non-linear relationships are <b>not straight</b>:

<img src="../Images\linearNonlinear.png" width=60%>

Linear relationships are <b>predictable</b> whereas non-linear relationships are <b>not predictable</b>:

1. A pyramidal roof is linear: move forward 3 horizontal feet (relative to the ground) and you might rise 1 foot in elevation (i.e. the slope! Rise/run = 1/3). Move forward 6 feet, and you’d expect a rise of 2 feet. 


2. A dome is non-linear: each horizontal foot forward raises you a different amount.


3. The benefit of (1) is predictability.  For instance, if we measure the relationship between feet forward ($x$) and rise upward ($y$):

 a. If 3 feet forward has a 1-foot rise, then going 10x as far forward ($10\cdot x$) should give a 10x higher rise ($10\cdot y$), i.e. 30 feet forward is a 10-foot rise.
 
 b. If 3 feet forward has a 1-foot rise, and 6 feet forward ($2\cdot x$) has a 2-foot rise ($2\cdot y$), then (3 + 6) feet should have a (1 + 2) foot rise.
 
 c. In math terms, an operation $F$ is linear if <b>scaling inputs scales the output, and adding inputs adds the outputs</b>:

\begin{align*}
F(ax) &= a \cdot F(x) \\
F(x + y) &= F(x) + F(y)
\end{align*}

4. In our example, $F(x) = y$ calculates the rise when moving forward $x$ feet, and the properties hold:

\begin{align*}
F(10\cdot 3) &= 10\cdot F(3) = 10
\end{align*}


\begin{align*}
F(3 + 6) &= F(3) + F(6) = 3
\end{align*}
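
The two properties can be checked numerically. The sketch below assumes the roof from the example (slope 1/3) and uses hypothetical function names `rise` and `dome_rise`; the squared function is only an arbitrary stand-in for a curved, non-linear relationship.

```python
from fractions import Fraction


def rise(feet_forward):
    """Rise in feet for a roof with slope 1/3 (the rise/run from the example)."""
    return Fraction(1, 3) * feet_forward


def dome_rise(feet_forward):
    """Arbitrary non-linear stand-in: the rise is not proportional to the distance."""
    return feet_forward ** 2


# Scaling the input scales the output: F(10 * 3) == 10 * F(3)
scaling_holds = rise(10 * 3) == 10 * rise(3)

# Adding inputs adds the outputs: F(3 + 6) == F(3) + F(6)
additivity_holds = rise(3 + 6) == rise(3) + rise(6)

# The same check fails for the non-linear function: 81 != 9 + 36
dome_additivity_holds = dome_rise(3 + 6) == dome_rise(3) + dome_rise(6)

print(rise(30), rise(9))                                       # 10 3
print(scaling_holds, additivity_holds, dome_additivity_holds)  # True True False
```

`Fraction` is used instead of floats so the equality checks are exact.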

### 1.1.3. So where do matrices and vectors come in?

Matrices and vectors are structures representing the various inputs, operations and outputs for linear functions.  These structures allow us to:

1. Store data;


2. Track data; and


3. Transform data.

### 1.1.4. A very simple example of how matrices can be used to calculate linear functions

Assume we have:

1. <b>Input data:</b> 2 x stock portfolios, one for each of Alice and Bob, with their total holdings for shares in Apple, Google and Microsoft, i.e.

Name | Apple | Google | Microsoft
--- | --- | --- | ---
Alice | 1000 | 1000 | 1000
Bob | 500 | 2000 | 500

<br>
<center>Presented as a matrix:</center>
<br>

<p style='text-align: center;'> 
$I = \begin{bmatrix}
1000 & 1000 & 1000\\500 & 2000 & 500
\end{bmatrix}$
</p>


2. <b>Operations:</b> the changes in company values for each of Apple (a 20% increase), Google (a 5% decrease) and Microsoft (no change) after a news event;

Output row | Apple | Google | Microsoft
--- | --- | --- | ---
Updated Apple value | 1.2 | 0 | 0
Updated Google value | 0 | 0.95 | 0
Updated Microsoft value | 0 | 0 | 1
Profit / Loss | 0.20 | -0.05 | 0

<br>
<center>Presented as a matrix:</center>
<br>

<p style='text-align: center;'> 
$O = \begin{bmatrix}
1.2 & 0 & 0\\0 & 0.95 & 0\\ 0 & 0 & 1\\0.20 & -0.05 & 0
\end{bmatrix}$
</p>

3. <b>Output:</b> updated portfolios for each of Alice and Bob, which will include a bonus output - the net profit / loss from the event.


4. To calculate (3), we need to multiply the operations ($O$) by the inputs ($I$).  As $O$ is 4 x 3 dimensional and $I$ is 2 x 3 dimensional, we cannot multiply them "as is". This is because the number of columns of the first matrix (here 3) must match the number of rows of the second matrix (here 2).  


5. To match up $O$ and $I$ to enable matrix multiplication we can <b>transpose</b> $I$, i.e. swap its rows and columns so that rows become columns and columns become rows.  The transposition of $I$ is denoted by the $^{T}$:

<p style='text-align: center;'> 
$I^{T} = \begin{bmatrix}
1000 & 1000 & 1000\\500 & 2000 & 500
\end{bmatrix}^{T}$
</p>

<br>
<center>becomes</center>
<br>

<p style='text-align: center;'> 
$I^{T} = \begin{bmatrix}
1000 & 500\\ 1000 & 2000\\ 1000 & 500
\end{bmatrix}$
</p>



6. Therefore, this now means we can complete the calculation described in (4) to achieve the outcome described in (3):

<p style='text-align: center;'> 
$\begin{equation}
    \begin{bmatrix}
    1.2 & 0 & 0\\0 & 0.95 & 0\\ 0 & 0 & 1\\0.20 & -0.05 & 0
    \end{bmatrix}
    \cdot
   \begin{bmatrix}
   1000 & 500\\ 1000 & 2000\\ 1000 & 500
   \end{bmatrix}
    =
    \begin{bmatrix}
1200 & 600\\ 950 & 1900\\ 1000 & 500\\ 150 & 0
\end{bmatrix}
\end{equation}$ 
</p>


7. In tabular form, this can be presented as:

Name | Apple | Google | Microsoft | Profit / Loss
--- | --- | --- | --- | ---
Alice | 1200 | 950 | 1000 | 150
Bob | 600 | 1900 | 500 | 0
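
The same calculation can be sketched with NumPy. The variable names `holdings` and `operations` are my own (the input matrix is not called `I` in the code, to avoid confusion with the identity matrix); the numbers are the ones from the tables above.

```python
import numpy as np

# Inputs: one row per person (Alice, Bob), one column per company
# (Apple, Google, Microsoft)
holdings = np.array([[1000, 1000, 1000],
                     [500, 2000, 500]])

# Operations: three rows for the updated company values, one for profit / loss
operations = np.array([[1.20, 0.00, 0.0],
                       [0.00, 0.95, 0.0],
                       [0.00, 0.00, 1.0],
                       [0.20, -0.05, 0.0]])

print(operations.shape, holdings.shape)   # (4, 3) (2, 3)

# (4 x 3) @ (3 x 2) -> (4 x 2): one column per person
result = operations @ holdings.T
print(result.shape)                       # (4, 2)

# Transposing the result gives one row per person, as in the table
print(result.T)
```

Up to floating-point rounding, `result.T` should match the table: 1200, 950, 1000 and 150 for Alice, and 600, 1900, 500 and 0 for Bob.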

### 1.1.5. High level relationship with machine learning

For machine learning the key takeaway is this:

1. linear algebra gives you mini-spreadsheets to represent your maths equations, including their inputs, operations and outputs.

    and


2. in turn this enables you to manipulate large groups of data. 

### 1.1.6. Useful Resources

A great intuition regarding general linear algebra in more detail can be found [here](https://betterexplained.com/articles/linear-algebra-guide/).

## 1.2. How is linear algebra relevant to machine learning?

As explained above, linear algebra allows us to store, track and transform data, which in turn lets us apply the same calculation to many data points simultaneously and efficiently. The principal tools most relevant to machine learning are:

  <img src="../Images\scalarVectorMatrix.png" width=60%>

<b>Vectors</b> and <b>Matrices</b> (as explored in this notebook) are used to represent data, including:

1. input variables $x$;

    
2. output variables $y$; 

    
3. weights (aka parameters and coefficients) $\theta$; and

    
4. operations, e.g. addition ($x + y$), subtraction ($x - y$), multiplication ($x \times y$), division ($x / y$), inversion ($x^{-1}$), transposition ($x^{T}$) and so on.

And in doing so, allow complex computations to be performed simultaneously and efficiently.

# 2. Matrices

## 2.1. What are they?

A matrix:

1. Is an ordered <b>array of numbers</b>, e.g. $\left\lgroup \matrix{1 & 2\cr 3 & 4} \right\rgroup$; 

    and
    
    
2. Has <b>two indices</b>, the first one points to a <b>row</b> and the second points to a <b>column</b>.  See below notation regarding how to express the correct indices.

## 2.2. Notation

Matrix notation is as follows:

1. Matrices are described by their dimensions: number of rows x number of columns, i.e. R<sup>[rows x columns]</sup>.  This is often represented in terms of $m$ rows and $n$ columns.  For example, R<sup>4 x 2</sup> means a matrix with 4 rows and 2 columns, e.g. 

<p style='text-align: center;'> 
$\begin{bmatrix}
1 & 2\\3 & 4\\ 5 & 6 \\ 7 & 8
\end{bmatrix}$
</p>


2. Matrix elements are identified like this: $A_\text{(i, j)}$, where:

  (a) i = the i<sup>th</sup> <b>row</b>; and

  (b) j = the j<sup>th</sup> <b>column</b>.

  e.g. the below, which is an R<sup>4 x 2</sup> matrix:

  <p align = "center">
  <img src="../Images\matrix1.png" width=60%>
  </p>

3. Matrices are usually denoted by an <b>uppercase</b> letter like in the above image.  When referring to a matrix of features we use the capital letter, e.g. $A$ in the above example, and when referring to a particular feature within the matrix we either identify it specifically, e.g. <b>A<sub>1, 1</sub></b>, or generally with the lower case letter, e.g. $a$ in the above example.


4. $A^{T}$ indicates the <b>transpose</b> of the matrix (explained in more detail below).


5. $A^{-1}$ indicates the <b>inverse</b> of a matrix (explained in more detail below).
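
A short NumPy sketch of the notation, assuming the 4 x 2 example matrix from point 1. The variable names are mine. One thing to watch: the mathematical notation is 1-indexed, whereas NumPy is 0-indexed.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6],
              [7, 8]])

rows, cols = A.shape        # (4, 2): m = 4 rows, n = 2 columns

# A_(1, 1) in 1-indexed notation is A[0, 0]; A_(4, 2) is A[3, 1]
top_left = A[0, 0]          # 1
bottom_right = A[3, 1]      # 8

# Transpose: rows become columns, so the shape becomes (2, 4)
A_transposed = A.T

# Inverse: only defined for square, non-singular matrices
B = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B_inverse = np.linalg.inv(B)

# B times its inverse gives the identity matrix (up to rounding)
print(np.allclose(B @ B_inverse, np.eye(2)))   # True
```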

# 3. Vectors

## 3.1. What are they?

A vector:

1. Is an ordered array of numbers and can be either: 

    (a) a column, in which case it is a $n \text{ x } 1$ matrix because it always has $1$ column and some number ($n$) of rows:
    
    <p style='text-align: center;'> 
    $\begin{bmatrix}
    1 \\2 \\ 3
    \end{bmatrix}$
    </p>
    
    <br><center>or</center>
    
    (b) a row, in which case it is a $1 \text{ x } n$ matrix because it always has $1$ row and some number ($n$) of columns:
    
    <p style='text-align: center;'> 
    $\begin{bmatrix}
    1 & 2 & 3
    \end{bmatrix}$
    </p>
    
2. Is <b>single indexed</b>, because it has only <b>one dimension</b>.

## 3.2. Notation

1. Vector elements are identified like this: $v_{(i)}$, where $i$ = the $i^{th}$ <b>row</b> (if a column vector) or <b>column</b> (if a row vector).


2. Some vectors are <b>1-indexed</b> (i.e. index starts at 1), but others are <b>0-indexed</b> (i.e. index starts at 0).


3. Unlike matrices, vectors are usually denoted by a <b>lowercase</b> letter.
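
In NumPy terms, the difference between column and row vectors shows up in the array shape. A minimal sketch with variable names of my own choosing:

```python
import numpy as np

column_vector = np.array([[1], [2], [3]])   # shape (3, 1): n x 1
row_vector = np.array([[1, 2, 3]])          # shape (1, 3): 1 x n

# Transposing a column vector gives the corresponding row vector
same_after_transpose = np.array_equal(column_vector.T, row_vector)

# A plain 1-D array has a single index and is neither a row nor a column
v = np.array([1, 2, 3])                     # shape (3,)

# NumPy is 0-indexed, so v_1 in 1-indexed notation is v[0]
first = v[0]

print(column_vector.shape, row_vector.shape, v.shape)   # (3, 1) (1, 3) (3,)
print(same_after_transpose, first)                      # True 1
```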

## 3.3. Useful Resources

- https://study.com/academy/lesson/difference-between-a-row-column-vector.html

# 4. Operations

## 4.1. Intro

Matrices and vectors can be added, subtracted, multiplied, divided, inverted and transposed.  I've not made many of my own notes as other people have detailed and diagrammed this already in concise but comprehensive fashion - see links where referenced.

## 4.1. Addition, Subtraction, Negatives, Division, Multiplication (Constant) & Transposition

See [here](https://www.mathsisfun.com/algebra/matrix-introduction.html).

## 4.2. Multiplication (of Matrices)

### 4.2.1. Generally

See [here](https://www.mathsisfun.com/algebra/matrix-multiplying.html).

### 4.2.2. The two most important rules

1. The number of <b>columns of the 1st matrix must equal the number of rows of the 2nd matrix</b>.


    and
    

2. The result will have:

    (a) the <b>same number of rows as the 1st matrix</b>; and 
    
    (b) the <b>same number of columns as the 2nd matrix</b>.

In other words, <b>dimensionality</b> of the underlying matrices matters! See further below.
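
To make the two rules concrete, here is a minimal pure-Python sketch of matrix multiplication. The function name `matmul` and the example values are my own; in practice NumPy's `@` operator does the same job.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    a_rows, a_cols = len(a), len(a[0])
    b_rows, b_cols = len(b), len(b[0])
    # Rule 1: columns of the 1st matrix must equal rows of the 2nd matrix
    if a_cols != b_rows:
        raise ValueError(
            f"cannot multiply {a_rows} x {a_cols} by {b_rows} x {b_cols}"
        )
    # Rule 2: the result has the rows of `a` and the columns of `b`.
    # Entry (i, j) is the dot product of row i of `a` and column j of `b`.
    return [
        [sum(a[i][k] * b[k][j] for k in range(a_cols)) for j in range(b_cols)]
        for i in range(a_rows)
    ]


a = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3
b = [[7, 8],
     [9, 10],
     [11, 12]]           # 3 x 2

product = matmul(a, b)   # 2 x 2
print(product)           # [[58, 64], [139, 154]]

try:
    matmul(a, a)         # 2 x 3 by 2 x 3: columns (3) != rows (2)
except ValueError as error:
    print(error)         # cannot multiply 2 x 3 by 2 x 3
```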

### 4.2.3. Order of Multiplication

In arithmetic, multiplication is <b>"Commutative"</b>, that is to say: 3 x 5 = 5 x 3.  This is not generally true of matrices, where AB ≠ BA. In other words, changing the order of multiplication usually changes the result:

<p align = "center">
  <img src="../Images\orderOfMultiplication.png" width=40%>
  </p>
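
A quick NumPy check of this, using two small matrices I made up for illustration (they are not the ones in the image):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

AB = A @ B    # swaps the columns of A: [[2, 1], [4, 3]]
BA = B @ A    # swaps the rows of A:    [[3, 4], [1, 2]]

print(np.array_equal(AB, BA))   # False
```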

### 4.2.4. Identity Matrix

The <b>"Identity Matrix"</b> is the <b>matrix equivalent</b> of the number $1$.  Concretely, the identity matrix has the following properties:

1. It is "square", i.e. it has the same number of rows as columns.


2. It can be large or small, i.e. 2×2, 100×100, ... whatever.


3. It has 1s on the diagonal and 0s everywhere else:

  <p align = "center">
  <img src="../Images\identityMatrix.gif" width=20%>
  </p>


4. Its symbol is the capital letter $I$.
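
In NumPy the identity matrix can be built with `np.eye`. A small sketch with an example matrix of my own, showing that multiplying by the identity leaves a matrix unchanged, just as multiplying a number by $1$ does:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
identity = np.eye(2, dtype=int)   # [[1, 0], [0, 1]]

# Multiplying by the identity on either side leaves A unchanged
unchanged_right = np.array_equal(A @ identity, A)
unchanged_left = np.array_equal(identity @ A, A)
print(unchanged_right, unchanged_left)   # True True

# The identity can be any size, as long as it is square
print(np.eye(100).shape)                 # (100, 100)
```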

## 4.3. Inversion

See [here](https://www.mathsisfun.com/algebra/matrix-inverse.html).


## 4.4. Order of Operations

1. Multiplication
2. Division
3. Addition
4. Subtraction

## 4.5. Useful Resources

- https://www.mathsisfun.com/algebra/systems-linear-equations.html

- https://www.mathsisfun.com/algebra/linear-equations.html

- https://www.mathsisfun.com/algebra/matrix-introduction.html

# 5. Dimensionality is Paramount!

## 5.1. Why is dimensionality important?

Dimensionality is <b>paramount</b> because it determines both: 

1. Whether two matrices can be multiplied together "as is", i.e. as they are currently organised, either with features as columns and samples as rows or vice versa.


2. If the answer to (1) is "no", the existing dimensionality will inform how we go about fixing the dimensionality to enable multiplication of those two matrices.

However, even where dimensionality permits multiplication, the result is not necessarily the correct one for our hypothesis.  The examples below show where this can go wrong!

See below regarding the different options for univariate and multivariate linear regression and distinction between multiplying a single sample $x^{(i)}$ by $\theta$ values or the entire matrix $X$ of all samples by $\theta$.

## 5.2. Dimensionality Options

### 5.2.1. In General

Features and samples can be arranged in two different dimensions:
    
  <p align = "center">
  <img src="../Images\featuresAsColumnsOrRows.png" width=80%>
  </p>

### 5.2.2. Conventions used in Andrew Ng's Introduction to Machine Learning course

The Andrew Ng course, Introduction to Machine Learning, when multiplying a single sample of features by corresponding $\theta$ values, favours matrices with <b>features arranged as rows</b> and <b>samples arranged as columns</b>.  

However, it's very easy to become confused for several reasons.  These are as follows.

#### Confusion 1 - Table Data (features as columns) vs. Sample Vector (features as rows)

The Andrew Ng course routinely introduces datasets via tabular presentations for the examples.  E.g. the below with <b>features as columns</b> and <b>samples as rows</b>:

  <p align = "center">
  <img src="../Images\multivariateNotation.png" width=80%>
  </p>
  
However, when computing the hypothesis for a single sample, the sample data for $x^{(i)}$ is organised with <b>features as rows</b> in a single column vector.  For instance, the sample $x^{1}$ is arranged as an $(n + 1)$ x $1$ column vector like so:

<p style='text-align: center;'> 
$x^{1} = \begin{bmatrix}
1 \\2104 \\5 \\ 1 \\45
\end{bmatrix}$
</p>

And likewise, the $\theta$ values are arranged as an $(n + 1)$ x $1$ column vector like so:

<p style='text-align: center;'> 
$\theta = \begin{bmatrix}
1 \\2 \\3 \\ 4 \\5
\end{bmatrix}$
</p>

This caused me confusion because I intuitively began representing samples as rows and features as columns, probably because this logically follows the form of the table layout.  Doing so means it's very easy to then get dimensionality mixed up, and in turn the solutions to fix dimensionality problems are likewise confused.

#### Confusion 2 - numpy.matrix (features as columns) vs. Andrew Ng Sample Vector (features as rows)

In various python implementations of the coding exercises, including the one I've adapted, numpy matrices are often used to organise the input variables $x$ and the $\theta$ values.  However, it took me some time to appreciate that, used this way, numpy matrices organise features as columns and samples as rows vs. the Andrew Ng course's orientation, i.e. samples as columns and features as rows.  For instance:

1. Create a numpy.matrix of the X values like so: `X = np.matrix(X.values)`


2. Doing so organises the samples into rows and features into columns like so: `X[0]` (the first sample, as numpy is 0-indexed) returns `matrix([[1.    , 2104    , 5    , 1    , 45]])`


3. The same applies if the $\theta$ values are organised in a numpy matrix, e.g. `matrix([[1.    , 2    , 3    , 4    , 5]])`

This caused confusion when jumping between my python implementations and Andrew Ng's course materials, including the MATLAB/Octave coding exercises.  The key difference is that, if using numpy matrices per the above, it's necessary to <b>transpose</b> the $\theta$ row vector into a column vector to ensure correct dimensionality and allow matrix multiplication.
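
The two layouts can be compared side by side. This sketch makes two assumptions of its own: it uses plain 2-D NumPy arrays rather than `numpy.matrix`, and it only includes the single sample and the $\theta$ values quoted above, not the full dataset.

```python
import numpy as np

# Table-style layout: 1 row (sample) x 5 columns (bias unit plus four features)
X = np.array([[1, 2104, 5, 1, 45]])
theta_row = np.array([[1, 2, 3, 4, 5]])       # 1 x 5, same layout as X

x_row = X[0:1]                                # slicing keeps it 2-D: shape (1, 5)

# (1 x 5) @ (1 x 5) fails: 5 columns vs 1 row
try:
    x_row @ theta_row
    shapes_compatible = True
except ValueError:
    shapes_compatible = False

# Transposing theta gives (1 x 5) @ (5 x 1) -> 1 x 1
numpy_style = x_row @ theta_row.T

# Course-style layout: both are 5 x 1 column vectors, so theta is transposed
# the other way round: (1 x 5) @ (5 x 1) -> 1 x 1
x_col, theta_col = x_row.T, theta_row.T
course_style = theta_col.T @ x_col

print(shapes_compatible)                          # False
print(numpy_style.item(), course_style.item())    # 4453 4453
```

Both layouts give the same number; the only difference is which operand needs transposing.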

## 5.2. Univariate Linear Regression

### 5.2.1. The Hypothesis

$h_\theta(x) = \theta_0 + \theta_1x_1$

### 5.2.2. The Data & Theta Values

$\theta_0 = -40$ and $\theta_1 = 0.25$.

<img src="../Images\univariateLinRegData.png" width=40%>

### 5.2.3. Single Sample

#### How the data and $\theta$ values are arranged

In Andrew Ng's course, a single sample for univariate linear regression is presented as a $1$ x $1$ dimensional column vector:

<p style='text-align: center;'> 
$x^{1} = \begin{bmatrix}
2104
\end{bmatrix}$
</p>

Whereas the $\theta$ vector is presented as an $(n + 1)$ x $1$ column vector:

<p style='text-align: center;'> 
$\theta = \begin{bmatrix}
-40 \\ 0.25
\end{bmatrix}$
</p>

#### Multiplication of a single sample by $\theta$ (Andrew Ng style)

"As is" we could multiply $\theta$ by $x^{1}$, i.e. because the number of columns of $\theta$ (1) would match the number of rows of $x^{1}$ (also 1).  However, this would arrive at the incorrect result.  For instance:

<p style='text-align: center;'> 
$\begin{equation}
    \begin{bmatrix}
    -40 \\ 0.25
    \end{bmatrix}
    \cdot
   \begin{bmatrix}
   2104
   \end{bmatrix}
    =
    \begin{bmatrix}
    -84160 \\ 526
    \end{bmatrix}
\end{equation}$ 
</p>
Which provides a different result to:

<p style='text-align: center;'>
$h_\theta(x) = -40 + 0.25 \cdot 2104 = -40 + 526 = 486$
</p>

Instead, we need to do the following:

1. Add a <b>bias unit</b>, $x_0 = 1$, to every sample $x^{(i)}$.  This transforms:

    a. the hypothesis from $h_\theta(x) = \theta_0 + \theta_1x_1$ into $h_\theta(x) = \theta_0x_0 + \theta_1x_1$; and
    
    b. the $x^{(i)}$ matrix from a $1$ x $1$ matrix to a $(n + 1)$ x $1$  matrix, i.e.

<p style='text-align: center;'> 
$x^{1} = \begin{bmatrix}
2104
\end{bmatrix}$
becomes
$x^{1} = \begin{bmatrix}
1 \\ 2104
\end{bmatrix}$
</p>

    and
   

2. Transpose the $\theta$ matrix to transform it from an $(n + 1)$ x $1$ matrix to a $1$ x $(n + 1)$ matrix, i.e.

<p style='text-align: center;'> 
$\theta = \begin{bmatrix}
-40\\ 0.25
\end{bmatrix}$
becomes
$\theta^{T} = \begin{bmatrix}
-40 & 0.25
\end{bmatrix}$
</p>

3.  Therefore we can compute the hypothesis as $\theta^{T}x^{(i)}$:

<p style='text-align: center;'> 
$\begin{equation}
    \begin{bmatrix}
    -40 & 0.25
    \end{bmatrix}
    \cdot
   \begin{bmatrix}
   1 \\ 2104
   \end{bmatrix}
    =
    \begin{bmatrix}
    486
    \end{bmatrix}
\end{equation}$ 
</p>

Which provides the same result as $h_\theta(x) = -40 + 0.25 \cdot 2104 = -40 + 526 = 486$!
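
The steps above can be sketched in NumPy, keeping the course's column-vector layout. The variable names are mine.

```python
import numpy as np

theta = np.array([[-40.0], [0.25]])      # (n + 1) x 1 column vector
x = np.array([[2104.0]])                 # 1 x 1 sample without the bias unit

# (2 x 1) @ (1 x 1) is allowed, but gives a 2 x 1 result, not a prediction
wrong = theta @ x                        # values -84160 and 526

# Step 1: prepend the bias unit x_0 = 1, so the sample becomes (n + 1) x 1
x_with_bias = np.vstack(([[1.0]], x))    # values 1 and 2104

# Steps 2 and 3: theta^T (1 x 2) times x (2 x 1) gives a 1 x 1 result
prediction = theta.T @ x_with_bias

print(wrong.shape, prediction.shape)     # (2, 1) (1, 1)
print(prediction.item())                 # 486.0
```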

#### Multiplication of sample by $\theta$ (numpy matrices)

Unlike the above, using numpy matrices has the effect of organising features / $\theta$ values as columns and samples as rows.  For instance, using numpy matrices the above $x^{1}$ and $\theta$ values are presented as follows, where $x^{1}$ is $1$ x $1$ dimensional and $\theta$ is $1$ x $(n + 1)$ dimensional:

<p style='text-align: center;'> 
$x^{1} = \begin{bmatrix}
2104
\end{bmatrix}$
</p>

<p style='text-align: center;'> 
$\theta = \begin{bmatrix}
-40 & 0.25
\end{bmatrix}$
</p>

Although the number of columns of $x^{1}$ (1) already matches the number of rows of $\theta$ (also 1), multiplying $x^{1}$ by $\theta$ produces an incorrect result:

<p style='text-align: center;'> 
$\begin{equation}
    \begin{bmatrix}
    2104
    \end{bmatrix}
    \cdot
   \begin{bmatrix}
   -40 & 0.25
   \end{bmatrix}
    =
    \begin{bmatrix}
    -84160 & 526
    \end{bmatrix}
\end{equation}$ 
</p>

So to fix the issue we do the following:

1. Add a <b>bias unit</b>, $x_0 = 1$, to every sample $x^{(i)}$.  This transforms the $x^{(i)}$ matrix from a $1$ x $1$ matrix to a $1$ x $(n + 1)$ matrix, i.e.

<p style='text-align: center;'> 
$x^{1} = \begin{bmatrix}
2104
\end{bmatrix}$
becomes
$x^{1} = \begin{bmatrix}
1 & 2104
\end{bmatrix}$
</p>

    and
   

2. Transpose the $\theta$ matrix to transform it from a $1$ x $(n + 1)$ matrix to an $(n + 1)$ x $1$ matrix, i.e.

<p style='text-align: center;'> 
$\theta = \begin{bmatrix}
-40 & 0.25
\end{bmatrix}$
becomes
$\theta^{T} = \begin{bmatrix}
-40 \\ 0.25
\end{bmatrix}$
</p>

3.  Therefore, multiplying $x^{1}$ by $\theta^{T}$:

<p style='text-align: center;'> 
$\begin{equation}
    \begin{bmatrix}
    1 & 2104
    \end{bmatrix}
    \cdot
    \begin{bmatrix}
    -40 \\ 0.25
    \end{bmatrix}
    =
    \begin{bmatrix}
    486
    \end{bmatrix}
\end{equation}$ 
</p>

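4. Note that multiplying $\theta^{T}$ by $x^{1}$ (i.e. in the opposite order) returns an incorrect result despite the dimensionality matching: a $2$ x $1$ matrix times a $1$ x $2$ matrix yields a $2$ x $2$ matrix rather than a single value.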
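
The row-vector version can be sketched the same way. Again this uses plain 2-D arrays as a stand-in for numpy matrices, with variable names of my own.

```python
import numpy as np

theta = np.array([[-40.0, 0.25]])        # 1 x (n + 1) row vector
x = np.array([[1.0, 2104.0]])            # 1 x (n + 1) sample, bias unit included

# Correct order: (1 x 2) @ (2 x 1) -> 1 x 1
prediction = x @ theta.T

# Opposite order: (2 x 1) @ (1 x 2) -> 2 x 2, which is not a prediction
outer = theta.T @ x

print(prediction.item())                 # 486.0
print(outer.shape)                       # (2, 2)
```

The second product is valid matrix multiplication, which is why the mistake does not raise an error; checking the shape of the result is the easiest way to catch it.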

### 5.2.4. All Samples

#### How the data and $\theta$ values are arranged

In Andrew Ng's course, an entire dataset of input variables $x$ is represented in the upper case, i.e. $X$.  This represents a matrix of $m$ x $n$ dimensions, which for the above dataset $X$ is $4$ x $1$:

<p style='text-align: center;'> 
$X = \begin{bmatrix}
2104 \\1416 \\ 1534 \\852
\end{bmatrix}$
</p>

Whereas the $\theta$ vector is presented as an $(n + 1)$ x $1$ column vector, which is $2$ x $1$ for the above scenario:

<p style='text-align: center;'> 
$\theta = \begin{bmatrix}
-40 \\ 0.25
\end{bmatrix}$
</p>

"As is" we cannot multiply $X$ x $\theta$ or $\theta$ x $X$ because in either scenario the columns of matrix A don't match the rows of matrix B. 

As above, the solution is to add a <b>bias unit</b>, $x_0 = 1$, to every sample in $X$.  This transforms the $X$ matrix from a $4$ x $1$ matrix to a $4$ x $2$ matrix, i.e.

<p style='text-align: center;'> 
$X = \begin{bmatrix}
2104 \\1416 \\ 1534 \\852
\end{bmatrix}$
becomes
$X = \begin{bmatrix}
1 & 2104\\ 1 & 1416\\ 1 & 1534\\ 1 & 852
\end{bmatrix}$
</p>

Therefore, the number of columns of $X$ (i.e. 2) now matches the number of rows of $\theta$ (i.e. also 2), meaning we can do the multiplication $X$ x $\theta$ like so:

<p style='text-align: center;'> 
$\begin{equation}
    \begin{bmatrix}
    1 & 2104\\ 1 & 1416\\ 1 & 1534\\ 1 & 852
    \end{bmatrix}
    \cdot
   \begin{bmatrix}
   -40 \\ 0.25
   \end{bmatrix}
    =
    \begin{bmatrix}
    486 \\ 314 \\ 343.5 \\ 173
    \end{bmatrix}
\end{equation}$ 
</p>
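
A NumPy sketch of the same calculation, assuming the four input values above. The names `x_values` and `predictions` are mine. `np.hstack` is one of several ways to add the column of ones.

```python
import numpy as np

x_values = np.array([[2104.0], [1416.0], [1534.0], [852.0]])   # m x 1
theta = np.array([[-40.0], [0.25]])                            # (n + 1) x 1

# Add the bias unit x_0 = 1 as a first column: 4 x 1 becomes 4 x 2
bias_column = np.ones((x_values.shape[0], 1))
X = np.hstack((bias_column, x_values))

# (4 x 2) @ (2 x 1) -> 4 x 1, one prediction per sample
predictions = X @ theta

print(X.shape, predictions.shape)        # (4, 2) (4, 1)
print(predictions.ravel().tolist())      # [486.0, 314.0, 343.5, 173.0]
```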


## 5.3. Multivariate Linear Regression

### 5.3.1. The hypothesis

$h_\theta(x) = \theta_0 + \theta_1x_1 + \theta_2x_2$

### 5.3.2. The Data & Theta Values
$\theta_0$ = -40

$\theta_1$ = 0.25

$\theta_2$ = 0.25

<img src="../Images\multiVariateData.png" width=40%>



### 5.3.3. Single Sample

Same as above for univariate linear regression, i.e. add the bias unit $x_0$ to each sample $x^{(i)}$ and transpose $\theta$ to allow us to multiply $\theta^{T}$ by $x^{1}$ as follows:

<p style='text-align: center;'> 
$\begin{equation}
    \begin{bmatrix}
    -40 & 0.25 & 0.25
    \end{bmatrix}
    \cdot
   \begin{bmatrix}
   1 \\ 2104 \\ 5
   \end{bmatrix}
    =
    \begin{bmatrix}
    487.25
    \end{bmatrix}
\end{equation}$ 
</p>

Which provides the same result as $h_\theta(x) = -40 + 0.25 \cdot 2104 + 0.25 \cdot 5 = -40 + 526 + 1.25 = 487.25$!

### 5.3.4. All Samples

Same as above for univariate linear regression, i.e. add the bias unit $x_0$ to each sample $x^{(i)}$ to allow us to multiply $X$ by $\theta$ as follows:

<p style='text-align: center;'> 
$\begin{equation}
    \begin{bmatrix}
    1 & 2104 & 5\\ 1 & 1416 & 3\\ 1 & 1534 & 3\\ 1 & 852 & 2
    \end{bmatrix}
    \cdot
   \begin{bmatrix}
   -40 \\ 0.25 \\ 0.25
   \end{bmatrix}
    =
    \begin{bmatrix}
    487.25 \\ 314.75 \\ 344.25 \\ 173.5
    \end{bmatrix}
\end{equation}$ 
</p>
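
The multivariate case in NumPy, using the values from the multiplication shown above (variable names are mine). For the first sample the hypothesis works out to $-40 + 526 + 1.25 = 487.25$.

```python
import numpy as np

# Bias column first, then the two features: 4 x 3
X = np.array([[1, 2104, 5],
              [1, 1416, 3],
              [1, 1534, 3],
              [1, 852, 2]], dtype=float)
theta = np.array([[-40.0], [0.25], [0.25]])   # 3 x 1

# All samples: (4 x 3) @ (3 x 1) -> 4 x 1
predictions = X @ theta

# Single sample, course style: theta^T (1 x 3) times x (3 x 1) -> 1 x 1
x_first = X[0:1].T                            # first row as a 3 x 1 column
single = theta.T @ x_first

print(predictions.ravel().tolist())           # [487.25, 314.75, 344.25, 173.5]
print(single.item())                          # 487.25
```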

# 6. Useful Resources

- https://machinelearningmastery.com/matrix-operations-for-machine-learning/
- https://machinelearningmastery.com/introduction-matrices-machine-learning/
- https://towardsdatascience.com/linear-algebra-cheat-sheet-for-deep-learning-cd67aba4526c
- https://betterexplained.com/articles/linear-algebra-guide/
- https://blog.stata.com/2011/03/03/understanding-matrices-intuitively-part-1/
- https://towardsdatascience.com/linear-algebra-for-deep-learning-f21d7e7d7f23
- https://medium.com/from-the-scratch/deep-learning-deep-guide-for-all-your-matrix-dimensions-and-calculations-415012de1568
- https://www.mathsisfun.com/algebra/matrix-introduction.html