# Mathematics for Machine Learning

Compressed material based on **"Mathematics for Machine Learning"** by Marc Peter Deisenroth, A Aldo Faisal, and Cheng Soon Ong. To be published by Cambridge University Press. https://mml-book.com

📌 **Additional resources:**

- [3Blue1Brown](https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw)
- [MathTheBeautiful](https://www.youtube.com/channel/UCr22xikWUK2yUW4YxOKXclQ)

**Machine learning** is about designing and training systems that extract valuable information from data, make predictions, and generalize well to unseen data. There are three concepts at the core of machine learning: **data, a model, and learning**. Learning can be understood as a process that automatically finds patterns and structure in data by optimizing the parameters of the model.


The main concepts in the MML book:
- We represent data as vectors (${\rm I\!R}$<sup>n</sup> – n-dimensional vector space of real numbers)
- We choose an appropriate model, either using the probabilistic or optimization view
- We learn from available data by using numerical optimization methods with the aim that the model performs well on data not used for training


# Linear Algebra

Linear algebra is the study of vectors and rules to manipulate vectors.

In general, vectors are objects that can be **added together** and **multiplied by scalars** to produce another object of the same kind. Any object that satisfies these two properties can be considered a vector (e.g. geometric vectors, polynomials, audio signals, elements of ${\rm I\!R}$<sup>n</sup>).


&nbsp;

📌 **YouTube Video (~10min)** 

### Vectors, what even are they? | Essence of linear algebra, chapter 1
3BLUE1BROWN SERIES  S1 • E1 [[Watch video]](https://www.youtube.com/watch?v=fNk_zzaMoSs)

[<p align="left"><img src="Images/vectors.png" width="400"></p>](https://www.youtube.com/watch?v=fNk_zzaMoSs)

*Image source: 3Blue1Brown*

### Geometric vectors

Geometric vectors (see Figure a) are directed segments, which can be drawn (at least in two dimensions). Two geometric vectors $\vec{x}$, $\vec{y}$ can be added, such that $\vec{x} + \vec{y} = \vec{z}$ is another geometric vector. Furthermore, multiplication by a scalar, $\lambda\vec{x}$ with λ ∈ ${\rm I\!R}$, is also a geometric vector. In fact, it is the original vector scaled by λ.

<p align="left"><img src="Images/geom_vectors.png" width="250"></p>

*Image source: Deisenroth et al. (2019)*

&nbsp;

📌 **YouTube Video (~10min)** 

### Linear Algebra 2d: Addition of Geometric Vectors
Geometric vectors are directed segments. **MathTheBeautiful [[Watch video]](https://www.youtube.com/watch?v=kMP-Q0PF_7g&list=PLlXfTHzgMRUKXD88IdzS14F4NxAZudSmv&index=7&t=0s)**

### Elements of ${\rm I\!R}$<sup>n</sup> (tuples of n real numbers)

${\rm I\!R}$<sup>n</sup> is the concept we focus on. For instance,

$$
a = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \in \mathbb{R}^3
$$

is an example of a triplet of numbers. Adding two vectors $a$, $b$ ∈ ${\rm I\!R}$<sup>n</sup> component-wise results in another vector: $a + b = c$ ∈ ${\rm I\!R}$<sup>n</sup>. Moreover, multiplying $a$ ∈ ${\rm I\!R}$<sup>n</sup> by λ ∈ ${\rm I\!R}$ results in a scaled vector λ$a$ ∈ ${\rm I\!R}$<sup>n</sup>. Considering vectors as elements of ${\rm I\!R}$<sup>n</sup> has the additional benefit that it loosely corresponds to arrays of real numbers on a computer. Many programming languages support array operations, which allow for convenient implementation of algorithms that involve vector operations.

<p align="left"><img src="Images/vector.png" width="150"></p>

*Image source: Deisenroth et al. (2019)*
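
As a minimal sketch of how elements of ${\rm I\!R}$<sup>n</sup> map onto arrays, the snippet below stores two vectors as NumPy arrays and applies the two defining operations. The second vector `b` and the scalar `lam` are values chosen here purely for illustration; they do not come from the book.

```python
import numpy as np

# Two vectors in R^3, stored as one-dimensional arrays
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])  # illustrative second vector (assumption)

# Component-wise addition gives another vector in R^3
c = a + b

# Multiplying by a scalar scales every component
lam = 2.0  # illustrative scalar (assumption)
scaled = lam * a

print(c)       # [5. 7. 9.]
print(scaled)  # [2. 4. 6.]
```

Both results have the same shape as the inputs, which is the array counterpart of "another object of the same kind".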

&nbsp;

One major idea in mathematics is the idea of "closure". In the case of vectors: **What is the set of vectors that can result by starting with a small set of vectors, and adding them to each other and scaling them?** This results in a vector space.

<p align="left"><img src="Images/systems_lin_eq.png" width="700"></p>

*Image source: Deisenroth et al. (2019)*

### Systems of Linear Equations

Many problems can be formulated as a system of linear equations, for example
$$ 4x_1 + 4x_2 = 5, \\
2x_1 + 4x_2 = 1 \tag{1}$$
where $x_1$ and $x_2$ are the two unknowns.

We can also write the system of linear equations in matrix form, which lets us use the tools provided by linear algebra.

$$ 
\begin{bmatrix}
4 & 4 \\
2 & 4 \\
\end{bmatrix}
\begin{bmatrix}
 x_1 \\
 x_2 \\
\end{bmatrix}
=
\begin{bmatrix}
 5 \\
 1 \\
\end{bmatrix}
$$

Geometrically, each of these two equations represents a line in the $x_1, x_2$-plane, and the solution is the unique point where the two lines intersect, shown as a black **dot**:

<div align="center" text-align="center">
<img src="Images/linear_eqn_2d.png" width="300">
    <i>fig 1. source: MML Book</i>
</div>
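
The matrix form can also be handed directly to a numerical solver. The following is a small sketch using `np.linalg.solve`, which applies when the coefficient matrix is square and invertible, as it is here; the variable names are chosen for illustration.

```python
import numpy as np

# Coefficient matrix and right-hand side of system (1)
A = np.array([[4.0, 4.0],
              [2.0, 4.0]])
b = np.array([5.0, 1.0])

# Solve A x = b for the unique intersection point of the two lines
x = np.linalg.solve(A, b)

print(x)                      # x_1 = 2 and x_2 = -0.75, up to rounding
print(np.allclose(A @ x, b))  # True: the point satisfies both equations
```

Subtracting the second equation from the first by hand gives $2x_1 = 4$, so $x_1 = 2$ and $x_2 = -3/4$, which agrees with the numerical result.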


Since a solution to a system of linear equations must satisfy all equations simultaneously, the solution set is the intersection of these lines. In the case of 2 variables, the intersection can be:
*  an entire line (if both equations describe the same line),
*  a unique point,
*  or empty (when the lines are parallel and distinct).
    
<div align="center" text-align="center">
<img src="Images/2_var_sols.png" width="600">
    <i>source: Linear Algebra and Its Applications by Gilbert Strang.</i>
</div>

These observations extend to systems of $m$ equations in $n$ variables. For example, with $n = m = 3$ each equation describes a plane, and the possible solution sets are:
    
<div align="center" text-align="center">
<img src="Images/3_var_sols.png" width="600" >
    <i>source: Linear Algebra and Its Applications by Gilbert Strang.</i>
</div>

For example, consider a system of 2 equations in 3 unknowns:

$$ 
\begin{bmatrix}4 & 4 & 4 \\1 & 4 & 2 \\\end{bmatrix} 
\begin{bmatrix}x_1 \\ x_2 \\ x_3 \\ \end{bmatrix} 
= 
\begin{bmatrix} 5 \\ 1 \\ \end{bmatrix}
$$

Geometrically, each equation represents a plane, and the solution set of this system is the entire line along which the two planes intersect, as shown below.

<div align="center" text-align="center">
    <img src="Images/linear_eqn_3d.png" width="400">
    <i>source:<a href="https://www.geogebra.org/3d">geogebra</a></i>
</div>
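
To make the line of solutions concrete, here is a hedged sketch that builds it numerically. It assumes one reasonable recipe (not taken from the book): take one particular solution from `np.linalg.lstsq`, which returns the minimum-norm solution for an underdetermined system, and take the direction of the line as the cross product of the two rows, since that vector is orthogonal to both plane normals.

```python
import numpy as np

# The 2 x 3 system from the example above
A = np.array([[4.0, 4.0, 4.0],
              [1.0, 4.0, 2.0]])
b = np.array([5.0, 1.0])

# One particular solution (the minimum-norm one) of the underdetermined system
x_p, *_ = np.linalg.lstsq(A, b, rcond=None)

# Direction of the line: orthogonal to both rows of A, so A @ d = 0
d = np.cross(A[0], A[1])

# Every point x_p + t * d satisfies both equations
for t in (-1.0, 0.0, 2.5):
    x = x_p + t * d
    print(t, np.allclose(A @ x, b))
```

The values of `t` are arbitrary sample points; any real `t` gives another solution on the same line.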


## Matrices

Matrices play a central role in linear algebra. They can be used to compactly represent systems of linear equations, but they also represent linear functions (linear mappings), as we will see later.

Definition: a matrix $A$ is an $m \cdot n$-tuple of elements $a_{ij}$, $i=1,\dots,m$, $j=1,\dots,n$, which is ordered according to a rectangular scheme consisting of $m$ rows and $n$ columns:

$$
A_{m,n} = 
\begin{bmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\
\vdots  & \vdots  & \ddots & \vdots  \\
a_{m,1} & a_{m,2} & \cdots & a_{m,n} 
\end{bmatrix}
$$


* **Addition**:-
    $$
    A + B := 
    \begin{bmatrix}
    a_{1,1} + b_{1,1} &  a_{1,2} + b_{1,2} & \cdots & a_{1,n} + b_{1,n} \\
    a_{2,1} + b_{2,1} &  a_{2,2} + b_{2,2} & \cdots & a_{2,n} + b_{2,n} \\
    \vdots  & \vdots  & \ddots & \vdots  \\
    a_{m,1} + b_{m, 1} & a_{m,2} + b_{m,2} & \cdots & a_{m,n} + b_{m,n}
    \end{bmatrix}
    $$
    As we can see, adding two matrices means adding their corresponding elements, so both matrices must have the same dimensions.


* **Multiplication**:-

    There are several kinds of matrix multiplication, but "matrix multiplication" usually refers to the standard matrix product, which we call the dot product here.

    * Dot Product:-

        To compute a particular element $c_{ij}$ of the product $C = AB$, with $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times k}$, we multiply the elements of the $i$th row of $A$ with the elements of the $j$th column of $B$ and sum them up.

        Mathematically,
        $$ c_{ij} = \sum_{l=1}^{n} a_{il} b_{lj}, \qquad i=1,\dots,m, \quad j=1,\dots,k.$$

        Suppose we have a matrix $A$ of dimension [3 x 2] and a matrix $B$ of dimension [2 x 3]. Using what we have learned so far, we can calculate the dot product like this:

        $$
        \begin{aligned}
        A \cdot B  :&= 
        \begin{bmatrix}
        a_{1,1} & a_{1,2} \\
        a_{2,1} & a_{2,2} \\
        a_{3,1} & a_{3,2} 
        \end{bmatrix}_{3 \times 2}
        \cdot
        \begin{bmatrix}
        b_{1,1} & b_{1,2} & b_{1,3} \\
        b_{2,1} & b_{2,2} & b_{2,3} \\
        \end{bmatrix}_{2 \times 3}\\
        &=
        \begin{bmatrix}
        a_{1,1} * b_{1,1} + a_{1,2} * b_{2,1} & a_{1,1} * b_{1,2} + a_{1,2} * b_{2,2} & a_{1,1} * b_{1,3} + a_{1,2} * b_{2,3} \\
        a_{2,1} * b_{1,1} + a_{2,2} * b_{2,1} & a_{2,1} * b_{1,2} + a_{2,2} * b_{2,2} & a_{2,1} * b_{1,3} + a_{2,2} * b_{2,3} \\
        a_{3,1} * b_{1,1} + a_{3,2} * b_{2,1} & a_{3,1} * b_{1,2} + a_{3,2} * b_{2,2} & a_{3,1} * b_{1,3} + a_{3,2} * b_{2,3} \\
        \end{bmatrix}_{3 \times 3}
        \\
         &= 
        \begin{bmatrix}
        c_{1,1} & c_{1,2} & c_{1,3} \\
        c_{2,1} & c_{2,2} & c_{2,3} \\
        c_{3,1} & c_{3,2} & c_{3,3} 
        \end{bmatrix}
        = C_{3 \times 3}
        \end{aligned}
        $$

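        Here, the number of columns of $A$ must equal the number of rows of $B$. In short, the neighbouring dimensions must match.

        This also means that matrix multiplication is not commutative. For $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times k}$, the product $BA$ is not even defined if $m \neq k$, because the neighbouring dimensions do not match. Even when both products are defined, in general $AB \neq BA$.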


```python
import numpy as np

# Generate two random matrices, both of shape (3, 2)
A = np.random.randn(3, 2)
B = np.random.randn(3, 2)

# B.T has shape (2, 3), so the product A @ B.T has shape (3, 3)
np.matmul(A, B.T)  # for 2-D arrays, np.dot(A, B.T) gives the same result
```

```
array([[-0.02201333,  0.23914072,  0.29974213],
       [-2.79514236,  0.9765006 ,  0.46384985],
       [-1.54254287,  4.02567552,  4.71653968]])
```
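
To connect the summation formula with the library call, the sketch below computes the product entry by entry with explicit loops and compares it against NumPy's `@` operator. The helper `matmul_loops` and the matrices `P` and `Q` are hypothetical names introduced only for this illustration, and the loop version is meant for reading, not for speed.

```python
import numpy as np

def matmul_loops(A, B):
    """Matrix product computed entry by entry from c_ij = sum_l a_il * b_lj."""
    m, n = A.shape
    n_b, k = B.shape
    if n != n_b:
        raise ValueError("columns of A must equal rows of B")
    C = np.zeros((m, k))
    for i in range(m):          # rows of A
        for j in range(k):      # columns of B
            for l in range(n):  # shared inner dimension
                C[i, j] += A[i, l] * B[l, j]
    return C

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
P = rng.standard_normal((3, 2))  # illustrative [3 x 2] matrix
Q = rng.standard_normal((2, 3))  # illustrative [2 x 3] matrix

C = matmul_loops(P, Q)
print(np.allclose(C, P @ Q))         # True
print((P @ Q).shape, (Q @ P).shape)  # (3, 3) (2, 2)
```

The last line shows non-commutativity in its simplest form: both products exist here, but they do not even have the same shape.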


* Hadamard multiplication:-
    
    Element-wise multiplication is known as **Hadamard multiplication**. Since corresponding elements are multiplied, both matrices must have the same dimensions.

```python
hadamard_multiplication = A * B  # A and B both have shape (3, 2)

hadamard_multiplication
# Try playing around with the shape and see what happens ;)
```

```
array([[-0.13727103,  0.1152577 ],
       [ 1.83779452, -0.86129392],
       [ 3.11786829,  1.59867139]])
```
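
As a short sketch of what happens when the shapes differ, the example below uses illustrative matrices `X`, `Y`, `Z` and `row` (names and values are assumptions, not part of the original notebook). One caveat worth knowing: NumPy's `*` follows broadcasting rules, so some unequal shapes are accepted silently even though they are not a Hadamard product in the strict sense.

```python
import numpy as np

X = np.ones((3, 2))
Y = np.full((3, 2), 2.0)
print((X * Y).shape)  # (3, 2): equal shapes, plain element-wise product

# Incompatible shapes (3, 2) and (2, 3) raise a ValueError
Z = np.ones((2, 3))
try:
    X * Z
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True

# Caveat: broadcasting accepts (3, 2) * (1, 2) by repeating the single row
row = np.array([[10.0, 20.0]])
print((X * row).shape)  # (3, 2)
```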


* Identity Matrix

    The identity matrix plays the role of the number 1 for matrices: multiplying a matrix $M$ by $I$ gives back $M$, just as $2 \cdot 1 = 2$.

    We define the identity matrix $I_n$ as the $n \times n$ matrix containing 1 on the diagonal and 0 everywhere else.


$$
I_n := 
\begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots  & \vdots  & \ddots & \vdots  \\
0 & 0 & \cdots & 1
\end{bmatrix} \in \mathbb{R}^{n \times n}
$$


```python
n = 10
np.eye(n)  # creates the n x n identity matrix
```

```
array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])
```


### Properties of matrices

Associativity:
$$ \forall A \in \mathbb{R}^{m \times n}  ,B \in \mathbb{R}^{n \times p}, C \in \mathbb{R}^{p \times q} : (AB)C = A(BC)$$

Distributivity:
$$
\begin{aligned}
\forall A,B \in \mathbb{R}^{m \times n}  ,C \in \mathbb{R}^{n \times p}, D \in \mathbb{R}^{n \times p} : (A+B)C &= AC + BC \\
A(C+D) &= AC + AD
\end{aligned}
$$


Multiplication with identity matrix:

$$ 
\forall A \in \mathbb{R}^{m \times n} : I_mA = AI_n = A
$$

   Note that $I_m \neq I_n$   for $m \neq n.$
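
These identities can be spot-checked numerically. The sketch below is only a sanity check on random matrices, not a proof; the function name `check_matrix_properties`, the default sizes and the seed are all assumptions made for this illustration. Comparisons use `np.allclose` because floating-point products agree only up to rounding.

```python
import numpy as np

def check_matrix_properties(m=2, n=3, p=4, q=5, seed=0):
    """Spot-check associativity, distributivity and the identity rule."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))
    A2 = rng.standard_normal((m, n))  # second (m, n) matrix for (A + A2) B
    B = rng.standard_normal((n, p))
    B2 = rng.standard_normal((n, p))  # second (n, p) matrix for A (B + B2)
    C = rng.standard_normal((p, q))
    return {
        "associativity": np.allclose((A @ B) @ C, A @ (B @ C)),
        "right_distributivity": np.allclose((A + A2) @ B, A @ B + A2 @ B),
        "left_distributivity": np.allclose(A @ (B + B2), A @ B + A @ B2),
        # I_m multiplies from the left, I_n from the right
        "identity": np.allclose(np.eye(m) @ A, A)
        and np.allclose(A @ np.eye(n), A),
    }

print(check_matrix_properties())
```

Note that the identity check needs two different identity matrices, `np.eye(m)` on the left and `np.eye(n)` on the right, which mirrors the remark that $I_m \neq I_n$ for $m \neq n$.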