# Math for Machine Learning

## What is Machine Learning?

Arthur Samuel, 1959: "Field of study that gives computers the ability to learn without being explicitly programmed."

## The Machine Learning Pipeline in Mathematics

1. **Data Processing**
    * Uses linear algebra to format the data in a way algorithms can ingest.
2. **Feature Engineering and Selection**
    * Uses vectors and matrices to transform the data into features that are easy for algorithms to work with.
3. **Modeling**
    * Uses geometry, probability, norms, and statistics to define the problem in a way the algorithm can optimize.
4. **Optimization**
    * Uses vector calculus to iterate until a stopping condition is met. Then you choose the best model.
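
As a rough illustration of how the four stages fit together, here is a minimal NumPy sketch. The synthetic data, the linear model, the step size of `0.1`, and the stopping tolerance are all assumptions made for this example, not part of the pipeline description.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# 1. Data processing: store the raw observations as a matrix (rows = samples).
X_raw = rng.uniform(0, 10, size=(50, 2))
y = X_raw @ np.array([2.0, -1.0]) + 3.0  # hypothetical noise-free target

# 2. Feature engineering: standardize each column, then append a bias column.
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
X = np.column_stack([X, np.ones(len(X))])

# 3. Modeling: a linear model scored by the mean squared Euclidean norm
#    of the residual vector.
def loss(theta):
    return np.linalg.norm(X @ theta - y) ** 2 / len(y)

# 4. Optimization: gradient descent, iterating until the gradient is tiny.
theta = np.zeros(3)
for step in range(10_000):
    grad = 2 * X.T @ (X @ theta - y) / len(y)
    if np.linalg.norm(grad) < 1e-8:
        break
    theta -= 0.1 * grad

print(step, loss(theta))
```

Because the synthetic target is exactly linear in the features, the loss should shrink to essentially zero. With real data the stopping rule and step size would need tuning.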


## Vectors

### Norm

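A norm measures the length of a vector, that is, its distance from the origin. The distance between two vectors $\vec{v}$ and $\vec{w}$ is then $\Vert \vec{v} - \vec{w} \Vert$.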

### Norm Properties

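1. All distances are non-negative, and only the zero vector has length zero. $\Vert \vec{v} \Vert \geq 0$, with $\Vert \vec{v} \Vert = 0$ only when $\vec{v} = \vec{0}$.
2. Distances scale with scalar multiplication. $\Vert a \vec{v} \Vert = \vert a \vert \cdot \Vert \vec{v} \Vert$
3. *Triangle Inequality*. If I travel from $A$ to $B$ then $B$ to $C$, that is at least as far as going directly from $A$ to $C$. With $\vec{v}$ the step from $A$ to $B$ and $\vec{w}$ the step from $B$ to $C$: $\Vert \vec{v} + \vec{w} \Vert \leq \Vert \vec{v} \Vert + \Vert \vec{w} \Vert$.
    * If $B$ lies on the segment between $A$ and $C$, so that $\vec{v}$ and $\vec{w}$ point in the same direction, then $\Vert \vec{v} + \vec{w} \Vert = \Vert \vec{v} \Vert + \Vert \vec{w} \Vert$. Lying on the same line is not enough: for $\vec{w} = -\vec{v} \neq \vec{0}$ the left side is $0$ while the right side is $2 \Vert \vec{v} \Vert$.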
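
The properties can be spot-checked numerically. The sketch below uses the Euclidean norm via `np.linalg.norm`; the random vectors, the scalar `a`, and the small tolerance are arbitrary choices for illustration. A passing check on one example is not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
v = rng.normal(size=5)
w = rng.normal(size=5)
a = -2.5

norm = np.linalg.norm  # Euclidean norm by default for 1-D arrays

# 1. Non-negativity
non_negative = norm(v) >= 0

# 2. Scaling by a scalar multiplies the norm by |a|
homogeneous = np.isclose(norm(a * v), abs(a) * norm(v))

# 3. Triangle inequality (tiny tolerance for floating-point rounding)
triangle = norm(v + w) <= norm(v) + norm(w) + 1e-12

# Equality when the second vector points in the same direction as the first
same_direction = np.isclose(norm(v + 3 * v), norm(v) + norm(3 * v))

# Strict inequality when the second vector points the opposite way
opposite_direction = norm(v + (-v)) < norm(v) + norm(-v)

print(non_negative, homogeneous, triangle, same_direction, opposite_direction)
```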

    
### Types of Norms

For $\vec{v} = \begin{pmatrix} v_1\\ v_2\\ \vdots\\ v_n\end{pmatrix}$:

1. **Euclidean Norm**. 
\begin{align*}
\Vert \vec{v} \Vert_2 &= \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2} \\
                      &= \sqrt{\sum_{i=1}^{n} {v_i^2}}
\end{align*}
2. **$L_p$-Norm**, for $p \geq 1$. The Euclidean norm is the case $p = 2$.
\begin{equation*}
\Vert \vec{v} \Vert_p = \Big( \sum_{i=1}^{n} \vert v_i \vert ^p \Big)^{1/p}
\end{equation*}
3. **$L_1$-Norm**
\begin{equation*}
\Vert \vec{v} \Vert_1 = \sum_{i=1}^{n} \vert v_i \vert 
\end{equation*}
Other names include the taxicab metric and the Manhattan norm.
4. **$L_\infty$-Norm**
\begin{equation*}
\Vert \vec{v} \Vert_\infty = \lim_{p \to \infty} \Vert \vec{v} \Vert_p = \lim_{p \to \infty} \Big( \sum_{i=1}^{n} \vert v_i \vert ^p \Big)^{1/p} = \max_{1 \leq i \leq n} \vert v_i \vert
\end{equation*}
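5. **$L_0$-"Norm"**: the number of non-zero elements of $\vec{v}$. Despite the name, it is not a norm: scaling $\vec{v}$ by any $a \neq 0$ leaves the count unchanged, which violates the scaling property.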
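
The definitions above translate almost directly into NumPy. In this sketch the helper `p_norm` and the example vector are hypothetical names and values chosen for illustration. The loop suggests how the $L_p$ norm approaches the largest absolute entry as $p$ grows.

```python
import numpy as np

def p_norm(v, p):
    """L_p norm computed directly from the definition (assumes p >= 1)."""
    v = np.asarray(v, dtype=float)
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

v = [3.0, -4.0, 1.0]  # hypothetical example vector

l1 = p_norm(v, 1)          # |3| + |-4| + |1|
l2 = p_norm(v, 2)          # sqrt(9 + 16 + 1)
linf = np.max(np.abs(v))   # largest absolute entry
l0 = np.count_nonzero(v)   # number of non-zero entries

# As p grows, the L_p norm approaches the L_infinity norm.
for p in (1, 2, 4, 16, 64):
    print(p, p_norm(v, p))

print(l1, l2, linf, l0)
```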

## Vectors in Code

* We have used LaTeX to write down mathematical equations.
* Let's use Python now, first with plain lists and then with NumPy arrays.

In [5]:
# these define two vectors as plain Python lists
v = [1, 2, 3]
w = [1, 1, 1]

# this defines a matrix as a list of rows
A = [[1, 2, 3], [-1, 0, 1], [1, 1, 1]]

In [6]:
# + on Python lists concatenates; it does not add element-wise
v + w

[1, 2, 3, 1, 1, 1]

In [7]:
import numpy as np

np.array(v) + np.array(w)

array([2, 3, 4])

In [10]:
# * on a Python list repeats it; it does not scale the entries
2*v

[1, 2, 3, 1, 2, 3]

In [11]:
2*np.array(v)

array([2, 4, 6])

In [12]:
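# L_p-Norms (np.linalg.norm accepts a plain list and converts it to an array)

print(np.linalg.norm(v, ord=1))       # L_1: |1| + |2| + |3|
print(np.linalg.norm(v, ord=2))       # L_2: sqrt(1 + 4 + 9)
print(np.linalg.norm(v, ord=np.inf))  # L_infinity: largest absolute entry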

6.0
3.7416573867739413
3.0
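
The `ord` argument is not limited to the three values above. The following sketch, using a hypothetical vector `u`, compares `np.linalg.norm` against the $L_p$ definition for $p = 3$ and uses `ord=0`, which for 1-D input counts the non-zero entries.

```python
import numpy as np

u = np.array([0.0, 5.0, 0.0, -2.0])  # hypothetical example vector

# ord can be any p >= 1; compare with the definition written out by hand.
l3 = np.linalg.norm(u, ord=3)
l3_by_hand = (np.abs(u) ** 3).sum() ** (1 / 3)

# For 1-D input, ord=0 counts the non-zero entries (the L_0 "norm").
l0 = np.linalg.norm(u, ord=0)

print(l3, l3_by_hand)
print(l0, np.count_nonzero(u))
```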


## Matrices