# Linear Algebra

Linear algebra is a branch of mathematics that is essential for understanding and working with many machine learning algorithms, especially deep learning algorithms. This chapter focuses on the topics within linear algebra that are relevant to machine learning and does not try to be an exhaustive reference on the topic.

## 2.1 Scalars, Vectors, Matrices and Tensors

The principal mathematical objects involved in linear algebra are:

1. **Scalars** - A scalar is a single number, which may be of different kinds (a real value, an integer, etc.).

2. **Vectors** - A vector is an array of numbers. An element in the
array is identified by its index in the ordering of the array. We can think
of vectors as identifying points in space, with each element of the
vector giving the coordinate along a different axis. We write
$x \in \mathbb{R}^{n}$ if $x$ has $n$ real-valued elements.

TODO: Add picture of a vector

3. **Matrices** - A matrix is a 2-D array of numbers, so each
element is identified by two indices instead of just one. A real-valued matrix
$A$ with $m$ rows and $n$ columns is written $A \in \mathbb{R}^{m \times n}$. Row $i$ of a matrix is denoted
by $A_{i, :}$ and, similarly, column $j$ by $A_{:, j}$.

TODO: Add picture of a matrix

4. **Tensors** - An array of numbers on a regular grid with a variable number of axes is known as a tensor.
A tensor "A" is denoted by __A__. We commonly use tensors with 3 axes, such that __A__ $\in \mathbb{R}^{m\times n \times k}$, and an element of this tensor is denoted using three indices, $A_{i, j, k}$.
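
As a rough illustration of the four objects above, here is a minimal sketch using NumPy (an assumption; the chapter itself does not prescribe a library). The concrete values are made up. Note that NumPy indexes from 0, whereas the mathematical notation in this chapter indexes from 1.

```python
import numpy as np

s = 3.5                               # scalar: a single number
x = np.array([1.0, 2.0, 3.0])         # vector in R^3
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])       # matrix in R^{2 x 3}
T = np.zeros((2, 3, 4))               # tensor with 3 axes, in R^{2 x 3 x 4}

row_0 = A[0, :]   # first row, A_{1,:} in 1-based notation -> [1. 2. 3.]
col_1 = A[:, 1]   # second column, A_{:,2} in 1-based notation -> [2. 5.]

print(x.shape, A.shape, T.shape)      # (3,) (2, 3) (2, 3, 4)
```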

Some important operations on matrices are:

The __Transpose__ of a matrix is the mirror image of the matrix across a
diagonal line called the _main diagonal_, which starts at the upper-left
corner and runs down and to the right. It moves the
element at position *(i, j)* of the matrix to position
*(j, i)* of the transpose, that is, $(A^{T})_{i, j} = A_{j, i}$. __Note__: the transpose of the transpose of a
matrix is the matrix itself.
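
The transpose can be checked numerically with a short sketch, assuming NumPy and an arbitrary example matrix:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])     # shape (2, 3)
A_T = A.T                     # shape (3, 2); element (i, j) moves to (j, i)

print(A_T)
# [[1 4]
#  [2 5]
#  [3 6]]

# Transposing twice gives back the original matrix.
print(np.array_equal(A_T.T, A))   # True
```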

TODO: Add picture of transpose.

The __Addition__ of two matrices, _A + B_, is defined only when the two
matrices have the same shape. It adds the elements
at position *(i, j)* of _A_ and _B_ and assigns their sum to the
*(i, j)*-th element of the resulting matrix, say _C_, so that
$C_{i, j} = A_{i, j} + B_{i, j}$.

__Adding a scalar__ to a matrix, or __multiplying__ a matrix by a scalar, corresponds to performing
that operation on each element of the matrix.
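
A small sketch of matrix addition and scalar operations, assuming NumPy and made-up values:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[10, 20],
              [30, 40]])

C = A + B        # element-wise sum; A and B must have the same shape
D = 2 * A + 1    # the scalar operations are applied to every element

print(C)   # [[11 22]
           #  [33 44]]
print(D)   # [[3 5]
           #  [7 9]]
```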

__Broadcasting__ is a less conventional shorthand used in deep learning
in which we allow the addition of a matrix and a vector, yielding another
matrix. Let's say we compute $C = A + b$ where $b$ is a column vector (a
vector with just one column). This corresponds to adding the value $b_{i}$ to each element
of row $A_{i, :}$, so that $C_{i, j} = A_{i, j} + b_{i}$. In a similar vein, if we were to
add a row vector $b$ to $A$, it would add $b_{j}$
to every element of column $j$ of $A$.
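
Both broadcasting cases can be sketched as follows. This assumes NumPy, whose broadcasting rules happen to match the description above; the values are illustrative only.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])              # shape (2, 3)

b_col = np.array([[10],
                  [20]])               # column vector, shape (2, 1)
b_row = np.array([100, 200, 300])      # row vector, shape (3,)

C_col = A + b_col   # adds b_col[i] to every element of row i
C_row = A + b_row   # adds b_row[j] to every element of column j

print(C_col)   # [[11 12 13]
               #  [24 25 26]]
print(C_row)   # [[101 202 303]
               #  [104 205 306]]
```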

TODO: Add image of broadcasting here


## 2.2 Multiplying Matrices and Vectors

__Matrix multiplication__ is the product of matrices $A$ and $B$, resulting in a matrix $C$.
For this product to be defined, the number of columns in $A$ must equal
the number of rows in $B$. If $A$ is of shape $m \times n$ and $B$ is of shape
$n \times p$, then $C$ will be of shape $m \times p$. The notation for the
matrix product is simply $C = AB$, where the product operation
is defined by $C_{i, j} = \sum_{k} A_{i, k} B_{k, j}$.
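
The definition $C_{i, j} = \sum_{k} A_{i, k} B_{k, j}$ can be written out directly as three nested loops. The sketch below assumes NumPy; `matmul_naive` is a hypothetical helper written for illustration, and it is compared against NumPy's built-in `@` operator.

```python
import numpy as np

def matmul_naive(A, B):
    """Matrix product computed straight from the summation definition."""
    m, n = A.shape
    n_b, p = B.shape
    if n != n_b:
        raise ValueError("columns of A must equal rows of B")
    C = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # shape (2, 3)
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])        # shape (3, 2)

C = matmul_naive(A, B)            # shape (2, 2)
print(C)                          # [[ 4.  5.]
                                  #  [10. 11.]]
print(np.allclose(C, A @ B))      # True
```

In practice one would use `A @ B` rather than explicit loops; the loop version is only meant to mirror the formula.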

The __Hadamard product__ of two matrices of the same shape is denoted by $A \odot B$ and is the element-wise product
of the two matrices. This corresponds to multiplying element _i, j_ of $A$ with
element _i, j_ of $B$, so that $(A \odot B)_{i, j} = A_{i, j} B_{i, j}$.
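
To make the contrast with the matrix product concrete, here is a small sketch assuming NumPy, where `*` is element-wise and `@` is the matrix product:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

hadamard = A * B   # element-wise (Hadamard) product
matprod = A @ B    # matrix product, shown for comparison

print(hadamard)    # [[ 5 12]
                   #  [21 32]]
print(matprod)     # [[19 22]
                   #  [43 50]]
```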

The __dot product__ between two vectors $x$ and $y$ of the same
dimensionality is the matrix product $x^{T}y$. We can think of the matrix
product $C = AB$ as computing each $C_{i, j}$ as the dot product between row $i$ of $A$
and column $j$ of $B$.
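
The row-times-column view of the matrix product can be checked with a short sketch, assuming NumPy and arbitrary example values:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
d = x @ y                      # dot product: 1*4 + 2*5 + 3*6
print(d)                       # 32.0

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])
C = A @ B

# Entry (i, j) of C is the dot product of row i of A and column j of B.
i, j = 1, 0
entry = A[i, :] @ B[:, j]
print(entry, C[i, j])          # 43 43
```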

Matrix products have the following mathematical properties:

* Distributive Nature - $A(B + C) = AB + AC$
* Associativity - $A(BC) = (AB)C$

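Matrix multiplication is not commutative in general ($AB = BA$ does not always hold), but the dot product between two
vectors is commutative: since it is a scalar, it equals its own transpose, so $x^{T}y = (x^{T}y)^{T} = y^{T}x$.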

The transpose of a matrix product has the simple form:

$(AB)^{T} = B^{T}A^{T}$
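
These properties can be spot-checked numerically. The sketch below assumes NumPy and uses small integer matrices chosen for illustration; a numerical check on one example is of course not a proof.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])
C = np.array([[2, 0],
              [0, 3]])
x = np.array([1, 2])
y = np.array([3, 4])

distributive = np.array_equal(A @ (B + C), A @ B + A @ C)
associative = np.array_equal(A @ (B @ C), (A @ B) @ C)
commutative = np.array_equal(A @ B, B @ A)          # fails for this pair
dot_commutative = (x @ y) == (y @ x)
transpose_rule = np.array_equal((A @ B).T, B.T @ A.T)

print(distributive, associative, commutative, dot_commutative, transpose_rule)
# True True False True True
```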

A system of linear equations can be denoted by $Ax = b$, where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^{m}$ and $x \in \mathbb{R}^n$ is the vector of unknowns we would like to solve for. To solve this system of linear equations, linear algebra provides us with the matrix inversion operation.


## 2.3 Identity and Inverse Matrices

The __identity matrix__ is a matrix that does not change any vector when we multiply
that vector by that matrix. We denote the identity matrix that preserves $n$-dimensional
vectors as $I_{n}$. The entries along the main diagonal of an identity matrix
are 1, whilst the rest of the entries are 0.
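
A quick sketch of the identity matrix, assuming NumPy (`np.eye` builds $I_{n}$) and an arbitrary vector:

```python
import numpy as np

I3 = np.eye(3)                    # 3 x 3 identity: ones on the diagonal, zeros elsewhere
x = np.array([2.0, -1.0, 5.0])

print(I3)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]

print(np.array_equal(I3 @ x, x))  # True: multiplying by I leaves x unchanged
```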

TODO: add picture of the identity matrix.

The __matrix inverse__ of $A$ is denoted by $A^{-1}$ and is defined as the matrix such that:

$$ A^{-1} A = I_{n} $$ 

Multiplying both sides of $Ax = b$ by $A^{-1}$ on the left gives us $x = A^{-1}b$. Of course, this process only works if the matrix $A$ has an inverse.
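
As a worked example of $x = A^{-1}b$, here is a minimal sketch assuming NumPy and a made-up invertible $2 \times 2$ system. It also shows `np.linalg.solve`, which solves the system without forming the inverse explicitly and is generally the preferred route numerically.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

A_inv = np.linalg.inv(A)
x_via_inverse = A_inv @ b            # x = A^{-1} b
x_via_solve = np.linalg.solve(A, b)  # same solution, no explicit inverse

print(np.allclose(A_inv @ A, np.eye(2)))          # True: A^{-1} A = I
print(np.allclose(x_via_inverse, [0.8, 1.4]))     # True
print(np.allclose(A @ x_via_solve, b))            # True
```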

## 2.4 Linear Dependence and Span

For $A^{-1}$ to exist, the equation $Ax = b$ must have exactly one solution for every value of
$b$. It is also possible for the system of equations to have no solutions or infinitely
many solutions for some values of $b$. It is not possible to have more than one but fewer than infinitely many solutions, since if $x$ and $y$ are both solutions for a particular $b$ then

$$ z = \alpha x + (1 - \alpha) y $$

is also a solution for any real $\alpha$, because $Az = \alpha Ax + (1 - \alpha) Ay = \alpha b + (1 - \alpha) b = b$. To analyze how many solutions $Ax = b$ has, we can think of the columns of $A$ as specifying different directions we can travel in from the __origin__, and then count the number of ways there are of reaching $b$. Let's look at this in a bit more detail:

\begin{equation}
Ax =
\begin{bmatrix}
A_{1, 1}x_{1} + \cdots + A_{1, n}x_{n} \\
A_{2, 1}x_{1} + \cdots + A_{2, n}x_{n} \\
\vdots \\
A_{m, 1}x_{1} + \cdots + A_{m, n}x_{n}
\end{bmatrix}
\end{equation}
\begin{equation}
Ax =
x_{1} \begin{bmatrix} A_{1, 1} \\ \vdots \\ A_{m, 1} \end{bmatrix}
+ \cdots +
x_{n} \begin{bmatrix} A_{1, n} \\ \vdots \\ A_{m, n} \end{bmatrix}
= \sum_{i} x_{i} A_{:, i}
\end{equation}

This shows us that each $x_{i}$ specifies how far we should move in the direction of the column $A_{:, i}$ of $A$. Such an operation is called a __linear combination__: given a set of vectors $\{v^{(1)}, \dots, v^{(n)}\}$, a linear combination multiplies each vector $v^{(i)}$ by a scalar coefficient $c_{i}$ and adds the results, giving $\sum_{i} c_{i}v^{(i)}$.
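
The view of $Ax$ as a linear combination of the columns of $A$ can be verified with a short sketch, assuming NumPy and made-up values:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])        # m = 3 rows, n = 2 columns
x = np.array([10.0, -1.0])        # one coefficient per column of A

# Weight each column A[:, i] by x[i] and add the results.
combo = sum(x[i] * A[:, i] for i in range(A.shape[1]))

print(combo)                      # [ 8. 26. 44.]
print(np.allclose(combo, A @ x))  # True
```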

The __span__ of a set of vectors is the set of all points obtainable by linear combination of
the original vectors. Determining whether $Ax = b$ has a solution comes down to checking whether $b$ is in
the span of the columns of $A$. This particular span is known as the column space, or the
range, of $A$. For the column space to cover all of $\mathbb{R}^{m}$, $A$ needs at least $m$ columns. This is a necessary
condition but not a sufficient one, because some columns may be redundant. In particular, $A$ must contain a set of $m$ columns that are
linearly independent to be able to reach every point in $\mathbb{R}^{m}$. A set of vectors is linearly
independent if no vector in the set is a linear combination of the other vectors. A set of $m$
linearly independent columns guarantees that $Ax = b$ has a solution for every $b \in \mathbb{R}^{m}$.
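
One way to test numerically whether $b$ lies in the column space of $A$ is to compare ranks: appending $b$ as an extra column leaves the rank unchanged exactly when $b$ is already a linear combination of the columns. The sketch below assumes NumPy; `in_column_space` is a hypothetical helper, and `np.linalg.matrix_rank` relies on a numerical tolerance, so results for nearly dependent columns should be treated with care.

```python
import numpy as np

def in_column_space(A, b):
    """Return True if b is (numerically) in the span of the columns of A."""
    augmented = np.column_stack([A, b])
    return bool(np.linalg.matrix_rank(augmented) == np.linalg.matrix_rank(A))

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])        # two columns cannot cover all of R^3

b_reachable = np.array([2.0, 3.0, 0.0])
b_unreachable = np.array([0.0, 0.0, 1.0])

print(in_column_space(A, b_reachable))     # True
print(in_column_space(A, b_unreachable))   # False

# Linearly dependent columns: the second is twice the first, so the rank is 1.
D = np.array([[1.0, 2.0],
              [2.0, 4.0]])
print(np.linalg.matrix_rank(D))            # 1
```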

But for a matrix to have an inverse, $Ax = b$ must also have at most one solution for each value of $b$.
For that, the matrix must have at most $m$ columns; otherwise there is more than one way of parameterizing each solution.

This means that a matrix needs to be square ($m = n$) and its columns need to be linearly
independent for an inverse to exist.
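
The two conditions (square shape and linearly independent columns) can be sketched as a simple check. This assumes NumPy; `is_invertible` is a hypothetical helper, and because the rank is computed with a numerical tolerance, it is only a rough test for matrices that are close to singular.

```python
import numpy as np

def is_invertible(A):
    """Square matrix whose columns are linearly independent (full rank)."""
    rows, cols = A.shape
    return bool(rows == cols and np.linalg.matrix_rank(A) == cols)

square_independent = np.array([[2.0, 1.0],
                               [1.0, 3.0]])
square_dependent = np.array([[1.0, 2.0],
                             [2.0, 4.0]])       # second column = 2 * first
not_square = np.array([[1.0, 0.0, 2.0],
                       [0.0, 1.0, 3.0]])        # more columns than rows

print(is_invertible(square_independent))   # True
print(is_invertible(square_dependent))     # False
print(is_invertible(not_square))           # False
```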

## Suggested Resources

[The Matrix Cookbook](https://www.ics.uci.edu/~welling/teaching/KernelsICS273B/MatrixCookBook.pdf)

[Shilov](https://cosmathclub.files.wordpress.com/2014/10/georgi-shilov-linear-algebra4.pdf)

