**Information:** *A brief review of Linear Algebra used in Machine Learning.*

**Written by:** *Zihao Xu*

**Last update date:** *05.19.2020*

import numpy as np
import matplotlib.pyplot as plt

# Basic Concepts

## Scalars, Vectors, Matrices and Tensors

### Scalars
- Single number
- Denoted by an italic lowercase letter such as $a$, $b$, $c$

### Vectors
- Array of numbers
- Usually treated as column vectors
- Denoted by a lowercase letter (often bold)
    > $\textbf{x}=\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\x_d \end{bmatrix}$
- Dimension is often denoted by $d$, $D$, or $p$
    > $\textbf{x} \in \mathbb{R}^d$
- Access elements via subscript
    > $x_i$ is the $i$-th element
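
As a small illustration of the vector notation above, here is a minimal NumPy sketch. The values and the variable names (`x`, `x_col`, `x_1`) are illustrative assumptions, not part of the original notes. Note that NumPy indexes from 0, so the mathematical element $x_1$ corresponds to `x[0]`.

```python
import numpy as np

# A vector in R^3, stored as a 1-D array (illustrative values).
x = np.array([1.0, 2.0, 3.0])
d = x.shape[0]            # dimension d

# NumPy indexing is 0-based: the math element x_1 is x[0].
x_1 = x[0]

# An explicit column vector has shape (d, 1).
x_col = x.reshape(-1, 1)
```

A 1-D array has no row or column orientation of its own; reshaping to `(d, 1)` is one way to make the column-vector convention explicit when it matters.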

### Matrices
- 2D array of numbers
- Denoted by an uppercase letter (often bold)
    > $\mathbf{A}=\begin{bmatrix}
    A_{1,1} & \cdots & A_{1,n}\\
    \vdots & \ddots & \vdots \\
    A_{m,1} & \cdots & A_{m,n}
    \end{bmatrix}$
- Dimension is often denoted by $m\times n$
    > $\textbf{A} \in \mathbb{R}^{m \times n}$
- Access elements by double subscript
    > $A_{i,j}$ or $a_{i,j}$ is the $(i,j)$-th entry of the matrix
- Access rows or columns via subscript or numpy notation
    > $A_{i,:}$ is the $i$-th row, $A_{:,j}$ is the $j$-th column
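
The element, row, and column access described above maps directly onto NumPy slicing. The following is a minimal sketch with an assumed $2\times 3$ example matrix; the variable names are illustrative, and indices are shifted by one because NumPy counts from 0.

```python
import numpy as np

# A 2 x 3 matrix (m = 2 rows, n = 3 columns), illustrative values.
A = np.array([[1, 2, 3],
              [4, 5, 6]])
m, n = A.shape

# Entry A_{1,2} in math notation (row 1, column 2) is A[0, 1] in NumPy.
a_12 = A[0, 1]

# First row A_{1,:} and second column A_{:,2}.
row_1 = A[0, :]
col_2 = A[:, 1]
```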

### Tensors
- n-D array, array with more than two axes
    > $\textbf{A}\in \mathbb{R}^{c\times w\times h}$
- Other notation is analogous to that for matrices
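
A tensor can be sketched in NumPy as an array with three or more axes. The shape below ($c=3$, $w=4$, $h=5$) and the name `T` are assumptions chosen only for illustration.

```python
import numpy as np

# A tensor in R^{c x w x h} with c = 3, w = 4, h = 5 (all zeros here).
T = np.zeros((3, 4, 5))
num_axes = T.ndim

# Fixing the first index leaves a w x h matrix.
slice_0 = T[0, :, :]

# Fixing the last two indices leaves a length-c vector.
fiber = T[:, 1, 2]
```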

### Matrix addition, scalar multiplication, and scalar addition
- When $\textbf{A}=[A_{i,j}]$ and $\textbf{B}=[B_{i,j}]$ have the same shape, their sum is written as $\textbf{C}=\textbf{A}+\textbf{B}$ where $C_{i,j}=A_{i,j}+B_{i,j}$.
    - In general, matrices of different sizes cannot be added.
    - However, in the context of Deep Learning, notation like $\textbf{C}=\textbf{A}+\textbf{b}$ is allowed, where $C_{i,j}=A_{i,j}+b_{j}$. This means the vector $\mathbf{b}$ is added to each row of the matrix, which avoids the need to define a matrix with $\mathbf{b}$ copied into each row before doing the addition. This implicit copying is called **broadcasting**.
- The product of any $m\times n$ matrix $\mathbf{A}=[A_{i,j}]$ and any scalar $c$ is written as $\mathbf{C}=c\mathbf{A}$ where $C_{i,j}=c\cdot A_{i,j}$.
- Similarly, the addition of any $m\times n$ matrix $\mathbf{A}=[A_{i,j}]$ and any scalar $b$ is written as $\mathbf{C}=\mathbf{A}+b$ where $C_{i,j}=A_{i,j}+b$.
- Common calculation rules
    - $\mathbf{A}+\mathbf{B}=\mathbf{B}+\mathbf{A}$
    - $(\mathbf{A}+\mathbf{B})+\mathbf{C}=\mathbf{A}+(\mathbf{B}+\mathbf{C})$
    - $c(\mathbf{A}+\mathbf{B})=c\mathbf{A}+c\mathbf{B}$
    - $(c+k)\mathbf{A}=c\mathbf{A}+k\mathbf{A}$
    - $c(k\mathbf{A})=(ck)\mathbf{A}$
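
The operations above can be tried out in NumPy, which applies broadcasting automatically. This is a minimal sketch with assumed example values. `np.tile` is used only to show the explicit row-copying that broadcasting avoids.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
B = np.ones((2, 3))
b = np.array([10.0, 20.0, 30.0])

# Same-shape addition: C_{i,j} = A_{i,j} + B_{i,j}.
C_sum = A + B

# Broadcasting: b is added to every row, C_{i,j} = A_{i,j} + b_j.
C_row = A + b

# The explicit equivalent: copy b into each row first, then add.
C_tiled = A + np.tile(b, (2, 1))

# Scalar multiplication and scalar addition act on every entry.
C_scaled = 2.0 * A
C_shift = A + 1.0

# Spot-check of the distributive rule c(A + B) = cA + cB.
rule_holds = np.allclose(2.0 * (A + B), 2.0 * A + 2.0 * B)
```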

### Multiplication (Standard Product)
- The product $\mathbf{C}=\mathbf{A}\mathbf{B}$ of an $m\times n_1$ matrix $\mathbf{A}=[A_{i,j}]$ and an $n_2\times p$ matrix $\mathbf{B}=[B_{i,j}]$ is defined if and only if $n_1=n_2=n$. In that case $\mathbf{C}$ is an $m\times p$ matrix with entries
$$C_{i,j}=\sum_{k=1}^{n}A_{i,k}B_{k,j}$$
- Called the standard product or matrix product.
- Common calculation rules
    - $(k\mathbf{A})\mathbf{B}=k(\mathbf{A}\mathbf{B})=\mathbf{A}(k\mathbf{B})$
    - $\mathbf{A}(\mathbf{B}\mathbf{C})=(\mathbf{A}\mathbf{B})\mathbf{C}$
    - $(\mathbf{A}+\mathbf{B})\mathbf{C}=\mathbf{A}\mathbf{C}+\mathbf{B}\mathbf{C}$
    - $\mathbf{C}(\mathbf{A}+\mathbf{B})=\mathbf{C}\mathbf{A}+\mathbf{C}\mathbf{B}$
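
To make the entry-wise definition concrete, the sketch below implements the sum over $k$ with explicit loops and compares it to NumPy's `@` operator. The helper name `matmul_naive` and the example matrices are assumptions for illustration. In practice `A @ B` (or `np.matmul`) would be used instead of the loops.

```python
import numpy as np

def matmul_naive(A, B):
    """Standard product via C_{i,j} = sum_k A_{i,k} B_{k,j} (illustrative helper)."""
    m, n1 = A.shape
    n2, p = B.shape
    if n1 != n2:
        raise ValueError("inner dimensions must match")
    C = np.zeros((m, p), dtype=np.result_type(A, B))
    for i in range(m):
        for j in range(p):
            for k in range(n1):
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[1, 2, 3],
              [4, 5, 6]])      # 2 x 3
B = np.array([[1, 0],
              [0, 1],
              [1, 1]])         # 3 x 2

C = A @ B                      # 2 x 2 result
C_loop = matmul_naive(A, B)

# Reversing the order gives a 3 x 3 matrix, so AB and BA differ here.
BA_shape = (B @ A).shape
```

The last line is a reminder that the rules listed above do not include commutativity: for these shapes $\mathbf{A}\mathbf{B}$ and $\mathbf{B}\mathbf{A}$ do not even have the same size.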

### Element-wise product
- Produces a matrix whose entries are the products of the corresponding entries of two matrices of the same size.
- Denoted by $\mathbf{C}=\mathbf{A}\odot\mathbf{B}$ where $C_{i,j}=A_{i,j}\cdot B_{i,j}$
- Also called the Hadamard product
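
In NumPy the `*` operator on arrays computes the element-wise product, while `@` computes the standard product. The following minimal sketch, using assumed $2\times 2$ example matrices, shows that the two generally give different results.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# Element-wise (Hadamard) product: C_{i,j} = A_{i,j} * B_{i,j}.
hadamard = A * B

# Standard matrix product, for comparison.
standard = A @ B
```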