**Information:** *A brief review of Linear Algebra needed in Machine Learning.*

**Written by:** *Zihao Xu*

**Last update date:** *05.20.2020*

# Basic Concepts

## Scalars, Vectors, Matrices and Tensors

### Scalars
- Single number
- Denoted as italic lowercase letter such as $a$, $b$, $c$

### Vectors
- Array of numbers
- Usually consider vectors to be "column vectors"
- Denoted as lowercase letter (often bolded)
    > $\textbf{x}=\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\x_d \end{bmatrix}$
- Dimension is often denoted by $d$, $D$, or $p$
    > $\textbf{x} \in \mathbb{R}^d$
- Access elements via subscript
    > $x_i$ is the $i$-th element

### Matrices
- 2D array of numbers
- Denoted as uppercase letter (often bolded)
    > $\mathbf{A}=\begin{bmatrix}A_{1,1} & \cdots & A_{1,n}\\\vdots & \ddots & \vdots \\A_{m,1} & \cdots & A_{m,n}\end{bmatrix}$
- Dimension is often denoted by $m\times n$
    > $\textbf{A} \in \mathbb{R}^{m \times n}$
- Access elements by double subscript 
    > $A_{i,j}$ or $a_{i,j}$ is the $(i,j)$-th entry of the matrix $\mathbf{A}$
- Access rows or columns via subscript or numpy notation
    > $A_{i,:}$ is the $i$-th row, $A_{:,j}$ is the $j$-th column

### Tensors
- n-D array, array with more than two axes
    > $\textbf{A}\in \mathbb{R}^{c\times w\times h}$
- Other notation is analogous to that of matrices
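
As a minimal sketch of the indexing notation above, assuming NumPy is available (the array values and the tensor shape are made up for illustration). Note that NumPy counts indices from 0, while the notes count from 1.

```python
import numpy as np

# A 2 x 3 example matrix.
A = np.array([[1, 2, 3],
              [4, 5, 6]])

entry = A[0, 1]       # the entry A_{1,2} in the 1-based notation of the notes
first_row = A[0, :]   # the row A_{1,:}
last_col = A[:, 2]    # the column A_{:,3}

# A tensor with three axes, here with shape (c, w, h) = (2, 3, 4).
T = np.zeros((2, 3, 4))

print(entry, first_row, last_col, T.ndim, T.shape)
```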

### Addition of matrices, scalar multiplication and addition
- When $\textbf{A}=[A_{i,j}]$ and $\textbf{B}=[B_{i,j}]$ have the same shape, the sum of them is written as $\textbf{C}=\textbf{A}+\textbf{B}$ where $C_{i,j}=A_{i,j}+B_{i,j}$.
    - In general, matrices of different sizes cannot be added.
    - However, in the context of Deep Learning, notation like $\textbf{C}=\textbf{A}+\textbf{b}$ is allowed, where $C_{i,j}=A_{i,j}+b_{j}$, which means the vector $\mathbf{b}$ is added to each row of the matrix. This avoids the need to define a matrix with $\mathbf{b}$ copied into each row before doing the addition. This implicit copying is called **broadcasting**.
- The product of any $m\times n$ matrix $\mathbf{A}=[A_{i,j}]$ and any scalar $c$ is written as $\mathbf{C}=c\mathbf{A}$ where $C_{i,j}=c\cdot A_{i,j}$.
- Similarly, the addition of any $m\times n$ matrix $\mathbf{A}=[A_{i,j}]$ and any scalar $b$ is written as $\mathbf{C}=\mathbf{A}+b$ where $C_{i,j}=A_{i,j}+b$.
- Common calculation rules
    - $\mathbf{A}+\mathbf{B}=\mathbf{B}+\mathbf{A}$
    - $(\mathbf{A}+\mathbf{B})+\mathbf{C}=\mathbf{A}+(\mathbf{B}+\mathbf{C})$
    - $c(\mathbf{A}+\mathbf{B})=c\mathbf{A}+c\mathbf{B}$
    - $(c+k)\mathbf{A}=c\mathbf{A}+k\mathbf{A}$
    - $c(k\mathbf{A})=ck\mathbf{A}$
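
The addition rules and broadcasting can be tried out directly. This is a small sketch assuming NumPy, whose `+` operator applies broadcasting in the way described above; the numbers are arbitrary.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
B = np.ones((2, 3))
b = np.array([10.0, 20.0, 30.0])   # one entry per column of A

C_sum = A + B        # element-wise sum, shapes must match
C_bcast = A + b      # broadcasting: b is added to every row of A
C_scaled = 2.0 * A   # scalar multiplication
C_shift = A + 1.0    # scalar addition

# Spot-check two of the calculation rules on this example.
assert np.allclose(A + B, B + A)                       # A + B = B + A
assert np.allclose(2.0 * (A + B), 2.0 * A + 2.0 * B)   # c(A + B) = cA + cB
```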

### Multiplication (Standard Product)
- The product $\mathbf{C}=\mathbf{A}\mathbf{B}$ of an $m\times n_1$ matrix $\mathbf{A}=[A_{i,j}]$ and an $n_2\times p$ matrix $\mathbf{B}=[B_{i,j}]$ is defined if and only if $n_1=n_2=n$; then $\mathbf{C}$ is an $m\times p$ matrix with entries
$$C_{i,j}=\underset{k=1}{\overset{n}{\Sigma}}A_{i,k}B_{k,j}$$
- Called standard product or matrix product.
- Common calculation rules
    - $(k\mathbf{A})\mathbf{B}=k(\mathbf{A}\mathbf{B})=\mathbf{A}(k\mathbf{B})$
    - $\mathbf{A}(\mathbf{B}\mathbf{C})=(\mathbf{A}\mathbf{B})\mathbf{C}$
    - $(\mathbf{A}+\mathbf{B})\mathbf{C}=\mathbf{A}\mathbf{C}+\mathbf{B}\mathbf{C}$
    - $\mathbf{C}(\mathbf{A}+\mathbf{B})=\mathbf{C}\mathbf{A}+\mathbf{C}\mathbf{B}$
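
To make the entry-wise definition concrete, here is a sketch that computes the product with explicit loops and compares it with NumPy's `@` operator. The helper name `matmul_from_definition` is hypothetical and the loops are for illustration only; in practice `A @ B` is the idiomatic choice.

```python
import numpy as np

def matmul_from_definition(A, B):
    """Compute C = AB entry by entry from C[i, j] = sum_k A[i, k] * B[k, j]."""
    m, n1 = A.shape
    n2, p = B.shape
    if n1 != n2:
        raise ValueError("inner dimensions must agree")
    C = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            for k in range(n1):
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # 3 x 2
B = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]])     # 2 x 3
C = matmul_from_definition(A, B)                     # 3 x 3

assert np.allclose(C, A @ B)
```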

### Element-wise product
- A matrix containing the products of the corresponding elements of two matrices that have the same size.
- Denoted by $\mathbf{C}=\mathbf{A}\odot\mathbf{B}$ where $C_{i,j}=A_{i,j}\cdot B_{i,j}$
- Also called Hadamard product
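
The two products are easy to confuse in code. As a short sketch assuming NumPy, `*` gives the element-wise product and `@` the standard product; the matrices are arbitrary examples.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

hadamard = A * B   # element-wise: C[i, j] = A[i, j] * B[i, j]
standard = A @ B   # matrix product: C[i, j] = sum_k A[i, k] * B[k, j]

print(hadamard)
print(standard)
```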

### Transposition of Matrices and Vectors
- Denoted as $\mathbf{A}^T$
- The transpose of an $m\times n$ matrix $\mathbf{A}=[A_{i,j}]$ is the $n\times m$ matrix $\mathbf{A}^T$ that has the first row of $\mathbf{A}$ as its first column, the second row as its second column, and so on.
    > $\mathbf{A}^T=[A_{j,i}]=\begin{bmatrix}A_{1,1} & \cdots & A_{m,1}\\\vdots & \ddots & \vdots \\A_{1,n} & \cdots & A_{m,n}\end{bmatrix}$
- For a vector $\mathbf{x}$, the transpose changes it from a column vector to a row vector.
    > $\mathbf{x}=\begin{bmatrix}x_1 \\x_2 \\\vdots \\x_d\end{bmatrix}$, 
    $\mathbf{x}^T=\begin{bmatrix}x_1 & x_2 & \cdots & x_d\end{bmatrix}$
- Rules for transposition
    - $(\mathbf{A}^T)^T=\mathbf{A}$
    - $(\mathbf{A}+\mathbf{B})^T=\mathbf{A}^T+\mathbf{B}^T$
    - $(c\mathbf{A})^T=c\mathbf{A}^T$
    - $(\mathbf{A}\mathbf{B})^T=\mathbf{B}^T\mathbf{A}^T$
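
The transposition rules can be spot-checked numerically. A minimal sketch, assuming NumPy and two arbitrary example matrices with compatible shapes:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])      # 2 x 3
B = np.array([[1, 0],
              [2, 1],
              [0, 3]])         # 3 x 2

lhs = (A @ B).T    # (AB)^T
rhs = B.T @ A.T    # B^T A^T, note the reversed order

assert A.T.shape == (3, 2)
assert np.array_equal(A.T.T, A)      # (A^T)^T = A
assert np.array_equal(lhs, rhs)      # (AB)^T = B^T A^T
```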

### Special Matrices
- Symmetric matrix: $\mathbf{A}^T=\mathbf{A},A_{i,j}=A_{j,i}$
- Skew-symmetric matrix: $\mathbf{A}^T=-\mathbf{A}$
- Triangular matrix:
    - Upper triangular matrix can have non-zero entries only **on and above** the diagonal
    - Lower triangular matrix can have non-zero entries only **on and below** the diagonal
- Identity matrix:
    - Identity matrix of size $n$ is the $n\times n$ square matrix with ones on the main diagonal and zeros elsewhere. It is denoted by $\mathbf{I}_n$ or simply by $\mathbf{I}$ if the size is immaterial or can be trivially determined by the context.
    - Sometimes called unit matrix (depends on the context).
- Scalar matrix:
    - Any multiple of an Identity matrix.
- Diagonal matrix:
    - A square matrix in which the entries outside the diagonal are all zero.
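
A few of these special matrices can be built with standard NumPy helpers. This is only an illustrative sketch; the starting matrix `M` is arbitrary, and it uses the facts that $\mathbf{M}+\mathbf{M}^T$ is always symmetric and $\mathbf{M}-\mathbf{M}^T$ is always skew-symmetric.

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

upper = np.triu(M)           # keeps entries on and above the diagonal
lower = np.tril(M)           # keeps entries on and below the diagonal
I3 = np.eye(3)               # identity matrix of size 3
scalar = 4 * I3              # scalar matrix: a multiple of the identity
diag = np.diag([1, 2, 3])    # diagonal matrix with the given diagonal

sym = M + M.T                # symmetric: sym.T == sym
skew = M - M.T               # skew-symmetric: skew.T == -skew
```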

## Linear System of equations
### Represent linear set of equations in matrix equations
- A set of linear equations can be compactly represented as a matrix equation
- In general:
    $$\begin{aligned}
    a_{1,1}x_1+a_{1,2}x_2+ &\cdots+ a_{1,n}x_n=b_1\\
    a_{2,1}x_1+a_{2,2}x_2+ &\cdots+ a_{2,n}x_n=b_2\\
    &\vdots\\
    a_{m,1}x_1+a_{m,2}x_2+ &\cdots+ a_{m,n}x_n=b_m
    \end{aligned}
    $$
    is **equivalent** to:
    $$\mathbf{Ax}=\mathbf{b}$$
    where $\mathbf{A}\in \mathbb{R}^{m\times n}$,$\mathbf{x}\in \mathbb{R}^n$,$\mathbf{b}\in \mathbb{R}^m$
- Augmented matrix
    $$\tilde{\mathbf{A}}=[\mathbf{A},\mathbf{b}]$$

### Gaussian Elimination
- **Goal**: Bring system to a triangular form
- **Step**: 
    - Elementary operations on equations $\longleftrightarrow$ Operation on matrices
    - Interchange of two equations $\longleftrightarrow$ Interchange two rows in a matrix
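    - Addition of a constant multiple of one equation to another equation $\longleftrightarrow$ Addition of a constant multiple of one row to another row
    - Multiplication of an equation by a non-zero constant $c$ $\longleftrightarrow$ Multiplication of a row by a non-zero constant $c$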
- Row equivalent
    - We call a linear system $S_1$ row-equivalent to a linear system $S_2$ if $S_1$ can be obtained by finitely many row operations from $S_2$.
- Theorem
    - Row-equivalent linear systems have the same set of solutions.
- Solution by Gaussian Elimination
    - System:
        - $\mathbf{Ax}=\mathbf{b}$ with augmented matrix $\tilde{\mathbf{A}}=[\mathbf{A},\mathbf{b}]$
        - $\mathbf{A}\in \mathbb{R}^{m\times n}$,$\mathbf{x}\in \mathbb{R}^n$,$\mathbf{b}\in \mathbb{R}^m$
    - Step 1: 
        - Pivot row: First row of $\tilde{\mathbf{A}}$
        - Pivot: Coefficient of the $x_1$ term in pivot row
        - Use pivot row to eliminate $x_1$ term in all other rows below
    - Step 2:
        - First equation remains as it is
        - Pivot row: Second row of $\tilde{\mathbf{A}}$
        - Pivot: Coefficient of the $x_2$ term in pivot row
        - Use pivot row to eliminate $x_2$ term in all other rows below
    - Step 3:
        - Repeat the procedure which moves the pivot row from $s$ to $s+1$ and set pivot to be the coefficient of $x_{s+1}$ term in pivot row in each step, until $\mathbf{A}$ is in upper triangular form
    - Step 4: 
        - Back-substitution to get $x_n$, $x_{n-1}$, ..., $x_2$, $x_1$ sequentially
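
The four steps above can be sketched as follows. This is a minimal illustration, not a production solver: it assumes a square system with a unique solution, uses exact rational arithmetic from the standard library, and swaps rows when a pivot is zero. The function name `gaussian_elimination` and the example system are made up for this sketch.

```python
from fractions import Fraction

def gaussian_elimination(A, b):
    """Solve Ax = b for a square system with a unique solution."""
    n = len(A)
    # Build the augmented matrix [A, b] with exact fractions.
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]

    for s in range(n):
        # Find a row at or below s with a non-zero entry in column s.
        pivot_row = next((r for r in range(s, n) if M[r][s] != 0), None)
        if pivot_row is None:
            raise ValueError("no unique solution (no usable pivot)")
        M[s], M[pivot_row] = M[pivot_row], M[s]
        # Use the pivot row to eliminate this unknown from all rows below.
        for r in range(s + 1, n):
            factor = M[r][s] / M[s][s]
            M[r] = [mr - factor * ms for mr, ms in zip(M[r], M[s])]

    # Back-substitution: the last unknown first, then upwards.
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return x

A = [[2, 1, -1],
     [-3, -1, 2],
     [-2, 1, 2]]
b = [8, -11, -3]
x = gaussian_elimination(A, b)
print(x)
```

For floating-point input, a common refinement is to pick the largest available pivot in each column (partial pivoting), which this sketch does not do.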

### Classification of solutions of Linear Systems
- At the end of Gaussian elimination, $\mathbf{A}$ is in upper triangular form (row echelon form); write $\tilde{b}_i$ for the entries of the transformed right-hand side
    - $r$ = number of non-zero rows in the reduced coefficient matrix = **rank** of $\mathbf{A}$, $r \le m$
- In general, three possible cases
    - Consistent if $r=m$ or $r<m$ but $\tilde{b}_{r+1}$, ..., $\tilde{b}_{m}$ are all zero
        - One unique solution if consistent and $r=n$
        - Infinitely many solutions if consistent and $r<n$. In this case, choose $x_{r+1}$, ...,$x_n$ arbitrarily.
    - Inconsistent if $r<m$ and at least one of $\tilde{b}_{r+1}$, ..., $\tilde{b}_{m}$ is non-zero
        - No solution 
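
An equivalent way to classify a system, without carrying out the elimination by hand, is to compare the rank of $\mathbf{A}$ with the rank of the augmented matrix. The sketch below assumes NumPy and uses `np.linalg.matrix_rank`, which estimates the rank numerically; the function name `classify_system` and the example systems are hypothetical.

```python
import numpy as np

def classify_system(A, b):
    """Return a short description of the solution set of Ax = b."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    n = A.shape[1]                                        # number of unknowns
    rank_A = np.linalg.matrix_rank(A)
    rank_aug = np.linalg.matrix_rank(np.hstack([A, b]))   # rank of [A, b]
    if rank_aug > rank_A:
        return "no solution"
    if rank_A == n:
        return "unique solution"
    return "infinitely many solutions"

print(classify_system([[1, 1], [1, -1]], [2, 0]))
print(classify_system([[1, 1], [2, 2]], [2, 4]))
print(classify_system([[1, 1], [2, 2]], [2, 5]))
```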

## Linear Independence, Rank of Matrix, Vector Space
### Linear Independence
- Given: Set of vectors {$\mathbf{v}^{(1)},\mathbf{v}^{(2)},\cdots,\mathbf{v}^{(n)}$}
- With scalars $c_1,c_2,\cdots,c_n$, a linear combination of these vectors is of the form:
$$c_1\mathbf{v}^{(1)}+c_2\mathbf{v}^{(2)}+\cdots+c_n\mathbf{v}^{(n)}$$
- The equation $c_1\mathbf{v}^{(1)}+c_2\mathbf{v}^{(2)}+\cdots+c_n\mathbf{v}^{(n)}=0$ always holds for $c_1=c_2=\cdots=c_n=0$
    - If this is the only solution: **the set of vectors is linearly independent**
    - Otherwise: **the set of vectors is linearly dependent**

### Rank of a matrix
- The rank of a matrix $\mathbf{A}$ is the maximum number of linearly independent row vectors of $\mathbf{A}$
- Denoted by **rank A**
- Determine the rank of a matrix
    - Observation: Number of linearly independent row vectors does not change by elementary row operations
    - **Theorem 1**: 
        - Row equivalent matrices have the same rank
    - Strategy: Reduce the matrix to row-echelon form (upper triangular form) and read off the rank directly
    - **Theorem 2**: 
        - $p$ vectors with $n$ components each are linearly independent if the matrix with these vectors as row vectors has rank $p$, but linearly dependent if that rank is less than $p$
    - **Theorem 3**: 
        - The rank of a matrix $\mathbf{A}$ equals the maximum number of linearly independent column vectors of $\mathbf{A}$. Hence $\mathbf{A}$ and its transpose $\mathbf{A}^T$ have the same rank
    - **Theorem 4**:
        - $p$ vectors with $n<p$ components are always linearly dependent.
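
Theorems 2 and 3 suggest a simple numerical test for linear independence: stack the vectors as rows and compare the rank with the number of vectors. A sketch assuming NumPy, with made-up vectors where the third is the sum of the first two:

```python
import numpy as np

vectors = np.array([[1, 0, 2],
                    [0, 1, 1],
                    [1, 1, 3]])   # third row = first row + second row

rank = np.linalg.matrix_rank(vectors)
independent = rank == vectors.shape[0]   # Theorem 2: independent iff rank equals p

# Theorem 3: a matrix and its transpose have the same rank.
rank_transposed = np.linalg.matrix_rank(vectors.T)

print(rank, independent, rank_transposed)
```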

### Vector Space
- **Vector Space**:
    - Denoted by $V$
    - Also called a **linear space**
    - Nonempty set of vectors with the same number of components such that with any two vectors $\mathbf{a}$ and $\mathbf{b}$, all linear combinations $\alpha\mathbf{a}+\beta\mathbf{b}$ ($\alpha,\beta$ are real numbers) are elements of $V$ and these vectors satisfy the rules for vector addition and scalar multiplication.
- **Dimension** of $V$:
    - Maximal number of linearly independent vectors in $V$
- **Basis**:
    - A linearly independent set in $V$ containing the maximum possible number of vectors
    - Number of vectors in the basis = dim $V$
- **Span**:
    - Set of all linear combinations of given vectors $\mathbf{a_1},\mathbf{a_2},\cdots,\mathbf{a_p}$
- **Subspace**:
    - Nonempty set of vectors which forms itself a vector space with respect to addition and scalar multiplication
- **Theorem 5**:
    - The vector space $\mathbb{R}^n$ consisting of all vectors with $n$ components (real) has dimension $n$ 
- **Theorem 6**:
    - The row space and the column space of a matrix $\mathbf{A}$ have the same dimension, equal to rank $\mathbf{A}$

## Solution of linear systems: Existence, Uniqueness
### Submatrix of a matrix $\mathbf{A}$
- Any matrix obtained from $\mathbf{A}$ by omitting some rows or columns

### Theorems for linear systems (homogeneous systems)
- **Homogeneous systems**
    - A linear system of $m$ equations and $n$ unknowns in the form $$\mathbf{Ax}=0$$
    where $\mathbf{A}\in \mathbb{R}^{m\times n}$, $\mathbf{x}\in \mathbb{R}^n$
- Always has the trivial solution $\mathbf{x}=0$
- Nontrivial solutions exist if and only if rank $\mathbf{A}=r<n$ 
- If $r<n$, the non-trivial solutions, together with $\mathbf{x}=0$, form a vector space of dimension $n-r$, called the solution space of the system
- In particular, if $\mathbf{x}_1$ and $\mathbf{x}_2$ are solution vectors, so is $\mathbf{x}=c_1\mathbf{x}_1+c_2\mathbf{x}_2$
- Solution space of the system is called **Null Space**, $\mathbf{Ax}=0$ for every $\mathbf{x}$ from this solution space $N$
    - dim $N$ = **Nullity**
    - rank $\mathbf{A}$ + nullity $\mathbf{A}$ = $n$
- A homogeneous system with fewer equations than unknowns always has non-trivial solutions
    - rank $\mathbf{A}=r\le m <n$
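
The rank-nullity relation can be checked numerically. In this sketch a basis of the null space is read off from a singular value decomposition, a tool not covered in these notes and used here only as a convenient numerical shortcut; it assumes NumPy, and the matrix is an arbitrary rank-1 example.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])   # m = 2 equations, n = 3 unknowns

n = A.shape[1]
rank = np.linalg.matrix_rank(A)

# Rows of Vt beyond the first `rank` rows span the null space of A.
_, _, Vt = np.linalg.svd(A)
null_basis = Vt[rank:].T          # each column is a solution of Ax = 0
nullity = null_basis.shape[1]

assert rank + nullity == n                 # rank A + nullity A = n
assert np.allclose(A @ null_basis, 0.0)    # every basis vector solves Ax = 0
```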

### Theorems for linear systems (non-homogeneous systems)
- **Non-homogeneous systems**:
    - A linear system of $m$ equations and $n$ unknowns in the form $$\mathbf{Ax}=\mathbf{b}$$
    where $\mathbf{A}\in \mathbb{R}^{m\times n}$, $\mathbf{x}\in \mathbb{R}^n$, $\mathbf{b}\in \mathbb{R}^m$ and $\mathbf{b}\ne 0$
- **Existence**:
    - A non-homogeneous linear system is consistent (i.e. has solutions) if and only if the coefficient matrix $\mathbf{A}$ and the augmented matrix $\tilde{\mathbf{A}}$ have the same rank.
- **Uniqueness**:
    - The system has precisely one solution if and only if the common rank $r$ of $\mathbf{A}$ and $\tilde{\mathbf{A}}$ equals $n$
- **Infinitely many solutions**
    - If this common rank is less than $n$, the system has infinitely many solutions. All the solutions can be obtained by determining $r$ unknowns in terms of the remaining $n-r$ unknowns.
- **Solution**
    - If a non-homogeneous system is consistent, then all the solutions are obtained as $\mathbf{x}=\mathbf{x}_o+\mathbf{x}_h$
        - $\mathbf{x}_o$: Any fixed solution of $\mathbf{Ax}=\mathbf{b}$
        - $\mathbf{x}_h$: Runs through all solutions of the homogeneous system $\mathbf{Ax}=0$
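
A small hand-picked example illustrates the decomposition $\mathbf{x}=\mathbf{x}_o+\mathbf{x}_h$. The system, the particular solution and the homogeneous solution below were chosen by hand for this sketch (NumPy assumed); here the rank is 2 and $n=3$, so the homogeneous solutions form a one-dimensional space.

```python
import numpy as np

A = np.array([[1, 1, 1],
              [0, 1, 2]])
b = np.array([3, 3])

x_o = np.array([1, 1, 1])     # one fixed solution of Ax = b
x_h = np.array([1, -2, 1])    # a solution of the homogeneous system Ax = 0

assert np.array_equal(A @ x_o, b)
assert np.array_equal(A @ x_h, np.zeros(2, dtype=int))

# Every x_o + t * x_h is again a solution of Ax = b.
for t in (-2, 0, 5):
    x = x_o + t * x_h
    assert np.array_equal(A @ x, b)
```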

## Determinants, Cramer's Rule
### Determinant of order n
- **Only defined for a square matrix**
- $D=\mathrm{det}\mathbf{A}=\begin{vmatrix}a_{1,1}&a_{1,2}&\cdots&a_{1,n}\\a_{2,1}&a_{2,2}&\cdots&a_{2,n}\\\vdots&\vdots&\ddots&\vdots\\a_{n,1}&a_{n,2}&\cdots&a_{n,n}\end{vmatrix}$
- $n=1$: $D=a_{1,1}$
- $n\ge 2$: expand along the $i$-th row ($i=1,2,\cdots,n$)
    - $D = a_{i,1}C_{i,1}+a_{i,2}C_{i,2}+\cdots+a_{i,n}C_{i,n}$
        - $C_{i,j}=(-1)^{i+j}M_{i,j}$
        - $M_{i,j}$ is the determinant of order $n-1$, of a submatrix of $\mathbf{A}$ obtained from $\mathbf{A}$ by deleting the $i$-th row and the $j$-th column as indicated by the entry $a_{i,j}$
    - $D = \underset{j=1}{\overset{n}{\Sigma}}a_{i,j}C_{i,j}$ 
    - Or alternatively expand along the $j$-th column: $D = \underset{i=1}{\overset{n}{\Sigma}}a_{i,j}C_{i,j}$ where $j=1,2,\cdots,n$
    - **Remark**: Easier for an upper triangular matrix, where the determinant is simply the product of the diagonal entries
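
The recursive definition translates almost literally into code. The sketch below always expands along the first row and works on plain nested lists; the name `det_cofactor` is hypothetical. Its cost grows factorially with $n$, so it is meant for illustration on small matrices only.

```python
def det_cofactor(A):
    """Determinant by cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # Minor: delete the first row and column j (0-based).
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        # With 0-based j the cofactor sign (-1)^(i+j) for the first row is (-1)^j.
        total += (-1) ** j * A[0][j] * det_cofactor(minor)
    return total

A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 10]]
U = [[2, 5, 7],
     [0, 3, 1],
     [0, 0, 4]]   # upper triangular: determinant is 2 * 3 * 4

print(det_cofactor(A), det_cofactor(U))
```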

### General properties of determinants
- Behavior of $n$-th order determinant under elementary row operations
    - Interchange of two rows or two columns multiplies the determinant by $-1$
    - Addition of a multiple of one row/column to another row/column doesn't alter the value of the determinant
    - Multiplication of a row/column by a constant $c$ multiplies the value of the determinant by $c$
        - In particular, $\mathrm{det}(c\mathbf{A})=c^n\mathrm{det}(\mathbf{A})$, since all $n$ rows are multiplied by $c$
    - Transposition leaves the determinant the same: $\mathrm{det}(\mathbf{A}^T)=\mathrm{det}(\mathbf{A})$
    - $\mathrm{det}(\mathbf{AB})=\mathrm{det}(\mathbf{A})\mathrm{det}(\mathbf{B})$
    - $\mathrm{det}(\mathbf{A+B})\ne \mathrm{det}(\mathbf{A})+\mathrm{det}(\mathbf{B})$ (In general)
    - A zero row or zero column renders the value of $\mathrm{det}=0$
    - Proportional rows or columns render the value of $\mathrm{det}=0$
- For practical purposes, to evaluate a determinant of $n$-th order:
    - reduce the matrix to upper triangular form, keeping track of the operations that change the determinant
    - multiply the elements on the diagonal to calculate the determinant
- Relationship between **Rank** and **Determinant**
    - An $m\times n$ matrix $\mathbf{A}=[A_{i,j}]$ has rank $r\ge 1$ if and only if it has an $r\times r$ submatrix with non-zero determinant, while every square submatrix with more than $r$ rows has determinant zero
    - In particular, if $\mathbf{A}$ is square with size $n\times n$, it has $\mathrm{rank}=n$ if and only if $\mathrm{det}\ne 0$
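
The properties listed above can be spot-checked numerically. This sketch assumes NumPy and uses `np.linalg.det`, which works in floating point, so the comparisons use `np.isclose` rather than exact equality; the two matrices are arbitrary examples.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 10.0]])
B = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 0.0],
              [0.0, 1.0, 4.0]])
det = np.linalg.det
n, c = 3, 2.0

swapped = A[[1, 0, 2], :]          # interchange the first two rows
added = A.copy()
added[2] += 5.0 * added[0]         # add 5 times row 1 to row 3

assert np.isclose(det(swapped), -det(A))          # row swap flips the sign
assert np.isclose(det(added), det(A))             # row addition changes nothing
assert np.isclose(det(c * A), c ** n * det(A))    # det(cA) = c^n det(A)
assert np.isclose(det(A.T), det(A))               # det(A^T) = det(A)
assert np.isclose(det(A @ B), det(A) * det(B))    # det(AB) = det(A) det(B)
```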

### Cramer's rule (Solution of linear system by determinants)
- If a linear system of $n$ equations for $n$ unknowns:$$\mathbf{Ax}=\mathbf{b}$$
    where $\mathbf{A}\in \mathbb{R}^{n\times n}$, $\mathbf{x}\in \mathbb{R}^n$, $\mathbf{b}\in \mathbb{R}^n$ has non-zero coefficient determinant ($\mathrm{det}(\mathbf{A})=D\ne 0$), it has precisely one solution
- The solution is given by $x_1=\frac{D_1}{D},x_2=\frac{D_2}{D},\cdots,x_n=\frac{D_n}{D}$ where $D_k$ is the determinant of the matrix obtained from $\mathbf{A}$ by replacing the $k$-th column by the column with entries $b_1,b_2,\cdots,b_n$
- If the system is homogeneous and $D\ne 0$, it has only the trivial solution. If $D=0$, the homogeneous system also has non-trivial solutions.
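
Cramer's rule can be written down in a few lines. The sketch below assumes NumPy and computes each $D_k$ with `np.linalg.det`; the function name `cramer_solve` and the example system are made up. It is shown for illustration, since elimination is normally far cheaper for larger $n$.

```python
import numpy as np

def cramer_solve(A, b):
    """Solve a square system Ax = b with Cramer's rule (requires det(A) != 0)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    D = np.linalg.det(A)
    if np.isclose(D, 0.0):
        raise ValueError("det(A) is zero; Cramer's rule does not apply")
    x = np.empty(len(b))
    for k in range(len(b)):
        A_k = A.copy()
        A_k[:, k] = b                     # replace the k-th column by b
        x[k] = np.linalg.det(A_k) / D     # x_k = D_k / D
    return x

A = [[2, 1, -1],
     [-3, -1, 2],
     [-2, 1, 2]]
b = [8, -11, -3]
x = cramer_solve(A, b)
print(x)
```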

## Inverse of matrix, Gauss-Jordan eliminations
### Inverse of matrix
- Consider only square matrices
- Inverse of an $n\times n$ matrix $\mathbf{A}=[A_{i,j}]$ is $\mathbf{A}^{-1}$ such that:
$$\mathbf{A}\mathbf{A}^{-1}=\mathbf{A}^{-1}\mathbf{A}=\mathbf{I}_n$$
- If $\mathbf{A}$ has inverse: $\mathbf{A}$ is non-singular, otherwise $\mathbf{A}$ is singular
    - Singular matrices are similar to zeros (similar to the idea that $0$ does not have an inverse)
    - Called "singular" because a random matrix is unlikely to be singular just like choosing a random number is unlikely to be $0$
- Motivation:
    - $\mathbf{Ax}=\mathbf{b}\Rightarrow \mathbf{x}=\mathbf{A}^{-1}\mathbf{b}$ (usually not suitable for numerical calculation)
- **Theorem**: Existence of $\mathbf{A}^{-1}$
    - The inverse $\mathbf{A}^{-1}$ of an $n\times n$ matrix $\mathbf{A}$ exists if and only if the $\mathrm{rank}\mathbf{A}=n$, thus if and only if $\mathrm{det}\mathbf{A}\ne 0$ 
- Formula for Inverse of $\mathbf{A}$
    - $\mathbf{A}^{-1}=\frac{1}{\mathrm{det}\mathbf{A}}[C_{i,j}]^T$
    - $C_{i,j}=(-1)^{i+j}M_{i,j}$
    - $M_{i,j}$ is the determinant of order $n-1$, of a submatrix of $\mathbf{A}$ obtained from $\mathbf{A}$ by deleting the $i$-th row and the $j$-th column as indicated by the entry $a_{i,j}$
    - Usually used only for $2\times 2$ matrices
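
For a $2\times 2$ matrix with rows $(a, b)$ and $(c, d)$, the transposed cofactor matrix is $\begin{bmatrix}d & -b\\-c & a\end{bmatrix}$, which gives a closed-form inverse. The sketch below assumes integer entries so that the result can be kept as exact fractions; the function name `inverse_2x2` is hypothetical.

```python
from fractions import Fraction

def inverse_2x2(A):
    """Inverse of a 2 x 2 integer matrix via the cofactor formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    # Transposed cofactor matrix [[d, -b], [-c, a]], divided by det(A).
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

A = [[4, 7],
     [2, 6]]
A_inv = inverse_2x2(A)
print(A_inv)
```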

### Gauss-Jordan elimination
- Method to find the inverse
- Build a matrix $[\mathbf{A}|\mathbf{I}]$ containing $\mathbf{A}$ and the identity matrix $\mathbf{I}$
- Perform Gaussian elimination on $\mathbf{A}$, applying the same row operations to $\mathbf{I}$, until the result has the form $[\mathbf{I}|\mathbf{B}]$. Then $\mathbf{B}=\mathbf{A}^{-1}$
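
The procedure can be sketched as follows, using exact fractions from the standard library so the result is not affected by rounding. The function name `gauss_jordan_inverse` is made up for this sketch; rows are swapped when a pivot is zero, and a matrix without a usable pivot is reported as singular.

```python
from fractions import Fraction

def gauss_jordan_inverse(A):
    """Invert a square matrix by reducing [A | I] to [I | B]."""
    n = len(A)
    # Build the block matrix [A | I].
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for s in range(n):
        pivot_row = next((r for r in range(s, n) if M[r][s] != 0), None)
        if pivot_row is None:
            raise ValueError("matrix is singular")
        M[s], M[pivot_row] = M[pivot_row], M[s]
        # Scale the pivot row so that the pivot becomes 1.
        pivot = M[s][s]
        M[s] = [v / pivot for v in M[s]]
        # Eliminate column s from every other row (above and below).
        for r in range(n):
            if r != s and M[r][s] != 0:
                factor = M[r][s]
                M[r] = [vr - factor * vs for vr, vs in zip(M[r], M[s])]
    # The right half now holds the inverse.
    return [row[n:] for row in M]

A = [[4, 7],
     [2, 6]]
B = gauss_jordan_inverse(A)
print(B)
```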

## Norms
### Definition
- The "size" of a vector or matrix. 
- Intuitively, the norm of a vector $\mathbf{x}$ measures the distance from the origin to the point $\mathbf{x}$.
- Functions mapping vectors or matrices to non-negative values
- Formally, a norm is any function $f$ that satisfies the following properties:
    - $f(\mathbf{x})=0\Rightarrow \mathbf{x}=0$
    - $f(\mathbf{x}+\mathbf{y})\le f(\mathbf{x})+f(\mathbf{y})$ (the triangle inequality)
    - $\forall \alpha \in \mathbb{R},f(\alpha\mathbf{x})=|\alpha|f(\mathbf{x})$

### $L^p$ norm
- $||\mathbf{x}||_p=\left(\underset{i}{\Sigma}|x_i|^p\right)^{\frac{1}{p}}$
- $p=2$: **Euclidean Norm**, used so frequently in machine learning that it is often denoted simply as $||\mathbf{x}||$ with the subscript $2$ omitted. It is also common to measure the size of a vector using the squared $L^2$ norm, which can be calculated simply as $\mathbf{x}^T\mathbf{x}$
    - In most machine learning cases, the squared $L^2$ norm is more convenient to work with mathematically and computationally than the $L^2$ norm itself. One example is that each derivative of the squared $L^2$ norm with respect to each element of $\mathbf{x}$ depends only on the corresponding element of $\mathbf{x}$.
- $p=1$: commonly used in machine learning when the difference between zero and nonzero elements is very important. Every time an element of $\mathbf{x}$ moves away from 0 by $\epsilon$, the $L^1$ norm increases by $\epsilon$
    - Often used as a substitute for counting the number of nonzero entries
- $p=\infty$: **Max Norm**, simplifies to the absolute value of the element with the largest magnitude in the vector
    $$
    ||\mathbf{x}||_{\infty}=\underset{i}{\mathrm{max}}|x_i|
    $$
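
The norms above can be computed directly from their definitions. A minimal sketch assuming NumPy; the helper `lp_norm` is hypothetical and the vector is an arbitrary example. The same values are available through `np.linalg.norm(x, p)`.

```python
import numpy as np

def lp_norm(x, p):
    """L^p norm from the definition (sum_i |x_i|^p)^(1/p), for finite p >= 1."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(np.abs(x) ** p) ** (1.0 / p))

x = np.array([3.0, -4.0, 0.0])

l1 = lp_norm(x, 1)                  # sum of absolute values
l2 = lp_norm(x, 2)                  # Euclidean norm
linf = float(np.max(np.abs(x)))     # max norm
squared_l2 = float(x @ x)           # squared L2 norm as x^T x

assert np.isclose(l2, np.linalg.norm(x))
assert np.isclose(linf, np.linalg.norm(x, np.inf))
print(l1, l2, linf, squared_l2)
```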

### Frobenius Norm
- Used to measure the size of a matrix
- $||\mathbf{A}||_{F}=\sqrt{\underset{i,j}{\Sigma}A^2_{i,j}}$
- Analogous to the $L^2$ norm of a vector
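
As a quick check of the analogy with the $L^2$ norm, the Frobenius norm of a matrix equals the Euclidean norm of the same entries laid out as one long vector. A sketch assuming NumPy, with an arbitrary example matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

fro = float(np.sqrt(np.sum(A ** 2)))   # definition: sqrt of the sum of squared entries

assert np.isclose(fro, np.linalg.norm(A, "fro"))     # NumPy's Frobenius norm
assert np.isclose(fro, np.linalg.norm(A.ravel()))    # L2 norm of the flattened matrix
```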

## Inner Product Space, Linear Transformations
### Inner Product
- A binary operation that associates each pair of vectors in the space with a scalar quantity known as the inner product of the vectors, often denoted using angle brackets (as in $\langle \mathbf{a},\mathbf{b}\rangle$).
- **Dot product**: A widely used inner product on a finite-dimensional Euclidean space. It applies to two vectors with the same number of components.
    - $\langle\mathbf{a},\mathbf{b}\rangle=(\mathbf{a},\mathbf{b})=\mathbf{a}\bullet\mathbf{b}=\mathbf{a}^T\mathbf{b}=\underset{i=1}{\overset{n}{\Sigma}}a_ib_i$
    - Two vectors $\mathbf{a}, \mathbf{b}$ are called **orthogonal** if $\mathbf{a}\bullet\mathbf{b}=0$
    - Can be written in terms of norms: $\mathbf{a}^T\mathbf{b}=||\mathbf{a}||_2||\mathbf{b}||_2\mathrm{cos}\theta$, where $\theta$ is the angle between $\mathbf{a}$ and $\mathbf{b}$
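
The relation between the dot product, the norms and the angle can be used to recover $\theta$. A small sketch assuming NumPy, with hand-picked vectors: `a` and `b` enclose an angle of $\pi/4$, and `u` and `v` are orthogonal.

```python
import numpy as np

a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])

dot = float(a @ b)                                         # a^T b
cos_theta = dot / (np.linalg.norm(a) * np.linalg.norm(b))
theta = float(np.arccos(cos_theta))                        # angle in radians

u = np.array([1.0, 2.0])
v = np.array([-2.0, 1.0])
orthogonal = bool(np.isclose(u @ v, 0.0))                  # dot product is zero

print(dot, theta, orthogonal)
```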

### Abstract Real Inner Product Space
- A real vector space $V$ is called a real inner product space if it is equipped with an inner product $(\mathbf{a},\mathbf{b})$ satisfying
    - Linearity: $(q_1\mathbf{a}+q_2\mathbf{b},\mathbf{c})=q_1(\mathbf{a},\mathbf{c})+q_2(\mathbf{b},\mathbf{c})$ where $\mathbf{a},\mathbf{b},\mathbf{c}\in V$, $q_1,q_2\in \mathbb{R}$
    - Symmetry: $(\mathbf{a},\mathbf{b})=(\mathbf{b},\mathbf{a})$
    - Positive-definiteness: $(\mathbf{a},\mathbf{a})\ge 0$, and $(\mathbf{a},\mathbf{a})=0$ if and only if $\mathbf{a}=0$