# Linear Algebra Concepts:

- Dot Product: $\langle u, Av \rangle = u^TAv = \sum_{i=1}^{d}u_i(Av)_i = \sum_{i=1}^{d}\sum_{j=1}^{d}u_iA_{ij}v_j$
- Norm: $\|u\|^2 = \langle u, u \rangle = u_1^2 + \dots + u_d^2$
- Determinants: $\det(AB) = \det(A) \cdot \det(B)$, $\det(A^{-1}) = (\det(A))^{-1}$
- Eigenvalues: $\lambda$ is an eigenvalue of $A$ if there exists a nonzero vector $v$ such that $Av = \lambda v$
    - Only defined for square matrices
- Orthogonal Matrices:
    - $\|Ov\| = \|v\|$
    - $O^{-1} = O^T$
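
The identities above can be checked numerically. This is a minimal NumPy sketch, assuming random test matrices; the variable names and the use of a QR decomposition to obtain an orthogonal matrix are illustrative choices, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
u, v = rng.normal(size=d), rng.normal(size=d)
A, B = rng.normal(size=(d, d)), rng.normal(size=(d, d))

# <u, Av> as a matrix product and as the explicit double sum
bilinear = u @ A @ v
double_sum = sum(u[i] * A[i, j] * v[j] for i in range(d) for j in range(d))

# Squared norm equals the inner product of u with itself
norm_sq = np.linalg.norm(u) ** 2

# Determinant identities
det_product = np.linalg.det(A @ B)
det_inverse = np.linalg.det(np.linalg.inv(A))

# An orthogonal matrix, taken from the QR decomposition of a random matrix
O, _ = np.linalg.qr(rng.normal(size=(d, d)))

print(np.isclose(bilinear, double_sum))
print(np.isclose(norm_sq, u @ u))
print(np.isclose(det_product, np.linalg.det(A) * np.linalg.det(B)))
print(np.isclose(det_inverse, 1.0 / np.linalg.det(A)))
print(np.isclose(np.linalg.norm(O @ v), np.linalg.norm(v)))
print(np.allclose(np.linalg.inv(O), O.T))
```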

## Positive Semi-definite Matrices (psd)
- A matrix $M$ is positive semi-definite (psd) if $\langle x, Mx\rangle \geq 0$ for all $x$
- A matrix $M$ is positive definite (pd) if $\langle x, Mx\rangle > 0$ for all $x \neq 0$

## Symmetric Matrices
- Can be **diagonalised**: $S = ODO^T$ where $O$ is an orthogonal matrix and $D = Diag(\lambda_1,\dots,\lambda_d)$ is a diagonal matrix
- Is **psd** iff all eigenvalues satisfy $\lambda_i \geq 0$
- Is **pd** iff all eigenvalues satisfy $\lambda_i > 0$
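
As a rough illustration of the diagonalisation and the eigenvalue criteria, the sketch below uses `np.linalg.eigh`, which is intended for symmetric matrices. The helpers `is_psd` and `is_pd` and the tolerance `tol` are hypothetical names and values introduced here, and the test matrix is built to be psd by construction.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 4))
S = X @ X.T  # symmetric and psd by construction: <x, Sx> = ||X^T x||^2 >= 0

# eigh returns eigenvalues in ascending order and
# eigenvectors as the columns of an orthogonal matrix O
eigenvalues, O = np.linalg.eigh(S)
D = np.diag(eigenvalues)

def is_psd(M, tol=1e-10):
    """Check a symmetric matrix for psd-ness via its smallest eigenvalue."""
    return bool(np.linalg.eigvalsh(M).min() >= -tol)

def is_pd(M, tol=1e-10):
    """Check a symmetric matrix for pd-ness via its smallest eigenvalue."""
    return bool(np.linalg.eigvalsh(M).min() > tol)

print(np.allclose(O @ D @ O.T, S))   # S = O D O^T
print(is_psd(S), is_pd(np.eye(3)))
```

The tolerance guards against eigenvalues that are zero in exact arithmetic but come out as tiny negative numbers in floating point.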

## Invertible Matrices

# Calculus Concepts:

### First Order Derivative:
$$
f(x+\epsilon) \approx f(x) + \langle \nabla f(x), \epsilon \rangle
$$

### Second Order Derivative
$$
f(x+\epsilon) \approx f(x) + \langle \nabla f(x), \epsilon \rangle + \frac{1}{2}\langle\epsilon, \text{Hess}_f(x)\, \epsilon\rangle
$$
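
To see how much the second-order term helps, the two approximations can be compared on a concrete function. The sketch below assumes the example function $f(x) = e^{x_0} + x_0 x_1^2$, chosen here only because its gradient and Hessian are easy to write by hand; the evaluation point and the perturbation are likewise arbitrary.

```python
import numpy as np

def f(x):
    return np.exp(x[0]) + x[0] * x[1] ** 2

def grad_f(x):
    return np.array([np.exp(x[0]) + x[1] ** 2, 2.0 * x[0] * x[1]])

def hess_f(x):
    return np.array([[np.exp(x[0]), 2.0 * x[1]],
                     [2.0 * x[1],   2.0 * x[0]]])

x = np.array([0.5, -1.0])
eps = np.array([0.01, 0.02])

exact = f(x + eps)
first_order = f(x) + grad_f(x) @ eps
second_order = first_order + 0.5 * eps @ hess_f(x) @ eps

# The first-order error shrinks like ||eps||^2, the second-order error like ||eps||^3
err_first = abs(exact - first_order)
err_second = abs(exact - second_order)
print(err_first, err_second)
```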

# Gradient Descent

## Convexity
A function $f$ is **convex** if, for all $x, y$ and all $\lambda \in [0, 1]$:
$$
f(\lambda x + (1 - \lambda) y) \leq \lambda f(x) + (1-\lambda)f(y)
$$
Second Derivative Criterion:
- If $\text{Hess}_f(x)$ is **psd** for all $x$, then $f$ is **convex**
- If $\text{Hess}_f(x)$ is **pd** for all $x$, then $f$ is **strictly convex**
    - For a strictly convex function, a local minimum is the unique global minimum
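
The defining inequality can be probed numerically by sampling random points. The sketch below is only a counterexample search: finding no violation does not prove convexity. The function name `find_convexity_violation`, the sampling distribution, and the tolerance are assumptions made for illustration.

```python
import numpy as np

def find_convexity_violation(f, dim, n_trials=1000, seed=0, tol=1e-12):
    """Search for x, y, lam with f(lam*x + (1-lam)*y) > lam*f(x) + (1-lam)*f(y).

    Returns a violating triple (x, y, lam), or None if no violation was sampled.
    """
    rng = np.random.default_rng(seed)
    for _ in range(n_trials):
        x, y = rng.normal(size=dim), rng.normal(size=dim)
        lam = rng.uniform()
        lhs = f(lam * x + (1 - lam) * y)
        rhs = lam * f(x) + (1 - lam) * f(y)
        if lhs > rhs + tol:
            return x, y, lam
    return None

# ||x||^2 is convex, so no violation should turn up; its negative is concave
print(find_convexity_violation(lambda x: x @ x, dim=3))
print(find_convexity_violation(lambda x: -(x @ x), dim=3) is not None)
```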
    
### Convexity Preserving Operations:
- Nonnegative Weighted Sums
    - If $f_1,\dots,f_n$ are convex functions and $w_1, \dots, w_n \geq 0$ then,  
    $f(x) = w_1f_1(x) + \dots + w_nf_n(x)$ is also convex
- Composition with an affine mapping
    - $g(x) = f(Ax + b)$, if $f$ is convex, then $g$ is also convex
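- Pointwise Maximum
    - If $f_1,\dots,f_n$ are convex functions, then $f(x) = \max\{f_1(x), \dots, f_n(x)\}$ is also convex
- Restriction to a line
    - $f$ is convex iff $g(t) = f(x + tv)$ is convex in $t$ for every point $x$ and direction $v$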

## Advantages of Probabilistic Output
- Making an error of one type may be more **"expensive"** than another (e.g. cancer vs. no cancer)
- Can detect when a prediction is **uncertain**, refuse to classify, and pass the case to a human expert
- Easier to **combine several probabilistic outputs** given by several models
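
The first two points can be sketched as a decision rule on top of a predicted probability. Everything below is an assumption for illustration: the function `decide`, the cost values, and the rule of deferring when the probability falls within `reject_band` of the cost-derived threshold.

```python
def decide(p_positive, cost_false_negative=10.0, cost_false_positive=1.0,
           reject_band=0.05):
    """Pick the label with the lower expected cost, or defer to a human expert.

    Expected cost of predicting negative: p * cost_false_negative
    Expected cost of predicting positive: (1 - p) * cost_false_positive
    Predicting positive is cheaper when p exceeds the threshold below.
    """
    threshold = cost_false_positive / (cost_false_positive + cost_false_negative)
    if abs(p_positive - threshold) < reject_band:
        return "defer"  # too close to the decision boundary
    return "positive" if p_positive > threshold else "negative"

for p in (0.01, 0.1, 0.5):
    print(p, decide(p))
```

With a false negative assumed ten times as costly as a false positive, the threshold drops to $1/11$, so even a moderately small probability leads to a positive prediction.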

### Logistic Regression

## Classification
### Logistic Regression
$$
\begin{aligned}
P(y=y_0 \mid x) &= \frac{1}{1+\exp(-y_0\langle\beta,x\rangle)}, \quad y_0 \in \{-1, +1\}\\
\text{Log-likelihood} &= - \sum^N_{i=1}\log(1 + \exp(-y_i\langle\beta,x_i\rangle))\\
\text{MLE } \beta_* &= \arg\min_\beta\left\{\sum^N_{i=1}\log(1 + \exp(-y_i\langle\beta,x_i\rangle))\right\}
\end{aligned}
$$
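
A minimal sketch of minimising this objective with plain gradient descent, assuming labels in $\{-1, +1\}$ and synthetic data. The step size, the number of steps, the averaging of the gradient over $N$, and `true_beta` are illustrative assumptions rather than values from the notes.

```python
import numpy as np

def negative_log_likelihood(beta, X, y):
    """Sum over i of log(1 + exp(-y_i <beta, x_i>)), labels y_i in {-1, +1}."""
    margins = y * (X @ beta)
    # logaddexp(0, -m) = log(1 + exp(-m)), computed without overflow
    return np.logaddexp(0.0, -margins).sum()

def gradient(beta, X, y):
    margins = y * (X @ beta)
    # d/dbeta log(1 + exp(-m_i)) = -y_i x_i / (1 + exp(m_i));
    # 1 / (1 + exp(m)) is written as 0.5 * (1 - tanh(m / 2)) for stability
    weights = -y * 0.5 * (1.0 - np.tanh(margins / 2.0))
    return X.T @ weights

def fit(X, y, step_size=0.1, n_steps=500):
    beta = np.zeros(X.shape[1])
    for _ in range(n_steps):
        # Averaged gradient, so the step size does not depend on N
        beta -= step_size * gradient(beta, X, y) / len(y)
    return beta

# Synthetic data drawn from the model itself (true_beta is hypothetical)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_beta = np.array([2.0, -1.0])
p_positive = 0.5 * (1.0 + np.tanh(X @ true_beta / 2.0))  # sigmoid(<beta, x>)
y = np.where(rng.uniform(size=200) < p_positive, 1.0, -1.0)

beta_hat = fit(X, y)
print(beta_hat, negative_log_likelihood(beta_hat, X, y))
```

Because the objective is convex in $\beta$, gradient descent with a small enough step size decreases the loss at every iteration.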

## Gradient Descent
### Rate of Convergence