In [1]:
import numpy as np
import matplotlib.pyplot as plt
from numpy import linalg as LA

### Singular Value Decomposition

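Decomposes a matrix $A$ into the product of three matrices, $A = U \Sigma V^T$, where $U$ and $V$ are orthogonal and $\Sigma$ is diagonal and holds the singular values.  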

**Applications:**  
- lossy image compression (decent approximation using considerably less space)  
- reducing noise in images  
- eigenfaces for facial recognition
- reducing high-dimensional data to fewer dimensions

In [None]:
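A = np.arange(12.0).reshape(4, 3)  # placeholder matrix (assumed; A is not defined in these notes)
U, s, V_T = LA.svd(A)  # s: singular values in descending order; V_T: V already transposed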
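
To make the compression application concrete, here is a minimal sketch that rebuilds a matrix from its three factors and then keeps only the `k` largest singular values. The matrix `A`, the seed, and the choice `k = 2` are assumptions made for illustration.

```python
import numpy as np
from numpy import linalg as LA

# Hypothetical example matrix (not from the notes above).
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))

# full_matrices=False returns the compact form: U is 6x4, s has 4 values, V_T is 4x4.
U, s, V_T = LA.svd(A, full_matrices=False)

# Multiplying the three factors back together recovers A.
A_full = U @ np.diag(s) @ V_T

# Rank-k approximation: keep only the k largest singular values.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ V_T[:k, :]

# Frobenius-norm error of the approximation; it equals the root of the
# sum of squares of the discarded singular values.
err = LA.norm(A - A_k)
```

Storing `U[:, :k]`, `s[:k]` and `V_T[:k, :]` instead of `A` is what saves space when `k` is much smaller than the matrix dimensions.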

### Principal Component Analysis

### Bias vs Variance (w.r.t. prediction error)

**Bias** - difference between the model's average prediction and the correct value.  
High bias results in large error on both the training and test sets.

**Variance** - variability of the model's prediction for a given data point across different training sets; it measures how much the predictions spread.  
High variance results in low error on the training set but high error on the test set.  

**Underfitting** - high bias, low variance  
**Overfitting** - low bias, high variance

![image.png](attachment:image.png)
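
One way to see underfitting and overfitting side by side is to fit polynomials of increasing degree to a small noisy sample. The sketch below is only an illustration: the sine ground truth, the noise level, the sample sizes, and the degrees are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Noisy samples of a sine curve (a hypothetical ground truth).
    x = rng.uniform(-1, 1, n)
    y = np.sin(np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, y

x_train, y_train = make_data(15)
x_test, y_test = make_data(200)

def mse(coeffs, x, y):
    # Mean squared error of the polynomial with the given coefficients.
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

results = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))

for degree, (train_err, test_err) in results.items():
    print(f"degree {degree}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

Training error can only go down as the degree increases, because each lower-degree polynomial is a special case of the higher-degree one. The straight line would typically show high error on both sets (high bias), while the degree-9 fit would typically show a larger gap between training and test error (high variance); the exact numbers depend on the seed.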

### Support Vector Machines

### Naive Bayes

### Bayes Theorem

$$P(A|B)=\frac{P(B|A)\cdot P(A)}{P(B)}$$

By the law of total probability, where $!A$ means "not $A$":

$P(B)=P(B|A)\cdot P(A)+P(B|!A)\cdot P(!A)$

$P(A)+P(!A) = 1$  
$P(B)+P(!B) = 1$  

| Prior | Given that prior: $B$ | Given that prior: $!B$ |
| :---: | :---: | :---: |
| $P(A)$ | $P(B \mid A)$ - True Positive | $P(!B \mid A)$ - False Negative |
| $P(!A)$ | $P(B \mid !A)$ - False Positive | $P(!B \mid !A)$ - True Negative |

#### Example

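Let $A$ = "has the disease" and $B$ = "tests positive".

- 1% of the population has the disease: $P(A) = 0.01$
- 3% false positive rate: $P(B|!A) = 0.03$
- 6% false negative rate: $P(!B|A) = 0.06$, so $P(B|A) = 1 - 0.06 = 0.94$
- What is the probability of actually having the disease after a positive test result: $P(A|B)$?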

| Disease status (prior) | Test (+) | Test (-) |
|:---:|:---:|:---:|
| Has disease (1%) | 94% | 6% |
| No disease (99%) | 3% | 97% |

In [15]:
# P(A|B) = P(B|A)*P(A) / (P(B|A)*P(A) + P(B|!A)*P(!A))
(.94*.01)/(.94*.01 + .03*.99)

0.24040920716112535
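
The same calculation can be wrapped in a small function so the inputs are named. This is a sketch; the function name `posterior` and its parameter names are hypothetical and chosen for readability.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return P(A|B) given P(A), P(B|A) and P(B|!A)."""
    # Law of total probability: P(B) = P(B|A)P(A) + P(B|!A)P(!A)
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

p = posterior(prior=0.01, sensitivity=0.94, false_positive_rate=0.03)
print(round(p, 4))  # 0.2404

# Assumption: a second, independent positive test can reuse the first
# posterior as the new prior.
p_second = posterior(prior=p, sensitivity=0.94, false_positive_rate=0.03)
```

Even with a fairly accurate test, the posterior stays low after one positive result because the disease is rare: most positives come from the much larger healthy group.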

### Monte Carlo approximation

### Fast Fourier Transform

### Statistically Independent vs Mutually Exclusive

- $\cap$ = and
- $\cup$ = or

| | If statistically independent | If mutually exclusive |
|:---:|:---:|:---:|
| $P(A \mid B)=$ | $P(A)$ | 0 |
| $P(B \mid A)=$ | $P(B)$ | 0 |
| $P(A\cap B)=$ | $P(A)\cdot P(B)$ | 0 |
| $P(A\cup B)=$ | $P(A) + P(B) - P(A\cap B)$ | $P(A) + P(B)$ |
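
The identities in the table can be checked on a small sample space. The sketch below uses one roll of a fair six-sided die; the events and the helper names `P` and `P_given` are hypothetical, chosen for illustration.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}

def P(event):
    # Probability of an event, as an exact fraction.
    return Fraction(len(event & omega), len(omega))

def P_given(a, b):
    # Conditional probability P(a|b) = P(a and b) / P(b)
    return P(a & b) / P(b)

even = {2, 4, 6}
at_most_two = {1, 2}
at_least_five = {5, 6}

# Independent: "even" and "at most two"
assert P(even & at_most_two) == P(even) * P(at_most_two)
assert P_given(even, at_most_two) == P(even)

# Mutually exclusive: "at most two" and "at least five"
assert P(at_most_two & at_least_five) == 0
assert P(at_most_two | at_least_five) == P(at_most_two) + P(at_least_five)
```

Two events that both have nonzero probability cannot be mutually exclusive and independent at once: knowing that one occurred tells you the other did not.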

### Precision vs Recall

**Precision** - fraction of predicted positives that are actually positive: $\frac{TP}{TP+FP}$  
**Recall** - fraction of actual positives that are predicted positive: $\frac{TP}{TP+FN}$  

![image.png](attachment:image.png)
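
As a minimal sketch, both metrics can be computed directly from label arrays. The labels below are made up for illustration, and the function assumes there is at least one predicted positive and one actual positive (otherwise a division by zero occurs).

```python
import numpy as np

def precision_recall(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)    # predicted positive, actually positive
    fp = np.sum(~y_true & y_pred)   # predicted positive, actually negative
    fn = np.sum(y_true & ~y_pred)   # predicted negative, actually positive
    return float(tp / (tp + fp)), float(tp / (tp + fn))

# Hypothetical labels: 4 actual positives, 5 predicted positives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)  # 0.6 0.75
```

Here 3 of the 5 predicted positives are correct (precision 0.6), and 3 of the 4 actual positives are found (recall 0.75).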

### ROC (receiver operating characteristic) curve

Plots the true positive rate (y-axis) against the false positive rate (x-axis) as the classification threshold varies.
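
The curve can be traced by sweeping a threshold over the classifier's scores and recording one (FPR, TPR) point per threshold. This is a sketch with made-up scores; `roc_points` is a hypothetical helper, not a library function.

```python
import numpy as np

def roc_points(y_true, scores):
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    # Start above the highest score (nothing predicted positive),
    # then lower the threshold through each distinct score.
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    fpr, tpr = [], []
    for t in thresholds:
        pred = scores >= t
        tpr.append(np.sum(pred & y_true) / np.sum(y_true))
        fpr.append(np.sum(pred & ~y_true) / np.sum(~y_true))
    return np.array(fpr), np.array(tpr)

# Hypothetical labels and classifier scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

fpr, tpr = roc_points(y_true, scores)

# Area under the curve via the trapezoidal rule.
auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)
print(auc)  # 0.75
```

An area of 1.0 would mean some threshold separates the classes perfectly, while 0.5 corresponds to random guessing.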

### Regularization

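**Dropout** - randomly zeroes a fraction of activations during training, so the network cannot rely on any single unit

**L2** - adds the sum of squared weights, scaled by a coefficient, to the loss (penalizes large weights, and with them model complexity)

**Weight Decay** - shrinks each weight by a small factor as part of the weight update; for plain SGD this is equivalent to L2 regularization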
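
The three techniques can be sketched in a few lines of NumPy. The weights, gradient, learning rate `lr`, regularization strength `lam`, and dropout probability `p` below are all assumed values for illustration, and the dropout shown is the common "inverted" variant that rescales the surviving activations.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)        # hypothetical weights
grad = rng.normal(size=5)     # hypothetical gradient of the data loss w.r.t. w
lr, lam = 0.1, 0.01           # assumed learning rate and regularization strength

# L2 regularization: loss += (lam / 2) * sum(w**2), so the gradient gains lam * w.
w_l2 = w - lr * (grad + lam * w)

# Weight decay: shrink the weights directly inside the update step.
w_decay = w * (1 - lr * lam) - lr * grad

# Inverted dropout: zero each activation with probability p and rescale
# the survivors so the expected activation is unchanged.
p = 0.5
activations = np.ones(10_000)
mask = rng.random(activations.shape) >= p
dropped = activations * mask / (1 - p)
```

With a plain SGD step the L2 and weight decay updates give the same result, as the two expressions show. With adaptive optimizers such as Adam the two generally differ, because the L2 term is rescaled along with the rest of the gradient.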