# Deep Learning Tutorials - Chapter 2 - ML

# 2.1 - Loss and Risk

# Notes
- The general objective in ML is to capture regularity in data to make predictions
- **Learning** consists of finding, in a set $F$ of functionals, a "good" $f^*$, usually defined through a loss

### Important definitions:
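- **loss** - a function $\ell(f, z)$ that measures how wrong $f$ is on a single sample $z$
- **risk** - the expected loss over all data, $R(f) = \mathbb{E}_Z[\ell(f, Z)]$
- **empirical risk** - the average loss over the training data, used as an estimate of the risk

$$ \ell : F \times Z \to \mathbb{R} $$

where $Z$ is the space of samples.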

- We are looking for an $f$ with a small expected risk $R(f)$. The optimal $f^*$ is the element of $F$ that minimizes the risk:

$$f^* = \underset{f \in F}{\operatorname{argmin}} \; R(f)$$
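
As a rough illustration of the argmin above, the sketch below averages a per-sample squared error over a toy data set and picks the candidate with the smallest average. The squared loss, the candidate set of slopes, and the toy data are all assumptions made for this example; they are not part of the tutorial.

```python
import numpy as np

def squared_loss(f, x, y):
    """Error of predictor f on a single sample (x, y)."""
    return (f(x) - y) ** 2

def empirical_risk(f, xs, ys):
    """Average error of f over a finite sample (a stand-in for the true risk)."""
    return float(np.mean([squared_loss(f, x, y) for x, y in zip(xs, ys)]))

# Hypothetical candidate set F: lines through the origin with different slopes.
slopes = [0.5, 1.0, 2.0, 3.0]
F = {a: (lambda x, a=a: a * x) for a in slopes}

# Toy sample generated by y = 2x (noise-free, for illustration only).
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = 2.0 * xs

# Evaluate every candidate and keep the one with the smallest average error.
risks = {a: empirical_risk(f, xs, ys) for a, f in F.items()}
best_slope = min(risks, key=risks.get)
print(best_slope, risks[best_slope])
```

In practice $F$ is not a finite list, so the minimization is done with an optimizer such as gradient descent rather than by enumeration.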


# 2.2 Over and under fitting

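- **Overfitting**: the model is too specific to the training data. It fits noise as well as structure, so the training error is low but the error on new data is high.
- **Underfitting**: the model is too generic for the data. It cannot capture the underlying structure, so the error is high even on the training data.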
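
A small experiment can make the two regimes concrete. The sketch below fits polynomials of increasing degree to a few noisy samples of a sine curve and reports the training and held-out errors. The target function, noise level, sample sizes, and degrees are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of sin(pi * x) on [-1, 1] (an assumed toy problem)."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(np.pi * x) + 0.2 * rng.normal(size=n)
    return x, y

x_train, y_train = make_data(15)    # small training set
x_test, y_test = make_data(200)     # larger held-out set

def mse(coeffs, x, y):
    """Mean squared error of a polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

results = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares fit
    results[degree] = (mse(coeffs, x_train, y_train),
                       mse(coeffs, x_test, y_test))
    print(f"degree={degree}  train={results[degree][0]:.4f}  "
          f"test={results[degree][1]:.4f}")
```

The training error can only go down as the degree grows, because each polynomial family contains the previous one. The held-out error typically does not follow: a degree-1 fit tends to be poor on both sets (underfitting), while a high degree tends to widen the gap between training and held-out error (overfitting). The exact numbers depend on the seed.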


# 2.3 Bias Variance Dilemma

![c](./images/bias-variance-dil.png)

- **Degrees of freedom** in machine learning: in predictive modeling, the degrees of freedom often refer to the number of model parameters that are estimated from data. The count can include both the coefficients of the model and the data used to calculate the error of the model.

- Conceptually, model fitting and regularisation can be interpreted as Bayesian inference.
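
One standard way to see the Bayesian reading is maximum a posteriori (MAP) estimation. The notation below ($w$ for the parameters, $D$ for the data, $\lambda$ for the prior strength) is introduced here for illustration and does not come from the tutorial. Applying Bayes' rule and taking logarithms:

$$\hat{w}_{\text{MAP}} = \underset{w}{\operatorname{argmax}} \; p(w \mid D) = \underset{w}{\operatorname{argmin}} \left[ -\log p(D \mid w) - \log p(w) \right]$$

The first term plays the role of the data-fitting loss and the second the role of a regulariser. For example, assuming a Gaussian prior $p(w) \propto \exp(-\lambda \|w\|^2)$ gives $-\log p(w) = \lambda \|w\|^2 + \text{const}$, which is the usual $L_2$ penalty.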

# 2.4 Evaluation Protocols

- Learning algorithms, in particular in DL, require the tuning of many meta-parameters
- These parameters have a strong impact on performance, which results in a "meta" over-fitting through repeated experiments
- We must therefore be extra careful with performance estimation

![dev](./images/dev-cycle.png)

**However, the standard for us is to have a separate validation set for the tuning.**
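
A minimal sketch of such a split, assuming the data can be shuffled freely (no temporal or group structure). The function name and the 80/10/10 proportions are hypothetical choices, not something the tutorial prescribes.

```python
import numpy as np

def train_val_test_split(n_samples, val_frac=0.1, test_frac=0.1, seed=0):
    """Return three disjoint index arrays covering range(n_samples)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)          # shuffle once
    n_test = int(round(test_frac * n_samples))
    n_val = int(round(val_frac * n_samples))
    test_idx = perm[:n_test]                   # touched only for the final report
    val_idx = perm[n_test:n_test + n_val]      # used for meta-parameter tuning
    train_idx = perm[n_test + n_val:]          # used for fitting the model
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = train_val_test_split(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

Meta-parameters are then chosen by comparing models on `val_idx`, and the test indices are used once, at the very end.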

- When data is scarce, one can use cross-validation: average the performance over multiple random splits of the data into a train set and a validation set
- There is no unbiased estimator of the variance of cross-validation that is valid under all distributions
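
The sketch below shows K-fold cross-validation, one common variant of the idea, in which the random splits are the K folds of a single shuffle. The polynomial model and the toy data are assumptions used only to have something to evaluate.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for K-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

def cross_validate(x, y, degree, k=5):
    """Average validation MSE of a polynomial fit over the K folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(x), k):
        coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
        pred = np.polyval(coeffs, x[val_idx])
        scores.append(np.mean((pred - y[val_idx]) ** 2))
    return float(np.mean(scores))

# Toy data: a noise-free line, so a degree-1 fit should validate almost perfectly.
x = np.linspace(-1.0, 1.0, 40)
y = 3.0 * x + 1.0
cv_error = cross_validate(x, y, degree=1)
print(cv_error)
```

Every sample is used for validation exactly once, which is what makes the scheme attractive when data is scarce.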

#### Things to avoid in ML

- Stopping the evaluation early
- Over-tuning meta-parameters
- Data-set selection
- Algorithm clauses that are specific to a data set
- Seed selection

The ML community pushes toward accessible implementations, reference data-sets, leaderboards, and constant upgrades of benchmarks.


# 2.5 Basic Embeddings

Deep learning models combine embeddings and dimension reduction operations. They parameterize and re-parameterize the input signal multiple times into representations that become more and more invariant and noise-free. To get an intuition of how this is possible, we consider here two standard algorithms:

1. K-means (Lloyd's algorithm)
2. PCA

#### Lloyd's algorithm

Algorithm:

![lloyd](./images/lloyd.png)
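
For readers who prefer code to the figure, here is a minimal NumPy sketch of Lloyd's algorithm: alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The random initialisation from data points, the stopping rule, and the toy blobs are assumptions of this sketch and may differ from the figure.

```python
import numpy as np

def lloyd_kmeans(x, k, n_iter=100, seed=0):
    """K-means via Lloyd's algorithm. Returns (centroids, labels)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumed initialisation: k distinct data points chosen at random.
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment step: squared distance from every point to every centroid.
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = centroids.copy()
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:           # leave empty clusters where they are
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break                          # centroids stopped moving
        centroids = new_centroids
    return centroids, labels

# Toy data: two well-separated Gaussian blobs in 2D.
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=(-5.0, 0.0), scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=(5.0, 0.0), scale=0.5, size=(50, 2))
x = np.vstack([blob_a, blob_b])
centroids, labels = lloyd_kmeans(x, k=2)
print(centroids)
```

The distance computation builds an array of shape `(n_points, k, dim)`, which is fine for small examples but would need batching for a data set the size of MNIST.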

What K-means does in practice on the MNIST dataset:

![kmean](./images/pca-kmeans.png)

#### Principal Components Analysis

PCA is an eigendecomposition method that represents the data linearly by finding orthogonal directions (the principal components) along which the data varies the most. The number of principal components to keep is a hyperparameter.

![pca](./images/PCA-mnst.png)
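
A minimal sketch of PCA through the eigendecomposition of the sample covariance matrix. The function name, return values, and toy data are choices made for this example; library implementations often use an SVD instead, which is numerically preferable on large inputs.

```python
import numpy as np

def pca(x, n_components):
    """Project x onto its top principal components.

    Returns (projected, components, variances, mean), where the rows of
    `components` are orthonormal directions sorted by decreasing variance.
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)
    centered = x - mean
    cov = centered.T @ centered / (len(x) - 1)     # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order].T
    projected = centered @ components.T
    return projected, components, eigvals[order], mean

# Toy data: points close to the line y = 2x, plus a little isotropic noise.
rng = np.random.default_rng(0)
t = rng.uniform(-1.0, 1.0, size=200)
x = np.stack([t, 2.0 * t], axis=1) + 0.01 * rng.normal(size=(200, 2))
projected, components, variances, mean = pca(x, n_components=1)
print(components[0])   # roughly parallel to (1, 2), up to sign
```

Keeping one component here reduces each 2D point to a single coordinate along the dominant direction. On MNIST the same procedure would be applied to the flattened 784-dimensional images.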

#### Significance

- These results show that even crude embeddings capture something meaningful: changes in pixel intensity, as expected, but also deformations in the "indexing" space (the image plane).
- However, translations and deformations damage the representation badly, and "composition" (object on background) is not handled at all.

#### Use cases of embeddings in DL

We would like to:
- use many encodings of this sort for small local structures with limited variability
- have different "channels" for different components
- process at multiple scales

Computationally, we would like to deal with large signals and large training sets, so we need to avoid a super-linear cost in either the signal size or the training set size.