# Measuring Performance
- With sufficient capacity, a neural network model will often perform perfectly on the training data, but this does not mean it will generalize well to new data
## Sources of error
- Noise:
  - Stochastic element to data generating process
  - Further explanatory variables that were not observed
  - Inherent uncertainty in the true mapping from input to output
- Bias:
  - Model is not flexible enough to fit to the true function
  - Systematic deviation of the model from the function we are modelling
- Variance:
  - With limited training examples, there is no way to distinguish systematic changes in the underlying function from noise in the data
  - Uncertainty in the fitted model due to the particular training set we have


$$ \mathbb{E}_D[\mathbb{E}_y[L[x]]] = \mathbb{E}_D[(f[x,\phi[D]] - f_{\mu}[x])^2] + (f_{\mu}[x] - \mu[x])^2 + \sigma^{2} $$
  - The expected loss after considering the uncertainty in the training data $D$ and the test data $y$ consists of three components
    - $\mathbb{E}_D[(f[x,\phi[D]] - f_{\mu}[x])^2]$ is the variance
    - $(f_{\mu}[x] - \mu[x])^2$ is the bias
    - $\sigma^2$ is the noise
  - They combine additively for linear regression with a least squares ($MSE$) loss, but their interaction can be more complex for other types of problems
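
The decomposition above can be checked numerically. The following is a minimal sketch, not a definitive implementation: the true function `mu`, the noise level `sigma`, the training-set size, and the cubic polynomial model are all assumptions chosen for illustration and do not come from the notes. It simulates many training sets $D$, then compares a direct Monte Carlo estimate of the expected loss with the sum of the three terms.

```python
import numpy as np

rng = np.random.default_rng(0)

def mu(x):
    # Hypothetical true mean function mu[x]; the notes do not specify one.
    return np.sin(2 * np.pi * x)

sigma = 0.3        # assumed noise standard deviation
n_train = 15       # assumed training-set size
degree = 3         # assumed model: cubic polynomial fitted by least squares
n_datasets = 2000  # number of simulated training sets D
x_test = np.linspace(0.05, 0.95, 50)

# Fit one model per simulated training set and record f[x, phi[D]].
preds = np.empty((n_datasets, x_test.size))
for d in range(n_datasets):
    x = rng.uniform(0.0, 1.0, n_train)
    y = mu(x) + rng.normal(0.0, sigma, n_train)
    preds[d] = np.polyval(np.polyfit(x, y, degree), x_test)

f_mu = preds.mean(axis=0)                      # mean prediction over datasets
variance = ((preds - f_mu) ** 2).mean(axis=0)  # E_D[(f - f_mu)^2]
bias_sq = (f_mu - mu(x_test)) ** 2             # (f_mu - mu)^2
noise = sigma ** 2                             # sigma^2

# Direct Monte Carlo estimate of the expected loss using fresh noisy targets.
y_test = mu(x_test) + rng.normal(0.0, sigma, preds.shape)
expected_loss = ((preds - y_test) ** 2).mean(axis=0)

print("direct estimate        :", expected_loss.mean())
print("variance + bias + noise:", (variance + bias_sq + noise).mean())
```

The two printed numbers should agree up to Monte Carlo error, which shrinks as `n_datasets` grows.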

## Reducing error
- Noise error is irreducible
- Reducing variance
  - Increasing quantity of data
- Reducing bias
  - Increasing model capacity
- Bias-variance tradeoff
  - For a fixed-size training dataset, the variance term increases as the model capacity increases
- Overfitting
  - The model tries to fit the noise in the training data
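
The effects listed above can be illustrated with a small simulation. This is a sketch under stated assumptions: the sine target, the noise level, polynomial degree as a stand-in for model capacity, and training inputs fixed on a grid (only the noise is resampled) are all hypothetical choices, and `bias_and_variance` is a helper introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def mu(x):
    # Hypothetical true function; not taken from the notes.
    return np.sin(2 * np.pi * x)

def bias_and_variance(degree, n_train, sigma=0.3, n_datasets=500):
    """Monte Carlo estimate of (squared bias, variance), averaged over test inputs."""
    x = np.linspace(0.0, 1.0, n_train)   # assumed fixed training inputs
    x_test = np.linspace(0.1, 0.9, 40)
    preds = np.empty((n_datasets, x_test.size))
    for d in range(n_datasets):
        y = mu(x) + rng.normal(0.0, sigma, n_train)  # resample only the noise
        preds[d] = np.polyval(np.polyfit(x, y, degree), x_test)
    f_mu = preds.mean(axis=0)
    bias_sq = float(((f_mu - mu(x_test)) ** 2).mean())
    variance = float(((preds - f_mu) ** 2).mean())
    return bias_sq, variance

bias_lo_cap, var_lo_cap = bias_and_variance(degree=1, n_train=20)
bias_hi_cap, var_hi_cap = bias_and_variance(degree=3, n_train=20)
_, var_more_data = bias_and_variance(degree=3, n_train=200)

print("more capacity: bias", bias_lo_cap, "->", bias_hi_cap)
print("more capacity: variance", var_lo_cap, "->", var_hi_cap)
print("more data    : variance", var_hi_cap, "->", var_more_data)
```

In this setup, raising the degree lowers the bias but raises the variance for a fixed training-set size, while adding data lowers the variance.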

## Double descent
- As capacity grows, test loss first decreases, then increases as the model approaches the capacity needed to fit the training data perfectly, then decreases again
- Interaction of two phenomena
  - Test performance becomes temporarily worse when the model has just enough capacity to memorize the data
    - Exactly as predicted by the bias-variance trade-off
  - Test performance continues to improve with capacity even after the training performance is perfect
- After the model fits the training data perfectly, further capacity cannot improve the fit to the training data, so any change must occur between the training points
- *Inductive bias*
  - Tendency of a model to prioritize one solution over another as it extrapolates between data points
- As we add capacity to the model, it interpolates between the nearest points increasingly smoothly

## Choosing hyperparameters
- Chosen empirically
- Measure their performance on a validation set
- For every choice of hyperparameters, train the model with the training set and evaluate it on the validation set
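
The procedure above can be sketched as a simple loop. This is a minimal example under assumptions: the synthetic data from `make_data`, the use of polynomial degree as the only hyperparameter, and the `mse` helper are hypothetical. The final evaluation on a separate test set is a common practice added here and is not stated in the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, sigma=0.3):
    # Hypothetical data-generating process, used only for illustration.
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, sigma, n)
    return x, y

x_train, y_train = make_data(30)
x_val, y_val = make_data(200)
x_test, y_test = make_data(200)   # assumed extra held-out set for the final report

def mse(coeffs, x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# Hyperparameter search: train on the training set, score on the validation set.
candidate_degrees = range(0, 10)
val_losses = {}
for degree in candidate_degrees:
    coeffs = np.polyfit(x_train, y_train, degree)
    val_losses[degree] = mse(coeffs, x_val, y_val)

best_degree = min(val_losses, key=val_losses.get)

# Refit with the chosen hyperparameter and evaluate once on the test set.
best_coeffs = np.polyfit(x_train, y_train, best_degree)
test_loss = mse(best_coeffs, x_test, y_test)

print("best degree:", best_degree)
print("validation loss:", val_losses[best_degree])
print("test loss:", test_loss)
```

Keeping the test set out of the selection loop means the reported loss is not biased by the hyperparameter choice.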

## Curse of dimensionality
- In high dimensions, two points randomly sampled from a standard normal distribution are close to orthogonal to each other (relative to the origin)
- In high dimensions, the distance from the origin of samples from a standard normal distribution is roughly constant
- Most of the volume of a high-dimensional sphere is adjacent to its surface
  - **Most of the volume of a high-dimensional orange is in the peel, not the pulp**
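
These three properties can be checked numerically. The sketch below uses hypothetical helper names (`mean_abs_cosine`, `norm_spread`, `shell_fraction`) and arbitrary sample sizes. The shell computation relies on the fact that the volume of a ball scales as $r^d$, so the fraction of volume within the outer `thickness` fraction of the radius is $1 - (1 - \text{thickness})^d$.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_abs_cosine(dim, n_pairs=200):
    """Average |cos(angle)| between pairs of standard normal samples."""
    a = rng.standard_normal((n_pairs, dim))
    b = rng.standard_normal((n_pairs, dim))
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )
    return float(np.mean(np.abs(cos)))

def norm_spread(dim, n=200):
    """Standard deviation of the distance from the origin, relative to its mean."""
    r = np.linalg.norm(rng.standard_normal((n, dim)), axis=1)
    return float(r.std() / r.mean())

def shell_fraction(dim, thickness=0.01):
    """Fraction of a ball's volume lying in the outer `thickness` of its radius."""
    return 1.0 - (1.0 - thickness) ** dim

for dim in (2, 100, 1000):
    print(
        f"dim={dim:5d}",
        f"mean |cos|={mean_abs_cosine(dim):.3f}",
        f"relative norm spread={norm_spread(dim):.3f}",
        f"volume in outer 1%={shell_fraction(dim):.5f}",
    )
```

As `dim` grows, the mean absolute cosine and the relative spread of the norms both shrink toward zero, while the shell fraction approaches one.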

## Real-world performance
- Data drift
  - Statistics of real-world data may change over time
- Covariate shift
  - The input distribution changes, so we observe parts of the function that weren't seen during training
- Concept shift
  - Relationship between input and output may change over time
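
Covariate shift can be illustrated with a toy regression. This is a sketch under assumptions: the sine target, the input ranges, the cubic polynomial model, and the `sample` and `mse` helpers are hypothetical. The model is trained on inputs from one range and then evaluated both on that range and on a range it never saw.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(low, high, n, sigma=0.1):
    # Same input-output relationship everywhere; only the input range differs.
    x = rng.uniform(low, high, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, sigma, n)
    return x, y

# Train only on inputs in [0, 0.5].
x_train, y_train = sample(0.0, 0.5, 100)
coeffs = np.polyfit(x_train, y_train, 3)

def mse(x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

in_dist_loss = mse(*sample(0.0, 0.5, 1000))   # inputs like the training data
shifted_loss = mse(*sample(0.5, 1.0, 1000))   # inputs from an unseen region

print("loss on training-like inputs:", in_dist_loss)
print("loss on shifted inputs      :", shifted_loss)
```

The relationship between input and output is unchanged here, so the extra loss on the shifted inputs comes from the change in the input distribution alone.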