# Learning paradigms

The **no free lunch theorem** states that, averaged over all possible problems, no single model works optimally: a model that performs well on one class of problem does so at the expense of others.
We therefore need to develop a range of techniques that are well-adapted to different types of problem and data.

As a result, there are many different techniques that fall under the ***machine learning*** umbrella.
There are also multiple dimensions across which we can segment them. Understanding these paradigms, and where each technique falls along each dimension, can help us to intuit much about a technique before we have delved into its specifics.

Below are some of the most important dimensions along which to categorise different techniques.

## Supervised Vs Unsupervised (Vs Reinforcement) learning


## Parametric Vs Non-parametric learning

```{important}

Parametric models have a fixed number of parameters.

The number of parameters in a non-parametric model varies with the size of the training data.

```
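As a rough illustration of this distinction, the sketch below fits a one-dimensional least-squares line and a 1-nearest-neighbour "model" to training sets of increasing size, and counts how many numbers each must keep in order to predict. The helper names (`fit_linear`, `fit_1nn`, `make_data`) and the toy data are assumptions made for this example, not part of any library.

```python
import random


def fit_linear(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (1-D inputs)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return (slope, intercept)  # always two numbers, whatever n is


def fit_1nn(xs, ys):
    """1-nearest-neighbour 'fit': the model is simply the stored training set."""
    return list(zip(xs, ys))


def make_data(n, seed=0):
    """Hypothetical toy data: a noisy line y = 2x + 1."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]
    return xs, ys


sizes = [10, 100, 1000]
linear_counts = []
knn_counts = []
for n in sizes:
    xs, ys = make_data(n)
    linear_model = fit_linear(xs, ys)  # (slope, intercept)
    knn_model = fit_1nn(xs, ys)        # list of (x, y) pairs
    linear_counts.append(len(linear_model))
    knn_counts.append(2 * len(knn_model))  # one x and one y per stored point
    print(f"n={n}: linear stores {linear_counts[-1]} numbers, "
          f"1-NN stores {knn_counts[-1]} numbers")
```

The linear model keeps two numbers regardless of how much data it sees, whereas the nearest-neighbour model's footprint grows in proportion to the training set.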

**Parametric models** are able to hold the number of parameters constant by making stronger assumptions about the distribution of the data.
The number of parameters is determined by the assumptions that we make about the model / data distribution.
Using our understanding of the problem to tightly constrain the functional form is the key idea behind the parametric approach.

e.g. linear regression, logistic regression

- As a result of making strong assumptions about the data distribution, the models are restricted in the functional form they're able to learn.
- When the assumptions are appropriate, this can help us to learn a model that generalises well with relatively little data.
- This is especially true in regions of the input space that were sparsely covered by the training data.
- However, if the assumptions are not appropriate then these benefits are lost, so the use of parametric methods places a stronger burden on the practitioner to understand the domain before they model it.
- A smaller number of parameters can also make the model faster to run at inference time and greatly reduce its storage footprint.
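The trade-off in the points above can be sketched with five training points and a query far outside the region they cover. This is a minimal example under assumed toy data: `fit_linear`, `predict_linear` and `predict_1nn` are hypothetical helpers written for this illustration, and the 1-nearest-neighbour predictor is included only as a contrast.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (1-D inputs)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_linear(model, x):
    slope, intercept = model
    return slope * x + intercept


def predict_1nn(xs, ys, x_query):
    """Return the target of the training input closest to x_query."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x_query))
    return ys[nearest]


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
x_far = 10.0  # well outside the region covered by the training data

# Case 1: the linearity assumption holds (y = 2x + 1 plus small perturbations).
ys_linear = [1.1, 2.9, 5.0, 7.1, 8.9]
pred_linear = predict_linear(fit_linear(xs, ys_linear), x_far)
pred_1nn = predict_1nn(xs, ys_linear, x_far)
print("true value 21.0, linear fit:", pred_linear, "1-NN:", pred_1nn)

# Case 2: the assumption is wrong (the data actually follows y = x^2).
ys_quadratic = [x ** 2 for x in xs]
pred_quadratic = predict_linear(fit_linear(xs, ys_quadratic), x_far)
print("true value 100.0, linear fit:", pred_quadratic)
```

When the data really is linear, the fitted line predicts about 20.8 at `x = 10` against a true value of 21, while the nearest-neighbour predictor can only repeat the closest observed target (8.9). When the data is quadratic, the same linear fit predicts 38 against a true value of 100.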

In contrast, **non-parametric techniques** do not use rigid assumptions about the data distribution to constrain the functional form as tightly.
The number of parameters is driven by the algorithm / data rather than our assumptions about the model.
Letting the data tell us what functional form and parameterisation to use is the essence of the non-parametric school of thought.

e.g. K-nearest neighbours, CART / random forests, kernel support-vector machines

- Because they make weaker assumptions about the data distribution, non-parametric methods can have a greater capacity to learn a wider range of functional forms.
- As a result of the greater flexibility, they often require more data to achieve high performance.
- But given enough data, their expressiveness often facilitates greater performance.
- The intense focus on the data also makes them more susceptible to overfitting to the training data.
- In many cases, the time complexity of inference and the storage footprint scale poorly with the size of the training data.
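Several of these points can be seen in a naive K-nearest-neighbours regressor. The sketch below is an illustration on assumed toy data (noisy samples of a sine curve); `knn_predict` and `mse` are hypothetical helpers, and a practical implementation would normally use a spatial index rather than a full scan.

```python
import math
import random


def knn_predict(train, x_query, k):
    """Average the targets of the k training points closest to x_query.

    Every call sorts the whole training set by distance, so the cost of a
    single prediction grows with the number of stored points.
    """
    nearest = sorted(train, key=lambda point: abs(point[0] - x_query))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, points, k):
    """Mean squared error of the k-NN predictor on a list of (x, y) points."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in points) / len(points)


rng = random.Random(0)
# Noisy training samples of sin(x) on an evenly spaced grid (assumed toy data).
train = [(0.1 * i, math.sin(0.1 * i) + rng.gauss(0, 0.3)) for i in range(63)]
# Noise-free test points halfway between neighbouring training inputs.
test = [(0.1 * i + 0.05, math.sin(0.1 * i + 0.05)) for i in range(62)]

for k in (1, 9):
    print(f"k={k}: train MSE={mse(train, train, k):.3f}, "
          f"test MSE={mse(train, test, k):.3f}")

train_mse_k1 = mse(train, train, 1)
test_mse_k1 = mse(train, test, 1)
```

With `k=1` the model reproduces every training target exactly, noise included, so its training error is zero while its error on unseen points is not. Averaging over more neighbours typically smooths out some of that noise, at the cost of some flexibility.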