# Hyperparameters

## Learning objectives

Understand:

- What hyperparameters are and how they differ from model parameters


One of the steps in machine learning system design is optimising the model. But we said earlier that the model learns its parameters by optimising the loss function, so what more can we do? Some settings cannot be learned by the model during training; these are called hyperparameters. We can tune the hyperparameters to find the best model for our problem.

## Hyperparameters

Up to this point we have only dealt with __parameters__ (for example, the parameters of linear regression are its weights).

> Hyperparameters control the learning process. In contrast to parameters, they either cannot or should not be learned from data. They are usually set and fixed before the training process starts.

E.g. the learning rate should not be learned from the data
- finding the best learning rate would require training the model to completion many times, once for each candidate learning rate
- this would be computationally expensive
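
To make this concrete, here is a minimal sketch of why the learning rate has to be chosen from outside the training loop. It assumes a toy loss $f(w) = w^2$ and three illustrative learning rates (neither comes from a real model); each candidate needs its own complete training run before we can compare results.

```python
def gradient_descent(learning_rate, steps=50, start=5.0):
    """Minimise the toy loss f(w) = w**2 and return the final value of w."""
    w = start
    for _ in range(steps):
        grad = 2 * w                    # derivative of w**2
        w = w - learning_rate * grad    # standard gradient descent update
    return w


# One full "training run" per candidate learning rate (illustrative values).
results = {lr: gradient_descent(lr) for lr in (0.001, 0.1, 1.1)}

for lr, final_w in results.items():
    # The minimum is at w = 0, so |w| tells us how far from optimal we ended up.
    print(f"learning_rate={lr}: final |w| = {abs(final_w):.4g}")
```

In this toy setting the smallest rate barely moves from the starting point, the middle one ends up very close to the minimum, and the largest one moves further away on every step. With a real model each of these runs could take hours, which is where the computational cost comes from.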

E.g. the order of the polynomial features passed to the model (e.g. $x^2$, $x^3$) should not be learned from the data
- Hyperparameters like this control the "representational capacity" of the model: the complexity of the input-output relationships that the model can represent. E.g. can the model represent wavy relationships or only straight-line relationships?
- If this hyperparameter were optimised by the model itself, it would always choose to represent more complex input-output relationships so that it performs best on the training data, but this would cause it to overfit to that data and not generalise (more on this later).
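
The point about capacity can be sketched with NumPy. The example below assumes synthetic data (a noisy sine curve) and three arbitrary polynomial degrees. It fits each degree on a training set and measures the error on both the training set and a separate held-out set.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_data(n):
    """Synthetic data: a sine curve plus Gaussian noise (an assumption for this sketch)."""
    x = np.sort(rng.uniform(-1, 1, n))
    y = np.sin(3 * x) + rng.normal(0, 0.2, n)
    return x, y


x_train, y_train = make_data(20)
x_val, y_val = make_data(20)


def mse(coeffs, x, y):
    """Mean squared error of the polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


train_errors, val_errors = {}, {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    train_errors[degree] = mse(coeffs, x_train, y_train)
    val_errors[degree] = mse(coeffs, x_val, y_val)
    print(f"degree={degree}: train MSE={train_errors[degree]:.4f}, "
          f"held-out MSE={val_errors[degree]:.4f}")
```

Because a higher-degree polynomial can represent everything a lower-degree one can, the training error can only go down as the degree grows. A model choosing its own degree from the training data would therefore always pick the largest one, regardless of how it performs on the held-out data.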

Common hyperparameters include:
- learning rate
- batch size
- regularisation parameter (more on this later)
- order of the polynomial features included 

Hyperparameters are essential and prevalent in machine learning; see the code cell below:

In [None]:
from sklearn.ensemble import RandomForestRegressor

regressors = [
    # "mae" was renamed to "absolute_error" in scikit-learn 1.0 and removed in 1.2
    RandomForestRegressor(n_estimators=10, criterion="absolute_error"),
    RandomForestRegressor(n_estimators=50, min_samples_leaf=2),
    RandomForestRegressor(),
]

Above we have three instances of a single regression method called Random Forest, each configured differently (we will get to how it works in the next module).

What interests us are the `__init__` arguments we provided (`n_estimators`, `criterion`, `min_samples_leaf`). __They are examples of hyperparameters__ that you set before fitting the model to data.
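
As a small illustration, scikit-learn estimators expose their hyperparameters through `get_params()` and let you change them with `set_params()`. The values below are arbitrary and chosen only for demonstration.

```python
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(n_estimators=50, min_samples_leaf=2)

# get_params() returns every hyperparameter, including the defaults we did not set.
params = model.get_params()
print(params["n_estimators"], params["min_samples_leaf"], params["criterion"])

# Hyperparameters can still be changed before calling fit().
model.set_params(n_estimators=100)
updated_n_estimators = model.get_params()["n_estimators"]
print(updated_n_estimators)
```

None of this touches any data: hyperparameters are fixed by us, whereas the trees themselves are only built once `fit` is called.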

What can happen after setting them incorrectly?
- our algorithm may __under/overfit__ (more details later)
- it might not converge __at all__ in some cases

When we do it right (at least more or less) we can observe:
- improved convergence & faster training time
- lower loss & better performance on test data

You can probably tell by now how crucial hyperparameters are. The question is: how do we find good values for them?

Beyond Random Forest, other examples of hyperparameters include:
- batch size
- polynomial degree for linear/logistic regression (we will see more about those)

## Finding hyperparameters

There are a few ways to find them:
- experience: after some time you get an idea of what should work and what might not; this is especially common in deep learning
- algorithmic (we will focus on this one): we try a set of possible hyperparameter values and choose the best one
- a mix of both: you know the range that should yield good results (say `64 < batch_size < 1024`) but you are not certain about the exact value, so you use the algorithmic approach to search within that range
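
As a preview, the algorithmic approach can be sketched with scikit-learn's `GridSearchCV`, which trains a model for every combination in a grid and keeps the best one according to cross-validation. The synthetic dataset, the grid values and the scoring metric below are assumptions made for this example, not recommendations.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data, used only so the example is self-contained.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Candidate values for each hyperparameter (illustrative choices).
param_grid = {
    "n_estimators": [10, 50],
    "min_samples_leaf": [1, 2, 5],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,                                # 3-fold cross-validation per combination
    scoring="neg_mean_absolute_error",   # higher (closer to 0) is better
)
search.fit(X, y)

print("best hyperparameters:", search.best_params_)
print("best cross-validated score:", search.best_score_)
```

The grid here has 2 x 3 = 6 combinations, each trained 3 times, which shows how quickly the cost grows as more hyperparameters and values are added.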

Later you will get enough information to get started with the last approach.

## Summary

- Hyperparameters are settings that control the behaviour of our algorithm and __cannot or should not be learned__ from the data (or cannot be at the current stage of our knowledge); they are set before training starts
