# Video: Validation Sets for Early Stopping

This video demonstrates the use of validation sets to decide when to stop training a model.

## Epoch Training

Some models are able to keep improving by repeatedly fitting the training data.
* Neural networks keep updating weights to reduce errors.
* Gradient boosted trees add more trees focused on fixing errors.
* Each round of fitting the whole training set is called an "epoch".


Script:
* Some kinds of models can keep reducing their training error, even to the point of fitting the training data perfectly, if you keep fitting them.
* Neural networks are the best known example of this.
* If a neural network has enough parameters and you repeatedly fit it to a training set, it will usually fit that training set perfectly, given enough patience.
* Other algorithms use a technique called "boosting" where they repeatedly focus on fitting small models to the current errors, and will slowly reduce overall errors.
* But the overall model gets slower as the number of small models that it uses gets bigger.
* These methods will be covered in later modules, so you don't need to remember these details now.
* The one bit you need to know is that these kinds of models let you keep trying to improve the training error, and it will keep improving the longer you try.
* Perhaps fittingly, each round of trying is called an epoch in reference to the unit of geological time, as this process can take a very long time.
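
A minimal sketch of the idea, assuming a toy one-weight linear model trained by stochastic gradient descent (the data, `learning_rate`, and number of epochs are made up for illustration). Each epoch is one full pass over the training set, and the training error is recorded after every pass.

```python
# Toy training data (assumed for illustration): y = 2 * x exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

weight = 0.0          # the single model parameter
learning_rate = 0.01  # assumed step size
train_losses = []

for epoch in range(20):
    # One epoch = one pass over the whole training set.
    for x, y in zip(xs, ys):
        error = weight * x - y
        # Gradient step on half the squared error for this example.
        weight -= learning_rate * error * x
    # Mean squared error on the training set after this epoch.
    mse = sum((weight * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    train_losses.append(mse)

print(train_losses[0], train_losses[-1])
```

On this toy data the training loss falls with every epoch. The loop only measures training error, so it says nothing about how well the model generalizes.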

## What is Early Stopping?

* Training in epochs reduces training error over time.
* Early epochs tend to fix big/broad mistakes.
  * Usually generalizing.
* Later epochs tend to fix small/narrow mistakes.
  * Usually overfitting.
* Early stopping tries to stop at the transition from generalizing to overfitting.


Script:
* Early stopping aims to keep training as long as the model continues to generalize better, but to stop once further training is just overfitting.
* Usually these epoch-based training processes improve both their training loss and their generalization early on, but eventually generalization flattens out, or even gets worse.
* Early stopping tries to catch this transition rather than hoping a preset number of epochs is just right.
* How does it work?

## How Does Early Stopping Work?

Validation sets!
* Check validation loss after each epoch.
* If a few epochs pass without a meaningful improvement, stop.

Script:
* Of course, the way this works is through validation sets.
* After each epoch, the validation loss is checked.
* And if it stops improving meaningfully for a few epochs, the training process is stopped.
* The assumption is that the test loss will stop improving around the same number of epochs, so checking the validation loss will help us get close to the ideal number of epochs.
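
The rule described above can be sketched in a few lines. This is an illustrative version, not the implementation of any particular library: the function name `early_stopping_epoch`, the `patience` and `tol` parameters, and the list of validation losses are all hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3, tol=0.0):
    """Return (stop_epoch, best_epoch), both 1-based.

    val_losses: validation loss measured after each epoch.
    patience:   number of epochs without meaningful improvement to tolerate.
    tol:        minimum decrease in loss that counts as meaningful.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - tol:
            # Meaningful improvement: remember it and reset the counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            return epoch, best_epoch
    # Ran out of epochs without triggering the stopping rule.
    return len(val_losses), best_epoch


# Made-up validation losses: improving at first, then slowly getting worse.
losses = [1.00, 0.70, 0.55, 0.50, 0.49, 0.50, 0.51, 0.52, 0.53]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3, tol=0.005)
print(stop_epoch, best_epoch)
```

With these made-up losses, the loop stops after epoch 8, three epochs after the best validation loss at epoch 5. A real training loop would usually also keep the model from the best epoch rather than the last one.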

## Scikit-Learn Support

Preview from later modules:
* `GradientBoostingRegressor` => `n_iter_no_change`, `tol`
* `MLPRegressor` => `early_stopping`, `n_iter_no_change`, `tol`
* Automatic validation set if early stopping is enabled.
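
As a rough sketch of where these parameters go, assuming synthetic data from `make_regression` (not the dataset used in this video) and arbitrarily chosen settings. The `validation_fraction` parameter controls how much of the training data is held out when early stopping is enabled.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor

# Synthetic regression data, assumed purely for illustration.
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

# Gradient boosting: setting n_iter_no_change turns early stopping on.
gbr = GradientBoostingRegressor(
    n_estimators=500,         # upper limit on the number of trees
    n_iter_no_change=10,      # stop after 10 rounds without improvement
    tol=1e-4,                 # minimum improvement that counts
    validation_fraction=0.1,  # share of the training data held out
    random_state=0,
)
gbr.fit(X, y)
print(gbr.n_estimators_)  # number of trees actually fitted

# Neural network: early stopping must be switched on explicitly.
mlp = MLPRegressor(
    max_iter=500,             # upper limit on the number of epochs
    early_stopping=True,
    n_iter_no_change=10,
    tol=1e-4,
    validation_fraction=0.1,
    random_state=0,
)
mlp.fit(X, y)
print(mlp.n_iter_)  # number of epochs actually run
```

Both models treat `n_estimators` or `max_iter` as an upper limit. The fitted attributes `n_estimators_` and `n_iter_` report how far training actually went.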

Script:
* Wrapping up, scikit-learn has support for this early stopping behavior.
* And it automatically creates a validation set if you use early stopping.
* Again, you do not need to learn these classes yet, but I will show you a quick example.

In [None]:
import pandas as pd

from sklearn.model_selection import train_test_split

In [None]:
abalone = pd.read_csv("https://raw.githubusercontent.com/bu-cds-omds/dx602-examples/main/data/abalone.tsv", sep="\t")

In [None]:
abalone_target = abalone["Rings"]
abalone_features = abalone.drop(["Rings", "Sex"], axis=1)

In [None]:
train_features, test_features, train_target, test_target = train_test_split(abalone_features, abalone_target, test_size=0.2, random_state=2024)

Script:
* I'll show you a quick example with gradient boosting.

In [None]:
import sklearn.ensemble

In [None]:
no_stop_model = sklearn.ensemble.GradientBoostingRegressor(n_estimators=1000, max_depth=5, random_state=42)
no_stop_model.fit(train_features, train_target)
no_stop_model.score(test_features, test_target)

0.3996592381771311

Script:
* This $R^2$ of about 40% is better than most of the decision tree models that we've seen this week, but substantially worse than linear regression, which we have seen reach an $R^2$ score of 48% on this dataset.
* Let's repeat with early stopping.

In [None]:
early_stop_model = sklearn.ensemble.GradientBoostingRegressor(n_estimators=1000, max_depth=5, random_state=42, n_iter_no_change=10)
early_stop_model.fit(train_features, train_target)
early_stop_model.score(test_features, test_target)

0.4988235292110631

Script:
* And the early stopping model got an $R^2$ of 50%, slightly beating out the linear regression.
* How many trees did it use?

In [None]:
early_stop_model.n_estimators_

57

Script:
* 57 trees, versus 1000 in the previous model.
* How much faster is that?

In [None]:
1000 / 57

17.54385964912281

Script:
* So early stopping gave a better model that uses about 17.5 times fewer trees, and so should run roughly that much faster.
* Using the validation set made this work.
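
One way to look at the same effect, sketched on synthetic data rather than the abalone data used above: `staged_predict` yields the model's predictions after each additional tree, so a held-out score can be traced tree by tree. The dataset, split sizes, and variable names here are assumptions for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic noisy regression data, assumed purely for illustration.
X, y = make_regression(n_samples=600, n_features=8, noise=25.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit all 200 trees with no early stopping.
model = GradientBoostingRegressor(n_estimators=200, max_depth=5, random_state=0)
model.fit(X_train, y_train)

# Held-out R^2 after 1 tree, 2 trees, ..., 200 trees.
val_scores = [r2_score(y_val, pred) for pred in model.staged_predict(X_val)]
best_n_trees = val_scores.index(max(val_scores)) + 1

print(best_n_trees, max(val_scores), val_scores[-1])
```

If the best held-out score occurs well before the last tree, the later trees were mostly fitting noise in the training set. That is the situation early stopping is meant to catch.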