# Cross-Fold Validation
When we have too little training data to afford splitting off a separate test set, we can use cross-fold validation to get a better estimate of model performance.

Essentially, we split the dataset into k folds, train on k-1 of them, and validate on the remaining fold. We repeat this k times, each time holding out a different fold for validation. Finally, we average the k validation scores to get a more reliable estimate of model performance.
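
To make the procedure concrete, here is a minimal sketch of the k-fold loop in plain Python. The helper names (`k_fold_indices`, `cross_validate_mae`), the toy targets, and the "model" that simply predicts the mean of the training targets are all assumptions made for illustration. The sketch is not how scikit-learn implements cross-validation, and real code would usually shuffle the data before splitting.

```python
from statistics import mean


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    # The first (n_samples % k) folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate_mae(y, k):
    """Return one validation MAE per fold for a toy mean-predictor baseline."""
    scores = []
    for held_out in k_fold_indices(len(y), k):
        held_out_set = set(held_out)
        # "Train" on the other k-1 folds: the toy model just memorises their mean.
        train_targets = [y[i] for i in range(len(y)) if i not in held_out_set]
        prediction = mean(train_targets)
        # Validate on the single held-out fold.
        scores.append(mean(abs(y[i] - prediction) for i in held_out))
    return scores


# Hypothetical toy targets, only for demonstration.
toy_y = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 10.0, 12.0, 11.0]
fold_scores = cross_validate_mae(toy_y, k=5)
print("Per-fold MAE:", fold_scores)
print("Average MAE:", mean(fold_scores))
```

Each sample is used for validation exactly once and for training k-1 times, which is why the averaged score is less sensitive to one lucky or unlucky split.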

In [None]:
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline

# We won't bother writing out the whole code: some_X and some_y are
# placeholders for a feature matrix and target vector prepared elsewhere.

some_pipeline = Pipeline(steps=[
    ('preprocessor', SimpleImputer()),
    ('model', RandomForestRegressor(n_estimators=50, random_state=0))
])

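# Here we use cross-fold validation. cv is the number of folds.
# scikit-learn scorers follow a "higher is better" convention, so MAE is
# exposed as 'neg_mean_absolute_error' and the returned scores are negative.
scores = cross_val_score(some_pipeline, some_X, some_y,
                         cv=5,
                         scoring='neg_mean_absolute_error')

# We want the mean of all scores, negated to get back a positive MAE
print("Average MAE score:", -scores.mean())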
