# Using `train_val_test_split` for Rigorous Modeling
Thanks to [this article](https://towardsdatascience.com/automatic-hyperparameter-tuning-with-sklearn-gridsearchcv-and-randomizedsearchcv-e94f53a518ee) for some of the code used below for automatic hyperparameter tuning.

Load example data from sklearn:

In [19]:
from sklearn import datasets
diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True)

Split the data into training, validation, and testing sets:

In [20]:
from astartes import train_val_test_split
X_train, X_val, X_test, y_train, y_val, y_test = train_val_test_split(diabetes_X, diabetes_y)
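
To make the three-way split concrete, here is a minimal sketch of a random train/validation/test partition in plain Python. It only illustrates the idea and is not how `astartes` implements it; the helper name `three_way_split`, the 80/10/10 fractions, and the seed are assumptions chosen for this example.

```python
import random


def three_way_split(n_samples, train_frac=0.8, val_frac=0.1, seed=0):
    """Hypothetical helper: shuffle indices and cut them into three disjoint sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_train = int(train_frac * n_samples)
    n_val = int(val_frac * n_samples)
    train_idx = indices[:n_train]
    val_idx = indices[n_train:n_train + n_val]
    test_idx = indices[n_train + n_val:]  # whatever remains goes to testing
    return train_idx, val_idx, test_idx


# The diabetes dataset has 442 samples.
train_idx, val_idx, test_idx = three_way_split(442)
print(len(train_idx), len(val_idx), len(test_idx))
```

The key property is that the three index sets never overlap, so a sample used for fitting or model selection is never reused for the final test.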

Create a baseline model without tuning its hyperparameters, and score it on the validation set:

In [26]:
from sklearn.ensemble import RandomForestRegressor
rfr_baseline = RandomForestRegressor(n_estimators=5)
rfr_baseline.fit(X_train, y_train)
rfr_baseline.score(X_val, y_val)

0.3515039651924021
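
For a regressor, `score` returns the coefficient of determination (R^2): 1.0 means a perfect fit, and 0.0 means the model does no better than always predicting the mean of the targets. The following is a minimal sketch of that calculation in plain Python; the function name `r2` and the toy values are made up for illustration and are not taken from the diabetes data.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]       # toy targets, mean is 6.0
perfect = r2(y_true, y_true)        # predictions match exactly
mean_only = r2(y_true, [6.0] * 4)   # always predict the mean
print(perfect, mean_only)
```

Read this way, the baseline's score of about 0.35 means it explains roughly a third of the variance in the validation targets.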

Now try to find better model parameters by tuning the model, in this case with an automatic tuner:

In [28]:
from sklearn.model_selection import RandomizedSearchCV
import numpy as np

rfr_tuned = RandomForestRegressor()

# candidate values for each hyperparameter
n_estimators = np.arange(5, 50, step=5)
max_depth = list(np.arange(2, 20, step=2)) + [None]

param_grid = {
    "n_estimators": n_estimators,
    "max_depth": max_depth,
}

# sample 50 hyperparameter combinations, each scored with 3-fold CV on the training set
random_cv = RandomizedSearchCV(
    rfr_tuned, param_grid, cv=3, n_iter=50, scoring="r2", n_jobs=-1, verbose=1
)
random_cv.fit(X_train, y_train)
# score the refit best model on the validation set
random_cv.score(X_val, y_val)

Fitting 3 folds for each of 50 candidates, totalling 150 fits


0.5287961605324687
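
After fitting, a `RandomizedSearchCV` object also records which combination won and how it scored during cross-validation. The sketch below is a smaller, self-contained version of the search above, written only to show those attributes. The reduced parameter lists, the seeds, and the use of sklearn's `train_test_split` as a stand-in split are assumptions made for this example, so its numbers will differ from the ones in this notebook.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = datasets.load_diabetes(return_X_y=True)
# Stand-in split for this sketch only; the notebook itself uses astartes.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    {"n_estimators": [5, 10, 20], "max_depth": [2, 4, None]},
    n_iter=5,
    cv=3,
    scoring="r2",
    random_state=0,
)
search.fit(X_train, y_train)

print(search.best_params_)          # the winning hyperparameter combination
print(search.best_score_)           # mean cross-validated R^2 on the training folds
print(search.score(X_val, y_val))   # R^2 of the refit best model on held-out data
```

Note that `best_score_` comes from folds of the training data, so it is a different quantity from the validation score and the two need not agree.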

Before celebrating this substantial improvement, let's make sure that the model also performs well on data that played no part in tuning it:

In [29]:
random_cv.score(X_test, y_test)

0.47114420960992875

The score on the held-out test data is lower than on the validation data we used to judge the tuned model, which is a sign that the results might not generalize. If new measurements were taken in the future, we could not be sure that the model would perform as well on them. We should try to improve the model further or re-evaluate our modeling approach!
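
To put a number on that drop, the short sketch below compares the validation and test scores printed in this notebook. The variable names are chosen for this example, and the values come from this particular run, so they will change if the cells are re-executed.

```python
# Scores copied from the outputs above (one specific run).
val_r2 = 0.5287961605324687
test_r2 = 0.47114420960992875

gap = val_r2 - test_r2          # absolute loss in R^2 on unseen data
relative_drop = gap / val_r2    # the same loss as a fraction of the validation score

print(f"absolute drop: {gap:.3f}")
print(f"relative drop: {relative_drop:.1%}")
```

Tracking this gap alongside the raw scores makes it easier to see whether later changes to the model actually improve generalization.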