## Parameter tuning: RandomizedSearchCV

Most machine learning models have so-called hyperparameters that should be tuned for optimum results. There are various utilities for tuning them; here I'm going to show how to use the RandomizedSearchCV utility. This utility randomly tries parameter combinations drawn from predefined value ranges to find the best combination. It tries a fixed number of combinations, so the total running time can be limited, which is handy for large datasets and/or models with many parameters. Of course, with a limited number of iterations you might not find the absolute best values.
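To get a feel for what "randomly trying combinations" means, here is a minimal sketch that only draws the combinations, without training anything. It uses scikit-learn's `ParameterSampler` helper; the two parameter ranges and the seed are picked purely for illustration.

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterSampler

# Illustrative integer ranges; randint's upper bound is exclusive.
param_distributions = {
    'n_estimators': randint(10, 50),
    'max_depth': randint(2, 7),
}

# Draw 5 random combinations; random_state is an arbitrary seed for repeatability.
samples = list(ParameterSampler(param_distributions, n_iter=5, random_state=0))

for params in samples:
    print(params)
```

Each printed dictionary is one candidate configuration. A randomized search fits and scores the model once per candidate and cross-validation split.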

First we load the dataset:

In [1]:
from sklearn import datasets

X, y = datasets.load_diabetes(return_X_y=True)

Next we configure the RandomizedSearchCV utility by specifying the model to be tuned and the parameter ranges, and then run the search:

In [2]:
from sklearn.model_selection import RandomizedSearchCV, ShuffleSplit
from sklearn.ensemble import RandomForestRegressor
from scipy.stats import randint

gsc = RandomizedSearchCV(
    estimator=RandomForestRegressor(),
    # randint(low, high) samples integers from low up to high - 1
    param_distributions={
        'n_estimators': randint(10, 50),
        'min_samples_leaf': randint(5, 15),
        'min_samples_split': randint(10, 30),
        'max_depth': randint(2, 7),
    },
    # 10 random train/test splits, each holding out 20% of the data
    cv=ShuffleSplit(n_splits=10, test_size=0.2),
    n_jobs=4,    # number of parallel workers
    n_iter=100   # number of parameter combinations to try
)

# Fit and score the model for every sampled combination
grid_result = gsc.fit(X, y)

print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

Best: 0.493507 using {'max_depth': 4, 'min_samples_leaf': 8, 'min_samples_split': 22, 'n_estimators': 42}
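The best score is the mean cross-validated score of the winning combination. Since no `scoring` argument is given, this is the regressor's default R² score. To see how the other candidates did, you can look at the `cv_results_` attribute. The sketch below is a scaled-down search (5 candidates, 3-fold CV, two parameters, fixed seeds, all chosen here just to keep it fast), so its numbers will differ from the run above.

```python
import numpy as np
from scipy.stats import randint
from sklearn import datasets
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

X, y = datasets.load_diabetes(return_X_y=True)

# Deliberately small search so the example runs quickly (illustrative settings).
search = RandomizedSearchCV(
    estimator=RandomForestRegressor(n_estimators=20, random_state=0),
    param_distributions={
        'max_depth': randint(2, 7),
        'min_samples_leaf': randint(5, 15),
    },
    n_iter=5,
    cv=3,
    random_state=0,
)
search.fit(X, y)

results = search.cv_results_

# Sort candidates by rank (1 = best mean test score) and show the top three.
order = np.argsort(results['rank_test_score'])
for i in order[:3]:
    print("%.3f (+/- %.3f) for %s" % (
        results['mean_test_score'][i],
        results['std_test_score'][i],
        results['params'][i],
    ))
```

If the top few candidates score within each other's standard deviation, the exact winner matters less than the general region of the parameter space.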


By trying just 100 combinations, the utility has found a suitable combination of parameters. An exhaustive search over the same parameter space would have had to try 40 × 10 × 20 × 5 = 40,000 combinations (the upper bound of each `randint` range is exclusive) to find the absolute best result. This is the strength of this utility: it quickly explores the parameter space for a decent solution.
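The size of the exhaustive grid can be checked with a few lines of plain Python. The `bounds` dictionary below simply repeats the `(low, high)` arguments passed to `randint` above; the variable names are mine.

```python
from math import prod

# (low, high) bounds as passed to randint; high is exclusive.
bounds = {
    'n_estimators': (10, 50),
    'min_samples_leaf': (5, 15),
    'min_samples_split': (10, 30),
    'max_depth': (2, 7),
}

# Number of distinct integer values per parameter.
n_values = {name: high - low for name, (low, high) in bounds.items()}

# Total number of combinations an exhaustive grid search would try.
grid_size = prod(n_values.values())

n_iter = 100
print(n_values)
print(grid_size)
print(f"{n_iter / grid_size:.2%} of the grid")
```

The random search covers at most 0.25% of the grid, and possibly slightly less, because the same combination can be drawn more than once.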