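Random search is a hyperparameter tuning method in which random combinations of hyperparameters are sampled and used to train a model, and the combination with the best cross-validated score is kept. It shares a few similarities with grid search: both start from candidate values (or, for random search, distributions) for each hyperparameter, evaluate candidate combinations with cross-validation, and return the best one. The difference is that grid search tries every combination, while random search tries only a fixed number of randomly chosen ones.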
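
A minimal, library-free sketch of the idea, assuming a hypothetical `toy_score` function in place of training and cross-validating a real model. It draws each combination with replacement, so a combination can repeat; scikit-learn's `RandomizedSearchCV` instead samples without replacement when every hyperparameter is given as a list.

```python
import random


def random_search(space, score, n_iter, seed=None):
    """Sample n_iter random combinations from `space` and return the best one.

    `space` maps each hyperparameter name to a list of candidate values;
    `score` returns a number where higher is better.
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        # draw one value independently for every hyperparameter
        params = {name: rng.choice(values) for name, values in space.items()}
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


# hypothetical stand-in for a cross-validated model score:
# it peaks at n_estimators=12, max_depth=40
def toy_score(params):
    return -abs(params["n_estimators"] - 12) - abs(params["max_depth"] - 40)


space = {
    "n_estimators": list(range(1, 21)),
    "max_depth": list(range(10, 121, 10)),
}
best, best_value = random_search(space, toy_score, n_iter=30, seed=0)
print(best, best_value)
```

Fixing `seed` makes the sampled combinations reproducible, which plays the same role as `random_state` in scikit-learn.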

In [1]:
import numpy as np
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target

In [2]:
from sklearn.ensemble import RandomForestRegressor

In [3]:
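# the regressor treats the iris labels 0, 1, 2 as numeric targets;
# RandomForestClassifier is the more natural fit for this classification dataset
rf = RandomForestRegressor(random_state=35)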

In [4]:
from sklearn.model_selection import RandomizedSearchCV

In [5]:
n_estimators = [int(x) for x in np.linspace(start=1, stop=20, num=20)]  # number of trees in the random forest
# 'auto' is deprecated since scikit-learn 1.1 and removed in 1.3; for RandomForestRegressor it meant
# 1.0 (all features), so use 1.0 instead of 'auto' on newer versions
max_features = ['auto', 'sqrt']  # number of features considered at every split
max_depth = [int(x) for x in np.linspace(10, 120, num=12)]  # maximum number of levels in each decision tree
min_samples_split = [2, 6, 10]  # minimum number of samples required to split an internal node
min_samples_leaf = [1, 3, 4]  # minimum number of samples required at a leaf node
bootstrap = [True, False]  # whether each tree is built on a bootstrap sample of the data

random_grid = {
    'n_estimators': n_estimators,
    'max_features': max_features,
    'max_depth': max_depth,
    'min_samples_split': min_samples_split,
    'min_samples_leaf': min_samples_leaf,
    'bootstrap': bootstrap,
}
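
The size of this space is the product of the number of candidate values for each hyperparameter. A quick count in plain Python (the lists mirror `random_grid`, with the `np.linspace` values written as `range` calls) compares an exhaustive grid search with the 100-candidate random search used in this notebook, both with 5-fold cross-validation.

```python
from math import prod

# candidate lists equivalent to random_grid
# (np.linspace(1, 20, 20) -> 1..20, np.linspace(10, 120, 12) -> 10, 20, ..., 120)
grid = {
    "n_estimators": list(range(1, 21)),
    "max_features": ["auto", "sqrt"],
    "max_depth": list(range(10, 121, 10)),
    "min_samples_split": [2, 6, 10],
    "min_samples_leaf": [1, 3, 4],
    "bootstrap": [True, False],
}

cv = 5        # folds per candidate
n_iter = 100  # candidates sampled by the random search

grid_size = prod(len(values) for values in grid.values())
print(grid_size)                    # 8640 combinations
print(grid_size * cv)               # 43200 fits for an exhaustive grid search
print(n_iter * cv)                  # 500 fits for the random search
print(f"{n_iter / grid_size:.1%}")  # 1.2% of the combinations are tried
```

Adding one more hyperparameter with three candidate values would triple the grid to 25920 combinations, while the random search budget stays at `n_iter × cv` fits.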

As in our grid search implementation, we carry out cross-validation during the random search, and RandomizedSearchCV handles this for us. With cv=5, each sampled combination is trained and evaluated 5 times (5-fold cross-validation), and the mean validation score is used to compare combinations.
For the grid search we set verbose=0 to keep the output short. Here we use verbose=2 to get a glimpse of the logging information the search generates.

The n_iter parameter sets the number $n$ of random combinations that are sampled and evaluated, and n_jobs = -1 runs the fits in parallel on all available CPU cores.
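
The total number of fits follows directly from these settings. The plain-Python sketch below only does the bookkeeping (no model is trained); it assumes contiguous, unshuffled folds as in scikit-learn's `KFold` default, which is what an integer `cv` uses for a regressor such as `RandomForestRegressor`. The helper names `kfold_indices` and `count_fits` are hypothetical.

```python
def kfold_indices(n_samples, k):
    """Split 0..n_samples-1 into k contiguous folds without shuffling;
    the first n_samples % k folds get one extra sample."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def count_fits(n_samples, n_iter, cv):
    """Count the model fits a randomized search performs (no training happens here)."""
    folds = kfold_indices(n_samples, cv)
    fits = 0
    for _ in range(n_iter):                  # one pass per sampled candidate
        for i, val_idx in enumerate(folds):  # one fit per fold
            # train on every fold except the i-th, validate on the i-th
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            assert len(train_idx) + len(val_idx) == n_samples
            fits += 1
    return fits


print([len(f) for f in kfold_indices(150, 5)])  # [30, 30, 30, 30, 30]
print(count_fits(150, n_iter=100, cv=5))        # 500
```

With refit=True (the default), RandomizedSearchCV additionally refits the best candidate once on the full dataset after these fits.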

In [6]:
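rf_random = RandomizedSearchCV(
    estimator=rf,
    param_distributions=random_grid,
    n_iter=100,       # number of random combinations to sample
    cv=5,             # 5-fold cross-validation for each combination
    verbose=2,        # print progress information
    random_state=35,  # makes the sampled combinations reproducible
    n_jobs=-1,        # use all available CPU cores
)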

In [7]:
rf_random.fit(X, y)

# print the candidate values in the random grid
print('Random grid: ', random_grid, '\n')

# print the best combination found by the search
print('Best Parameters: ', rf_random.best_params_, ' \n')

Fitting 5 folds for each of 100 candidates, totalling 500 fits
Random grid:  {'n_estimators': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], 'max_features': ['auto', 'sqrt'], 'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120], 'min_samples_split': [2, 6, 10], 'min_samples_leaf': [1, 3, 4], 'bootstrap': [True, False]} 

Best Parameters:  {'n_estimators': 10, 'min_samples_split': 10, 'min_samples_leaf': 4, 'max_features': 'auto', 'max_depth': 70, 'bootstrap': True}

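The output shows the best parameters found: 10 for n_estimators and for min_samples_split, 4 for min_samples_leaf, 'auto' for max_features, 70 for max_depth, and True for bootstrap. Because the combinations are sampled at random, a different random_state may produce a different set of best parameters.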
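
Besides best_params_, a fitted RandomizedSearchCV exposes best_score_ (the mean cross-validated score of the best candidate), best_estimator_ (that candidate refit on the full data, since refit=True by default) and cv_results_ (per-candidate results). The sketch below swaps in RandomForestClassifier, since the iris targets are class labels, and uses a smaller, hypothetical search space so it runs quickly; its numbers will not match the output above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# a smaller, illustrative space (not random_grid) to keep the run short
param_distributions = {
    "n_estimators": list(range(1, 21)),
    "max_features": ["sqrt", "log2"],
    "max_depth": [10, 40, 70, 100],
    "min_samples_leaf": [1, 3, 4],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=35),
    param_distributions=param_distributions,
    n_iter=10,        # sample 10 of the 480 combinations
    cv=5,             # stratified 5-fold CV, since the estimator is a classifier
    random_state=35,
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best mean CV accuracy:", search.best_score_)  # default score for classifiers
print("Candidates evaluated:", len(search.cv_results_["params"]))
best_model = search.best_estimator_  # already refit on all of X, y
```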

**Random Search vs. Grid Search**
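(i) Higher dimensionality leads to many more combinations. The number of combinations in a grid is the product of the number of candidate values for each hyperparameter, so every hyperparameter added multiplies the work grid search must do. Random search evaluates a fixed number of combinations (n_iter) regardless of the size of the space, so it is the better choice in high-dimensional searches: it reduces the number of combinations evaluated, not the number of hyperparameters.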


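(ii) Random search can usually find well-performing parameters in much less time than grid search, because it trains n_iter × cv models instead of cv models for every combination in the grid. This makes much more efficient use of computational resources, although, unlike an exhaustive grid search, it is not guaranteed to try the single best combination in the grid.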
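
One common way to quantify the efficiency argument: if each combination is drawn independently and uniformly, the chance that at least one of $n$ samples lands in the best fraction $p$ of the space is $1 - (1 - p)^{n}$. The sketch below assumes that landing in the top 5% counts as good enough; the helper names are hypothetical. scikit-learn samples without replacement when every entry is a list, which can only raise these odds, but nothing here guarantees finding the single best combination.

```python
import math


def prob_hit_top(fraction, n_iter):
    """Probability that at least one of n_iter independent uniform samples
    lands in the best `fraction` of the search space."""
    return 1 - (1 - fraction) ** n_iter


def samples_needed(fraction, target):
    """Smallest n_iter for which prob_hit_top(fraction, n_iter) >= target."""
    return math.ceil(math.log(1 - target) / math.log(1 - fraction))


print(round(prob_hit_top(0.05, 100), 3))  # 0.994 with n_iter=100
print(samples_needed(0.05, 0.95))         # 59
```

The required number of samples depends only on $p$ and the target probability, not on how many hyperparameters the grid has, which is why random search scales well as the dimensionality grows.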