# Hyperparameter Tuning

Hyperparameter tuning is the process of finding the combination of hyperparameters that gives the best model performance on a chosen metric.

**Types**:
- Grid Search: Exhaustively evaluate every combination in a predefined grid of hyperparameter values.
- Random Search: Randomly sample a fixed number of combinations from given lists or distributions.
- Bayesian Optimization: Build a probabilistic surrogate model of the objective function and use it to choose the next hyperparameters to evaluate.
- Gradient-based Optimization: Use gradients of the objective with respect to the hyperparameters, which requires them to be continuous and the objective to be differentiable.
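
To make the difference between the first two strategies concrete, here is a minimal standard-library sketch. The search space and the helper names `grid_candidates` and `random_candidates` are hypothetical and only for illustration; the sampler draws with replacement, which may differ from how a library implements random search.

```python
import itertools
import random

# Hypothetical search space, shaped like a typical random-forest grid.
param_grid = {
    "n_estimators": [50, 100, 200, 300, 400, 500],
    "max_features": ["sqrt", "log2", None],
    "max_depth": [4, 5, 6, 7, 8, 9, 10],
    "criterion": ["gini", "entropy"],
}


def grid_candidates(grid):
    """Yield every combination in the grid (what grid search evaluates)."""
    names = list(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def random_candidates(grid, n_iter, seed=0):
    """Draw n_iter combinations, picking one value per hyperparameter at random."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in grid.items()}
        for _ in range(n_iter)
    ]


all_candidates = list(grid_candidates(param_grid))
sampled = random_candidates(param_grid, n_iter=10)

print(len(all_candidates))  # 6 * 3 * 7 * 2 = 252
print(len(sampled))         # 10
```

Grid search cost grows multiplicatively with each added hyperparameter, while random search cost is fixed by the number of samples requested.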


# Cross Validation
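Cross validation is a technique for estimating how well a model generalizes to unseen data. In k-fold cross validation the data is split into k folds; the model is trained on k-1 folds and scored on the remaining fold, and this is repeated until each fold has served as the test set once. The k scores are then averaged.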
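
The splitting step can be sketched in plain Python. This is a simplified illustration with contiguous, unshuffled folds; the helper `kfold_indices` is hypothetical, and real splitters usually shuffle or stratify (which matters for data sorted by class, such as iris).

```python
def kfold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


folds = list(kfold_indices(n_samples=150, k=5))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))  # 120 30 for each of the 5 folds
```

Every sample lands in exactly one test fold, so each observation contributes to the score once.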

In [1]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix


In [6]:
# load the data set
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
Y = iris.target

In [7]:
# select the model
model = RandomForestClassifier()

# create the model parameter grid
param_grid = {
    'n_estimators': [50,100,200,300,400,500],
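    'max_features': ['auto','sqrt','log2'],  # note: 'auto' is rejected by scikit-learn 1.3 and later, which explains the failed fits in the output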
    'max_depth': [4,5,6,7,8,9,10],
    'criterion': ['gini','entropy']
}

# create the grid search object
grid = GridSearchCV(
    estimator=model,
    param_grid=param_grid,
    cv=5,
    scoring='accuracy',
    verbose=1,
    n_jobs=-1
    )

# fit the model
grid.fit(X,Y)

# print the best parameters
print(f"Best parameters: {grid.best_params_}")

Fitting 5 folds for each of 252 candidates, totalling 1260 fits


420 fits failed out of a total of 1260.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
344 fits failed with the following error:
Traceback (most recent call last):
  File "c:\Users\Sumit Sharma\.conda\envs\python_ml\Lib\site-packages\sklearn\model_selection\_validation.py", line 866, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "c:\Users\Sumit Sharma\.conda\envs\python_ml\Lib\site-packages\sklearn\base.py", line 1382, in wrapper
    estimator._validate_params()
  File "c:\Users\Sumit Sharma\.conda\envs\python_ml\Lib\site-packages\sklearn\base.py", line 436, in _validate_params
    validate_parameter_constraints(
  File "c:\Users\Sumit Sharma\.conda\envs\python_ml\Lib\site-packages\sklearn\utils\_param_valid

Best parameters: {'criterion': 'gini', 'max_depth': 4, 'max_features': 'sqrt', 'n_estimators': 50}
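
The failure counts in the output above are consistent with exactly one of the three `max_features` values being rejected. The traceback is truncated, so this is an inference rather than a confirmed diagnosis: assuming scikit-learn 1.3 or later, `'auto'` is no longer a valid `max_features` value for `RandomForestClassifier`, and every candidate that uses it would fail parameter validation. A quick arithmetic check under that assumption:

```python
n_estimators = [50, 100, 200, 300, 400, 500]
max_features = ["auto", "sqrt", "log2"]
max_depth = [4, 5, 6, 7, 8, 9, 10]
criterion = ["gini", "entropy"]
cv = 5

total_candidates = (
    len(n_estimators) * len(max_features) * len(max_depth) * len(criterion)
)
# Assumption: only candidates with max_features == "auto" fail.
failing_candidates = total_candidates // len(max_features)

print(total_candidates, total_candidates * cv)      # 252 1260
print(failing_candidates, failing_candidates * cv)  # 84 420
```

The failed fits are scored as nan, so the reported best parameters come only from the remaining 168 candidates.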


In [8]:
# create a smaller parameter grid (max_features and criterion are left at their defaults)
param_grid = {
    'n_estimators': [50,100,200,300,400,500],
    'max_depth': [4,5,6,7,8,9,10],
}

# create the grid search object
grid = GridSearchCV(
    estimator=model,
    param_grid=param_grid,
    cv=5,
    scoring='accuracy',
    verbose=1,
    n_jobs=-1
    )

# fit the model
grid.fit(X,Y)

# print the best parameters
print(f"Best parameters: {grid.best_params_}")

Fitting 5 folds for each of 42 candidates, totalling 210 fits
Best parameters: {'max_depth': 4, 'n_estimators': 100}


In [11]:
%%time
from sklearn.model_selection import RandomizedSearchCV

# create the parameter lists to sample from
param_grid = {
    'n_estimators': [50,100,200,300,400,500],
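    'max_features': ['auto','sqrt','log2'],  # note: 'auto' is rejected by scikit-learn 1.3 and later, which explains the failed fits in the output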
    'max_depth': [4,5,6,7,8,9,10],
    'criterion': ['gini','entropy']
}

# create the randomized search object (n_iter defaults to 10 sampled candidates)
grid = RandomizedSearchCV(
    estimator=model,
    param_distributions=param_grid,
    cv=5,
    scoring='accuracy',
    verbose=1,
    n_jobs=-1
)

# fit the model
grid.fit(X,Y)

# print the best parameters
print(f"Best parameters: {grid.best_params_}")

Fitting 5 folds for each of 10 candidates, totalling 50 fits


20 fits failed out of a total of 50.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
4 fits failed with the following error:
Traceback (most recent call last):
  File "c:\Users\Sumit Sharma\.conda\envs\python_ml\Lib\site-packages\sklearn\model_selection\_validation.py", line 866, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "c:\Users\Sumit Sharma\.conda\envs\python_ml\Lib\site-packages\sklearn\base.py", line 1382, in wrapper
    estimator._validate_params()
  File "c:\Users\Sumit Sharma\.conda\envs\python_ml\Lib\site-packages\sklearn\base.py", line 436, in _validate_params
    validate_parameter_constraints(
  File "c:\Users\Sumit Sharma\.conda\envs\python_ml\Lib\site-packages\sklearn\utils\_param_validation

Best parameters: {'n_estimators': 50, 'max_features': 'log2', 'max_depth': 10, 'criterion': 'gini'}
CPU times: total: 766 ms
Wall time: 18 s
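
A randomized search that avoids the failed fits might look like the sketch below. It assumes scikit-learn 1.3 or later, where `'auto'` is not accepted for `max_features`. The shorter value lists, `n_iter=5`, `random_state=0`, and `error_score="raise"` are illustrative choices and not part of the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Illustrative search space without the 'auto' option.
param_distributions = {
    "n_estimators": [50, 100],
    "max_features": ["sqrt", "log2", None],  # None means "use all features"
    "max_depth": [4, 6, 8, 10],
    "criterion": ["gini", "entropy"],
}

search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,             # number of sampled candidates
    cv=5,
    scoring="accuracy",
    random_state=0,       # makes the sampled candidates reproducible
    error_score="raise",  # fail loudly instead of silently scoring nan
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Fixing `random_state` on both the estimator and the search keeps repeated runs comparable, which is useful when contrasting random search with grid search.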
