# Hyperparameter Tuning
Hyperparameter tuning is the process of finding the best combination of hyperparameters for a given model. In this notebook, we will use the [Azure Machine Learning Service](https://azure.microsoft.com/en-us/services/machine-learning/) to perform hyperparameter tuning for a [scikit-learn](https://scikit-learn.org/stable/) model.

**Types**
- Grid Search: The model is trained on every combination of values in a predefined grid, and the combination with the best score is selected.
- Random Search: A fixed number of combinations is sampled at random from the search space, and the sampled combination with the best score is selected.
- Bayesian Optimization: A probabilistic model of the score is built from the combinations tried so far and is used to choose the next, more promising combination to evaluate.
- Gradient-Based Optimization: Continuous hyperparameters are updated by following the gradient of the validation score with respect to those hyperparameters.
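The difference between the first two strategies can be seen by listing the candidates each one would try. The sketch below uses scikit-learn's `ParameterGrid` and `ParameterSampler`; the small `space` dictionary is a hypothetical search space chosen only for illustration, and no model is trained.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Hypothetical search space, used only for illustration
space = {
    "n_estimators": [50, 100, 200],
    "max_depth": [4, 5, 6],
}

# Grid search: every combination is a candidate (3 * 3 = 9)
grid_candidates = list(ParameterGrid(space))
print(f"Grid search candidates: {len(grid_candidates)}")

# Random search: only n_iter combinations are drawn at random
random_candidates = list(ParameterSampler(space, n_iter=4, random_state=0))
print(f"Random search candidates: {len(random_candidates)}")
for params in random_candidates:
    print(params)
```

In both cases each candidate is then scored, and the best-scoring one is kept; the strategies differ only in which candidates get evaluated.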

# Cross Validation
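Cross Validation is a technique used to evaluate the performance of a model on unseen data. It is used to check how well the model generalizes to new data. In k-fold cross validation, the data is split into k folds; the model is trained on k-1 folds and scored on the remaining fold, and this is repeated so that each fold serves once as the validation set.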
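As a minimal sketch of 5-fold cross validation on its own, before any tuning, the snippet below scores a single random forest with `cross_val_score`. The `random_state=0` and `n_estimators=100` settings are assumptions added here for repeatable runs, not choices made elsewhere in this notebook.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# random_state is an assumption added for repeatable runs
clf = RandomForestClassifier(n_estimators=100, random_state=0)

# cv=5 gives one accuracy score per fold (stratified folds for a classifier)
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(scores)
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

The mean of the fold scores is the quantity that `GridSearchCV` and `RandomizedSearchCV` compare across candidates.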

In [8]:
# Import libraries
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# To remove warnings
import warnings
warnings.filterwarnings('ignore')


In [9]:
# load the data
from sklearn.datasets import load_iris
iris = load_iris()

X = iris.data # features
y = iris.target # target labels

In [18]:
# Define the model
model = RandomForestClassifier()

# Create the parameter grid
param_grid = {
    'n_estimators': [50, 100, 200, 300, 400, 500],
    'max_features': [None, 'sqrt', 'log2'], # 'auto' was removed in scikit-learn 1.3; None uses all features
    'max_depth': [4, 5, 6], # default = None
    # 'min_samples_split': [2, 5, 10], # default = 2
    # 'min_samples_leaf': [1, 2, 4], # default = 1
    # 'criterion': ['gini', 'entropy']
}

# Create the GridSearchCV object
grid = GridSearchCV(
    estimator=model, 
    param_grid=param_grid, 
    cv=5, 
    scoring='accuracy', 
    verbose=1, 
    n_jobs=-1)

# Fit the model
grid.fit(X, y)

# print best parameters
print(f"Best Parameters: {grid.best_params_}")


Fitting 5 folds for each of 54 candidates, totalling 270 fits


Best Parameters: {'max_depth': 4, 'max_features': 'sqrt', 'n_estimators': 100}
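The search above is fitted on the full dataset, so its cross-validation score is the only performance estimate available. One possible follow-up, sketched here under assumptions, is to hold out a test set before searching and evaluate the refitted best model on it. The 80/20 split, the `random_state=42` values, and the reduced `small_grid` are illustrative choices, not part of the cell above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the data; the search never sees it (split is an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Reduced, hypothetical grid to keep the example fast
small_grid = {"n_estimators": [50, 100], "max_depth": [4, 6]}

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=small_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

print(f"Best Parameters: {search.best_params_}")
print(f"Best CV accuracy: {search.best_score_:.3f}")

# best_estimator_ is refitted on all of X_train because refit=True by default
y_pred = search.best_estimator_.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {test_accuracy:.3f}")
print(classification_report(y_test, y_pred))
```

Comparing `best_score_` with the test accuracy gives a rough check on whether the selected hyperparameters generalize beyond the folds used to choose them.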


In [11]:
from sklearn.model_selection import RandomizedSearchCV
# Define the model
model = RandomForestClassifier()

# Create the parameter distributions to sample from
param_grid = {
    'n_estimators': [50, 100, 200, 300, 400, 500],
    'max_features': [None, 'sqrt', 'log2'], # 'auto' was removed in scikit-learn 1.3; None uses all features
    'max_depth': [None, 4, 5, 6, 7, 8, 9, 10],
    'min_samples_split': [2, 5, 10], # default = 2
    'min_samples_leaf': [1, 2, 4], # default = 1
    'criterion': ['gini', 'entropy']
}

# Create the RandomizedSearchCV object
grid = RandomizedSearchCV(
    estimator=model, 
    param_distributions=param_grid, 
    cv=5, 
    scoring='accuracy', 
    verbose=1, 
    n_jobs=-1, 
    n_iter=20
)

# Fit the model
grid.fit(X, y)

# print best parameters
print(f"Best Parameters: {grid.best_params_}")

Fitting 5 folds for each of 20 candidates, totalling 100 fits
Best Parameters: {'n_estimators': 400, 'min_samples_split': 5, 'min_samples_leaf': 1, 'max_features': 'sqrt', 'max_depth': None, 'criterion': 'gini'}
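`RandomizedSearchCV` also accepts distributions in place of lists, so values can be drawn from a range instead of a fixed set. The sketch below uses `scipy.stats.randint` for that; the ranges, `n_iter=5`, `cv=3`, and the `random_state=0` values are assumptions chosen to keep the example small and repeatable.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Hypothetical search space mixing distributions and lists
param_distributions = {
    "n_estimators": randint(50, 201),     # integers from 50 to 200 (upper bound is exclusive)
    "min_samples_split": randint(2, 11),  # integers from 2 to 10
    "max_depth": [None, 4, 6, 8],         # lists are sampled uniformly
}

search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,          # number of sampled candidates
    cv=3,
    scoring="accuracy",
    random_state=0,    # makes the sampled candidates repeatable
)
search.fit(X, y)

print(f"Best Parameters: {search.best_params_}")
print(f"Best CV accuracy: {search.best_score_:.3f}")
```

With distributions, the cost of the search is set by `n_iter` alone and does not grow with the size of the search space.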
