# Hyperparameter tuning

1. What is hyperparameter tuning?
2. How to do it?

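### 1. What is hyperparameter tuning?
-> Hyperparameters are settings that are fixed before the model is trained, such as the number of trees or the maximum depth of a random forest.

-> Unlike model parameters (for example, the split thresholds inside each tree or the weights of a neural network), they are not learned from the data.

-> They have a direct impact on the model's performance, so hyperparameter tuning means searching for the values that give the best performance on data the model was not trained on.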
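To make the distinction concrete, here is a small sketch using scikit-learn's `RandomForestClassifier` on the iris data. The values `n_estimators=10` and `max_depth=3` are arbitrary choices for illustration. The constructor arguments are hyperparameters, and `get_params()` can read them before any training. The fitted trees in `estimators_` only exist after `fit()`, because they are learned from the data.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# Hyperparameters: chosen by us and passed to the constructor before training.
model = RandomForestClassifier(n_estimators=10, max_depth=3, random_state=0)
hyperparams = model.get_params()
print(hyperparams["n_estimators"], hyperparams["max_depth"])  # 10 3

# Learned parameters: the fitted trees (and their split thresholds)
# only exist after fit(), because they are derived from the data.
model.fit(X, y)
n_trees = len(model.estimators_)  # one fitted tree per n_estimators
first_tree_depth = model.estimators_[0].get_depth()  # capped by max_depth
print(n_trees, first_tree_depth)
```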

### 2. How to do it?

-> Grid Search

-> Random Search

-> Bayesian Optimization

-> Gradient Descent


### `Grid Search`

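-> Grid Search is a technique used to find the best hyperparameters for a model.

-> It involves specifying a list of candidate values for each hyperparameter, then training and evaluating the model on every possible combination of these values.

-> The best combination is the one with the highest validation score (for example, mean cross-validated accuracy). The number of combinations is the product of the list sizes, so the cost grows quickly: 4 hyperparameters with 5 values each already give 5^4 = 625 combinations.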
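A minimal sketch of the grid-search loop in plain Python. It assumes a hypothetical `validation_score` function that stands in for training the model and computing its cross-validated score; a real search would fit a model at that point. `itertools.product` enumerates every combination, so a 4 x 3 grid costs 12 evaluations.

```python
from itertools import product


def validation_score(params):
    # Hypothetical stand-in for "train with these hyperparameters and return
    # the cross-validated score"; it peaks at n_estimators=200, max_depth=6.
    return -abs(params["max_depth"] - 6) - abs(params["n_estimators"] - 200) / 100


def grid_search(space, score_fn):
    names = list(space)
    results = []
    # product() yields every combination of the candidate values.
    for values in product(*(space[name] for name in names)):
        params = dict(zip(names, values))
        results.append((score_fn(params), params))
    best_score, best_params = max(results, key=lambda r: r[0])
    return best_params, best_score, len(results)


space = {"n_estimators": [50, 100, 200, 300], "max_depth": [4, 6, 8]}
best_params, best_score, n_evaluated = grid_search(space, validation_score)
print(best_params, n_evaluated)  # {'n_estimators': 200, 'max_depth': 6} 12
```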

### `Random Search`

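-> Random Search is a technique used to find the best hyperparameters for a model.

-> Instead of trying every combination, it samples a fixed number of combinations at random from the values (or distributions) specified for each hyperparameter.

-> The best sampled combination is the one with the highest validation score. The number of samples is chosen independently of the size of the search space, which makes random search much cheaper than grid search when there are many hyperparameters.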
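The same idea as a random-search sketch, again using a hypothetical `validation_score` in place of real training. A fixed budget of `n_iter=15` combinations is sampled from a 10 x 9 space, so only 15 of the 90 combinations are evaluated.

```python
import random


def validation_score(params):
    # Hypothetical stand-in for training + cross-validation.
    return -abs(params["max_depth"] - 6) - abs(params["n_estimators"] - 200) / 100


def random_search(space, score_fn, n_iter, seed=0):
    rng = random.Random(seed)  # seeded so the samples are reproducible
    results = []
    for _ in range(n_iter):
        # Sample each hyperparameter independently from its candidate values.
        params = {name: rng.choice(values) for name, values in space.items()}
        results.append((score_fn(params), params))
    best_score, best_params = max(results, key=lambda r: r[0])
    return best_params, best_score, len(results)


space = {
    "n_estimators": list(range(50, 501, 50)),  # 10 values
    "max_depth": list(range(2, 11)),           # 9 values
}
best_params, best_score, n_evaluated = random_search(space, validation_score, n_iter=15)
print(n_evaluated, best_params, best_score)
```

This sketch samples with replacement, so it may evaluate the same combination twice. scikit-learn's `RandomizedSearchCV` samples without replacement when every hyperparameter is given as a list.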

### `Bayesian Optimization`

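-> Bayesian Optimization is a technique used to find the best hyperparameters for a model.

-> It fits a probabilistic surrogate model (commonly a Gaussian process) to the scores of the combinations evaluated so far. An acquisition function, such as expected improvement, then uses the surrogate to choose the next combination to try. This balances exploring uncertain regions against exploiting promising ones.

-> Each new trial is informed by earlier results, so it usually needs fewer model trainings than grid or random search to reach a good combination.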
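A compact sketch of Bayesian optimization over a single continuous hyperparameter `x` in [0, 1]. It assumes NumPy and a hypothetical `validation_score` that peaks at 0.3. The surrogate is a Gaussian process with a fixed RBF kernel; the length scale of 0.2 is chosen for illustration rather than fitted. The acquisition function is expected improvement, maximized over a grid of 101 candidates. This is only meant to show the loop, not to serve as a production implementation.

```python
import math

import numpy as np


def validation_score(x):
    # Hypothetical stand-in for "train with hyperparameter x and return the
    # cross-validated score"; it peaks at x = 0.3.
    return -(x - 0.3) ** 2


def rbf_kernel(a, b, length_scale=0.2):
    # Squared-exponential covariance between two 1-D arrays of points.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_new, noise=1e-6):
    # GP posterior mean and standard deviation at x_new, computed on
    # standardized scores and mapped back to the original scale.
    y_mean, y_std = y_obs.mean(), y_obs.std() or 1.0
    y_norm = (y_obs - y_mean) / y_std
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_new)
    mu = K_s.T @ np.linalg.solve(K, y_norm)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    sigma = np.sqrt(np.maximum(var, 1e-12))
    return mu * y_std + y_mean, sigma * y_std


def expected_improvement(mu, sigma, best):
    # Expected amount by which each candidate beats the best score so far.
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + np.array([math.erf(v / math.sqrt(2.0)) for v in z]))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + sigma * pdf


def bayesian_optimization(n_iter=10):
    candidates = np.linspace(0.0, 1.0, 101)  # search space for x
    x_obs = np.array([0.0, 0.5, 1.0])         # initial evaluations
    y_obs = np.array([validation_score(x) for x in x_obs])
    for _ in range(n_iter):
        mu, sigma = gp_posterior(x_obs, y_obs, candidates)
        ei = expected_improvement(mu, sigma, y_obs.max())
        x_next = candidates[np.argmax(ei)]    # most promising candidate
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, validation_score(x_next))
    best = np.argmax(y_obs)
    return x_obs[best], y_obs[best], len(x_obs)


best_x, best_y, n_evals = bayesian_optimization()
print(f"best x = {best_x:.2f} after {n_evals} evaluations")
```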

### `Gradient Descent`

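-> Gradient Descent can also be used to tune hyperparameters, by treating the validation loss as a function of the hyperparameters.

-> It computes the gradient of the validation loss with respect to each hyperparameter (the "hypergradient"). It then repeatedly updates the hyperparameters in the direction that lowers that loss.

-> It only applies to continuous hyperparameters through which the loss is differentiable, such as a regularization strength or a learning rate. Discrete choices such as the number of trees or the split criterion cannot be tuned this way.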
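A worked sketch of a hypergradient for the simplest case, where the validation loss has a closed form: one-feature ridge regression without an intercept. The data are hypothetical: the training set follows y = 2x and the validation set follows y = x. The ridge weight is w(λ) = sxy / (sxx + λ), so the chain rule gives the gradient of the validation loss with respect to λ. Descending on log λ keeps λ positive. The optimum is λ = 14, where w = 28 / 28 = 1 fits the validation data exactly.

```python
import math

# Hypothetical data: training follows y = 2x, validation follows y = x.
x_train, y_train = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
x_val, y_val = [1.0, 2.0], [1.0, 2.0]

sxx = sum(x * x for x in x_train)                   # 14
sxy = sum(x * y for x, y in zip(x_train, y_train))  # 28


def ridge_weight(lam):
    # Closed-form ridge solution without intercept.
    return sxy / (sxx + lam)


def val_loss_and_grad(lam):
    # Mean squared validation error and its derivative with respect to lam.
    w = ridge_weight(lam)
    residuals = [w * x - y for x, y in zip(x_val, y_val)]
    loss = sum(r * r for r in residuals) / len(x_val)
    dloss_dw = 2 * sum(r * x for r, x in zip(residuals, x_val)) / len(x_val)
    dw_dlam = -w / (sxx + lam)  # derivative of sxy / (sxx + lam)
    return loss, dloss_dw * dw_dlam


# Gradient descent on t = log(lam), which keeps lam positive.
t, lr = 0.0, 1.0
for _ in range(50):
    _, grad_lam = val_loss_and_grad(math.exp(t))
    t -= lr * grad_lam * math.exp(t)  # chain rule: dL/dt = dL/dlam * lam
lam = math.exp(t)
print(round(lam, 3), round(ridge_weight(lam), 3))  # 14.0 1.0
```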

---

# Cross-validation

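-> Cross-validation is a technique used to estimate how well a model will perform on data it was not trained on.

-> In k-fold cross-validation the data are split into k folds of roughly equal size. The model is trained on k - 1 folds and evaluated on the remaining fold. This is repeated k times so that every fold serves once as the validation set, and the k scores are then averaged.

-> During hyperparameter tuning, each candidate combination is scored by its mean cross-validation score, and the best combination is the one with the highest mean score. `GridSearchCV` and `RandomizedSearchCV` below do exactly this, with `cv=5` meaning 5 folds.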
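A plain-Python sketch of how k-fold cross-validation partitions the data. It assumes contiguous, unshuffled folds, which have the same fold sizes scikit-learn's `KFold` produces without shuffling. `score_fn` is a hypothetical callback that would fit the model on the training indices and score it on the validation indices. For classifiers, passing an integer such as `cv=5` to `GridSearchCV` actually uses stratified folds, which keep each fold's class proportions close to those of the full dataset.

```python
def k_fold_splits(n_samples, k):
    # Split indices 0..n_samples-1 into k contiguous folds whose sizes differ
    # by at most one, then yield (train_indices, val_indices) for each fold.
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


def cross_val_mean(score_fn, n_samples, k=5):
    # score_fn(train, val) is a hypothetical "fit on train, score on val" callback.
    scores = [score_fn(train, val) for train, val in k_fold_splits(n_samples, k)]
    return sum(scores) / len(scores)


splits = list(k_fold_splits(12, 5))
print([len(val) for _, val in splits])  # [3, 3, 2, 2, 2]
```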

```python
# importing libraries

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
```

```python
# load the data

from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data
y = iris.target
```

```python
# define the model and the hyperparameter grid

model = RandomForestClassifier()

param_grid = {
    'n_estimators': [50, 100, 200, 300, 400, 500],
    'max_depth': [4, 5, 6, 7, 8, 9, 10],
    # 'auto' is not listed: scikit-learn 1.3 removed it as a max_features value
    # for RandomForestClassifier (it meant 'sqrt'), so every fit using it fails.
    'max_features': ['sqrt', 'log2'],
    'criterion': ['gini', 'entropy']
}
```

```python
# set up the grid search: 6 x 7 x 2 x 2 = 168 combinations,
# each scored with 5-fold cross-validation (168 x 5 = 840 fits)

grid = GridSearchCV(
    estimator=model,
    param_grid=param_grid,
    cv=5,
    scoring='accuracy',
    verbose=1,
    n_jobs=-1  # use all CPU cores
)
```

```python
# fit the model on every combination and keep the best one

grid.fit(X, y)
```

```python
print(grid.best_params_)
```

```
{'criterion': 'gini', 'max_depth': 4, 'max_features': 'sqrt', 'n_estimators': 50}
```
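In the cells above the search uses all of `X` and `y`, so there is no untouched data left to estimate how the chosen model performs. Its `best_score_` is the maximum over many candidates and therefore tends to be optimistic. A sketch of the usual remedy uses the `train_test_split`, `accuracy_score`, and `f1_score` functions that the first cell imports. Hold out a test set, run the search on the training part only, and evaluate the refitted best model once on the held-out data. The 80/20 split, `random_state=42`, and the small grid are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the data; the search never sees it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# A deliberately small grid so the sketch runs quickly.
search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, 5]},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# With refit=True (the default), best_estimator_ is retrained on all of X_train.
y_pred = search.best_estimator_.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
test_f1 = f1_score(y_test, y_pred, average="macro")  # macro-average over the 3 classes
print(search.best_params_, round(test_accuracy, 3), round(test_f1, 3))
```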


```python
from sklearn.model_selection import RandomizedSearchCV

# try 5 random combinations from param_grid instead of all of them
random_search = RandomizedSearchCV(
    estimator=model,
    param_distributions=param_grid,
    n_iter=5,
    cv=5,
    scoring='accuracy',
    verbose=1,
    n_jobs=-1
)

random_search.fit(X, y)
```

```
Fitting 5 folds for each of 5 candidates, totalling 25 fits
```

```python
print(random_search.best_params_)
```

```
{'n_estimators': 300, 'max_features': 'sqrt', 'max_depth': 10, 'criterion': 'entropy'}
```
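`RandomizedSearchCV` also accepts `scipy.stats` distributions in `param_distributions`, so it can sample from whole integer or continuous ranges instead of a short list. Lists are sampled uniformly. The sketch below uses integer ranges via `scipy.stats.randint`, whose upper bound is exclusive. It also fixes `random_state` so the sampled candidates are reproducible. The ranges themselves are illustrative, and the `n_estimators` range is kept small so the example runs quickly.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

param_distributions = {
    "n_estimators": randint(50, 201),   # integers 50..200 inclusive
    "max_depth": randint(4, 11),        # integers 4..10 inclusive
    "max_features": ["sqrt", "log2"],   # lists are sampled uniformly
    "criterion": ["gini", "entropy"],
}

random_search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,
    cv=5,
    scoring="accuracy",
    random_state=0,  # makes the sampled candidates reproducible
)
random_search.fit(X, y)
print(random_search.best_params_)
```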
