# **Hyperparameter Tuning Techniques**
In this notebook we will explore how we can acheive the best accracy for ourmodel by adjusting the hyperparameters for our model. 
We will explore some techniques for hyperparameter tuning but first we need to understand the difference between hyperparameters and parameters of  model.

---
# **Difference between Parameter and Hyperparameter**

## **HYPERPARAMETERS**
- **Predetermined Variable:** A hyperparameter is a variable that is predetermined prior to the start of the training process.
- **Regulation of Learning Behavior:** Hyperparameters, such as learning rate, regularization intensity, number of epochs, etc., regulate how the learning algorithm behaves.
- **Top-Level Parameters:** The prefix "hyper_" implies that these are "top-level" parameters that govern the process of learning and the resulting model parameters.
- **User Selection and Configuration:** Prior to the training phase, the user selects and configures the hyperparameter settings.
- **External to the Model:** Because their values cannot be altered during the training phase, hyperparameters are regarded as being external to the model.


***Basically, anything in machine learning and deep learning that you decide their values or choose their configuration before training begins and whose values or configuration will remain the same when training ends is a hyperparameter.***

#### **SOME COMMON HYPERPARAMETERS IN REGRESSION AND CLASSIFICATION MODELS**
- Train-test split ratio
- Learning rate in optimization algorithms (e.g. gradient descent)
- The choice of cost or loss function the model will use
- Number of iterations (epochs) in training a nn
- Batch size
- Regularization Parameter


## **PARAMETERS**
- **Learned from Data**: During the training phase, a parameter is a variable that is learned from the data.
- **Forecasting and Relationship Representation**: It is employed to forecast fresh data and to depict the underlying relationships in the data.
- **Internal to the Model**: Conversely, parameters are found inside the model.
- **Initialization and Optimization**: Parameters are usually initialized to certain values (random values or set to zeros) before the model is trained. An optimization procedure (such as gradient descent) is used to change the initial values while training or learning proceeds.
- **Continuous Updating**: As learning progresses, the learning algorithm updates the parameter values continuously, but the hyperparameter values that the model designer selected stay the same.

#### **SOME COMMON PARAMETERS**
- The coefficients (or weights) of linear and logistic regression models.
- Weights and biases of a nn
- The cluster centroids in clustering
- The support vectors in a support vector machine.


---
# **Hyperparameter Tuning**
- Hyperparameters directly affect the performance, functionality, and structure of the model. 
- Data scientists can adjust model performance through hyperparameter tuning to achieve the best possible outcomes. 
- Selecting the right hyperparameter values is critical to the success of this process, which is a fundamental component of machine learning.


For example, assume you're using the learning rate of the model as a hyperparameter. If the value is too high, the model may converge too quickly with suboptimal results. Whereas if the rate is too low, training takes too long and results may not converge. A good and balanced choice of hyperparameters results in accurate models and excellent model performance.

## **Hyperparameter Tuning Techniques**
There care several Hyperarameter tuning techniques which are as follosws:

**1. Manual Search**

**2. Random Search**

**3. Grid Search**

**4. Halving**

**5. Bayesian Optimization**

We will go through all the techniques one by one but first we need to prepare our data.
I am using two kinds of datasets, one for classification model, and one for regression models.

In [2]:
import pandas as pd
import numpy as np
from scipy.stats import uniform
from sklearn.datasets import make_regression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, SVR
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, mean_squared_error, mean_absolute_error

## **Regression Data**

In [3]:
X_reg, y_reg = make_regression(n_samples = 1000, n_features = 4, n_informative = 2, n_targets = 1)
# Convert to DataFrame
df_reg = pd.DataFrame(X_reg, columns=[f'Feature_{i+1}' for i in range(X_reg.shape[1])])
df_reg['Target'] = y_reg

# Display the first few rows of the DataFrame
df_reg.head()

Unnamed: 0,Feature_1,Feature_2,Feature_3,Feature_4,Target
0,-0.139252,-0.078737,0.803363,0.546235,16.020034
1,0.325973,0.825121,0.686512,0.610158,58.91625
2,-1.988225,0.198176,-0.717921,1.365411,-99.680732
3,-1.482408,0.324907,-1.587709,-0.756835,-164.921763
4,-0.289407,0.222192,1.229131,0.252708,-11.755181


In [4]:
# Feature and target
X_reg = df_reg.drop(columns = 'Target').values
y_reg = df_reg['Target'].values

In [5]:
# Chacking the shape
X_reg.shape, y_reg.shape

((1000, 4), (1000,))

In [6]:
# train-test split
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(X_reg, y_reg, test_size = 0.2, random_state = 42)

In [7]:
# checking the shape
X_train_reg.shape, X_test_reg.shape, y_train_reg.shape, y_test_reg.shape

((800, 4), (200, 4), (800,), (200,))

## **Classification Data**

In [8]:
X_class, y_class = make_classification(n_samples=1000, n_features=5, n_classes = 3, n_informative=3)

In [9]:
# Convert to DataFrame
df_class = pd.DataFrame(X_class, columns=[f'Feature_{i+1}' for i in range(X_class.shape[1])])
df_class['Target'] = y_class

# Display the first few rows of the DataFrame
df_class.head()

Unnamed: 0,Feature_1,Feature_2,Feature_3,Feature_4,Feature_5,Target
0,1.308973,0.56817,-0.001478,-0.302649,0.445059,1
1,-1.352789,-2.588103,2.210688,-1.275072,-1.671157,0
2,1.350578,-0.198136,-0.04776,-1.585347,1.145719,1
3,0.48799,2.725249,-2.036538,2.408843,0.746541,0
4,-0.927815,0.157856,-0.268651,0.908456,-0.420706,2


In [10]:
# Feature and target
X_class = df_class.drop(columns = 'Target').values
y_class = df_class['Target'].values

In [11]:
# Chacking the shape
X_class.shape, y_class.shape

((1000, 5), (1000,))

In [12]:
# train-test split
X_train_class, X_test_class, y_train_class, y_test_class = train_test_split(X_class, y_class, test_size = 0.2, random_state = 42)

In [13]:
# checking the shape
X_train_class.shape, X_test_class.shape, y_train_class.shape, y_test_class.shape

((800, 5), (200, 5), (800,), (200,))

Now we have prepared our data into train and test sets. Let's explore all the tuning techniues on our data.

# **1. MANUAL SEARCH**
- In manual search, as the name suggests the hyperparameters are manually selcted by the user.
- The hyperparameters are are configured based on the intuition experience or domain knowledge.
- We manually try different hyperparameter values, observe the results, and adjust the values based on performance.
- This approach can be time-consuming and may require many iterations.

For example, if we try to train a Support Vector Classifier on our classification data we need to give the hyperparameters for training. But how can we decide which values to use that will give the best results. 

One solution can be if we manually define all the possible hyparapmeters values and train the model on all values and see the training results.

So first let's train the model on our data.

First we make a dictionary for all the possible hyperparameter values we want our model to train on.

In [14]:
# Sets of hyperparameters
params_list = [
    {'C': 0.1, 'kernel': 'rbf', 'gamma': 0.1},
    {'C': 0.01, 'kernel': 'poly', 'degree': 3},
    {'C': 0.1, 'kernel': 'rbf', 'gamma': 0.01},
    {'C': 0.01, 'kernel': 'poly', 'degree': 2},
    {'C': 0.1, 'kernel': 'sigmoid', 'gamma': 0.1}
]

In [15]:
# Train and evaluate models
for i, params in enumerate(params_list, 1):
    # Create and train the model with the given hyperparameters
    model = SVC(C=params['C'], kernel=params['kernel'], gamma=params.get('gamma', 'scale'), degree=params.get('degree', 3))
    model.fit(X_train_class, y_train_class)
    
    # Predict and evaluate the model
    y_pred = model.predict(X_test_class)
    accuracy = accuracy_score(y_test_class, y_pred)
    
    # Print the accuracy for the current set of hyperparameters
    print(f"Parameters set {i}: {params}")
    print(f"Accuracy: {accuracy:.2f}\n")

Parameters set 1: {'C': 0.1, 'kernel': 'rbf', 'gamma': 0.1}
Accuracy: 0.84

Parameters set 2: {'C': 0.01, 'kernel': 'poly', 'degree': 3}
Accuracy: 0.60

Parameters set 3: {'C': 0.1, 'kernel': 'rbf', 'gamma': 0.01}
Accuracy: 0.70

Parameters set 4: {'C': 0.01, 'kernel': 'poly', 'degree': 2}
Accuracy: 0.46

Parameters set 5: {'C': 0.1, 'kernel': 'sigmoid', 'gamma': 0.1}
Accuracy: 0.65



From the abve output we can see that Parameter set 3 give the better accuracy than all the other sets. But is it the optimal values!

Well we can't decide that because there can be other combinations of hyperparameters too that we haven't tried yet. If we try all the combinations of parameters manually it is going to be a tedious task. So to counter that we will use some automated techniques like Random Search and Grid Search.



# **2. GRID SEARCH**
- Grid Search is a hyperparameter tuning technique used to find the optimal hyperparameters for a machine learning model. 
- It involves exhaustively searching through a specified subset of the hyperparameter space of the learning algorithm. 
- By evaluating the model's performance for each combination of hyperparameters, Grid Search helps in identifying the best set of parameters that maximize the model's accuracy or other performance metrics.
- However, due to its exhaustive nature, Grid Search can become time-consuming and resource-intensive, particularly as the number of hyperparameters increases.

### **Cross-Validation and GridSearchCV**
- In GridSearchCV, along with Grid Search, cross-validation is also performed.
- In each iteration, it will record the performance of the model and at the end give the average of all the performance. Thus, it is also a time-consuming process.
- Thus, GridSearch along with cross-validation takes huge time cumulatively to evaluate the best hyperparameters.

**`GridSearchCV()`** method is available in the **`scikit-learn`** class **`model_selection`**.
The description of arguments is given below:

**1. `estimator`:** A scikit-learn model.

**2. `param_grid`:** Dictionary of parameters.

**3. `scoring`:** Performance measure. 'r2' for regresion models, 'precision' for classification models.

**4. `cv`:** Number of folds for K-fold cross-validation.

Now let's apply GridSearchCV on regression model and calssification model and see the results.

### **Applying GridSearchCV on Regression Model**

In [16]:
# Define the model
model = RandomForestRegressor(random_state = 42)

In [17]:
# Define the parameter grid
param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [None, 10, 20, 30],
    'max_features': ['sqrt', 'log2']
}

So now we have defined our grid with all the possible hyperparameters we want to try. Now we will define an object of grid search and pass the model and parameter grid to it and train our model.

In [18]:
from sklearn.model_selection import GridSearchCV
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=3)

In [19]:
grid_search.fit(X_train_reg, y_train_reg)

In [20]:
# Best hyperparameters from Grid Search
best_params = grid_search.best_params_
print(f"Best Hyperparameters: {best_params}")

Best Hyperparameters: {'max_depth': 20, 'max_features': 'sqrt', 'n_estimators': 300}


Now from the above output we can get the best hyperparameter combination for our model, as we can see that for the random forest regressor the best hyperparameters for our data are: 

`max_depth`: 20

`max_features`: `sqrt`

`n_estimators`: 300

In [21]:
# Evaluate the best model on the test set
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test_reg)

# Calculate and print the mean squared error
mse = mean_absolute_error(y_test_reg, y_pred)
print(f"Mean Absolute Error: {mse:.2f}")

Mean Absolute Error: 8.15


### **Applying GridSearchCV on Classification Model**
Now let's apply K-Nearest Neighbours model to our classification data using the Grid Search CV and see the results.

In [22]:
knn_model = KNeighborsClassifier()

In [23]:
param_grid = {
    'n_neighbors': list(range(1,10)),
    'algorithm': ('auto', 'ball_tree', 'kd_tree' , 'brute') 
}

In [24]:
grid_search = GridSearchCV(estimator=knn_model, param_grid=param_grid, cv=3)

In [25]:
grid_search.fit(X_train_class, y_train_class)

In [26]:
# Best hyperparameters from Grid Search
best_params = grid_search.best_params_
print(f"Best Hyperparameters: {best_params}")

Best Hyperparameters: {'algorithm': 'auto', 'n_neighbors': 3}


In [27]:
# Evaluate the best model on the test set
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test_class)

# Calculate and print the mean squared error
acc = accuracy_score(y_test_class, y_pred)
print(f"Accuracy Score: {acc:.2f}")

Accuracy Score: 0.90


# **3. RANDOM SEARCH**
- In Grid Search we had to face the issue of time consumption becuase the algorithm tries all the possible cominations of hyperparameters.
- So randomizaed search is a better option as it is used to train the models based on random hyperparameters and combinations. obviously, the number of training models are small column than grid search.
- This method is particularly useful when the hyperparameter space is large and exhaustive search becomes computationally expensive.

The `RandomizedSearchCV()` method in `scikit-learn` is also found in the `model_selection` module.Here’s a description of the main arguments:

**1. `estimator`:** A scikit-learn model that you want to optimize.

**2. `param_distributions`:** A dictionary where the keys are the parameter names and the values are distributions or lists of parameters to sample from.

**3. `n_iter`:** Number of epochs.

**4. `scoring`:** A single string or callable to evaluate the predictions on the test set. For regression models, common choices are 'r2' or 'neg_mean_squared_error', while for classification models, options include 'accuracy', 'precision', 'recall', etc. If None, the estimator's default scoring method is used.

**5. `cv`:** The number of folds for cross-validation.

### **Applying RadomizedSearchCV on Regression Model**
Let's apply Suppport vector regressor to our regression data.

In [28]:
model = SVR()

In [29]:
# Define parameter distribution
param_dist = {
    'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
    'degree': [2, 3, 4, 5],
    'C': uniform(0.1, 10),
    'gamma': ['scale', 'auto']
}

In [30]:
from sklearn.model_selection import RandomizedSearchCV
# Create the RandomizedSearchCV object
randomized_search = RandomizedSearchCV(estimator=model, param_distributions=param_dist, n_iter=20, cv=5)

In [31]:
randomized_search.fit(X_train_reg, y_train_reg)

In [32]:
# Get the best hyperparameters and model
best_params_rand = randomized_search.best_params_
print(f"Best Hyperparameters: {best_params_rand}")



Best Hyperparameters: {'C': 5.134850008337606, 'degree': 5, 'gamma': 'scale', 'kernel': 'linear'}


In [33]:
best_model_rand = randomized_search.best_estimator_
# Evaluate the best model
y_pred = best_model_rand.predict(X_test_reg)
# Calculate and print the mean squared error
mse = mean_absolute_error(y_test_reg, y_pred)
print(f"Mean Absolute Error: {mse:.2f}")

Mean Absolute Error: 0.03


### **Applying RadomizedSearchCV on Classification Model**


In [34]:
model = SVC()

In [35]:
# Define parameter distribution
param_dist = {
    'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
    'degree': [2, 3, 4, 5],
    'C': uniform(0.1, 10),
    'gamma': ['scale', 'auto']
}

In [36]:
from sklearn.model_selection import RandomizedSearchCV
# Create the RandomizedSearchCV object
randomized_search = RandomizedSearchCV(estimator=model, param_distributions=param_dist, n_iter=20, cv=5)

In [37]:
randomized_search.fit(X_train_class, y_train_class)

In [38]:
# Get the best hyperparameters and model
best_params_rand = randomized_search.best_params_
best_model_rand = randomized_search.best_estimator_

In [39]:
print(f"Best Hyperparameters: {best_params_rand}")

Best Hyperparameters: {'C': 8.151304233319978, 'degree': 4, 'gamma': 'auto', 'kernel': 'rbf'}


In [40]:
best_model_rand = randomized_search.best_estimator_
# Evaluate the best model
y_pred = best_model_rand.predict(X_test_class)
# Calculate and print the mean squared error
acc = accuracy_score(y_test_class, y_pred)
print(f"Accuracy Score: {acc:.2f}")

Accuracy Score: 0.91


From the above code we can get a ver uclear view of how we can tune our model using Randomized Search.

Let’s sum up the advantages of randomized search:

- Randomized search is efficient when dealing with a large number of hyperparameters or a wide range of values because it doesn't require an -exhaustive search.
- It can handle various parameter types, including continuous and discrete values.


# **4. BAYESIAN OPTIMIZATION**
- Bayesian approaches, in contrast to random or grid search, keep track of past evaluation results which they use to form a probabilistic model mapping hyperparameters to a probability of a score on the objective function.
- Bayesian optimization helps reduce the time required to obtain an optimal parameter set.

Bayesian Grid Search relies on three critical factors:

1. A search space for hyperparameters
2. An objective function
3. A surrogate and selection function

Let’s go through an example step by step, where we’ll see how to use Bayesian Optimization on our model.


`BayesSearchCV` in the `skopt` library offers several parameters that allow you to customize the Bayesian optimization process for hyperparameter tuning. Here’s a detailed explanation of the key parameters:

**1. `estimator`:** estimator object

**2. `search_spaces`:** The parameter space over which to search.

**3. `n_iter`:** The total number of parameter settings to sample.

**4. `error_score`:**  The value to assign to the score if an error occurs during fitting. If 'raise', the error is raised. Otherwise, the specified value is used.

In [41]:
classifier = RandomForestClassifier()

In [42]:
param_space = {
 'max_depth': (1, 50),
 'n_estimators': (50, 100),
 'max_features': ['sqrt', 'log2'],  
 'criterion': ['gini', 'entropy', 'log_loss'],
 'max_samples': (100, 500)
}

In [43]:
from skopt import BayesSearchCV
optimizer = BayesSearchCV( estimator = classifier, search_spaces=param_space, scoring='accuracy', cv=3, n_iter=50)

In [44]:
optimizer.fit(X_train_class, y_train_class)




In [45]:
# After fitting, you can get the best parameters and score as follows:
best_params = optimizer.best_params_
best_score = optimizer.best_score_
print(f"Best parameters: {best_params}")
print(f"Best accuracy score: {best_score}")


Best parameters: OrderedDict([('criterion', 'log_loss'), ('max_depth', 29), ('max_features', 'log2'), ('max_samples', 500), ('n_estimators', 68)])
Best accuracy score: 0.9062778669520242


In [46]:
best_model_rand = randomized_search.best_estimator_
# Evaluate the best model
y_pred = best_model_rand.predict(X_test_class)
# Calculate and print the mean squared error
acc = accuracy_score(y_test_class, y_pred)
print(f"Accuracy Score: {acc:.2f}")

Accuracy Score: 0.91


So by using a maximum depth of 29 with taking maximum features using log, maximum samples 500 we can get the best accuracy for our model. 

# **5. HALVING**

- Halving refers to a technique used in hyperparameter tuning and optimization, particularly in the context of model selection and resource allocation. 
- The idea behind halving methods is to progressively allocate more resources (such as computational time or data) to the most promising configurations, while eliminating less promising ones early in the process. This approach is efficient for finding good hyperparameter settings without expending excessive computational resources.

- Successive Halving can significantly reduce the total computational cost compared to grid search or random search by quickly discarding poor configurations.
- The method can adapt to various types of resource constraints, such as time, computational power, or data.

In `scikit-learn`, Successive Halving is implemented via the `HalvingGridSearchCV` and `HalvingRandomSearchCV` classes, which are similar to GridSearchCV and RandomSearchCV but use the halving approach.

For example here is a complpete guide to apply Halving Search to the Random Forest Model.

In [47]:
# Define the model
model = RandomForestClassifier(random_state=42)

In [48]:
# Define the parameter grid
param_grid = {
    'n_estimators': [10, 50, 100, 200],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

In [53]:
from sklearn.experimental import enable_halving_search_cv

from sklearn.model_selection import HalvingGridSearchCV
# Set up the HalvingGridSearchCV
halving_search = HalvingGridSearchCV(
    estimator=model,
    param_grid=param_grid,
    factor=2,  # The factor by which resources are increased in each iteration
    scoring='accuracy',
    n_jobs=-1,
    min_resources='exhaust',
    random_state=42
)


In [54]:
# Fit the model
halving_search.fit(X_train_class, y_train_class)

# Print the best parameters and score
print("Best Parameters:", halving_search.best_params_)
print("Best Score:", halving_search.best_score_)

Best Parameters: {'max_depth': 20, 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 100}
Best Score: 0.8833333333333332


In [55]:
best_model = halving_search.best_estimator_
# Evaluate the best model
y_pred = best_model.predict(X_test_class)
# Calculate and print the mean squared error
acc = accuracy_score(y_test_class, y_pred)
print(f"Accuracy Score: {acc:.2f}")

Accuracy Score: 0.90


We can use Randomized halving search in a similar way as we did with Grid Search to get our optimized results.

### Conclusion

In this notebook, we explored various model hyperparameter optimization techniques available in scikit-learn, focusing on improving model performance by selecting the best hyperparameters. We covered traditional methods like `GridSearchCV` and `RandomSearchCV`, as well as advanced techniques such as Bayesian Optimization using `BayesSearchCV` and Successive Halving with `HalvingGridSearchCV`. These methods help efficiently explore the hyperparameter space, balancing the trade-off between computation time and model performance. By understanding and applying these techniques, we can enhance the accuracy, robustness, and efficiency of machine learning models, making them better suited for real-world applications.