### **Hyperparameter Tuning**

In machine learning, model parameters are learned from the data automatically during training. For instance, the weights in a linear regression or a neural network are parameters. Hyperparameters, on the other hand, are configurations external to the model, which can't be estimated from data. They are often set before the learning process begins.

The performance of a machine learning model can be sensitive to the hyperparameters provided. Therefore, finding the optimal hyperparameters is crucial. This process of searching for the ideal model hyperparameters is called hyperparameter tuning or optimization.

#### **Why Is Hyperparameter Tuning Important?**

- **Performance**: Properly tuned hyperparameters can significantly improve a model's performance. Conversely, poorly set hyperparameters can render even a well-structured model ineffective.
  
- **Overfitting vs. Underfitting**: Hyperparameters can influence model complexity. For example, a high polynomial degree might fit training data perfectly but fail on test data. Hyperparameter tuning aids in finding a balanced model that generalizes well.

- **Computational Efficiency**: Some hyperparameters can influence how fast a model trains. For example, the learning rate in many algorithms determines how fast they converge to a solution.
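
To make the learning-rate point concrete, here is a minimal sketch that runs plain gradient descent on a toy quadratic and counts the steps needed to converge. The function `gradient_descent_steps`, the objective `(w - 3)^2`, and the tolerance are illustrative assumptions, not part of any library.

```python
def gradient_descent_steps(learning_rate, tol=1e-6, max_steps=1000):
    """Minimize f(w) = (w - 3)^2 starting from w = 0.

    Returns the number of steps needed to get within `tol` of the minimum,
    or None if that does not happen within `max_steps`.
    """
    w = 0.0
    for step in range(1, max_steps + 1):
        grad = 2.0 * (w - 3.0)      # derivative of (w - 3)^2
        w -= learning_rate * grad   # gradient descent update
        if abs(w - 3.0) < tol:
            return step
    return None


for lr in (0.01, 0.1, 0.5, 1.1):
    print(f"learning_rate={lr}: steps={gradient_descent_steps(lr)}")
```

On this toy problem a very small learning rate converges slowly, a moderate one converges quickly, and a rate that is too large never converges at all, which is why the learning rate is usually one of the first hyperparameters to tune.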

#### **Common Methods for Hyperparameter Tuning**:

1. **Manual Search**: Initially, practitioners often set hyperparameters based on intuition or experience. Though not systematic, it provides a good starting point.

2. **Grid Search**: A brute-force approach in which you specify a finite set of candidate values for each hyperparameter. The search then evaluates model performance for every combination of those values, that is, for each point in the grid.

3. **Random Search**: Instead of a comprehensive search like Grid Search, Random Search randomly selects points in the hyperparameter space and checks their performance. It can outperform Grid Search, especially when only a few hyperparameters matter.

4. **Bayesian Optimization**: This probabilistic model-based approach balances exploration and exploitation. It builds a probability model of the objective function and uses it to select hyperparameters that might perform well.
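
The difference between grid search and random search can be sketched in a few lines of plain Python. The function `validation_score` below is a hypothetical stand-in for "train a model and return its validation score", and the value ranges are arbitrary assumptions chosen for illustration.

```python
import itertools
import random


def validation_score(learning_rate, max_depth):
    """Hypothetical stand-in for training a model and scoring it on validation data."""
    return -((learning_rate - 0.1) ** 2) - 0.01 * (max_depth - 5) ** 2


# Grid search: evaluate every combination of the listed values.
grid = {"learning_rate": [0.01, 0.1, 0.5], "max_depth": [3, 5, 7]}
grid_candidates = [
    dict(zip(grid, values)) for values in itertools.product(*grid.values())
]
best_grid = max(grid_candidates, key=lambda p: validation_score(**p))

# Random search: draw the same number of candidates from ranges instead.
rng = random.Random(0)  # fixed seed so the draw is reproducible
random_candidates = [
    {"learning_rate": rng.uniform(0.01, 0.5), "max_depth": rng.randint(3, 7)}
    for _ in range(len(grid_candidates))
]
best_random = max(random_candidates, key=lambda p: validation_score(**p))

print("grid search best:  ", best_grid)
print("random search best:", best_random)
```

Both strategies spend the same budget of nine evaluations here. The grid only ever tries three distinct learning rates, whereas random search tries nine, which is the intuition behind its advantage when only a few hyperparameters matter.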

#### **State-of-the-Art and Sensitivity-based Approaches**:

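Bayesian Optimization is among the more advanced methods and is often regarded as a state-of-the-art approach to hyperparameter tuning. It's especially useful when each evaluation of the objective function (such as training a model and computing its validation error) is costly, because it uses the results of earlier evaluations to decide which hyperparameters to try next.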
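
The loop behind Bayesian optimization can be sketched with a Gaussian process surrogate from scikit-learn. This is a simplified illustration under stated assumptions, not how any particular library implements it: the 1-D `objective` is a hypothetical stand-in for an expensive validation score, the search space is a fixed grid of candidate points, and the acquisition function is a plain upper confidence bound with an arbitrary weight of 2.0.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def objective(x):
    """Hypothetical 1-D stand-in for an expensive validation score (maximum at x = 0.3)."""
    return -(x - 0.3) ** 2


rng = np.random.default_rng(0)
candidates = np.linspace(0.0, 1.0, 201).reshape(-1, 1)  # discretized search space

# Start with a few random evaluations of the objective.
X_seen = rng.uniform(0.0, 1.0, size=(3, 1))
y_seen = objective(X_seen).ravel()

for _ in range(10):
    # Surrogate: a probabilistic model of the objective given the points seen so far.
    surrogate = GaussianProcessRegressor(
        kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True, random_state=0
    )
    surrogate.fit(X_seen, y_seen)
    mean, std = surrogate.predict(candidates, return_std=True)

    # Acquisition (upper confidence bound): a high mean favors exploitation,
    # a high standard deviation favors exploration.
    ucb = mean + 2.0 * std
    x_next = candidates[np.argmax(ucb)].reshape(1, 1)

    # Evaluate the objective at the chosen point and add it to the history.
    X_seen = np.vstack([X_seen, x_next])
    y_seen = np.append(y_seen, objective(x_next).ravel())

best_x = X_seen[np.argmax(y_seen), 0]
print(f"best x found: {best_x:.3f}, objective: {y_seen.max():.5f}")
```

Each iteration costs one evaluation of the objective plus one surrogate fit, so the approach pays off when the objective is much more expensive than fitting the surrogate.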

Another tool that complements hyperparameter tuning is *sensitivity analysis*: the process of determining how sensitive an output (such as the validation score) is to a given parameter. Methods like Bayesian Optimization apply a related idea, since they aim to intelligently sample the hyperparameters that are likely to yield the best performance.

In Scikit-Learn, there isn't a direct implementation of sensitivity analysis for hyperparameter tuning. However, there are more specialized libraries or tools that focus on Bayesian Optimization or other advanced optimization techniques. Libraries such as `Scikit-Optimize` and `Hyperopt` are examples of this.
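
One simple form of sensitivity analysis is a one-at-a-time sweep: vary a single hyperparameter around a baseline while holding the others fixed, and record how much the score moves. The sketch below assumes a hypothetical `validation_score` function and made-up sweep values. It is not an API of Scikit-Learn, `Scikit-Optimize`, or `Hyperopt`.

```python
def validation_score(learning_rate, max_depth, subsample):
    """Hypothetical stand-in for a model's validation score.

    By construction it depends strongly on learning_rate, weakly on
    max_depth, and not at all on subsample.
    """
    return 0.9 - 2.0 * (learning_rate - 0.1) ** 2 - 0.001 * (max_depth - 6) ** 2


def one_at_a_time_sensitivity(score_fn, baseline, sweeps):
    """Return, per hyperparameter, the score range when only that one is varied."""
    sensitivity = {}
    for name, values in sweeps.items():
        scores = [score_fn(**{**baseline, name: value}) for value in values]
        sensitivity[name] = max(scores) - min(scores)
    return sensitivity


baseline = {"learning_rate": 0.1, "max_depth": 6, "subsample": 0.8}
sweeps = {
    "learning_rate": [0.01, 0.1, 0.3, 0.5],
    "max_depth": [3, 6, 9],
    "subsample": [0.5, 0.8, 1.0],
}

sensitivity = one_at_a_time_sensitivity(validation_score, baseline, sweeps)
for name, spread in sorted(sensitivity.items(), key=lambda item: -item[1]):
    print(f"{name}: score range {spread:.4f}")
```

A hyperparameter with a large score range deserves a finer search, while one with a negligible range can often be left at its default. Note that a one-at-a-time sweep ignores interactions between hyperparameters.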

### Examples

1. **Grid Search**

In [4]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
import xgboost as xgb

wine = load_wine()
X = pd.DataFrame(wine.data,columns=wine.feature_names)
y = wine.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [5]:
from sklearn.model_selection import GridSearchCV

param_grid = {
    'learning_rate': [0.01, 0.5],
    'max_depth': [3, 7],
    'min_child_weight': [3, 5],
    'subsample': [0.5, 0.7],
    'colsample_bytree': [0.5, 0.7],
    'n_estimators' : [100, 200],
    'objective': ['multi:softmax']
}

clf = xgb.XGBClassifier()
grid_search = GridSearchCV(clf, param_grid, scoring='accuracy', cv=3, verbose=1)
grid_search.fit(X_train, y_train)

print("Best parameters found: ",grid_search.best_params_)
print("Best score",grid_search.best_score_)



Fitting 3 folds for each of 64 candidates, totalling 192 fits
Best parameters found:  {'colsample_bytree': 0.5, 'learning_rate': 0.5, 'max_depth': 3, 'min_child_weight': 3, 'n_estimators': 100, 'objective': 'multi:softmax', 'subsample': 0.7}
Best score 0.9858156028368793
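
The "64 candidates, totalling 192 fits" line in the output follows directly from the grid: the number of candidates is the product of the number of values listed per hyperparameter, and each candidate is fit once per cross-validation fold. A quick check, repeating the grid from the example above:

```python
import math

param_grid = {
    'learning_rate': [0.01, 0.5],
    'max_depth': [3, 7],
    'min_child_weight': [3, 5],
    'subsample': [0.5, 0.7],
    'colsample_bytree': [0.5, 0.7],
    'n_estimators': [100, 200],
    'objective': ['multi:softmax'],
}

# Six hyperparameters with two values each, one with a single value: 2**6 * 1.
n_candidates = math.prod(len(values) for values in param_grid.values())
cv_folds = 3

print(n_candidates, "candidates,", n_candidates * cv_folds, "fits")
```

Adding just one more value to each of the six two-valued hyperparameters would raise the count from 64 to 729 candidates, which is why grid search becomes expensive quickly.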


There are a couple of ways to use this to create a new model.

In [6]:
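# best_estimator_ is the model already refit on the full training set
# with the best parameters (GridSearchCV refits by default, refit=True):
best_model = grid_search.best_estimator_

# Alternatively, initialize a new, unfitted model with the same parameters;
# it must be fit before it can make predictions:
new_model = xgb.XGBClassifier(**grid_search.best_params_)
new_model.fit(X_train, y_train)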

2. **Random Search**


In [7]:

from sklearn.model_selection import RandomizedSearchCV

param_dist = {
    'learning_rate': np.linspace(0.01, 1, 10),
    'max_depth': [3, 4, 5, 6, 7, 8, 9, 10],
    'min_child_weight': [1, 2, 3, 4, 5],
    'subsample': np.linspace(0.5, 1, 6),
    'colsample_bytree': np.linspace(0.5, 1, 6),
    'n_estimators': [100, 200, 300, 400, 500],
    'objective': ['multi:softmax']
}

rand_search = RandomizedSearchCV(clf, param_distributions=param_dist, scoring='accuracy', cv=3, verbose=1, n_iter=50)
rand_search.fit(X_train, y_train)

print("Best parameters found: ",rand_search.best_params_)
print("Best score",rand_search.best_score_)

Fitting 3 folds for each of 50 candidates, totalling 150 fits
Best parameters found:  {'subsample': 0.6, 'objective': 'multi:softmax', 'n_estimators': 400, 'min_child_weight': 1, 'max_depth': 5, 'learning_rate': 0.45, 'colsample_bytree': 0.8}
Best score 0.9929078014184397


In [8]:
rand_search.best_score_

0.9929078014184397
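
`RandomizedSearchCV` also accepts distribution objects in place of fixed lists, so continuous hyperparameters can be sampled from a range rather than from a handful of preset values. The sketch below makes two assumptions that differ from the example above: it swaps in scikit-learn's `GradientBoostingClassifier` so that it runs without XGBoost, and it sets `random_state` so that the sampled candidates are reproducible.

```python
from scipy.stats import loguniform, randint, uniform
from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_wine(return_X_y=True)

# Any object with an rvs() method is sampled afresh for each candidate.
param_dist = {
    "learning_rate": loguniform(0.01, 1.0),  # log-uniform on [0.01, 1]
    "subsample": uniform(0.5, 0.5),          # uniform on [0.5, 1.0] (loc, scale)
    "max_depth": randint(3, 11),             # integers 3..10 (upper bound exclusive)
}

search = RandomizedSearchCV(
    GradientBoostingClassifier(n_estimators=50, random_state=0),
    param_distributions=param_dist,
    n_iter=5,
    cv=3,
    scoring="accuracy",
    random_state=42,  # fixes which candidates are drawn
)
search.fit(X, y)

print("Best parameters found:", search.best_params_)
print("Best score:", search.best_score_)
```

A log-uniform distribution is a common choice for parameters such as the learning rate, where the order of magnitude matters more than the exact value.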

3. **Bayesian Optimization**

Although Bayesian Optimization is not part of the sklearn library, there is a popular standalone Python library called `bayesian-optimization` that is often used in conjunction with `scikit-learn` for this purpose.

   Before using `bayesian-optimization`, install it:

   ```bash
   pip install bayesian-optimization
   ```


In [9]:
!pip install bayesian-optimization


In [10]:
# Suppress warnings, as there are quite a few here
import warnings
warnings.filterwarnings('ignore')

In [11]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from bayes_opt import BayesianOptimization
from sklearn.metrics import accuracy_score

# Load data
wine = load_wine(as_frame=True)
X = wine.data
y = wine.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

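# Objective function for Bayesian Optimization.
# Caution: each candidate is scored on the held-out test set, so the reported
# accuracy is optimistic. Scoring on a validation split, or with cross-validation
# on the training data, avoids this leakage.
def xgb_evaluate(max_depth, gamma, colsample_bytree, learning_rate):
    params = {
        'max_depth': int(max_depth),  # the optimizer proposes floats; depth must be an integer
        'gamma': gamma,
        'colsample_bytree': colsample_bytree,
        'learning_rate': learning_rate,
        'objective': 'multi:softprob',
        'num_class': 3,
        'eval_metric': 'mlogloss',
        'verbosity': 0  # replaces the deprecated 'silent' flag
    }
    model = XGBClassifier(**params)
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    accuracy = accuracy_score(y_test, predictions)
    return accuracy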

# Bayesian Optimization
xgb_bo = BayesianOptimization(xgb_evaluate, {
    'max_depth': (3, 10),
    'gamma': (0, 1),
    'colsample_bytree': (0.3, 0.9),
    'learning_rate': (0.01, 0.3)
})

# Maximize the objective function
xgb_bo.maximize(init_points=5, n_iter=10)

# Results
params = xgb_bo.max['params']
params['max_depth'] = int(params['max_depth'])
print("Optimized parameters:", params)


| iter | target | colsample_bytree | gamma | learning_rate | max_depth |
|------|--------|------------------|-------|---------------|-----------|
| 1 | 1.0 | 0.5418 | 0.3502 | 0.04232 | 3.67 |
| 2 | 1.0 | 0.7017 | 0.03177 | 0.1415 | 4.407 |
| 3 | 1.0 | 0.4991 | 0.435 | 0.2662 | 6.623 |
| 4 | 1.0 | 0.501 | 0.9244 | 0.09753 | 7.131 |
| 5 | 0.9722 | 0.6923 | 0.9427 | 0.1009 | 6.441 |
| 6 | 1.0 | 0.5029 | 0.5361 | 0.237 | 3.782 |
| 7 | 1.0 | 0.6076 | 0.2507 | 0.147 | 4.057 |
| 8 | 0.9722 | 0.343 | 0.3646 | 0.2849 | 7.124 |
| 9 |
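
The objective function in the example above scores every candidate on `X_test`, so the test set ends up guiding the search. A hedged alternative is to score candidates with cross-validation on the training split only. The sketch below assumes scikit-learn's `GradientBoostingClassifier` in place of `XGBClassifier` so that it runs without extra dependencies; `cv_objective` is a hypothetical name, and only two hyperparameters are tuned to keep it short.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


def cv_objective(max_depth, learning_rate):
    """Mean 3-fold CV accuracy on the training split; the test split is never used."""
    model = GradientBoostingClassifier(
        max_depth=int(max_depth),  # the optimizer proposes floats
        learning_rate=learning_rate,
        n_estimators=50,
        random_state=0,
    )
    scores = cross_val_score(model, X_train, y_train, cv=3, scoring="accuracy")
    return scores.mean()


# A single evaluation, as the optimizer would request it:
print(cv_objective(max_depth=3.7, learning_rate=0.1))
```

A function with this shape can be passed to `BayesianOptimization` in place of `xgb_evaluate`, with bounds given only for `max_depth` and `learning_rate`. The test split then stays untouched until the final evaluation of the tuned model.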

### Tips for working with pipelines in SKLearn

When tuning hyperparameters in a pipeline with `sklearn`, keep the following considerations in mind:

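1. **Name your pipeline steps**: Every step in an `sklearn` `Pipeline` has a name, which you either choose yourself or get automatically when you build the pipeline with `make_pipeline`. When specifying hyperparameters for grid search or random search, prefix each parameter with its step name and a double underscore (`<step>__<parameter>`).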

    For example, if your pipeline is:
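    ```python
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from xgboost import XGBClassifier

    pipeline = Pipeline([
        ('scaler', StandardScaler()),
        ('classifier', XGBClassifier())
    ])
    ```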

    Then, a parameter grid for grid search might look like:
    ```python
    param_grid = {
        'classifier__n_estimators': [50, 100, 200],
        'classifier__learning_rate': [0.01, 0.05, 0.1]
    }
    ```

2. **Pipeline in `GridSearchCV` and `RandomizedSearchCV`**: You can pass a pipeline directly to these search classes in `sklearn`. This lets you search not just over model parameters, but also over preprocessing steps and their parameters.

3. **Complexity & Runtime**: Hyperparameter tuning can significantly increase runtime, especially with a large parameter grid or a large dataset, and cross-validation multiplies the cost by the number of folds. Bayesian optimization adds overhead of its own, because the optimization procedure updates its model after every evaluation. Always start with a smaller subset of the data or a smaller parameter space to gauge how long the full search will take.

4. **Consistent Seeds**: If you want reproducibility, make sure to set random seeds consistently, especially if the methods you're using (like `RandomizedSearchCV` or the model itself) have stochastic behavior.

5. **Nested CV**: If you're using cross-validation within the Bayesian Optimization and also wish to evaluate the model's performance using cross-validation, you'll end up with a nested cross-validation setup. Nested CV is a robust way to estimate the performance of the entire process (including hyperparameter tuning), but it's computationally expensive.
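
Several of these tips can be combined in one short sketch: named pipeline steps, a parameter grid that also covers a preprocessing option, fixed seeds, and nested cross-validation. It assumes `LogisticRegression` as the classifier so that it runs with scikit-learn alone, and the grid values are arbitrary choices for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression(max_iter=1000)),
])

# Step name + double underscore + parameter name.
# Preprocessing options can be searched alongside model parameters.
param_grid = {
    "scaler__with_mean": [True, False],
    "classifier__C": [0.1, 1.0, 10.0],
}

# Fixed seeds on the splitters keep the folds reproducible.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

search = GridSearchCV(pipeline, param_grid, scoring="accuracy", cv=inner_cv)

# Nested CV: each outer fold reruns the whole inner search on its training part,
# so the score estimates the performance of the full tuning procedure.
nested_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="accuracy")
print("nested CV accuracy: %.3f +/- %.3f" % (nested_scores.mean(), nested_scores.std()))

# Fit once on all the data to obtain the final tuned pipeline.
search.fit(X, y)
print("best parameters:", search.best_params_)
```

Because the scaler sits inside the pipeline, it is refit on the training part of every fold, which keeps information from the validation folds out of the preprocessing.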