# Hyperparameter Tuning Example

In this notebook, we will explore hyperparameter tuning methods for a classification problem (predicting passenger survival). We will use a real-world dataset and cover the following topics:
1. Overview of hyperparameter tuning methods
2. Dataset and preprocessing
3. Hyperparameter tuning methods: Grid Search, Randomized Search, Bayesian Optimization, and Genetic Algorithm
4. Detailed case studies for SVM and Neural Network models
5. Assumptions, Pros, and Cons of each method
6. Recommendations for choosing the right method

Let's get started with a brief overview of the dataset we will use and the preprocessing steps required.


# Dataset

We will use the Titanic dataset, which contains **nominal**, **ordinal**, and **numeric** features. This dataset is available from [Kaggle](https://www.kaggle.com/c/titanic/data). It contains information on passengers, their survival, and features such as age, fare, gender, and class.

Download the dataset:

```bash
!wget https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv
```

In [None]:
!wget https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv

--2024-10-05 16:12:13--  https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 60302 (59K) [text/plain]
Saving to: ‘titanic.csv’


2024-10-05 16:12:13 (4.28 MB/s) - ‘titanic.csv’ saved [60302/60302]



In [None]:
# Load and explore dataset
import pandas as pd
import numpy as np

# Load dataset
data = pd.read_csv('titanic.csv')

# Overview of the dataset
data.head()

# Check for missing values
data.isnull().sum()


Unnamed: 0,0
PassengerId,0
Survived,0
Pclass,0
Name,0
Sex,0
Age,177
SibSp,0
Parch,0
Ticket,0
Fare,0


# Data Preprocessing

Since our dataset contains **nominal**, **ordinal**, and **numeric** data, we need to handle each type properly.

- **Nominal** (e.g., `Sex`, `Embarked`): Use **One-Hot Encoding** to convert categorical values into binary columns.
- **Ordinal** (e.g., `Pclass`): Use **Ordinal Encoding**.
- **Numeric** (e.g., `Age`, `Fare`): Apply **Standardization** to normalize numeric features.

Let's preprocess the dataset:


In [None]:
from sklearn.preprocessing import OneHotEncoder, StandardScaler, OrdinalEncoder
from sklearn.model_selection import train_test_split
import numpy as np

# Nominal (One-Hot Encoding)
nominal_features = ['Sex', 'Embarked']
onehot_encoder = OneHotEncoder(sparse_output=False)  # Use sparse_output instead of sparse
nominal_encoded = onehot_encoder.fit_transform(data[nominal_features])

# Ordinal (Ordinal Encoding): maps Pclass 1/2/3 to 0.0/1.0/2.0
ordinal_features = ['Pclass']
ordinal_encoder = OrdinalEncoder()
data[ordinal_features] = ordinal_encoder.fit_transform(data[ordinal_features])

# Numeric (Standardization)
numeric_features = ['Age', 'Fare']
scaler = StandardScaler()
data[numeric_features] = scaler.fit_transform(data[numeric_features])

# Combine all processed features
processed_data = np.concatenate([nominal_encoded, data[ordinal_features], data[numeric_features]], axis=1)

# Define target variable and split the data
y = data['Survived']
X_train, X_test, y_train, y_test = train_test_split(processed_data, y, test_size=0.2, random_state=42)


# Baseline Models

Before tuning, it's important to establish a baseline model for comparison. We'll start with:
1. **Random Forest**: An ensemble model that creates multiple decision trees and averages their predictions.
2. **Support Vector Machine (SVM)**: A model that aims to find the optimal boundary separating classes.
3. **Neural Network**: A basic feedforward neural network.

### Random Forest
1. `n_estimators`: Number of trees in the forest.
2. `max_depth`: Maximum depth of each tree. Deeper trees can model more complex relationships but may overfit.
3. `min_samples_split`: Minimum number of samples required to split a node.

### SVM
1. `C`: Regularization parameter. Smaller values regularize more strongly (a wider margin that tolerates more training errors); larger values fit the training data more closely and may overfit.
2. `kernel`: Specifies the kernel type to be used in the algorithm (linear, polynomial, etc.).

### Neural Network
1. `hidden_layer_sizes`: Number of neurons in each hidden layer.
2. `activation`: Activation function for neurons (e.g., ReLU, tanh).
3. `learning_rate_init`: The initial step size used in updating the weights.
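To see what "default hyperparameters" means for the parameters listed above, the values can be read directly from each estimator with `get_params()`. This is a small illustrative sketch; the `to_inspect` and `defaults` names are introduced here for illustration only. Note that in `MLPClassifier` the step size is exposed as `learning_rate_init`.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Estimators paired with the hyperparameters discussed above
to_inspect = {
    "RandomForestClassifier": (RandomForestClassifier(), ["n_estimators", "max_depth", "min_samples_split"]),
    "SVC": (SVC(), ["C", "kernel"]),
    "MLPClassifier": (MLPClassifier(), ["hidden_layer_sizes", "activation", "learning_rate_init"]),
}

defaults = {}
for name, (estimator, keys) in to_inspect.items():
    params = estimator.get_params()
    defaults[name] = {key: params[key] for key in keys}
    print(name, defaults[name])
```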



Let's train these models with default hyperparameters:


In [None]:
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Handle missing values (imputation)
imputer = SimpleImputer(strategy='mean')
X_train = imputer.fit_transform(X_train)
X_test = imputer.transform(X_test)

# Random Forest
rf = RandomForestClassifier(random_state=42)
rf.fit(X_train, y_train)
rf_pred = rf.predict(X_test)

# SVM
svm = SVC(random_state=42)
svm.fit(X_train, y_train)
svm_pred = svm.predict(X_test)

# Neural Network
nn = MLPClassifier(random_state=42, max_iter=1000, learning_rate_init=0.001)
nn.fit(X_train, y_train)
nn_pred = nn.predict(X_test)

# Evaluate models
print("Random Forest Accuracy:", accuracy_score(y_test, rf_pred))
print("SVM Accuracy:", accuracy_score(y_test, svm_pred))
print("Neural Network Accuracy:", accuracy_score(y_test, nn_pred))


Random Forest Accuracy: 0.7821229050279329
SVM Accuracy: 0.8044692737430168
Neural Network Accuracy: 0.770949720670391


# Hyperparameter Tuning Methods




## 1. Grid Search

**Grid Search** is an exhaustive search method that tests every possible combination of hyperparameter values. While thorough, it can be computationally expensive if there are many parameters or large datasets.

Steps:
1. Define a grid of hyperparameter values.
2. Evaluate each combination of values.
3. Select the combination that provides the best performance.
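
The three steps above can be sketched by hand with `itertools.product` and `cross_val_score`. This is a minimal illustration on synthetic data from `make_classification`, with a deliberately small, hypothetical grid; `GridSearchCV` adds parallelism, refitting, and result bookkeeping on top of this idea.

```python
from itertools import product

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data (assumption: stands in for the preprocessed training set)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

# Step 1: define a grid of hyperparameter values
grid = {"n_estimators": [10, 30], "max_depth": [None, 3]}

# Step 2: evaluate every combination with cross-validation
keys = list(grid)
results = []
for values in product(*(grid[key] for key in keys)):
    params = dict(zip(keys, values))
    model = RandomForestClassifier(random_state=0, **params)
    score = cross_val_score(model, X_demo, y_demo, cv=3).mean()
    results.append((score, params))

# Step 3: select the combination with the best mean score
best_score, best_params = max(results, key=lambda item: item[0])
print(best_params, round(best_score, 3))
```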

Here’s how Grid Search works visually:

![Grid Search Illustration](https://www.yourdatateacher.com/wp-content/uploads/2021/03/image-6.png)


### Assumptions:
- The search space is well-defined and limited to a manageable size.
- All combinations of hyperparameters are evaluated, which assumes that the best set of hyperparameters lies within the provided grid.

### Pros:
- **Exhaustive**: It guarantees that the best hyperparameter combination (within the defined grid) will be found.
- **Easy to implement**: Simple and intuitive to set up.

### Cons:
- **Computationally expensive**: It evaluates every combination, which becomes infeasible with a large search space.
- **Inefficient**: Grid Search doesn’t prioritize or focus on promising hyperparameter values.
- **Not scalable**: For large datasets or complex models, Grid Search can be slow.
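
The cost of Grid Search can be estimated before running it: the number of candidates is the product of the list lengths, and each candidate is fitted once per cross-validation fold. The sketch below uses `ParameterGrid` to count candidates for the Random Forest grid used in this notebook; the extra `max_features` entry is a hypothetical addition that only illustrates how quickly the count grows.

```python
from sklearn.model_selection import ParameterGrid

param_grid_demo = {
    "n_estimators": [100, 200, 300],
    "max_depth": [None, 10, 20],
    "min_samples_split": [2, 5, 10],
}
cv_folds = 5

n_candidates = len(ParameterGrid(param_grid_demo))
total_fits = n_candidates * cv_folds
print(n_candidates, total_fits)  # 27 candidates, 135 fits

# Hypothetical fourth parameter with three values: the cost triples
param_grid_demo["max_features"] = ["sqrt", "log2", None]
larger_total_fits = len(ParameterGrid(param_grid_demo)) * cv_folds
print(larger_total_fits)  # 405 fits
```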


### Random Forest (Grid Search)
1. `n_estimators`: Number of trees in the forest.
2. `max_depth`: Maximum depth of each tree. Deeper trees can model more complex relationships but may overfit.
3. `min_samples_split`: Minimum number of samples required to split a node.

In [None]:
from sklearn.model_selection import GridSearchCV

# Define the grid for Random Forest
param_grid_rf = {
    'n_estimators': [100, 200, 300],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5, 10]
}

# Initialize Grid Search
grid_search_rf = GridSearchCV(estimator=rf, param_grid=param_grid_rf, cv=5, verbose=2, scoring='accuracy')
grid_search_rf.fit(X_train, y_train)

# Best hyperparameters
print("Best Hyperparameters for Random Forest:", grid_search_rf.best_params_)
print("Best Score:", grid_search_rf.best_score_)

Fitting 5 folds for each of 27 candidates, totalling 135 fits
[CV] END max_depth=None, min_samples_split=2, n_estimators=100; total time=   0.9s
[CV] END max_depth=None, min_samples_split=2, n_estimators=100; total time=   0.5s
[CV] END max_depth=None, min_samples_split=2, n_estimators=100; total time=   0.5s
[CV] END max_depth=None, min_samples_split=2, n_estimators=100; total time=   0.4s
[CV] END max_depth=None, min_samples_split=2, n_estimators=100; total time=   0.6s
[CV] END max_depth=None, min_samples_split=2, n_estimators=200; total time=   1.3s
[CV] END max_depth=None, min_samples_split=2, n_estimators=200; total time=   2.0s
[CV] END max_depth=None, min_samples_split=2, n_estimators=200; total time=   1.9s
[CV] END max_depth=None, min_samples_split=2, n_estimators=200; total time=   1.1s
[CV] END max_depth=None, min_samples_split=2, n_estimators=200; total time=   1.0s
[CV] END max_depth=None, min_samples_split=2, n_estimators=300; total time=   1.4s
[CV] END max_depth=None, 

**Note: Using `scoring` in scikit-learn**

The `scoring` parameter specifies the metric used to evaluate candidates in hyperparameter searches such as `GridSearchCV` or `RandomizedSearchCV`. If `scoring` is not specified, scikit-learn falls back to the estimator's own `score` method.

Default scoring values:
- **Classification**: accuracy
- **Regression**: R² (`r2`)
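
As a quick check of the default, the sketch below compares `cross_val_score` with no `scoring` argument against `scoring="accuracy"` for a classifier, and then switches to F1. It runs on synthetic, imbalanced data from `make_classification` (an assumption made for illustration, not the Titanic data).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic imbalanced data (assumption: roughly 80% / 20% class split)
X_demo, y_demo = make_classification(n_samples=300, weights=[0.8, 0.2], random_state=0)
clf = SVC()

default_scores = cross_val_score(clf, X_demo, y_demo, cv=5)                       # estimator's score()
accuracy_scores = cross_val_score(clf, X_demo, y_demo, cv=5, scoring="accuracy")  # explicit accuracy
f1_scores = cross_val_score(clf, X_demo, y_demo, cv=5, scoring="f1")              # different metric

print("default :", default_scores.round(3))
print("accuracy:", accuracy_scores.round(3))
print("f1      :", f1_scores.round(3))
```

On imbalanced data the two metrics can rank candidates differently, which is why it can be worth setting `scoring` explicitly in a search.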


Further reading: for more detailed information on the `scoring` parameter, see the official scikit-learn documentation: https://scikit-learn.org/dev/modules/model_evaluation.html#scoring-parameter


### SVM (Grid Search)
We'll tune the following hyperparameters:
1. `C`: Regularization parameter.
2. `kernel`: Type of kernel (linear, polynomial, RBF).
3. `gamma`: Kernel coefficient for RBF and polynomial kernels (ignored by the linear kernel).
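
Because `gamma` has no effect with a linear kernel, a single combined grid evaluates some candidates that are effectively duplicates. One option, sketched here, is to pass a list of grids so that `gamma` is only varied for the RBF kernel; `GridSearchCV` accepts such a list for `param_grid`. The `combined_grid` and `split_grid` names are introduced for illustration.

```python
from sklearn.model_selection import ParameterGrid

# Single grid: gamma is varied even when kernel='linear'
combined_grid = {
    "C": [0.1, 1, 10],
    "kernel": ["linear", "rbf"],
    "gamma": ["scale", "auto"],
}

# List of grids: gamma is only varied for the RBF kernel
split_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": ["scale", "auto"]},
]

n_combined = len(ParameterGrid(combined_grid))
n_split = len(ParameterGrid(split_grid))
print(n_combined, n_split)  # 12 candidates vs 9 candidates
```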

In [None]:
# Define the grid for SVM
param_grid_svm = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf'],
    'gamma': ['scale', 'auto']
}

# Initialize Grid Search for SVM
grid_search_svm = GridSearchCV(estimator=svm, param_grid=param_grid_svm, cv=5, verbose=2)
grid_search_svm.fit(X_train, y_train)

# Best hyperparameters for SVM
print("Best Hyperparameters for SVM:", grid_search_svm.best_params_)
print("Best Score:", grid_search_svm.best_score_)

Fitting 5 folds for each of 12 candidates, totalling 60 fits
[CV] END ..................C=0.1, gamma=scale, kernel=linear; total time=   0.0s
[CV] END ..................C=0.1, gamma=scale, kernel=linear; total time=   0.0s
[CV] END ..................C=0.1, gamma=scale, kernel=linear; total time=   0.0s
[CV] END ..................C=0.1, gamma=scale, kernel=linear; total time=   0.0s
[CV] END ..................C=0.1, gamma=scale, kernel=linear; total time=   0.0s
[CV] END .....................C=0.1, gamma=scale, kernel=rbf; total time=   0.0s
[CV] END .....................C=0.1, gamma=scale, kernel=rbf; total time=   0.0s
[CV] END .....................C=0.1, gamma=scale, kernel=rbf; total time=   0.0s
[CV] END .....................C=0.1, gamma=scale, kernel=rbf; total time=   0.0s
[CV] END .....................C=0.1, gamma=scale, kernel=rbf; total time=   0.0s
[CV] END ...................C=0.1, gamma=auto, kernel=linear; total time=   0.0s
[CV] END ...................C=0.1, gamma=auto, k

### Neural Network (Grid Search)

For the **Neural Network**, we'll tune:
1. `hidden_layer_sizes`: The number of neurons in the hidden layers.
2. `activation`: Activation function for neurons.
3. `learning_rate_init`: Initial learning rate.


In [None]:
# Define the hyperparameter space for Neural Network
param_grid_nn = {
    'hidden_layer_sizes': [(50,), (100,), (150,)],
    'activation': ['relu', 'tanh'],
    'learning_rate_init': [0.001, 0.01, 0.1]
}

grid_search_nn = GridSearchCV(estimator=nn,  param_grid=param_grid_nn, cv=5, verbose=2)
grid_search_nn.fit(X_train, y_train)

# Best hyperparameters for Neural Network
print("Best Hyperparameters for Neural Network:", grid_search_nn.best_params_)
print("Best Score:",grid_search_nn.best_score_)

Fitting 5 folds for each of 18 candidates, totalling 90 fits
[CV] END activation=relu, hidden_layer_sizes=(50,), learning_rate_init=0.001; total time=   1.1s
[CV] END activation=relu, hidden_layer_sizes=(50,), learning_rate_init=0.001; total time=   1.0s
[CV] END activation=relu, hidden_layer_sizes=(50,), learning_rate_init=0.001; total time=   0.7s
[CV] END activation=relu, hidden_layer_sizes=(50,), learning_rate_init=0.001; total time=   1.9s
[CV] END activation=relu, hidden_layer_sizes=(50,), learning_rate_init=0.001; total time=   2.7s
[CV] END activation=relu, hidden_layer_sizes=(50,), learning_rate_init=0.01; total time=   0.9s
[CV] END activation=relu, hidden_layer_sizes=(50,), learning_rate_init=0.01; total time=   0.6s
[CV] END activation=relu, hidden_layer_sizes=(50,), learning_rate_init=0.01; total time=   0.4s
[CV] END activation=relu, hidden_layer_sizes=(50,), learning_rate_init=0.01; total time=   0.7s
[CV] END activation=relu, hidden_layer_sizes=(50,), learning_rate_init



[CV] END activation=relu, hidden_layer_sizes=(100,), learning_rate_init=0.001; total time=   4.7s
[CV] END activation=relu, hidden_layer_sizes=(100,), learning_rate_init=0.001; total time=   2.5s
[CV] END activation=relu, hidden_layer_sizes=(100,), learning_rate_init=0.001; total time=   3.4s
[CV] END activation=relu, hidden_layer_sizes=(100,), learning_rate_init=0.001; total time=   2.9s
[CV] END activation=relu, hidden_layer_sizes=(100,), learning_rate_init=0.001; total time=   2.8s
[CV] END activation=relu, hidden_layer_sizes=(100,), learning_rate_init=0.01; total time=   0.8s
[CV] END activation=relu, hidden_layer_sizes=(100,), learning_rate_init=0.01; total time=   0.4s
[CV] END activation=relu, hidden_layer_sizes=(100,), learning_rate_init=0.01; total time=   1.1s
[CV] END activation=relu, hidden_layer_sizes=(100,), learning_rate_init=0.01; total time=   0.6s
[CV] END activation=relu, hidden_layer_sizes=(100,), learning_rate_init=0.01; total time=   0.6s
[CV] END activation=relu,



[CV] END activation=tanh, hidden_layer_sizes=(100,), learning_rate_init=0.001; total time=   8.5s
[CV] END activation=tanh, hidden_layer_sizes=(100,), learning_rate_init=0.001; total time=   3.1s
[CV] END activation=tanh, hidden_layer_sizes=(100,), learning_rate_init=0.001; total time=   3.2s
[CV] END activation=tanh, hidden_layer_sizes=(100,), learning_rate_init=0.001; total time=   3.3s




[CV] END activation=tanh, hidden_layer_sizes=(100,), learning_rate_init=0.001; total time=   8.9s
[CV] END activation=tanh, hidden_layer_sizes=(100,), learning_rate_init=0.01; total time=   2.2s
[CV] END activation=tanh, hidden_layer_sizes=(100,), learning_rate_init=0.01; total time=   1.4s
[CV] END activation=tanh, hidden_layer_sizes=(100,), learning_rate_init=0.01; total time=   2.8s
[CV] END activation=tanh, hidden_layer_sizes=(100,), learning_rate_init=0.01; total time=   3.0s
[CV] END activation=tanh, hidden_layer_sizes=(100,), learning_rate_init=0.01; total time=   2.1s
[CV] END activation=tanh, hidden_layer_sizes=(100,), learning_rate_init=0.1; total time=   0.4s
[CV] END activation=tanh, hidden_layer_sizes=(100,), learning_rate_init=0.1; total time=   0.8s
[CV] END activation=tanh, hidden_layer_sizes=(100,), learning_rate_init=0.1; total time=   0.6s
[CV] END activation=tanh, hidden_layer_sizes=(100,), learning_rate_init=0.1; total time=   0.8s
[CV] END activation=tanh, hidden_



[CV] END activation=tanh, hidden_layer_sizes=(150,), learning_rate_init=0.001; total time=  11.1s
[CV] END activation=tanh, hidden_layer_sizes=(150,), learning_rate_init=0.01; total time=   1.9s
[CV] END activation=tanh, hidden_layer_sizes=(150,), learning_rate_init=0.01; total time=   2.1s
[CV] END activation=tanh, hidden_layer_sizes=(150,), learning_rate_init=0.01; total time=   3.0s
[CV] END activation=tanh, hidden_layer_sizes=(150,), learning_rate_init=0.01; total time=   0.9s
[CV] END activation=tanh, hidden_layer_sizes=(150,), learning_rate_init=0.01; total time=   1.4s
[CV] END activation=tanh, hidden_layer_sizes=(150,), learning_rate_init=0.1; total time=   0.5s
[CV] END activation=tanh, hidden_layer_sizes=(150,), learning_rate_init=0.1; total time=   0.5s
[CV] END activation=tanh, hidden_layer_sizes=(150,), learning_rate_init=0.1; total time=   0.4s
[CV] END activation=tanh, hidden_layer_sizes=(150,), learning_rate_init=0.1; total time=   0.8s
[CV] END activation=tanh, hidden_

## 2. Randomized Search

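**Randomized Search** samples a fixed number of hyperparameter combinations (`n_iter`) at random from the specified lists or distributions. Because it does not evaluate every combination, its cost is set by `n_iter` rather than by the size of the search space, which usually makes it cheaper than Grid Search on large spaces.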

Steps:
1. Define a range of possible hyperparameter values.
2. Randomly sample combinations.
3. Select the combination that gives the best performance.
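
Steps 1 and 2 can be illustrated with `ParameterSampler`, which is the sampling utility behind `RandomizedSearchCV`. The sketch below mixes a list with continuous `scipy.stats` distributions; the specific ranges are assumptions chosen for illustration, not values from this notebook.

```python
from scipy.stats import loguniform
from sklearn.model_selection import ParameterSampler

# Step 1: define ranges (lists are sampled uniformly, distributions via .rvs)
param_distributions = {
    "C": loguniform(1e-2, 1e2),      # assumed range for illustration
    "gamma": loguniform(1e-4, 1e0),  # assumed range for illustration
    "kernel": ["linear", "rbf"],
}

# Step 2: randomly sample a fixed number of combinations
samples = list(ParameterSampler(param_distributions, n_iter=5, random_state=0))
for params in samples:
    print(params)
```

Each sampled dictionary would then be evaluated with cross-validation (step 3), exactly as in Grid Search.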

Here’s a visual comparison between Grid Search and Randomized Search:

![Randomized Search Illustration](https://www.yourdatateacher.com/wp-content/uploads/2021/03/image-7.png)


### Assumptions:
- The hyperparameter space can be defined as a distribution, and a random selection of hyperparameters can potentially include the best-performing ones.
- The search space is large, but not every combination needs to be evaluated.

### Pros:
- **Faster than Grid Search**: It reduces the number of combinations evaluated, leading to quicker results.
- **Scalable**: More feasible for larger datasets or larger hyperparameter spaces.
- **More diverse sampling**: Randomized Search is better at exploring the hyperparameter space, as it doesn’t stick to a grid.

### Cons:
- **No guarantee**: It doesn’t guarantee finding the best combination, as not all combinations are evaluated.
- **Less exhaustive**: The randomness can sometimes lead to missing optimal configurations, especially if the number of iterations is too low.
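
How low is "too low" can be estimated with a short calculation. Assuming independent draws, and assuming the "good" configurations make up a fixed fraction of the sampling distribution (both are simplifying assumptions), the chance that at least one of `n_iter` samples lands in that fraction is `1 - (1 - fraction) ** n_iter`. The helper name `prob_hit_top_fraction` is hypothetical.

```python
def prob_hit_top_fraction(n_iter, top_fraction=0.05):
    """Probability that at least one of n_iter independent samples
    falls in the best `top_fraction` of the search space."""
    return 1.0 - (1.0 - top_fraction) ** n_iter


for n in (10, 30, 60):
    print(n, round(prob_hit_top_fraction(n), 3))
# 10 0.401
# 30 0.785
# 60 0.954
```

Under these assumptions, `n_iter=10` as used in this notebook gives roughly a 40% chance of sampling a top-5% configuration, while about 60 samples give roughly 95%.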

### Random Forest (Randomized Search)
1. `n_estimators`: Number of trees in the forest.
2. `max_depth`: Maximum depth of each tree. Deeper trees can model more complex relationships but may overfit.
3. `min_samples_split`: Minimum number of samples required to split a node.

In [None]:
from sklearn.model_selection import RandomizedSearchCV

# Define the hyperparameter space for Random Forest
param_dist_rf = {
    'n_estimators': [int(x) for x in np.linspace(100, 500, 10)],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10]
}

# Initialize Randomized Search
random_search_rf = RandomizedSearchCV(estimator=rf, param_distributions=param_dist_rf, n_iter=10, cv=5, verbose=2)
random_search_rf.fit(X_train, y_train)

# Best hyperparameters
print("Best Hyperparameters for Random Forest (Randomized Search):", random_search_rf.best_params_)
print("Best Score:",random_search_rf.best_score_)

Fitting 5 folds for each of 10 candidates, totalling 50 fits
[CV] END max_depth=20, min_samples_split=2, n_estimators=366; total time=   0.7s
[CV] END max_depth=20, min_samples_split=2, n_estimators=366; total time=   0.7s
[CV] END max_depth=20, min_samples_split=2, n_estimators=366; total time=   0.7s
[CV] END max_depth=20, min_samples_split=2, n_estimators=366; total time=   0.7s
[CV] END max_depth=20, min_samples_split=2, n_estimators=366; total time=   0.7s
[CV] END max_depth=None, min_samples_split=2, n_estimators=100; total time=   0.2s
[CV] END max_depth=None, min_samples_split=2, n_estimators=100; total time=   0.2s
[CV] END max_depth=None, min_samples_split=2, n_estimators=100; total time=   0.3s
[CV] END max_depth=None, min_samples_split=2, n_estimators=100; total time=   0.3s
[CV] END max_depth=None, min_samples_split=2, n_estimators=100; total time=   0.3s
[CV] END max_depth=20, min_samples_split=2, n_estimators=322; total time=   1.0s
[CV] END max_depth=20, min_samples_spl

### SVM (Randomized Search)
We'll tune the following hyperparameters:
1. `C`: Regularization parameter.
2. `kernel`: Type of kernel (linear, polynomial, RBF).
3. `gamma`: Kernel coefficient for RBF and polynomial kernels.

In [None]:
# Define the search space for SVM
param_dist_svm = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf'],
    'gamma': ['scale', 'auto']
}

# Initialize Randomized Search for SVM
random_search_svm = RandomizedSearchCV(estimator=svm, param_distributions=param_dist_svm, n_iter=10, cv=5, verbose=2)
random_search_svm.fit(X_train, y_train)

# Best hyperparameters for SVM
print("Best Hyperparameters for SVM:", random_search_svm.best_params_)
print("Best Score:", random_search_svm.best_score_)

Fitting 5 folds for each of 10 candidates, totalling 50 fits
[CV] END .....................C=1, gamma=auto, kernel=linear; total time=   0.0s
[CV] END .....................C=1, gamma=auto, kernel=linear; total time=   0.0s
[CV] END .....................C=1, gamma=auto, kernel=linear; total time=   0.0s
[CV] END .....................C=1, gamma=auto, kernel=linear; total time=   0.0s
[CV] END .....................C=1, gamma=auto, kernel=linear; total time=   0.0s
[CV] END ......................C=10, gamma=scale, kernel=rbf; total time=   0.0s
[CV] END ......................C=10, gamma=scale, kernel=rbf; total time=   0.0s
[CV] END ......................C=10, gamma=scale, kernel=rbf; total time=   0.0s
[CV] END ......................C=10, gamma=scale, kernel=rbf; total time=   0.0s
[CV] END ......................C=10, gamma=scale, kernel=rbf; total time=   0.0s
[CV] END ...................C=10, gamma=scale, kernel=linear; total time=   0.0s
[CV] END ...................C=10, gamma=scale, k

### Neural Network (Randomized Search)

For the **Neural Network**, we'll tune:
1. `hidden_layer_sizes`: The number of neurons in the hidden layers.
2. `activation`: Activation function for neurons.
3. `learning_rate_init`: Initial learning rate.


In [None]:
# Define the hyperparameter space for Neural Network
param_dist_nn = {
    'hidden_layer_sizes': [(50,), (100,), (150,)],
    'activation': ['relu', 'tanh'],
    'learning_rate_init': [0.001, 0.01, 0.1]
}

# Initialize Randomized Search for Neural Network
random_search_nn = RandomizedSearchCV(estimator=nn, param_distributions=param_dist_nn, n_iter=10, cv=5, verbose=2)
random_search_nn.fit(X_train, y_train)

# Best hyperparameters for Neural Network
print("Best Hyperparameters for Neural Network:", random_search_nn.best_params_)
print("Best Score:",random_search_nn.best_score_)

Fitting 5 folds for each of 10 candidates, totalling 50 fits
[CV] END activation=tanh, hidden_layer_sizes=(50,), learning_rate_init=0.01; total time=   1.3s
[CV] END activation=tanh, hidden_layer_sizes=(50,), learning_rate_init=0.01; total time=   0.8s
[CV] END activation=tanh, hidden_layer_sizes=(50,), learning_rate_init=0.01; total time=   2.3s
[CV] END activation=tanh, hidden_layer_sizes=(50,), learning_rate_init=0.01; total time=   1.3s
[CV] END activation=tanh, hidden_layer_sizes=(50,), learning_rate_init=0.01; total time=   1.6s
[CV] END activation=relu, hidden_layer_sizes=(50,), learning_rate_init=0.001; total time=   2.4s
[CV] END activation=relu, hidden_layer_sizes=(50,), learning_rate_init=0.001; total time=   2.6s
[CV] END activation=relu, hidden_layer_sizes=(50,), learning_rate_init=0.001; total time=   0.7s
[CV] END activation=relu, hidden_layer_sizes=(50,), learning_rate_init=0.001; total time=   1.4s
[CV] END activation=relu, hidden_layer_sizes=(50,), learning_rate_init=



[CV] END activation=tanh, hidden_layer_sizes=(100,), learning_rate_init=0.001; total time=   6.4s
[CV] END activation=tanh, hidden_layer_sizes=(100,), learning_rate_init=0.001; total time=   4.8s
[CV] END activation=tanh, hidden_layer_sizes=(100,), learning_rate_init=0.001; total time=   3.2s
[CV] END activation=tanh, hidden_layer_sizes=(100,), learning_rate_init=0.001; total time=   3.1s




[CV] END activation=tanh, hidden_layer_sizes=(100,), learning_rate_init=0.001; total time=   8.9s
[CV] END activation=tanh, hidden_layer_sizes=(150,), learning_rate_init=0.001; total time=   3.2s
[CV] END activation=tanh, hidden_layer_sizes=(150,), learning_rate_init=0.001; total time=   6.2s
[CV] END activation=tanh, hidden_layer_sizes=(150,), learning_rate_init=0.001; total time=   6.1s
[CV] END activation=tanh, hidden_layer_sizes=(150,), learning_rate_init=0.001; total time=   4.2s




[CV] END activation=tanh, hidden_layer_sizes=(150,), learning_rate_init=0.001; total time=  11.0s
[CV] END activation=tanh, hidden_layer_sizes=(100,), learning_rate_init=0.01; total time=   2.2s
[CV] END activation=tanh, hidden_layer_sizes=(100,), learning_rate_init=0.01; total time=   1.4s
[CV] END activation=tanh, hidden_layer_sizes=(100,), learning_rate_init=0.01; total time=   3.3s
[CV] END activation=tanh, hidden_layer_sizes=(100,), learning_rate_init=0.01; total time=   3.1s
[CV] END activation=tanh, hidden_layer_sizes=(100,), learning_rate_init=0.01; total time=   1.5s
[CV] END activation=relu, hidden_layer_sizes=(100,), learning_rate_init=0.01; total time=   0.8s
[CV] END activation=relu, hidden_layer_sizes=(100,), learning_rate_init=0.01; total time=   0.3s
[CV] END activation=relu, hidden_layer_sizes=(100,), learning_rate_init=0.01; total time=   1.1s
[CV] END activation=relu, hidden_layer_sizes=(100,), learning_rate_init=0.01; total time=   0.7s
[CV] END activation=relu, hid



[CV] END activation=relu, hidden_layer_sizes=(100,), learning_rate_init=0.001; total time=   3.0s
[CV] END activation=relu, hidden_layer_sizes=(100,), learning_rate_init=0.001; total time=   2.1s
[CV] END activation=relu, hidden_layer_sizes=(100,), learning_rate_init=0.001; total time=   3.5s
[CV] END activation=relu, hidden_layer_sizes=(100,), learning_rate_init=0.001; total time=   3.1s
[CV] END activation=relu, hidden_layer_sizes=(100,), learning_rate_init=0.001; total time=   2.9s
Best Hyperparameters for Neural Network: {'learning_rate_init': 0.1, 'hidden_layer_sizes': (100,), 'activation': 'relu'}
Best Score: 0.8202501723628484
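
The "Best Score" printed by a search is the mean cross-validation score on the training data. To compare a tuned model with the baseline accuracies, it can also be scored on the held-out test split through `best_estimator_`. The sketch below shows the pattern on synthetic data with a small, hypothetical search space; it is not a rerun of the searches above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic data (assumption: stands in for the preprocessed Titanic features)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions={"n_estimators": [20, 50], "max_depth": [None, 3, 5]},
    n_iter=4,
    cv=3,
    random_state=42,  # makes the sampled candidates reproducible
)
search.fit(X_tr, y_tr)

# best_estimator_ is refitted on the full training split (refit=True by default)
test_accuracy = accuracy_score(y_te, search.best_estimator_.predict(X_te))
print("CV score:", round(search.best_score_, 3), "| test accuracy:", round(test_accuracy, 3))
```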


## 3. Bayesian Optimization

**Bayesian Optimization** is a strategy that builds a probabilistic model of the objective function and uses it to select hyperparameter values that are more likely to improve model performance. The idea is to minimize the number of evaluations required by leveraging prior results.

### Key Concepts in Bayesian Search:

1. **Probabilistic Model**:
   - Bayesian Search builds a surrogate model of the objective function (i.e., the function we are optimizing, typically the validation score).
   - This surrogate model predicts how the performance metric will behave with different hyperparameter values.

2. **Exploration vs. Exploitation**:
   - The search process balances between exploring unknown areas (exploration) and refining areas where good results have already been observed (exploitation).

3. **Acquisition Function**:
   - The acquisition function helps decide which hyperparameters to try next. It determines the trade-off between exploration and exploitation.

4. **Gaussian Processes (GP)**:
   - Bayesian Search often uses Gaussian Processes to model the objective function, though other surrogate models (for example, random forests or tree-structured Parzen estimators) can also be used.
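
One common acquisition function is expected improvement (EI). As a sketch for a maximization problem, assume the surrogate gives a Gaussian prediction with mean $\mu(x)$ and standard deviation $\sigma(x)$, and let $f^{+}$ be the best score observed so far (these symbols are introduced here, not taken from the notebook). With $\Phi$ and $\phi$ the standard normal CDF and PDF:

$$
\mathrm{EI}(x) = \mathbb{E}\big[\max(f(x) - f^{+},\, 0)\big] = (\mu(x) - f^{+})\,\Phi(z) + \sigma(x)\,\phi(z), \qquad z = \frac{\mu(x) - f^{+}}{\sigma(x)}
$$

The first term is large where the predicted mean already beats the best score (exploitation); the second is large where the prediction is uncertain (exploration).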

---

### Steps in Bayesian Search:

1. **Initialize**:
   - Start by evaluating a set of random hyperparameter values.

2. **Fit a Probabilistic Model**:
   - A model is built to predict the objective function for unseen hyperparameter values based on the data from previous evaluations.

3. **Select Next Set of Hyperparameters**:
   - Use the acquisition function to select the next set of hyperparameters that are expected to improve performance based on the model.

4. **Evaluate and Update**:
   - Evaluate the selected hyperparameters, update the model with the new data, and repeat the process.
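
The four steps above can be sketched as a small loop. This is a toy illustration under stated assumptions, not how `scikit-optimize` is implemented: the `objective` function is a hypothetical one-dimensional stand-in for a validation score, the surrogate is scikit-learn's `GaussianProcessRegressor`, and the acquisition function is expected improvement evaluated on a fixed candidate grid.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def objective(x):
    """Hypothetical stand-in for a validation score; its maximum is at x = 2."""
    return -(x - 2.0) ** 2


def expected_improvement(candidates, gp, best_y):
    """Expected improvement over best_y (maximization) at each candidate."""
    mu, sigma = gp.predict(candidates.reshape(-1, 1), return_std=True)
    sigma = np.maximum(sigma, 1e-9)  # avoid division by zero
    z = (mu - best_y) / sigma
    return (mu - best_y) * norm.cdf(z) + sigma * norm.pdf(z)


rng = np.random.default_rng(0)
low, high = 0.0, 5.0
candidates = np.linspace(low, high, 501)

# 1. Initialize: evaluate a few random hyperparameter values
X_obs = rng.uniform(low, high, size=3)
y_obs = objective(X_obs)
initial_best = y_obs.max()

for _ in range(10):
    # 2. Fit a probabilistic model to all evaluations so far
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6,
                                  normalize_y=True, random_state=0)
    gp.fit(X_obs.reshape(-1, 1), y_obs)

    # 3. Select the candidate with the highest expected improvement
    ei = expected_improvement(candidates, gp, y_obs.max())
    x_next = candidates[np.argmax(ei)]

    # 4. Evaluate it and add the result to the observations
    X_obs = np.append(X_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))

best_x = X_obs[np.argmax(y_obs)]
print("best x:", round(float(best_x), 3), "| best value:", round(float(y_obs.max()), 4))
```

In a real search, each call to `objective` would be a full cross-validation run, which is why spending a little time on the surrogate to choose the next point can pay off.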

---

### Pros:
- **Efficient**: It balances exploration and exploitation, reducing the number of evaluations required.
- **Focused search**: Bayesian Optimization tends to focus on the most promising regions of the hyperparameter space, saving time and computational resources.

### Cons:
- **Complexity**: More difficult to implement compared to Grid or Randomized Search.
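- **Requires prior evaluations**: The surrogate model only becomes informative after a few configurations have been evaluated, so early iterations may be no better than random sampling. The surrogate model and acquisition function also have settings of their own that may need tuning.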
- **Less flexible for discrete search spaces**: It works best with continuous hyperparameters.


![Bayesian Search Illustration](https://media.licdn.com/dms/image/v2/D4D22AQH3u6gE7tqAfQ/feedshare-shrink_800/feedshare-shrink_800/0/1713367511248?e=2147483647&v=beta&t=SWX6qWn6HFUX94Vxy0IB32u0iqvIvA8WD20gCtj3K94)

Let’s implement Bayesian Optimization using the `scikit-optimize` library:

In [None]:
!pip install scikit-optimize

Collecting scikit-optimize
  Downloading scikit_optimize-0.10.2-py2.py3-none-any.whl.metadata (9.7 kB)
Collecting pyaml>=16.9 (from scikit-optimize)
  Downloading pyaml-24.9.0-py3-none-any.whl.metadata (11 kB)
Downloading scikit_optimize-0.10.2-py2.py3-none-any.whl (107 kB)
Downloading pyaml-24.9.0-py3-none-any.whl (24 kB)
Installing collected packages: pyaml, scikit-optimize
Successfully installed pyaml-24.9.0 scikit-optimize-0.10.2


### Random Forest (Bayesian)
1. `n_estimators`: Number of trees in the forest.
2. `max_depth`: Maximum depth of each tree. Deeper trees can model more complex relationships but may overfit.
3. `min_samples_split`: Minimum number of samples required to split a node.

In [None]:
from skopt import BayesSearchCV

# Define the hyperparameter space for Random Forest
param_space_rf = {
    'n_estimators': [int(x) for x in np.linspace(100, 500, 10)],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10]
}

# Initialize Bayesian Search for Random Forest
bayes_search_rf = BayesSearchCV(
    estimator=rf,
    search_spaces=param_space_rf,
    n_iter=10,  # Number of iterations for search
    cv=5,       # 5-fold cross-validation
    verbose=2
)

# Train the Random Forest model using Bayesian Optimization
bayes_search_rf.fit(X_train, y_train)

# Best hyperparameters for Random Forest using Bayesian Optimization
print("Best Hyperparameters for Random Forest (Bayesian):", bayes_search_rf.best_params_)
print("Best Score:", bayes_search_rf.best_score_)

Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] END max_depth=10, min_samples_split=2, n_estimators=233; total time=   0.7s
[CV] END max_depth=10, min_samples_split=2, n_estimators=233; total time=   1.0s
[CV] END max_depth=10, min_samples_split=2, n_estimators=233; total time=   0.6s
[CV] END max_depth=10, min_samples_split=2, n_estimators=233; total time=   0.5s
[CV] END max_depth=10, min_samples_split=2, n_estimators=233; total time=   0.5s
Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] END max_depth=30, min_samples_split=2, n_estimators=411; total time=   1.6s
[CV] END max_depth=30, min_samples_split=2, n_estimators=411; total time=   1.6s
[CV] END max_depth=30, min_samples_split=2, n_estimators=411; total time=   1.5s
[CV] END max_depth=30, min_samples_split=2, n_estimators=411; total time=   1.6s
[CV] END max_depth=30, min_samples_split=2, n_estimators=411; total time=   2.0s
Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] END max_

### SVM (Bayesian)
We'll tune the following hyperparameters:
1. `C`: Regularization parameter.
2. `kernel`: Type of kernel (linear, polynomial, RBF).
3. `gamma`: Kernel coefficient for RBF and polynomial kernels.

In [None]:
from skopt import BayesSearchCV

# Define the parameter space for SVM
param_space_svm = {
    'C': (1e-6, 1e+6, 'log-uniform'),
    'kernel': ['linear', 'rbf'],
    'gamma': (1e-6, 1e+1, 'log-uniform')
}

# Initialize Bayesian Search for SVM
bayes_search_svm = BayesSearchCV(estimator=svm, search_spaces=param_space_svm, n_iter=10, cv=5, verbose=2)
bayes_search_svm.fit(X_train, y_train)

# Best hyperparameters for SVM using Bayesian Optimization
print("Best Hyperparameters for SVM (Bayesian):", bayes_search_svm.best_params_)
print("Best Score:",bayes_search_svm.best_score_)

Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] END C=78.34316681301911, gamma=0.028570583196002513, kernel=rbf; total time=   0.0s
[CV] END C=78.34316681301911, gamma=0.028570583196002513, kernel=rbf; total time=   0.0s
[CV] END C=78.34316681301911, gamma=0.028570583196002513, kernel=rbf; total time=   0.0s
[CV] END C=78.34316681301911, gamma=0.028570583196002513, kernel=rbf; total time=   0.0s
[CV] END C=78.34316681301911, gamma=0.028570583196002513, kernel=rbf; total time=   0.0s
Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] END C=5.153799126196801, gamma=0.007035718994827788, kernel=rbf; total time=   0.0s
[CV] END C=5.153799126196801, gamma=0.007035718994827788, kernel=rbf; total time=   0.0s
[CV] END C=5.153799126196801, gamma=0.007035718994827788, kernel=rbf; total time=   0.0s
[CV] END C=5.153799126196801, gamma=0.007035718994827788, kernel=rbf; total time=   0.0s
[CV] END C=5.153799126196801, gamma=0.007035718994827788, kernel=rbf; total time=
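
To see what the `'log-uniform'` prior on `C` and `gamma` means, the sketch below draws samples uniformly in log space and exponentiates them. It uses only the standard library rather than `skopt`, and the helper name `sample_log_uniform` is made up for illustration; it is not part of any library.

```python
import math
import random

def sample_log_uniform(low, high, rng):
    """Draw one value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [sample_log_uniform(1e-6, 1e6, rng) for _ in range(10_000)]

# Count how many draws fall into each of the 12 decades between 1e-6 and 1e6.
decade_counts = [0] * 12
for value in samples:
    decade = max(0, min(math.floor(math.log10(value)) + 6, 11))
    decade_counts[decade] += 1

print(decade_counts)  # each decade receives a similar share of the draws

# A plain uniform draw on the same range almost never lands below 1.0.
uniform_below_one = sum(rng.uniform(1e-6, 1e6) < 1.0 for _ in range(10_000))
print("uniform draws below 1.0:", uniform_below_one)
```

This is why a log scale is the usual choice for parameters such as `C` that span many orders of magnitude: small values get as much search effort as large ones.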

### Neural Network (Bayesian)

For the **Neural Network**, we'll tune:
1. `hidden_layer_sizes`: The number of neurons in the single hidden layer.
2. `alpha`: L2 regularization strength.
3. `learning_rate_init`: Initial learning rate.
4. `max_iter`: Maximum number of training iterations.


In [None]:
from skopt import BayesSearchCV
from skopt.space import Real, Integer

# Define the hyperparameter space for Neural Network
param_space_nn = {
    'hidden_layer_sizes': Integer(50, 200),
    'alpha': Real(1e-5, 1e-2, prior='log-uniform'),
    'learning_rate_init': Real(1e-4, 1e-2, prior='log-uniform'),
    'max_iter': Integer(100, 500)
}



# Initialize Bayesian Search for Neural Network
bayes_search_nn = BayesSearchCV(
    estimator=nn,
    search_spaces=param_space_nn,
    n_iter=10,  # Number of iterations for search
    cv=5,       # 5-fold cross-validation
    verbose=2
)

# Train the Neural Network model using Bayesian Optimization
bayes_search_nn.fit(X_train, y_train)

# Best hyperparameters for Neural Network using Bayesian Optimization
print("Best Hyperparameters for Neural Network (Bayesian):", bayes_search_nn.best_params_)
print("Best Score:", bayes_search_nn.best_score_)

Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] END alpha=0.00026284850736935263, hidden_layer_sizes=181, learning_rate_init=0.009198232978205994, max_iter=452; total time=   1.5s
[CV] END alpha=0.00026284850736935263, hidden_layer_sizes=181, learning_rate_init=0.009198232978205994, max_iter=452; total time=   1.2s
[CV] END alpha=0.00026284850736935263, hidden_layer_sizes=181, learning_rate_init=0.009198232978205994, max_iter=452; total time=   0.9s
[CV] END alpha=0.00026284850736935263, hidden_layer_sizes=181, learning_rate_init=0.009198232978205994, max_iter=452; total time=   1.6s
[CV] END alpha=0.00026284850736935263, hidden_layer_sizes=181, learning_rate_init=0.009198232978205994, max_iter=452; total time=   2.4s
Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] END alpha=0.0002477971229101244, hidden_layer_sizes=54, learning_rate_init=0.0004736479037892373, max_iter=182; total time=   0.4s
[CV] END alpha=0.0002477971229101244, hidden_layer_sizes=54, learning_rate_init=0.0004736479037892373, max_iter=182; total time=   0.5s
[CV] END alpha=0.0002477971229101244, hidden_layer_sizes=54, learning_rate_init=0.0004736479037892373, max_iter=182; total time=   0.5s
[CV] END alpha=0.0002477971229101244, hidden_layer_sizes=54, learning_rate_init=0.0004736479037892373, max_iter=182; total time=   0.4s
[CV] END alpha=0.0002477971229101244, hidden_layer_sizes=54, learning_rate_init=0.0004736479037892373, max_iter=182; total time=   0.5s
Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] END alpha=0.000479689430929986, hidden_layer_sizes=164, learning_rate_init=0.003635879579168757, max_iter=478; total time=   1.7s
[CV] END alpha=0.000479689430929986, hidden_layer_sizes=164, learning_rate_init=0.003635879579168757, max_iter=478; total time=   1.7s
[CV] END alpha=0.000479689430929986, hidden_layer_sizes=164, learning_rate_init=0.003635879579168757, max_iter=478; total time=   2.1s
[CV] END alpha=0.000479689430929986, hidden_layer_sizes=164, learning_rate_init=0.003635879579168757, max_iter=478; total time=   2.4s
[CV] END alpha=0.000479689430929986, hidden_layer_sizes=164, learning_rate_init=0.003635879579168757, max_iter=478; total time=   4.1s
Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] END alpha=0.00013094570677652423, hidden_layer_sizes=173, learning_rate_init=0.00022563262563692665, max_iter=386; total time=   3.4s
[CV] END alpha=0.00013094570677652423, hidden_layer_sizes=173, learning_rate_init=0.00022563262563692665, max_iter=386; total time=   4.3s
[CV] END alpha=0.00013094570677652423, hidden_layer_sizes=173, learning_rate_init=0.00022563262563692665, max_iter=386; total time=   6.2s
[CV] END alpha=0.00013094570677652423, hidden_layer_sizes=173, learning_rate_init=0.00022563262563692665, max_iter=386; total time=   4.0s
[CV] END alpha=0.00013094570677652423, hidden_layer_sizes=173, learning_rate_init=0.00022563262563692665, max_iter=386; total time=   3.4s
Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] END alpha=0.008287816178716349, hidden_layer_sizes=79, learning_rate_init=0.0029371581428059334, max_iter=142; total time=   0.5s
[CV] END alpha=0.008287816178716349, hidden_layer_sizes=79, learning_rate_init=0.0029371581428059334, max_iter=142; total time=   0.5s
[CV] END alpha=0.008287816178716349, hidden_layer_sizes=79, learning_rate_init=0.0029371581428059334, max_iter=142; total time=   0.5s
[CV] END alpha=0.008287816178716349, hidden_layer_sizes=79, learning_rate_init=0.0029371581428059334, max_iter=142; total time=   0.4s
[CV] END alpha=0.008287816178716349, hidden_layer_sizes=79, learning_rate_init=0.0029371581428059334, max_iter=142; total time=   0.4s
Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] END alpha=0.008121191298011535, hidden_layer_sizes=139, learning_rate_init=0.0018617959069453418, max_iter=398; total time=   3.6s
[CV] END alpha=0.008121191298011535, hidden_layer_sizes=139, learning_rate_init=0.0018617959069453418, max_iter=398; total time=   2.1s
[CV] END alpha=0.008121191298011535, hidden_layer_sizes=139, learning_rate_init=0.0018617959069453418, max_iter=398; total time=   1.6s
[CV] END alpha=0.008121191298011535, hidden_layer_sizes=139, learning_rate_init=0.0018617959069453418, max_iter=398; total time=   1.6s
[CV] END alpha=0.008121191298011535, hidden_layer_sizes=139, learning_rate_init=0.0018617959069453418, max_iter=398; total time=   1.7s
Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] END alpha=0.00044901829177746897, hidden_layer_sizes=172, learning_rate_init=0.006690461916208403, max_iter=277; total time=   1.2s
[CV] END alpha=0.00044901829177746897, hidden_layer_sizes=172, learning_rate_init=0.006690461916208403, max_iter=277; total time=   1.2s
[CV] END alpha=0.00044901829177746897, hidden_layer_sizes=172, learning_rate_init=0.006690461916208403, max_iter=277; total time=   1.1s
[CV] END alpha=0.00044901829177746897, hidden_layer_sizes=172, learning_rate_init=0.006690461916208403, max_iter=277; total time=   2.8s
[CV] END alpha=0.00044901829177746897, hidden_layer_sizes=172, learning_rate_init=0.006690461916208403, max_iter=277; total time=   1.7s
Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] END alpha=0.0008275106805005455, hidden_layer_sizes=122, learning_rate_init=0.003791248543973768, max_iter=292; total time=   1.0s
[CV] END alpha=0.0008275106805005455, hidden_layer_sizes=122, learning_rate_init=0.003791248543973768, max_iter=292; total time=   0.9s
[CV] END alpha=0.0008275106805005455, hidden_layer_sizes=122, learning_rate_init=0.003791248543973768, max_iter=292; total time=   1.0s
[CV] END alpha=0.0008275106805005455, hidden_layer_sizes=122, learning_rate_init=0.003791248543973768, max_iter=292; total time=   1.1s
[CV] END alpha=0.0008275106805005455, hidden_layer_sizes=122, learning_rate_init=0.003791248543973768, max_iter=292; total time=   1.0s
Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] END alpha=0.0013413144846988303, hidden_layer_sizes=159, learning_rate_init=0.00010736734297966452, max_iter=482; total time=   1.8s
[CV] END alpha=0.0013413144846988303, hidden_layer_sizes=159, learning_rate_init=0.00010736734297966452, max_iter=482; total time=   2.0s
[CV] END alpha=0.0013413144846988303, hidden_layer_sizes=159, learning_rate_init=0.00010736734297966452, max_iter=482; total time=   3.7s
[CV] END alpha=0.0013413144846988303, hidden_layer_sizes=159, learning_rate_init=0.00010736734297966452, max_iter=482; total time=   3.3s
[CV] END alpha=0.0013413144846988303, hidden_layer_sizes=159, learning_rate_init=0.00010736734297966452, max_iter=482; total time=   2.0s
Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] END alpha=2.4850326786574706e-05, hidden_layer_sizes=77, learning_rate_init=0.005683456048374652, max_iter=442; total time=   0.8s
[CV] END alpha=2.4850326786574706e-05, hidden_layer_sizes=77, learning_rate_init=0.005683456048374652, max_iter=442; total time=   1.3s
[CV] END alpha=2.4850326786574706e-05, hidden_layer_sizes=77, learning_rate_init=0.005683456048374652, max_iter=442; total time=   1.0s
[CV] END alpha=2.4850326786574706e-05, hidden_layer_sizes=77, learning_rate_init=0.005683456048374652, max_iter=442; total time=   0.8s
[CV] END alpha=2.4850326786574706e-05, hidden_layer_sizes=77, learning_rate_init=0.005683456048374652, max_iter=442; total time=   1.4s
Best Hyperparameters for Neural Network (Bayesian): OrderedDict([('alpha', 0.0008275106805005455), ('hidden_layer_sizes', 122), ('learning_rate_init', 0.003791248543973768), ('max_iter', 292)])
Best Score: 0.8202107751403526

# **Results Summary**

## Baseline Models

In [None]:
print("Random Forest Accuracy:", accuracy_score(y_test, rf_pred))
print("SVM Accuracy:", accuracy_score(y_test, svm_pred))
print("Neural Network Accuracy:", accuracy_score(y_test, nn_pred))

Random Forest Accuracy: 0.7821229050279329
SVM Accuracy: 0.8044692737430168
Neural Network Accuracy: 0.770949720670391


## Grid Search Results

In [None]:
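# best_score_ is the mean cross-validated score of the best configuration, not a test-set accuracy
print("Random Forest Accuracy:", grid_search_rf.best_score_)
print("SVM Accuracy:", grid_search_svm.best_score_)
print("Neural Network Accuracy:", grid_search_nn.best_score_)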

Random Forest Accuracy: 0.8244459765586527
SVM Accuracy: 0.8230375258544272
Neural Network Accuracy: 0.8202501723628484


## Randomized Search Results

In [None]:
print("Random Forest Accuracy:", random_search_rf.best_score_)
print("SVM Accuracy:", random_search_svm.best_score_)
print("Neural Network Accuracy:", random_search_nn.best_score_)

Random Forest Accuracy: 0.823076923076923
SVM Accuracy: 0.8230375258544272
Neural Network Accuracy: 0.8202501723628484


## Bayesian Optimization Results

In [None]:
print("Random Forest Accuracy:", bayes_search_rf.best_score_)
print("SVM Accuracy:", bayes_search_svm.best_score_)
print("Neural Network Accuracy:", bayes_search_nn.best_score_)

Random Forest Accuracy: 0.8230670737712991
SVM Accuracy: 0.8216389244558258
Neural Network Accuracy: 0.8202107751403526
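
The scores printed above can be collected in one place for easier comparison. The sketch below hard-codes the values reported in this notebook, rounded to four decimals; the dictionary layout and names such as `cv_scores` are illustrative choices, not part of the original code. The baseline figures come from `accuracy_score` on the test set, while the search figures are `best_score_` values from cross-validation, so the baseline is kept out of the ranking.

```python
# Scores copied from the outputs above, rounded to four decimals.
baseline_test_accuracy = {"Random Forest": 0.7821, "SVM": 0.8045, "Neural Network": 0.7709}

cv_scores = {
    "Grid Search":       {"Random Forest": 0.8244, "SVM": 0.8230, "Neural Network": 0.8203},
    "Randomized Search": {"Random Forest": 0.8231, "SVM": 0.8230, "Neural Network": 0.8203},
    "Bayesian":          {"Random Forest": 0.8231, "SVM": 0.8216, "Neural Network": 0.8202},
}

models = list(baseline_test_accuracy)

# Print one row per search method, one column per model.
print(f"{'Method':<20}" + "".join(f"{m:>16}" for m in models))
for method, scores in cv_scores.items():
    print(f"{method:<20}" + "".join(f"{scores[m]:>16.4f}" for m in models))

# For each model, list every search method that reached the top score (ties included).
best_methods = {}
for m in models:
    top = max(scores[m] for scores in cv_scores.values())
    best_methods[m] = [method for method, scores in cv_scores.items() if scores[m] == top]

print(best_methods)
```

On these numbers the three search methods land within about half a percentage point of each other for every model, so the choice between them here is more about search cost than final score.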


# Trick: Hyperparameter Tuning Workflow When Comparing Multiple Models

How you approach **hyperparameter tuning** depends on your strategy and the nature of the problem you are solving, but when comparing several models there are two main approaches:

### Approach 1: Find the Best Model with Default Values First, Then Tune Its Hyperparameters

1. **Train Models with Default Hyperparameters**:
   Train each model with its default hyperparameters first, to see which one performs best without spending time on tuning up front.

2. **Select the Best Model**:
   Compare the models on the test set or by cross-validation, and select the one that performs best with default values. Cross-validation is the safer choice, because it keeps the test set untouched for step 4.

3. **Tune Hyperparameters of the Best Model**:
   Tune the hyperparameters of this model only, to get the best result from it.

4. **Evaluate on Test Set**:
   Once tuning has found the best hyperparameters, evaluate the model on the test set to measure its real performance.

**Pros**:
- Saves tuning time, because only the single best model is tuned.
- Suits cases where you want to try many models quickly without tuning every one of them.

**Cons**:
- A model that would perform well with other hyperparameters may be overlooked because it performs poorly at its defaults.
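
The steps of Approach 1 can be sketched in a few lines. The functions `score_default` and `tune` below are hypothetical stand-ins for real training and search runs: they simply look up the scores reported earlier in this notebook (rounded), with the Grid Search results used as the tuned scores. Keep in mind that the default scores are test-set accuracies while the tuned ones are cross-validation means, so this only illustrates the selection logic.

```python
# Reported scores from this notebook, rounded; stand-ins for real training runs.
DEFAULT_SCORES = {"Random Forest": 0.7821, "SVM": 0.8045, "Neural Network": 0.7709}
TUNED_SCORES = {"Random Forest": 0.8244, "SVM": 0.8230, "Neural Network": 0.8203}  # Grid Search

tuning_calls = []  # records which models were actually tuned

def score_default(name):
    """Hypothetical stand-in for fitting a model with default hyperparameters."""
    return DEFAULT_SCORES[name]

def tune(name):
    """Hypothetical stand-in for running a hyperparameter search on one model."""
    tuning_calls.append(name)
    return TUNED_SCORES[name]

def approach_one(candidates):
    """Rank models by default score, then tune only the winner."""
    default_scores = {name: score_default(name) for name in candidates}
    best = max(default_scores, key=default_scores.get)
    return best, tune(best)

chosen, tuned_score = approach_one(list(DEFAULT_SCORES))
print(chosen, tuned_score, tuning_calls)
```

Only the SVM is tuned here. The Random Forest, which scores slightly higher after Grid Search (0.8244 against 0.8230), is never tuned, which is the trade-off listed under the cons, although the gap is small in this notebook.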

### Approach 2: Tune Hyperparameters for Every Model from the Start

1. **Tune Hyperparameters for Each Model**:
   Tune each model's hyperparameters from the outset to find the best setting for each one.

2. **Compare the Best Versions of Each Model**:
   After tuning every model, compare the best result from each.

3. **Select the Best Model**:
   Select the model that performs best after tuning.

4. **Evaluate on Test Set**:
   After selecting the best model, evaluate it on the test set.

**Pros**:
- Each model is compared under its own best conditions (after tuning).
- You are more likely to find the best-performing model among all candidates.

**Cons**:
- Takes longer, because every model must be tuned, which can consume substantial time and compute.
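
The extra cost of Approach 2 can be estimated by counting model fits. The sketch below uses the settings from the Bayesian searches in this notebook (three models, `n_iter=10`, `cv=5`) and assumes that the default-hyperparameter comparison in Approach 1 is also run with 5-fold cross-validation. The function names are illustrative, the final refit of the best configuration is ignored, and the count says nothing about how long each fit takes.

```python
def fits_approach_one(n_models, n_iter, cv):
    """Default CV run for every model, then one search on the winner."""
    return n_models * cv + n_iter * cv

def fits_approach_two(n_models, n_iter, cv):
    """One full search per model."""
    return n_models * n_iter * cv

one = fits_approach_one(n_models=3, n_iter=10, cv=5)
two = fits_approach_two(n_models=3, n_iter=10, cv=5)
print(one, two)  # 65 150
```

Under these assumptions Approach 2 needs a little over twice as many fits, and the gap widens as more models or more search iterations are added.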

### Recommendations
- **Approach 1** suits cases where time is limited, or when you only want a first look at which model performs best without spending much time tuning every model.
- **Approach 2** suits cases where you want the highest possible accuracy and want every model to have a chance to show its best result, since a model may only perform at its best once its hyperparameters are properly tuned.

If you have sufficient resources, **Approach 2** gives a more comprehensive comparison and a better chance of finding the model that performs best in practice.
