## Hyperparameter Tuning

Hyperparameter tuning is the process of optimizing the hyperparameters of a machine learning model in order to improve its performance.

**Hyperparameters:**
- configuration settings that are not learned from the data but are set before training begins
- essential aspects of the model architecture and training procedure that influence the learning process
- determine key features such as model architecture, learning rate, and model complexity
- there are no set rules for which hyperparameters work best, nor for their optimal or default values
- it is therefore necessary to try out different combinations and evaluate the results to find the optimum hyperparameter set. This activity is known as hyperparameter tuning or hyperparameter optimization.
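To make the distinction between hyperparameters and learned parameters concrete, here is a minimal sketch using scikit-learn's `SVC` on the iris data (the same estimator and dataset used later). `C` and `kernel` are hyperparameters chosen before `fit()`, while `coef_` and `support_vectors_` are parameters that `fit()` learns from the data.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Hyperparameters: chosen before training and never changed by fit()
clf = SVC(C=1.0, kernel="linear")

# Parameters: learned from the data during fit()
clf.fit(X, y)

print("Hyperparameters:", {k: clf.get_params()[k] for k in ("C", "kernel")})
# A linear one-vs-one SVC on 3 classes and 4 features has a (3, 4) coefficient matrix
print("Learned coefficient matrix shape:", clf.coef_.shape)
print("Number of support vectors:", clf.support_vectors_.shape[0])
```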


### Hyperparameter Tuning Techniques
1. Grid Search:
   - Exhaustive search over a predefined hyperparameter grid.
   - Evaluates model performance for every combination in the grid.

2. Random Search:
   - Randomly samples hyperparameter combinations from a search space.
   - Usually reaches good configurations with fewer evaluations than grid search, because the number of trials is set by the budget rather than by the size of a grid.

3. Bayesian Optimization:
   - Builds a probabilistic (surrogate) model of the objective function.
   - Uses it to focus the search on promising hyperparameter regions.

4. Sequential Model-Based Optimization (SMBO):
   - Combines surrogate model predictions with acquisition functions.
   - Balances the exploration-exploitation trade-off when choosing the next configuration to evaluate.

5. Gradient-Based Optimization:
   - Uses derivatives of the validation objective with respect to the hyperparameters.
   - Applicable to continuous hyperparameters; not directly applicable to discrete or categorical ones.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
import tensorflow as tf
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
import numpy as np
```

```python
iris = datasets.load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

**Grid Search:**

**Definition:**
- Exhaustive search over a predefined hyperparameter grid.
- Systematically evaluates all possible combinations of hyperparameter values.

**Some Popular Algorithms it is Used With:**
- Grid search can be applied to a wide range of machine learning algorithms, including:
  1. Support Vector Machines (SVM)
  2. Decision Trees
  3. Random Forest
  4. k-Nearest Neighbors (k-NN)
  5. Gradient Boosting algorithms (e.g., XGBoost)

**When it is Useful:**
- **Simple Hyperparameter Spaces:** Grid search is effective when the hyperparameter space is relatively small and simple.
- **Exploring Interactions:** Because every combination is evaluated, it shows how hyperparameters interact (e.g., how the best value of C changes with the kernel).
- **Baseline Search:** It provides a good baseline tuning method.

**Examples of Usefulness:**
- Grid search is useful when each hyperparameter has a few sensible candidate values, such as kernel types, or regularization strengths and learning rates on a coarse log scale (e.g., 0.1, 1, 10, 100).
- In a decision tree, it can explore different depths and minimum samples per leaf.

**When it is Not Useful:**
- **Large Search Spaces:** Grid search becomes impractical when there are many hyperparameters or many candidate values, because the number of combinations is the product of the number of values per hyperparameter.
- **Continuous Hyperparameters:** Continuous hyperparameters must be discretized, so the optimum can fall between grid points.

**Examples Where It Is Not Useful:**
- In deep neural networks with many hyperparameters, exploring all combinations exhaustively can be computationally expensive.
- When searching for the optimal value of a learning rate in a continuous space.
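The multiplicative growth can be checked directly with scikit-learn's `ParameterGrid`, which enumerates the same combinations `GridSearchCV` evaluates. This sketch reuses the grid from the SVC example that follows; `bigger_grid` is a hypothetical extension that adds two more SVC hyperparameters (`degree`, `coef0`) with five values each.

```python
from sklearn.model_selection import ParameterGrid

# The grid used in the SVC grid-search example of this section
param_grid = {
    "C": [0.1, 1, 10, 100],
    "kernel": ["linear", "rbf"],
    "gamma": ["scale", "auto"],
}
n_folds = 5

n_combos = len(ParameterGrid(param_grid))  # 4 * 2 * 2
print(n_combos, "combinations ->", n_combos * n_folds, "cross-validation fits")

# Hypothetical extension: two more hyperparameters with 5 values each
bigger_grid = {
    **param_grid,
    "degree": [2, 3, 4, 5, 6],
    "coef0": [0.0, 0.1, 0.5, 1.0, 2.0],
}
n_bigger = len(ParameterGrid(bigger_grid))  # 16 * 5 * 5
print(n_bigger, "combinations ->", n_bigger * n_folds, "cross-validation fits")
```

With `refit=True` (the default), `GridSearchCV` also performs one final fit of the best combination on the whole training set.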


```python
svm_model = SVC()

param_grid = {
    'C': [0.1, 1, 10, 100],          # Regularization parameter
    'kernel': ['linear', 'rbf'],    # Kernel type
    'gamma': ['scale', 'auto'],     # Kernel coefficient
}

grid_search = GridSearchCV(estimator=svm_model, param_grid=param_grid, cv=5)
```

```python
grid_search.fit(X_train, y_train)
print("Best Hyperparameters:", grid_search.best_params_)
```

```
Best Hyperparameters: {'C': 1, 'gamma': 'scale', 'kernel': 'linear'}
```


```python
accuracy = grid_search.best_estimator_.score(X_test, y_test)
print("Test Accuracy:", accuracy)
```

```
Test Accuracy: 1.0
```
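The test accuracy above is measured on only 30 held-out samples (20% of iris's 150), so it is a coarse estimate. A self-contained sketch of the cross-validated view: `cv_results_` holds one entry per evaluated combination, and `best_score_` is the mean CV accuracy of the winning one. It repeats the split and grid from this section so it can run on its own.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

param_grid = {"C": [0.1, 1, 10, 100], "kernel": ["linear", "rbf"], "gamma": ["scale", "auto"]}
grid_search = GridSearchCV(SVC(), param_grid=param_grid, cv=5).fit(X_train, y_train)

results = grid_search.cv_results_
# Sort combinations by rank (rank 1 = highest mean CV accuracy) and show the top five
order = np.argsort(results["rank_test_score"])
for i in order[:5]:
    print(f"{results['mean_test_score'][i]:.3f} +/- {results['std_test_score'][i]:.3f}  {results['params'][i]}")
print("Best mean CV accuracy:", round(grid_search.best_score_, 3))
```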


**Random Search:**

**Definition:**
- Randomly samples hyperparameter combinations from a predefined search space.
- Provides a more computationally efficient alternative to exhaustive grid search.

**Some Popular Algorithms it is Used With:**
- Random search is versatile and can be used with a variety of machine learning algorithms, including:
  1. Support Vector Machines (SVM)
  2. Decision Trees
  3. Random Forest
  4. Neural Networks
  5. Gradient Boosting algorithms (e.g., XGBoost)

**When it is Useful:**
- **Large Search Spaces:** Random search is beneficial when dealing with a large number of hyperparameter combinations.
- **Efficiency:** The number of evaluations is fixed by the budget (n_iter in scikit-learn's RandomizedSearchCV) rather than by the size of a grid, so only a subset of the hyperparameter space is sampled.
- **Exploration:** Useful for exploring diverse regions of the hyperparameter space.
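One way to see the efficiency argument is to count how many distinct values of each hyperparameter a fixed budget actually tries. The sketch below assumes two hyperparameters scaled to the unit square and a budget of 9 evaluations: a 3 x 3 grid tries only 3 values of each, while 9 random samples try 9 values of each. If only one of the two hyperparameters matters much, random search has effectively explored it three times more finely.

```python
import itertools
import numpy as np

budget = 9  # total number of model evaluations we can afford (an assumption)

# Grid search: 3 values per hyperparameter -> 3 x 3 = 9 evaluations
grid_values = np.linspace(0.0, 1.0, 3)
grid_points = list(itertools.product(grid_values, grid_values))

# Random search: 9 independent uniform samples in the same unit square
rng = np.random.default_rng(0)
random_points = rng.uniform(0.0, 1.0, size=(budget, 2))

print("grid  : distinct values per hyperparameter =",
      len({p[0] for p in grid_points}), "and", len({p[1] for p in grid_points}))
print("random: distinct values per hyperparameter =",
      len(set(random_points[:, 0])), "and", len(set(random_points[:, 1])))
```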

**Examples of Usefulness:**
- Random search is effective when searching for optimal combinations of hyperparameters like learning rates, regularization strengths, and depths of decision trees.
- In neural networks, it can efficiently sample architectures, dropout rates, and batch sizes.

**When it is Not Useful:**
- **Interactions Between Hyperparameters:** Random search does not learn from earlier results, so it can under-sample narrow regions where hyperparameters interact strongly.
- **Fine-Tuning:** It does not concentrate samples around good configurations, so it is inefficient for narrowing down the search space once a general idea is obtained.

**Examples Where It Is Not Useful:**
- When strong interactions between multiple hyperparameters confine good configurations to a small region, random search may sample that region only sparsely.
- If a more focused search is needed after an initial exploration, random search might not be the best choice.


```python
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform, loguniform

svm_model = SVC()

param_dist = {
    'C': loguniform(1e-3, 1e3),   # Regularization parameter, sampled on a log scale
    'kernel': ['linear', 'rbf'],  # Kernel type
    # Kernel coefficient: uniform on [0.01, 1.01] (scipy's uniform takes loc, scale); ignored by the linear kernel
    'gamma': uniform(0.01, 1.0),
}

# Only n_iter=10 sampled combinations are evaluated, each with 5-fold cross-validation
random_search = RandomizedSearchCV(estimator=svm_model, param_distributions=param_dist, n_iter=10, cv=5)

random_search.fit(X_train, y_train)
```

```python
print("Best Hyperparameters:", random_search.best_params_)
```

```
Best Hyperparameters: {'C': 0.48443840681289646, 'gamma': 0.26461800657107326, 'kernel': 'linear'}
```


```python
accuracy = random_search.best_estimator_.score(X_test, y_test)
print("Test Accuracy:", accuracy)
```

```
Test Accuracy: 1.0
```
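The random-search cell samples C from `loguniform(1e-3, 1e3)` rather than a uniform distribution. This sketch compares how 10,000 draws from each distribution over the same range spread across the six decades from 0.001 to 1000: the log-uniform draws land roughly equally in each decade, while about 90% of the uniform draws fall between 100 and 1000, so small values of C would almost never be tried.

```python
import numpy as np
from scipy.stats import loguniform, uniform

n = 10_000
c_log = loguniform(1e-3, 1e3).rvs(size=n, random_state=0)
# scipy's uniform takes (loc, scale), so this covers [1e-3, 1e3]
c_lin = uniform(1e-3, 1e3 - 1e-3).rvs(size=n, random_state=0)

edges = 10.0 ** np.arange(-3, 4)  # decade boundaries 1e-3, 1e-2, ..., 1e3
log_frac = np.histogram(c_log, bins=edges)[0] / n
lin_frac = np.histogram(c_lin, bins=edges)[0] / n
print("loguniform fraction per decade:", np.round(log_frac, 3))
print("uniform fraction per decade   :", np.round(lin_frac, 3))
```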


**Bayesian Optimization:**

**Definition:**
- Bayesian optimization is a probabilistic model-based optimization technique.
- It models the objective function as a probabilistic surrogate, combining prior knowledge with observed results to make informed decisions about where to sample next in the hyperparameter space.

**Algorithms it is Used With:**
- Bayesian optimization is versatile and can be used with various machine learning algorithms, including:
  1. Support Vector Machines (SVM)
  2. Decision Trees
  3. Random Forest
  4. Gaussian Processes for regression
  5. Neural Networks
  6. Gradient Boosting algorithms (e.g., XGBoost)

**When it is Useful:**
- **Expensive Objective Functions:** Useful when evaluating the objective function is computationally expensive.
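- **Global Optimization:** Searches the whole hyperparameter space instead of only refining locally, although finding the global optimum is not guaranteed.
- **Fewer Evaluations:** Typically needs fewer evaluations than grid or random search to reach a comparable score, because each new trial is chosen using all previous results.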

**Examples of Usefulness:**
- Bayesian optimization is beneficial when optimizing hyperparameters like learning rates, regularization strengths, and kernel parameters.
- In scenarios where each evaluation of the objective function (e.g., model training) is time-consuming, such as optimizing hyperparameters of a deep neural network.

**When it is Not Useful:**
- **Simple Hyperparameter Spaces:** Might be overkill for small and simple hyperparameter spaces.
- **Low-Dimensional Spaces:** In low-dimensional spaces, simpler optimization methods like grid search or random search might suffice.

**Examples Where It Is Not Useful:**
- When dealing with a very simple model with only a couple of hyperparameters, Bayesian optimization might be too sophisticated.
- In scenarios where the objective function is not computationally expensive, simpler optimization methods may provide similar results more efficiently.
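The surrogate-and-acquisition loop described in the definition can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Optuna's algorithm: a hypothetical one-dimensional toy objective stands in for "validation score vs. one hyperparameter", the surrogate is scikit-learn's `GaussianProcessRegressor` with a Matern kernel, and the next point is the candidate with the highest expected improvement (EI).

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def objective(x):
    # Hypothetical toy objective; its maximum is 1.0 at x = 2.0
    return 1.0 - (x - 2.0) ** 2


def expected_improvement(candidates, gp, best_y, xi=0.01):
    # EI for maximization: expected amount by which each candidate beats best_y
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)  # avoid division by zero at observed points
    improvement = mu - best_y - xi
    z = improvement / sigma
    return improvement * norm.cdf(z) + sigma * norm.pdf(z)


rng = np.random.default_rng(0)
low, high = -5.0, 5.0
candidates = np.linspace(low, high, 1001).reshape(-1, 1)

# Start from a few random evaluations of the "expensive" objective
X_obs = rng.uniform(low, high, size=(3, 1))
y_obs = objective(X_obs).ravel()

for _ in range(15):
    # 1. Fit the probabilistic surrogate to everything observed so far
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6,
                                  normalize_y=True, random_state=0)
    gp.fit(X_obs, y_obs)
    # 2. Choose the candidate with the highest expected improvement
    x_next = candidates[np.argmax(expected_improvement(candidates, gp, y_obs.max()))]
    # 3. Evaluate the true objective there and add the result to the history
    X_obs = np.vstack([X_obs, x_next.reshape(1, 1)])
    y_obs = np.append(y_obs, objective(x_next))

best = np.argmax(y_obs)
print(f"best x = {X_obs[best, 0]:.3f}, best objective = {y_obs[best]:.3f}")
```

In a real tuning run, `objective` would train and cross-validate a model for each proposed hyperparameter value.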

```python
!pip install optuna
```



```python
import optuna
from sklearn.model_selection import cross_val_score

def objective(trial):
    # Define the hyperparameter search space (log scale for C and gamma);
    # suggest_float(..., log=True) replaces the deprecated suggest_loguniform
    C = trial.suggest_float('C', 1e-3, 10.0, log=True)
    gamma = trial.suggest_float('gamma', 1e-3, 10.0, log=True)  # ignored by the linear kernel
    kernel = trial.suggest_categorical('kernel', ['linear', 'rbf'])

    # Create SVM classifier with sampled hyperparameters
    clf = SVC(C=C, gamma=gamma, kernel=kernel, random_state=42)

    # Mean 5-fold cross-validation accuracy. This uses the full dataset (X, y),
    # so the score is a CV estimate, not an accuracy on the held-out X_test
    scores = cross_val_score(clf, X, y, cv=5, n_jobs=-1, scoring='accuracy')
    accuracy = np.mean(scores)

    return accuracy
```

```python
# Create a study object and specify the direction of optimization (maximize accuracy)
study = optuna.create_study(direction='maximize')

# Perform the optimization with a specified number of trials
study.optimize(objective, n_trials=100)
```



```python
print('Best trial:')
trial = study.best_trial
print('Mean CV accuracy: {:.4f}'.format(trial.value))
print('Hyperparameters: {}'.format(trial.params))
```

```
Best trial:
Mean CV accuracy: 0.9867
Hyperparameters: {'C': 4.351528425443894, 'gamma': 0.17207401288729915, 'kernel': 'rbf'}
```


**Gradient-Based Optimization:**

**Definition:**
- Gradient-based optimization involves using the gradient (partial derivatives) of the objective function with respect to the hyperparameters to guide the search for optimal values.
- Iteratively updates hyperparameters in the direction of steepest ascent or descent based on the gradient.

**Algorithms it is Used With:**
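- Gradient-based hyperparameter optimization requires a validation objective that is differentiable with respect to the hyperparameters. It builds on the same gradient machinery used to train models with differentiable objectives, such as:
  1. Neural Networks (trained with stochastic gradient descent; continuous hyperparameters such as learning rate or weight decay)
  2. Linear Regression (e.g., the L2 penalty of ridge regression)
  3. Logistic Regression (e.g., the regularization strength)
  4. Support Vector Machines (continuous hyperparameters such as C or the kernel width)
  5. Linear Discriminant Analysis (e.g., the shrinkage parameter)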

**When it is Useful:**
- **Differentiable Objective Functions:** Effective when the objective function is differentiable, allowing computation of gradients.
- **Smooth Surfaces:** Suitable for optimizing smooth, continuous objective functions.
- **Local Search:** Efficient for fine-tuning in the vicinity of promising solutions.

**Examples of Usefulness:**
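- Gradients are most familiar from training, e.g., updating the weights of a deep neural network to minimize the loss function. Gradient-based hyperparameter optimization applies the same idea one level up, e.g., adjusting a continuous learning rate or weight-decay strength by differentiating the validation loss through the training procedure.
- In ridge (L2-regularized linear) regression, the fitted coefficients depend on the regularization strength through a closed-form, differentiable expression, so that strength can be tuned by gradient descent on the validation error.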

**When it is Not Useful:**
- **Discontinuous or Nondifferentiable Functions:** Ineffective when dealing with functions that are not differentiable or have discontinuities.
- **Global Optimization:** May get stuck in local minima and struggle to find the global minimum in complex, non-convex spaces.

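**Examples Where It Is Not Useful:**
- For discrete or categorical hyperparameters (e.g., kernel type, number of layers, tree depth), the validation objective is not differentiable, so derivative-free methods such as grid search, random search, Bayesian optimization, genetic algorithms, or evolutionary strategies are used instead.
- For hyperparameter tuning in complex models like deep neural networks with non-convex loss surfaces, it might struggle to find the global optimum, and differentiating through a long training run is expensive in memory and compute.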
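The ridge-regression case can be sketched end to end. This is a minimal illustration under assumptions (hypothetical synthetic data, not the iris example; starting value and step size chosen by hand). The ridge solution is $w(\lambda) = (X^\top X + \lambda I)^{-1} X^\top y$, and differentiating $(X^\top X + \lambda I)\,w = X^\top y$ gives $\partial w / \partial \lambda = -(X^\top X + \lambda I)^{-1} w$. The chain rule then yields the gradient of the validation loss with respect to $\log \lambda$, which the code follows downhill.

```python
import numpy as np

# Hypothetical synthetic regression data (not the iris data used elsewhere)
rng = np.random.default_rng(0)
n_features = 20
true_w = rng.normal(size=n_features)
X = rng.normal(size=(200, n_features))
y = X @ true_w + rng.normal(scale=3.0, size=200)
X_tr, y_tr = X[:50], y[:50]      # training split: fits the coefficients
X_val, y_val = X[50:], y[50:]    # validation split: tunes the hyperparameter


def val_loss_and_hypergradient(log_lam):
    """Validation MSE of ridge regression and its derivative w.r.t. log(lambda)."""
    lam = np.exp(log_lam)
    A = X_tr.T @ X_tr + lam * np.eye(n_features)
    w = np.linalg.solve(A, X_tr.T @ y_tr)          # closed-form ridge fit w(lambda)
    residual = X_val @ w - y_val
    loss = np.mean(residual ** 2)
    dw_dlam = -np.linalg.solve(A, w)               # dw/dlambda = -A^{-1} w
    dloss_dlam = 2.0 * residual @ (X_val @ dw_dlam) / len(y_val)
    return loss, dloss_dlam * lam                  # chain rule: d/dlog(lambda) = lambda * d/dlambda


log_lam, step = 0.0, 0.1  # start at lambda = 1; the step size is an assumption
initial_loss, _ = val_loss_and_hypergradient(log_lam)
for _ in range(300):
    _, grad = val_loss_and_hypergradient(log_lam)
    log_lam -= step * grad                         # gradient descent on the hyperparameter

final_loss, _ = val_loss_and_hypergradient(log_lam)
print(f"lambda: 1.0 -> {np.exp(log_lam):.3g}, validation MSE: {initial_loss:.3f} -> {final_loss:.3f}")
```

Optimizing $\log \lambda$ instead of $\lambda$ keeps the regularization strength positive and makes equal steps correspond to equal multiplicative changes.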


```python
# Standardize features for the neural network (the scaler is fit on the training split only)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```

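```python
# Adam updates the network's weights using gradients of the training loss.
# The hyperparameters here (10 hidden units, 50 epochs, Adam's default learning rate) stay fixed.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

model.fit(X_train, y_train, epochs=50, verbose=1)
```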

```python
y_pred = model.predict(X_test)
y_pred_classes = tf.argmax(y_pred, axis=1).numpy()

accuracy = accuracy_score(y_test, y_pred_classes)
print("Test Accuracy:", accuracy)
```

**Sequential Model-Based Optimization (SMBO):**

**Definition:**
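- Sequential Model-Based Optimization (SMBO) is an optimization technique that combines probabilistic surrogate models with acquisition functions to optimize the objective function one evaluation at a time.
- It iteratively fits a surrogate model to the observed (hyperparameters, score) pairs and uses an acquisition function on that surrogate to propose the next set of hyperparameters to evaluate.
- Bayesian optimization with a Gaussian-process surrogate and the Tree-structured Parzen Estimator (TPE), Optuna's default sampler, are both instances of SMBO.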

**Algorithms it is Used With:**
- SMBO can be used with various machine learning algorithms, including:
  1. Support Vector Machines (SVM)
  2. Decision Trees
  3. Random Forest
  4. Neural Networks
  5. Gradient Boosting algorithms (e.g., XGBoost)

**When it is Useful:**
- **Expensive Objective Functions:** Effective when evaluating the objective function is computationally expensive.
- **Global Optimization:** Designed to search the whole hyperparameter space rather than only refine locally, although it does not guarantee finding the global optimum.
- **Adaptation:** Adapts to the characteristics of the optimization landscape over iterations.

**Examples of Usefulness:**
- In optimizing hyperparameters like learning rates, regularization strengths, and depths of decision trees.
- In scenarios where each evaluation of the objective function, such as training a complex model, is time-consuming.

**When it is Not Useful:**
- **Simple Hyperparameter Spaces:** Might be overkill for small and simple hyperparameter spaces.
- **Low-Dimensional Spaces:** In low-dimensional spaces, simpler optimization methods like grid search or random search might suffice.

**Examples Where It Is Not Useful:**
- When dealing with a very simple model with only a couple of hyperparameters, SMBO might be too sophisticated.
- In scenarios where the objective function is not computationally expensive, simpler optimization methods may provide similar results more efficiently.
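The generic SMBO loop (fit surrogate, maximize acquisition, evaluate, repeat) does not depend on a Gaussian process. This minimal sketch assumes a random-forest surrogate (the choice made by SMAC) and, for brevity, a simple upper-confidence-bound acquisition built from the spread of the per-tree predictions. The two-dimensional objective over (log10 C, log10 gamma) is a hypothetical stand-in for an expensive cross-validation score, and `kappa = 1.0` is an arbitrary choice.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def objective(params):
    # Hypothetical stand-in for an expensive validation score over (log10 C, log10 gamma);
    # its maximum, 0.0, is at (1.0, -1.0)
    log_c, log_gamma = params
    return -((log_c - 1.0) ** 2 + (log_gamma + 1.0) ** 2)


rng = np.random.default_rng(0)
low, high = np.array([-3.0, -3.0]), np.array([3.0, 3.0])
kappa = 1.0  # exploration weight of the acquisition function (an assumption)

# Initial design: a few random configurations evaluated on the true objective
X_obs = rng.uniform(low, high, size=(5, 2))
y_obs = np.array([objective(x) for x in X_obs])

for _ in range(30):
    # 1. Fit the surrogate model to all (configuration, score) pairs seen so far
    forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_obs, y_obs)
    # 2. Score random candidates with the acquisition function:
    #    mean + kappa * spread of the per-tree predictions
    candidates = rng.uniform(low, high, size=(500, 2))
    per_tree = np.stack([tree.predict(candidates) for tree in forest.estimators_])
    acquisition = per_tree.mean(axis=0) + kappa * per_tree.std(axis=0)
    # 3. Evaluate the most promising candidate and add it to the history
    x_next = candidates[np.argmax(acquisition)]
    X_obs = np.vstack([X_obs, x_next])
    y_obs = np.append(y_obs, objective(x_next))

best = np.argmax(y_obs)
print("best (log10 C, log10 gamma):", np.round(X_obs[best], 2), "score:", round(y_obs[best], 3))
```

Swapping the forest for a Gaussian process, or the acquisition for expected improvement, keeps the same sequential structure; only the surrogate and the scoring of candidates change.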

**Additional info:**
1. https://aws.amazon.com/what-is/hyperparameter-tuning/