# Exercise 2: Keras Tuner - Bayesian Optimization on Fashion-MNIST

Explore systematic hyperparameter optimization using Keras Tuner. The principle is quite
similar to the implementation we did by hand; however, Keras Tuner also supports search
algorithms other than random search. Documentation can be found at: https://keras.io/keras_tuner/.



Tasks:
1. Choose and explain one optimization strategy:
    - Hyperband Optimization, or
    - Bayesian Optimization

    Briefly describe how the chosen method works (search for an appropriate reference paper or other academic resource!
    Hint: look at what keras-tuner cites) and compare it to random search.

2. Implement the search:
    - Use Keras Tuner with your chosen strategy on the Fashion MNIST dataset.
    - Build a small comparison experiment with random search from exercise 1 (e.g. convergence speed).
    - Is such an approach inherently better than random search with additional manual tuning? Reason in one sentence.


Deliverables: A notebook or script+markdown demonstrating your implementation with clear
explanations and a summary of your findings. Note any use of an LLM in detail please.


--------------------

## How Bayesian Optimization Works

Bayesian Optimization is an iterative method for hyperparameter tuning that models the validation performance as a function of the hyperparameters with a probabilistic surrogate model, commonly a Gaussian Process. After each trial, the surrogate is updated with the observed result. An acquisition function then proposes the next hyperparameter setting by trading off exploration of uncertain regions against exploitation of regions the surrogate predicts to perform well. In contrast to random search, which samples every configuration independently of earlier results, Bayesian Optimization uses the information from previous evaluations to guide future trials, which often makes it more sample-efficient [1].

References:
[1] Garnett, R. (2015). "Bayesian Optimization." Lecture notes, CSE 515T: Bayesian Methods in Machine Learning, Washington University in St. Louis. Available at: https://www.cse.wustl.edu/~garnett/cse515t/spring_2015/files/lecture_notes/12.pdf
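
To make the surrogate/acquisition loop described above concrete, here is a minimal sketch on a one-dimensional toy problem. It is not Keras Tuner's internal implementation: the objective `toy_objective`, the RBF kernel with a fixed length scale, the upper-confidence-bound (UCB) acquisition with weight 2.0, and the three hand-picked starting points are all assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.15):
    """Squared-exponential kernel between two 1-D arrays of points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_query."""
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_oq = rbf_kernel(x_obs, x_query)
    solved = np.linalg.solve(k_oo, k_oq)        # K^-1 k(x_obs, x_query)
    mean = solved.T @ y_obs
    var = 1.0 - np.sum(k_oq * solved, axis=0)   # prior variance is 1 for this kernel
    return mean, np.sqrt(np.clip(var, 0.0, None))

def toy_objective(x):
    """Hypothetical stand-in for validation accuracy; peaks at x = 0.7."""
    return np.exp(-((x - 0.7) ** 2) / 0.02)

# Candidate "hyperparameter" values and a few initial evaluations (assumed).
candidates = np.linspace(0.0, 1.0, 201)
x_obs = np.array([0.05, 0.35, 0.65])
y_obs = toy_objective(x_obs)

for _ in range(12):
    mean, std = gp_posterior(x_obs, y_obs, candidates)
    ucb = mean + 2.0 * std                      # exploitation + exploration
    x_next = candidates[np.argmax(ucb)]         # most promising candidate
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, toy_objective(x_next))

best_x = x_obs[np.argmax(y_obs)]
print(f"best x found: {best_x:.3f}, objective: {y_obs.max():.3f}")
```

Random search would replace the `argmax` over `ucb` with a uniform draw from `candidates`, ignoring `x_obs` and `y_obs` entirely; that is the difference in information use the paragraph above refers to.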



In [1]:
# Setup: import dependencies
import tensorflow as tf
import numpy as np
import random
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split




In [2]:
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
tf.keras.utils.set_random_seed(SEED)

plt.rcParams["figure.figsize"] = (6, 4)
plt.rcParams["axes.grid"] = True

In [3]:
# Load the Fashion-MNIST dataset, scale pixels to [0, 1], and add a channel axis

from tensorflow import keras

(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()

X_train_full = (X_train_full.astype("float32") / 255.0)[..., None]
X_test = (X_test.astype("float32") / 255.0)[..., None]

X_tr, X_val, y_tr, y_val = train_test_split(
    X_train_full, y_train_full, test_size=0.2, stratify=y_train_full, random_state=SEED
)

X_tr.shape, X_val.shape, X_test.shape

((48000, 28, 28, 1), (12000, 28, 28, 1), (10000, 28, 28, 1))

In [4]:
from tensorflow.keras import layers

def build_model(hp):
    model = keras.Sequential()
    model.add(layers.Input(shape=(28, 28, 1)))
    model.add(layers.Flatten())

    # Tune number of units
    units = hp.Int("units", min_value=64, max_value=512, step=64)
    model.add(layers.Dense(units, activation="relu"))

    # Tune dropout
    dropout = hp.Float("dropout", min_value=0.0, max_value=0.5, step=0.1)
    model.add(layers.Dropout(dropout))

    model.add(layers.Dense(10, activation="softmax"))

    # Tune learning rate
    lr = hp.Float("learning_rate", 1e-4, 1e-2, sampling="log")

    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=lr),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"]
    )

    return model


In [5]:
import keras_tuner
tuner = keras_tuner.BayesianOptimization(
    build_model,
    objective="val_accuracy",
    max_trials=20,
    directory="kt_dir",
    project_name="fashion_mnist_bo",
    seed=SEED
)


In [6]:
tuner.search(
    X_tr, y_tr,
    epochs=10,
    validation_data=(X_val, y_val),
    batch_size=128,
    verbose=1
)



Trial 20 Complete [00h 00m 12s]
val_accuracy: 0.8895000219345093

Best val_accuracy So Far: 0.8913333415985107
Total elapsed time: 00h 03m 24s


In [7]:
random_tuner = keras_tuner.RandomSearch(
    build_model,
    objective="val_accuracy",
    max_trials=20,
    directory="kt_dir",
    project_name="random_search",
    seed=SEED
)

random_tuner.search(
    X_tr, y_tr,
    epochs=10,
    validation_data=(X_val, y_val),
    batch_size=128,
    verbose=1
)

Trial 20 Complete [00h 00m 10s]
val_accuracy: 0.8838333487510681

Best val_accuracy So Far: 0.8927500247955322
Total elapsed time: 00h 03m 14s


In [8]:
best_model_bo = tuner.get_best_models()[0]

print("EVAL of Bayesian Optimization best model:")
test_loss, test_acc = best_model_bo.evaluate(X_test, y_test, verbose=0)
print("Test accuracy:", test_acc)
print("Test loss:", test_loss)

print("\nEVAL of Random Search best model:")
best_model_rs = random_tuner.get_best_models()[0]
test_loss, test_acc = best_model_rs.evaluate(X_test, y_test, verbose=0)
print("Test accuracy:", test_acc)
print("Test loss:", test_loss)





EVAL of Bayesian Optimization best model:
Test accuracy: 0.8795999884605408
Test loss: 0.33820289373397827

EVAL of Random Search best model:
Test accuracy: 0.8787999749183655
Test loss: 0.3380807340145111


In [9]:
# Compare the best hyperparameters found by each tuner

random_best = random_tuner.get_best_hyperparameters(1)[0]
bayes_best = tuner.get_best_hyperparameters(1)[0]

print("Best hyperparameters from Random Search:\n", random_best.values)
print("Best hyperparameters from Bayesian Optimization:\n", bayes_best.values)

Best hyperparameters from Random Search:
 {'units': 256, 'dropout': 0.2, 'learning_rate': 0.0006752863927347823}
Best hyperparameters from Bayesian Optimization:
 {'units': 512, 'dropout': 0.2, 'learning_rate': 0.000686485764629306}


**Is Bayesian Optimization inherently better than random search? (one sentence)**   
Bayesian Optimization is not inherently better than random search with additional manual tuning, but it can reach competitive hyperparameter configurations more efficiently by exploiting information from previous trials.

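Summary: Both searches reached very similar results. Random Search had the slightly higher best validation accuracy (0.8928 vs. 0.8913), while the best Bayesian Optimization model was marginally ahead on the test set (0.8796 vs. 0.8788). Random Search was also a bit faster, taking about 3:14 min versus 3:24 min for Bayesian Optimization.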
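
As a rough check of how similar the two test accuracies reported above are, the following sketch computes the binomial standard error of each accuracy on the 10,000 test images. Treating the two estimates as independent is a simplifying assumption: both models were evaluated on the same test set, so a paired test would be more appropriate for a rigorous comparison.

```python
import math

n_test = 10_000   # size of the Fashion-MNIST test set used above
acc_bo = 0.8796   # test accuracy, Bayesian Optimization best model (rounded)
acc_rs = 0.8788   # test accuracy, Random Search best model (rounded)

def binomial_se(acc, n):
    """Standard error of an accuracy estimated from n independent samples."""
    return math.sqrt(acc * (1.0 - acc) / n)

se_bo = binomial_se(acc_bo, n_test)
se_rs = binomial_se(acc_rs, n_test)

# Standard error of the difference, assuming independent estimates.
se_diff = math.sqrt(se_bo**2 + se_rs**2)
diff = acc_bo - acc_rs

print(f"difference: {diff:.4f}, standard error of difference: {se_diff:.4f}")
```

Under these assumptions the gap between the two models is well below one standard error, which supports reading the two results as practically equivalent.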

**LLM usage:**

- Used the Keras documentation for the coding part, together with Copilot autocompletion.
- For the evaluation print statements I used ChatGPT, because I was lazy.
- For the interpretations and reasoning I used ChatGPT to rephrase my own text with better wording.