## Part 1h: TensorFlow Keras - Callbacks and TensorBoard

**Description:**

This Colab demonstrates how Keras callbacks can monitor and manage the training of a neural network. We use the `ModelCheckpoint` callback to save the best model seen during training, the `EarlyStopping` callback to halt training once validation loss stops improving (which limits overfitting), and the `TensorBoard` callback to log training metrics, the model graph, and weight histograms for the TensorBoard web interface.

We train a simple neural network on the scikit-learn `digits` dataset and attach all three callbacks to the training run.

In [2]:
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, TensorBoard
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
import datetime
import os

# Load the digits dataset
digits = load_digits()
X, y = digits.data, digits.target

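# Scale the data
# NOTE: fitting the scaler on all of X before splitting lets test-set statistics
# leak into training; fit on X_train only for a stricter evaluation.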
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Define the model
model = models.Sequential([
    layers.Input(shape=(X_train.shape[1],)),  # explicit Input layer avoids the Keras input_shape warning
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Define the callbacks
# 1. ModelCheckpoint: Save the model with the best validation accuracy
checkpoint_filepath = 'model_checkpoint/best_model.keras'  # full-model checkpoint in the native .keras format
model_checkpoint_callback = ModelCheckpoint(
    filepath=checkpoint_filepath,
    save_best_only=True,
    monitor='val_accuracy',
    mode='max',
    verbose=1
)

# 2. EarlyStopping: Stop training when validation loss doesn't improve
early_stopping_callback = EarlyStopping(
    monitor='val_loss',
    patience=10,
    restore_best_weights=True,
    verbose=1
)

# 3. TensorBoard: Log metrics and more for visualization
log_dir = os.path.join("logs", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = TensorBoard(log_dir=log_dir, histogram_freq=1)

# Train the model with the callbacks
history = model.fit(
    X_train, y_train,
    epochs=100,
    validation_data=(X_test, y_test),  # NOTE: the test set doubles as the validation set here
    callbacks=[model_checkpoint_callback, early_stopping_callback, tensorboard_callback],
    verbose=0  # Reduce verbosity during training
)

# Load the best model saved by ModelCheckpoint
best_model = tf.keras.models.load_model(checkpoint_filepath)

# Evaluate the best model on the test set
loss, accuracy = best_model.evaluate(X_test, y_test, verbose=0)
print(f"\nBest Model - Test Accuracy: {accuracy:.4f}")

# Instructions on how to view TensorBoard logs in Colab
print("\nInstructions to view TensorBoard:")
print("1. Run `%load_ext tensorboard` in a new code cell.")
print(f"2. Run `%tensorboard --logdir {log_dir}` to open the dashboard inline.")

Epoch 1: val_accuracy improved from -inf to 0.73333, saving model to model_checkpoint/best_model.keras

Epoch 2: val_accuracy improved from 0.73333 to 0.87222, saving model to model_checkpoint/best_model.keras

Epoch 3: val_accuracy improved from 0.87222 to 0.91944, saving model to model_checkpoint/best_model.keras

Epoch 4: val_accuracy improved from 0.91944 to 0.93056, saving model to model_checkpoint/best_model.keras

Epoch 5: val_accuracy improved from 0.93056 to 0.95278, saving model to model_checkpoint/best_model.keras

Epoch 6: val_accuracy improved from 0.95278 to 0.95833, saving model to model_checkpoint/best_model.keras

Epoch 7: val_accuracy improved from 0.95833 to 0.96111, saving model to model_checkpoint/best_model.keras

Epoch 8: val_accuracy did not improve from 0.96111

Epoch 9: val_accuracy improved from 0.96111 to 0.96389, saving model to model_checkpoint/best_model.keras

Epoch 10: val_accuracy did not improve from 0.96389

Epoch 11: val_accuracy did not improve fr

## Results for Part 1h: TensorFlow Keras - Callbacks and TensorBoard

In this experiment, we demonstrated the use of three key Keras callbacks during the training of a simple neural network on the `digits` dataset: `ModelCheckpoint`, `EarlyStopping`, and `TensorBoard`.

* **`ModelCheckpoint`:** This callback was configured to save the full model to `model_checkpoint/best_model.keras` whenever the validation accuracy (`val_accuracy`) improved. The training output shows the model being saved repeatedly as validation accuracy rose, reaching a peak of 0.97778.

* **`EarlyStopping`:** This callback monitored the validation loss (`val_loss`) and was set to stop training after 10 consecutive epochs without improvement (`patience=10`). In this run, training stopped after 54 epochs, and the callback restored the weights from epoch 44, the epoch with the lowest validation loss, to the in-memory `model`.

* **`TensorBoard`:** This callback logged various metrics (like loss and accuracy), the model graph, and histograms of weights and biases to the `logs/20250411-010705` directory. These logs can be visualized using the TensorBoard web interface to gain deeper insights into the training process.
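
The relationship between `patience=10`, the best epoch (44), and the stopping epoch (54) can be illustrated with a small plain-Python sketch. This is not Keras's implementation; the function name `early_stopping_epochs` and the validation-loss curve are invented for illustration, and the curve is synthetic rather than taken from this run.

```python
def early_stopping_epochs(val_losses, patience):
    """Return (stopped_epoch, best_epoch), both 1-indexed.

    stopped_epoch is None if the patience budget is never exhausted.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


# Synthetic curve (hypothetical): improves through epoch 44, then stays worse.
synthetic_val_loss = [1.0 / e for e in range(1, 45)] + [0.05] * 56

stopped, best = early_stopping_epochs(synthetic_val_loss, patience=10)
print(stopped, best)  # 54 44
```

With the best loss at epoch 44, ten further epochs without improvement exhaust the patience budget at epoch 54, which matches the pattern reported for this run.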

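The test accuracy of the best model, loaded from the file written by `ModelCheckpoint` (not the weights restored by `EarlyStopping`), is shown below. Because the test set also served as the validation set, this figure matches the peak `val_accuracy` and is an optimistic estimate of performance on unseen data.

* **Best Model - Test Accuracy:** 0.9778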

**Analysis:**

Each callback played a distinct role in the training process (no callback-free baseline was trained, so the benefits below are qualitative):

* `ModelCheckpoint` ensured that we had a saved version of the model that performed best on the validation set, protecting against potential overfitting in later epochs.
* `EarlyStopping` automatically stopped the training when further epochs were unlikely to yield better generalization, saving computational resources and limiting overfitting. Restoring the best weights leaves the in-memory `model` at the epoch with the lowest validation loss rather than at the final epoch.
* `TensorBoard` provides a powerful tool for visualizing and debugging the training process. By launching TensorBoard with the specified log directory, one can observe the trends in loss and accuracy, examine the distribution of weights and biases, and understand the model's architecture.
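
The `save_best_only=True`, `mode='max'` behaviour of `ModelCheckpoint` described above can be sketched in plain Python. This is an illustrative approximation, not the Keras source; the helper name `epochs_that_save` is made up, and the values for epochs 8 and 10 are hypothetical placeholders because the log only reports that those epochs did not improve.

```python
def epochs_that_save(val_accuracies):
    """Return the 1-indexed epochs at which a best-only checkpoint would be written."""
    best = float("-inf")  # matches the "-inf" starting point shown in the log
    saved = []
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best:  # mode='max': only a strictly higher value counts
            best = acc
            saved.append(epoch)
    return saved


val_accuracy_log = [
    0.73333, 0.87222, 0.91944, 0.93056, 0.95278, 0.95833, 0.96111,
    0.96000,  # epoch 8: hypothetical value (log only says "did not improve")
    0.96389,
    0.96200,  # epoch 10: hypothetical value (log only says "did not improve")
]

print(epochs_that_save(val_accuracy_log))  # [1, 2, 3, 4, 5, 6, 7, 9]
```

Each save overwrites the same file, so only the most recent best model remains on disk.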

**A/B Test (Conceptual):**

An A/B test here could involve comparing the performance of a model trained with these callbacks to one trained for a fixed number of epochs without them. The model with callbacks would likely show better or comparable performance with potentially fewer training epochs and a saved "best" version. Another A/B test could compare different `patience` values for `EarlyStopping` to see how it affects the training duration and final model performance.
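
To make the `patience` comparison concrete, the sketch below replays one validation-loss curve under several patience values. The curve is synthetic (invented to include a temporary plateau), and `run_with_patience` is a hypothetical helper that only approximates the stopping rule; a real A/B test would retrain the Keras model for each setting.

```python
def run_with_patience(val_losses, patience):
    """Return (last_epoch_trained, best_loss) for a given patience value."""
    best = float("inf")
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best
    return len(val_losses), best


# Synthetic curve (hypothetical): a plateau at epochs 4-6, then further gains.
curve = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.5, 0.45,
         0.46, 0.47, 0.48, 0.49, 0.50, 0.51]

results = {p: run_with_patience(curve, p) for p in (2, 5)}
print(results)  # {2: (5, 0.6), 5: (13, 0.45)}
```

In this made-up case, the smaller patience stops during the plateau and misses the later improvement, while the larger one trains longer and reaches a lower loss. Whether that trade-off holds on the `digits` model would need to be measured.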

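To explore further, load the notebook extension with `%load_ext tensorboard` and run `%tensorboard --logdir logs/20250411-010705` in a new Colab cell to display the dashboard inline. The shell form `!tensorboard --logdir logs/20250411-010705`, used in the next cell, starts a server on the runtime's localhost and blocks the cell until interrupted; on a hosted runtime that address is typically not reachable from the browser.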

Ready to move on to **Part 1i: Using Keras Tuner**?

In [5]:
!tensorboard --logdir logs/20250411-010705

2025-04-11 01:09:39.373325: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1744333779.418056    7151 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1744333779.429485    7151 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-04-11 01:09:44.708618: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:152] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)

NOTE: Using experimental fast data loading logic. To disable, pass
    "--load_fast=false" and report issues on GitHub. More details:
    https://github.com/tensorflow/tensorboard/issues/4784

Serving TensorBoard on localhost; to expose to the network, u