
**Hyperparameter Tuning with Keras Tuner: Optimizing Deep Learning Models**

Hyperparameter tuning is a critical step in developing high-performing deep learning models. **Keras Tuner** simplifies this process by automating the search for optimal hyperparameters. In this work, we demonstrate the power of **Keras Tuner** across three different domains: **Computer Vision (MNIST Digit Classification), Regression (California Housing Prices), and Natural Language Processing (20 Newsgroups Text Classification).**

### 1️⃣ Computer Vision: MNIST Digit Classification
For the **MNIST handwritten digit classification task**, we build a fully connected neural network on the flattened 28×28 images and tune:
- Number of hidden layers (1 to 3)
- Units per hidden layer (32 to 256, in steps of 32)
- Learning rate (0.001, 0.0005, or 0.0001)

Using **Random Search** with 10 trials, we sample configurations from this space and keep the one with the highest validation accuracy.

### 2️⃣ Regression: California Housing Prices
For the **California Housing dataset**, we implement a **Fully Connected Neural Network (FCNN)** to predict housing prices based on input features. We optimize:
- Number of hidden layers (1 to 3)
- Number of neurons in each hidden layer (16 to 128, in steps of 16)
- Learning rate

With **Bayesian Optimization**, each new trial is chosen using the results of earlier ones, and we select the configuration with the lowest validation mean absolute error (MAE).

### 3️⃣ NLP: 20 Newsgroups Text Classification
For the **20 Newsgroups text classification task**, we use two of the newsgroup categories (`rec.sport.baseball` and `sci.space`) and train a **Bidirectional LSTM** classifier, optimizing key hyperparameters such as:
- Embedding dimension
- Number of stacked Bidirectional LSTM layers
- Number of units per LSTM layer
- Learning rate

Using **Hyperband**, we train many configurations for a few epochs and spend the full training budget only on the most promising ones.

### Conclusion
Keras Tuner provides a powerful and efficient way to optimize hyperparameters across various domains. Whether for **computer vision, regression, or text classification**, leveraging **Random Search, Bayesian Optimization, or Hyperband** enables us to systematically improve model performance. The flexibility and ease of integration make **Keras Tuner** an essential tool for deep learning practitioners.

1. **Imports**:
   - `tensorflow`: The main library for building and training neural networks.
   - `keras`: A high-level API within TensorFlow for building neural networks.
   - `mnist`: A dataset of handwritten digits.
   - `Dense`, `Flatten`: Layers used in building the neural network.
   - `keras_tuner`: A library for hyperparameter tuning.


In [1]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Dense, Flatten
import keras_tuner as kt

2. **Loading Data**:
This line loads the MNIST dataset, splitting it into training and testing sets.

In [2]:
# Load MNIST Data
(X_train, y_train), (X_test, y_test) = mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
[1m11490434/11490434[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 0us/step


3. **Normalization**:
   
   Here, we rescale the pixel values from the range 0 to 255 to the range 0 to 1, which helps training converge faster.

In [3]:
# Normalize the data
X_train, X_test = X_train / 255.0, X_test / 255.0
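
The effect of this division can be checked on a small stand-in array. This is a minimal sketch, assuming random `uint8` data in place of the real MNIST images; the names `images`, `scaled`, and `scaled32` are illustrative and do not appear in the notebook.

```python
import numpy as np

# Stand-in for a few MNIST images: uint8 values in [0, 255]
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Same operation as in the notebook: the result is float64 in [0, 1]
scaled = images / 255.0
print(scaled.dtype, scaled.min(), scaled.max())

# Optional variant: cast to float32 first to halve the memory footprint
scaled32 = images.astype("float32") / 255.0
print(scaled32.dtype)
```

Dividing a `uint8` array by a Python float promotes it to `float64`. The `float32` variant is a common choice when memory matters, but it is not what the notebook does.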


4. **Model Building Function**:
   
   This function constructs the neural network model. The `hp` parameter allows tuning of hyperparameters.

5. **Adding Layers**:

   This loop adds a tunable number of hidden layers (between 1 and 3). The width of each layer is tuned separately, from 32 to 256 units in steps of 32.

6. **Output Layer**:
  
   The output layer has 10 units (one for each digit) with a softmax activation function to produce probability distributions.

7. **Model Compilation**:
   
   The model is compiled with the Adam optimizer, whose base learning rate is chosen from 0.001, 0.0005, and 0.0001. Adam then adapts the effective step size of each parameter during training.

In [4]:
# Function to build a model with tunable parameters
def build_model(hp):
    model = keras.Sequential()
    model.add(Flatten(input_shape=(28, 28)))

    # Tune the number of hidden layers and units in each layer
    for i in range(hp.Int('num_layers', 1, 3)):
        model.add(Dense(units=hp.Int(f'units_{i}', min_value=32, max_value=256, step=32), activation='relu'))

    # Output layer
    model.add(Dense(10, activation='softmax'))

    # Compile model with tunable learning rate
    model.compile(optimizer=keras.optimizers.Adam(hp.Choice('learning_rate', [0.001, 0.0005, 0.0001])),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

8. **Hyperparameter Tuning**:
   
   This sets up a Keras Tuner `RandomSearch` that samples up to 10 hyperparameter combinations (`max_trials=10`) and ranks them by validation accuracy.

9. **Executing the Search**:
  
   This command runs the hyperparameter tuning process. Each trial trains a fresh model for 10 epochs, holding out 20% of the training data for validation.

In [5]:
# Use Keras Tuner to find the best hyperparameters
tuner = kt.RandomSearch(build_model,
                        objective='val_accuracy',
                        max_trials=10,  # Number of trials
                        executions_per_trial=1,
                        directory='mnist_tuning',
                        project_name='mnist')

# Execute the search
tuner.search(X_train, y_train, epochs=10, validation_split=0.2)

Trial 10 Complete [00h 00m 30s]
val_accuracy: 0.9789166450500488

Best val_accuracy So Far: 0.9789166450500488
Total elapsed time: 00h 05m 06s
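
To put the 10 trials in perspective, the size of the search space defined in `build_model` can be counted directly. This is a rough sketch that counts distinct effective architectures (layer count, per-layer widths, learning rate); the variable names are illustrative.

```python
# Count the distinct configurations in the MNIST search space defined above.
unit_choices = list(range(32, 256 + 1, 32))   # 32, 64, ..., 256
learning_rates = [0.001, 0.0005, 0.0001]
layer_counts = [1, 2, 3]

# With n hidden layers there are len(unit_choices) ** n width combinations,
# each of which can be paired with any of the learning rates.
total = sum(len(unit_choices) ** n * len(learning_rates) for n in layer_counts)

max_trials = 10
coverage = max_trials / total
print(f"{total} configurations; {max_trials} trials cover {coverage:.2%} of them")
```

Random search does not need to cover the whole space to find a good configuration, but a count like this helps when deciding how far to raise `max_trials`.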


10. **Retrieving Best Hyperparameters**:
    
    Finally, this line retrieves the best hyperparameters found during the tuning process and prints them out. Note that the printed unit count is `units_0`, the width of the first hidden layer only.

In [6]:
# Get the best model hyperparameters
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
print(f"Best number of layers: {best_hps.get('num_layers')}, Best units per layer: {best_hps.get('units_0')}, Learning rate: {best_hps.get('learning_rate')}")

Best number of layers: 2, Best units per layer: 128, Learning rate: 0.0005
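
Because the print statement reports only `units_0`, a small helper can list the width of every active layer instead. The sketch below works on a plain dictionary such as the one exposed by `best_hps.values`. The helper name `summarize_best` is hypothetical, and the `units_1` and `units_2` entries in the example are invented for illustration; only `num_layers`, `units_0`, and `learning_rate` come from the output above.

```python
def summarize_best(values: dict) -> str:
    """Format the layer count, the width of each active layer, and the learning rate."""
    n_layers = values["num_layers"]
    # Only the first n_layers width entries are used by the best model;
    # later entries may exist from other trials and are ignored here.
    units = [values[f"units_{i}"] for i in range(n_layers)]
    return f"layers={n_layers}, units={units}, learning_rate={values['learning_rate']}"


# Hypothetical values for illustration; in the notebook, pass best_hps.values
example_values = {
    "num_layers": 2,
    "units_0": 128,
    "units_1": 64,    # invented for this example
    "units_2": 32,    # invented; inactive because num_layers is 2
    "learning_rate": 0.0005,
}
print(summarize_best(example_values))
```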


### Optimization Tips
- **Increase `max_trials`**: To explore more combinations of hyperparameters.
- **Adjust `epochs`**: Increase the number of epochs for better training, especially if the model is underfitting.
- **Early Stopping**: Implement early stopping to prevent overfitting.
- **Learning Rate Scheduler**: Consider using a learning rate scheduler for better training dynamics.

This code effectively uses Keras Tuner to optimize a simple neural network for the MNIST dataset by searching through various architectures and hyperparameters.
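
The early stopping tip can be sketched with the standard Keras callback. The values `patience=3` and `restore_best_weights=True` are illustrative choices rather than settings used in this notebook, and the `tuner.search` call is shown only as a comment because it needs the tuner and data defined earlier.

```python
import tensorflow as tf

# Stop a trial once val_loss has not improved for 3 consecutive epochs,
# and roll the model back to the weights from its best epoch.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=3,
    restore_best_weights=True,
)

# tuner.search passes extra keyword arguments on to model.fit, so the
# callback would be supplied like this (epochs=30 is an arbitrary example):
# tuner.search(X_train, y_train, epochs=30, validation_split=0.2,
#              callbacks=[early_stop])
```

With early stopping in place, `epochs` acts as an upper bound, so it can be raised without every trial paying the full cost.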

In [7]:
import tensorflow as tf
import keras_tuner as kt
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

In [8]:
# Load California housing prices dataset
data = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

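**Normalize the data**

Each feature is standardized to a mean of 0 and a standard deviation of 1, which is important for training neural networks effectively. The scaler is fit on the training set only and then applied to the test set, so no test-set statistics leak into training.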

In [9]:
# Normalize the data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
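
The arithmetic behind `fit_transform` and `transform` can be written out with NumPy. This is a sketch on synthetic data, assuming the default `StandardScaler` behaviour of using the population standard deviation; the array names are illustrative and chosen so they do not clash with the notebook's variables.

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))   # synthetic training features
test = rng.normal(loc=5.0, scale=2.0, size=(20, 3))     # synthetic test features

# "fit": statistics come from the training set only
mean = train.mean(axis=0)
std = train.std(axis=0)          # ddof=0, i.e. population standard deviation

# "transform": the same statistics are applied to both sets
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0).round(6), train_scaled.std(axis=0).round(6))
print(test_scaled.mean(axis=0).round(3))   # close to 0, but not exactly 0
```

The test set ends up only approximately centred, which is expected: it is scaled with the training statistics, not its own.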


In [10]:
# Function to build the model
def build_model(hp):
    model = tf.keras.Sequential()
    
    # Tune the number of hidden layers and the number of units in each layer
    for i in range(hp.Int('num_layers', 1, 3)):
        model.add(tf.keras.layers.Dense(units=hp.Int(f'units_{i}', 16, 128, step=16), activation='relu'))
    
    model.add(tf.keras.layers.Dense(1))  # Output layer

    # Tune the learning rate
    model.compile(optimizer=tf.keras.optimizers.Adam(hp.Choice('learning_rate', [0.001, 0.0005, 0.0001])),
                  loss='mse',
                  metrics=['mae'])
    return model


In [11]:
# Create Keras Tuner for finding the best model
tuner = kt.BayesianOptimization(build_model,
                                objective='val_mae',
                                max_trials=10,
                                directory='housing_tuning',
                                project_name='housing')

# Execute the search
tuner.search(X_train, y_train, epochs=10, validation_split=0.2)

Trial 10 Complete [00h 00m 13s]
val_mae: 0.4646338224411011

Best val_mae So Far: 0.38897740840911865
Total elapsed time: 00h 01m 52s
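
The MAE is easier to interpret in dollars. In scikit-learn's California Housing dataset the target is the median house value in units of $100,000, so the best validation MAE above converts as follows (the variable names are illustrative):

```python
best_val_mae = 0.38897740840911865   # "Best val_mae So Far" from the search output
TARGET_UNIT_USD = 100_000            # the target is expressed in $100,000s

mae_usd = best_val_mae * TARGET_UNIT_USD
print(f"Average validation error: about ${mae_usd:,.0f}")
```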


In [12]:
# Display the best hyperparameters found
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
print(f"Best number of layers: {best_hps.get('num_layers')}, Best units: {best_hps.get('units_0')}, Learning rate: {best_hps.get('learning_rate')}")

Best number of layers: 3, Best units: 112, Learning rate: 0.0005



1. **Imports**:
   - `tensorflow`: For building and training the neural network.
   - `keras_tuner`: For hyperparameter tuning.
   - Layers such as `Embedding`, `LSTM`, `Dense`, and `Bidirectional` are imported for constructing the model.
   - `Tokenizer` and `pad_sequences` are used for preprocessing text data.
   - `train_test_split`: For splitting the dataset into training and testing sets.
   - `fetch_20newsgroups`: To load the dataset for text classification.

In [13]:
import tensorflow as tf
import keras_tuner as kt
from tensorflow.keras.layers import Embedding, LSTM, Dense, Bidirectional
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_20newsgroups


2. **Loading Data**:

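   This code fetches two categories of the 20 Newsgroups dataset (`rec.sport.baseball` and `sci.space`) with headers, footers, and quotes removed, which makes this a binary classification task. It then splits the data into training and testing sets, using 20% of the data for testing.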

In [14]:
# Load news data
data = fetch_20newsgroups(subset='all', categories=['rec.sport.baseball', 'sci.space'], remove=('headers', 'footers', 'quotes'))
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

3. **Text Preprocessing**:

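   The `Tokenizer` converts text data into sequences of integers, keeping roughly the 10,000 most frequent words and mapping all other words to the `<OOV>` token. `pad_sequences` then pads or truncates each sequence at the end so that all sequences have the same length (200 in this case).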

In [15]:
# Convert texts to numerical sequences
tokenizer = Tokenizer(num_words=10000, oov_token="<OOV>")
tokenizer.fit_on_texts(X_train)
X_train_seq = pad_sequences(tokenizer.texts_to_sequences(X_train), maxlen=200, padding='post', truncating='post')
X_test_seq = pad_sequences(tokenizer.texts_to_sequences(X_test), maxlen=200, padding='post', truncating='post')
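
The idea behind these two steps can be shown with a simplified pure-Python imitation. This is only a sketch of the general scheme (frequency-ranked word ids, id 1 for out-of-vocabulary words, zeros for padding at the end); it skips the punctuation filtering that the Keras `Tokenizer` performs, and the function names are hypothetical.

```python
from collections import Counter


def fit_word_index(texts, oov_token="<OOV>"):
    """Assign ids by descending frequency; id 1 is reserved for unknown words."""
    counts = Counter(word for text in texts for word in text.lower().split())
    index = {oov_token: 1}
    for rank, (word, _) in enumerate(counts.most_common(), start=2):
        index[word] = rank
    return index


def texts_to_padded(texts, index, num_words, maxlen):
    """Map words to ids, then truncate or zero-pad each row at the end."""
    rows = []
    for text in texts:
        ids = [index.get(word, 1) for word in text.lower().split()]
        ids = [i if i < num_words else 1 for i in ids]   # rare words become OOV
        ids = ids[:maxlen]                               # truncating='post'
        rows.append(ids + [0] * (maxlen - len(ids)))     # padding='post'
    return rows


train_texts = ["the rocket launched", "the pitcher threw the ball"]
word_index = fit_word_index(train_texts)
padded = texts_to_padded(["the rocket exploded"], word_index, num_words=10000, maxlen=5)
print(padded)   # "exploded" was never seen during fitting, so it maps to id 1
```

As in the notebook, the vocabulary is built from the training texts only, so unseen test words fall back to the OOV id.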

4. **Building the Model**:
   
   This function defines the model architecture. The `hp` parameter allows for tuning of hyperparameters, starting with the embedding dimension (32 to 128, in steps of 32).

5. **Adding Layers**:
  
   This loop adds a tunable number of Bidirectional LSTM layers (from 1 to 3) with tunable units (from 32 to 128, in steps of 32).

6. **Final Layer**:
   
   The final Bidirectional LSTM layer has a fixed size of 64 units per direction and returns only its last output. The output layer is a single unit with a sigmoid activation function for binary classification.

7. **Compiling the Model**:
 
   The model is compiled with the Adam optimizer, binary cross-entropy loss, and accuracy as a metric. The learning rate is also tunable.

In [16]:
# Function to build the model
def build_model(hp):
    model = tf.keras.Sequential()
    model.add(Embedding(input_dim=10000, output_dim=hp.Int('embedding_dim', 32, 128, step=32), input_length=200))
    
    # Tune the number of LSTM layers
    for i in range(hp.Int('num_layers', 1, 3)):
        model.add(Bidirectional(LSTM(units=hp.Int(f'lstm_units_{i}', 32, 128, step=32), return_sequences=True)))
    
    model.add(Bidirectional(LSTM(64)))  # Final layer
    model.add(Dense(1, activation='sigmoid'))  # Output layer

    # Tune the learning rate
    model.compile(optimizer=tf.keras.optimizers.Adam(hp.Choice('learning_rate', [0.001, 0.0005, 0.0001])),
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model

8. **Setting Up Keras Tuner**:

   This initializes the Keras Tuner using Hyperband optimization to find the best hyperparameters based on validation accuracy.

In [17]:
# Search for the best hyperparameters using Keras Tuner
tuner = kt.Hyperband(build_model,
                     objective='val_accuracy',
                     max_epochs=10,
                     directory='text_tuning',
                     project_name='text_classification')



9. **Executing the Search**:

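   This command runs the hyperparameter tuning process, using 20% of the training data for validation. With Hyperband, the tuner itself decides how many epochs each trial receives, up to `max_epochs=10`: many configurations are trained briefly, and only the most promising ones are trained for longer.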

In [18]:
# Execute the search
tuner.search(X_train_seq, y_train, epochs=10, validation_split=0.2)

Trial 30 Complete [00h 00m 32s]
val_accuracy: 0.9085173606872559

Best val_accuracy So Far: 0.9274448156356812
Total elapsed time: 00h 06m 37s
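
The 30 trials reported above follow from Hyperband's bracket structure. The sketch below reconstructs a successive-halving schedule for `max_epochs=10` with a reduction factor of 3 (the Keras Tuner default). The formulas are an assumption based on a reading of how Keras Tuner sizes its brackets, not a documented API, and the function name is hypothetical. They do reproduce the trial count in the output.

```python
import math


def hyperband_schedule(max_epochs, factor=3, min_epochs=1):
    """Return (bracket, round, n_models, n_epochs) tuples for one Hyperband pass."""
    # Number of brackets: how many times max_epochs can be divided by factor
    brackets, epochs = 0, max_epochs
    while epochs >= min_epochs:
        epochs /= factor
        brackets += 1

    end_size0 = math.ceil(1 + math.log(max_epochs, factor))
    schedule = []
    for bracket in reversed(range(brackets)):
        end_size = end_size0 / (bracket + 1)
        for rnd in range(bracket + 1):
            # Early rounds: many models, few epochs. Later rounds: the reverse.
            n_models = math.ceil(end_size * factor ** (bracket - rnd))
            n_epochs = math.ceil(max_epochs / factor ** (bracket - rnd))
            schedule.append((bracket, rnd, n_models, n_epochs))
    return schedule


schedule = hyperband_schedule(max_epochs=10)
for bracket, rnd, n_models, n_epochs in schedule:
    print(f"bracket {bracket}, round {rnd}: {n_models} models for up to {n_epochs} epochs")
print("total trials:", sum(row[2] for row in schedule))
```

Under these assumptions, the most aggressive bracket starts 12 models at 2 epochs each and promotes the best of them to 4 and then 10 epochs, which is why most trials finish quickly.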


10. **Retrieving Best Hyperparameters**:
  
    This line retrieves the best hyperparameters found during the tuning process and prints them.

In [19]:
# Display the best hyperparameters found
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
print(f"Best number of layers: {best_hps.get('num_layers')}, Best LSTM units: {best_hps.get('lstm_units_0')}, Learning rate: {best_hps.get('learning_rate')}")

Best number of layers: 1, Best LSTM units: 64, Learning rate: 0.0005


### Optimization Tips
- **Adjust `num_words` in Tokenizer**: Increasing this can capture more vocabulary, but may also increase complexity.
- **Early Stopping**: Implement early stopping to prevent overfitting if the validation accuracy stops improving.
- **Experiment with Dropout Layers**: Adding dropout layers can help reduce overfitting.

This code effectively tunes a Bidirectional LSTM model for classifying text data, showcasing the capabilities of Keras Tuner for optimizing hyperparameters in deep learning models.