<a href="https://colab.research.google.com/github/FVLegion/AI-Studio-ClearML/blob/main/LSTM_HPO.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Step 1: Load and Preprocess the Dataset**

First, load the dataset. If you have your own data, load it accordingly (e.g., from a CSV or text files). This example uses the IMDb movie-review dataset.

In [1]:
import tensorflow as tf
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Load the dataset
max_features = 10000  # Number of words to consider as features
maxlen = 500  # Pad or truncate every review to this many tokens (drawn from the top max_features most common words)

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)

# Pad sequences to a fixed length
x_train = pad_sequences(x_train, maxlen=maxlen)
x_test = pad_sequences(x_test, maxlen=maxlen)

print("Dataset loaded and preprocessed.")

Dataset loaded and preprocessed.
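
To make the padding step concrete, here is a minimal pure-Python sketch of what `pad_sequences` does with its default settings (`padding='pre'`, `truncating='pre'`): short sequences get zeros prepended, and long ones keep only their last `maxlen` tokens. The helper name `pad_pre` is hypothetical and only mirrors the default behavior for illustration.

```python
def pad_pre(seq, maxlen, value=0):
    """Left-pad or left-truncate one sequence to exactly maxlen items."""
    seq = list(seq)
    if len(seq) > maxlen:
        # Keep the last maxlen tokens (truncation removes the front).
        seq = seq[len(seq) - maxlen:]
    # Prepend the padding value until the sequence reaches maxlen.
    return [value] * (maxlen - len(seq)) + seq


short_review = [11, 22, 33]
long_review = [1, 2, 3, 4, 5, 6]

padded_short = pad_pre(short_review, maxlen=5)
padded_long = pad_pre(long_review, maxlen=4)

print(padded_short)  # [0, 0, 11, 22, 33]
print(padded_long)   # [3, 4, 5, 6]
```

Because padding goes in front, the real tokens sit at the end of each row, closest to the final LSTM time step.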


In [6]:
# Get the word index
word_index = imdb.get_word_index()

# Reverse the word index to map integer indices to words
reverse_word_index = {value: key for key, value in word_index.items()}

# Decode a sample review. Indices are offset by 3 because 0, 1, and 2 are
# reserved (padding, start of sequence, unknown), so padding decodes to '?'.
def decode_review(text):
    return ' '.join(reverse_word_index.get(i - 3, '?') for i in text)

# Number of samples to display
num_samples_to_display = 5 # You can change this number

print(f"Displaying the first {num_samples_to_display} sample reviews:\n")

for i in range(num_samples_to_display):
    decoded_review = decode_review(x_train[i])
    sample_label = y_train[i]

    print(f"Sample Review (Index {i}):")
    print(decoded_review)
    print(f"Sample Label: {sample_label} {'(Positive)' if sample_label == 1 else '(Negative)'}")
    print("-" * 50) # Print a separator for clarity

Displaying the first 5 sample reviews:

Sample Review (Index 0):
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert ? is an amazing actor and now the same being director ? father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the fil
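
The long run of `?` at the start of the decoded review comes from the zero padding: `decode_review` looks up `i - 3`, and a padding index of 0 has no entry in the word index. The sketch below shows one way to skip padding and label the reserved indices instead. It assumes the default `imdb.load_data` conventions (0 for padding, 1 for start of sequence, 2 for out-of-vocabulary words, real words shifted by 3); the function name and the toy index are hypothetical.

```python
PAD, START, OOV, INDEX_FROM = 0, 1, 2, 3


def decode_review_skip_padding(token_ids, reverse_index):
    """Decode token ids to text, dropping padding and naming reserved ids."""
    words = []
    for token_id in token_ids:
        if token_id == PAD:
            continue  # padding carries no text
        if token_id == START:
            words.append("<start>")
        elif token_id == OOV:
            words.append("<unk>")
        else:
            # Real words are stored shifted by INDEX_FROM.
            words.append(reverse_index.get(token_id - INDEX_FROM, "?"))
    return " ".join(words)


# Toy stand-in for reverse_word_index (hypothetical values).
toy_reverse_index = {1: "the", 2: "film", 3: "was", 4: "brilliant"}
encoded = [0, 0, 0, 1, 4, 5, 2, 6, 7]

decoded = decode_review_skip_padding(encoded, toy_reverse_index)
print(decoded)  # <start> the film <unk> was brilliant
```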

**Step 2: Build the BiLSTM Model**

Now, let's define a function to build your BiLSTM model. This will be useful for hyperparameter tuning later, as you can easily change the model architecture based on the hyperparameters.

In [2]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dense, Dropout

def build_bilstm_model(embedding_dim=128, lstm_units=128, dense_units=64, dropout_rate=0.5):
    """Build and compile a two-layer BiLSTM binary classifier."""
    model = Sequential([
        Embedding(max_features, embedding_dim, input_length=maxlen),
        # The first BiLSTM returns the full sequence so the second one can consume it
        Bidirectional(LSTM(lstm_units, return_sequences=True)),
        Bidirectional(LSTM(lstm_units)),
        Dense(dense_units, activation='relu'),
        Dropout(dropout_rate),
        Dense(1, activation='sigmoid')  # Sigmoid for binary classification
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model
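
To get a feel for how the hyperparameters drive model size, the following sketch estimates the trainable parameter count of this architecture by hand. It assumes the standard LSTM layout with biases, where each of the four gates has weights over the concatenated input and hidden state, giving `4 * ((input_dim + units) * units + units)` parameters per direction. The helper names are hypothetical, and the result should be checked against `model.summary()` rather than taken as authoritative.

```python
def lstm_params(input_dim, units):
    """Parameters of one LSTM direction: 4 gates, each with weights and a bias."""
    return 4 * ((input_dim + units) * units + units)


def bilstm_model_params(vocab_size=10000, embedding_dim=128,
                        lstm_units=128, dense_units=64):
    """Per-layer parameter estimate for the two-layer BiLSTM classifier."""
    counts = {
        "embedding": vocab_size * embedding_dim,
        # Bidirectional doubles the LSTM parameters (forward + backward).
        "bilstm_1": 2 * lstm_params(embedding_dim, lstm_units),
        # The second BiLSTM sees the concatenated output of the first (2 * units).
        "bilstm_2": 2 * lstm_params(2 * lstm_units, lstm_units),
        "dense": (2 * lstm_units) * dense_units + dense_units,
        "output": dense_units + 1,
    }
    counts["total"] = sum(counts.values())
    return counts


param_counts = bilstm_model_params()
for layer, count in param_counts.items():
    print(f"{layer:>10}: {count:,}")
```

With the default arguments, the embedding layer accounts for roughly two thirds of the total, which suggests `embedding_dim` and `max_features` are the cheapest places to shrink the model.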

**Step 3: Train the BiLSTM Model**

We can now train a baseline model with some default hyperparameters.

In [3]:
# Build the baseline model
model = build_bilstm_model()

# Train the model
history = model.fit(x_train, y_train, epochs=5, batch_size=32, validation_split=0.2)

print("\nBaseline model training complete.")



Epoch 1/5
625/625 ━━━━━━━━━━━━━━━━━━━━ 60s 84ms/step - accuracy: 0.6841 - loss: 0.5716 - val_accuracy: 0.7846 - val_loss: 0.4277
Epoch 2/5
625/625 ━━━━━━━━━━━━━━━━━━━━ 50s 81ms/step - accuracy: 0.8597 - loss: 0.3438 - val_accuracy: 0.7386 - val_loss: 0.4917
Epoch 3/5
625/625 ━━━━━━━━━━━━━━━━━━━━ 51s 82ms/step - accuracy: 0.8885 - loss: 0.2849 - val_accuracy: 0.8398 - val_loss: 0.3868
Epoch 4/5
625/625 ━━━━━━━━━━━━━━━━━━━━ 82s 81ms/step - accuracy: 0.9344 - loss: 0.1828 - val_accuracy: 0.8668 - val_loss: 0.3472
Epoch 5/5
625/625 ━━━━━━━━━━━━━━━━━━━━ 51s 82ms/step - accuracy: 0.9581 - loss: 0.1212 - val_accuracy: 0.8588 - val_loss: 0.4380

Baseline model training complete.
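
The log above hints at overfitting: training loss keeps falling while validation loss rises again in the last epoch. A small sketch, using the loss values copied from that log, finds the epoch with the lowest validation loss and the gap between the two curves. The variable names are illustrative; in the notebook the same lists would be available as `history.history['loss']` and `history.history['val_loss']`.

```python
# Values copied from the baseline training log.
train_loss = [0.5716, 0.3438, 0.2849, 0.1828, 0.1212]
val_loss = [0.4277, 0.4917, 0.3868, 0.3472, 0.4380]

# Epochs are 1-indexed in the Keras log.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1

# A growing gap between validation and training loss signals overfitting.
loss_gap = [v - t for t, v in zip(train_loss, val_loss)]

print(f"Lowest validation loss at epoch {best_epoch}: {val_loss[best_epoch - 1]}")
for epoch, gap in enumerate(loss_gap, start=1):
    print(f"epoch {epoch}: val_loss - loss = {gap:+.4f}")
```

If the same pattern shows up in longer runs, a callback such as `tf.keras.callbacks.EarlyStopping(monitor='val_loss', restore_best_weights=True)` is one common way to stop at the better epoch.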


**Step 4: Hyperparameter Tuning**

For hyperparameter tuning, we can use libraries like Keras Tuner or Hyperopt. Here, we use Keras Tuner's `RandomSearch`, which samples hyperparameter combinations at random from a search space defined inside the model-building function.

First, let's install Keras Tuner:

In [7]:
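# Install a recent Keras Tuner release. Older releases such as 1.0.2 appear to
# expose the package as `kerastuner`, so `import keras_tuner` fails on them.
!pip install -U keras-tuner
# Verify the installed version
!pip show keras-tuner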

import keras_tuner as kt

def build_model_for_tuning(hp):
    embedding_dim = hp.Int('embedding_dim', min_value=64, max_value=256, step=32)
    lstm_units = hp.Int('lstm_units', min_value=64, max_value=256, step=32)
    dense_units = hp.Int('dense_units', min_value=32, max_value=128, step=32)
    dropout_rate = hp.Float('dropout_rate', min_value=0.3, max_value=0.7, step=0.1)

    model = Sequential([
        Embedding(max_features, embedding_dim, input_length=maxlen),
        Bidirectional(LSTM(lstm_units, return_sequences=True)),
        Bidirectional(LSTM(lstm_units)),
        Dense(dense_units, activation='relu'),
        tf.keras.layers.Dropout(dropout_rate),
        Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

# Initialize the tuner
tuner = kt.RandomSearch(
    build_model_for_tuning,
    objective='val_accuracy',
    max_trials=10,  # Number of hyperparameter combinations to try
    executions_per_trial=1,
    directory='my_dir',
    project_name='bilstm_tuning')

# Perform the search
tuner.search(x_train, y_train, epochs=5, validation_split=0.2)

# Get the optimal hyperparameters
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]

print(f"""
The optimal hyperparameters are:
Embedding Dimension: {best_hps.get('embedding_dim')}
LSTM Units: {best_hps.get('lstm_units')}
Dense Units: {best_hps.get('dense_units')}
Dropout Rate: {best_hps.get('dropout_rate')}
""")

# Retrieve the best model found during the search (already trained for the search epochs)
best_model = tuner.get_best_models(num_models=1)[0]

print("\nBest model retrieved from the tuner.")
# Optionally continue training this model for more epochs
# history_tuned = best_model.fit(x_train, y_train, epochs=10, batch_size=32, validation_split=0.2)

Name: keras-tuner
Version: 1.0.2
Summary: Hypertuner for Keras
Home-page: https://github.com/keras-team/keras-tuner
Author: The Keras Tuner authors
Author-email: kerastuner@google.com
License: Apache License 2.0
Location: /usr/local/lib/python3.11/dist-packages
Requires: colorama, future, numpy, packaging, requests, scikit-learn, scipy, tabulate, terminaltables, tqdm
Required-by: 


ModuleNotFoundError: No module named 'keras_tuner'

**Explanation:**

1.  **Data Loading and Preprocessing:** We load the IMDb dataset and use `pad_sequences` to ensure all input sequences have the same length, which is required for batching in neural networks.
2.  **Model Building Function:** We define a function `build_bilstm_model` that creates the BiLSTM architecture. This function takes hyperparameters as arguments, making it easy to build different model variations during tuning.
3.  **Baseline Training:** We train a basic model to get a sense of performance before tuning.
4.  **Hyperparameter Tuning with Keras Tuner:**
    *   We define `build_model_for_tuning` which is similar to `build_bilstm_model` but uses `hp` (HyperParameters) to define the search space for each hyperparameter.
    *   `kt.RandomSearch` is initialized to search for the best hyperparameters. We set the objective to maximize validation accuracy.
    *   `tuner.search` runs the hyperparameter search, training different models with different combinations of hyperparameters.
    *   `tuner.get_best_hyperparameters` retrieves the best set of hyperparameters found.
    *   `tuner.get_best_models` retrieves the best-performing model.
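
To put `max_trials=10` in perspective, the sketch below enumerates the discrete search space defined in `build_model_for_tuning` and counts its combinations. The ranges are taken from that function; the helper `int_choices` and the dictionary layout are hypothetical conveniences, not part of Keras Tuner.

```python
from math import prod


def int_choices(min_value, max_value, step):
    """All values an integer hyperparameter can take, bounds included."""
    return list(range(min_value, max_value + 1, step))


search_space = {
    "embedding_dim": int_choices(64, 256, 32),
    "lstm_units": int_choices(64, 256, 32),
    "dense_units": int_choices(32, 128, 32),
    # 0.3 to 0.7 in steps of 0.1, rounded to avoid float noise.
    "dropout_rate": [round(0.3 + 0.1 * i, 1) for i in range(5)],
}

total_combinations = prod(len(values) for values in search_space.values())
max_trials = 10
coverage = max_trials / total_combinations

for name, values in search_space.items():
    print(f"{name}: {len(values)} values -> {values}")
print(f"Total combinations: {total_combinations}")
print(f"Coverage with {max_trials} trials: {coverage:.1%}")
```

Ten random trials touch only about 1% of the grid, so the "best" configuration is best among the sampled ones, not necessarily over the whole space.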

**Important Considerations:**

*   **Dataset:** Replace the IMDb data loading with your own dataset loading and preprocessing steps.
*   **Hyperparameter Space:** The hyperparameter ranges and steps in `build_model_for_tuning` are examples. You should adjust them based on your specific problem and dataset.
*   **Tuning Algorithm:** Keras Tuner offers other tuning algorithms like `BayesianOptimization`. You can explore these for potentially better results.
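*   **Epochs and Batch Size:** Each trial above trains for only 5 epochs to keep the search affordable. Short runs can favor configurations that converge quickly, so consider retraining the best configuration for more epochs before drawing conclusions.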
*   **Validation Set:** It's crucial to use a validation set during tuning to avoid overfitting to the training data.
*   **Evaluation:** After tuning, evaluate your best model on the test set to get a realistic estimate of its performance.
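
As a worked example of the validation-set point, the arithmetic below shows how `validation_split=0.2` divides the 25,000 IMDb training reviews and why the training log reports 625 steps per epoch at `batch_size=32`. Keras takes the validation samples from the end of the arrays before any shuffling; the variable names here are illustrative only.

```python
import math

num_samples = 25000        # size of the IMDb training set
validation_split = 0.2
batch_size = 32

# The last 20% of the samples are held out for validation.
num_val = round(num_samples * validation_split)
num_train = num_samples - num_val

# One step processes one batch; the final batch may be smaller.
steps_per_epoch = math.ceil(num_train / batch_size)

print(f"Training samples:   {num_train}")
print(f"Validation samples: {num_val}")
print(f"Steps per epoch:    {steps_per_epoch}")
```

The held-out test set (`x_test`, `y_test`) is not touched by this split, which is why it remains suitable for the final evaluation.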

In [8]:
# Install Optuna
!pip install optuna tensorflow

# Import necessary libraries
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dense
import optuna
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np

# Load the dataset (re-including this for clarity, assuming the kernel was restarted)
max_features = 10000  # Number of words to consider as features
maxlen = 500  # Pad or truncate every review to this many tokens (drawn from the top max_features most common words)

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)

# Pad sequences to a fixed length
x_train = pad_sequences(x_train, maxlen=maxlen)
x_test = pad_sequences(x_test, maxlen=maxlen)

print("Dataset loaded and preprocessed.")


# Define the objective function for Optuna
def objective(trial):
    # Suggest hyperparameters
    embedding_dim = trial.suggest_int('embedding_dim', 64, 256, step=32)
    lstm_units = trial.suggest_int('lstm_units', 64, 256, step=32)
    dense_units = trial.suggest_int('dense_units', 32, 128, step=32)
    dropout_rate = trial.suggest_float('dropout_rate', 0.3, 0.7, step=0.1)
    learning_rate = trial.suggest_float('learning_rate', 1e-4, 1e-2, log=True)

    # Build the model with suggested hyperparameters
    model = Sequential([
        Embedding(max_features, embedding_dim, input_length=maxlen),
        Bidirectional(LSTM(lstm_units, return_sequences=True)),
        Bidirectional(LSTM(lstm_units)),
        Dense(dense_units, activation='relu'),
        tf.keras.layers.Dropout(dropout_rate),
        Dense(1, activation='sigmoid')
    ])

    # Compile the model with a suggested learning rate
    optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
    model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])

    # Train the model
    # Use validation_split for evaluation during tuning
    history = model.fit(x_train, y_train,
                        epochs=5, # Use a reasonable number of epochs for tuning
                        batch_size=32,
                        validation_split=0.2,
                        verbose=0) # Set verbose to 0 to reduce output during tuning

    # Return the validation accuracy from the last epoch as the objective value.
    # The study is created with direction='maximize', so no sign flip is needed.
    return history.history['val_accuracy'][-1]


# Create a study and optimize
# Specify direction='maximize' to maximize the validation accuracy
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=10) # Run 10 trials

# Print the best hyperparameters and objective value
print("\nOptuna study complete.")
print("Best hyperparameters: ", study.best_params)
print("Best validation accuracy: ", study.best_value)

# You can then build and train the final model with the best hyperparameters found by Optuna
best_hps = study.best_params

print("\nTraining the final model with optimal hyperparameters.")

# Build the final model with the best hyperparameters
final_model = Sequential([
    Embedding(max_features, best_hps['embedding_dim'], input_length=maxlen),
    Bidirectional(LSTM(best_hps['lstm_units'], return_sequences=True)),
    Bidirectional(LSTM(best_hps['lstm_units'])),
    Dense(best_hps['dense_units'], activation='relu'),
    tf.keras.layers.Dropout(best_hps['dropout_rate']),
    Dense(1, activation='sigmoid')
])

# Compile the final model with the best learning rate if it was tuned
final_optimizer = tf.keras.optimizers.Adam(learning_rate=best_hps.get('learning_rate', 0.001)) # Use default if not tuned
final_model.compile(optimizer=final_optimizer, loss='binary_crossentropy', metrics=['accuracy'])

# Train the final model (you might want to train for more epochs here)
# history_tuned = final_model.fit(x_train, y_train, epochs=10, batch_size=32, validation_split=0.2)

print("\nFinal model with optimal hyperparameters built and compiled.")
# Note: The final model is built but not trained in this block.
# You would typically train it in a subsequent cell with the full training data
# or with a different number of epochs.

Collecting optuna
  Downloading optuna-4.3.0-py3-none-any.whl.metadata (17 kB)
Collecting alembic>=1.5.0 (from optuna)
  Downloading alembic-1.15.2-py3-none-any.whl.metadata (7.3 kB)
Collecting colorlog (from optuna)
  Downloading colorlog-6.9.0-py3-none-any.whl.metadata (10 kB)
Downloading optuna-4.3.0-py3-none-any.whl (386 kB)
Downloading alembic-1.15.2-py3-none-any.whl (231 kB)
Downloading colorlog-6.9.0-py3-none-any.whl (11 kB)
Installing collected packages: colorlog, alembic, optuna
Successfully installed alembic-1.15.2 colorlog-6.9.0 optuna-4.3.0


[I 2025-05-20 14:05:22,547] A new study created in memory with name: no-name-5a89e5c2-673d-433f-ae26-d4cffc93fa33


Dataset loaded and preprocessed.


[I 2025-05-20 14:11:51,378] Trial 0 finished with value: 0.8741999864578247 and parameters: {'embedding_dim': 64, 'lstm_units': 160, 'dense_units': 64, 'dropout_rate': 0.3, 'learning_rate': 0.0005392401074232552}. Best is trial 0 with value: 0.8741999864578247.
[I 2025-05-20 14:21:59,544] Trial 1 finished with value: 0.8705999851226807 and parameters: {'embedding_dim': 96, 'lstm_units': 256, 'dense_units': 96, 'dropout_rate': 0.6000000000000001, 'learning_rate': 0.00032045211399809025}. Best is trial 0 with value: 0.8741999864578247.
[I 2025-05-20 14:28:53,731] Trial 2 finished with value: 0.8700000047683716 and parameters: {'embedding_dim': 224, 'lstm_units': 192, 'dense_units': 32, 'dropout_rate': 0.3, 'learning_rate': 0.00014607009954030012}. Best is trial 0 with value: 0.8741999864578247.
[I 2025-05-20 14:35:47,991] Trial 3 finished with value: 0.8519999980926514 and parameters: {'embedding_dim': 64, 'lstm_units': 224, 'dense_units': 128, 'dropout_rate': 0.7, 'learning_rate': 0.000


Optuna study complete.
Best hyperparameters:  {'embedding_dim': 256, 'lstm_units': 160, 'dense_units': 96, 'dropout_rate': 0.5, 'learning_rate': 0.002105550978190454}
Best validation accuracy:  0.8925999999046326

Training the final model with optimal hyperparameters.

Final model with optimal hyperparameters built and compiled.
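
The `learning_rate` suggestion in the objective function uses `log=True`, which spreads candidates evenly across orders of magnitude rather than across raw values. The sketch below illustrates that idea with plain random sampling. It is not Optuna's implementation (its default sampler is TPE, which adapts to earlier trials); the function name and seed are hypothetical.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
low, high = 1e-4, 1e-2
samples = [sample_log_uniform(low, high, rng) for _ in range(10000)]

# On a log scale the midpoint of [1e-4, 1e-2] is 1e-3, so about half of
# the samples should fall below it. A linear scale would put under 10% there.
fraction_below_midpoint = sum(s < 1e-3 for s in samples) / len(samples)

print(f"min sample: {min(samples):.6f}")
print(f"max sample: {max(samples):.6f}")
print(f"fraction below 1e-3: {fraction_below_midpoint:.3f}")
```

This is why small learning rates such as those in the trial log get a fair share of trials even though they occupy a tiny slice of the linear range.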
