**Problem statement** - Conventional hyperparameter optimization methods such as grid search and Bayesian optimization take a long time to tune an LSTM model. This is primarily because every candidate architecture, and every learning rate tried for that architecture, requires training a model and measuring its accuracy. This project presents a simple but effective technique that optimizes the hyperparameters in much less time.

**Description**

The method optimizes one hyperparameter at a time. In stage 1, the number of layers is optimized with the learning rate held fixed. In stage 2, the learning rate is optimized for the number of layers selected in stage 1. (In the code in this walkthrough, the quantity called the number of layers is passed to the model as the number of LSTM units, `num_of_units`.)
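
To see where the time saving comes from, the sketch below counts training runs for an exhaustive grid against the two-stage scheme. It assumes the search ranges used later in this walkthrough (values from 10 to 190 in steps of 10 for stage 1, and 11 learning-rate trials for stage 2); the variable names are illustrative only.

```python
# Count how many models must be trained under each strategy.
# Assumption: same search ranges as the cells later in this document.
layer_values = list(range(10, 200, 10))   # 10, 20, ..., 190
learning_rate_trials = 11                 # number of stage 2 iterations

grid_runs = len(layer_values) * learning_rate_trials       # every combination
two_stage_runs = len(layer_values) + learning_rate_trials  # one axis at a time

print("grid search runs:", grid_runs)
print("two-stage runs:  ", two_stage_runs)
```

The saving rests on an assumption: the two-stage scheme only works well if the best stage 1 value does not depend strongly on the learning rate, which an exhaustive grid does not need to assume.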

In this example we optimize an LSTM model that classifies movie reviews as positive or negative.
We use the IMDB dataset of 50,000 movie reviews as shipped with Keras (`keras.datasets.imdb`), which is split into 25,000 training and 25,000 test reviews. In this version the text has already been converted to integer tokens, a vital step in natural language processing, so the reviews only need to be padded to a common length before they are fed to the model.
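
As a rough illustration of what representing text by integer tokens means, the sketch below builds a frequency-ranked vocabulary from two made-up reviews and encodes a sentence with it. It is not the exact encoding Keras uses; the helper names `build_vocab` and `encode` and the sample sentences are hypothetical.

```python
from collections import Counter

def build_vocab(texts):
    """Map each word to an integer rank; 1 is the most frequent word."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # Sort by descending count, then alphabetically so ties are deterministic
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: rank for rank, word in enumerate(ranked, start=1)}

def encode(text, vocab):
    """Replace every known word with its integer token; skip unknown words."""
    return [vocab[word] for word in text.lower().split() if word in vocab]

reviews = ["a great great movie", "a dull movie"]  # made-up examples
vocab = build_vocab(reviews)
print(vocab)
print(encode("a great movie", vocab))
```

Ranking by frequency is what makes a `num_words` cutoff meaningful: keeping only the smallest indices keeps only the most common words.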


Import the required packages as shown below.

In [1]:

import os

import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split as split

from keras.datasets import imdb
from keras.layers import LSTM, Input, Dense
from keras.models import Model

For optimization, it is important to define an objective function that captures the performance criterion. Here the criterion is the classification accuracy achieved by a given network architecture, and the architecture with the best accuracy is selected. The function takes a single array, `space`, holding the number of units, the learning rate, and the number of epochs, in that order.


In [2]:

def objective(space):
    """Train one model and return its accuracy on the test set.

    space holds [number of units, learning rate, number of epochs].
    """
    num_of_units = int(space[0])
    learning_rate1 = space[1]
    print(space[2])  # echoes the epoch count (a float, because np.array casts every entry to float)
    epocs1 = int(space[2])

    """Defining the data parameters"""

    """num_words sets the maximum number of unique words kept in the vocabulary that maps words to integers. Here the 88,584 most frequently occurring words in the dataset are selected."""
    num_words = 88584
    """This line loads the IMDB movie review dataset, already split into training and test sets. imdb.load_data() returns two (data, labels) pairs: each review is a sequence of word indices and each label is 0 for negative or 1 for positive. Passing num_words keeps only the num_words most frequently occurring words."""
    (train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=num_words)
    
    
    
    
    
    """
    Preprocessing: trim or pad the reviews to a common length.
    
    The `pad_sequences()` function ensures that all sequences (here, the tokenized reviews) have the same length. This matters because the network is trained on batches with a fixed input size.
    
    It is applied to both `train_data` and `test_data`, so every sequence in these datasets ends up exactly `max_length` tokens long.
    
    The `max_length` value sets that length. With the default settings used here, shorter sequences are padded with zeros at the beginning, and longer sequences are truncated from the beginning, so the last `max_length` tokens of a review are kept.
    
    The padded sequences are assigned back to `train_data` and `test_data`, which can then be fed into the network for training and testing.
    
    """
    
    
    
    
    max_length = 250
    import tensorflow as tf
    from keras.utils import pad_sequences
    
    # Pad or truncate every review to exactly max_length tokens
    train_data = pad_sequences(train_data, maxlen=max_length)
    test_data = pad_sequences(test_data, maxlen=max_length)
    
    
    
    
    """
    Model architecture and optimization
    
    This code defines a Sequential model in TensorFlow Keras for binary classification.
    
    Here is a breakdown of what each line does:
    
    model = tf.keras.Sequential([...]): Creates an instance of the Sequential class, which stacks layers one after another to form a neural network.
    
    tf.keras.layers.Embedding(num_words, num_of_units): The first layer. An Embedding layer takes integer inputs (each the index of a word in the vocabulary) and converts each one to a dense vector of fixed size, here num_of_units. The num_words argument is the size of the vocabulary, so valid input indices run from 0 to num_words - 1.
    
    tf.keras.layers.LSTM(num_of_units): The second layer. An LSTM (Long Short-Term Memory) layer is commonly used for sequence data such as text or time series. It has num_of_units units, which sets the size of its output. Note that the same num_of_units value is used for both the embedding size and the LSTM width.
    
    tf.keras.layers.Dense(1, activation='sigmoid'): The final layer. A Dense layer with a single unit suits binary classification, and the sigmoid activation keeps the output between 0 and 1 so that it can be read as a probability.
    
    In summary, the model takes integer inputs (representing words in a vocabulary) and converts them to dense vectors with an Embedding layer. The vectors are processed by an LSTM layer to capture sequence information, then passed through a Dense layer with a sigmoid activation to produce a binary classification output.
    """
    
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(num_words, num_of_units),
        tf.keras.layers.LSTM(num_of_units),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    
    # Compile with RMSprop at the learning rate under test
    from keras import optimizers
    optimizer = optimizers.RMSprop(learning_rate=learning_rate1)
    model.compile(loss="binary_crossentropy", optimizer=optimizer, metrics=['accuracy'])
    # Hold out 20% of the training data for validation during fitting
    history = model.fit(train_data, train_labels, epochs=epocs1, validation_split=0.2)
    
    # evaluate() returns [loss, accuracy] on the test set
    result = model.evaluate(test_data, test_labels)
    print(result)
    
    return result[1]
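
The padding step inside `objective` can be checked on a toy input without installing Keras. The function `pad_pre` below is a hypothetical pure-Python stand-in that mimics the default `pad_sequences` settings (`padding='pre'`, `truncating='pre'`); it is a sketch for illustration, not the Keras implementation.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad short sequences; keep only the last `maxlen` items of long ones."""
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            seq = seq[len(seq) - maxlen:]          # truncate from the front
        padded.append([value] * (maxlen - len(seq)) + seq)  # pad at the front
    return padded

toy = [[5, 6], [1, 2, 3, 4, 5, 6]]  # one short and one long sequence
print(pad_pre(toy, maxlen=4))
```

Both the padding and the truncation act on the front of the sequence, so the most recent tokens sit next to the end of the input, which is the last thing the LSTM reads.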
    



The next step is to define the limits on the number of layers to iterate over and to create lists for storing the accuracy of each model.

In [4]:
maximum_number_of_layers = 200
minimum_number_of_layers = 10

stage1ac = []  # List to store stage 1 accuracies with respect to change in number of layers
Sto_number_of_layers = []  # List to store the number of layers

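Iterate over the range of number_of_layers with a step size of 10, training each model for one epoch at a fixed learning rate of 0.001. Because `range` excludes its upper bound, the values tried are 10, 20, ..., 190. The optimal number_of_layers is the one with the highest accuracy.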

In [None]:

number_of_epocs = 1
learning_rate = 0.001

# Iterate over the range of number_of_layers with a step size of 10
for number_of_layers in range(minimum_number_of_layers, maximum_number_of_layers, 10):
    space = np.array([int(number_of_layers), learning_rate, int(number_of_epocs)])
    stage1ac.append(objective(space))  # Call the objective function and append the result to stage1ac
    Sto_number_of_layers.append(number_of_layers)  # Append the number_of_layers to Sto_number_of_layers

Opt_number_of_layers = Sto_number_of_layers[np.argmax(np.array(stage1ac))]  # Select the optimal number_of_layers based on the maximum value in stage1ac


1.0
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
[0.3689042627811432, 0.8551599979400635]
1.0
[0.3583487868309021, 0.8553199768066406]
1.0
[0.3465568423271179, 0.8598799705505371]
1.0
[0.3035806119441986, 0.8784400224685669]
1.0
[0.34576472640037537, 0.8592399954795837]
1.0
[0.3572029769420624, 0.8512399792671204]
1.0
[0.36699163913726807, 0.8517600297927856]
1.0
[0.3437735438346863, 0.8586000204086304]
1.0
[0.3541824519634247, 0.8617600202560425]
1.0
[0.30295616388320923, 0.8765599727630615]
1.0
[0.343137264251709, 0.8612800240516663]
1.0
[0.404620498418808, 0.8271600008010864]
1.0
159/782 [=====>........................] - ETA: 1:03 - loss: 0.3396 - accuracy: 0.8597
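
The selection rule can be replayed on the twelve completed runs in the log above (the thirteenth was still in progress when the output was captured). The accuracies are copied from the log and rounded to five decimals; the list names are illustrative.

```python
# Test accuracies of the completed stage 1 runs, in order (10, 20, ..., 120)
logged_accuracy = [0.85516, 0.85532, 0.85988, 0.87844, 0.85924, 0.85124,
                   0.85176, 0.85860, 0.86176, 0.87656, 0.86128, 0.82716]
layer_values = list(range(10, 10 + 10 * len(logged_accuracy), 10))

# Same rule as np.argmax: index of the first occurrence of the highest accuracy
best_index = max(range(len(logged_accuracy)), key=logged_accuracy.__getitem__)
print(layer_values[best_index], logged_accuracy[best_index])
```

On these partial results the best value is 40, with 100 a close second (0.87844 against 0.87656). With a single training epoch per run, a gap that small may well be within run-to-run noise, so the ranking could change on a repeat.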

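Once the number of layers has been decided, the next step is to choose the learning rate by iterating within defined limits. The search starts at the maximum learning rate. After each run the rate is halved if accuracy improved over the previous run, and otherwise moved halfway back toward the maximum.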

In [None]:

import math

stage2ac = []  # List to store stage 2 accuracies
Sto_number_of_layers = []  # Clear the previous values of Sto_number_of_layers
learning_rate_sto = []  # List to store the learning rate used for each entry of stage2ac

number_of_epocs = 30
maximum_learning_rate = 0.02
learning_rate = maximum_learning_rate
com = 0  # Accuracy of the previous run; 0 before the first run

maximum_number_of_iterations = 100
# Stage 2 runs sqrt(100) + 1 = 11 times
iterations = int(math.sqrt(maximum_number_of_iterations)) + 1

# Iterate over the range of iterations
for itr in range(iterations):
    learning_rate_sto.append(learning_rate)  # Record the learning rate for this run

    space = np.array([int(Opt_number_of_layers), learning_rate, int(number_of_epocs)])
    stage2ac.append(objective(space))  # Train and record the test accuracy

    # Compare the current accuracy with the previous accuracy
    if stage2ac[itr] > com:
        learning_rate = learning_rate / 2  # Improved: try a smaller rate
    else:
        learning_rate = (maximum_learning_rate + learning_rate) / 2  # Worse: move back toward the maximum

    com = stage2ac[itr]  # The current run becomes the reference for the next one

Opt_learning_rate = learning_rate_sto[np.argmax(np.array(stage2ac))]  # Select the optimal learning_rate based on the maximum value in stage2ac
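
To see which learning rates this kind of loop visits without training any network, the update rule can be run against a stand-in objective. `toy_accuracy` below is a made-up function that peaks at a learning rate of 0.004; it is an assumption for illustration only. The sketch compares each run with the previous one, starting from a baseline of 0.

```python
import math

def toy_accuracy(learning_rate):
    """Hypothetical stand-in for objective(): peaks at learning_rate = 0.004."""
    return 0.9 - 0.05 * abs(math.log10(learning_rate / 0.004))

maximum_learning_rate = 0.02
learning_rate = maximum_learning_rate
previous = 0.0              # baseline before the first run
visited, scores = [], []

for _ in range(6):
    visited.append(learning_rate)
    scores.append(toy_accuracy(learning_rate))
    if scores[-1] > previous:       # improved: try a smaller rate
        next_rate = learning_rate / 2
    else:                           # got worse: move back toward the maximum
        next_rate = (maximum_learning_rate + learning_rate) / 2
    previous = scores[-1]
    learning_rate = next_rate

best = visited[scores.index(max(scores))]
print(visited)
print(best)
```

After the first drop in accuracy the rule jumps back toward the maximum rather than narrowing in on the best rate seen so far, which is why the final choice is taken as the argmax over all visited rates and not simply the last rate tried.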

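The last step is to show the best accuracy found, along with the hyperparameters that produced it.

In [None]:
print("Best test accuracy within the defined hyperparameter limits:", np.max(np.array(stage2ac)))
print("Number of layers:", Opt_number_of_layers, "Learning rate:", Opt_learning_rate)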