<a href="https://colab.research.google.com/github/Jhansipothabattula/Machine_Learning/blob/main/Day63.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Long Short-Term Memory(LSTM) Networks

# Introduction to LSTMs and How They Address RNN Limitations

* **What Are LSTMs?**
    * Type of Recurrent Neural Network (RNN) specifically designed to handle long-term dependencies
    * LSTMs mitigate the vanishing gradient problem by using specialized gates to manage the flow of information

* **Key Features of LSTMs**
    * **Memory Cells**
        * Maintain a long-term memory state across sequences
    * **Gated Mechanism**
        * Regulates how much information to keep, update, or forget at each time step
    * **Effective for Long Sequences**
        * Handles sequential data with dependencies across many time steps.


# Advantages Over Vanilla RNNs

* **Retains long-term dependencies**
* **Prevents gradient-related issues during training**
* **Outperforms RNNs on tasks like language modeling, speech recognition, and time-series forecasting**

# LSTM Cell Structure: Input, Forget, and Output Gates

* **Forget Gate**
    * Decides what information to discard from the cell state
    * The formula is:
$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$
* $W_f$: Weight matrix for the forget gate
* $f_t$: Forget gate output
* $x_t$: Input
 * $h_{t-1}$: Previous hidden state

* **Input Gate**
    * Decides what new information to add to the cell state
    * The formula is:
$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$
* $\sigma$: Sigmoid activation function
 * $W_i$: Weight matrix for the input gate
* $h_{t-1}$: Hidden state from the previous time step

* **Cell State Update**
    * Combines the forget gate and input gate results to update the cell state.
    * The formula is:
$C_t = f_t \cdot C_{t-1} + i_t \cdot \tilde{C}_t$ (Note: $\tilde{C}_t$ is the candidate state, often calculated as $\tanh(W_C \cdot [h_{t-1}, x_t] + b_C
$)

* **Output Gate**
    * Decides what information to output at each time step.
    * The formula is:
$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$
        * $W_o$: Weight matrix for the output gate
        * $b_o$: Bias for the output gate

# Applications of LSTMs

* **Natural Language Processing (NLP)**
    * Sentiment analysis, machine translation, text generation
* **Time-Series Forecasting**
    * Predicting stock prices, weather patterns, or sales trends
* **Speech Recognition**
    * Converting spoken words into text
* **Anomaly Detection**
    * Identifying unusual patterns in sequential data



 **Objective:**
- Build an LSTM model for sentiment analysis on the IMDB Movie Reviews Dataset and compare its performance with a basic RNN model

In [1]:
# Code for loading and pre-processing the IMDB dataset
import tensorflow as tf
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, LSTM, Dense

vocab_size = 10000
max_len = 200

(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=vocab_size)

X_train = pad_sequences(X_train, maxlen=max_len, padding='post')
X_test = pad_sequences(X_test, maxlen=max_len, padding='post')

print(f"Training Data Shape: {X_train.shape}")
print(f"Test Data Shape: {X_test.shape}")

# Code for defining and training the SimpleRNN model
rnn_model = Sequential([
    Embedding(input_dim=vocab_size, output_dim=128),
    SimpleRNN(128, activation='tanh', return_sequences=False),
    Dense(1, activation='sigmoid')
])

rnn_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
rnn_model.summary()

rnn_history = rnn_model.fit(X_train, y_train, epochs=5, batch_size=32, validation_split=0.2)

rnn_loss, rnn_accuracy = rnn_model.evaluate(X_test, y_test)
print(f"RNN Test Loss: {rnn_loss:.4f}, Test Accuracy: {rnn_accuracy:.4f}")

# Code for defining and training the SimpleLSTM model
lstm_model = Sequential([
    Embedding(input_dim=vocab_size, output_dim=128),
    SimpleRNN(128, activation='tanh', return_sequences=False),
    Dense(1, activation='sigmoid')
])

lstm_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
lstm_model.summary()

lstm_history = lstm_model.fit(X_train, y_train, epochs=5, batch_size=32, validation_split=0.2)

lstm_loss, lstm_accuracy = lstm_model.evaluate(X_test, y_test)
print(f"LSTM Test Loss: {lstm_loss:.4f}, Test Accuracy: {lstm_accuracy:.4f}")




Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
[1m17464789/17464789[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 0us/step
Training Data Shape: (25000, 200)
Test Data Shape: (25000, 200)


Epoch 1/5
[1m625/625[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m61s[0m 94ms/step - accuracy: 0.5018 - loss: 0.7012 - val_accuracy: 0.5032 - val_loss: 0.6937
Epoch 2/5
[1m625/625[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m56s[0m 90ms/step - accuracy: 0.5498 - loss: 0.6800 - val_accuracy: 0.5578 - val_loss: 0.6692
Epoch 3/5
[1m625/625[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m84s[0m 94ms/step - accuracy: 0.6051 - loss: 0.6374 - val_accuracy: 0.5644 - val_loss: 0.6635
Epoch 4/5
[1m625/625[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m58s[0m 92ms/step - accuracy: 0.6349 - loss: 0.6003 - val_accuracy: 0.5780 - val_loss: 0.6586
Epoch 5/5
[1m625/625[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m62s[0m 99ms/step - accuracy: 0.6407 - loss: 0.5792 - val_accuracy: 0.5876 - val_loss: 0.6691
[1m782/782[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m21s[0m 27ms/step - accuracy: 0.5879 - loss: 0.6609
RNN Test Loss: 0.6680, Test Accuracy: 0.5818


Epoch 1/5
[1m625/625[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m71s[0m 110ms/step - accuracy: 0.5161 - loss: 0.6949 - val_accuracy: 0.5446 - val_loss: 0.6797
Epoch 2/5
[1m625/625[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m67s[0m 108ms/step - accuracy: 0.5764 - loss: 0.6641 - val_accuracy: 0.5428 - val_loss: 0.6754
Epoch 3/5
[1m625/625[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m79s[0m 103ms/step - accuracy: 0.5981 - loss: 0.6348 - val_accuracy: 0.5370 - val_loss: 0.6822
Epoch 4/5
[1m625/625[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m59s[0m 94ms/step - accuracy: 0.6120 - loss: 0.6067 - val_accuracy: 0.5448 - val_loss: 0.7001
Epoch 5/5
[1m625/625[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m89s[0m 105ms/step - accuracy: 0.6305 - loss: 0.5771 - val_accuracy: 0.5722 - val_loss: 0.7002
[1m782/782[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m21s[0m 27ms/step - accuracy: 0.5719 - loss: 0.6914
LSTM Test Loss: 0.6975, Test Accuracy: 0.5654
