## What is LSTM?
- LSTM stands for Long Short-Term Memory.

- It’s a type of Recurrent Neural Network (RNN) designed to retain information over long sequences.

- Unlike a regular RNN, an LSTM can capture dependencies across much longer spans of a sequence, because earlier inputs are less likely to be overwritten at each step.

- LSTM uses gates to control the flow of information, deciding what to remember, forget, and output at each step.

## Why Use LSTM?
- Standard RNNs suffer from vanishing gradients: the training signal shrinks as it is propagated back through many time steps, so the network struggles to learn from the early parts of long sequences.

- LSTM mitigates this with a dedicated memory path called the cell state, plus gates that control what to keep, forget, or output.

- Common applications:

    - Text generation

    - Language modeling

    - Speech recognition

    - Time series forecasting
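
To make the vanishing-gradient point above concrete, here is a minimal sketch in plain Python. It assumes, purely for illustration, that backpropagating through one time step scales the gradient by a constant factor: 0.5 for a plain RNN and 0.99 for a cell-state path whose forget gate stays close to 1. Real networks have factors that vary per step and per unit, so treat the numbers as a toy model rather than a measurement.

```python
# Illustrative only: the per-step factors below are assumptions, not measured values.
def gradient_after(steps, per_step_factor):
    """Gradient magnitude left after flowing back through `steps` time steps."""
    return per_step_factor ** steps

rnn_grad = gradient_after(50, 0.5)    # plain RNN: factor well below 1
lstm_grad = gradient_after(50, 0.99)  # cell-state path: forget gate close to 1

print(f"RNN-like path after 50 steps:  {rnn_grad:.2e}")
print(f"LSTM-like path after 50 steps: {lstm_grad:.2e}")
```

Under these assumptions the first value is vanishingly small while the second stays well above zero, which is the intuition behind the cell state.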

## Key Concepts
1. Cell State:

    - Think of it as a “conveyor belt” that carries important information through the sequence.

    - Information can flow along it nearly unchanged, so earlier context is not overwritten at every step.

2. Hidden State:

    - Like an RNN’s hidden state, it holds the output of the current step and is passed on to the next step.

3. Gates: LSTM has three main gates that control the flow of information:

    - Forget Gate: Decides what past information to throw away.

    - Input Gate: Decides what new information to add to the cell state.

    - Output Gate: Decides what information to output at the current step.


## How LSTM Works
- At each step, input data is combined with the previous hidden state.

- The forget gate removes irrelevant information from the cell state.

- The input gate adds new, relevant information to the cell state.

- The cell state now holds the updated memory.

- The output gate decides what information from the cell state to pass on as output.

- Repeat for each step in the sequence.
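
The steps above can be sketched as a single function. This is a simplified illustration, not how Keras implements the layer: it assumes a scalar input, a scalar hidden state and a scalar cell state, and the weights in `params` are hypothetical values picked arbitrarily. It also makes the "candidate" value explicit, which is the new information that the input gate scales before it is added to the cell state.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step with scalar input, hidden state and cell state.

    `p` maps each gate name to hypothetical (w_x, w_h, b) weights.
    """
    def pre(name):
        # Combine the current input with the previous hidden state
        w_x, w_h, b = p[name]
        return w_x * x_t + w_h * h_prev + b

    f_t = sigmoid(pre("forget"))        # how much of the old cell state to keep
    i_t = sigmoid(pre("input"))         # how much new information to admit
    g_t = math.tanh(pre("candidate"))   # candidate values to write
    o_t = sigmoid(pre("output"))        # how much of the cell state to expose

    c_t = f_t * c_prev + i_t * g_t      # updated cell state
    h_t = o_t * math.tanh(c_t)          # new hidden state (the step's output)
    return h_t, c_t

# Hypothetical weights, chosen arbitrarily for the demonstration.
params = {
    "forget": (0.5, 0.1, 1.0),
    "input": (0.6, -0.2, 0.0),
    "candidate": (0.9, 0.3, 0.0),
    "output": (0.4, 0.2, 0.0),
}

h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.3]:
    h, c = lstm_step(x, h, c, params)
    print(f"x={x:+.1f}  h={h:+.4f}  c={c:+.4f}")
```

In a real layer each gate uses weight matrices and the states are vectors, but the update rule for `c_t` and `h_t` has the same shape.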

### Difference between RNN and LSTM

| Feature                | RNN        | LSTM                       |
| ---------------------- | ---------- | -------------------------- |
| Memory                 | Short-term | Long-term (via cell state) |
| Gates                  | None       | Input, Forget, Output      |
| Handles long sequences | Poorly     | Much better                |
| Vanishing gradient     | Severe     | Mitigated (not eliminated) |


In [1]:
# Step 1: Import libraries
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense


In [2]:
# Step 2: Example corpus
corpus = [
    "hello how are you",
    "hello how is your day",
    "hello how are your friends",
    "hello what are you doing"
]


In [3]:
# Step 3: Tokenize the text
tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
total_words = len(tokenizer.word_index) + 1  # +1 because index 0 is reserved for padding
print("Total unique words:", total_words)


Total unique words: 11
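
To see where the number 11 comes from, the word index can be rebuilt by hand. The sketch below is not the Keras implementation; it assumes lowercasing, whitespace splitting, and ranking by frequency with ties broken by first appearance, which should match the `Tokenizer` output for this small corpus. It ignores punctuation filtering and out-of-vocabulary handling.

```python
from collections import Counter

corpus = [
    "hello how are you",
    "hello how is your day",
    "hello how are your friends",
    "hello what are you doing",
]

# Count every word across the corpus (Counter keeps first-appearance order)
counts = Counter(word for line in corpus for word in line.lower().split())

# Most frequent word gets index 1; index 0 is left free for padding.
# sorted() is stable, so words with equal counts keep their first-appearance order.
ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
word_index = {word: rank for rank, (word, _) in enumerate(ranked, start=1)}

vocab_size = len(word_index) + 1
print(word_index)
print("Vocabulary size including the padding index:", vocab_size)
```

The corpus has 10 distinct words, so the value printed as "Total unique words" is really the vocabulary size including the padding index.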


In [4]:
# Step 4: Create input sequences
input_sequences = []

for line in corpus:
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i+1]  # Take first i+1 words
        input_sequences.append(n_gram_sequence)

# Check an example
print("Example sequence:", input_sequences[0])


Example sequence: [1, 2]
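
The prefix idea is easier to follow with words instead of integers. The helper below is a hypothetical stand-in for the loop above, written in plain Python so that each training example can be read directly.

```python
def prefix_sequences(tokens):
    """Return every prefix of `tokens` that has at least two items."""
    return [tokens[: i + 1] for i in range(1, len(tokens))]

words = "hello how are you".split()
for seq in prefix_sequences(words):
    # Everything except the last word is the context; the last word is the target
    print(" ".join(seq[:-1]), "->", seq[-1])
```

A sentence of n words yields n - 1 prefixes, so the four sentences in the corpus produce 3 + 4 + 4 + 4 = 15 training examples.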


In [5]:
# Step 5: Pad sequences to same length
max_seq_len = max(len(x) for x in input_sequences)
input_sequences = pad_sequences(input_sequences, maxlen=max_seq_len, padding='pre')  # already returns a NumPy array

# Split inputs (X) and labels (y)
X = input_sequences[:, :-1]
y = input_sequences[:, -1]

print("Example X[0]:", X[0], "-> y[0]:", y[0])


Example X[0]: [0 0 0 1] -> y[0]: 2


All sequences must be the same length.

padding='pre' adds zeros at the beginning of shorter sequences.

X contains input words, y contains the next word to predict.
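
The padding and the X/y split can also be written out by hand. `pad_pre` below is a hypothetical helper that mimics `pad_sequences(..., padding='pre')` with its default truncation for a single list; it is only meant to show the mechanics.

```python
def pad_pre(seq, maxlen, value=0):
    """Left-pad `seq` with `value` up to `maxlen`, keeping the last `maxlen` items."""
    seq = list(seq)[-maxlen:]
    return [value] * (maxlen - len(seq)) + seq

max_seq_len = 5  # the longest sentence in the corpus has five words
padded = [pad_pre(s, max_seq_len) for s in ([1, 2], [1, 2, 3], [1, 2, 3, 4])]

for row in padded:
    x, y = row[:-1], row[-1]  # all but the last token -> input, last token -> label
    print(x, "->", y)
```

Padding on the left keeps the most recent words next to the label, which is why the last column can always be sliced off as y.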

In [6]:
# Step 6: Build LSTM Model
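model = Sequential()
model.add(tf.keras.Input(shape=(max_seq_len - 1,)))  # explicit input shape; the `input_length` argument is deprecated in Keras 3
model.add(Embedding(input_dim=total_words, output_dim=10))  # word index -> 10-dimensional vector
model.add(LSTM(50, activation='tanh'))  # 50 units; returns the final hidden state
model.add(Dense(total_words, activation='softmax'))  # one probability per vocabulary entry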

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])




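Embedding layer: Converts each integer word index into a dense vector whose values are learned during training.

LSTM layer: Reads the embedded sequence step by step and summarizes it in a 50-dimensional hidden state.

Dense + Softmax: Predicts a probability for each word in the vocabulary.

Loss: sparse_categorical_crossentropy, because y holds integer word indices rather than one-hot vectors.

Optimizer: Adam, which adapts the learning rate per parameter and usually converges quickly on small models like this one.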
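
To show what the softmax output and the sparse loss compute, here is a small plain-Python sketch for a single example. The logits are made-up scores for a four-word vocabulary, not values from the model above, and the function names are only illustrative.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(probs, true_index):
    """Loss for one example whose label is an integer class index."""
    return -math.log(probs[true_index])

logits = [0.1, 2.0, 0.3, -1.0]  # made-up scores for a 4-word vocabulary
probs = softmax(logits)

print("probabilities:", [round(p, 3) for p in probs])
print("loss if the true word is index 1:", round(sparse_categorical_crossentropy(probs, 1), 3))
print("loss if the true word is index 3:", round(sparse_categorical_crossentropy(probs, 3), 3))
```

The loss is small when the model puts high probability on the correct index and large otherwise, and the label never has to be expanded into a one-hot vector.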

In [7]:
# Step 7: Train the model
history = model.fit(X, y, epochs=200, verbose=0)
print("Training complete!")


Training complete!


In [8]:
# Step 8: Predict next word
def predict_next_word_lstm(model, tokenizer, text_seq, max_seq_len):
    # Encode the seed text and left-pad it to the model's input length
    token_list = tokenizer.texts_to_sequences([text_seq])[0]
    token_list = pad_sequences([token_list], maxlen=max_seq_len-1, padding='pre')
    predicted = model.predict(token_list, verbose=0)
    predicted_word_index = int(np.argmax(predicted, axis=-1)[0])

    # index_word is the reverse of word_index; index 0 (padding) has no word, so this returns None
    return tokenizer.index_word.get(predicted_word_index)

# Test
seed_text = "hello how is"
next_word = predict_next_word_lstm(model, tokenizer, seed_text, max_seq_len)
print(f"Input: '{seed_text}' → Predicted next word: '{next_word}'")


Input: 'hello how is' → Predicted next word: 'your'
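
Taking only the argmax hides how confident the model is. A possible extension is to list the few most probable words. `top_k_words` below is a hypothetical helper, and the probabilities are made up for the demonstration rather than taken from the trained model.

```python
def top_k_words(probs, index_word, k=3):
    """Return the k most probable (word, probability) pairs, skipping padding index 0."""
    ranked = sorted(range(1, len(probs)), key=lambda i: probs[i], reverse=True)
    return [(index_word[i], probs[i]) for i in ranked[:k]]

# Made-up probabilities (index 0 is padding); not output from the trained model.
index_word = {1: "hello", 2: "how", 3: "are", 4: "you"}
probs = [0.01, 0.04, 0.10, 0.25, 0.60]

print(top_k_words(probs, index_word, k=2))
```

With the model above, the same helper could be called as `top_k_words(model.predict(token_list, verbose=0)[0], tokenizer.index_word)`, where `token_list` is the padded seed sequence.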


In [10]:
model.summary()
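
The parameter counts reported by `model.summary()` can be cross-checked by hand. The sketch below assumes the standard layer definitions (an LSTM with four weight sets, each with input weights, recurrent weights and a bias) and the sizes used in this notebook: a vocabulary of 11, 10-dimensional embeddings and 50 LSTM units. The function names are only illustrative.

```python
def embedding_params(vocab_size, embed_dim):
    # One embed_dim-sized vector per vocabulary entry
    return vocab_size * embed_dim

def lstm_params(input_dim, units):
    # Four weight sets (forget, input, candidate, output), each with
    # input weights, recurrent weights and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # Weight matrix plus one bias per output unit
    return input_dim * units + units

counts = {
    "embedding": embedding_params(11, 10),
    "lstm": lstm_params(10, 50),
    "dense": dense_params(50, 11),
}
for name, n in counts.items():
    print(f"{name:>9}: {n}")
print(f"{'total':>9}: {sum(counts.values())}")
```

Nearly all of the parameters sit in the LSTM layer, because each of its four weight sets sees both the input and the previous hidden state.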