# LSTM for Text and Sequence Generation
This notebook explains the mathematical foundation and code implementation of LSTM (Long Short-Term Memory) models for both text generation and sequence prediction tasks using Python and TensorFlow/Keras.

## 1. What is LSTM?
LSTM (Long Short-Term Memory) is a type of Recurrent Neural Network (RNN) that can learn long-term dependencies. It mitigates the vanishing gradient problem of traditional RNNs by using gates that control how information flows into, out of, and through a persistent cell state.

### LSTM Cell Structure
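**Mathematics:** At each time step $t$, the cell takes the input $x_t$, the previous hidden state $h_{t-1}$, and the previous cell state $C_{t-1}$. Here $\sigma$ is the logistic sigmoid, $[h_{t-1}, x_t]$ denotes concatenation, and $*$ is element-wise multiplication.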

- Forget gate: $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$
- Input gate: $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$
- Candidate memory: $\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$
- Output gate: $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$
- Cell state update: $C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$
- Hidden state: $h_t = o_t * \tanh(C_t)$
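
To make the equations above concrete, here is a minimal NumPy sketch of a single LSTM time step. The function name `lstm_step`, the dictionary layout of `W` and `b`, and the tiny sizes are assumptions made for illustration; Keras stores and computes these weights differently.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step. W and b are dicts keyed by 'f', 'i', 'c', 'o' (hypothetical layout)."""
    concat = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W['f'] @ concat + b['f'])       # forget gate
    i_t = sigmoid(W['i'] @ concat + b['i'])       # input gate
    c_tilde = np.tanh(W['c'] @ concat + b['c'])   # candidate memory
    o_t = sigmoid(W['o'] @ concat + b['o'])       # output gate
    c_t = f_t * c_prev + i_t * c_tilde            # cell state update
    h_t = o_t * np.tanh(c_t)                      # hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
hidden, inputs = 4, 3                             # toy sizes, chosen arbitrarily
W = {k: rng.normal(scale=0.1, size=(hidden, hidden + inputs)) for k in 'fico'}
b = {k: np.zeros(hidden) for k in 'fico'}

h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(5, inputs)):          # run over a 5-step toy sequence
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)
```

Because $h_t$ is a sigmoid output multiplied by a tanh output, every entry of the hidden state stays strictly between -1 and 1.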

## LSTM Model to Generate Poem-Like Text

* **Temperature Sampling:** Controls randomness when generating each character. A lower temperature (e.g., 0.5) makes predictions more conservative, while a higher temperature (e.g., 1.5) introduces more randomness.
* **Early Stopping:** A callback that stops training when a monitored metric (typically the validation loss) stops improving, which helps limit overfitting.
* **Seed Control:** Setting random seeds for reproducibility, so that model initialization and training behave the same way on each run with the same hardware and library versions.


### Explanation of Key Concepts

1. **Temperature Sampling:**

   * **What it is:** Adjusts the probability distribution used to pick the next character by applying a "temperature" parameter.
   * **Example:**

     * With `temperature=0.5`, the model tends to pick high-probability characters (more deterministic).
     * With `temperature=1.5`, choices become more random, which might lead to more creative or unexpected outputs.

2. **Early Stopping:**

   * **What it is:** A strategy to halt model training when further improvement is unlikely, based on monitoring a metric (e.g., loss).
   * **Example:**

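     * If the monitored loss does not improve for 5 consecutive epochs (`patience=5`), training stops. Monitoring the validation loss guards against overfitting; monitoring the training loss, as this notebook does, only avoids spending epochs after training has plateaued.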

3. **Seed Control:**

   * **What it is:** Setting fixed random seeds in Python, NumPy, and TensorFlow to ensure reproducibility.
   * **Example:**

     * By setting `seed_value=42` for all relevant libraries, you make the randomness (e.g., weight initialization, training shuffles, sampling) repeatable across runs on the same setup. Some GPU operations can still be nondeterministic.
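
As a rough illustration of the temperature reweighting described in item 1, the sketch below rescales a made-up three-character distribution. The helper name `apply_temperature` and the probabilities are hypothetical; the arithmetic (log, divide by temperature, renormalize) follows the same idea as the sampling function used later in this notebook.

```python
import numpy as np

def apply_temperature(probs, temperature):
    """Reweight a probability vector: log, divide by temperature, softmax."""
    logits = np.log(np.asarray(probs, dtype=np.float64) + 1e-8) / temperature
    exp_logits = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    return exp_logits / exp_logits.sum()

demo_probs = [0.6, 0.3, 0.1]  # made-up next-character distribution
for t in (0.5, 1.0, 1.5):
    print(t, np.round(apply_temperature(demo_probs, t), 3))
```

With `temperature=0.5` each probability is effectively squared before renormalizing, so the top choice rises from 0.6 to roughly 0.78; with `temperature=1.5` the distribution flattens and the top choice drops to roughly 0.52.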

---




In [15]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Embedding
from tensorflow.keras.callbacks import EarlyStopping
import random
import os


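# Set seeds for reproducibility
seed_value = 42
# Note: PYTHONHASHSEED only affects hash randomization if it is set before the
# Python interpreter starts, so this line mainly helps child processes.
os.environ['PYTHONHASHSEED'] = str(seed_value)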
tf.random.set_seed(seed_value)
np.random.seed(seed_value)
random.seed(seed_value)


text = (
    "Two roads diverged in a yellow wood,\n"
    "And sorry I could not travel both\n"
    "And be one traveler, long I stood\n"
    "And looked down one as far as I could\n"
    "To where it bent in the undergrowth;"
)

# Create a sorted list of unique characters
chars = sorted(set(text))
print("Unique characters:", chars)

Unique characters: ['\n', ' ', ',', ';', 'A', 'I', 'T', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'k', 'l', 'n', 'o', 'r', 's', 't', 'u', 'v', 'w', 'y']


In [19]:
# Create mappings from characters to indices and vice versa
char2idx = {c: i for i, c in enumerate(chars)}
idx2char = {i: c for i, c in enumerate(chars)}
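
The two mappings are inverses of each other, which the following small sketch checks with a round trip. It uses its own short string and `demo_`-prefixed names (all hypothetical) so that it does not overwrite the notebook's `chars`, `char2idx`, or `idx2char`.

```python
demo_text = "wood, and wood"
demo_chars = sorted(set(demo_text))               # [' ', ',', 'a', 'd', 'n', 'o', 'w']
demo_char2idx = {c: i for i, c in enumerate(demo_chars)}
demo_idx2char = {i: c for i, c in enumerate(demo_chars)}

def encode(s):
    """Map a string to a list of integer indices."""
    return [demo_char2idx[c] for c in s]

def decode(ids):
    """Map a list of integer indices back to a string."""
    return "".join(demo_idx2char[i] for i in ids)

ids = encode("wood")
print(ids)          # [6, 5, 5, 3]
print(decode(ids))  # wood
```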


In [22]:
# Set sequence length and create input-output sequences
seq_length = 50
sequences = []
next_chars = []
for i in range(0, len(text) - seq_length):
    sequences.append(text[i: i + seq_length])
    next_chars.append(text[i + seq_length])

print("Number of sequences:", len(sequences))

Number of sequences: 129
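
The sliding-window construction is easier to see on a short string. The sketch below applies the same loop with a window of 4; `make_windows` and the `demo_` names are hypothetical and are not used elsewhere in the notebook. The count follows the same rule as above: a string of length `n` yields `n - window` pairs, which is why the 179-character poem with `seq_length = 50` gives 129 sequences.

```python
def make_windows(s, window):
    """Return (input windows, next characters) using a stride-1 sliding window."""
    inputs, targets = [], []
    for i in range(len(s) - window):
        inputs.append(s[i:i + window])
        targets.append(s[i + window])
    return inputs, targets

demo_inputs, demo_targets = make_windows("yellow wood", window=4)
for inp, tgt in zip(demo_inputs, demo_targets):
    print(repr(inp), "->", repr(tgt))
print("Number of pairs:", len(demo_inputs))  # 11 - 4 = 7
```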


In [23]:
# Vectorize the sequences (one-hot encoding)
X = np.zeros((len(sequences), seq_length, len(chars)), dtype=np.bool_)
y = np.zeros((len(sequences), len(chars)), dtype=np.bool_)

for i, seq in enumerate(sequences):
    for t, char in enumerate(seq):
        X[i, t, char2idx[char]] = 1
    y[i, char2idx[next_chars[i]]] = 1
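
For reference, with 129 sequences, `seq_length = 50`, and 27 unique characters, `X` has shape `(129, 50, 27)` and `y` has shape `(129, 27)`. The sketch below shows the same one-hot idea on a tiny example, using a hypothetical vectorized helper `one_hot` rather than the explicit loops above.

```python
import numpy as np

def one_hot(indices, vocab_size):
    """Return a (len(indices), vocab_size) array with a single 1 per row."""
    out = np.zeros((len(indices), vocab_size), dtype=np.float32)
    out[np.arange(len(indices)), indices] = 1.0
    return out

demo_onehot = one_hot([2, 0, 1], vocab_size=4)
print(demo_onehot)
print(demo_onehot.argmax(axis=1))  # recovers the original indices: [2 0 1]
```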


In [24]:
# We define a simple LSTM model for character-level text prediction.


model = Sequential([
    LSTM(128, input_shape=(seq_length, len(chars))),
    Dense(len(chars), activation='softmax')
])

model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()
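
The parameter counts that `model.summary()` reports can be estimated by hand. The sketch below assumes the standard LSTM layout with four gate blocks, each with input weights, recurrent weights, and a bias, plus a fully connected output layer; the helper names are hypothetical.

```python
def lstm_params(units, input_dim):
    """Four gate blocks, each with (units + input_dim) weights per unit plus a bias."""
    return 4 * (units * (units + input_dim) + units)

def dense_params(n_inputs, n_outputs):
    """Weight matrix plus one bias per output."""
    return n_inputs * n_outputs + n_outputs

n_chars, n_units = 27, 128           # 27 unique characters, 128 LSTM units
n_lstm = lstm_params(n_units, n_chars)
n_dense = dense_params(n_units, n_chars)
print(n_lstm, n_dense, n_lstm + n_dense)  # 79872 3483 83355
```

Roughly 83,000 trainable parameters are being fit to only 129 training sequences, which is one reason this toy model cannot be expected to generalize.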



In [25]:
# Early stopping monitors a metric during training and stops once the metric
# has not improved for `patience` epochs, which saves time.
# This notebook has no validation split, so we monitor the training loss.
# To guard against overfitting, hold out validation data and monitor 'val_loss'.

early_stopping = EarlyStopping(monitor='loss', patience=5, verbose=1)
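
The patience rule can be sketched in plain Python. This is a simplified illustration of the idea, not Keras's actual implementation; the function name `stopping_epoch` and the loss values are made up.

```python
def stopping_epoch(losses, patience):
    """Return the 1-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

demo_losses = [3.0, 2.5, 2.6, 2.7, 2.55, 2.4]   # made-up per-epoch losses
print(stopping_epoch(demo_losses, patience=3))  # 5
print(stopping_epoch(demo_losses, patience=5))  # None
```

With `patience=3` training stops at epoch 5 and never reaches the better loss at epoch 6, which shows the trade-off: a small patience saves time but can stop before a late improvement.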

In [26]:

# Training the model
# We train the model on our prepared dataset. For real applications, use more epochs and a larger corpus.


history = model.fit(X, y, epochs=50, batch_size=16, callbacks=[early_stopping])

def sample(preds, temperature=1.0):
    """
    Sample an index from a probability array reweighted by temperature.
    """
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds + 1e-8) / temperature  # add epsilon to avoid log(0)
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)


def generate_text(seed, length=200, temperature=1.0):
    generated = seed
    print("Seed:", seed)
    for i in range(length):
        # Prepare the input sequence (one-hot encoding)
        x_pred = np.zeros((1, seq_length, len(chars)))
        for t, char in enumerate(seed):
            x_pred[0, t, char2idx[char]] = 1.
        
        # Predict the next character probabilities
        preds = model.predict(x_pred, verbose=0)[0]
        next_index = sample(preds, temperature)
        next_char = idx2char[next_index]
        
        # Append the next character
        generated += next_char
        seed = seed[1:] + next_char
    return generated


seed_text = text[:seq_length]
print("Generated poem with temperature=0.5:\n")
print(generate_text(seed_text, length=200, temperature=0.5))

print("\nGenerated poem with temperature=1.0:\n")
print(generate_text(seed_text, length=200, temperature=1.0))

print("\nGenerated poem with temperature=1.5:\n")
print(generate_text(seed_text, length=200, temperature=1.5))


Epoch 1/50
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 21ms/step - loss: 3.2847
Epoch 2/50
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 19ms/step - loss: 3.1725
Epoch 3/50
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 21ms/step - loss: 2.8910
Epoch 4/50
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 19ms/step - loss: 2.8720
Epoch 5/50
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 19ms/step - loss: 2.8336
Epoch 6/50
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 19ms/step - loss: 2.8184
Epoch 7/50
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 19ms/step - loss: 2.8081
Epoch 8/50
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 19ms/step - loss: 2.7935
Epoch 9/50
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 20ms/step - loss: 2.7778
Epoch 10/50
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 21ms/step - loss: 2.7612
Epoch 11/50
[1m9/9
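
A quick sanity check on these numbers: a model that assigns equal probability to all 27 characters has a cross-entropy of $\ln 27$, so the first-epoch loss of 3.2847 is what an untrained model should produce. The sketch below computes that baseline and converts the epoch-10 loss into a perplexity; the variable names are illustrative only.

```python
import math

n_classes = 27                        # number of unique characters
uniform_loss = math.log(n_classes)    # cross-entropy of a uniform guess
print(round(uniform_loss, 4))         # 3.2958

epoch10_loss = 2.7612                 # taken from the training log
perplexity = math.exp(epoch10_loss)   # effective number of equally likely choices
print(round(perplexity, 1))           # 15.8
```

After 10 epochs the model is still about as uncertain as a uniform choice among roughly 16 characters, so generated text at this stage should be expected to look mostly random.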

### Why the Output Is Not Meaningful
The model is trained on only 179 characters (129 training sequences), which is far too little to learn general patterns of English or poetry. Use a large corpus of poems (e.g., thousands of Shakespearean sonnets, modern poems, etc.) for better learning.
### Lack of Meaning Understanding in LSTM
- LSTMs don't "understand" meaning; they only learn statistical patterns of sequences.

- For output with more coherent semantics and themes, Transformers (GPT, BERT) pretrained on large corpora generally work better.

| Improvement                 | What to Do                                                                     |
| --------------------------- | ------------------------------------------------------------------------------ |
| **More Data**               | Use a large text corpus (~1 MB or more) of poems.                              |
| **Word-Level Modeling**     | Use `Tokenizer` + `Embedding` layer for word-level LSTM generation.            |
| **Train Longer**            | Train for more epochs (e.g., 100–200) on the larger corpus, ideally on a GPU.  |
| **Change the Architecture** | Try stacking LSTM layers, or using GRU or bidirectional LSTM layers.           |
| **Use Pretrained Models**   | Fine-tune GPT-2 or LLaMA models on your poem dataset.                          |
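
To give a feel for the word-level row of the table, here is a plain-Python sketch of building a word vocabulary and (context, next word) training pairs. It is a stand-in for illustration only and does not use the Keras `Tokenizer`; the function names, the context length of 3, and the choice to reserve index 0 for padding are all assumptions.

```python
def build_word_vocab(text):
    """Lowercase, split on whitespace, and index words starting at 1 (0 reserved for padding)."""
    words = text.lower().split()      # note: punctuation handling is omitted in this sketch
    vocab = sorted(set(words))
    word2idx = {w: i + 1 for i, w in enumerate(vocab)}
    return words, word2idx

def make_word_pairs(words, word2idx, context=3):
    """Return a list of (context word indices, next word index) pairs."""
    pairs = []
    for i in range(len(words) - context):
        ctx = [word2idx[w] for w in words[i:i + context]]
        pairs.append((ctx, word2idx[words[i + context]]))
    return pairs

demo_line = "two roads diverged in a yellow wood"
demo_words, demo_word2idx = build_word_vocab(demo_line)
demo_pairs = make_word_pairs(demo_words, demo_word2idx, context=3)
print(len(demo_word2idx), len(demo_pairs))  # 7 4
print(demo_pairs[0])                        # ([5, 4, 2], 3)
```

Integer sequences like these are the kind of input an `Embedding` layer expects, in place of the one-hot character vectors used earlier.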


Fine-tuning a small pretrained GPT model on poems will typically produce meaningful poetic text much faster than training an LSTM from scratch.