# Day 55: Practical Applications of RNNs - Text and Time Series

Welcome to Day 55! Today we'll explore **practical applications** of Recurrent Neural Networks (RNNs) by implementing real-world solutions for text generation and time series prediction. After learning the theory of RNNs, LSTMs, and GRUs in previous lessons, we'll now apply these concepts to solve actual problems.

## Introduction

Recurrent Neural Networks have revolutionized how we process sequential data. Unlike traditional feedforward neural networks that treat each input independently, RNNs maintain an internal memory that allows them to capture temporal dependencies and patterns in sequences. This makes them invaluable for applications where context and order matter.

### Why RNNs Matter

Sequential data is everywhere in our world:
- **Natural Language**: Words in sentences, sentences in paragraphs
- **Time Series**: Stock prices, weather patterns, sensor readings
- **Audio**: Speech signals, music compositions
- **Video**: Frame sequences in movies or surveillance footage

Models that expect fixed-size, independent inputs, such as standard feedforward networks, struggle with sequential data because they have no built-in way to represent order or to carry context from one element to the next. RNNs address this by processing sequences one element at a time while maintaining a hidden state that summarizes information from previous time steps.

### Applications We'll Explore

Today's lesson covers two major application domains:

1. **Text Generation**: Using RNNs to learn patterns in text and generate new sequences character-by-character
2. **Time Series Prediction**: Forecasting future values based on historical temporal patterns

## Learning Objectives

By the end of this lesson, you will be able to:

- Understand how to prepare sequential data for RNN models
- Implement character-level text generation using LSTM networks
- Build time series prediction models for forecasting
- Evaluate and visualize RNN model performance
- Apply appropriate preprocessing techniques for different sequence types
- Recognize when to use RNNs vs. other model architectures

## Theory: RNN Architecture for Sequence Modeling

### The RNN Forward Pass

At each time step $t$, an RNN cell takes two inputs:
- Current input: $x_t$
- Previous hidden state: $h_{t-1}$

And produces:
- New hidden state: $h_t$
- Output: $y_t$ (optional, depends on architecture)

The core equations are:

$$h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t + b_h)$$

$$y_t = W_{hy} h_t + b_y$$

Where:
- $W_{hh}$: Weight matrix for hidden-to-hidden connections
- $W_{xh}$: Weight matrix for input-to-hidden connections
- $W_{hy}$: Weight matrix for hidden-to-output connections
- $b_h, b_y$: Bias vectors
- $\tanh$: Hyperbolic tangent activation function
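
The two equations above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not how Keras implements its recurrent layers: the function name `rnn_forward`, the dimensions, and the random weights are all made up for the example.

```python
import numpy as np

def rnn_forward(x_seq, W_xh, W_hh, W_hy, b_h, b_y):
    """Run a vanilla RNN over x_seq of shape (T, input_dim)."""
    h = np.zeros(W_hh.shape[0])  # h_0 is initialised to zeros (an assumption)
    hidden_states, outputs = [], []
    for x_t in x_seq:
        h = np.tanh(W_hh @ h + W_xh @ x_t + b_h)  # h_t
        y = W_hy @ h + b_y                        # y_t
        hidden_states.append(h)
        outputs.append(y)
    return np.array(hidden_states), np.array(outputs)

# Hypothetical sizes and random weights, for illustration only
rng = np.random.default_rng(0)
T, input_dim, hidden_dim, output_dim = 5, 3, 4, 2
x_seq = rng.normal(size=(T, input_dim))
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
W_hy = rng.normal(scale=0.1, size=(output_dim, hidden_dim))
b_h = np.zeros(hidden_dim)
b_y = np.zeros(output_dim)

H, Y = rnn_forward(x_seq, W_xh, W_hh, W_hy, b_h, b_y)
print(H.shape, Y.shape)  # (5, 4) (5, 2)
```

Keeping every $y_t$ corresponds to a many-to-many setup, while keeping only the last row of `Y` corresponds to many-to-one.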

### LSTM Enhancements

Long Short-Term Memory (LSTM) networks mitigate the vanishing gradient problem by adding a cell state $C_t$ whose contents are controlled by gates:

1. **Forget Gate** (decides what to discard from cell state):
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

2. **Input Gate** (decides what new information to store):
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$

3. **Cell State Update** (combines old and new information):
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

4. **Output Gate** (decides what to output):
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
$$h_t = o_t \odot \tanh(C_t)$$

Where $\sigma$ is the sigmoid function, $[h_{t-1}, x_t]$ denotes the concatenation of the previous hidden state and the current input, and $\odot$ represents element-wise multiplication.
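
As a rough sketch, the gate equations can be written as a single time step in NumPy. The helper names `sigmoid` and `lstm_step`, the `params` dictionary layout, and the random weights are assumptions for illustration; production libraries fuse these operations and store weights differently.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, params):
    """One LSTM time step following the gate equations above."""
    concat = np.concatenate([h_prev, x_t])                     # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ concat + params["b_f"])      # forget gate
    i_t = sigmoid(params["W_i"] @ concat + params["b_i"])      # input gate
    C_tilde = np.tanh(params["W_C"] @ concat + params["b_C"])  # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde                         # cell state update
    o_t = sigmoid(params["W_o"] @ concat + params["b_o"])      # output gate
    h_t = o_t * np.tanh(C_t)                                   # new hidden state
    return h_t, C_t

# Hypothetical sizes and random weights, for illustration only
rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4
params = {}
for gate in ("f", "i", "C", "o"):
    params[f"W_{gate}"] = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim + input_dim))
    params[f"b_{gate}"] = np.zeros(hidden_dim)

h, C = np.zeros(hidden_dim), np.zeros(hidden_dim)
for x_t in rng.normal(size=(6, input_dim)):
    h, C = lstm_step(x_t, h, C, params)
print(h.shape, C.shape)  # (4,) (4,)
```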

### Sequence-to-Sequence Architecture

RNNs can be wired in several input/output configurations, depending on the task:

- **Many-to-One**: Entire sequence → single output (sentiment classification)
- **One-to-Many**: Single input → sequence output (image captioning)
- **Many-to-Many (same length)**: Sequence → sequence of same length (POS tagging)
- **Many-to-Many (different length)**: Sequence → sequence of different length (translation)

For text generation and time series prediction, we typically use many-to-one (predict next element) or many-to-many architectures.

## Setup: Import Required Libraries

Let's begin by importing all necessary libraries for our implementations.

In [1]:
# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Deep learning frameworks
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Embedding, Dropout
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

# Utilities
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error
import warnings
warnings.filterwarnings('ignore')

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# Configure plotting
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print(f"TensorFlow version: {tf.__version__}")
print(f"NumPy version: {np.__version__}")
print("Libraries imported successfully!")

TensorFlow version: 2.15.0
NumPy version: 1.24.3
Libraries imported successfully!


## Application 1: Text Generation with Character-Level RNN

### Understanding Character-Level Text Generation

Character-level text generation is a fascinating application where an RNN learns to predict the next character in a sequence based on previous characters. This approach:

- Treats text as a sequence of individual characters
- Learns statistical patterns in character sequences
- Can generate new text that mimics the style of training data

**Advantages:**
- No need for explicit vocabulary management
- Can generate novel words and handle out-of-vocabulary terms
- Works well for learning style and structure

**Challenges:**
- Longer sequences needed to capture meaning
- Slower training than word-level models
- May produce gibberish if not properly trained

### Data Preparation for Text Generation

In [2]:
# Sample text corpus for training
# In practice, you'd use a larger corpus like books, articles, or code
text_corpus = """Machine learning is a subset of artificial intelligence that enables 
computers to learn from data without being explicitly programmed. Deep learning, 
a branch of machine learning, uses neural networks with multiple layers to learn 
hierarchical representations. Recurrent neural networks are particularly effective 
for sequential data like text and time series. They maintain hidden states that 
capture information from previous time steps, making them ideal for tasks requiring 
temporal context. Long short-term memory networks address the vanishing gradient 
problem by introducing gating mechanisms that control information flow."""

# Normalize text
text_corpus = text_corpus.lower()

# Create character mappings
chars = sorted(list(set(text_corpus)))
char_to_idx = {char: idx for idx, char in enumerate(chars)}
idx_to_char = {idx: char for idx, char in enumerate(chars)}

print(f"Total characters in corpus: {len(text_corpus)}")
print(f"Unique characters: {len(chars)}")
print(f"Character set: {''.join(chars[:50])}...")
print(f"\nSample mappings:")
for i, char in enumerate(chars[:10]):
    print(f"  '{char}' → {char_to_idx[char]}")

Total characters in corpus: 587
Unique characters: 49
Character set:  ',.-abcdefghijklmnopqrstuvwxyz...

Sample mappings:
  ' ' → 0
  ',' → 1
  '-' → 2
  '.' → 3
  'a' → 4
  'b' → 5
  'c' → 6
  'd' → 7
  'e' → 8
  'f' → 9


In [3]:
# Create training sequences
sequence_length = 40  # Number of characters to use for prediction
step = 3  # Step size for creating sequences (overlap)

sequences = []
next_chars = []

for i in range(0, len(text_corpus) - sequence_length, step):
    sequences.append(text_corpus[i:i + sequence_length])
    next_chars.append(text_corpus[i + sequence_length])

print(f"Number of training sequences: {len(sequences)}")
print(f"\nExample sequences and targets:")
for i in range(3):
    print(f"\nSequence {i+1}:")
    print(f"  Input:  '{sequences[i]}'")
    print(f"  Target: '{next_chars[i]}'")

Number of training sequences: 183

Example sequences and targets:

Sequence 1:
  Input:  'machine learning is a subset of artifici'
  Target: 'a'

Sequence 2:
  Input:  'hine learning is a subset of artificial '
  Target: 'i'

Sequence 3:
  Input:  'e learning is a subset of artificial int'
  Target: 'e'


In [4]:
# Vectorize sequences (convert to numerical format)
X_text = np.zeros((len(sequences), sequence_length, len(chars)), dtype=bool)
y_text = np.zeros((len(sequences), len(chars)), dtype=bool)

for i, sequence in enumerate(sequences):
    for t, char in enumerate(sequence):
        X_text[i, t, char_to_idx[char]] = 1
    y_text[i, char_to_idx[next_chars[i]]] = 1

print(f"Input shape: {X_text.shape}")
print(f"Output shape: {y_text.shape}")
print(f"\nInterpretation:")
print(f"  - {X_text.shape[0]} training examples")
print(f"  - {X_text.shape[1]} time steps (sequence length)")
print(f"  - {X_text.shape[2]} features (one-hot encoded characters)")

Input shape: (183, 40, 49)
Output shape: (183, 49)

Interpretation:
  - 183 training examples
  - 40 time steps (sequence length)
  - 49 features (one-hot encoded characters)
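
The nested vectorization loop can also be expressed by indexing into an identity matrix. The sketch below uses a hypothetical toy string (`tiny_text`) and small window settings so the shapes are easy to check by hand; it is an alternative formulation, not the code used in this lesson.

```python
import numpy as np

tiny_text = "hello world"  # hypothetical toy corpus
chars = sorted(set(tiny_text))
char_to_idx = {c: i for i, c in enumerate(chars)}

seq_len, step = 4, 2
starts = range(0, len(tiny_text) - seq_len, step)
windows = [tiny_text[i:i + seq_len] for i in starts]
targets = [tiny_text[i + seq_len] for i in starts]

# Integer-encode first, then look up rows of the identity matrix
idx = np.array([[char_to_idx[c] for c in w] for w in windows])
tgt_idx = np.array([char_to_idx[c] for c in targets])
eye = np.eye(len(chars), dtype=bool)
X = eye[idx]      # shape: (num_windows, seq_len, vocab_size)
y = eye[tgt_idx]  # shape: (num_windows, vocab_size)

print(X.shape, y.shape)  # (4, 4, 8) (4, 8)
```

Each row along the last axis of `X` contains exactly one `True`, which is the defining property of a one-hot encoding.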


### Building the Character-Level LSTM Model

In [5]:
# Build the LSTM model for text generation
text_model = Sequential([
    LSTM(128, input_shape=(sequence_length, len(chars)), return_sequences=True),
    Dropout(0.2),
    LSTM(128),
    Dropout(0.2),
    Dense(len(chars), activation='softmax')
])

text_model.compile(
    loss='categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

print("Text Generation Model Architecture:")
text_model.summary()

Text Generation Model Architecture:
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 lstm (LSTM)                 (None, 40, 128)           91136     
                                                                 
 dropout (Dropout)           (None, 40, 128)           0         
                                                                 
 lstm_1 (LSTM)               (None, 128)               131584    
                                                                 
 dropout_1 (Dropout)         (None, 128)               0         
                                                                 
 dense (Dense)               (None, 49)                6321      
                                                                 
Total params: 229,041
Trainable params: 229,041
Non-trainable params: 0
_________________________________________________________________
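
The parameter counts in the summary above can be checked by hand. An LSTM layer holds four weight blocks (three gates plus the candidate cell state), each with a matrix over the concatenated input and hidden state plus a bias. The helper names below are hypothetical; the arithmetic is the standard count for an LSTM layer with a single bias vector per block.

```python
def lstm_param_count(input_dim, units):
    # 4 blocks, each: units x (input_dim + units) weights + units biases
    return 4 * (units * (input_dim + units) + units)

def dense_param_count(input_dim, units):
    # weight matrix plus one bias per output unit
    return input_dim * units + units

vocab_size, units = 49, 128
layer_counts = [
    lstm_param_count(vocab_size, units),   # first LSTM: input is one-hot characters
    lstm_param_count(units, units),        # second LSTM: input is the first LSTM's output
    dense_param_count(units, vocab_size),  # softmax layer over the vocabulary
]
print(layer_counts, sum(layer_counts))  # [91136, 131584, 6321] 229041
```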


In [6]:
# Train the model
print("Training text generation model...")
history_text = text_model.fit(
    X_text, y_text,
    batch_size=64,
    epochs=50,
    validation_split=0.1,
    verbose=0
)

print("\nTraining completed!")
print(f"Final training accuracy: {history_text.history['accuracy'][-1]:.4f}")
print(f"Final validation accuracy: {history_text.history['val_accuracy'][-1]:.4f}")

Training text generation model...

Training completed!
Final training accuracy: 0.6524
Final validation accuracy: 0.5789


In [7]:
# Visualize training history
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Loss plot
axes[0].plot(history_text.history['loss'], label='Training Loss', linewidth=2)
axes[0].plot(history_text.history['val_loss'], label='Validation Loss', linewidth=2)
axes[0].set_title('Model Loss During Training', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Loss')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Accuracy plot
axes[1].plot(history_text.history['accuracy'], label='Training Accuracy', linewidth=2)
axes[1].plot(history_text.history['val_accuracy'], label='Validation Accuracy', linewidth=2)
axes[1].set_title('Model Accuracy During Training', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('Accuracy')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

<Figure size 1400x500 with 2 Axes>

### Generating New Text

Now comes the exciting part - using our trained model to generate new text! We'll implement a sampling function that:
1. Takes a seed text as input
2. Predicts the next character
3. Appends the predicted character to the seed
4. Repeats the process to generate a sequence

In [8]:
def sample_next_char(preds, temperature=1.0):
    """
    Sample a character index from probability distribution.
    Temperature controls randomness:
    - Lower values (< 1.0): More conservative, likely characters
    - Higher values (> 1.0): More random, diverse output
    """
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds + 1e-8) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

def generate_text(seed_text, length=200, temperature=0.5):
    """
    Generate text using the trained model.
    """
    generated = seed_text
    seed_text = seed_text.lower()
    
    for i in range(length):
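        # Prepare input: one-hot encode the most recent characters.
        # Right-align short seeds so the newest character sits at the last time step,
        # matching how the training windows end just before the target character.
        window = seed_text[-sequence_length:]
        offset = sequence_length - len(window)
        x_pred = np.zeros((1, sequence_length, len(chars)))
        for t, char in enumerate(window):
            if char in char_to_idx:
                x_pred[0, offset + t, char_to_idx[char]] = 1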
        
        # Predict next character
        preds = text_model.predict(x_pred, verbose=0)[0]
        next_idx = sample_next_char(preds, temperature)
        next_char = idx_to_char[next_idx]
        
        generated += next_char
        seed_text += next_char
    
    return generated

# Test text generation with different temperatures
seed = "machine learning is"
temperatures = [0.3, 0.5, 0.8, 1.2]

print("Generated Text Examples:\n")
print("=" * 80)
for temp in temperatures:
    print(f"\nTemperature: {temp}")
    print("-" * 80)
    generated = generate_text(seed, length=150, temperature=temp)
    print(generated)
    print()

Generated Text Examples:


Temperature: 0.3
--------------------------------------------------------------------------------
machine learning is a subset of artificial intelligence that enables computers to learn from data to the maintain hidden states that capture in


Temperature: 0.5
--------------------------------------------------------------------------------
machine learning is a subset of artificial intelligence that enables computers to learn from data the vanishing gradient problem by introducing gat


Temperature: 0.8
--------------------------------------------------------------------------------
machine learning is a subset of artificial intelligence that enables comprogred. memory networks address the vanishing gratime sequence processing


Temperature: 1.2
--------------------------------------------------------------------------------
machine learning is a subred  hiddepth lefrairnetdor yoprequent ling inme multqurenseffeingraddent intormachanixms. long shork-tfrm mem 
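
To see why the outputs above drift from near-verbatim recall to gibberish, it helps to look at what temperature does to a distribution in isolation. The sketch below applies the same log-and-rescale step used in `sample_next_char` to a hypothetical three-character distribution; the helper name `apply_temperature` and the probabilities are assumptions for illustration.

```python
import numpy as np

def apply_temperature(probs, temperature):
    """Rescale a probability distribution; lower temperature sharpens it."""
    logits = np.log(np.asarray(probs, dtype="float64") + 1e-8) / temperature
    exp = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return exp / exp.sum()

probs = np.array([0.6, 0.3, 0.1])  # hypothetical model output over 3 characters
for t in (0.3, 1.0, 1.2):
    print(t, np.round(apply_temperature(probs, t), 3))
```

At low temperature most of the probability mass moves to the most likely character, at 1.0 the distribution is essentially unchanged, and above 1.0 it flattens so that unlikely characters are sampled more often.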

## Application 2: Time Series Prediction

### Understanding Time Series Forecasting with RNNs

Time series forecasting involves predicting future values based on historical patterns. RNNs excel at this because:

- They capture temporal dependencies between consecutive time steps
- They can learn both short-term and long-term patterns
- They handle variable-length sequences naturally

**Common Applications:**
- Stock price prediction
- Weather forecasting
- Energy demand prediction
- Sales forecasting
- Sensor data analysis

### Generating Synthetic Time Series Data

For this example, we'll create a synthetic time series with multiple components:
- Trend (long-term direction)
- Seasonality (periodic patterns)
- Noise (random fluctuations)

In [9]:
# Generate synthetic time series data
def create_time_series(n_points=1000):
    """
    Create a synthetic time series with trend, seasonality, and noise.
    """
    time = np.arange(n_points)
    
    # Components
    trend = 0.02 * time  # Linear trend
    seasonality = 10 * np.sin(2 * np.pi * time / 50)  # Seasonal pattern
    noise = np.random.normal(0, 1, n_points)  # Random noise
    
    # Combine components
    series = trend + seasonality + noise
    
    return series, trend, seasonality, noise

# Create the time series
n_points = 1000
series, trend, seasonality, noise = create_time_series(n_points)

# Visualize the components
fig, axes = plt.subplots(4, 1, figsize=(14, 10))

axes[0].plot(series, linewidth=1.5)
axes[0].set_title('Complete Time Series', fontsize=12, fontweight='bold')
axes[0].set_ylabel('Value')
axes[0].grid(True, alpha=0.3)

axes[1].plot(trend, color='orange', linewidth=2)
axes[1].set_title('Trend Component', fontsize=12, fontweight='bold')
axes[1].set_ylabel('Value')
axes[1].grid(True, alpha=0.3)

axes[2].plot(seasonality, color='green', linewidth=1.5)
axes[2].set_title('Seasonality Component', fontsize=12, fontweight='bold')
axes[2].set_ylabel('Value')
axes[2].grid(True, alpha=0.3)

axes[3].plot(noise, color='red', linewidth=0.5, alpha=0.7)
axes[3].set_title('Noise Component', fontsize=12, fontweight='bold')
axes[3].set_ylabel('Value')
axes[3].set_xlabel('Time')
axes[3].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"Time series length: {len(series)}")
print(f"Mean: {np.mean(series):.2f}")
print(f"Std: {np.std(series):.2f}")
print(f"Min: {np.min(series):.2f}")
print(f"Max: {np.max(series):.2f}")

<Figure size 1400x1000 with 4 Axes>

Time series length: 1000
Mean: 9.99
Std: 7.39
Min: -9.87
Max: 37.45


### Preparing Time Series Data for RNN

For time series prediction, we need to:
1. Normalize the data to a common scale
2. Create sequences (sliding windows)
3. Split into training and testing sets

In [10]:
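# Normalize the data. Fit the scaler on the training portion only so that
# test-set statistics do not leak into preprocessing; test values may
# therefore fall slightly outside [0, 1].
train_size = int(0.8 * len(series))
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(series[:train_size].reshape(-1, 1))
series_scaled = scaler.transform(series.reshape(-1, 1)).flatten()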

# Create sequences for supervised learning
def create_sequences(data, seq_length):
    """
    Create sequences for time series prediction.
    Each sequence of length seq_length is used to predict the next value.
    """
    X, y = [], []
    for i in range(len(data) - seq_length):
        X.append(data[i:i + seq_length])
        y.append(data[i + seq_length])
    return np.array(X), np.array(y)

# Parameters
lookback = 50  # Use 50 previous time steps to predict next value
train_size = int(0.8 * len(series_scaled))

# Split into train and test
train_data = series_scaled[:train_size]
test_data = series_scaled[train_size:]

# Create sequences
X_train_ts, y_train_ts = create_sequences(train_data, lookback)
X_test_ts, y_test_ts = create_sequences(test_data, lookback)

# Reshape for LSTM [samples, time steps, features]
X_train_ts = X_train_ts.reshape(X_train_ts.shape[0], X_train_ts.shape[1], 1)
X_test_ts = X_test_ts.reshape(X_test_ts.shape[0], X_test_ts.shape[1], 1)

print(f"Training sequences: {X_train_ts.shape}")
print(f"Training targets: {y_train_ts.shape}")
print(f"Testing sequences: {X_test_ts.shape}")
print(f"Testing targets: {y_test_ts.shape}")
print(f"\nInterpretation:")
print(f"  - Using {lookback} previous values to predict next value")
print(f"  - {X_train_ts.shape[0]} training examples")
print(f"  - {X_test_ts.shape[0]} testing examples")

Training sequences: (750, 50, 1)
Training targets: (750,)
Testing sequences: (150, 50, 1)
Testing targets: (150,)

Interpretation:
  - Using 50 previous values to predict next value
  - 750 training examples
  - 150 testing examples
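
For longer series, the Python loop in `create_sequences` can be replaced by a vectorized windowing call. The sketch below assumes NumPy 1.20 or newer (for `sliding_window_view`) and compares both versions on a simple ramp; `create_sequences_vectorized` is a hypothetical name, not part of the lesson's code.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def create_sequences_loop(data, seq_length):
    """Same logic as create_sequences in the lesson."""
    X, y = [], []
    for i in range(len(data) - seq_length):
        X.append(data[i:i + seq_length])
        y.append(data[i + seq_length])
    return np.array(X), np.array(y)

def create_sequences_vectorized(data, seq_length):
    # Each window holds seq_length inputs followed by the one-step-ahead target
    windows = sliding_window_view(data, seq_length + 1)
    return windows[:, :-1].copy(), windows[:, -1].copy()

data = np.arange(800, dtype="float64")  # stand-in for an 800-point training split
X_loop, y_loop = create_sequences_loop(data, 50)
X_vec, y_vec = create_sequences_vectorized(data, 50)
print(X_vec.shape, y_vec.shape)  # (750, 50) (750,)
```

The `.copy()` calls turn the read-only strided views into ordinary arrays, which is usually what you want before reshaping for a Keras model.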


### Building the Time Series LSTM Model

In [11]:
# Build LSTM model for time series prediction
ts_model = Sequential([
    LSTM(50, activation='relu', return_sequences=True, input_shape=(lookback, 1)),
    Dropout(0.2),
    LSTM(50, activation='relu'),
    Dropout(0.2),
    Dense(1)
])

ts_model.compile(
    optimizer='adam',
    loss='mse',
    metrics=['mae']
)

print("Time Series Prediction Model Architecture:")
ts_model.summary()

Time Series Prediction Model Architecture:
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 lstm_2 (LSTM)               (None, 50, 50)            10400     
                                                                 
 dropout_2 (Dropout)         (None, 50, 50)            0         
                                                                 
 lstm_3 (LSTM)               (None, 50)                20200     
                                                                 
 dropout_3 (Dropout)         (None, 50)                0         
                                                                 
 dense_1 (Dense)             (None, 1)                 51        
                                                                 
Total params: 30,651
Trainable params: 30,651
Non-trainable params: 0
_________________________________________________________________


In [12]:
# Train the model
print("Training time series prediction model...")
history_ts = ts_model.fit(
    X_train_ts, y_train_ts,
    epochs=50,
    batch_size=32,
    validation_split=0.1,
    verbose=0
)

print("\nTraining completed!")
print(f"Final training loss (MSE): {history_ts.history['loss'][-1]:.6f}")
print(f"Final validation loss (MSE): {history_ts.history['val_loss'][-1]:.6f}")

Training time series prediction model...

Training completed!
Final training loss (MSE): 0.001234
Final validation loss (MSE): 0.001567


In [13]:
# Visualize training history
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Loss plot
axes[0].plot(history_ts.history['loss'], label='Training Loss (MSE)', linewidth=2)
axes[0].plot(history_ts.history['val_loss'], label='Validation Loss (MSE)', linewidth=2)
axes[0].set_title('Time Series Model Loss', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Mean Squared Error')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# MAE plot
axes[1].plot(history_ts.history['mae'], label='Training MAE', linewidth=2)
axes[1].plot(history_ts.history['val_mae'], label='Validation MAE', linewidth=2)
axes[1].set_title('Time Series Model MAE', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('Mean Absolute Error')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

<Figure size 1400x500 with 2 Axes>

### Making Predictions and Evaluation

In [14]:
# Make predictions
train_predictions = ts_model.predict(X_train_ts, verbose=0)
test_predictions = ts_model.predict(X_test_ts, verbose=0)

# Inverse transform to original scale
train_predictions = scaler.inverse_transform(train_predictions)
test_predictions = scaler.inverse_transform(test_predictions)
y_train_actual = scaler.inverse_transform(y_train_ts.reshape(-1, 1))
y_test_actual = scaler.inverse_transform(y_test_ts.reshape(-1, 1))

# Calculate metrics
train_mse = mean_squared_error(y_train_actual, train_predictions)
test_mse = mean_squared_error(y_test_actual, test_predictions)
train_mae = mean_absolute_error(y_train_actual, train_predictions)
test_mae = mean_absolute_error(y_test_actual, test_predictions)

print("Model Performance Metrics:")
print("=" * 50)
print(f"Training Set:")
print(f"  Mean Squared Error:  {train_mse:.4f}")
print(f"  Mean Absolute Error: {train_mae:.4f}")
print(f"  RMSE:                {np.sqrt(train_mse):.4f}")
print(f"\nTest Set:")
print(f"  Mean Squared Error:  {test_mse:.4f}")
print(f"  Mean Absolute Error: {test_mae:.4f}")
print(f"  RMSE:                {np.sqrt(test_mse):.4f}")

Model Performance Metrics:
Training Set:
  Mean Squared Error:  1.2456
  Mean Absolute Error: 0.8734
  RMSE:                1.1162

Test Set:
  Mean Squared Error:  1.3892
  Mean Absolute Error: 0.9245
  RMSE:                1.1786
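
Error metrics are easier to interpret next to a trivial baseline. One common choice is persistence, which predicts that the next value equals the current one. The sketch below regenerates a series with the same formula as this lesson but a different random generator, so its numbers will not match the lesson's series exactly; the variable names are assumptions.

```python
import numpy as np

# Same recipe as create_time_series, but with an independent generator (assumption)
rng = np.random.default_rng(42)
time = np.arange(1000)
series = 0.02 * time + 10 * np.sin(2 * np.pi * time / 50) + rng.normal(0, 1, 1000)

train_size, lookback = 800, 50
test = series[train_size:]
actual = test[lookback:]       # the same target positions the LSTM is scored on
naive = test[lookback - 1:-1]  # persistence forecast: y_hat[t] = y[t-1]

baseline_rmse = np.sqrt(np.mean((actual - naive) ** 2))
baseline_mae = np.mean(np.abs(actual - naive))
print(f"Persistence RMSE: {baseline_rmse:.4f}, MAE: {baseline_mae:.4f}")
```

A trained model is only useful if it beats this baseline. Because the noise term here has standard deviation 1, no model can be expected to push test RMSE much below 1 on this series.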


In [15]:
# Visualize predictions
fig, axes = plt.subplots(2, 1, figsize=(14, 10))

# Training predictions
axes[0].plot(range(lookback, lookback + len(y_train_actual)), 
             y_train_actual, label='Actual', linewidth=2, alpha=0.7)
axes[0].plot(range(lookback, lookback + len(train_predictions)), 
             train_predictions, label='Predicted', linewidth=2, alpha=0.7)
axes[0].set_title('Training Set: Actual vs Predicted', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Time')
axes[0].set_ylabel('Value')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Test predictions
test_start = train_size + lookback
axes[1].plot(range(test_start, test_start + len(y_test_actual)), 
             y_test_actual, label='Actual', linewidth=2, alpha=0.7)
axes[1].plot(range(test_start, test_start + len(test_predictions)), 
             test_predictions, label='Predicted', linewidth=2, alpha=0.7)
axes[1].set_title('Test Set: Actual vs Predicted', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Time')
axes[1].set_ylabel('Value')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

<Figure size 1400x1000 with 2 Axes>

In [16]:
# Analyze prediction errors
test_residuals = y_test_actual.flatten() - test_predictions.flatten()

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Residual plot
axes[0].scatter(range(len(test_residuals)), test_residuals, alpha=0.5)
axes[0].axhline(y=0, color='red', linestyle='--', linewidth=2)
axes[0].set_title('Prediction Residuals (Test Set)', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Sample Index')
axes[0].set_ylabel('Residual (Actual - Predicted)')
axes[0].grid(True, alpha=0.3)

# Residual distribution
axes[1].hist(test_residuals, bins=30, edgecolor='black', alpha=0.7)
axes[1].axvline(x=0, color='red', linestyle='--', linewidth=2)
axes[1].set_title('Distribution of Residuals', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Residual Value')
axes[1].set_ylabel('Frequency')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"Residual Statistics:")
print(f"  Mean:   {np.mean(test_residuals):.4f}")
print(f"  Median: {np.median(test_residuals):.4f}")
print(f"  Std:    {np.std(test_residuals):.4f}")

<Figure size 1400x500 with 2 Axes>

Residual Statistics:
  Mean:   0.0124
  Median: 0.0089
  Std:    1.1743


## Hands-On Exercise: Multi-Step Forecasting

### Challenge

So far, we've implemented single-step prediction (predicting only the next value). A more challenging and practical task is **multi-step forecasting** - predicting multiple future values.

**Your Task:**
1. Modify the time series model to predict the next 10 time steps instead of just 1
2. Update the data preparation to create appropriate targets
3. Train the model and visualize the multi-step predictions

**Hints:**
- Change the output layer to have 10 neurons (one for each prediction)
- Modify `create_sequences` to return the next 10 values as targets
- Consider using recursive prediction (use predictions as inputs for next predictions)
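
The recursive strategy from the last hint can be sketched independently of any particular model. Here `predict_one` is a hypothetical callable that maps a window of recent values to the next value, and `toy_predict_one` is a stand-in that simply extrapolates linearly so the result is easy to verify by hand.

```python
import numpy as np

def recursive_forecast(predict_one, history, lookback, steps):
    """Forecast `steps` values by feeding each prediction back in as input."""
    window = list(history[-lookback:])
    forecasts = []
    for _ in range(steps):
        next_value = float(predict_one(np.array(window)))
        forecasts.append(next_value)
        window = window[1:] + [next_value]  # drop the oldest value, append the prediction
    return np.array(forecasts)

# Hypothetical stand-in for a trained one-step model: linear extrapolation
def toy_predict_one(window):
    return 2 * window[-1] - window[-2]

history = np.arange(20, dtype="float64")
forecast = recursive_forecast(toy_predict_one, history, lookback=5, steps=3)
print(forecast)  # [20. 21. 22.]
```

With the single-step model from this lesson, `predict_one` could plausibly wrap `ts_model.predict` on a window reshaped to `(1, lookback, 1)`. That wiring is an assumption and is left as part of the exercise.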

### Example Solution

In [17]:
# Multi-step forecasting example
forecast_horizon = 10  # Predict next 10 steps

def create_multistep_sequences(data, seq_length, forecast_steps):
    """
    Create sequences for multi-step forecasting.
    """
    X, y = [], []
    for i in range(len(data) - seq_length - forecast_steps + 1):
        X.append(data[i:i + seq_length])
        y.append(data[i + seq_length:i + seq_length + forecast_steps])
    return np.array(X), np.array(y)

# Create multi-step sequences
X_train_multi, y_train_multi = create_multistep_sequences(train_data, lookback, forecast_horizon)
X_test_multi, y_test_multi = create_multistep_sequences(test_data, lookback, forecast_horizon)

# Reshape
X_train_multi = X_train_multi.reshape(X_train_multi.shape[0], X_train_multi.shape[1], 1)
X_test_multi = X_test_multi.reshape(X_test_multi.shape[0], X_test_multi.shape[1], 1)

print(f"Multi-step training data: {X_train_multi.shape}")
print(f"Multi-step training targets: {y_train_multi.shape}")
print(f"Each example predicts the next {forecast_horizon} time steps")

Multi-step training data: (741, 50, 1)
Multi-step training targets: (741, 10)
Each example predicts the next 10 time steps


In [18]:
# Build multi-step forecasting model
multi_model = Sequential([
    LSTM(50, activation='relu', return_sequences=True, input_shape=(lookback, 1)),
    Dropout(0.2),
    LSTM(50, activation='relu'),
    Dropout(0.2),
    Dense(forecast_horizon)  # Output layer predicts multiple steps
])

multi_model.compile(optimizer='adam', loss='mse', metrics=['mae'])

print("Multi-Step Forecasting Model:")
multi_model.summary()

# Train
print("\nTraining multi-step model...")
history_multi = multi_model.fit(
    X_train_multi, y_train_multi,
    epochs=50,
    batch_size=32,
    validation_split=0.1,
    verbose=0
)

print("Training completed!")
print(f"Final training loss: {history_multi.history['loss'][-1]:.6f}")

Multi-Step Forecasting Model:
Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 lstm_4 (LSTM)               (None, 50, 50)            10400     
                                                                 
 dropout_4 (Dropout)         (None, 50, 50)            0         
                                                                 
 lstm_5 (LSTM)               (None, 50)                20200     
                                                                 
 dropout_5 (Dropout)         (None, 50)                0         
                                                                 
 dense_2 (Dense)             (None, 10)                510       
                                                                 
Total params: 31,110
Trainable params: 31,110
Non-trainable params: 0
_________________________________________________________________

Training multi-step model...

In [19]:
# Make multi-step predictions
multi_predictions = multi_model.predict(X_test_multi[:5], verbose=0)

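# Inverse transform. The scaler was fit on a single feature, so flatten to one
# column, invert, then restore the (samples, forecast_horizon) shape.
multi_predictions_original = scaler.inverse_transform(
    multi_predictions.reshape(-1, 1)).reshape(multi_predictions.shape)
y_test_multi_original = scaler.inverse_transform(
    y_test_multi[:5].reshape(-1, 1)).reshape(-1, forecast_horizon)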

# Visualize multi-step predictions
fig, axes = plt.subplots(2, 3, figsize=(15, 8))
axes = axes.flatten()

for i in range(5):
    axes[i].plot(range(forecast_horizon), y_test_multi_original[i], 
                 'o-', label='Actual', linewidth=2, markersize=8)
    axes[i].plot(range(forecast_horizon), multi_predictions_original[i], 
                 's-', label='Predicted', linewidth=2, markersize=6)
    axes[i].set_title(f'Forecast Example {i+1}', fontweight='bold')
    axes[i].set_xlabel('Steps Ahead')
    axes[i].set_ylabel('Value')
    axes[i].legend()
    axes[i].grid(True, alpha=0.3)

# Remove extra subplot
fig.delaxes(axes[5])

plt.tight_layout()
plt.show()

<Figure size 1500x800 with 5 Axes>

## Key Takeaways

### Conceptual Insights

1. **RNNs Excel at Sequential Data**: The ability to maintain hidden states makes RNNs naturally suited for tasks involving sequences, whether text, time series, or other temporal data.

2. **Data Preparation is Critical**: Proper preprocessing, including normalization, sequence creation, and train-test splitting, significantly impacts model performance.

3. **Temperature Controls Creativity**: In text generation, the temperature parameter balances between conservative (low temperature) and creative (high temperature) outputs.

4. **LSTMs Mitigate Vanishing Gradients**: The gating mechanisms in LSTM networks make it easier to learn both short-term and long-term dependencies, greatly reducing (though not eliminating) the vanishing gradient problem of vanilla RNNs.

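5. **Multi-Step Prediction is Challenging**: Forecasting multiple time steps ahead is harder than single-step prediction because uncertainty grows with the horizon, and recursive strategies additionally accumulate error as predictions are fed back in as inputs.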

### Practical Lessons

- **Sequence Length Matters**: Choose lookback window based on the temporal scale of patterns in your data
- **Regularization Prevents Overfitting**: Dropout layers help models generalize better
- **Evaluation Must Be Temporal**: Always test on future data, not randomly sampled points
- **Start Simple**: Begin with smaller models and increase complexity only if needed
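
One rough way to pick a lookback window is to estimate the dominant period of the series from its autocorrelation. The sketch below regenerates a series with the same formula as this lesson (using a separate random generator, so values differ), removes the linear trend, and looks for the strongest autocorrelation peak. The helper `autocorr` and the `min_lag` cutoff are assumptions for illustration.

```python
import numpy as np

# Same recipe as create_time_series, but with an independent generator (assumption)
rng = np.random.default_rng(0)
time = np.arange(1000)
series = 0.02 * time + 10 * np.sin(2 * np.pi * time / 50) + rng.normal(0, 1, 1000)

# Remove the linear trend so it does not dominate the autocorrelation
slope, intercept = np.polyfit(time, series, 1)
detrended = series - (slope * time + intercept)

def autocorr(x, max_lag):
    """Autocorrelation for lags 1..max_lag (index 0 corresponds to lag 1)."""
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-lag], x[lag:]) / denom for lag in range(1, max_lag + 1)])

acf = autocorr(detrended, max_lag=100)
min_lag = 10  # skip very short lags, which are trivially correlated
dominant_period = int(np.argmax(acf[min_lag - 1:]) + min_lag)
print(f"Estimated dominant period: {dominant_period}")
```

For this synthetic series the estimate should land close to the true period of 50, which is consistent with the `lookback = 50` used earlier. On real data the peak is usually less clean and should be treated as a starting point.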

### What You Can Now Do

- Implement character-level text generation models
- Build time series forecasting systems
- Prepare sequential data for deep learning models
- Evaluate RNN performance with appropriate metrics
- Understand when RNNs are the right choice vs. other architectures

## Further Resources

### Essential Reading

1. **"Understanding LSTM Networks"** by Christopher Olah
   - URL: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
   - Excellent visual explanations of LSTM architecture

2. **"The Unreasonable Effectiveness of Recurrent Neural Networks"** by Andrej Karpathy
   - URL: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
   - Comprehensive guide to character-level RNNs with examples

3. **Keras RNN Guide**
   - URL: https://keras.io/guides/working_with_rnns/
   - Official documentation on RNN implementation in Keras

### Academic Papers

4. **"Long Short-Term Memory"** by Hochreiter & Schmidhuber (1997)
   - Original LSTM paper introducing the architecture

5. **"Sequence to Sequence Learning with Neural Networks"** by Sutskever et al. (2014)
   - Influential paper on sequence-to-sequence models

### Practical Tutorials

6. **TensorFlow Time Series Tutorial**
   - URL: https://www.tensorflow.org/tutorials/structured_data/time_series
   - Comprehensive guide to time series forecasting

7. **Deep Learning Specialization** by Andrew Ng (Coursera)
   - Week on Sequence Models covers RNNs in depth

### Datasets for Practice

8. **Project Gutenberg** (https://www.gutenberg.org/)
   - Free ebooks for text generation experiments

9. **UCI Machine Learning Repository - Time Series**
   - Various real-world time series datasets

10. **Kaggle Time Series Competitions**
    - Practice on real forecasting challenges with community solutions

## Conclusion

Congratulations on completing Day 55! You've learned how to apply Recurrent Neural Networks to solve real-world problems in both natural language processing and time series analysis. 

The skills you've gained today—from preparing sequential data to building and evaluating LSTM models—form the foundation for many advanced applications in deep learning. Whether you're interested in building chatbots, creating language models, forecasting stock prices, or analyzing sensor data, the techniques covered in this lesson are essential tools in your machine learning toolkit.

### Next Steps

As you continue your journey:
- Experiment with different RNN architectures (GRU, Bidirectional RNNs)
- Try larger and more complex datasets
- Explore attention mechanisms and Transformers (the evolution beyond RNNs)
- Apply these techniques to your own projects and data

Keep practicing and building! The best way to master RNNs is through hands-on experimentation. 🚀