## How LSTMs process sequential data

## Objective
The objective of generating this synthetic dataset is to:

- Create a small, manageable time series dataset for our LSTM example
- Have 10 samples with 3 features each to visualize how inputs flow through an LSTM
- Use this data to track cell states and hidden states at each time step
- Build intuition about how LSTM components work by tracing concrete numbers through each gate

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [None]:
# Set random seed for reproducibility
np.random.seed(42)

# Generate time series data: 10 time steps, 3 features
n_samples = 10
n_features = 3

# Generate 3 input features:
# - Temperature: values between 15-25°C
# - Humidity: values between 40-80%
# - Wind speed: values between 0-15 km/h

temperature = np.array([18.2, 19.5, 20.1, 22.4, 23.8, 25.0, 23.2, 21.5, 19.8, 17.5])
humidity = np.array([65.2, 62.8, 58.5, 55.0, 45.2, 42.1, 48.5, 52.3, 60.5, 67.8])
wind_speed = np.array([5.2, 6.8, 8.5, 10.2, 12.5, 14.8, 13.2, 11.5, 9.2, 6.5])

In [None]:
# Create input features array
X = np.column_stack((temperature, humidity, wind_speed))

# Generate target variable: power consumption in kWh
# This is a function of the features (with some noise)
# Higher temperatures, lower humidity, and higher wind speeds lead to higher power consumption
y = 2.5 * temperature - 0.5 * humidity + 1.2 * wind_speed + np.random.normal(0, 5, n_samples)

print("Input data (X) shape:", X.shape)
print("Target data (y) shape:", y.shape)

print("\nFirst 3 samples of input data:")
print(X[:3])
print("\nFirst 3 samples of target data:")
print(y[:3])

print("\nComplete dataset:")
for i in range(n_samples):
    print(f"Day {i+1}: Temp={X[i,0]:.1f}°C, Humidity={X[i,1]:.1f}%, Wind={X[i,2]:.1f}km/h → Power={y[i]:.1f}kWh")

In [None]:
X

In [None]:
data = pd.DataFrame(np.concatenate([X, y.reshape(-1, 1)], axis = 1),
             columns = ['Temp', "Humidity", "WindSpeed", "PowerConsumption"])

In [None]:
data

## The First Step in LSTM

The first step in setting up an LSTM is to define its architecture and initialize its states:

### Determine the LSTM architecture dimensions:

1. **Input dimension (n_features)**: The number of features in your input data (3 in our example - temperature, humidity, wind speed)
2. **Hidden state dimension (n_hidden)**: The number of LSTM units/neurons in your layer
3. **Output dimension:** Depends on your prediction task (1 for our power consumption example)


### Initialize the weight matrices and biases:

- Weight matrices for each gate (forget, input, cell, output gates)
- Bias vectors for each gate


### Initialize the initial states:

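- **Cell state (C₀)**: Usually initialized as zeros, with shape [batch_size, n_hidden]
- **Hidden state (h₀)**: Usually initialized as zeros, with shape [batch_size, n_hidden]

In this notebook we process one sequence at a time, so both states are plain vectors of shape [n_hidden].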
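
For reference, the weights, biases, and states listed above enter the standard LSTM cell update roughly as follows. This is a sketch in the notation of this notebook's variable names (`W_f`, `b_f`, and so on); $\sigma$ is the sigmoid, $\odot$ is the element-wise product, $[x_t; h_{t-1}]$ is the concatenation of the current input with the previous hidden state, and $T$ is the last time step of a sequence.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[x_t; h_{t-1}] + b_f\right) && \text{forget gate} \\
i_t &= \sigma\left(W_i\,[x_t; h_{t-1}] + b_i\right) && \text{input gate} \\
\tilde{C}_t &= \tanh\left(W_c\,[x_t; h_{t-1}] + b_c\right) && \text{cell candidate} \\
o_t &= \sigma\left(W_o\,[x_t; h_{t-1}] + b_o\right) && \text{output gate} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{cell state update} \\
h_t &= o_t \odot \tanh(C_t) && \text{hidden state update} \\
\hat{y} &= W_y\,h_T + b_y && \text{prediction from the final hidden state}
\end{aligned}
```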


Here is how this looks in code for our example:

In [None]:
# Set random seed for reproducibility
np.random.seed(42)

# Define LSTM dimensions
n_features = 3            # Temperature, humidity, wind speed
n_hidden = 4              # Number of LSTM units
n_output = 1              # Power consumption

# Initialize weight matrices and biases
# For each gate, weights have shape [n_hidden, n_features + n_hidden]
# Biases have shape [n_hidden]

# Forget gate weights and bias
W_f = np.random.randn(n_hidden, n_features + n_hidden) * 0.01
b_f = np.zeros(n_hidden)

# Input gate weights and bias
W_i = np.random.randn(n_hidden, n_features + n_hidden) * 0.01
b_i = np.zeros(n_hidden)

# Cell candidate weights and bias
W_c = np.random.randn(n_hidden, n_features + n_hidden) * 0.01
b_c = np.zeros(n_hidden)

# Output gate weights and bias
W_o = np.random.randn(n_hidden, n_features + n_hidden) * 0.01
b_o = np.zeros(n_hidden)

# Output layer weights and bias
W_y = np.random.randn(n_output, n_hidden) * 0.01
b_y = np.zeros(n_output)

# Initialize initial states
h_0 = np.zeros(n_hidden)  # Initial hidden state
C_0 = np.zeros(n_hidden)  # Initial cell state

print("LSTM Architecture Configuration:")
print(f"Input features: {n_features}")
print(f"Hidden units: {n_hidden}")
print(f"Output dimension: {n_output}")
print("\nWeight matrix shapes:")
print(f"Forget gate (W_f): {W_f.shape}")
print(f"Input gate (W_i): {W_i.shape}")
print(f"Cell candidate (W_c): {W_c.shape}")
print(f"Output gate (W_o): {W_o.shape}")
print(f"Output layer (W_y): {W_y.shape}")
print("\nInitial states:")
print(f"Hidden state (h_0): {h_0}")
print(f"Cell state (C_0): {C_0}")
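
As a quick sanity check on the shapes printed above, the number of trainable parameters can be counted by hand. The sketch below assumes the same sizes as this example (3 features, 4 hidden units, 1 output) and that each of the four gates has one weight matrix and one bias vector, as in the initialization cell.

```python
n_features, n_hidden, n_output = 3, 4, 1  # same sizes as the example

# Each gate: a [n_hidden, n_features + n_hidden] matrix plus a [n_hidden] bias.
per_gate = n_hidden * (n_features + n_hidden) + n_hidden   # 4*7 + 4 = 32
lstm_params = 4 * per_gate                                  # forget, input, cell, output
dense_params = n_output * n_hidden + n_output               # W_y and b_y

print("Parameters per gate:", per_gate)
print("LSTM layer parameters:", lstm_params)
print("Output layer parameters:", dense_params)
print("Total:", lstm_params + dense_params)
```

Even this tiny cell has 128 LSTM parameters, because the four gates each see both the input and the previous hidden state.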

## Implementing Using Lookback of 2

With a lookback of 2, each prediction is based on the two preceding time steps: the model reads two consecutive days and predicts the target for the day that follows them. This changes how we structure the input data.

Reshape the input data: we need to reorganize our data into sequences of length 2.

In [None]:
def create_sequences(X, y, lookback=2):
    """Create sequences where we predict the step AFTER the lookback window."""
    X_seq, y_seq = [], []
    for i in range(len(X) - lookback):
        X_seq.append(X[i:i+lookback])  # Sequence of lookback steps
        y_seq.append(y[i+lookback])    # Target is the step after the sequence
    return np.array(X_seq), np.array(y_seq)

X_sequences, y_sequences = create_sequences(X, y, lookback=2)

print("Original X shape:", X.shape)
print("Sequence X shape:", X_sequences.shape)  # Should be (8, 2, 3)
print("Sequence y shape:", y_sequences.shape)  # Should be (8,)

In [None]:
X

In [None]:
X_sequences

In [None]:
y_sequences

### Forward Pass Through LSTM with Lookback=2

Next, we need to process each sequence through the LSTM. Since we have a lookback of 2, each sequence will involve two LSTM cell updates:

In [None]:
# Define activation functions
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

# Let's take the first sequence as our example
first_sequence = X_sequences[0]   # Weather data from days 1 and 2
target = y_sequences[0]           # Power consumption on day 3

print("Processing sequence:")
print(first_sequence)
print("Target:", target)

# Initialize states (these are at t=0, before we process any inputs)
h_t = np.zeros(n_hidden)  # Initial hidden state
C_t = np.zeros(n_hidden)  # Initial cell state

In [None]:
print("Initial states:")
print("h_0:", h_t)
print("C_0:", C_t)

In [None]:
# Process each time step in the sequence
for t in range(len(first_sequence)):
    x_t = first_sequence[t]  # Current input at time step t
    
    # Step 1: Concatenate input with previous hidden state
    combined = np.concatenate([x_t, h_t])
    
    # Step 2: Calculate forget gate
    f_t = sigmoid(np.dot(W_f, combined) + b_f)
    
    # Step 3: Calculate input gate
    i_t = sigmoid(np.dot(W_i, combined) + b_i)
    
    # Step 4: Calculate cell candidate
    c_tilde = tanh(np.dot(W_c, combined) + b_c)
    
    # Step 5: Calculate output gate
    o_t = sigmoid(np.dot(W_o, combined) + b_o)
    
    # Step 6: Update cell state
    C_t_prev = C_t.copy()          # Save previous cell state for visualization
    C_t = f_t * C_t + i_t * c_tilde
    
    # Step 7: Update hidden state
    h_t_prev = h_t.copy()  # Save previous hidden state for visualization
    h_t = o_t * tanh(C_t)
    
    print(f"\n--- Time step {t+1} (Day {t+1}) ---")
    print(f"Input x_{t+1}:", x_t)
    print(f"Forget gate f_{t+1}:", f_t)
    print(f"Input gate i_{t+1}:", i_t)
    print(f"Cell candidate c_tilde_{t+1}:", c_tilde)
    print(f"Output gate o_{t+1}:", o_t)
    print(f"Cell state C_{t+1}:", C_t)
    print("Change in cell state:", C_t - C_t_prev)
    print(f"Hidden state h_{t+1}:", h_t)
    print("Change in hidden state:", h_t - h_t_prev)

# Make prediction using the final hidden state
y_pred = np.dot(W_y, h_t) + b_y

print("\n--- Final Prediction ---")
print("Predicted power consumption:", y_pred[0])
print("Actual power consumption:", target)
print("Prediction error:", y_pred[0] - target)
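
The seven steps in the loop above can also be packaged into a single function, which keeps later cells shorter. The following is a minimal sketch, not the notebook's own implementation: `lstm_step` and the `params` dictionary are hypothetical names, and the weights are drawn from a fresh random generator, so the printed values will differ from the cell above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, C_prev, params):
    """One LSTM cell update; `params` is a hypothetical dict of W_*/b_* arrays."""
    combined = np.concatenate([x_t, h_prev])                     # [x_t; h_{t-1}]
    f_t = sigmoid(params["W_f"] @ combined + params["b_f"])      # forget gate
    i_t = sigmoid(params["W_i"] @ combined + params["b_i"])      # input gate
    c_tilde = np.tanh(params["W_c"] @ combined + params["b_c"])  # cell candidate
    o_t = sigmoid(params["W_o"] @ combined + params["b_o"])      # output gate
    C_t = f_t * C_prev + i_t * c_tilde                           # new cell state
    h_t = o_t * np.tanh(C_t)                                     # new hidden state
    return h_t, C_t

rng = np.random.default_rng(0)
n_features, n_hidden = 3, 4
params = {}
for gate in "fico":
    params[f"W_{gate}"] = rng.standard_normal((n_hidden, n_features + n_hidden)) * 0.01
    params[f"b_{gate}"] = np.zeros(n_hidden)

sequence = np.array([[18.2, 65.2, 5.2], [19.5, 62.8, 6.8]])  # days 1 and 2
h_t, C_t = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in sequence:
    h_t, C_t = lstm_step(x_t, h_t, C_t, params)

print("h_2:", h_t)
print("C_2:", C_t)
```

Because the output gate lies strictly between 0 and 1, each entry of the hidden state is always smaller in magnitude than `tanh` of the corresponding cell state entry.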

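## State Values Extraction
Next, we extract the final states after processing the first sequence and carry them over as the initial states for the second sequence. This is a stateful setup; a common alternative is to reset both states to zero before each sequence. Note that because the windows overlap, Day 2 is fed to the cell again at the start of the second sequence.
The cell below reruns the forward pass for the first sequence and prints the h_t and C_t values at the end: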

In [None]:
# Initialize states once at the beginning
h_t = np.zeros(n_hidden)  # Shape: [4]
C_t = np.zeros(n_hidden)  # Shape: [4]

print("Initial states before processing first sequence:")
print("h_0:", h_t)
print("C_0:", C_t)

# Process the first sequence
first_sequence = X_sequences[0]  # Weather data from days 1 and 2

for t in range(len(first_sequence)):
    x_t = first_sequence[t]
    
    # Concatenate input with previous hidden state
    combined = np.concatenate([x_t, h_t])
    
    # Calculate gate activations
    f_t = sigmoid(np.dot(W_f, combined) + b_f)
    i_t = sigmoid(np.dot(W_i, combined) + b_i)
    c_tilde = tanh(np.dot(W_c, combined) + b_c)
    o_t = sigmoid(np.dot(W_o, combined) + b_o)
    
    # Update cell state
    C_t = f_t * C_t + i_t * c_tilde
    
    # Update hidden state
    h_t = o_t * tanh(C_t)
    
    print(f"\nAfter processing Day {t+1}:")
    print(f"Cell state C_{t+1}:", C_t)
    print(f"Hidden state h_{t+1}:", h_t)

print("\nFinal states after processing first sequence (Days 1-2):")
print("Final cell state (C_2):", C_t)
print("Final hidden state (h_2):", h_t)
print("\nThese values will be used as initial states for the second sequence (Days 2-3).")

# These final values would be used as initial states for the second sequence
# second_sequence = X_sequences[1]  # Days 2-3
# The initial states for this sequence would be the final C_t and h_t from above
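
To see what carrying the states over actually changes, the sketch below runs the second window (Days 2-3) twice: once starting from the states left by the first window, and once starting from zeros. It is an illustration under assumptions: `run_sequence` is a hypothetical helper, the four gate matrices are stacked into one array `W` for brevity, and the weights come from a fresh random generator rather than the ones defined earlier.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def run_sequence(seq, h, C, W, b):
    """Run one window through an LSTM cell; W and b stack the f, i, c, o gates."""
    n_hidden = h.shape[0]
    for x_t in seq:
        z = W @ np.concatenate([x_t, h]) + b
        f = sigmoid(z[:n_hidden])
        i = sigmoid(z[n_hidden:2 * n_hidden])
        g = np.tanh(z[2 * n_hidden:3 * n_hidden])
        o = sigmoid(z[3 * n_hidden:])
        C = f * C + i * g
        h = o * np.tanh(C)
    return h, C

rng = np.random.default_rng(42)
n_features, n_hidden = 3, 4
W = rng.standard_normal((4 * n_hidden, n_features + n_hidden)) * 0.01
b = np.zeros(4 * n_hidden)

days = np.array([[18.2, 65.2, 5.2], [19.5, 62.8, 6.8], [20.1, 58.5, 8.5]])
windows = [days[0:2], days[1:3]]  # Days 1-2 and Days 2-3

# Stateful: carry h and C from the first window into the second.
h, C = np.zeros(n_hidden), np.zeros(n_hidden)
for w in windows:
    h, C = run_sequence(w, h, C, W, b)
h_stateful = h

# Stateless: reset both states to zero before the second window.
h_stateless, _ = run_sequence(windows[1], np.zeros(n_hidden), np.zeros(n_hidden), W, b)

print("Stateful  h after window 2:", h_stateful)
print("Stateless h after window 2:", h_stateless)
```

The two results differ because the stateful run has already seen Days 1 and 2 before the second window begins. Which behavior is appropriate depends on whether consecutive windows are meant to be treated as one continuous stream.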

---

## Full Code with Formatted Output 

In [None]:
import numpy as np
from tabulate import tabulate

# Set random seed for reproducibility
np.random.seed(42)

# Generate our sample data
temperature = np.array([18.2, 19.5, 20.1, 22.4, 23.8, 25.0, 23.2, 21.5, 19.8, 17.5])
humidity = np.array([65.2, 62.8, 58.5, 55.0, 45.2, 42.1, 48.5, 52.3, 60.5, 67.8])
wind_speed = np.array([5.2, 6.8, 8.5, 10.2, 12.5, 14.8, 13.2, 11.5, 9.2, 6.5])
X = np.column_stack((temperature, humidity, wind_speed))

# Target: power consumption (kWh)
y = 2.5 * temperature - 0.5 * humidity + 1.2 * wind_speed + np.random.normal(0, 5, 10)

# Create sequences with lookback of 2 (predict the step after the sequence)
def create_sequences(X, y, lookback=2):
    X_seq, y_seq = [], []
    for i in range(len(X) - lookback):
        X_seq.append(X[i:i+lookback])
        y_seq.append(y[i+lookback])
    return np.array(X_seq), np.array(y_seq)

X_sequences, y_sequences = create_sequences(X, y, lookback=2)

# Define LSTM dimensions
n_features = 3  # Temperature, humidity, wind speed
n_hidden = 4    # Number of LSTM units
n_output = 1    # Power consumption prediction

# Initialize weight matrices and biases
# For simplicity, we'll use small random values
W_f = np.random.randn(n_hidden, n_features + n_hidden) * 0.01
b_f = np.zeros(n_hidden)

W_i = np.random.randn(n_hidden, n_features + n_hidden) * 0.01
b_i = np.zeros(n_hidden)

W_c = np.random.randn(n_hidden, n_features + n_hidden) * 0.01
b_c = np.zeros(n_hidden)

W_o = np.random.randn(n_hidden, n_features + n_hidden) * 0.01
b_o = np.zeros(n_hidden)

W_y = np.random.randn(n_output, n_hidden) * 0.01
b_y = np.zeros(n_output)

# Define activation functions
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

# Process all sequences with state persistence
# Initialize states once at the beginning
h_t = np.zeros(n_hidden)
C_t = np.zeros(n_hidden)

print("=" * 80)
print("LSTM FORWARD PASS WITH STATE PERSISTENCE ACROSS SEQUENCES")
print("=" * 80)
print(f"Processing {len(X_sequences)} sequences with lookback=2")
print(f"Initial states: h_0={h_t}, C_0={C_t}")
print("=" * 80)

predictions = []
all_states = []

for seq_idx, sequence in enumerate(X_sequences):
    print(f"\n{'=' * 30} SEQUENCE {seq_idx+1} {'=' * 30}")
    print(f"Days {seq_idx+1}-{seq_idx+2} → Predicting Day {seq_idx+3}")
    
    # Store states for this sequence
    sequence_states = []
    
    # Process each time step in the sequence
    for t in range(len(sequence)):
        x_t = sequence[t]
        day_num = seq_idx + t + 1
        
        # Concatenate input with previous hidden state
        combined = np.concatenate([x_t, h_t])
        
        # Calculate gate activations
        f_t = sigmoid(np.dot(W_f, combined) + b_f)
        i_t = sigmoid(np.dot(W_i, combined) + b_i)
        c_tilde = tanh(np.dot(W_c, combined) + b_c)
        o_t = sigmoid(np.dot(W_o, combined) + b_o)
        
        # Update cell state
        C_t_prev = C_t.copy()
        C_t = f_t * C_t + i_t * c_tilde
        
        # Update hidden state
        h_t_prev = h_t.copy()
        h_t = o_t * tanh(C_t)
        
        # Store states
        sequence_states.append({
            'day': day_num,
            'input': x_t,
            'forget_gate': f_t,
            'input_gate': i_t,
            'cell_candidate': c_tilde,
            'output_gate': o_t,
            'cell_state': C_t.copy(),
            'hidden_state': h_t.copy()
        })
        
        # Print gate values and states for this time step
        print(f"\n--- Time step {t+1} (Day {day_num}) ---")
        print("Input Features:")
        input_table = [
            ["Temperature", f"{x_t[0]:.2f}°C"],
            ["Humidity", f"{x_t[1]:.2f}%"],
            ["Wind Speed", f"{x_t[2]:.2f} km/h"]
        ]
        print(tabulate(input_table, headers=["Feature", "Value"], tablefmt="grid"))
        
        print("\nGate Values:")
        gates_table = [
            ["Forget Gate (f_t)", f"{f_t}"],
            ["Input Gate (i_t)", f"{i_t}"],
            ["Cell Candidate (c_tilde)", f"{c_tilde}"],
            ["Output Gate (o_t)", f"{o_t}"]
        ]
        print(tabulate(gates_table, headers=["Gate", "Values"], tablefmt="grid"))
        
        print("\nState Updates:")
        state_table = [
            ["Previous Cell State", f"{C_t_prev}"],
            ["New Cell State", f"{C_t}"],
            ["Change in Cell State", f"{C_t - C_t_prev}"],
            ["Previous Hidden State", f"{h_t_prev}"],
            ["New Hidden State", f"{h_t}"],
            ["Change in Hidden State", f"{h_t - h_t_prev}"]
        ]
        print(tabulate(state_table, headers=["State", "Values"], tablefmt="grid"))
    
    # Make prediction for this sequence
    y_pred = np.dot(W_y, h_t) + b_y
    predictions.append(y_pred[0])
    
    # Print prediction vs actual
    print(f"\n--- Prediction for Day {seq_idx+3} ---")
    pred_table = [
        ["Predicted Power Consumption", f"{y_pred[0]:.2f} kWh"],
        ["Actual Power Consumption", f"{y_sequences[seq_idx]:.2f} kWh"],
        ["Prediction Error", f"{y_pred[0] - y_sequences[seq_idx]:.2f} kWh"]
    ]
    print(tabulate(pred_table, headers=["Metric", "Value"], tablefmt="grid"))
    
    # Save states for this sequence
    all_states.append(sequence_states)
    
    print(f"\nFinal states after sequence {seq_idx+1}:")
    final_state_table = [
        ["Cell State (C_t)", f"{C_t}"],
        ["Hidden State (h_t)", f"{h_t}"]
    ]
    print(tabulate(final_state_table, headers=["State", "Values"], tablefmt="grid"))
    
    if seq_idx < len(X_sequences) - 1:
        print("\n→ These states will be used as initial states for the next sequence")

# Print overall prediction performance
print("\n" + "=" * 80)
print("OVERALL PREDICTION PERFORMANCE")
print("=" * 80)
mse = np.mean((np.array(predictions) - y_sequences) ** 2)
mae = np.mean(np.abs(np.array(predictions) - y_sequences))

performance_table = [
    ["Mean Squared Error", f"{mse:.4f}"],
    ["Mean Absolute Error", f"{mae:.4f}"]
]
print(tabulate(performance_table, headers=["Metric", "Value"], tablefmt="grid"))

# Print a summary of how states evolved across all sequences
print("\n" + "=" * 80)
print("SUMMARY OF STATE EVOLUTION ACROSS ALL SEQUENCES")
print("=" * 80)

# Extract cell state and hidden state norms for each time step
days = []
cell_state_norms = []
hidden_state_norms = []

for seq_idx, sequence_states in enumerate(all_states):
    for state in sequence_states:
        days.append(state['day'])
        cell_state_norms.append(np.linalg.norm(state['cell_state']))
        hidden_state_norms.append(np.linalg.norm(state['hidden_state']))

state_evolution = []
for i in range(len(days)):
    state_evolution.append([
        days[i],
        f"{cell_state_norms[i]:.4f}",
        f"{hidden_state_norms[i]:.4f}"
    ])

print(tabulate(state_evolution, 
               headers=["Day", "Cell State Norm", "Hidden State Norm"], 
               tablefmt="grid"))

print("\nNote: The norm values show how the 'magnitude' of the states changes over time")
print("Larger norms mean the state vectors sit further from zero; this is only a rough proxy for how much they store")

In [None]:
# Set random seed for reproducibility
np.random.seed(42)

# Generate data: 10 samples, each with 3 features
n_samples = 10
n_features = 3

# Method 1: Generate random data
X_random = np.random.randn(n_samples, n_features)

# Method 2: Generate structured data (sine wave with noise + trend + random feature)
t = np.linspace(0, 4*np.pi, n_samples)
feature1 = np.sin(t) + np.random.normal(0, 0.1, n_samples)  # Sine wave with noise
feature2 = 0.1 * t + np.random.normal(0, 0.1, n_samples)    # Linear trend with noise
feature3 = np.random.normal(0, 0.5, n_samples)              # Random noise

# Combine features into a dataset
X_structured = np.column_stack((feature1, feature2, feature3))

In [None]:
X_structured

In [None]:
# Let's visualize our data
plt.figure(figsize=(12, 6))

plt.subplot(2, 1, 1)
plt.title("Random Data (3 Features)")
for i in range(n_features):
    plt.plot(X_random[:, i], label=f'Feature {i+1}')
plt.legend()
plt.grid(True)

plt.subplot(2, 1, 2)
plt.title("Structured Data (Sine Wave, Trend, Random)")
for i in range(n_features):
    plt.plot(X_structured[:, i], label=f'Feature {i+1}')
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.show()

print("Random data shape:", X_random.shape)
print("Structured data shape:", X_structured.shape)
print("\nRandom data first 3 samples:")
print(X_random[:3])
print("\nStructured data first 3 samples:")
print(X_structured[:3])