We'll work through two examples:

- An LSTM for a sequence classification task using synthetic data.
- A GRU for a time series regression task (predicting future values of a sine wave).

---
**A. LSTM for Sequence Classification (Synthetic Data)**

In this example, we'll generate synthetic sequences where the class label depends on whether a specific sub-sequence pattern appears. This will demonstrate how an LSTM can learn to recognize patterns over time steps.

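Task: Classify sequences as Class 0 or Class 1. A sequence is Class 1 if it contains the sub-sequence [1, 1, 0] at any point; otherwise it is Class 0. All other elements are random 0s and 1s, so a sequence can also contain the pattern by chance and is then labeled Class 1. For length-20 sequences this happens in roughly 97% of purely random draws, so expect Class 0 to be rare and check the printed class distribution before relying on accuracy alone.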

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import seaborn as sns

# --- 1. Device Configuration ---
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# --- 2. Hyperparameters ---
sequence_length = 20  # Length of each input sequence
feature_size = 1      # Each element in the sequence is a single number (a random 0 or 1)
hidden_size_lstm = 64 # Number of features in the hidden state of LSTM
num_layers_lstm = 1   # Number of recurrent layers (stacked LSTMs)
num_classes_seq = 2   # Binary classification (pattern present or not)
num_epochs_seq = 30
batch_size_seq = 32
learning_rate_seq = 0.001

# --- 3. Generate Synthetic Sequential Data ---
print("Generating synthetic sequential data...")
def generate_sequence(seq_len, pattern=(1, 1, 0)):
    """Return (sequence, label); label is 1 if `pattern` occurs anywhere in the sequence."""
    pattern = np.asarray(pattern)
    pattern_len = len(pattern)
    sequence = np.random.randint(0, 2, seq_len)  # Fill with random 0s and 1s

    # With 50% probability, insert the pattern at a random position (if it fits)
    if np.random.rand() > 0.5 and seq_len >= pattern_len:
        start_idx = np.random.randint(0, seq_len - pattern_len + 1)
        sequence[start_idx : start_idx + pattern_len] = pattern

    # Label from the actual contents, so accidental occurrences also count as Class 1
    contains_pattern = any(
        np.array_equal(sequence[i : i + pattern_len], pattern)
        for i in range(seq_len - pattern_len + 1)
    )
    return sequence, int(contains_pattern)

num_samples_seq = 2000
sequences = []
labels = []
for _ in range(num_samples_seq):
    seq, label = generate_sequence(sequence_length)
    sequences.append(seq)
    labels.append(label)

sequences = np.array(sequences, dtype=np.float32).reshape(-1, sequence_length, feature_size)
labels = np.array(labels, dtype=np.int64)

print(f"Generated {num_samples_seq} sequences of length {sequence_length}.")
print(f"Example sequence (shape {sequences[0].shape}):\n{sequences[0].ravel()}")
print(f"Label for example: {labels[0]}")
print(f"Class distribution: Class 0: {np.sum(labels==0)}, Class 1: {np.sum(labels==1)}")


# Split data
X_train_seq, X_test_seq, y_train_seq, y_test_seq = train_test_split(
    sequences, labels, test_size=0.2, random_state=42, stratify=labels
)

# Convert to PyTorch Tensors
X_train_tensor_seq = torch.tensor(X_train_seq, dtype=torch.float32)
y_train_tensor_seq = torch.tensor(y_train_seq, dtype=torch.long) # CrossEntropyLoss expects long for labels
X_test_tensor_seq = torch.tensor(X_test_seq, dtype=torch.float32)
y_test_tensor_seq = torch.tensor(y_test_seq, dtype=torch.long)

# Create TensorDatasets and DataLoaders
train_dataset_seq = TensorDataset(X_train_tensor_seq, y_train_tensor_seq)
test_dataset_seq = TensorDataset(X_test_tensor_seq, y_test_tensor_seq)

train_loader_seq = DataLoader(dataset=train_dataset_seq, batch_size=batch_size_seq, shuffle=True)
test_loader_seq = DataLoader(dataset=test_dataset_seq, batch_size=batch_size_seq, shuffle=False)

# --- 4. Define the LSTM Model for Sequence Classification ---
class LSTMClassifier(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(LSTMClassifier, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        # LSTM layer:
        #   input_size: The number of expected features in the input x
        #   hidden_size: The number of features in the hidden state h
        #   num_layers: Number of recurrent layers.
        #   batch_first=True: Means input/output tensors are (batch, seq, feature)
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        
        # Fully connected output layer
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # Initialize hidden state and cell state with zeros on the same device as the input
        # (nn.LSTM also defaults both states to zeros when they are not passed)
        # h0 shape: (num_layers * num_directions, batch, hidden_size)
        # c0 shape: (num_layers * num_directions, batch, hidden_size)
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)

        # Forward propagate LSTM
        # out: tensor of shape (batch_size, seq_length, hidden_size) containing output features from the last layer of LSTM
        # (hn, cn): tuple of hidden state and cell state for t = seq_length
        out, _ = self.lstm(x, (h0, c0))

        # We only need the output from the last time step for classification
        # out shape: (batch_size, seq_len, hidden_size) -> out[:, -1, :] shape: (batch_size, hidden_size)
        out = self.fc(out[:, -1, :]) # Decode the hidden state of the last time step
        return out

# --- 5. Instantiate the Model, Loss, and Optimizer ---
model_lstm = LSTMClassifier(feature_size, hidden_size_lstm, num_layers_lstm, num_classes_seq).to(device)
print("\nLSTM Model Architecture:")
print(model_lstm)

criterion_seq = nn.CrossEntropyLoss()
optimizer_seq = optim.Adam(model_lstm.parameters(), lr=learning_rate_seq)

# --- 6. Training Loop ---
print("\nStarting LSTM Training...")
train_losses_seq = []
for epoch in range(num_epochs_seq):
    model_lstm.train()
    running_loss_seq = 0.0
    for i, (seq_batch, labels_batch) in enumerate(train_loader_seq):
        seq_batch = seq_batch.to(device)
        labels_batch = labels_batch.to(device)

        outputs = model_lstm(seq_batch)
        loss = criterion_seq(outputs, labels_batch)

        optimizer_seq.zero_grad()
        loss.backward()
        optimizer_seq.step()
        running_loss_seq += loss.item()
    
    epoch_loss = running_loss_seq / len(train_loader_seq)
    train_losses_seq.append(epoch_loss)
    if (epoch + 1) % 5 == 0 or epoch == num_epochs_seq - 1:
        print(f'Epoch [{epoch+1}/{num_epochs_seq}], Loss: {epoch_loss:.4f}')

print("Finished LSTM Training.")

# --- 7. Evaluation ---
print("\nStarting LSTM Evaluation...")
model_lstm.eval()
all_labels_lstm, all_predicted_lstm = [], []
with torch.no_grad():
    n_correct = 0
    n_samples = 0
    for seq_batch, labels_batch in test_loader_seq:
        seq_batch = seq_batch.to(device)
        labels_batch = labels_batch.to(device)
        outputs = model_lstm(seq_batch)
        _, predicted = torch.max(outputs.data, 1)
        n_samples += labels_batch.size(0)
        n_correct += (predicted == labels_batch).sum().item()
        all_labels_lstm.extend(labels_batch.cpu().numpy())
        all_predicted_lstm.extend(predicted.cpu().numpy())

accuracy_lstm = 100.0 * n_correct / n_samples
print(f'Accuracy of the LSTM on test sequences: {accuracy_lstm:.2f} %')

print("\nConfusion Matrix (LSTM):")
cm_lstm = confusion_matrix(all_labels_lstm, all_predicted_lstm)
sns.heatmap(cm_lstm, annot=True, fmt="d", cmap="cividis", xticklabels=["No Pattern", "Pattern"], yticklabels=["No Pattern", "Pattern"])
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix - LSTM Sequence Classification")
plt.show()

print("\nClassification Report (LSTM):")
print(classification_report(all_labels_lstm, all_predicted_lstm, target_names=["No Pattern", "Pattern"]))

# Plot training loss
plt.figure(figsize=(10,5))
plt.plot(train_losses_seq, label='Training Loss')
plt.title('LSTM Training Loss')
plt.xlabel('Epoch')
plt.ylabel('CrossEntropy Loss')
plt.legend()
plt.grid(True)
plt.show()


# Discussion: LSTM for Sequence Classification Example

This section discusses the key components and rationale behind using a Long Short-Term Memory (LSTM) network for a sequence classification task, as illustrated in the example above.

## 1. Data Generation Strategy
- **Task**: The synthetic dataset is designed to create a sequence classification problem where the label is contingent on identifying a specific temporal pattern or sub-sequence (e.g., `[1,1,0]`) within a longer sequence.
- **Purpose**: This type of data forces the LSTM to learn and remember information over time steps, which is the core strength of Recurrent Neural Networks (RNNs) and LSTMs. It's not just about individual elements but their order and relationship.
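
To make the labeling rule concrete, here is a minimal sketch of a sub-sequence check. The helper name `contains_pattern` is hypothetical (it is not part of the example above), and it assumes NumPy 1.20 or newer for `sliding_window_view`. It shows that the label depends on the order of elements, not just on which values are present.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def contains_pattern(sequence, pattern=(1, 1, 0)):
    """Hypothetical helper: True if `pattern` occurs as a contiguous run in `sequence`."""
    sequence = np.asarray(sequence)
    pattern = np.asarray(pattern)
    if sequence.size < pattern.size:
        return False
    # Each row of `windows` is one contiguous slice of length len(pattern).
    windows = sliding_window_view(sequence, pattern.size)
    return bool((windows == pattern).all(axis=1).any())


print(contains_pattern([0, 1, 1, 0, 1]))  # True: [1, 1, 0] starts at index 1
print(contains_pattern([1, 0, 1, 0, 1]))  # False: same values, different order
```

A model that only counted 0s and 1s could not separate these two sequences, which is why the task needs some form of memory across time steps.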

## 2. Input Shape for `nn.LSTM`
- **Requirement**: The `nn.LSTM` layer in PyTorch, when `batch_first=True`, expects input tensors to have a specific shape: `(batch_size, sequence_length, feature_size)`.
    - `batch_size`: The number of sequences processed in parallel.
    - `sequence_length`: The number of time steps in each sequence.
    - `feature_size`: The dimensionality of the input features at each time step.
- **Example Adaptation**: In the described scenario, if each element in the sequence is a single number, `feature_size` would be 1. The raw sequences are reshaped to meet this `(batch_size, seq_len, 1)` format before being fed into the LSTM.
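
The reshaping step can be sketched as follows. The sizes here (batch of 4, hidden size 8) are illustrative assumptions, not the values used in the example.

```python
import numpy as np
import torch
import torch.nn as nn

batch_size, seq_len, feature_size = 4, 20, 1  # illustrative sizes

# Raw data: one scalar per time step, shape (batch_size, seq_len).
raw = np.random.randint(0, 2, size=(batch_size, seq_len)).astype(np.float32)

# Add a trailing feature axis so the shape becomes (batch_size, seq_len, 1).
x = torch.from_numpy(raw).unsqueeze(-1)

lstm = nn.LSTM(input_size=feature_size, hidden_size=8, batch_first=True)
output, (h_n, c_n) = lstm(x)

print(x.shape)       # torch.Size([4, 20, 1])
print(output.shape)  # torch.Size([4, 20, 8])
print(h_n.shape)     # torch.Size([1, 4, 8])
```

Passing the 2-D `(batch_size, seq_len)` tensor without the feature axis would be interpreted differently (as an unbatched sequence), so the explicit `unsqueeze(-1)` or `reshape` matters.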

## 3. The `nn.LSTM` Layer
- **Core Component**: `nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)` is the heart of the sequence processing model.
- **Key Parameters**:
    - `input_size`: The number of expected features in the input `x` at each time step (dimensionality of each element in the sequence). In the example, this is 1.
    - `hidden_size`: The number of features in the hidden state `h` (and also the cell state `c`). This determines the capacity of the LSTM to store information.
    - `num_layers (int, optional)`: The number of recurrent layers. E.g., setting `num_layers=2` would mean stacking two LSTMs together to form a "stacked LSTM," with the second LSTM taking in outputs of the first LSTM and computing the final results. This can allow for learning more complex temporal hierarchies (default is 1).
    - `batch_first (bool, optional)`: If `True`, then the input and output tensors are provided as `(batch, seq, feature)` instead of `(seq, batch, feature)`. `True` is often more intuitive (default is `False`).
    - `dropout (float, optional)`: If non-zero, introduces a Dropout layer on the outputs of each LSTM layer except the last layer, with dropout probability equal to `dropout`.
    - `bidirectional (bool, optional)`: If `True`, becomes a bidirectional LSTM (processes sequence from start-to-end and end-to-start).
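
As a rough illustration of how these parameters interact, the sketch below builds a stacked bidirectional LSTM with arbitrary sizes (assumptions chosen for the demo) and inspects the resulting tensor and weight shapes.

```python
import torch
import torch.nn as nn

# Illustrative sizes only; they are not taken from the example above.
lstm = nn.LSTM(input_size=5, hidden_size=16, num_layers=2,
               batch_first=True, dropout=0.2, bidirectional=True)

x = torch.randn(3, 10, 5)  # (batch, seq, feature) because batch_first=True
output, (h_n, c_n) = lstm(x)

# Output concatenates both directions of the last layer: 2 * hidden_size features.
print(output.shape)  # torch.Size([3, 10, 32])

# One final state per layer and direction: num_layers * 2 entries.
print(h_n.shape)     # torch.Size([4, 3, 16])

# The four gates are stacked in one matrix, hence 4 * hidden_size rows.
print(lstm.weight_ih_l0.shape)  # torch.Size([64, 5])

# The second layer reads the concatenated directions of the first layer.
print(lstm.weight_ih_l1.shape)  # torch.Size([64, 32])
```

The `dropout=0.2` setting here only acts between the two stacked layers, and only while the module is in training mode.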

## 4. Initialization of Hidden and Cell States (`h0`, `c0`)
- **Requirement**: LSTMs (and other RNNs) require an initial hidden state (`h_0`) and, for LSTMs specifically, an initial cell state (`c_0`) at the beginning of processing each sequence.
- **Common Practice**: For each batch, these initial states are typically initialized to tensors of zeros.
    - Shape of `h_0`: `(num_layers * num_directions, batch_size, hidden_size)`
    - Shape of `c_0`: `(num_layers * num_directions, batch_size, hidden_size)`
- **Action**: Before processing a new batch of sequences, new zero tensors for `h_0` and `c_0` are created on the same device (CPU/GPU) as the input and passed to the LSTM layer along with the batch. PyTorch's `nn.LSTM` also defaults both states to zeros when they are omitted, so passing them explicitly is optional.
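
The following sketch, with illustrative sizes, creates the zero states on the input's device and checks that the result matches what `nn.LSTM` produces when the states are omitted (in which case PyTorch uses zeros).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden_size, num_layers = 8, 1  # illustrative sizes
lstm = nn.LSTM(input_size=1, hidden_size=hidden_size,
               num_layers=num_layers, batch_first=True)
x = torch.randn(4, 20, 1)

# Explicit zero states, created on the same device and dtype as the input.
h0 = torch.zeros(num_layers, x.size(0), hidden_size, device=x.device, dtype=x.dtype)
c0 = torch.zeros_like(h0)

out_explicit, _ = lstm(x, (h0, c0))
out_default, _ = lstm(x)  # initial states omitted

print(torch.allclose(out_explicit, out_default))  # True
```

Using `device=x.device` instead of a global `device` variable keeps the module usable even if the model is later moved to a different device.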

## 5. Using the LSTM Output for Classification
- **LSTM Output Structure**: The `nn.LSTM` layer returns:
    1.  `output`: A tensor containing the output features (`h_t`) from the last layer of the LSTM, for each time step. If `batch_first=True`, its shape is `(batch_size, sequence_length, num_directions * hidden_size)`.
    2.  `(h_n, c_n)`: A tuple containing the final hidden state and final cell state for each element in the batch.
        - `h_n` shape: `(num_layers * num_directions, batch_size, hidden_size)`
        - `c_n` shape: `(num_layers * num_directions, batch_size, hidden_size)`
- **Strategy for Sequence Classification**: A common approach is to use the hidden state of the LSTM from the *last time step* as a summary or encoding of the entire sequence.
    - This can be obtained either from `output[:, -1, :]` (the last layer's output at the last time step; `output` never includes the lower layers of a stacked LSTM) or from `h_n` (by selecting the last layer's final hidden state, `h_n[-1]` for a unidirectional LSTM).
    - For a unidirectional LSTM, `output[:, -1, :]` equals `h_n[-1]`; with a single layer this is the same as `h_n.squeeze(0)`.
- **Final Classification**: This last hidden state vector (e.g., `output[:, -1, :]`) is then fed into one or more fully connected (`nn.Linear`) layers that produce raw class scores (logits). With `nn.CrossEntropyLoss`, as in the example, the model returns logits directly because the loss applies log-softmax internally. A binary model with a single output logit would instead pair with `nn.BCEWithLogitsLoss`, which applies the sigmoid internally.
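
A minimal sketch of this strategy, assuming a unidirectional two-layer LSTM with illustrative sizes and made-up labels, is shown below. It checks that the last time step of `output` matches `h_n[-1]` and then computes a loss from raw logits.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=1, hidden_size=8, num_layers=2, batch_first=True)
fc = nn.Linear(8, 2)  # hidden_size -> num_classes

x = torch.randn(4, 20, 1)
labels = torch.tensor([0, 1, 1, 0])  # made-up labels for the demo

output, (h_n, c_n) = lstm(x)
last_step = output[:, -1, :]  # (batch, hidden_size), last layer only

# For a unidirectional LSTM the two views of the final hidden state agree.
print(torch.allclose(last_step, h_n[-1]))  # True

logits = fc(last_step)                        # raw scores, no softmax applied
loss = nn.CrossEntropyLoss()(logits, labels)  # log-softmax happens inside the loss
predicted = logits.argmax(dim=1)              # predicted class per sequence
print(logits.shape, predicted.shape)          # torch.Size([4, 2]) torch.Size([4])
```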

## 6. Evaluation Metrics
- **Standard Approach**: For sequence classification tasks, standard classification metrics are used, similar to those in image or tabular data classification.
- **Examples**:
    - **Accuracy**: The proportion of correctly classified sequences.
    - **Precision, Recall, F1-score**: Especially useful for imbalanced datasets.
    - **Confusion Matrix**: To visualize performance across different classes.
    - **Loss Value**: The value from the chosen loss function (e.g., `nn.BCEWithLogitsLoss` for binary, `nn.CrossEntropyLoss` for multi-class) is also a key indicator.
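
As a small worked example of these metrics, the sketch below uses six made-up labels and predictions (not results from the model above) and computes them with scikit-learn.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

# Made-up labels and predictions, for illustration only.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

acc = accuracy_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)  # rows: actual class, columns: predicted class
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary"   # scores for the positive class (label 1)
)

print(f"accuracy  = {acc:.3f}")        # 4 of 6 correct -> 0.667
print(cm.tolist())                     # [[2, 1], [1, 2]]
print(f"precision = {precision:.3f}")  # 2 of 3 predicted positives are correct -> 0.667
print(f"recall    = {recall:.3f}")     # 2 of 3 actual positives are found -> 0.667
```

When one class is rare, accuracy can look high even if that class is never predicted, which is why the per-class precision and recall are worth reading alongside it.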

## Summary
This example using an LSTM for sequence classification demonstrates how to leverage the memory capabilities of LSTMs to understand temporal dependencies in data. Key aspects include correctly shaping the input, initializing hidden/cell states, and strategically using the LSTM's output (often the final hidden state) as a feature representation for a subsequent classification layer. The synthetic data generation method ensures the task genuinely requires learning sequential patterns.

---

**A. More Complex LSTM for Sequence Classification**

- Task: We'll generate sequences where each element is a small vector. The classification task will be to determine if a sequence contains a specific "trigger" pattern followed by an "action" pattern within a certain window, making it a more complex temporal dependency problem than just detecting a single sub-sequence.

- Complexity Additions:
    - Input features per time step are vectors.
    - More complex sequential pattern for classification.
    - Stacked Bidirectional LSTM.
    - Dropout for regularization.
    - Validation loop during training.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split  # used for the train/val/test split below
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import seaborn as sns

# --- 1. Device Configuration ---
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# --- 2. Hyperparameters ---
sequence_length = 30
feature_size = 5     # Each element in the sequence is a 5-dim vector
hidden_size_lstm = 128
num_layers_lstm = 2  # Stacked LSTM
num_classes_seq = 2  # Binary: specific complex pattern present or not
num_epochs_seq = 40
batch_size_seq = 64
learning_rate_seq = 0.001
dropout_lstm = 0.4

# --- 3. Generate Complex Synthetic Sequential Data ---
print("Generating complex synthetic sequential data...")
# Define patterns: trigger_pattern followed by action_pattern within a window
trigger_pattern = np.array([[1,0,0,0,0], [0,1,0,0,0]], dtype=np.float32) # Example 2-step trigger
action_pattern = np.array([[0,0,1,0,0], [0,0,0,1,0]], dtype=np.float32)  # Example 2-step action
max_gap_between_patterns = 5 # Max steps between end of trigger and start of action

def generate_complex_sequence(seq_len, feat_size, trigger, action, max_gap):
    sequence = np.random.rand(seq_len, feat_size).astype(np.float32) * 0.5 # Background noise
    label = 0 # Default: pattern not present

    if np.random.rand() > 0.5: # 50% chance to attempt inserting the complex pattern
        trigger_len = trigger.shape[0]
        action_len = action.shape[0]
        
        if seq_len >= trigger_len + action_len + max_gap: # Ensure enough space for trigger, largest gap, and action
            # Place the trigger early enough that the action always fits after it
            trigger_start = np.random.randint(0, seq_len - trigger_len - action_len - max_gap + 1)
            sequence[trigger_start : trigger_start + trigger_len] = trigger
            
            # Try to place action after trigger within the gap
            action_start_min = trigger_start + trigger_len
            action_start_max = min(seq_len - action_len, trigger_start + trigger_len + max_gap)
            
            if action_start_min <= action_start_max :
                action_start = np.random.randint(action_start_min, action_start_max + 1)
                if action_start + action_len <= seq_len:
                    sequence[action_start : action_start + action_len] = action
                    label = 1 # Pattern successfully inserted
    return sequence, label

num_samples_seq = 3000
sequences, labels = [], []
for _ in range(num_samples_seq):
    seq, label = generate_complex_sequence(sequence_length, feature_size, trigger_pattern, action_pattern, max_gap_between_patterns)
    sequences.append(seq)
    labels.append(label)

sequences = np.array(sequences, dtype=np.float32)
labels = np.array(labels, dtype=np.int64)

print(f"Generated {num_samples_seq} sequences. Feature size per step: {feature_size}")
print(f"Class distribution: Class 0: {np.sum(labels==0)}, Class 1: {np.sum(labels==1)}")

# Split data into training, validation, and test sets
X_train_val, X_test_seq, y_train_val, y_test_seq = train_test_split(
    sequences, labels, test_size=0.2, random_state=42, stratify=labels
)
X_train_seq, X_val_seq, y_train_seq, y_val_seq = train_test_split(
    X_train_val, y_train_val, test_size=0.15, random_state=42, stratify=y_train_val # 0.15 of 0.8 = 0.12
)

# Convert to PyTorch Tensors
X_train_tensor_seq = torch.tensor(X_train_seq, dtype=torch.float32)
y_train_tensor_seq = torch.tensor(y_train_seq, dtype=torch.long)
X_val_tensor_seq = torch.tensor(X_val_seq, dtype=torch.float32)
y_val_tensor_seq = torch.tensor(y_val_seq, dtype=torch.long)
X_test_tensor_seq = torch.tensor(X_test_seq, dtype=torch.float32)
y_test_tensor_seq = torch.tensor(y_test_seq, dtype=torch.long)

# Create TensorDatasets and DataLoaders
train_dataset_seq = TensorDataset(X_train_tensor_seq, y_train_tensor_seq)
val_dataset_seq = TensorDataset(X_val_tensor_seq, y_val_tensor_seq)
test_dataset_seq = TensorDataset(X_test_tensor_seq, y_test_tensor_seq)

train_loader_seq = DataLoader(dataset=train_dataset_seq, batch_size=batch_size_seq, shuffle=True)
val_loader_seq = DataLoader(dataset=val_dataset_seq, batch_size=batch_size_seq, shuffle=False)
test_loader_seq = DataLoader(dataset=test_dataset_seq, batch_size=batch_size_seq, shuffle=False)

# --- 4. Define the Complex LSTM Model ---
class ComplexLSTMClassifier(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes, dropout_prob):
        super(ComplexLSTMClassifier, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers,
                            batch_first=True, dropout=dropout_prob if num_layers > 1 else 0,
                            bidirectional=True) # Bidirectional LSTM
        
        # Adjust linear layer input size for bidirectional LSTM
        # Output of bidirectional LSTM is (batch, seq, 2 * hidden_size)
        # We typically take the concatenation of the last hidden states from forward and backward passes
        self.fc = nn.Linear(hidden_size * 2, num_classes) # Multiply hidden_size by 2
        self.dropout = nn.Dropout(dropout_prob)

    def forward(self, x):
        # h0 and c0 shape: (num_layers * num_directions, batch, hidden_size)
        # num_directions is 2 for bidirectional; create the states on the input's device
        h0 = torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size, device=x.device)
        c0 = torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size, device=x.device)

        out, (hn, _) = self.lstm(x, (h0, c0))

        # For bidirectional, out is (batch, seq_len, hidden_size * 2).
        # Note that out[:, -1, :] pairs the forward state after the whole sequence with the
        # backward state after only the last time step, so it is not a full two-way summary.
        # hn holds the final state of each direction: hn[-2] is the last layer's forward
        # direction and hn[-1] is the last layer's backward direction (which has read the
        # sequence from the last time step back to the first).
        summary = torch.cat((hn[-2], hn[-1]), dim=1)  # (batch, hidden_size * 2)
        out = self.dropout(summary)
        out = self.fc(out)
        return out

# --- 5. Instantiate the Model, Loss, and Optimizer ---
model_lstm_complex = ComplexLSTMClassifier(feature_size, hidden_size_lstm, num_layers_lstm, num_classes_seq, dropout_lstm).to(device)
print("\nComplex LSTM Model Architecture:")
print(model_lstm_complex)

criterion_seq_complex = nn.CrossEntropyLoss()
optimizer_seq_complex = optim.Adam(model_lstm_complex.parameters(), lr=learning_rate_seq)

# --- 6. Training Loop with Validation ---
print("\nStarting Complex LSTM Training...")
train_losses_c_lstm, val_losses_c_lstm = [], []
train_accs_c_lstm, val_accs_c_lstm = [], []

for epoch in range(num_epochs_seq):
    model_lstm_complex.train()
    running_train_loss, n_correct_train, n_samples_train = 0.0, 0, 0
    for seq_batch, labels_batch in train_loader_seq:
        seq_batch, labels_batch = seq_batch.to(device), labels_batch.to(device)
        outputs = model_lstm_complex(seq_batch)
        loss = criterion_seq_complex(outputs, labels_batch)
        optimizer_seq_complex.zero_grad()
        loss.backward()
        optimizer_seq_complex.step()
        running_train_loss += loss.item() * seq_batch.size(0)
        _, predicted = torch.max(outputs.data, 1)
        n_samples_train += labels_batch.size(0)
        n_correct_train += (predicted == labels_batch).sum().item()
    
    epoch_train_loss = running_train_loss / n_samples_train
    epoch_train_acc = 100.0 * n_correct_train / n_samples_train
    train_losses_c_lstm.append(epoch_train_loss)
    train_accs_c_lstm.append(epoch_train_acc)

    model_lstm_complex.eval()
    running_val_loss, n_correct_val, n_samples_val = 0.0, 0, 0
    with torch.no_grad():
        for seq_batch_val, labels_batch_val in val_loader_seq:
            seq_batch_val, labels_batch_val = seq_batch_val.to(device), labels_batch_val.to(device)
            outputs_val = model_lstm_complex(seq_batch_val)
            loss_val = criterion_seq_complex(outputs_val, labels_batch_val)
            running_val_loss += loss_val.item() * seq_batch_val.size(0)
            _, predicted_val = torch.max(outputs_val.data, 1)
            n_samples_val += labels_batch_val.size(0)
            n_correct_val += (predicted_val == labels_batch_val).sum().item()
            
    epoch_val_loss = running_val_loss / n_samples_val
    epoch_val_acc = 100.0 * n_correct_val / n_samples_val
    val_losses_c_lstm.append(epoch_val_loss)
    val_accs_c_lstm.append(epoch_val_acc)
    
    print(f'Epoch [{epoch+1}/{num_epochs_seq}], Train Loss: {epoch_train_loss:.4f}, Train Acc: {epoch_train_acc:.2f}%, '
          f'Val Loss: {epoch_val_loss:.4f}, Val Acc: {epoch_val_acc:.2f}%')

print("Finished Complex LSTM Training.")

# --- 7. Evaluation on Test Set ---
# (Similar to previous examples, calculate accuracy, confusion matrix, classification report)
model_lstm_complex.eval()
all_labels_lstm_c, all_predicted_lstm_c = [], []
with torch.no_grad():
    n_correct_test, n_samples_test = 0, 0
    for sb, lb in test_loader_seq:
        sb, lb = sb.to(device), lb.to(device)
        outs = model_lstm_complex(sb)
        _, p = torch.max(outs.data, 1)
        n_samples_test += lb.size(0)
        n_correct_test += (p == lb).sum().item()
        all_labels_lstm_c.extend(lb.cpu().numpy())
        all_predicted_lstm_c.extend(p.cpu().numpy())
accuracy_lstm_c = 100.0 * n_correct_test / n_samples_test
print(f'\nAccuracy of Complex LSTM on test sequences: {accuracy_lstm_c:.2f} %')
# Plotting and detailed reports can be added here as in previous examples

# --- 8. Plot Training and Validation Loss and Accuracy ---
epochs_range = range(1, num_epochs_seq + 1)
plt.figure(figsize=(14, 5))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, train_losses_c_lstm, 'bo-', label='Training Loss')
plt.plot(epochs_range, val_losses_c_lstm, 'ro-', label='Validation Loss')
plt.title('Complex LSTM: Training & Validation Loss')
plt.xlabel('Epochs'); plt.ylabel('Loss'); plt.legend(); plt.grid(True)
plt.subplot(1, 2, 2)
plt.plot(epochs_range, train_accs_c_lstm, 'bs-', label='Training Accuracy')
plt.plot(epochs_range, val_accs_c_lstm, 'rs-', label='Validation Accuracy')
plt.title('Complex LSTM: Training & Validation Accuracy')
plt.xlabel('Epochs'); plt.ylabel('Accuracy (%)'); plt.legend(); plt.grid(True)
plt.tight_layout(); plt.show()



# Discussion: Complex LSTM for Sequence Classification Example

This section discusses the key components and enhancements in a more complex Long Short-Term Memory (LSTM) network designed for sequence classification, particularly when dealing with multi-featured sequences and intricate temporal dependencies.

## 1. Data Characteristics
- **Multi-Feature Time Steps**: Unlike simpler examples where each time step might have a single feature, the input sequences here consist of multiple features at each time step. This means `input_size` for the LSTM will be greater than 1.
- **Complex Temporal Relationships**: The classification task is designed to depend on more sophisticated temporal patterns. For instance, the label might be determined by the occurrence of a specific sequence of patterns (e.g., "pattern A" must be followed by "pattern B" with some variable delay or intervening elements). This requires the LSTM to learn longer-range and more nuanced dependencies.
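
To make the "pattern A followed by pattern B within a window" rule concrete, here is a sketch of a checker. The helper names `find_pattern` and `has_trigger_then_action` are hypothetical and not part of the example; the patterns, noise range, and gap of 5 mirror the values used in the code above.

```python
import numpy as np


def find_pattern(seq, pattern):
    """Hypothetical helper: start indices where the rows of `pattern` match `seq` exactly."""
    k = pattern.shape[0]
    return [t for t in range(seq.shape[0] - k + 1)
            if np.array_equal(seq[t:t + k], pattern)]


def has_trigger_then_action(seq, trigger, action, max_gap):
    """True if an action starts 0..max_gap steps after the end of some trigger."""
    action_starts = find_pattern(seq, action)
    for t_start in find_pattern(seq, trigger):
        t_end = t_start + trigger.shape[0]
        if any(t_end <= a <= t_end + max_gap for a in action_starts):
            return True
    return False


trigger = np.array([[1, 0, 0, 0, 0], [0, 1, 0, 0, 0]], dtype=np.float32)
action = np.array([[0, 0, 1, 0, 0], [0, 0, 0, 1, 0]], dtype=np.float32)

rng = np.random.default_rng(0)
background = rng.random((30, 5)).astype(np.float32) * 0.5  # noise stays below 1

near = background.copy()
near[3:5], near[7:9] = trigger, action    # gap of 2 steps: inside the window
far = background.copy()
far[3:5], far[15:17] = trigger, action    # gap of 10 steps: outside the window

print(has_trigger_then_action(near, trigger, action, max_gap=5))  # True
print(has_trigger_then_action(far, trigger, action, max_gap=5))   # False
```

Both sequences contain the trigger and the action, so the label depends on their relative timing rather than on their presence alone.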

## 2. Model Architecture (`ComplexLSTMClassifier`)
- **Bidirectional LSTM (`nn.LSTM(..., bidirectional=True)`)**:
    - **Mechanism**: A key enhancement is the use of a bidirectional LSTM. This involves two separate LSTMs:
        1.  One processes the input sequence in the forward direction (from the first time step to the last).
        2.  The other processes the input sequence in the backward direction (from the last time step to the first).
    - **Output Concatenation**: The hidden states (outputs) from the forward LSTM at each time step are typically concatenated with the hidden states from the backward LSTM at the corresponding time step.
    - **Benefit**: This allows the LSTM at any given time step to have information from both past and future contexts within the sequence. This can be very powerful for tasks where understanding the full context around a particular point is crucial for classification.
    - **Impact on Subsequent Layers**: If the LSTM's `hidden_size` is `H`, the concatenated output at each time step has dimensionality `2 * H`, so the `in_features` of the subsequent `nn.Linear` layer must also be `2 * H`. When a single summary vector is needed, note that the backward LSTM finishes at the first time step, so its final state sits at `output[:, 0, H:]` (and in `h_n`), not at the last time step.

- **Dropout within `nn.LSTM` (`dropout` parameter)**:
    - **Functionality**: If `num_layers > 1` (i.e., a stacked LSTM) and the `dropout` parameter in the `nn.LSTM` constructor is set to a value greater than 0, dropout layers are automatically added on the outputs of each LSTM layer *except the last one*.
    - **Purpose**: This helps to regularize the connections between stacked LSTM layers, preventing them from becoming too co-dependent and improving generalization.

- **Additional `nn.Dropout` Layer**:
    - **Placement**: A standard `nn.Dropout` layer is typically added *after* the LSTM layers (or after concatenating outputs from a bidirectional LSTM) and *before* the final fully connected (`nn.Linear`) classification layer.
    - **Purpose**: This provides further regularization to the features that are fed into the classifier, reducing the risk of overfitting on the training data.
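
The sketch below, with illustrative sizes, shows where the final states of the two directions live in a stacked bidirectional LSTM and one way to combine them before a dropout-plus-linear head. The variable names are assumptions made for the demo.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
H = 16  # illustrative hidden size
lstm = nn.LSTM(input_size=5, hidden_size=H, num_layers=2,
               batch_first=True, dropout=0.4, bidirectional=True)
head = nn.Sequential(nn.Dropout(0.4), nn.Linear(2 * H, 2))

# Evaluation mode disables dropout, which keeps the comparison deterministic.
lstm.eval()
head.eval()

x = torch.randn(3, 30, 5)
with torch.no_grad():
    output, (h_n, _) = lstm(x)

forward_final = h_n[-2]   # last layer, forward direction (read first -> last step)
backward_final = h_n[-1]  # last layer, backward direction (read last -> first step)

# The forward direction finishes at the last time step of `output` ...
print(torch.allclose(output[:, -1, :H], forward_final))  # True
# ... while the backward direction finishes at the first time step.
print(torch.allclose(output[:, 0, H:], backward_final))  # True

# Concatenate both final states into one (batch, 2 * H) summary vector.
summary = torch.cat((forward_final, backward_final), dim=1)
with torch.no_grad():
    logits = head(summary)
print(logits.shape)  # torch.Size([3, 2])
```

Taking `output[:, -1, :]` instead would keep the same `2 * H` width, but its backward half has only seen the final time step.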

## 3. Training Process Enhancements
- **Validation Loop**:
    - **Integration**: The training process incorporates a validation loop that is executed after each training epoch.
    - **Mechanism**:
        1.  The model is switched to evaluation mode (`model.eval()`). This is critical as it disables dropout layers and ensures layers like Batch Normalization (if used) use their learned running statistics.
        2.  Performance metrics (e.g., loss, accuracy) are computed on a separate validation dataset that is never used for weight updates.
        3.  All computations within the validation loop are performed under `with torch.no_grad():` to disable gradient tracking, saving memory and computation.
    - **Benefits**:
        - **Overfitting Detection**: By comparing training performance with validation performance, one can identify if the model is starting to overfit (e.g., training loss decreases while validation loss increases, or training accuracy improves while validation accuracy stagnates/drops).
        - **Hyperparameter Tuning**: Validation performance guides the tuning of hyperparameters.
        - **Early Stopping Potential**: Although not explicitly implemented in the base example, the validation metrics provide the signal needed for early stopping criteria (i.e., stopping training when validation performance no longer improves or starts to degrade, to prevent overfitting and save computational resources).
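
The validation steps and the early-stopping idea can be sketched as below. The `evaluate` function and `EarlyStopper` class are hypothetical helpers, not part of the example above, and the validation losses fed to the stopper are simulated values chosen for illustration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, criterion, device):
    """Hypothetical helper: mean loss and accuracy (%) without gradient tracking."""
    model.eval()  # disables dropout for the evaluation pass
    total_loss, n_correct, n_samples = 0.0, 0, 0
    with torch.no_grad():
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            logits = model(inputs)
            total_loss += criterion(logits, targets).item() * targets.size(0)
            n_correct += (logits.argmax(dim=1) == targets).sum().item()
            n_samples += targets.size(0)
    return total_loss / n_samples, 100.0 * n_correct / n_samples


class EarlyStopper:
    """Hypothetical helper: signal a stop after `patience` epochs without improvement."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Simulated validation losses: improvement stops after the second epoch.
stopper = EarlyStopper(patience=2)
stopped_at = None
for epoch, val_loss in enumerate([1.00, 0.90, 0.95, 0.97, 0.96]):
    if stopper.step(val_loss):
        stopped_at = epoch
        break
print(stopped_at)  # 3 (two consecutive epochs without improvement)

# Quick check of evaluate() on a tiny stand-in model and random data.
torch.manual_seed(0)
model = nn.Sequential(nn.Dropout(0.5), nn.Linear(10, 2))
data = TensorDataset(torch.randn(32, 10), torch.randint(0, 2, (32,)))
val_loss, val_acc = evaluate(model, DataLoader(data, batch_size=8),
                             nn.CrossEntropyLoss(), torch.device("cpu"))
```

In a real training loop, `model.train()` must be called again at the start of the next epoch, since `evaluate` leaves the model in evaluation mode.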

## Summary
This "Complex LSTM" example demonstrates how to build more powerful sequence classifiers by incorporating bidirectionality for richer contextual understanding and various dropout techniques for robust regularization. The inclusion of a validation loop is a critical best practice for developing reliable models, allowing for ongoing performance monitoring on unseen data and strategies to combat overfitting. These enhancements make the model better suited for challenging sequence classification tasks with multi-featured inputs and complex temporal dependencies.