# Transformer Classifier for Time Series

This notebook demonstrates how to use the `TransformerClassifier` from the transformer.py module for time series classification. Transformers have become a cornerstone of modern deep learning, originally designed for natural language processing but now widely applied to various domains including time series analysis.

In [1]:
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from sklearn.preprocessing import StandardScaler
import numpy as np
from utils.set_seed import set_seed
from utils.load_data import load_and_split_data
from models.transformer import TransformerClassifier

# Set seed for reproducibility
set_seed(42)

## Introduction to Transformers for Time Series

Transformers use a self-attention mechanism that allows each point in a sequence to attend to all other points, capturing long-range dependencies more effectively than traditional recurrent models like LSTMs. Here's why transformers are powerful for time series analysis:

1. **Parallel Processing**: Unlike RNNs, transformers process the entire sequence in parallel rather than step-by-step, enabling faster training.
2. **Global Context**: Self-attention captures relationships between any two points in the sequence regardless of their distance.
3. **Positional Encoding**: Since transformers don't inherently understand sequence order, positional encodings are added to maintain temporal information.
4. **Multi-Head Attention**: This allows the model to focus on different aspects of the input sequence simultaneously.

In this notebook, we'll use the `TransformerClassifier` to predict classes from time series data.

## A Simple Example

🧠 Step 1: Understanding Transformers in PyTorch

PyTorch's `nn.TransformerEncoder` expects input of shape:
```
(batch_size, seq_len, d_model)
```
where `d_model` is the embedding dimension.

The transformer processes the entire sequence at once, with self-attention allowing each position to attend to all positions in the sequence.

In [2]:
# 🛠 Step 2: Creating a Simple Dataset
# Example sequence
data = np.array([i for i in range(1, 101)], dtype=np.float32)  # [1, 2, ..., 100]

# Sequence parameters
seq_length = 5
X = []
Y = []

for i in range(len(data) - seq_length):
    X.append(data[i:i+seq_length])
    Y.append(data[i+seq_length])

X = torch.tensor(X).unsqueeze(-1)  # Shape: (num_samples, seq_len, 1)
Y = torch.tensor(Y).unsqueeze(-1)  # Shape: (num_samples, 1)

  X = torch.tensor(X).unsqueeze(-1)  # Shape: (num_samples, seq_len, 1)


In [3]:
# 🧱 Step 3: Defining a Simple Transformer Model
class SimpleTransformer(nn.Module):
    def __init__(self, input_size=1, d_model=64, nhead=4, num_layers=2, output_size=1):
        super(SimpleTransformer, self).__init__()
        
        # Input projection
        self.input_proj = nn.Linear(input_size, d_model)
        
        # Positional encoding
        self.pos_encoder = nn.ModuleList([
            nn.Linear(d_model, d_model) for _ in range(seq_length)
        ])
        
        # Transformer encoder layers
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model,
            nhead=nhead,
            dim_feedforward=128,
            batch_first=True
        )
        self.transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        
        # Output layer
        self.output_layer = nn.Linear(d_model, output_size)
        
    def forward(self, x):
        # Project input to d_model dimension
        x = self.input_proj(x)  # [batch_size, seq_len, d_model]
        
        # Apply transformer encoder
        x = self.transformer_encoder(x)  # [batch_size, seq_len, d_model]
        
        # Use the last sequence element for prediction
        x = x[:, -1, :]  # [batch_size, d_model]
        
        # Output projection
        x = self.output_layer(x)  # [batch_size, output_size]
        return x

In [4]:
# 🏋️ Step 4: Training the Simple Model
# Initialize model, loss, optimizer
model = SimpleTransformer()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Training loop
num_epochs = 100

for epoch in range(num_epochs):
    # Forward pass
    outputs = model(X)
    loss = criterion(outputs, Y)
    
    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

Epoch [10/100], Loss: 2642.2563
Epoch [20/100], Loss: 1904.6705
Epoch [30/100], Loss: 1227.4397
Epoch [40/100], Loss: 823.4940
Epoch [50/100], Loss: 753.5926
Epoch [60/100], Loss: 728.4941
Epoch [40/100], Loss: 823.4940
Epoch [50/100], Loss: 753.5926
Epoch [60/100], Loss: 728.4941
Epoch [70/100], Loss: 753.8328
Epoch [80/100], Loss: 753.2046
Epoch [90/100], Loss: 755.1202
Epoch [70/100], Loss: 753.8328
Epoch [80/100], Loss: 753.2046
Epoch [90/100], Loss: 755.1202
Epoch [100/100], Loss: 754.7064
Epoch [100/100], Loss: 754.7064


In [5]:
# 🔮 Step 5: Making Predictions
# Predict the next value for a new sequence
with torch.no_grad():
    test_seq = torch.tensor([[96, 97, 98, 99, 100]], dtype=torch.float32).unsqueeze(-1)
    prediction = model(test_seq)
    print(f"Predicted next number: {prediction.item():.2f}")

Predicted next number: 53.93


## Using TransformerClassifier with SSA Data

Now let's apply the TransformerClassifier from our models module to the Stochastic Simulation Algorithm (SSA) time series data.

📦 Step 1: Data Preprocessing

Let's load the mRNA trajectories data, standardize it, and reshape it for the transformer model.

In [6]:
# Load SSA data
output_file = 'data/mRNA_trajectories_example.csv'
X_train, X_val, X_test, y_train, y_val, y_test = load_and_split_data(output_file, split_val_size=0.2)

# Standardize the data (important for transformer models)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)

# Reshape input for transformer: [batch_size, seq_len, features]
# In this case, each time step has a single feature
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
X_val = X_val.reshape((X_val.shape[0], X_val.shape[1], 1))
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))

print(f"X_train shape: {X_train.shape}")
print(f"y_train shape: {y_train.shape}")

X_train shape: (256, 144, 1)
y_train shape: (256,)


🧱 Step 2: Convert to PyTorch Tensors and Dataloaders

In [7]:
# Convert to PyTorch tensors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.long)
X_val_tensor = torch.tensor(X_val, dtype=torch.float32)
y_val_tensor = torch.tensor(y_val, dtype=torch.long)

# Create datasets and loaders
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
val_dataset = TensorDataset(X_val_tensor, y_val_tensor)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=64)

🧠 Step 3: Initialize and Train TransformerClassifier

In [8]:
# Model hyperparameters
input_size = X_train.shape[2]  # Number of features per time step (1 in our case)
d_model = 64                   # Embedding dimension
nhead = 4                      # Number of attention heads
num_layers = 3                 # Number of transformer layers
output_size = len(np.unique(y_train))  # Number of classes
dropout_rate = 0.2
learning_rate = 0.001

# Initialize the model
model = TransformerClassifier(
    input_size=input_size,
    d_model=d_model,
    nhead=nhead,
    num_layers=num_layers,
    output_size=output_size,
    dropout_rate=dropout_rate,
    learning_rate=learning_rate,
    use_conv1d=True  # Optional: use Conv1D preprocessing
)

# Train the model
history = model.train_model(
    train_loader,
    val_loader=val_loader,
    epochs=50,
    patience=10,
    # save_path='best_transformer_model.pt'  # Uncomment to save the best model
)

🔄 Using device: cuda (1 GPUs available)
DEBUG: Optimizer initialized? True
✅ Running on CUDA!


  return self._call_impl(*args, **kwargs)


Epoch [1/50], Loss: 0.6937, Train Acc: 0.4922
Validation Acc: 0.5000
Epoch [2/50], Loss: 0.6807, Train Acc: 0.5664
Validation Acc: 0.6562
Epoch [3/50], Loss: 0.6641, Train Acc: 0.6445
Validation Acc: 0.6094
No improvement (1/10).
Epoch [4/50], Loss: 0.6397, Train Acc: 0.7227
Validation Acc: 0.6719
Epoch [5/50], Loss: 0.6241, Train Acc: 0.7148
Validation Acc: 0.6094
No improvement (1/10).
Epoch [6/50], Loss: 0.6155, Train Acc: 0.7500
Validation Acc: 0.6406
No improvement (2/10).
Epoch [7/50], Loss: 0.6121, Train Acc: 0.7344
Validation Acc: 0.6406
No improvement (3/10).
Epoch [8/50], Loss: 0.6113, Train Acc: 0.7266
Validation Acc: 0.6406
No improvement (4/10).
Epoch [9/50], Loss: 0.6160, Train Acc: 0.7109
Validation Acc: 0.6406
No improvement (5/10).
Epoch [5/50], Loss: 0.6241, Train Acc: 0.7148
Validation Acc: 0.6094
No improvement (1/10).
Epoch [6/50], Loss: 0.6155, Train Acc: 0.7500
Validation Acc: 0.6406
No improvement (2/10).
Epoch [7/50], Loss: 0.6121, Train Acc: 0.7344
Validation 

🔮 Step 4: Evaluate on Test Set

In [9]:
# Prepare test data
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.long)

test_dataset = TensorDataset(X_test_tensor, y_test_tensor)
test_loader = DataLoader(test_dataset, batch_size=64)

# Evaluate
test_acc = model.evaluate(test_loader)
print(f"✅ Test accuracy: {test_acc:.4f}")

✅ Test accuracy: 0.7750


## Complete End-to-End Example

Let's put everything together into a single workflow:

In [10]:
# Load and process data
output_file = 'data/mRNA_trajectories_example.csv'
X_train, X_val, X_test, y_train, y_val, y_test = load_and_split_data(output_file, split_val_size=0.2)

# Standardize the data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)

# Reshape input for transformer: [batch_size, seq_len, features]
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
X_val = X_val.reshape((X_val.shape[0], X_val.shape[1], 1))
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))

# Convert to PyTorch tensors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.long)
X_val_tensor = torch.tensor(X_val, dtype=torch.float32)
y_val_tensor = torch.tensor(y_val, dtype=torch.long)

# Create datasets and loaders
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
val_dataset = TensorDataset(X_val_tensor, y_val_tensor)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=64)

# Initialize the transformer model
input_size = X_train.shape[2]
d_model = 128
nhead = 8
num_layers = 4
output_size = len(np.unique(y_train))
dropout_rate = 0.3
learning_rate = 0.001

model = TransformerClassifier(
    input_size=input_size,
    d_model=d_model,
    nhead=nhead,
    num_layers=num_layers,
    output_size=output_size,
    dropout_rate=dropout_rate,
    learning_rate=learning_rate,
    use_conv1d=True,
    use_auxiliary=True  # Use auxiliary task for better learning
)

# Train the model
history = model.train_model(
    train_loader,
    val_loader=val_loader,
    epochs=100,
    patience=15,
    save_path='best_transformer_model.pt'
)

# Evaluate on test set
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.long)

test_dataset = TensorDataset(X_test_tensor, y_test_tensor)
test_loader = DataLoader(test_dataset, batch_size=64)

# Evaluate the model
test_acc = model.evaluate(test_loader)
print(f"✅ Test accuracy: {test_acc:.4f}")

🔄 Using device: cuda (1 GPUs available)
DEBUG: Optimizer initialized? True
✅ Running on CUDA!


  return self._call_impl(*args, **kwargs)


Epoch [1/100], Loss: 17.9435, Train Acc: 0.5078
Validation Acc: 0.5000
✅ Model saved at best_transformer_model.pt (Best Validation Acc: 0.5000)
Epoch [2/100], Loss: 9.1000, Train Acc: 0.5234
Validation Acc: 0.5000
No improvement (1/15).
Epoch [3/100], Loss: 7.8625, Train Acc: 0.5273
Validation Acc: 0.5000
No improvement (2/15).
Epoch [4/100], Loss: 5.2047, Train Acc: 0.5391
Validation Acc: 0.6562
✅ Model saved at best_transformer_model.pt (Best Validation Acc: 0.6562)
Epoch [3/100], Loss: 7.8625, Train Acc: 0.5273
Validation Acc: 0.5000
No improvement (2/15).
Epoch [4/100], Loss: 5.2047, Train Acc: 0.5391
Validation Acc: 0.6562
✅ Model saved at best_transformer_model.pt (Best Validation Acc: 0.6562)
Epoch [5/100], Loss: 3.5383, Train Acc: 0.5703
Validation Acc: 0.6719
✅ Model saved at best_transformer_model.pt (Best Validation Acc: 0.6719)
Epoch [6/100], Loss: 2.7280, Train Acc: 0.6641
Validation Acc: 0.6875
✅ Model saved at best_transformer_model.pt (Best Validation Acc: 0.6875)
Epoch

## Advanced Features

The `TransformerClassifier` class has several advanced features that can improve performance:

1. **Conv1D Preprocessing**: Setting `use_conv1d=True` adds convolutional layers before the transformer to extract local features.

2. **Auxiliary Task Learning**: Setting `use_auxiliary=True` adds an auxiliary regression task that helps the model learn better representations.

3. **Different Optimizers**: You can choose between 'Adam', 'SGD', and 'AdamW' optimizers.

Let's explore these features:

In [11]:
# Example of using different configurations
def train_and_evaluate(use_conv1d=False, use_auxiliary=False, optimizer='Adam'):
    # Initialize the model with specified configuration
    model = TransformerClassifier(
        input_size=input_size,
        d_model=64,
        nhead=4,
        num_layers=2,
        output_size=output_size,
        dropout_rate=0.2,
        learning_rate=0.001,
        optimizer=optimizer,
        use_conv1d=use_conv1d,
        use_auxiliary=use_auxiliary
    )
    
    # Train the model
    history = model.train_model(
        train_loader,
        val_loader=val_loader,
        epochs=30,
        patience=5
    )
    
    # Evaluate on test set
    test_acc = model.evaluate(test_loader)
    return test_acc

# Try different configurations
print(f"Basic Transformer: {train_and_evaluate(use_conv1d=False, use_auxiliary=False):.4f}")
print(f"With Conv1D: {train_and_evaluate(use_conv1d=True, use_auxiliary=False):.4f}")
print(f"With Auxiliary: {train_and_evaluate(use_conv1d=False, use_auxiliary=True):.4f}")
print(f"With Both: {train_and_evaluate(use_conv1d=True, use_auxiliary=True):.4f}")

🔄 Using device: cuda (1 GPUs available)
DEBUG: Optimizer initialized? True
✅ Running on CUDA!
Epoch [1/30], Loss: 0.6799, Train Acc: 0.6172
Validation Acc: 0.6094
Epoch [2/30], Loss: 0.6627, Train Acc: 0.7031
Validation Acc: 0.6406
Epoch [3/30], Loss: 0.6552, Train Acc: 0.7266
Validation Acc: 0.6406
No improvement (1/5).
Epoch [4/30], Loss: 0.6416, Train Acc: 0.7266
Validation Acc: 0.6406
No improvement (2/5).
Epoch [5/30], Loss: 0.6327, Train Acc: 0.7461
Validation Acc: 0.6875
Epoch [6/30], Loss: 0.6259, Train Acc: 0.7656
Validation Acc: 0.6719
No improvement (1/5).
Epoch [7/30], Loss: 0.6147, Train Acc: 0.7500
Validation Acc: 0.6719
No improvement (2/5).
Epoch [8/30], Loss: 0.6056, Train Acc: 0.7383
Validation Acc: 0.6719
No improvement (3/5).
Epoch [9/30], Loss: 0.6047, Train Acc: 0.7695
Validation Acc: 0.6719
No improvement (4/5).
Epoch [10/30], Loss: 0.6083, Train Acc: 0.7578
Validation Acc: 0.6719
No improvement (5/5).
Stopping early! No improvement for 5 epochs.
Training complet

## Comparing with LSTM Classifier

Let's compare the performance of the transformer classifier with the LSTM classifier we used previously:

In [12]:
from models.lstm import LSTMClassifier

# Set up LSTM model
lstm_model = LSTMClassifier(
    input_size=input_size,
    hidden_size=64,
    num_layers=2,
    output_size=output_size,
    dropout_rate=0.3,
    learning_rate=0.001
)

# Train LSTM model
lstm_history = lstm_model.train_model(
    train_loader,
    val_loader=val_loader,
    epochs=50,
    patience=10
)

# Evaluate LSTM model
lstm_acc = lstm_model.evaluate(test_loader)
print(f"LSTM Test accuracy: {lstm_acc:.4f}")
print(f"Transformer Test accuracy: {test_acc:.4f}")

# Print comparison
comparison = "Transformer better" if test_acc > lstm_acc else "LSTM better"
print(f"Comparison: {comparison} by {abs(test_acc - lstm_acc):.4f}")

🔄 Using device: cuda (1 GPUs available)
DEBUG: Optimizer initialized? True
✅ Running on CUDA!
Epoch [1/50], Loss: 0.6934, Train Acc: 0.4961
Validation Acc: 0.5000
Epoch [2/50], Loss: 0.6931, Train Acc: 0.5195
Validation Acc: 0.5000
No improvement (1/10).
Epoch [3/50], Loss: 0.6930, Train Acc: 0.5117
Validation Acc: 0.5000
No improvement (2/10).
Epoch [4/50], Loss: 0.6931, Train Acc: 0.5000
Validation Acc: 0.4844
No improvement (3/10).
Epoch [5/50], Loss: 0.6932, Train Acc: 0.4492
Validation Acc: 0.4844
No improvement (4/10).
Epoch [6/50], Loss: 0.6927, Train Acc: 0.5117
Validation Acc: 0.4844
No improvement (5/10).
Epoch [7/50], Loss: 0.6918, Train Acc: 0.4922
Validation Acc: 0.5156
Epoch [8/50], Loss: 0.6901, Train Acc: 0.5273
Validation Acc: 0.5000
No improvement (1/10).
Epoch [9/50], Loss: 0.6885, Train Acc: 0.5117
Validation Acc: 0.5000
No improvement (2/10).
Epoch [10/50], Loss: 0.6752, Train Acc: 0.6133
Validation Acc: 0.5000
No improvement (3/10).
Epoch [11/50], Loss: 0.6615, Tr

## Conclusion

In this notebook, we've explored using the `TransformerClassifier` for time series classification tasks. Transformers offer several advantages for time series analysis:

1. They can capture long-range dependencies in the data through self-attention.
2. They process sequences in parallel, potentially leading to faster training.
3. The multi-head attention mechanism allows the model to focus on different aspects of the input.

Key settings that can improve transformer performance for time series:
- Using an appropriate number of attention heads (usually 4-8 heads works well)
- Adding Conv1D preprocessing to capture local patterns
- Using auxiliary tasks for more robust feature learning
- Proper standardization of the input data

Whether transformers outperform LSTMs depends on the specific dataset and problem, but they're a powerful addition to the time series modeling toolkit.