# Book or Wait: LSTM Model Implementation

This notebook implements an LSTM (Long Short-Term Memory) neural network for the "Book or Wait" decision system.
LSTMs are designed to learn sequential patterns in time series data, which makes them a natural candidate for capturing temporal price dynamics.

## Key Features:
- Sequential price history modeling (14-day lookback window)
- Combined sequence and static feature processing
- CPU/GPU compatible PyTorch implementation
- Advanced temporal feature engineering
- Best-epoch model selection by test AUC (training always runs the full 50 epochs)

In [None]:
# Import required libraries
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score, roc_curve
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import joblib
import warnings
warnings.filterwarnings('ignore')

# Set device for CPU/GPU compatibility
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using device: {device}')
print(f'PyTorch version: {torch.__version__}')

## 1. Load and Prepare Data

In [None]:
# Load synthetic data
data_dir = Path('../data/synthetic_data')

print("Loading synthetic data...")
users_df = pd.read_csv(data_dir / 'synthetic_users.csv')
searches_df = pd.read_csv(data_dir / 'synthetic_searches.csv')
bookings_df = pd.read_csv(data_dir / 'synthetic_bookings.csv')
rental_prices_df = pd.read_csv(data_dir / 'synthetic_rental_prices.csv')
competitor_prices_df = pd.read_csv(data_dir / 'synthetic_competitor_prices.csv')

# Convert timestamps
searches_df['search_ts'] = pd.to_datetime(searches_df['search_ts'])
bookings_df['booking_ts'] = pd.to_datetime(bookings_df['booking_ts'])
rental_prices_df['date'] = pd.to_datetime(rental_prices_df['obs_ts'])
competitor_prices_df['date'] = pd.to_datetime(competitor_prices_df['obs_date'])

print(f"Users: {users_df.shape}")
print(f"Searches: {searches_df.shape}")
print(f"Bookings: {bookings_df.shape}")
print(f"Rental Prices: {rental_prices_df.shape}")
print(f"Competitor Prices: {competitor_prices_df.shape}")

## 2. Sequential Feature Engineering

For LSTM models, we need to create sequences of historical price data rather than single-point features.
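
As a minimal sketch of the windowing idea, the snippet below slides a lookback window over a toy price list and labels each position with the same rule `create_sequences` uses (label 1 when the current price is below the maximum of the next `horizon` prices). The helper name `make_windows`, the short `lookback`/`horizon` values, and the toy prices are illustrative assumptions, not part of the notebook's pipeline.

```python
def make_windows(prices, lookback=3, horizon=2):
    """Return (window, label) pairs from a 1-D price list.

    window: the `lookback` prices strictly before position i
    label:  1 if prices[i] is below the max of the next `horizon` prices
    """
    samples = []
    # Same loop bounds as the notebook: range(lookback, len(prices) - horizon)
    for i in range(lookback, len(prices) - horizon):
        window = prices[i - lookback:i]
        future = prices[i + 1:i + 1 + horizon]
        label = 1 if prices[i] < max(future) else 0
        samples.append((window, label))
    return samples

# Hypothetical toy prices
toy_prices = [50, 52, 51, 55, 54, 53, 58, 60]
windows = make_windows(toy_prices)
for window, label in windows:
    print(window, "->", label)
```

Eight prices with a lookback of 3 and a horizon of 2 yield only three samples, which already hints at how quickly short histories run out of usable windows.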

In [None]:
def create_sequences(rental_prices_df, competitor_prices_df, sequence_length=14):
    """Create sequences of price data for LSTM training."""
    sequences = []
    targets = []
    static_features = []
    metadata = []
    
    # Get unique combinations
    price_keys = rental_prices_df[['supplier_id', 'location_id', 'car_class']].drop_duplicates()
    
    print(f"Processing {len(price_keys)} unique supplier/location/class combinations...")
    
    for idx, (_, key) in enumerate(price_keys.iterrows()):
        if idx % 20 == 0:
            print(f"Progress: {idx}/{len(price_keys)} combinations processed")
            
        supplier_id = key['supplier_id']
        location_id = key['location_id']
        car_class = key['car_class']
        
        # Get price history
        mask = (rental_prices_df['supplier_id'] == supplier_id) & \
               (rental_prices_df['location_id'] == location_id) & \
               (rental_prices_df['car_class'] == car_class)
        
        price_history = rental_prices_df[mask].sort_values('date')
        
        # Need enough data for sequences
        if len(price_history) < sequence_length + 7:
            continue
        
        # Create sequences
        for i in range(sequence_length, len(price_history) - 7):
            current_date = price_history.iloc[i]['date']
            current_price = price_history.iloc[i]['current_price']
            
            # Future prices for target
            future_prices = price_history.iloc[i+1:i+8]['current_price'].values
            max_future_price = np.max(future_prices)
            should_book = 1 if current_price < max_future_price else 0
            
            # Create sequence features (price history)
            sequence_data = []
            for j in range(i - sequence_length, i):
                row = price_history.iloc[j]
                # Sequence features: price, availability, days until pickup, price change
                seq_features = [
                    row['current_price'],
                    row['available_cars'],
                    row['days_until_pickup']
                ]
                
                # Add price changes if we have enough history
                if j > 0:
                    prev_price = price_history.iloc[j-1]['current_price']
                    price_change = (row['current_price'] - prev_price) / prev_price
                    seq_features.append(price_change)
                else:
                    seq_features.append(0.0)
                
                sequence_data.append(seq_features)
            
            # Static features (unchanging characteristics)
            static_feature_vec = [
                supplier_id,
                location_id,
                current_date.dayofweek,
                current_date.month,
                1 if current_date.dayofweek >= 5 else 0,  # is_weekend
                1 if current_date.month in [6, 7, 8, 12] else 0,  # is_peak_season
                (current_date.month - 1) // 3 + 1,  # quarter
            ]
            
            # Add competitor features
            comp_mask = (competitor_prices_df['location_id'] == location_id) & \
                       (competitor_prices_df['car_class'] == car_class) & \
                       (competitor_prices_df['date'] == current_date)
            
            comp_prices = competitor_prices_df[comp_mask]['comp_min_price'].values
            if len(comp_prices) > 0:
                static_feature_vec.extend([
                    (current_price - np.mean(comp_prices)) / np.mean(comp_prices),  # price_vs_competitors
                    1 if current_price < np.min(comp_prices) else 0,  # is_cheapest
                    np.sum(current_price >= comp_prices) / len(comp_prices)  # price_rank
                ])
            else:
                static_feature_vec.extend([0, 0, 0.5])
            
            sequences.append(sequence_data)
            targets.append(should_book)
            static_features.append(static_feature_vec)
            
            # Store metadata for analysis
            metadata.append({
                'date': current_date,
                'supplier_id': supplier_id,
                'location_id': location_id,
                'car_class': car_class,
                'current_price': current_price
            })
    
    print(f"\nCompleted processing. Created {len(sequences)} sequences.")
    return (np.array(sequences), np.array(targets), 
            np.array(static_features), pd.DataFrame(metadata))

# Create sequences
print("Creating price sequences...")
sequence_length = 14
sequences, targets, static_features, metadata_df = create_sequences(
    rental_prices_df, competitor_prices_df, sequence_length
)

print(f"\nSequence creation summary:")
print(f"Total sequences: {len(sequences)}")
print(f"Sequence shape: {sequences.shape}")
print(f"Static features shape: {static_features.shape}")
print(f"Target distribution:")
print(f"  Wait (0): {(targets == 0).sum()} ({(targets == 0).mean():.1%})")
print(f"  Book (1): {(targets == 1).sum()} ({(targets == 1).mean():.1%})")

## 3. PyTorch Dataset for Sequences

In [None]:
class PriceSequenceDataset(Dataset):
    """PyTorch Dataset for sequential price data."""
    
    def __init__(self, sequences, targets, static_features=None, scaler=None, sequence_scaler=None):
        self.sequences = sequences
        self.targets = targets
        self.static_features = static_features
        self.scaler = scaler
        self.sequence_scaler = sequence_scaler
        
        # Scale static features
        if static_features is not None:
            if self.scaler is None:
                self.scaler = StandardScaler()
                self.static_scaled = self.scaler.fit_transform(static_features)
            else:
                self.static_scaled = self.scaler.transform(static_features)
        else:
            self.static_scaled = None
        
        # Scale sequences
        if self.sequence_scaler is None:
            self.sequence_scaler = MinMaxScaler()
            # Reshape for scaling: (samples * timesteps, features)
            original_shape = sequences.shape
            sequences_reshaped = sequences.reshape(-1, sequences.shape[-1])
            sequences_scaled = self.sequence_scaler.fit_transform(sequences_reshaped)
            self.sequences_scaled = sequences_scaled.reshape(original_shape)
        else:
            original_shape = sequences.shape
            sequences_reshaped = sequences.reshape(-1, sequences.shape[-1])
            sequences_scaled = self.sequence_scaler.transform(sequences_reshaped)
            self.sequences_scaled = sequences_scaled.reshape(original_shape)
        
        # Convert to tensors
        self.sequences_tensor = torch.FloatTensor(self.sequences_scaled)
        self.targets_tensor = torch.FloatTensor(targets)
        
        if self.static_scaled is not None:
            self.static_tensor = torch.FloatTensor(self.static_scaled)
        else:
            self.static_tensor = None
    
    def __len__(self):
        return len(self.sequences)
    
    def __getitem__(self, idx):
        if self.static_tensor is not None:
            return self.sequences_tensor[idx], self.static_tensor[idx], self.targets_tensor[idx]
        else:
            return self.sequences_tensor[idx], self.targets_tensor[idx]

# Time-based train/test split
split_date = metadata_df['date'].quantile(0.8)
train_mask = metadata_df['date'] < split_date
test_mask = ~train_mask

X_seq_train = sequences[train_mask]
X_static_train = static_features[train_mask]
y_train = targets[train_mask]

X_seq_test = sequences[test_mask]
X_static_test = static_features[test_mask]
y_test = targets[test_mask]

print(f"Train sequences: {len(X_seq_train)}")
print(f"Test sequences: {len(X_seq_test)}")
print(f"Train target distribution: Wait={np.sum(y_train == 0)}, Book={np.sum(y_train == 1)}")
print(f"Test target distribution: Wait={np.sum(y_test == 0)}, Book={np.sum(y_test == 1)}")

# Create datasets
train_dataset = PriceSequenceDataset(X_seq_train, y_train, X_static_train)
test_dataset = PriceSequenceDataset(
    X_seq_test, y_test, X_static_test,
    scaler=train_dataset.scaler,
    sequence_scaler=train_dataset.sequence_scaler
)

# Create data loaders
batch_size = 32
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

print(f"\nBatch size: {batch_size}")
print(f"Train batches: {len(train_loader)}")
print(f"Test batches: {len(test_loader)}")
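
The test dataset above reuses the scalers fitted on the training split, so no test statistics leak into preprocessing. A plain-Python sketch of the same idea for min-max scaling of `(samples, timesteps, features)` data follows. `fit_min_max` and `apply_min_max` are hypothetical helpers standing in for `MinMaxScaler.fit` and `MinMaxScaler.transform`, and the numbers are made up.

```python
def fit_min_max(sequences):
    """Per-feature min and max over every (sample, timestep) row."""
    rows = [step for seq in sequences for step in seq]  # flatten to 2-D
    n_features = len(rows[0])
    mins = [min(r[f] for r in rows) for f in range(n_features)]
    maxs = [max(r[f] for r in rows) for f in range(n_features)]
    return mins, maxs

def apply_min_max(sequences, mins, maxs):
    """Scale with previously fitted statistics, keeping the 3-D layout."""
    return [
        [
            [(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(step, mins, maxs)]
            for step in seq
        ]
        for seq in sequences
    ]

# Two training samples and one test sample, each with 2 timesteps x 2 features
train = [[[100.0, 5.0], [110.0, 4.0]], [[120.0, 3.0], [140.0, 1.0]]]
test = [[[150.0, 2.0], [90.0, 6.0]]]

mins, maxs = fit_min_max(train)           # statistics come from train only
train_scaled = apply_min_max(train, mins, maxs)
test_scaled = apply_min_max(test, mins, maxs)
print(test_scaled)
```

Test values can land outside [0, 1] when they fall outside the training range. That is expected with train-only fitting and is worth keeping in mind when the test period contains prices the training period never saw.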

## 4. LSTM Model Architecture

In [None]:
class LSTMBookOrWaitModel(nn.Module):
    """LSTM model with static features for Book or Wait prediction."""
    
    def __init__(self, sequence_input_size, static_input_size=0, hidden_size=64, num_layers=2, dropout=0.2):
        super(LSTMBookOrWaitModel, self).__init__()
        
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.static_input_size = static_input_size
        
        # LSTM for sequential data
        self.lstm = nn.LSTM(
            input_size=sequence_input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            dropout=dropout if num_layers > 1 else 0,
            batch_first=True
        )
        
        # Fully connected layers
        fc_input_size = hidden_size + static_input_size
        self.fc1 = nn.Linear(fc_input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size // 2)
        self.fc3 = nn.Linear(hidden_size // 2, 1)
        
        self.dropout = nn.Dropout(dropout)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()
        
    def forward(self, sequence_input, static_input=None):
        batch_size = sequence_input.size(0)
        
        # Initialize hidden state
        h0 = torch.zeros(self.num_layers, batch_size, self.hidden_size).to(sequence_input.device)
        c0 = torch.zeros(self.num_layers, batch_size, self.hidden_size).to(sequence_input.device)
        
        # LSTM forward pass
        lstm_out, _ = self.lstm(sequence_input, (h0, c0))
        
        # Take the last output
        lstm_last = lstm_out[:, -1, :]
        
        # Combine with static features if available
        if static_input is not None and self.static_input_size > 0:
            combined = torch.cat((lstm_last, static_input), dim=1)
        else:
            combined = lstm_last
        
        # Fully connected layers
        x = self.relu(self.fc1(combined))
        x = self.dropout(x)
        x = self.relu(self.fc2(x))
        x = self.dropout(x)
        x = self.sigmoid(self.fc3(x))
        
        return x

# Initialize model
sequence_input_size = sequences.shape[2]  # Number of features per timestep
static_input_size = static_features.shape[1]  # Number of static features

model = LSTMBookOrWaitModel(
    sequence_input_size=sequence_input_size,
    static_input_size=static_input_size,
    hidden_size=64,
    num_layers=2,
    dropout=0.2
).to(device)

print(f"Model Architecture:")
print(f"Sequence input size: {sequence_input_size} features per timestep")
print(f"Static input size: {static_input_size} features")
print(f"Hidden size: 64")
print(f"Number of LSTM layers: 2")
print(f"Total parameters: {sum(p.numel() for p in model.parameters()):,}")
print(f"\nModel summary:")
print(model)
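
The `Total parameters` figure printed above can be cross-checked by hand. The sketch below assumes the sizes implied by `create_sequences`: 4 features per timestep (price, availability, days until pickup, price change) and 10 static features (7 ID/calendar values plus 3 competitor values). The helper names are hypothetical.

```python
def lstm_param_count(input_size, hidden_size, num_layers):
    """Parameters in a unidirectional LSTM with biases.

    Each layer has 4 gates, each with input weights, recurrent
    weights, and two bias vectors (bias_ih and bias_hh).
    """
    total = 0
    for layer in range(num_layers):
        layer_input = input_size if layer == 0 else hidden_size
        total += 4 * hidden_size * (layer_input + hidden_size)  # weights
        total += 2 * 4 * hidden_size                            # biases
    return total

def linear_param_count(in_features, out_features):
    """Weights plus bias for one fully connected layer."""
    return in_features * out_features + out_features

# Assumed sizes: 4 features per timestep, 10 static features, hidden size 64
n_seq_features, n_static_features, hidden = 4, 10, 64

lstm_params = lstm_param_count(n_seq_features, hidden, num_layers=2)
fc_params = (
    linear_param_count(hidden + n_static_features, hidden)  # fc1
    + linear_param_count(hidden, hidden // 2)               # fc2
    + linear_param_count(hidden // 2, 1)                    # fc3
)
total_params = lstm_params + fc_params
print(lstm_params, fc_params, total_params)
```

Under these assumptions the LSTM stack accounts for 51,200 of the 58,113 parameters, so most of the model's capacity sits in the recurrent layers.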

## 5. Training Functions

In [None]:
def train_epoch(model, train_loader, criterion, optimizer, device, has_static_features=True):
    """Train model for one epoch."""
    model.train()
    total_loss = 0
    correct = 0
    total = 0
    
    for batch_data in train_loader:
        if has_static_features:
            batch_sequences, batch_static, batch_targets = batch_data
            batch_sequences = batch_sequences.to(device)
            batch_static = batch_static.to(device)
            batch_targets = batch_targets.to(device)
            
            # Forward pass; squeeze(-1) keeps shape (1,) for a final batch of one
            outputs = model(batch_sequences, batch_static).squeeze(-1)
        else:
            batch_sequences, batch_targets = batch_data
            batch_sequences = batch_sequences.to(device)
            batch_targets = batch_targets.to(device)
            
            # Forward pass; squeeze(-1) keeps shape (1,) for a final batch of one
            outputs = model(batch_sequences).squeeze(-1)
        
        loss = criterion(outputs, batch_targets)
        
        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        # Calculate accuracy
        predicted = (outputs > 0.5).float()
        total_loss += loss.item()
        correct += (predicted == batch_targets).sum().item()
        total += batch_targets.size(0)
    
    avg_loss = total_loss / len(train_loader)
    accuracy = correct / total
    
    return avg_loss, accuracy

def evaluate(model, test_loader, criterion, device, has_static_features=True):
    """Evaluate model on test set."""
    model.eval()
    total_loss = 0
    correct = 0
    total = 0
    all_predictions = []
    all_targets = []
    all_probs = []
    
    with torch.no_grad():
        for batch_data in test_loader:
            if has_static_features:
                batch_sequences, batch_static, batch_targets = batch_data
                batch_sequences = batch_sequences.to(device)
                batch_static = batch_static.to(device)
                batch_targets = batch_targets.to(device)
                
                # Forward pass; squeeze(-1) keeps shape (1,) for a final batch of one
                outputs = model(batch_sequences, batch_static).squeeze(-1)
            else:
                batch_sequences, batch_targets = batch_data
                batch_sequences = batch_sequences.to(device)
                batch_targets = batch_targets.to(device)
                
                # Forward pass; squeeze(-1) keeps shape (1,) for a final batch of one
                outputs = model(batch_sequences).squeeze(-1)
            
            loss = criterion(outputs, batch_targets)
            
            # Calculate accuracy
            predicted = (outputs > 0.5).float()
            total_loss += loss.item()
            correct += (predicted == batch_targets).sum().item()
            total += batch_targets.size(0)
            
            # Store for metrics
            all_predictions.extend(predicted.cpu().numpy())
            all_targets.extend(batch_targets.cpu().numpy())
            all_probs.extend(outputs.cpu().numpy())
    
    avg_loss = total_loss / len(test_loader)
    accuracy = correct / total
    
    return avg_loss, accuracy, np.array(all_predictions), np.array(all_targets), np.array(all_probs)

## 6. Train the Model

In [None]:
# Training parameters
num_epochs = 50
learning_rate = 0.001

# Loss and optimizer
criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

print(f"Training configuration:")
print(f"Epochs: {num_epochs}")
print(f"Learning rate: {learning_rate}")
print(f"Optimizer: Adam")
print(f"Loss function: Binary Cross Entropy")

# Training loop
train_losses = []
test_losses = []
train_accuracies = []
test_accuracies = []
test_aucs = []

best_auc = 0
best_model_state = None

print(f"\nStarting training...")
for epoch in range(num_epochs):
    # Train
    train_loss, train_acc = train_epoch(model, train_loader, criterion, optimizer, device)
    
    # Evaluate (the test set doubles as the validation set here, so best_auc is optimistic)
    test_loss, test_acc, _, _, test_probs = evaluate(model, test_loader, criterion, device)
    
    # Calculate AUC
    test_auc = roc_auc_score(y_test, test_probs)
    
    # Store history
    train_losses.append(train_loss)
    test_losses.append(test_loss)
    train_accuracies.append(train_acc)
    test_accuracies.append(test_acc)
    test_aucs.append(test_auc)
    
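    # Save best model (clone tensors: state_dict().copy() is shallow and
    # would keep tracking the live weights as training continues)
    if test_auc > best_auc:
        best_auc = test_auc
        best_model_state = {k: v.detach().clone() for k, v in model.state_dict().items()}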
    
    # Print progress
    if (epoch + 1) % 10 == 0:
        print(f"Epoch [{epoch+1:2d}/{num_epochs}] - "
              f"Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.4f} - "
              f"Test Loss: {test_loss:.4f}, Test Acc: {test_acc:.4f}, Test AUC: {test_auc:.4f}")

# Load best model
if best_model_state is not None:
    model.load_state_dict(best_model_state)
    print(f"\nLoaded best model with AUC: {best_auc:.4f}")

print("Training completed!")
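
The loop above always runs all `num_epochs` epochs and only keeps the best checkpoint. If patience-based early stopping is wanted, one possible sketch is shown below. The `EarlyStopper` class, its `patience` and `min_delta` parameters, and the AUC values are hypothetical illustrations. In practice the monitored score would preferably come from a validation split held out of the training data rather than from the test set.

```python
class EarlyStopper:
    """Signal a stop when the monitored score stops improving."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # smallest change counted as improvement
        self.best_score = float("-inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, score, epoch):
        """Record one epoch's score; return True when training should stop."""
        if score > self.best_score + self.min_delta:
            self.best_score = score
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Made-up validation AUCs, one per epoch
val_aucs = [0.61, 0.66, 0.70, 0.69, 0.70, 0.68, 0.67, 0.71]

stopper = EarlyStopper(patience=3)
stopped_at = None
for epoch, auc in enumerate(val_aucs):
    if stopper.step(auc, epoch):
        stopped_at = epoch
        break

print(stopper.best_epoch, stopper.best_score, stopped_at)
```

In the training loop, `stopper.step(val_auc, epoch)` would be called once per epoch, next to the existing best-checkpoint logic. The toy run stops at epoch 5 and never sees the later 0.71, which is the usual trade-off when choosing `patience`.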

## 7. Visualize Training History

In [None]:
# Plot training history
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 10))

# Loss plot
ax1.plot(train_losses, label='Train Loss', color='blue')
ax1.plot(test_losses, label='Test Loss', color='red')
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Loss')
ax1.set_title('Training and Test Loss')
ax1.legend()
ax1.grid(True)

# Accuracy plot
ax2.plot(train_accuracies, label='Train Accuracy', color='blue')
ax2.plot(test_accuracies, label='Test Accuracy', color='red')
ax2.set_xlabel('Epoch')
ax2.set_ylabel('Accuracy')
ax2.set_title('Training and Test Accuracy')
ax2.legend()
ax2.grid(True)

# AUC plot
ax3.plot(test_aucs, label='Test AUC', color='green')
ax3.axhline(y=best_auc, color='red', linestyle='--', label=f'Best AUC: {best_auc:.4f}')
ax3.set_xlabel('Epoch')
ax3.set_ylabel('AUC')
ax3.set_title('Test AUC Over Time')
ax3.legend()
ax3.grid(True)

# Learning curves comparison
ax4.plot(train_losses, label='Train Loss', alpha=0.7)
ax4.plot(test_losses, label='Test Loss', alpha=0.7)
ax4_twin = ax4.twinx()
ax4_twin.plot(test_aucs, color='green', label='Test AUC', linewidth=2)
ax4.set_xlabel('Epoch')
ax4.set_ylabel('Loss')
ax4_twin.set_ylabel('AUC')
ax4.set_title('Loss vs AUC Progression')
ax4.legend(loc='upper right')
ax4_twin.legend(loc='lower right')
ax4.grid(True)

plt.tight_layout()
plt.show()

## 8. Model Evaluation

In [None]:
# Final evaluation
print("LSTM Model Evaluation:")
print("=" * 50)

_, _, predictions, targets_final, probabilities = evaluate(model, test_loader, criterion, device)

# Classification report
print("\nClassification Report:")
print(classification_report(targets_final, predictions, target_names=['Wait', 'Book Now']))

# ROC-AUC Score
final_auc = roc_auc_score(targets_final, probabilities)
print(f"\nFinal ROC-AUC Score: {final_auc:.4f}")

# Additional metrics
from sklearn.metrics import precision_recall_fscore_support
precision, recall, f1, support = precision_recall_fscore_support(targets_final, predictions)
print(f"\nDetailed Metrics:")
print(f"Wait class - Precision: {precision[0]:.3f}, Recall: {recall[0]:.3f}, F1: {f1[0]:.3f}")
print(f"Book class - Precision: {precision[1]:.3f}, Recall: {recall[1]:.3f}, F1: {f1[1]:.3f}")

In [None]:
# Visualization: Confusion Matrix and ROC Curve
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Confusion Matrix
cm = confusion_matrix(targets_final, predictions)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=['Wait', 'Book Now'], 
            yticklabels=['Wait', 'Book Now'], ax=ax1)
ax1.set_title('LSTM Model - Confusion Matrix')
ax1.set_ylabel('True Label')
ax1.set_xlabel('Predicted Label')

# ROC Curve
fpr, tpr, _ = roc_curve(targets_final, probabilities)
ax2.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC curve (AUC = {final_auc:.2f})')
ax2.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
ax2.set_xlim([0.0, 1.0])
ax2.set_ylim([0.0, 1.05])
ax2.set_xlabel('False Positive Rate')
ax2.set_ylabel('True Positive Rate')
ax2.set_title('LSTM Model - ROC Curve')
ax2.legend(loc="lower right")

plt.tight_layout()
plt.show()
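
Because ROC-AUC is the headline metric throughout this notebook, it may help to see what it measures. For binary labels it equals the fraction of (positive, negative) pairs in which the positive example receives the higher score. The sketch below computes it that way on made-up data. `pairwise_auc` is a hypothetical helper, and its quadratic loop is meant for illustration, not for large test sets.

```python
def pairwise_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC is undefined when only one class is present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up labels (1 = Book Now, 0 = Wait) and predicted probabilities
toy_labels = [0, 0, 1, 1, 1]
toy_scores = [0.2, 0.6, 0.5, 0.7, 0.9]
auc_value = pairwise_auc(toy_labels, toy_scores)
print(round(auc_value, 4))
```

Five of the six positive/negative pairs are ordered correctly here, giving 5/6. The 0.5 decision threshold plays no role, which is why AUC and accuracy can tell different stories about the same model.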

## 9. Data Limitations Analysis

Let's analyze why the LSTM model underperformed compared to other models.

In [None]:
# Analyze data availability for sequence creation
print("Data Availability Analysis:")
print("=" * 40)

# Check how many combinations have sufficient data
price_keys = rental_prices_df[['supplier_id', 'location_id', 'car_class']].drop_duplicates()
sufficient_data = 0
insufficient_data = 0
data_lengths = []

for _, key in price_keys.iterrows():
    mask = (rental_prices_df['supplier_id'] == key['supplier_id']) & \
           (rental_prices_df['location_id'] == key['location_id']) & \
           (rental_prices_df['car_class'] == key['car_class'])
    
    price_history = rental_prices_df[mask]
    data_lengths.append(len(price_history))
    
    if len(price_history) >= sequence_length + 7:
        sufficient_data += 1
    else:
        insufficient_data += 1

print(f"Total combinations: {len(price_keys)}")
print(f"Sufficient data (≥{sequence_length + 7} days): {sufficient_data} ({sufficient_data/len(price_keys)*100:.1f}%)")
print(f"Insufficient data: {insufficient_data} ({insufficient_data/len(price_keys)*100:.1f}%)")
print(f"Average data length: {np.mean(data_lengths):.1f} days")
print(f"Median data length: {np.median(data_lengths):.1f} days")

# Plot data length distribution
plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
plt.hist(data_lengths, bins=20, alpha=0.7, color='skyblue', edgecolor='black')
plt.axvline(x=sequence_length + 7, color='red', linestyle='--', 
            label=f'Min required: {sequence_length + 7} days')
plt.xlabel('Days of Data Available')
plt.ylabel('Number of Combinations')
plt.title('Distribution of Data Availability')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
# Compare dataset sizes
datasets = ['Logistic\nRegression', 'XGBoost', 'LSTM']
sizes = [2192, 2192, len(sequences)]
colors = ['lightblue', 'lightgreen', 'lightcoral']

bars = plt.bar(datasets, sizes, color=colors, edgecolor='black')
plt.ylabel('Number of Training Examples')
plt.title('Dataset Size Comparison')
plt.grid(True, alpha=0.3)

# Add value labels on bars
for bar, size in zip(bars, sizes):
    plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 50,
             str(size), ha='center', va='bottom', fontweight='bold')

plt.tight_layout()
plt.show()

print(f"\nDataset size impact:")
print(f"LSTM has {len(sequences)/2192*100:.1f}% of the data available to other models")
print(f"This significantly limits the LSTM's ability to learn complex patterns")
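
The link between history length and usable sequences can be made explicit. The loop in `create_sequences` iterates over `range(sequence_length, len(price_history) - 7)`, so a history of `n_rows` observations yields `n_rows - 21` samples at the default settings. The helper below is a hypothetical restatement of that arithmetic.

```python
def sequences_from_history(n_rows, sequence_length=14, horizon=7):
    """Samples produced by range(sequence_length, n_rows - horizon)."""
    return max(0, n_rows - horizon - sequence_length)

counts = {n_rows: sequences_from_history(n_rows) for n_rows in (20, 21, 22, 30, 60)}
for n_rows, n_sequences in counts.items():
    print(f"{n_rows} rows -> {n_sequences} sequences")
```

A history with exactly 21 rows passes the `>= sequence_length + 7` check yet contributes no sequences, so the "sufficient data" share reported above slightly overstates the combinations that actually feed the LSTM.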

## 10. Model Comparison

In [None]:
# Compare all three models
comparison_data = {
    'Model': ['Logistic Regression', 'XGBoost', 'LSTM'],
    'ROC-AUC': [0.9032, 0.9121, final_auc],
    'Dataset Size': [2192, 2192, len(sequences)],
    'Training Time': ['Fast', 'Medium', 'Slow'],
    'Complexity': ['Low', 'Medium', 'High'],
    'Feature Type': ['Engineered', 'Engineered + Advanced', 'Sequential + Static']
}

comparison_df = pd.DataFrame(comparison_data)
print("Model Comparison Summary:")
print("=" * 50)
print(comparison_df.to_string(index=False))

# Visualize model performance
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# AUC comparison
models = ['Logistic\nRegression', 'XGBoost', 'LSTM']
aucs = [0.9032, 0.9121, final_auc]
colors = ['lightblue', 'lightgreen', 'lightcoral']

bars1 = ax1.bar(models, aucs, color=colors, edgecolor='black')
ax1.set_ylabel('ROC-AUC Score')
ax1.set_title('Model Performance Comparison')
ax1.set_ylim(0.4, 1.0)
ax1.grid(True, alpha=0.3)

# Add value labels
for bar, auc in zip(bars1, aucs):
    ax1.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01,
             f'{auc:.4f}', ha='center', va='bottom', fontweight='bold')

# Dataset size vs performance
dataset_sizes = [2192, 2192, len(sequences)]
ax2.scatter(dataset_sizes, aucs, c=colors, s=200, alpha=0.7, edgecolor='black')
for i, model in enumerate(['LR', 'XGB', 'LSTM']):
    ax2.annotate(model, (dataset_sizes[i], aucs[i]), 
                xytext=(5, 5), textcoords='offset points', fontweight='bold')
ax2.set_xlabel('Dataset Size (Training Examples)')
ax2.set_ylabel('ROC-AUC Score')
ax2.set_title('Dataset Size vs Performance')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 11. Save Model

In [None]:
# Save model and metadata
model_dir = Path('../models')
model_dir.mkdir(exist_ok=True)

# Save PyTorch model
model_save_dict = {
    'model_state_dict': model.state_dict(),
    'model_config': {
        'sequence_input_size': sequence_input_size,
        'static_input_size': static_input_size,
        'hidden_size': 64,
        'num_layers': 2,
        'dropout': 0.2
    },
    'sequence_length': sequence_length,
    'final_auc': final_auc,
    'best_auc': best_auc,
    'train_losses': train_losses,
    'test_losses': test_losses,
    'test_aucs': test_aucs,
    'num_sequences': len(sequences)
}

torch.save(model_save_dict, model_dir / 'lstm_model.pth')

# Save scalers
joblib.dump(train_dataset.scaler, model_dir / 'lstm_static_scaler.pkl')
joblib.dump(train_dataset.sequence_scaler, model_dir / 'lstm_sequence_scaler.pkl')

print(f"Model saved to {model_dir}")
print(f"Final ROC-AUC: {final_auc:.4f}")
print(f"Best ROC-AUC during training: {best_auc:.4f}")
print(f"\nSaved files:")
print(f"  - lstm_model.pth (model + training history)")
print(f"  - lstm_static_scaler.pkl (static feature scaler)")
print(f"  - lstm_sequence_scaler.pkl (sequence scaler)")

## Summary and Analysis

### LSTM Model Performance:
- **ROC-AUC**: Significantly lower than Logistic Regression (0.9032) and XGBoost (0.9121), attributed here to data limitations
- **Dataset Size**: Only 340 sequences vs 2,192 examples for other models
- **Architecture**: Complex model with 58,113 parameters (2-layer LSTM plus three fully connected layers)
- **Sequential Learning**: Captures temporal patterns but requires more data

### Key Limitations:
1. **Insufficient Sequential Data**: Only 15.5% of combinations have enough historical data
2. **Small Dataset Size**: LSTM has 15.5% of the training data available to other models
3. **Overfitting Risk**: Complex architecture with limited data leads to poor generalization
4. **Temporal Requirements**: LSTM needs longer, consistent time series

### When LSTM Would Excel:
1. **More Historical Data**: 6+ months of consistent daily pricing data
2. **Longer Sequences**: 30+ day lookback windows
3. **Complex Temporal Patterns**: Seasonal cycles, weekly patterns, holiday effects
4. **Larger Dataset**: 10,000+ sequences for proper training

### Recommendations:
1. **Use XGBoost for Production**: Best performance with current data constraints
2. **Collect More Historical Data**: Enable LSTM training with longer sequences
3. **Ensemble Approach**: Combine LSTM predictions with XGBoost for robustness
4. **Feature Engineering**: Add more temporal features to traditional models
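
The ensemble recommendation could start as a simple weighted average of the two models' predicted probabilities. The sketch below is one hedged possibility: `blend_probabilities`, the `lstm_weight` value, and all probabilities are hypothetical, and the weight would need tuning on held-out data.

```python
def blend_probabilities(lstm_probs, xgb_probs, lstm_weight=0.3):
    """Weighted average of two models' predicted probabilities."""
    if len(lstm_probs) != len(xgb_probs):
        raise ValueError("Both models must score the same examples")
    if not 0.0 <= lstm_weight <= 1.0:
        raise ValueError("lstm_weight must lie in [0, 1]")
    return [
        lstm_weight * p_lstm + (1.0 - lstm_weight) * p_xgb
        for p_lstm, p_xgb in zip(lstm_probs, xgb_probs)
    ]

# Made-up probabilities for three examples scored by both models
lstm_probs = [0.40, 0.80, 0.55]
xgb_probs = [0.20, 0.90, 0.70]

blended = blend_probabilities(lstm_probs, xgb_probs, lstm_weight=0.3)
decisions = ["Book Now" if p > 0.5 else "Wait" for p in blended]
print([round(p, 3) for p in blended], decisions)
```

Blending only applies to examples that both models can score. Since the LSTM needs a full lookback window, the remaining examples would fall back to the XGBoost prediction alone.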

The LSTM model demonstrates the importance of matching model complexity to data availability. While powerful for sequential learning, it requires substantially more data than traditional ML approaches.