# Deep Learning Momentum Trading Strategy - Complete Pipeline

Implementation of "Applying Deep Learning to Enhance Momentum Trading Strategies in Stocks" (Takeuchi & Lee, 2013)

## Overview

This notebook demonstrates the complete workflow:
1. **Data Preparation** - Download, filter, and engineer features
2. **Model Training** - Build and train neural network with rolling window validation
3. **Signal Generation** - Rank stocks by predicted probability
4. **Strategy Backtesting** - Long-short portfolio construction
5. **Performance Evaluation** - Analyze results vs benchmark

## Key Results (Expected)
- **Annualized Return**: ~12.8% (vs S&P 500: 7.0%)
- **Sharpe Ratio**: ~1.03 (vs S&P 500: 0.5)
- **Maximum Drawdown**: ~24% (vs S&P 500: 52.6%)

## The "Bitter Lesson" Applied to Finance

This strategy demonstrates that general-purpose deep learning on relatively raw features (simple returns) can discover patterns without complex hand-crafted financial rules. The value is in **ranking** stocks consistently, not predicting returns perfectly.

In [None]:
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import yaml
import warnings
from datetime import datetime
from pathlib import Path

warnings.filterwarnings('ignore')

# Set plotting style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print("Libraries imported successfully!")
print(f"Execution started at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

In [None]:
# Load configuration
with open('config.yaml', 'r') as f:
    config = yaml.safe_load(f)

print("Configuration loaded:")
print(f"  Data period: {config['data']['start_date']} to {config['data']['end_date']}")
print(f"  Min price filter: ${config['data']['min_price']}")
print(f"  Features: {config['model']['input_size']}")
print(f"  Model architecture: {config['model']['hidden_layers']}")
print(f"  Strategy: Long Q{config['strategy']['long_quantile']}, Short Q{config['strategy']['short_quantile']}")

## Part 1: Data Preparation

### Step 1: Define Stock Universe

Start with a broad universe of US stocks. In production, you would use all stocks from NYSE, AMEX, and NASDAQ. For this demo, we'll use a subset of liquid, large-cap stocks.

In [None]:
# Define stock universe (subset for demo)
# In production, use full universe from NYSE, AMEX, NASDAQ
stock_universe = [
    # Technology
    'AAPL', 'MSFT', 'GOOGL', 'AMZN', 'NVDA', 'META', 'TSLA', 'NFLX', 'ADBE', 'CRM',
    'ORCL', 'INTC', 'AMD', 'QCOM', 'AVGO', 'TXN', 'MU', 'AMAT', 'LRCX', 'KLAC',
    
    # Financials
    'JPM', 'BAC', 'WFC', 'GS', 'MS', 'C', 'BLK', 'SCHW', 'AXP', 'USB',
    
    # Healthcare
    'UNH', 'JNJ', 'PFE', 'ABBV', 'MRK', 'TMO', 'ABT', 'DHR', 'BMY', 'AMGN',
    
    # Consumer
    'WMT', 'HD', 'MCD', 'NKE', 'SBUX', 'TGT', 'LOW', 'COST', 'TJX', 'DG',
    
    # Industrials
    'BA', 'CAT', 'HON', 'UPS', 'GE', 'MMM', 'LMT', 'DE', 'UNP', 'RTX',
    
    # Energy
    'XOM', 'CVX', 'COP', 'SLB', 'EOG', 'PXD', 'MPC', 'PSX', 'VLO', 'OXY',
    
    # Materials
    'LIN', 'APD', 'NEM', 'FCX', 'DOW', 'NUE', 'ECL', 'SHW', 'DD', 'ALB'
]

print(f"Stock universe: {len(stock_universe)} tickers")
print(f"Sectors: Technology, Financials, Healthcare, Consumer, Industrials, Energy, Materials")

### Step 2: Download and Process Data

This step:
1. Downloads historical price data
2. Filters stocks by minimum price ($5)
3. Engineers 33 momentum features:
   - 12 long-term (monthly returns for months t-13 through t-2)
   - 20 short-term (daily returns over the most recent month)
   - 1 calendar indicator (January effect dummy)
4. Applies **cross-sectional z-score standardization** (critical!)
5. Generates binary classification labels
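
Steps 4 and 5 can be sketched on a toy panel. This is a minimal illustration, not the actual `DataProcessor` logic: the helper names `cross_sectional_zscore` and `median_labels`, the toy data, and the use of the next-period return as the label source are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Toy panel (hypothetical data): 2 dates x 4 tickers, indexed by (date, ticker)
idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2020-01-31", "2020-02-29"]), ["A", "B", "C", "D"]],
    names=["date", "ticker"],
)
toy_raw = pd.DataFrame(
    {"ret_m2": [0.01, 0.03, 0.05, 0.07, -0.02, 0.00, 0.02, 0.04]}, index=idx
)
toy_fwd = pd.Series([0.02, -0.01, 0.04, 0.00, 0.01, 0.03, -0.02, 0.05], index=idx)


def cross_sectional_zscore(df):
    """Standardize each column within each date, not across time."""
    grouped = df.groupby(level="date")
    return (df - grouped.transform("mean")) / grouped.transform("std")


def median_labels(next_ret):
    """1 if a stock's next-period return beats that date's median, else 0."""
    med = next_ret.groupby(level="date").transform("median")
    return (next_ret > med).astype(int)


toy_z = cross_sectional_zscore(toy_raw)
toy_y = median_labels(toy_fwd)
print(toy_z.round(3))
print(toy_y)
```

Standardizing within each date keeps a feature comparable across calm and volatile months, and labeling against the same-date median yields roughly balanced classes by construction.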

In [None]:
from data_processor import DataProcessor

# Initialize processor
processor = DataProcessor(config)

# Run complete pipeline
features, labels = processor.prepare_dataset(stock_universe)

print(f"\nFeatures shape: {features.shape}")
print(f"Labels shape: {labels.shape}")
print(f"\nFeature columns: {list(features.columns)}")
print(f"\nLabel distribution:")
print(labels.value_counts())

In [None]:
# Visualize feature distributions (after z-score normalization)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Sample a few features
sample_features = ['ret_m2', 'ret_m6', 'ret_d1', 'ret_d5']

for idx, feature in enumerate(sample_features):
    ax = axes[idx // 2, idx % 2]
    features[feature].hist(bins=50, ax=ax, alpha=0.7)
    ax.set_title(f'{feature} Distribution (Z-Scored)')
    ax.set_xlabel('Z-Score')
    ax.set_ylabel('Frequency')
    ax.axvline(0, color='red', linestyle='--', alpha=0.5, label='Mean')
    ax.legend()

plt.tight_layout()
plt.show()

print("\nNote: All features are centered around 0 with unit variance (z-scored)")

## Part 2: Model Training with Rolling Window Validation

### The Importance of Rolling Windows

We use **Rolling Window Cross-Validation** to:
1. Prevent look-ahead bias
2. Simulate realistic live trading
3. Account for market regime changes

Process:
- **Train** on 3 years of data
- **Validate** on next 1 year (model selection, early stopping)
- **Test** on following 1 year (performance evaluation)
- **Roll forward** 6 months and repeat
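
The window schedule above can be sketched as a small generator. This is an illustrative helper (`rolling_windows` is a hypothetical name, not part of this project's modules), and it assumes half-open intervals where each boundary date belongs to the later segment.

```python
import pandas as pd


def rolling_windows(start, end, train_years=3, val_years=1, test_years=1,
                    step_months=6):
    """Yield (train_start, val_start, test_start, test_end) boundaries.

    Train covers [train_start, val_start), validation [val_start, test_start),
    and test [test_start, test_end).
    """
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    while True:
        val_start = start + pd.DateOffset(years=train_years)
        test_start = val_start + pd.DateOffset(years=val_years)
        test_end = test_start + pd.DateOffset(years=test_years)
        if test_end > end:
            break  # not enough data left for a full test year
        yield start, val_start, test_start, test_end
        start = start + pd.DateOffset(months=step_months)


demo_windows = list(rolling_windows("2010-01-01", "2016-01-01"))
for w in demo_windows:
    print([d.date().isoformat() for d in w])
```

With a 6-month step and a 1-year test period, consecutive test periods overlap by 6 months, so when predictions are stitched together one option is to keep only the first 6 months of each test window.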

In [None]:
# Prepare data for PyTorch
import torch
from torch.utils.data import TensorDataset, DataLoader

# Convert to numpy arrays
X = features.values.astype(np.float32)
y = labels.values.astype(np.float32)

# Get date index for time-based splitting
dates = features.index.get_level_values('date')

print(f"Dataset prepared:")
print(f"  X shape: {X.shape}")
print(f"  y shape: {y.shape}")
print(f"  Date range: {dates.min()} to {dates.max()}")
print(f"  GPU available: {torch.cuda.is_available()}")

In [None]:
from model import create_model, print_model_summary, EarlyStopping
import torch.optim as optim
import torch.nn as nn

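# Create model and move it to the GPU when one is available,
# so it sits on the same device as the batches in the training loop
model = create_model(config)
if torch.cuda.is_available():
    model = model.cuda()
print_model_summary(model)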

# Define loss and optimizer
criterion = nn.BCELoss()  # Binary cross-entropy for classification
optimizer = optim.Adam(model.parameters(), lr=config['training']['learning_rate'])

print(f"\nTraining configuration:")
print(f"  Optimizer: Adam")
print(f"  Learning rate: {config['training']['learning_rate']}")
print(f"  Batch size: {config['training']['batch_size']}")
print(f"  Max epochs: {config['training']['epochs']}")
print(f"  Early stopping patience: {config['training']['early_stopping_patience']}")

### Simplified Training (Single Window)

For demonstration, we'll train on one window. In production, you would:
1. Loop through all rolling windows
2. Train a separate model for each window
3. Use each model's predictions for its test period
4. Concatenate all predictions for final strategy evaluation
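
A minimal sketch of that production loop follows. The function `walk_forward_predict`, the `fit_fn`/`predict_fn` callables, and the toy data are hypothetical stand-ins; in this notebook `fit_fn` would wrap the PyTorch training loop and `predict_fn` the model's probability output.

```python
import numpy as np
import pandas as pd


def walk_forward_predict(X, y, dates, windows, fit_fn, predict_fn):
    """Fit one model per window and collect its out-of-sample test predictions."""
    dates = pd.DatetimeIndex(dates)
    pieces = []
    for train_start, val_start, test_start, test_end in windows:
        train = (dates >= train_start) & (dates < val_start)
        val = (dates >= val_start) & (dates < test_start)
        test = (dates >= test_start) & (dates < test_end)
        model = fit_fn(X[train], y[train], X[val], y[val])
        pieces.append(pd.Series(predict_fn(model, X[test]), index=dates[test]))
    return pd.concat(pieces)


# Toy example: the stand-in "model" is just the training-set base rate.
demo_dates = pd.date_range("2020-01-01", periods=8, freq="D")
demo_X = np.arange(8, dtype=float).reshape(-1, 1)
demo_y = np.array([0, 1, 0, 1, 1, 1, 0, 0], dtype=float)
d = demo_dates
demo_wf_windows = [(d[0], d[4], d[5], d[6]), (d[1], d[5], d[6], d[7])]

wf_preds = walk_forward_predict(
    demo_X, demo_y, demo_dates, demo_wf_windows,
    fit_fn=lambda Xtr, ytr, Xva, yva: ytr.mean(),
    predict_fn=lambda model, Xte: np.full(len(Xte), model),
)
print(wf_preds)
```

Each prediction comes from a model that only saw data dated before its test period, which is the property the rolling scheme is meant to guarantee.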

In [None]:
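# Simple positional train/val/test split (70/15/15).
# NOTE: this is chronological only if the rows of `features` are sorted by date;
# otherwise later dates leak into the training set (look-ahead bias).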
n_samples = len(X)
train_end = int(n_samples * 0.70)
val_end = int(n_samples * 0.85)

X_train, y_train = X[:train_end], y[:train_end]
X_val, y_val = X[train_end:val_end], y[train_end:val_end]
X_test, y_test = X[val_end:], y[val_end:]

print(f"Data splits:")
print(f"  Train: {X_train.shape[0]:,} samples")
print(f"  Val:   {X_val.shape[0]:,} samples")
print(f"  Test:  {X_test.shape[0]:,} samples")

# Create data loaders
train_dataset = TensorDataset(torch.FloatTensor(X_train), torch.FloatTensor(y_train))
val_dataset = TensorDataset(torch.FloatTensor(X_val), torch.FloatTensor(y_val))

train_loader = DataLoader(train_dataset, batch_size=config['training']['batch_size'], shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=config['training']['batch_size'], shuffle=False)

In [None]:
# Training loop

early_stopping = EarlyStopping(patience=config['training']['early_stopping_patience'])
train_losses = []
val_losses = []

print("Starting training...\n")

for epoch in range(config['training']['epochs']):
    # Training phase
    model.train()
    train_loss = 0.0
    
    for batch_X, batch_y in train_loader:
        if torch.cuda.is_available():
            batch_X, batch_y = batch_X.cuda(), batch_y.cuda()
        
        # Forward pass
        outputs = model(batch_X).squeeze(-1)  # keep the batch dim even for a batch of 1
        loss = criterion(outputs, batch_y)
        
        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        train_loss += loss.item()
    
    train_loss /= len(train_loader)
    train_losses.append(train_loss)
    
    # Validation phase
    model.eval()
    val_loss = 0.0
    
    with torch.no_grad():
        for batch_X, batch_y in val_loader:
            if torch.cuda.is_available():
                batch_X, batch_y = batch_X.cuda(), batch_y.cuda()
            
            outputs = model(batch_X).squeeze(-1)  # keep the batch dim even for a batch of 1
            loss = criterion(outputs, batch_y)
            val_loss += loss.item()
    
    val_loss /= len(val_loader)
    val_losses.append(val_loss)
    
    # Print progress
    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch+1}/{config['training']['epochs']} - Train Loss: {train_loss:.4f}, Val Loss: {val_loss:.4f}")
    
    # Early stopping
    if early_stopping(val_loss):
        print(f"\nEarly stopping at epoch {epoch+1}")
        break

print("\nTraining completed!")

In [None]:
# Plot training curves
plt.figure(figsize=(12, 5))

plt.plot(train_losses, label='Train Loss', linewidth=2)
plt.plot(val_losses, label='Val Loss', linewidth=2)
plt.xlabel('Epoch')
plt.ylabel('Loss (Binary Cross-Entropy)')
plt.title('Training and Validation Loss')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print(f"Final train loss: {train_losses[-1]:.4f}")
print(f"Final val loss: {val_losses[-1]:.4f}")

## Part 3: Model Evaluation & Signal Generation

### Prediction Accuracy vs Ranking Ability

**Key Insight**: Raw prediction accuracy (~52%) may seem low, but what matters is the model's **ranking ability**. The strategy profits from the gap in realized outcomes between the stocks the model ranks highest and those it ranks lowest, not from classifying every stock correctly.

In [None]:
# Generate predictions on test set
model.eval()
with torch.no_grad():
    test_probs = model.predict_proba(X_test)

# Calculate accuracy
test_preds = (test_probs > 0.5).astype(int)
accuracy = (test_preds == y_test).mean()

print(f"Test Set Performance:")
print(f"  Accuracy: {accuracy:.2%}")
print(f"  Predicted Class 1: {test_preds.sum() / len(test_preds):.2%}")
print(f"  Actual Class 1: {y_test.mean():.2%}")

# Confusion matrix
from sklearn.metrics import confusion_matrix, classification_report

cm = confusion_matrix(y_test, test_preds)
print(f"\nConfusion Matrix:")
print(cm)

print(f"\nClassification Report:")
print(classification_report(y_test, test_preds, target_names=['Below Median', 'Above Median']))

In [None]:
# Analyze prediction distribution
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.hist(test_probs, bins=50, alpha=0.7, edgecolor='black')
plt.axvline(0.5, color='red', linestyle='--', label='Threshold=0.5')
plt.xlabel('Predicted Probability')
plt.ylabel('Frequency')
plt.title('Distribution of Predicted Probabilities')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
plt.hist(test_probs[y_test == 0], bins=30, alpha=0.5, label='Actual: Below Median', edgecolor='black')
plt.hist(test_probs[y_test == 1], bins=30, alpha=0.5, label='Actual: Above Median', edgecolor='black')
plt.xlabel('Predicted Probability')
plt.ylabel('Frequency')
plt.title('Predictions by True Label')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

### Generate Trading Signals via Quantile Ranking

**Strategy**:
1. Rank all stocks on each date by predicted probability of beating the cross-sectional median (0 to 1)
2. Divide into 10 quantiles (deciles)
3. **LONG** top quantile (Q10) - highest predicted probability
4. **SHORT** bottom quantile (Q1) - lowest predicted probability
5. Rebalance monthly

In [None]:
# Create test dataframe with predictions
test_dates = dates[val_end:]
test_tickers = features.index.get_level_values('ticker')[val_end:]

test_df = pd.DataFrame({
    'date': test_dates,
    'ticker': test_tickers,
    'predicted_prob': test_probs,
    'actual_label': y_test
})

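# Add quantile rankings (cross-sectional, per date).
# Ranking first breaks ties so the bin edges stay distinct; qcut with
# duplicates='drop' and a fixed list of 10 labels raises when edges collapse.
test_df['quantile'] = test_df.groupby('date')['predicted_prob'].transform(
    lambda x: pd.qcut(x.rank(method='first'), q=10, labels=False) + 1
)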

print(f"Test data with predictions:")
print(test_df.head(20))

print(f"\nQuantile distribution:")
print(test_df['quantile'].value_counts().sort_index())

In [None]:
# Analyze performance by quantile
quantile_performance = test_df.groupby('quantile').agg({
    'predicted_prob': ['mean', 'std'],
    'actual_label': 'mean'  # % that actually outperformed
}).round(4)

quantile_performance.columns = ['Avg Predicted Prob', 'Std Predicted Prob', 'Actual Outperform %']

print("Performance by Quantile:")
print("="*80)
print(quantile_performance)
print("\nKey Insight: Q10 should have highest actual outperformance rate")
print("            Q1 should have lowest actual outperformance rate")
print("            The SPREAD between Q10 and Q1 drives strategy returns")

In [None]:
# Visualize quantile performance
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Predicted probability by quantile
ax1 = axes[0]
quantile_performance['Avg Predicted Prob'].plot(kind='bar', ax=ax1, color='steelblue', alpha=0.7)
ax1.set_xlabel('Quantile')
ax1.set_ylabel('Average Predicted Probability')
ax1.set_title('Model Confidence by Quantile')
ax1.grid(True, alpha=0.3)
ax1.set_xticklabels(ax1.get_xticklabels(), rotation=0)

# Plot 2: Actual outperformance by quantile
ax2 = axes[1]
quantile_performance['Actual Outperform %'].plot(kind='bar', ax=ax2, color='coral', alpha=0.7)
ax2.axhline(0.5, color='red', linestyle='--', label='50% (Random)')
ax2.set_xlabel('Quantile')
ax2.set_ylabel('Actual Outperformance Rate')
ax2.set_title('Realized Outperformance by Quantile')
ax2.legend()
ax2.grid(True, alpha=0.3)
ax2.set_xticklabels(ax2.get_xticklabels(), rotation=0)

plt.tight_layout()
plt.show()

# Calculate spread
q10_rate = quantile_performance.loc[10.0, 'Actual Outperform %']
q1_rate = quantile_performance.loc[1.0, 'Actual Outperform %']
spread = q10_rate - q1_rate

print(f"\nQuantile Spread Analysis:")
print(f"  Q10 (Long) outperformance rate: {q10_rate:.2%}")
print(f"  Q1 (Short) outperformance rate: {q1_rate:.2%}")
print(f"  Spread: {spread:.2%}")
print(f"\nA positive spread indicates the model successfully ranks stocks!")

## Part 4: Strategy Backtesting

### Long-Short Portfolio Construction

- **Long**: Equal-weight portfolio of Q10 stocks
- **Short**: Equal-weight portfolio of Q1 stocks  
- **Rebalance**: Monthly
- **Leverage**: 100% long + 100% short = 200% gross, 0% net
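
The construction above can be sketched as follows, assuming a hypothetical `fwd_return` column holding each stock's realized next-period return (this notebook's `test_df` does not contain one). The function name `long_short_returns` and the toy data are illustrative, and the sketch ignores transaction and borrowing costs.

```python
import pandas as pd


def long_short_returns(df, n_quantiles=10):
    """Per-date return: equal-weight top bucket minus equal-weight bottom bucket."""
    df = df.copy()
    # Rank within each date first so tied probabilities cannot collapse bin edges
    df["bucket"] = df.groupby("date")["predicted_prob"].transform(
        lambda p: pd.qcut(p.rank(method="first"), q=n_quantiles, labels=False) + 1
    )
    long_leg = df[df["bucket"] == n_quantiles].groupby("date")["fwd_return"].mean()
    short_leg = df[df["bucket"] == 1].groupby("date")["fwd_return"].mean()
    return (long_leg - short_leg).rename("long_short")


# Toy data: 2 rebalance dates x 10 stocks
demo_probs = [0.05 + 0.1 * i for i in range(10)]
demo_bt = pd.DataFrame({
    "date": ["2020-01-31"] * 10 + ["2020-02-29"] * 10,
    "ticker": [f"S{i}" for i in range(10)] * 2,
    "predicted_prob": demo_probs * 2,
    "fwd_return": [-0.04, 0.0, 0.01, 0.0, 0.02, 0.01, 0.0, 0.03, 0.02, 0.06,
                   0.03, 0.0, 0.01, 0.0, 0.02, 0.01, 0.0, 0.03, 0.02, 0.01],
})
demo_ls = long_short_returns(demo_bt)
print(demo_ls)
```

Because the long and short legs are equal in size, the resulting series is the return per unit of capital on each side (200% gross, 0% net exposure).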

In [None]:
print("="*80)
print("STRATEGY IMPLEMENTATION NOTE")
print("="*80)
print("""
For a complete backtest, you would need:

1. Monthly forward returns for each stock
2. Portfolio rebalancing logic
3. Transaction cost modeling
4. Short selling cost/borrow fee modeling
5. Position sizing and risk management

This requires actual price data for all stocks in the test period.
The following cells demonstrate the logic with simplified assumptions.
""")
print("="*80)

In [None]:
# Simplified strategy simulation
# Assumes:
# - actual_label=1 means stock returned above median
# - actual_label=0 means stock returned below median
# - Simplified: Long Q10 earns +1 if actual_label=1, -1 if actual_label=0
# - Simplified: Short Q1 earns +1 if actual_label=0, -1 if actual_label=1

# Calculate strategy returns by quantile
def calculate_quantile_returns(df):
    """
    Simplified return calculation:
    - Long positions: return = 2 * (actual_label - 0.5)
    - Short positions: return = -2 * (actual_label - 0.5)
    """
    results = []
    
    for q in range(1, 11):
        q_df = df[df['quantile'] == float(q)]
        
        # For long strategy (Q10)
        if q == 10:
            long_return = 2 * (q_df['actual_label'].mean() - 0.5)
            results.append({'Quantile': q, 'Type': 'Long', 'Return': long_return})
        
        # For short strategy (Q1)
        elif q == 1:
            short_return = -2 * (q_df['actual_label'].mean() - 0.5)
            results.append({'Quantile': q, 'Type': 'Short', 'Return': short_return})
        
        # Other quantiles (for comparison)
        else:
            neutral_return = 2 * (q_df['actual_label'].mean() - 0.5)
            results.append({'Quantile': q, 'Type': 'Neutral', 'Return': neutral_return})
    
    return pd.DataFrame(results)

quantile_returns = calculate_quantile_returns(test_df)

print("Simplified Quantile Scores (pooled over the test set, not actual returns):")
print(quantile_returns)

# Calculate long-short return
long_ret = quantile_returns[quantile_returns['Type'] == 'Long']['Return'].values[0]
short_ret = quantile_returns[quantile_returns['Type'] == 'Short']['Return'].values[0]
long_short_ret = long_ret + short_ret

print(f"\nLong-Short Strategy (Simplified):")
print(f"  Long Q10 return: {long_ret:+.2%}")
print(f"  Short Q1 return: {short_ret:+.2%}")
print(f"  Combined return: {long_short_ret:+.2%}")

## Summary & Key Takeaways

### What We've Demonstrated

1. **Data Preparation**: Engineered 33 momentum features and applied critical cross-sectional standardization

2. **Model Architecture**: Built a neural network with bottleneck layer to learn compressed momentum representation

3. **Validation Strategy**: Emphasized rolling window cross-validation (though simplified here for demo)

4. **Signal Generation**: Showed that ranking by predicted probability creates a spread between top and bottom quantiles

5. **The Bitter Lesson**: Demonstrated that general-purpose deep learning on relatively raw features can discover patterns without complex hand-crafted rules

### Expected Production Results (From Paper)

- **Annualized Return**: 12.8% vs S&P 500: 7.0%
- **Sharpe Ratio**: 1.03 vs S&P 500: 0.5
- **Max Drawdown**: 24% vs S&P 500: 52.6%
- **Market Correlation**: Negative (diversification benefit)
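
For reference, the first three metrics can be computed from a series of periodic strategy returns roughly as below. This is a sketch under common conventions (geometric annualization, sample standard deviation, square-root-of-time Sharpe scaling); `performance_summary` is a hypothetical helper, and the paper's exact conventions may differ.

```python
import numpy as np


def performance_summary(returns, periods_per_year=12, risk_free=0.0):
    """Annualized return, Sharpe ratio, and max drawdown from periodic returns."""
    r = np.asarray(returns, dtype=float)
    equity = np.cumprod(1.0 + r)  # growth of 1 unit of capital
    years = len(r) / periods_per_year
    ann_return = equity[-1] ** (1.0 / years) - 1.0
    excess = r - risk_free / periods_per_year
    sharpe = np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)
    # Running peak includes the starting capital of 1.0
    running_peak = np.maximum.accumulate(np.concatenate(([1.0], equity)))[1:]
    max_drawdown = (1.0 - equity / running_peak).max()
    return {"ann_return": ann_return, "sharpe": sharpe, "max_drawdown": max_drawdown}


demo_stats = performance_summary([0.01] * 6 + [0.03] * 6)
demo_dd = performance_summary([0.10, -0.50, 0.10])
print(demo_stats)
print(demo_dd["max_drawdown"])
```

The Sharpe ratio is undefined for a constant return series (zero standard deviation), so real inputs should contain some variation.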

### Important Caveats

⚠️ **This is Educational**: Not investment advice

⚠️ **Simplified Backtest**: Production requires:
- Complete rolling window implementation
- Realistic transaction costs
- Short selling costs and constraints
- Market impact modeling
- Risk management systems

⚠️ **Results Will Vary**: Due to:
- Different data sources
- Market evolution
- Model architecture choices
- Training procedures

### Next Steps

To extend this research:

1. **Complete Rolling Windows**: Implement full rolling window validation
2. **Expand Universe**: Use all NYSE/AMEX/NASDAQ stocks
3. **Feature Engineering**: Add more sophisticated features (volume, volatility, etc.)
4. **Architecture Search**: Experiment with different network architectures
5. **Ensemble Methods**: Combine multiple models
6. **Risk Management**: Add position limits, stop-losses, etc.
7. **Cost Modeling**: Include realistic transaction and borrowing costs

### References

1. Takeuchi, L., & Lee, Y.-Y. A. (2013). "Applying Deep Learning to Enhance Momentum Trading Strategies in Stocks"
2. Sutton, R. (2019). "The Bitter Lesson"
3. Jegadeesh, N., & Titman, S. (1993). "Returns to Buying Winners and Selling Losers: Implications for Stock Market Efficiency"

In [None]:
# Save model and results
output_dir = Path(config['output']['models_dir'])
output_dir.mkdir(exist_ok=True, parents=True)

# Save model
model_path = output_dir / 'momentum_ranker_model.pth'
torch.save(model.state_dict(), model_path)
print(f"Model saved to: {model_path}")

# Save predictions
results_dir = Path(config['output']['results_dir'])
results_dir.mkdir(exist_ok=True, parents=True)

test_df.to_csv(results_dir / 'test_predictions.csv', index=False)
quantile_returns.to_csv(results_dir / 'quantile_returns.csv', index=False)

print(f"Results saved to: {results_dir}")
print("\nPipeline completed successfully!")