# Deep Learning for Trading - Chapters 16-22

This notebook covers the deep learning cluster from the Puffin project, demonstrating
feedforward networks, CNNs, LSTMs, autoencoders, GANs, and deep reinforcement learning
for algorithmic trading applications.

**Chapters covered:**
- Ch 16: Deep learning fundamentals (feedforward NNs, training utilities)
- Ch 17: CNNs for financial time series
- Ch 18: RNNs/LSTMs for multivariate time series
- Ch 19: Autoencoders (standard, denoising, variational)
- Ch 20: GANs for synthetic data generation
- Ch 21-22: Deep RL (Q-learning, DQN, DDQN, PPO trading agents)

## 1. Feedforward Neural Networks (Ch 16)

The simplest deep learning architecture for trading: a multi-layer feedforward network
that maps feature vectors to return predictions. `FeedforwardNet` is the raw PyTorch
module, while `TradingFFN` wraps it with a scikit-learn-style `.fit()` / `.predict()` API.
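
To make the mapping from feature vector to prediction concrete, here is a minimal standard-library sketch of a forward pass through a 10 -> 64 -> 32 -> 1 network. It is an illustration only, not the `FeedforwardNet` implementation: the helper names (`init_layer`, `forward`) are hypothetical, the initialization scheme is an assumption, and dropout and any normalization layers are omitted.

```python
import random

def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]

def linear(x, weights, bias):
    """Affine map: one row of `weights` per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def init_layer(n_in, n_out, rng):
    """Hypothetical He-style initialization (an assumption, not puffin's)."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(x, layers):
    """ReLU after every layer except the last, which stays linear for regression."""
    h = x
    for idx, (weights, bias) in enumerate(layers):
        h = linear(h, weights, bias)
        if idx < len(layers) - 1:
            h = relu(h)
    return h

rng = random.Random(42)
dims = [10, 64, 32, 1]  # input_dim, hidden_dims, output_dim
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(dims[:-1], dims[1:])]

features = [rng.gauss(0.0, 1.0) for _ in range(dims[0])]
prediction = forward(features, layers)

# Weights plus biases of the three linear layers only
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)
print(f"prediction: {prediction[0]:.4f}, linear-layer parameters: {n_params:,}")
```

The parameter count here covers only the linear layers (704 + 2,080 + 33 = 2,817); the total printed for `FeedforwardNet` may differ if the module contains additional layers.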

In [None]:
import numpy as np
import torch

from puffin.deep import FeedforwardNet, TradingFFN

# --- Inspect the raw PyTorch module ---
net = FeedforwardNet(input_dim=10, hidden_dims=[64, 32], output_dim=1, dropout=0.3)
print("FeedforwardNet architecture:")
print(net)
print(f"\nTotal parameters: {sum(p.numel() for p in net.parameters()):,}")

# --- Train with the high-level wrapper ---
np.random.seed(42)
n_samples, n_features = 500, 10
X = np.random.randn(n_samples, n_features).astype(np.float32)
# Target: noisy linear combination (simulates return prediction)
true_weights = np.random.randn(n_features)
y = X @ true_weights + 0.1 * np.random.randn(n_samples)

model = TradingFFN(input_dim=n_features, hidden_dims=[64, 32], output_dim=1, device='cpu')
history = model.fit(X, y, epochs=50, lr=0.001, batch_size=64, verbose=True)

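# Predict on fresh samples from the same distribution (targets are noise-free)
X_test = np.random.randn(50, n_features).astype(np.float32)
y_test = X_test @ true_weights
preds = model.predict(X_test)
# Flatten predictions so an (N, 1) output cannot broadcast against the (N,) targets
mse = np.mean((np.ravel(preds) - y_test) ** 2)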
print(f"\nTest MSE: {mse:.4f}")
print(f"Model metadata: {model.metadata}")

## 2. Training Utilities (Ch 16)

Production training needs early stopping, learning-rate scheduling, and a reusable
training loop. The `puffin.deep.training` module provides these as composable building blocks.
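
The two ideas behind these utilities can be sketched in a few lines of plain Python. The `PatienceStopper` class and `step_lr` function below are hypothetical stand-ins, assuming the usual definitions of patience-based early stopping and step decay; they are not the `EarlyStopping` or `LRScheduler` implementations from `puffin.deep`.

```python
class PatienceStopper:
    """Signal a stop once the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def __call__(self, val_loss):
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss   # new best: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1        # no improvement this epoch
        return self.bad_epochs >= self.patience


def step_lr(base_lr, epoch, step_size=10, gamma=0.5):
    """Step decay: multiply the learning rate by `gamma` every `step_size` epochs."""
    return base_lr * gamma ** (epoch // step_size)


# Made-up validation curve that bottoms out at epoch 2 and then drifts upward
val_losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76]

stopper = PatienceStopper(patience=5)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper(loss):
        stopped_at = epoch
        break

print(f"stopped at epoch {stopped_at}, best val loss {stopper.best_loss}")
print([step_lr(0.01, e) for e in (0, 10, 25)])
```

With `patience=5` the loop stops five epochs after the best loss was observed, which is why the last few epochs of a run are typically worse than the reported best.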

In [None]:
import torch.nn as nn
import torch.optim as optim

from puffin.deep import EarlyStopping, LRScheduler, training_loop, create_dataloaders, set_seed

set_seed(42)

# Reuse the synthetic data from above
train_loader, val_loader = create_dataloaders(
    X, y, batch_size=64, val_split=0.2, random_seed=42
)

# Build a fresh model
model_pt = FeedforwardNet(input_dim=10, hidden_dims=[64, 32], output_dim=1)
optimizer = optim.Adam(model_pt.parameters(), lr=0.01)
criterion = nn.MSELoss()

# Set up early stopping as a callback
early_stop = EarlyStopping(patience=5, verbose=True)

def es_callback(epoch, train_loss, val_loss, model):
    return early_stop(val_loss, model)

# Set up LR scheduler
scheduler = LRScheduler(optimizer, schedule_type='step', step_size=10, gamma=0.5)

def lr_callback(epoch, train_loss, val_loss, model):
    scheduler.step()
    return False

# Run the generic training loop with both callbacks
history = training_loop(
    model=model_pt,
    train_loader=train_loader,
    val_loader=val_loader,
    epochs=60,
    optimizer=optimizer,
    criterion=criterion,
    callbacks=[lr_callback, es_callback],
    device=torch.device('cpu'),
    verbose=True
)

print(f"\nTraining stopped after {len(history['train_loss'])} epochs")
print(f"Best val loss: {early_stop.best_loss:.6f}")

## 3. CNN for Time Series (Ch 17)

1D convolutions slide learnable filters over a price/feature sequence, detecting
local patterns such as momentum bursts or mean-reversion setups. `Conv1DNet` stacks
multiple conv layers with max-pooling before a fully-connected head.
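
As a rough illustration of what a single filter does, the sketch below applies one hand-picked kernel to a short price sequence and then max-pools the result. The function names and the kernel are assumptions chosen for the example; a trained `Conv1DNet` learns its kernels and applies many of them in parallel.

```python
def conv1d_valid(seq, kernel):
    """Slide `kernel` over `seq` with stride 1 and no padding.

    Like deep learning frameworks, this is cross-correlation (the kernel is
    not flipped). Output length is len(seq) - len(kernel) + 1.
    """
    k = len(kernel)
    return [sum(seq[i + j] * kernel[j] for j in range(k))
            for i in range(len(seq) - k + 1)]


def max_pool1d(seq, size=2):
    """Non-overlapping max-pooling; a trailing partial window is dropped."""
    return [max(seq[i:i + size]) for i in range(0, len(seq) - size + 1, size)]


prices = [100.0, 100.5, 101.5, 103.0, 102.0, 101.0, 101.2, 101.1]

# Hand-picked kernel: price two steps ahead minus the current price,
# i.e. a crude two-step momentum detector
momentum_kernel = [-1.0, 0.0, 1.0]

feature_map = conv1d_valid(prices, momentum_kernel)
pooled = max_pool1d(feature_map, size=2)

print("feature map:", [round(v, 2) for v in feature_map])
print("pooled:     ", [round(v, 2) for v in pooled])
```

Eight inputs and a kernel of width 3 give six activations, and pooling with size 2 halves that to three; the same arithmetic determines the size of the flattened input to the fully-connected head.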

In [None]:
from puffin.deep import Conv1DNet

np.random.seed(42)

# Simulate a random-walk price series (no deterministic trend)
n_steps = 500
raw_prices = np.cumsum(np.random.randn(n_steps) * 0.02) + 100

# Create sliding-window sequences (lookback=20, 1 feature)
lookback = 20
X_seq, y_seq = [], []
for i in range(lookback, len(raw_prices)):
    X_seq.append(raw_prices[i - lookback:i])
    y_seq.append(raw_prices[i])

X_seq = np.array(X_seq, dtype=np.float32).reshape(-1, lookback, 1)  # (N, seq, channels)
y_seq = np.array(y_seq, dtype=np.float32).reshape(-1, 1)

# Instantiate and inspect
cnn = Conv1DNet(input_channels=1, seq_length=lookback, n_filters=[32, 64], kernel_sizes=[3, 3])
print("Conv1DNet architecture:")
print(cnn)
print(f"Parameters: {sum(p.numel() for p in cnn.parameters()):,}")

# Quick forward pass sanity check
sample = torch.FloatTensor(X_seq[:8])
out = cnn(sample)
print(f"\nInput shape:  {sample.shape}  ->  Output shape: {out.shape}")

## 4. LSTM Networks (Ch 18)

LSTMs maintain a hidden state across time steps, making them well-suited for
sequential financial data. `TradingLSTM` handles a single univariate series,
while `MultivariateLSTM` accepts a DataFrame of features and predicts a target column.
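
The way an LSTM carries state from one time step to the next can be sketched with a single scalar cell. This follows the standard LSTM gate equations; the parameter values are arbitrary numbers chosen for illustration, and the code says nothing about how `TradingLSTM` is built internally.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell.

    `params` maps each gate name to (input weight, recurrent weight, bias).
    """
    def gate(name, squash):
        w, u, b = params[name]
        return squash(w * x + u * h_prev + b)

    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much of the candidate to write
    o = gate("output", sigmoid)        # how much of the cell state to expose
    g = gate("candidate", math.tanh)   # proposed new content

    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c

# Arbitrary illustrative parameters (not trained)
params = {
    "forget": (0.5, 0.1, 1.0),
    "input": (0.8, -0.2, 0.0),
    "output": (0.3, 0.4, 0.0),
    "candidate": (1.2, 0.5, 0.0),
}

returns = [0.01, -0.02, 0.015, 0.0, -0.005]
h, c = 0.0, 0.0
for r in returns:
    h, c = lstm_step(r, h, c, params)

print(f"final hidden state: {h:.6f}, final cell state: {c:.6f}")
```

The hidden state is always bounded in (-1, 1) because it is a sigmoid times a tanh, which is one reason inputs and targets are usually scaled before being fed to an LSTM.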

In [None]:
from puffin.deep import TradingLSTM, MultivariateLSTM

np.random.seed(42)

# --- Univariate LSTM ---
series = np.cumsum(np.random.randn(300) * 0.5) + 100

lstm = TradingLSTM()
history = lstm.fit(series, lookback=20, epochs=30, lr=0.001, batch_size=32)

# Forecast the next 5 steps
forecast = lstm.predict(series, steps=5)
print(f"\nLast 3 actual values:  {series[-3:]}")
print(f"5-step forecast:       {forecast}")

# --- Multivariate LSTM (concept) ---
import pandas as pd

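# Each column is an independent random walk, so this toy frame does not
# enforce low <= open/close <= high; it only exercises the API.
n = 300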
df = pd.DataFrame({
    'open':   np.cumsum(np.random.randn(n) * 0.3) + 100,
    'high':   np.cumsum(np.random.randn(n) * 0.3) + 101,
    'low':    np.cumsum(np.random.randn(n) * 0.3) + 99,
    'close':  np.cumsum(np.random.randn(n) * 0.3) + 100,
    'volume': np.abs(np.random.randn(n) * 1000) + 5000,
})

mv_lstm = MultivariateLSTM()
mv_history = mv_lstm.fit(
    df, target_col='close', lookback=20, epochs=30, lr=0.001,
    hidden_dims=[64, 32]
)

pred = mv_lstm.predict(df)
print(f"\nMultivariate LSTM next-step prediction: {pred[0]:.2f}")
print(f"Actual last close: {df['close'].iloc[-1]:.2f}")

## 5. Autoencoders (Ch 19)

Autoencoders learn a compressed latent representation of the input. Three variants are
provided:

- **Autoencoder** -- standard encoder-decoder for dimensionality reduction
- **DenoisingAutoencoder** -- adds Gaussian noise during training for robustness
- **VAE** -- variational autoencoder with a probabilistic latent space, enabling
  generation of new samples
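
What makes the VAE's latent space "probabilistic" is the reparameterization trick and the KL penalty added to the reconstruction loss. The sketch below shows both for a diagonal Gaussian encoder, using the standard closed-form KL divergence to a unit Gaussian prior; the function names are hypothetical and this is not the `VAE` class's code.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), where sigma = exp(0.5 * log_var).

    Writing the sample this way keeps it differentiable with respect to
    mu and log_var when done inside an autodiff framework.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))

rng = random.Random(42)

# Encoder output for one sample (made-up numbers)
mu = [1.0, 0.0]
log_var = [0.0, 0.0]

z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)

print(f"latent sample: {[round(v, 3) for v in z]}")
print(f"KL penalty:    {kl:.3f}")
```

The KL term is zero only when the encoder outputs exactly the prior (zero mean, unit variance), so it pulls the latent codes toward a region from which new samples can later be drawn.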

In [None]:
from puffin.deep import Autoencoder, DenoisingAutoencoder, VAE, AETrainer

np.random.seed(42)

# Synthetic market features (50 dims -> compress to 8)
n_samples, input_dim, encoding_dim = 400, 50, 8
X_ae = np.random.randn(n_samples, input_dim).astype(np.float32)

trainer = AETrainer(device='cpu')

# --- Standard Autoencoder ---
ae = Autoencoder(input_dim=input_dim, encoding_dim=encoding_dim, hidden_dims=[128, 64])
ae_hist = trainer.fit(ae, X_ae, epochs=30, lr=0.001, verbose=True)
features_ae = trainer.extract_features(ae, X_ae)
print(f"\nAutoencoder: {input_dim}D -> {features_ae.shape[1]}D latent")

# --- Denoising Autoencoder ---
dae = DenoisingAutoencoder(input_dim=input_dim, encoding_dim=encoding_dim, noise_factor=0.3)
dae_hist = trainer.fit(dae, X_ae, epochs=30, lr=0.001, verbose=False)
features_dae = trainer.extract_features(dae, X_ae)
print(f"DenoisingAE: {input_dim}D -> {features_dae.shape[1]}D latent  (final val loss {dae_hist['val_loss'][-1]:.4f})")

# --- Variational Autoencoder ---
vae = VAE(input_dim=input_dim, latent_dim=encoding_dim, hidden_dims=[128, 64])
vae_hist = trainer.fit(vae, X_ae, epochs=30, lr=0.001, verbose=False)
features_vae = trainer.extract_features(vae, X_ae)
print(f"VAE:         {input_dim}D -> {features_vae.shape[1]}D latent  (final val loss {vae_hist['val_loss'][-1]:.4f})")

# Generate new samples from the VAE
vae.eval()
with torch.no_grad():
    generated = vae.sample(n=5, device=torch.device('cpu')).numpy()
print(f"\nGenerated {generated.shape[0]} synthetic feature vectors of dim {generated.shape[1]}")

## 6. GANs for Synthetic Data (Ch 20)

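A GAN pits a Generator against a Discriminator: the Generator tries to produce
synthetic market data that the Discriminator cannot tell apart from real data.
This is useful for data augmentation, stress testing, and sharing data without
exposing the original records.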
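
The adversarial game can be written down as two loss functions. The sketch below uses the standard binary cross-entropy formulation with the common non-saturating generator loss; whether the `GAN` class uses exactly this variant is an assumption, and the function names are hypothetical.

```python
import math

def bce(prob, target, eps=1e-12):
    """Binary cross-entropy for one predicted probability."""
    prob = min(max(prob, eps), 1.0 - eps)  # avoid log(0)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def discriminator_loss(d_real, d_fake):
    """The discriminator wants D(real) -> 1 and D(fake) -> 0."""
    return mean(bce(p, 1.0) for p in d_real) + mean(bce(p, 0.0) for p in d_fake)

def generator_loss(d_fake):
    """Non-saturating form: the generator wants D(fake) -> 1."""
    return mean(bce(p, 1.0) for p in d_fake)

# Early in training: the discriminator easily spots fakes (made-up outputs)
early_d = discriminator_loss(d_real=[0.9, 0.95], d_fake=[0.05, 0.1])
early_g = generator_loss(d_fake=[0.05, 0.1])

# Idealized equilibrium: the discriminator outputs 0.5 for everything
d_loss = discriminator_loss(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
g_loss = generator_loss(d_fake=[0.5, 0.5])

print(f"early:       D loss {early_d:.3f}, G loss {early_g:.3f}")
print(f"equilibrium: D loss {d_loss:.3f}, G loss {g_loss:.3f}")
```

At the idealized equilibrium the discriminator loss is 2 ln 2 (about 1.386) and the generator loss is ln 2 (about 0.693), which gives a rough reference point when reading the loss curves of a training run.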

In [None]:
from puffin.deep import GAN, SyntheticDataEvaluator

np.random.seed(42)

# Real data: 5-dim feature vectors drawn from a mixture of Gaussians
data_dim = 5
cluster_a = np.random.randn(200, data_dim) * 0.5 + 1.0
cluster_b = np.random.randn(200, data_dim) * 0.5 - 1.0
real_data = np.vstack([cluster_a, cluster_b]).astype(np.float32)
np.random.shuffle(real_data)

# Train a GAN
gan = GAN(latent_dim=16, data_dim=data_dim, device='cpu')
gan_hist = gan.train(real_data, epochs=50, batch_size=64, lr=0.0002, verbose=True)

# Generate synthetic samples
synthetic_data = gan.generate(n_samples=400)
print(f"\nGenerated shape: {synthetic_data.shape}")
print(f"Real mean:      {real_data.mean(axis=0)[:3]}")
print(f"Synthetic mean: {synthetic_data.mean(axis=0)[:3]}")

# Evaluate quality
evaluator = SyntheticDataEvaluator()
dist_results = evaluator.compare_distributions(real_data, synthetic_data)
print(f"\nAvg KS statistic:  {dist_results['avg_ks_statistic']:.4f}")
print(f"Avg mean diff:     {dist_results['avg_mean_diff']:.4f}")
print(f"Avg std diff:      {dist_results['avg_std_diff']:.4f}")
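
For readers unfamiliar with the KS statistic reported above, here is a minimal standard-library sketch of the two-sample Kolmogorov-Smirnov statistic and a per-feature average. That `avg_ks_statistic` is computed as a plain mean over feature columns is an assumption about `SyntheticDataEvaluator`, and both function names are hypothetical.

```python
def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every value <= x, then compare the CDFs at x
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap

def avg_ks_by_column(real_rows, synth_rows):
    """Average the KS statistic over feature columns (rows are samples)."""
    real_cols = list(zip(*real_rows))
    synth_cols = list(zip(*synth_rows))
    stats = [ks_statistic(r, s) for r, s in zip(real_cols, synth_cols)]
    return sum(stats) / len(stats)

# Toy data: the first feature matches exactly, the second does not overlap at all
real_rows = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
synth_rows = [[0.0, 11.0], [1.0, 12.0], [2.0, 13.0]]

print("identical samples:", ks_statistic([1, 2, 3], [1, 2, 3]))
print("disjoint samples: ", ks_statistic([1, 2, 3], [4, 5, 6]))
print("column average:   ", avg_ks_by_column(real_rows, synth_rows))
```

The statistic ranges from 0 (identical empirical distributions) to 1 (no overlap), so lower values indicate synthetic marginals that track the real ones more closely. It compares one feature at a time and therefore says nothing about whether cross-feature dependence is preserved.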

## 7. Deep Reinforcement Learning (Ch 21)

RL agents learn a trading policy by interacting with a simulated environment.
The `puffin.rl` module provides:

- **QLearningAgent** -- tabular Q-learning for discretized state spaces
- **DQNAgent / DDQNAgent** -- deep Q-networks with experience replay
- **TradingEnvironment** -- a Gymnasium-compatible env with configurable reward types

In [None]:
from puffin.rl import QLearningAgent, DQNAgent, DDQNAgent, TradingEnvironment, evaluate_agent

np.random.seed(42)

# Generate synthetic prices (driftless random walk)
prices = np.cumsum(np.random.randn(500) * 0.5) + 100
prices = np.maximum(prices, 1.0)  # keep positive

# --- TradingEnvironment walkthrough ---
env = TradingEnvironment(prices, initial_cash=100_000, commission=0.001, reward_type='pnl')
obs, info = env.reset()
print(f"Observation shape: {obs.shape}")
print(f"Action space: {env.action_space}  (0=sell, 1=hold, 2=buy)")
print(f"Initial info: {info}")

# --- Tabular Q-Learning (discretized) ---
# For Q-learning we need integer states -- use a tiny example
q_agent = QLearningAgent(n_states=100, n_actions=3, lr=0.1, gamma=0.99, epsilon=1.0)
print(f"\nQ-table shape: {q_agent.q_table.shape}")

# Manually run a few updates to demonstrate the API
q_agent.update(state=0, action=2, reward=10.0, next_state=1, done=False)
q_agent.update(state=1, action=1, reward=-2.0, next_state=2, done=False)
print(f"Q[0, buy] after update: {q_agent.q_table[0, 2]:.2f}")

# --- DQN Agent ---
obs_dim = env.observation_space.shape[0]
dqn = DQNAgent(
    state_dim=obs_dim, action_dim=3, lr=1e-3,
    gamma=0.99, buffer_size=5000, batch_size=32,
    target_update=50, device='cpu'
)

# Train for a small number of episodes (demo only)
rewards_dqn = dqn.train(env, episodes=20, epsilon_start=1.0, epsilon_decay=0.95, verbose=False)
print(f"\nDQN -- episodes: {len(rewards_dqn)}, mean reward: {np.mean(rewards_dqn):.2f}")

# --- Double DQN ---
ddqn = DDQNAgent(
    state_dim=obs_dim, action_dim=3, lr=1e-3,
    gamma=0.99, buffer_size=5000, batch_size=32,
    target_update=50, device='cpu'
)
rewards_ddqn = ddqn.train(env, episodes=20, epsilon_start=1.0, epsilon_decay=0.95, verbose=False)
print(f"DDQN -- episodes: {len(rewards_ddqn)}, mean reward: {np.mean(rewards_ddqn):.2f}")

## 8. PPO Trading Agent (Ch 22)

`PPOTrader` wraps the stable-baselines3 PPO implementation for on-policy
training, so `stable-baselines3` must be installed. The cell below demonstrates
the API and evaluation workflow; it skips training by default, so set
`run_ppo = True` to actually train.
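
For context on what PPO optimizes, the sketch below computes the clipped surrogate objective from the PPO algorithm in plain Python. The function name is hypothetical, the clip value of 0.2 is a common choice rather than something read from `PPOTrader`, and the real loss also includes value-function and entropy terms that are omitted here.

```python
import math

def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized).

    ratio = pi_new(a|s) / pi_old(a|s), computed from log-probabilities.
    Taking the minimum of the clipped and unclipped terms removes any
    incentive to move the ratio outside [1 - clip_eps, 1 + clip_eps].
    """
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

# Three single-sample cases with the old log-probability fixed at 0
gain_capped = ppo_clip_objective([math.log(1.5)], [0.0], [1.0])
loss_capped = ppo_clip_objective([math.log(0.5)], [0.0], [-1.0])
penalty_kept = ppo_clip_objective([math.log(1.5)], [0.0], [-1.0])

print(f"ratio 1.5, advantage +1: {gain_capped:.2f}")
print(f"ratio 0.5, advantage -1: {loss_capped:.2f}")
print(f"ratio 1.5, advantage -1: {penalty_kept:.2f}")
```

The first two cases show the objective being capped once the policy has already moved far enough in the favorable direction. The third shows that the penalty is not capped when the policy makes a bad action more likely.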

In [None]:
run_ppo = False  # flip to True if stable-baselines3 is installed

if run_ppo:
    from puffin.rl import PPOTrader, evaluate_agent, TradingEnvironment

    np.random.seed(42)
    prices_ppo = np.cumsum(np.random.randn(1000) * 0.5) + 100
    prices_ppo = np.maximum(prices_ppo, 1.0)
    env_ppo = TradingEnvironment(prices_ppo, initial_cash=100_000)

    ppo = PPOTrader(env_ppo, learning_rate=3e-4, n_steps=256, batch_size=64, verbose=0)
    ppo.train(total_timesteps=5000)

    results = evaluate_agent(ppo, env_ppo, n_episodes=5)
    print("PPO evaluation results:")
    for k, v in results.items():
        print(f"  {k}: {v:.4f}")
else:
    print("PPO demo skipped (set run_ppo = True to execute).")
    print("\nPPOTrader API summary:")
    print("  ppo = PPOTrader(env, learning_rate=3e-4, n_steps=2048)")
    print("  ppo.train(total_timesteps=100_000)")
    print("  action = ppo.predict(obs, deterministic=True)")
    print("  results = ppo.evaluate(env, n_episodes=10)")

## Summary

| Module | Class / Function | Purpose |
|--------|-----------------|--------|
| `puffin.deep` | `FeedforwardNet`, `TradingFFN` | Feedforward NN for return prediction |
| `puffin.deep` | `EarlyStopping`, `LRScheduler`, `training_loop` | Reusable training utilities |
| `puffin.deep` | `Conv1DNet`, `TradingCNN` | 1D CNN for sequential pattern detection |
| `puffin.deep` | `TradingLSTM`, `MultivariateLSTM` | LSTM for univariate / multivariate time series |
| `puffin.deep` | `Autoencoder`, `DenoisingAutoencoder`, `VAE` | Latent representations and generation |
| `puffin.deep` | `GAN`, `TimeGAN`, `SyntheticDataEvaluator` | Synthetic data generation and evaluation |
| `puffin.rl` | `QLearningAgent` | Tabular Q-learning |
| `puffin.rl` | `DQNAgent`, `DDQNAgent` | Deep Q-Network agents |
| `puffin.rl` | `TradingEnvironment` | Gymnasium trading environment |
| `puffin.rl` | `PPOTrader`, `evaluate_agent` | PPO agent and evaluation toolkit |