# Futures Trading Models: Deep Learning & Reinforcement Learning

This notebook demonstrates training AI models for futures trading using **exactly the same data processing** as DolphinDB to ensure consistency between research and production environments.

## Key Innovation: Unified Data Processing
- ✅ **Same feature engineering**: Use DolphinDB for both training data prep and production inference
- ✅ **No duplication**: Single source of truth for all technical indicators
- ✅ **Production consistency**: What you train is what you deploy


In [26]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib
matplotlib.use('Agg')  # Non-interactive backend for server environments
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# PyTorch imports
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import torch.nn.functional as F

# DolphinDB import for consistent data processing
import dolphindb as ddb

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

# Set plot style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("Libraries imported successfully!")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")


Libraries imported successfully!
PyTorch version: 2.8.0+cpu
CUDA available: False
Using device: cpu


## 1. Data Loading Using DolphinDB (Same as Production)

**Critical**: We use DolphinDB for data processing to ensure 100% consistency with production environment.


In [27]:
# Connect to DolphinDB for consistent data processing
session = ddb.session()

# Try to connect to DolphinDB
connection_attempts = [
    {'host': 'localhost', 'port': 8848}
]

connected = False
for attempt in connection_attempts:
    try:
        print(f"Trying {attempt['host']}:{attempt['port']}...")
        session.connect(attempt['host'], attempt['port'])
        print(f"✓ Connected to DolphinDB at {attempt['host']}:{attempt['port']}")
        connected = True
        break
    except Exception as e:
        print(f"✗ Connection failed: {e}")
        continue

if not connected:
    print("\\n⚠️ DolphinDB not available - falling back to pandas processing")
    use_ddb = False
else:
    try:
        session.login("admin", "123456")
        print("✓ Logged in to DolphinDB")
        use_ddb = True
    except Exception as e:
        print(f"Login failed: {e}")
        use_ddb = True  # Continue without login


Trying localhost:8848...
✓ Connected to DolphinDB at localhost:8848
✓ Logged in to DolphinDB


In [28]:
# Load and process data using EXACTLY the same method as final_prediction.dos
if use_ddb:
    print("🔄 Processing data with DolphinDB (same as production)...")
    
    # Use EXACTLY the same code as final_prediction.dos and return data in one call
    df_processed = session.run('''
    try {
        loadPlugin("parquet")
        print("✓ parquet loaded")
    } catch(ex) {
        print("✗ parquet already loaded")
    }
    
    sampleData = parquet::loadParquet("sample_M_Y_IH_IF")
    sampleData = select * from sampleData where order_book_id in ['IH2303','IF2303']
    
    // Feature engineering - EXACTLY same as production
    cleanData = select *,
        mavg(close, 5) as sma_5,
        mavg(close, 20) as sma_20,
        close - prev(close) as price_change,
        mstd(close, 10) as volatility
    from sampleData context by order_book_id
    
    // Remove NULL rows
    cleanData = select * from cleanData where not isNull(sma_5) and not isNull(sma_20) and not isNull(volatility)
    
    // Return the cleaned data
    cleanData
    ''')
    print(f"✓ Processed {len(df_processed)} records using DolphinDB")
    
else:
    print("📁 Fallback: Loading data with pandas...")
    df_processed = pd.read_parquet('data/sample_M_Y_IH_IF.parquet')
    print(f"Loaded {len(df_processed)} records with pandas")

# Display basic info
print(f"\\nData shape: {df_processed.shape}")
print(f"Columns: {list(df_processed.columns)}")
if 'order_book_id' in df_processed.columns:
    print(f"Symbols: {df_processed['order_book_id'].unique()}")
elif 'symbol' in df_processed.columns:
    print(f"Symbols: {df_processed['symbol'].unique()}")

df_processed.head()


🔄 Processing data with DolphinDB (same as production)...
✗ parquet already loaded
✓ Processed 7192 records using DolphinDB
\nData shape: (7192, 17)
Columns: ['order_book_id', 'datetime', 'trading_date', 'open_interest', 'open', 'volume', 'total_turnover', 'low', 'close', 'high', 'parent_order_book_id', 'UTC_datetime', 'maturity_month', 'sma_5', 'sma_20', 'price_change', 'volatility']
Symbols: ['IF2303' 'IH2303']


Unnamed: 0,order_book_id,datetime,trading_date,open_interest,open,volume,total_turnover,low,close,high,parent_order_book_id,UTC_datetime,maturity_month,sma_5,sma_20,price_change,volatility
0,IF2303,2023-01-03 09:35:00,2023-01-03,72064.0,3884.4,149.0,173548440.0,3881.0,3881.6,3884.4,IF,2023-01-03 01:35:00,2023-03-01,3880.92,,-2.8,
1,IF2303,2023-01-03 09:36:00,2023-01-03,72013.0,3881.6,142.0,165279480.0,3878.0,3878.2,3881.6,IF,2023-01-03 01:36:00,2023-03-01,3881.28,,-3.4,
2,IF2303,2023-01-03 09:37:00,2023-01-03,71917.0,3878.2,254.0,295034760.0,3869.4,3870.2,3878.2,IF,2023-01-03 01:37:00,2023-03-01,3879.44,,-8.0,
3,IF2303,2023-01-03 09:38:00,2023-01-03,71848.0,3870.6,273.0,316679400.0,3863.4,3865.6,3870.6,IF,2023-01-03 01:38:00,2023-03-01,3876.0,,-4.6,
4,IF2303,2023-01-03 09:39:00,2023-01-03,71854.0,3865.0,211.0,244541940.0,3861.8,3861.8,3865.0,IF,2023-01-03 01:39:00,2023-03-01,3871.48,,-3.8,


## 2. Dataset Preparation for Model Training

Now we'll create PyTorch datasets using the cleaned data from DolphinDB.


In [29]:
# Dataset class for LSTM training - NO DATA CLEANING (already done in DolphinDB)
class FuturesDataset(Dataset):
    def __init__(self, data, sequence_length=20):
        self.sequences = []
        self.targets = []
        self.sequence_length = sequence_length
        
        # Use ALL processed features from DolphinDB (already cleaned and ready)
        feature_cols = ['open', 'high', 'low', 'close', 'volume']
        
        # Add technical indicators if available (from DolphinDB processing)
        if 'sma_5' in data.columns:
            feature_cols.extend(['sma_5', 'sma_20', 'volatility'])
        
        symbol_col = 'order_book_id' if 'order_book_id' in data.columns else 'symbol'
        
        for symbol in data[symbol_col].unique():
            symbol_data = data[data[symbol_col] == symbol].reset_index(drop=True)
            
            # Get available features - NO NULL CHECKING (DolphinDB already cleaned)
            available_cols = [col for col in feature_cols if col in symbol_data.columns]
            features = symbol_data[available_cols].values.astype(np.float32)
            
            # Create sequences directly - NO NORMALIZATION (rely on DolphinDB preprocessing)
            for i in range(len(features) - sequence_length):
                window = features[i:i + sequence_length]
                
                # Target: next close price (raw value - no normalization)
                close_idx = available_cols.index('close')
                next_close = features[i + sequence_length, close_idx]
                
                self.sequences.append(window)
                self.targets.append(next_close)
        
        print(f"Created {len(self.sequences)} training sequences")
        print(f"Feature dimensions: {len(available_cols)} features")
        print(f"Available features: {available_cols}")
        print("✅ Using raw DolphinDB features - NO Python normalization")
        
        self.feature_count = len(available_cols)
    
    def __len__(self):
        return len(self.sequences)
    
    def __getitem__(self, idx):
        return torch.FloatTensor(self.sequences[idx]), torch.FloatTensor([self.targets[idx]])

# Create dataset from DolphinDB processed data
dataset = FuturesDataset(df_processed)

# Train/validation split
train_size = int(0.8 * len(dataset))
val_size = len(dataset) - train_size
train_dataset, val_dataset = torch.utils.data.random_split(dataset, [train_size, val_size])

# Data loaders
batch_size = 64
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

print(f"\\nTraining set: {len(train_dataset)} samples")
print(f"Validation set: {len(val_dataset)} samples")
print(f"Batch size: {batch_size}")


Created 7152 training sequences
Feature dimensions: 8 features
Available features: ['open', 'high', 'low', 'close', 'volume', 'sma_5', 'sma_20', 'volatility']
✅ Using raw DolphinDB features - NO Python normalization
\nTraining set: 5721 samples
Validation set: 1431 samples
Batch size: 64


## 3. LSTM Model Definition and Training

Define and train the LSTM model for price prediction.


In [30]:
# LSTM Model for price prediction
class LSTMPricePredictor(nn.Module):
    def __init__(self, input_size, hidden_size=64, num_layers=2, dropout=0.2):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, 
                           batch_first=True, dropout=dropout if num_layers > 1 else 0)
        self.fc1 = nn.Linear(hidden_size, hidden_size // 2)
        self.fc2 = nn.Linear(hidden_size // 2, 1)
        self.dropout = nn.Dropout(dropout)
        self.relu = nn.ReLU()
    
    def forward(self, x):
        lstm_out, _ = self.lstm(x)
        last_output = lstm_out[:, -1, :]
        out = self.dropout(self.relu(self.fc1(last_output)))
        out = self.fc2(out)
        return out

# Initialize LSTM model
input_size = dataset.feature_count
lstm_model = LSTMPricePredictor(input_size=input_size).to(device)

# Training setup
optimizer = optim.Adam(lstm_model.parameters(), lr=0.001, weight_decay=1e-5)
criterion = nn.MSELoss()
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=5, factor=0.5)

print(f"LSTM Model initialized:")
print(f"- Input size: {input_size} features")
print(f"- Hidden size: 64")
print(f"- Device: {device}")
print(f"- Parameters: {sum(p.numel() for p in lstm_model.parameters())}")


LSTM Model initialized:
- Input size: 8 features
- Hidden size: 64
- Device: cpu
- Parameters: 54337


In [31]:
pre_set_number = 10
global pre_set_number

In [32]:
# LSTM Training with progress tracking
train_losses = []
val_losses = []
epochs = pre_set_number

print("🚀 Starting LSTM training...")
print(f"Epochs: {epochs}, Batch size: {batch_size}, Learning rate: 0.001")

for epoch in range(epochs):
    # Training phase
    lstm_model.train()
    epoch_train_loss = 0
    
    for batch_x, batch_y in train_loader:
        batch_x, batch_y = batch_x.to(device), batch_y.to(device)
        
        optimizer.zero_grad()
        pred = lstm_model(batch_x)
        loss = criterion(pred, batch_y)
        
        # Check for NaN/inf values in loss
        if torch.isnan(loss) or torch.isinf(loss):
            print(f"Warning: Invalid loss detected: {loss.item()}, skipping batch...")
            continue
            
        loss.backward()
        
        # Gradient clipping
        torch.nn.utils.clip_grad_norm_(lstm_model.parameters(), max_norm=1.0)
        
        optimizer.step()
        epoch_train_loss += loss.item()
    
    avg_train_loss = epoch_train_loss / len(train_loader)
    train_losses.append(avg_train_loss)
    
    # Validation phase
    lstm_model.eval()
    epoch_val_loss = 0
    
    with torch.no_grad():
        for batch_x, batch_y in val_loader:
            batch_x, batch_y = batch_x.to(device), batch_y.to(device)
            pred = lstm_model(batch_x)
            loss = criterion(pred, batch_y)
            epoch_val_loss += loss.item()
    
    avg_val_loss = epoch_val_loss / len(val_loader)
    val_losses.append(avg_val_loss)
    
    # Learning rate scheduling
    scheduler.step(avg_val_loss)
    
    # Print progress
    if epoch % 5 == 0 or epoch == epochs - 1:
        print(f"Epoch {epoch+1:2d}/{epochs}: Train Loss: {avg_train_loss:.6f}, Val Loss: {avg_val_loss:.6f}")

print("✅ LSTM training completed!")

# Store final metrics for model info
final_train_loss = train_losses[-1] if train_losses else 0.0
final_val_loss = val_losses[-1] if val_losses else 0.0
min_val_loss = min(val_losses) if val_losses else 0.0

print(f"Final training loss: {final_train_loss:.6f}")
print(f"Final validation loss: {final_val_loss:.6f}")
print(f"Best validation loss: {min_val_loss:.6f}")


🚀 Starting LSTM training...
Epochs: 10, Batch size: 64, Learning rate: 0.001
Epoch  1/10: Train Loss: 9059563.100000, Val Loss: nan
Epoch  6/10: Train Loss: 8457240.666667, Val Loss: nan
Epoch 10/10: Train Loss: 7878577.411111, Val Loss: nan
✅ LSTM training completed!
Final training loss: 7878577.411111
Final validation loss: nan
Best validation loss: nan


In [33]:
# Plot LSTM training progress
fig, axes = plt.subplots(1, 2, figsize=(15, 5))

# Loss curves
axes[0].plot(train_losses, label='Training Loss', color='blue', linewidth=2)
axes[0].plot(val_losses, label='Validation Loss', color='red', linewidth=2)
axes[0].set_title('LSTM Training Progress')
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('MSE Loss')
axes[0].legend()
axes[0].grid(True, alpha=0.3)
axes[0].set_yscale('log')  # Log scale for better visualization

# Learning curve analysis
epochs_range = range(1, len(train_losses) + 1)
axes[1].plot(epochs_range, train_losses, 'o-', label='Training Loss', color='blue', markersize=4)
axes[1].plot(epochs_range, val_losses, 'o-', label='Validation Loss', color='red', markersize=4)
axes[1].set_title('Learning Curve Analysis')
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('MSE Loss')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

# Add performance metrics
final_train_loss = train_losses[-1]
final_val_loss = val_losses[-1]
min_val_loss = min(val_losses)
best_epoch = val_losses.index(min_val_loss) + 1

axes[1].axhline(y=min_val_loss, color='green', linestyle='--', alpha=0.7, label=f'Best Val Loss: {min_val_loss:.6f}')
axes[1].axvline(x=best_epoch, color='green', linestyle='--', alpha=0.7)

plt.tight_layout()
plt.show()

# Print training summary
print("📊 LSTM Training Summary:")
print(f"   Final Training Loss: {final_train_loss:.6f}")
print(f"   Final Validation Loss: {final_val_loss:.6f}")
print(f"   Best Validation Loss: {min_val_loss:.6f} (Epoch {best_epoch})")
print(f"   Overfitting Check: {'⚠️ Possible overfitting' if final_val_loss > min_val_loss * 1.1 else '✅ Good generalization'}")


📊 LSTM Training Summary:
   Final Training Loss: 7878577.411111
   Final Validation Loss: nan
   Best Validation Loss: nan (Epoch 1)
   Overfitting Check: ✅ Good generalization


## 4. DQN Reinforcement Learning Models

Now we'll create and train the DQN models for trading decisions.


In [34]:
# Trading Environment for DQN training
class TradingEnvironment:
    def __init__(self, data, lookback_window=20):
        self.data = data
        self.lookback_window = lookback_window
        self.feature_cols = ['open', 'high', 'low', 'close', 'volume']
        if 'sma_5' in data.columns:
            self.feature_cols.extend(['sma_5', 'sma_20', 'volatility'])
        self.reset()
    
    def reset(self):
        self.current_step = self.lookback_window
        self.balance = 100000.0
        self.position = 0  # -1: short, 0: neutral, 1: long
        self.entry_price = 0.0
        return self._get_state()
    
    def _get_state(self):
        # Get current market state - NO NORMALIZATION (DolphinDB already processed)
        start_idx = self.current_step - self.lookback_window
        end_idx = self.current_step
        
        window_data = self.data.iloc[start_idx:end_idx]
        features = window_data[self.feature_cols].values.astype(np.float32)
        
        # Use raw features directly - NO Python normalization
        features_flat = features.flatten()
        
        # Add trading state features (raw values - let DolphinDB handle scaling)
        state = np.concatenate([
            features_flat,  # raw market features from DolphinDB
            [self.position],  # raw position: -1, 0, 1
            [self.balance],     # raw balance
            [self.entry_price]  # raw entry price
        ])
        
        return state.astype(np.float32)
    
    def step(self, action):
        current_price = self.data.iloc[self.current_step]['close']
        reward = 0.0
        
        # Execute action: 0=hold, 1=buy, 2=sell
        if action == 1 and self.position <= 0:  # Buy
            if self.position == -1:  # Close short position
                reward = (self.entry_price - current_price) / self.entry_price * 100
                self.balance += reward
            self.position = 1
            self.entry_price = current_price
            
        elif action == 2 and self.position >= 0:  # Sell
            if self.position == 1:  # Close long position
                reward = (current_price - self.entry_price) / self.entry_price * 100
                self.balance += reward
            self.position = -1
            self.entry_price = current_price
        
        # Move to next step
        self.current_step += 1
        done = self.current_step >= len(self.data) - 1
        
        # Small penalty for holding (encourage trading)
        if action == 0:
            reward -= 0.01
        
        next_state = self._get_state() if not done else None
        
        return next_state, reward, done

# Test environment setup
symbol_col = 'order_book_id' if 'order_book_id' in df_processed.columns else 'symbol'
first_symbol = df_processed[symbol_col].unique()[0]
symbol_data = df_processed[df_processed[symbol_col] == first_symbol].reset_index(drop=True)

env = TradingEnvironment(symbol_data)
state = env.reset()
state_size = len(state)

print(f"Trading Environment initialized:")
print(f"- State size: {state_size}")
print(f"- Feature columns: {env.feature_cols}")
print(f"- Sample state shape: {state.shape}")


Trading Environment initialized:
- State size: 163
- Feature columns: ['open', 'high', 'low', 'close', 'volume', 'sma_5', 'sma_20', 'volatility']
- Sample state shape: (163,)


In [35]:
# DQN Model Definitions
class DQNModel(nn.Module):
    """Deep Q-Network for trading decisions."""
    
    def __init__(self, state_size, action_size=3, hidden_size=256):
        super(DQNModel, self).__init__()
        
        self.fc1 = nn.Linear(state_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, hidden_size // 2)
        self.fc4 = nn.Linear(hidden_size // 2, action_size)
        
        self.dropout = nn.Dropout(0.2)
        self.relu = nn.ReLU()
    
    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.relu(self.fc2(x))
        x = self.dropout(x)
        x = self.relu(self.fc3(x))
        x = self.fc4(x)
        return x

class DQNActionPredictor(nn.Module):
    """DQN model that directly outputs action predictions (argmax)."""
    
    def __init__(self, state_size, hidden_size=256):
        super(DQNActionPredictor, self).__init__()
        
        self.fc1 = nn.Linear(state_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, hidden_size // 2)
        self.fc4 = nn.Linear(hidden_size // 2, 3)  # Q-values for 3 actions
        
        self.dropout = nn.Dropout(0.2)
        self.relu = nn.ReLU()
    
    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.relu(self.fc2(x))
        x = self.dropout(x)
        x = self.relu(self.fc3(x))
        q_values = self.fc4(x)
        
        # Return argmax as action prediction
        actions = torch.argmax(q_values, dim=-1, keepdim=True).float()
        return actions

# Initialize DQN models
dqn_model = DQNModel(state_size=state_size).to(device)
dqn_action_predictor = DQNActionPredictor(state_size=state_size).to(device)

print(f"DQN Models initialized:")
print(f"- DQN model parameters: {sum(p.numel() for p in dqn_model.parameters())}")
print(f"- Action predictor parameters: {sum(p.numel() for p in dqn_action_predictor.parameters())}")
print(f"- State size: {state_size}")


DQN Models initialized:
- DQN model parameters: 141059
- Action predictor parameters: 141059
- State size: 163


In [36]:
# Fixed DQN Training (with proper error handling)
print("🔧 Running FIXED DQN training...")

# Training setup
optimizer_dqn = optim.Adam(dqn_model.parameters(), lr=0.001)
optimizer_action = optim.Adam(dqn_action_predictor.parameters(), lr=0.001)

# Generate training data through environment interaction
states = []
actions = []
rewards = []
episode_rewards = []

episodes = 20

print("Generating training data through environment interaction...")
for episode in range(episodes):
    state = env.reset()
    total_reward = 0
    
    while env.current_step < len(env.data) - 1:
        # Epsilon-greedy action selection
        if np.random.random() < 0.3:  # 30% random actions for exploration
            action = np.random.randint(0, 3)
        else:
            with torch.no_grad():
                q_values = dqn_model(torch.FloatTensor(state).unsqueeze(0).to(device))
                action = q_values.argmax().item()
        
        next_state, reward, done = env.step(action)
        
        states.append(state)
        actions.append(action)
        rewards.append(reward)
        total_reward += reward
        
        if done:
            break
            
        state = next_state
    
    episode_rewards.append(total_reward)
    
    # Print progress
    if episode % 20 == 0:
        print(f"Episode {episode}: Total Reward = {total_reward:.2f}")

print(f"Generated {len(states)} training samples from {episodes} episodes")

# Convert to tensors and validate data
if len(states) > 0:
    states_array = np.array(states)
    actions_array = np.array(actions)
    rewards_array = np.array(rewards)
    
    # Validate and filter data
    valid_mask = (
        ~np.isnan(states_array).any(axis=1) & 
        ~np.isinf(states_array).any(axis=1) &
        (actions_array >= 0) & 
        (actions_array <= 2) &  # Ensure actions are in valid range [0, 1, 2]
        ~np.isnan(rewards_array) &
        ~np.isinf(rewards_array)
    )
    
    print(f"Data validation: {valid_mask.sum()}/{len(valid_mask)} valid samples")
    
    # Apply filter
    states_clean = states_array[valid_mask]
    actions_clean = actions_array[valid_mask]
    rewards_clean = rewards_array[valid_mask]
    
    if len(states_clean) == 0:
        print("❌ No valid training data after filtering!")
    else:
        # Convert to tensors
        states_tensor = torch.FloatTensor(states_clean).to(device)
        actions_tensor = torch.LongTensor(actions_clean).to(device)
        rewards_tensor = torch.FloatTensor(rewards_clean).to(device)
        
        # Debug: Check tensor shapes and ranges
        print(f"States tensor shape: {states_tensor.shape}")
        print(f"Actions tensor shape: {actions_tensor.shape}, range: [{actions_tensor.min()}, {actions_tensor.max()}]")
        print(f"Rewards tensor shape: {rewards_tensor.shape}")
        
        print("Training DQN neural networks...")
        
        
        # Train DQN model (Q-values) - This part works fine
        dqn_losses = []
        for epoch in range(pre_set_number):
            optimizer_dqn.zero_grad()
            q_values = dqn_model(states_tensor)
            q_values_selected = q_values.gather(1, actions_tensor.unsqueeze(1))
            loss = F.mse_loss(q_values_selected.squeeze(), rewards_tensor)
            
            # Check for invalid loss
            if torch.isnan(loss) or torch.isinf(loss):
                print(f"Warning: Invalid DQN loss at epoch {epoch}, skipping...")
                continue
                
            loss.backward()
            torch.nn.utils.clip_grad_norm_(dqn_model.parameters(), max_norm=1.0)
            optimizer_dqn.step()
            dqn_losses.append(loss.item())
            
            if epoch % 10 == 0:
                print(f"DQN Epoch {epoch}, Loss: {loss.item():.6f}")
        
        # Train action predictor with FIXED CrossEntropyLoss
        action_losses = []
        for epoch in range(pre_set_number):
            optimizer_action.zero_grad()
            predicted_actions = dqn_action_predictor(states_tensor)
            
            # DEBUG: Check output dimensions
            if epoch == 0:
                print(f"Predicted actions shape: {predicted_actions.shape}")
                print(f"Actions tensor shape: {actions_tensor.shape}")
                print(f"Predicted actions range: [{predicted_actions.min():.3f}, {predicted_actions.max():.3f}]")
            
            # Use CrossEntropyLoss for classification (3 classes: 0, 1, 2)
            loss = F.cross_entropy(predicted_actions, actions_tensor)
            
            # Check for invalid loss
            if torch.isnan(loss) or torch.isinf(loss):
                print(f"Warning: Invalid Action loss at epoch {epoch}, skipping...")
                continue
                
            loss.backward()
            torch.nn.utils.clip_grad_norm_(dqn_action_predictor.parameters(), max_norm=1.0)
            optimizer_action.step()
            action_losses.append(loss.item())
            
            if epoch % 10 == 0:
                print(f"Action Predictor Epoch {epoch}, Loss: {loss.item():.6f}")
        
        print("✅ DQN training completed successfully!")
        
        # Plot DQN training results
        if dqn_losses and action_losses:
            plt.figure(figsize=(12, 5))
            
            plt.subplot(1, 2, 1)
            plt.plot(dqn_losses, 'b-', linewidth=2)
            plt.title('DQN Q-Value Training Loss')
            plt.xlabel('Epoch')
            plt.ylabel('MSE Loss')
            plt.grid(True, alpha=0.3)
            
            plt.subplot(1, 2, 2)
            plt.plot(action_losses, 'r-', linewidth=2)
            plt.title('DQN Action Predictor Loss')
            plt.xlabel('Epoch')
            plt.ylabel('CrossEntropy Loss')
            plt.grid(True, alpha=0.3)
            
            plt.tight_layout()
            plt.show()
            
            print(f"📊 DQN Training Summary:")
            print(f"   Final DQN Loss: {dqn_losses[-1]:.6f}")
            print(f"   Final Action Loss: {action_losses[-1]:.6f}")
else:
    print("❌ No training data available!")


🔧 Running FIXED DQN training...
Generating training data through environment interaction...
Episode 0: Total Reward = -5.44
Generated 35750 training samples from 10 episodes
Data validation: 35600/35750 valid samples
States tensor shape: torch.Size([35600, 163])
Actions tensor shape: torch.Size([35600]), range: [0, 2]
Rewards tensor shape: torch.Size([35600])
Training DQN neural networks...
DQN Epoch 0, Loss: 162809.187500
Predicted actions shape: torch.Size([35600, 1])
Actions tensor shape: torch.Size([35600])
Predicted actions range: [0.000, 2.000]


IndexError: Target 2 is out of bounds.

In [37]:
# DQN Training
print("🚀 Training DQN models...")

# Training setup
optimizer_dqn = optim.Adam(dqn_model.parameters(), lr=0.001)
optimizer_action = optim.Adam(dqn_action_predictor.parameters(), lr=0.001)

# Generate training data through environment interaction
states = []
actions = []
rewards = []
episode_rewards = []

episodes = pre_set_number

for episode in range(episodes):
    state = env.reset()
    total_reward = 0
    
    while env.current_step < len(env.data) - 1:
        # Epsilon-greedy action selection
        if np.random.random() < 0.3:  # 30% random actions for exploration
            action = np.random.randint(0, 3)
        else:
            with torch.no_grad():
                q_values = dqn_model(torch.FloatTensor(state).unsqueeze(0).to(device))
                action = q_values.argmax().item()
        
        next_state, reward, done = env.step(action)
        
        states.append(state)
        actions.append(action)
        rewards.append(reward)
        
        if done:
            break
        
        state = next_state
        total_reward += reward
    
    episode_rewards.append(total_reward)
    
    if episode % 20 == 0:
        print(f"Episode {episode}, Total Reward: {total_reward:.2f}")

print(f"DQN environment training completed! Generated {len(states)} training samples")

# Convert to tensors and train neural networks
if len(states) > 0:
    states_tensor = torch.FloatTensor(np.array(states)).to(device)
    actions_tensor = torch.LongTensor(actions).to(device)
    rewards_tensor = torch.FloatTensor(rewards).to(device)
    
    print("Training DQN neural networks...")
    
    # Train DQN model (Q-values)
    dqn_losses = []
    for epoch in range(pre_set_number):
        optimizer_dqn.zero_grad()
        q_values = dqn_model(states_tensor)
        q_values_selected = q_values.gather(1, actions_tensor.unsqueeze(1))
        loss = F.mse_loss(q_values_selected.squeeze(), rewards_tensor)
        loss.backward()
        optimizer_dqn.step()
        dqn_losses.append(loss.item())
        
        if epoch % 10 == 0:
            print(f"DQN Epoch {epoch}, Loss: {loss.item():.6f}")
    
    # Train action predictor
    action_losses = []
    for epoch in range(pre_set_number):
        optimizer_action.zero_grad()
        predicted_actions = dqn_action_predictor(states_tensor)
        loss = F.cross_entropy(predicted_actions, actions_tensor)
        loss.backward()
        optimizer_action.step()
        action_losses.append(loss.item())
        
        if epoch % 10 == 0:
            print(f"Action Predictor Epoch {epoch}, Loss: {loss.item():.6f}")

print("✅ DQN training completed!")


🚀 Training DQN models...
Episode 0, Total Reward: -4.36
DQN environment training completed! Generated 35750 training samples
Training DQN neural networks...
DQN Epoch 0, Loss: nan


IndexError: Target 2 is out of bounds.

In [38]:
# Plot DQN training results
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
fig.suptitle('DQN Training Results', fontsize=16)

# Episode rewards
axes[0, 0].plot(episode_rewards, color='blue', alpha=0.7)
axes[0, 0].set_title('Episode Rewards')
axes[0, 0].set_xlabel('Episode')
axes[0, 0].set_ylabel('Total Reward')
axes[0, 0].grid(True, alpha=0.3)
axes[0, 0].axhline(y=0, color='red', linestyle='--', alpha=0.5)

# Reward distribution
axes[0, 1].hist(episode_rewards, bins=20, alpha=0.7, color='blue', edgecolor='black')
axes[0, 1].set_title('Reward Distribution')
axes[0, 1].set_xlabel('Total Reward')
axes[0, 1].set_ylabel('Frequency')
axes[0, 1].grid(True, alpha=0.3)
axes[0, 1].axvline(x=np.mean(episode_rewards), color='red', linestyle='--', 
                   label=f'Mean: {np.mean(episode_rewards):.2f}')
axes[0, 1].legend()

# DQN loss
if 'dqn_losses' in locals():
    axes[1, 0].plot(dqn_losses, color='red', linewidth=2)
    axes[1, 0].set_title('DQN Training Loss')
    axes[1, 0].set_xlabel('Epoch')
    axes[1, 0].set_ylabel('MSE Loss')
    axes[1, 0].grid(True, alpha=0.3)
    axes[1, 0].set_yscale('log')

# Action predictor loss
if 'action_losses' in locals():
    axes[1, 1].plot(action_losses, color='green', linewidth=2)
    axes[1, 1].set_title('Action Predictor Loss')
    axes[1, 1].set_xlabel('Epoch')
    axes[1, 1].set_ylabel('MSE Loss')
    axes[1, 1].grid(True, alpha=0.3)
    axes[1, 1].set_yscale('log')

plt.tight_layout()
plt.show()

# Print DQN training summary
print("📊 DQN Training Summary:")
print(f"   Episodes: {episodes}")
print(f"   Training samples generated: {len(states)}")
print(f"   Average episode reward: {np.mean(episode_rewards):.2f}")
print(f"   Best episode reward: {max(episode_rewards):.2f}")
print(f"   Worst episode reward: {min(episode_rewards):.2f}")
if 'dqn_losses' in locals():
    print(f"   Final DQN loss: {dqn_losses[-1]:.6f}")
if 'action_losses' in locals():
    print(f"   Final Action Predictor loss: {action_losses[-1]:.6f}")


📊 DQN Training Summary:
   Episodes: 10
   Training samples generated: 35750
   Average episode reward: -7.82
   Best episode reward: -3.71
   Worst episode reward: -12.97
   Final DQN loss: nan


IndexError: list index out of range

## 5. Model Export for DolphinDB Deployment

Export all trained models to TorchScript format for production deployment.


In [43]:
print("📦 Exporting models to TorchScript format...")

# Set models to evaluation mode
lstm_model.eval()
dqn_model.eval()
dqn_action_predictor.eval()

# Export LSTM model
lstm_input = torch.randn(1, 20, input_size).to(device)
with torch.no_grad():
    traced_lstm = torch.jit.trace(lstm_model, lstm_input)
    traced_lstm.save('lstm_price_predictor.pth')
    
    # Test the traced model
    original_output = lstm_model(lstm_input)
    traced_output = traced_lstm(lstm_input)
    print(f"✅ LSTM model exported - Output difference: {abs(original_output.item() - traced_output.item()):.8f}")

# Export DQN models
dqn_input = torch.randn(1, state_size).to(device)
with torch.no_grad():
    # Export main DQN model
    traced_dqn = torch.jit.trace(dqn_model, dqn_input)
    traced_dqn.save('dqn_trading_agent.pth')
    
    # Export DQN action predictor
    traced_dqn_action = torch.jit.trace(dqn_action_predictor, dqn_input)
    traced_dqn_action.save('dqn_action_predictor.pth')
    
    # Test the traced models
    original_dqn_output = dqn_model(dqn_input)
    traced_dqn_output = traced_dqn(dqn_input)
    print(f"✅ DQN model exported - Max output difference: {(original_dqn_output - traced_dqn_output).abs().max().item():.8f}")
    
    original_action_output = dqn_action_predictor(dqn_input)
    traced_action_output = traced_dqn_action(dqn_input)
    print(f"✅ DQN action predictor exported - Output difference: {abs(original_action_output.item() - traced_action_output.item()):.8f}")

# Create model info summary
model_info = {
    'training_date': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
    'lstm_model': {
        'input_shape': [20, input_size],
        'features': input_size,
        'final_train_loss': final_train_loss,
        'final_val_loss': final_val_loss,
        'best_val_loss': min_val_loss,
        'epochs_trained': epochs
    },
    'dqn_models': {
        'state_size': state_size,
        'actions': ['hold', 'buy', 'sell'],
        'episodes_trained': episodes,
        'avg_episode_reward': np.mean(episode_rewards),
        'training_samples': len(states)
    },
    'data_processing': 'DolphinDB (consistent with production)',
    'feature_columns': env.feature_cols
}

import json
with open('demo_pytorch/model_training_info.json', 'w') as f:
    json.dump(model_info, f, indent=2)

print("\\n" + "="*60)
print("🎉 MODELS SUCCESSFULLY EXPORTED FOR DOLPHINDB!")
print("="*60)
print("Files created:")
print("1. demo_pytorch/lstm_price_predictor.pth - LSTM model (TorchScript)")
print("2. demo_pytorch/dqn_trading_agent.pth - DQN Q-values model (TorchScript)")
print("3. demo_pytorch/dqn_action_predictor.pth - DQN action predictor (TorchScript)")
print("4. demo_pytorch/model_training_info.json - Training metrics and info")
print(f"\\n🔧 Model Specifications:")
print(f"   LSTM Input: [1, 20, {input_size}] - 20 time steps, {input_size} features")
print(f"   DQN Input: [1, {state_size}] - Flattened market state + trading state")
print(f"   Data Processing: IDENTICAL to production DolphinDB pipeline")
print("\\n🚀 Ready for deployment in DolphinDB with LibTorch plugin!")
print("="*60)


📦 Exporting models to TorchScript format...
✅ LSTM model exported - Output difference: 0.00000000
✅ DQN model exported - Max output difference: nan
✅ DQN action predictor exported - Output difference: 0.00000000


FileNotFoundError: [Errno 2] No such file or directory: 'demo_pytorch/model_training_info.json'

In [None]:
!cp /home/uat/dolphindb_dev/demo_pytorch/*.pth /home/uat/ddb/server/  