# Chapter 3: Model Training and Optimization
# 第三章：模型训练与优化

**Author**: Microsoft Qlib Team  
**License**: MIT License  
**Last Updated**: 2025-01-09

---

## 📚 Table of Contents / 目录

### Part 1: Model Fundamentals / 模型基础
1. [Model Interface and Architecture / 模型接口与架构](#model-interface)
2. [Dataset Preparation / 数据集准备](#dataset-prep)
3. [Training Pipeline / 训练管道](#training-pipeline)

### Part 2: Tree-Based Models / 树模型
4. [LightGBM Models / LightGBM模型](#lightgbm)
5. [XGBoost Models / XGBoost模型](#xgboost)
6. [CatBoost Models / CatBoost模型](#catboost)
7. [Tree Model Comparison / 树模型对比](#tree-comparison)

### Part 3: Deep Learning Models / 深度学习模型
8. [Neural Network Basics / 神经网络基础](#nn-basics)
9. [MLP for Stock Prediction / 用于股票预测的MLP](#mlp)
10. [LSTM and GRU Models / LSTM和GRU模型](#lstm-gru)
11. [Transformer Models / Transformer模型](#transformer)
12. [Graph Neural Networks / 图神经网络](#gnn)

### Part 4: Model Optimization / 模型优化
13. [Hyperparameter Tuning / 超参数调优](#hyperparameter)
14. [Feature Selection / 特征选择](#feature-selection)
15. [Ensemble Methods / 集成方法](#ensemble)
16. [Cross-Validation Strategies / 交叉验证策略](#cross-validation)

### Part 5: Production Deployment / 生产部署
17. [Model Persistence / 模型持久化](#persistence)
18. [Online Learning / 在线学习](#online-learning)
19. [Model Monitoring / 模型监控](#monitoring)
20. [Best Practices / 最佳实践](#best-practices)

## Setup and Imports / 设置和导入

In [None]:
# Essential imports / 必要导入
import qlib
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import json
import pickle
import warnings
from typing import Dict, List, Tuple, Optional, Union
from datetime import datetime
import time
warnings.filterwarnings('ignore')

# Qlib imports / Qlib导入
from qlib.data import D
from qlib.config import C
from qlib.workflow import R
from qlib.utils import init_instance_by_config
from qlib.model.base import Model
from qlib.data.dataset import DatasetH, TSDatasetH

# Model imports / 模型导入
from qlib.contrib.model.gbdt import LGBModel
from qlib.contrib.model.xgboost import XGBModel
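from qlib.contrib.model.catboost_model import CatBoostModel
# Qlib names its trainable wrappers DNNModelPytorch, LSTM, and GRU; they are
# aliased here to the names used in the rest of this chapter.
from qlib.contrib.model.pytorch_nn import DNNModelPytorch as DNNModel
from qlib.contrib.model.pytorch_lstm import LSTM as LSTMModel
from qlib.contrib.model.pytorch_gru import GRU as GRUModel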

# Visualization settings / 可视化设置
plt.style.use('seaborn-v0_8-darkgrid')
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 10
sns.set_palette("husl")

print("✅ Libraries imported successfully")

In [None]:
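# Initialize Qlib / 初始化Qlib
# Assumes the CN daily data has been downloaded to Qlib's default location.
qlib.init(provider_uri="~/.qlib/qlib_data/cn_data", region="cn")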

# Global configuration / 全局配置
MARKET = "csi300"  # 沪深300
BENCHMARK = "SH000300"  # 基准指数
EXP_NAME = "model_training_exp"  # 实验名称

# Time periods / 时间段
TRAIN_START = "2008-01-01"
TRAIN_END = "2014-12-31"
VALID_START = "2015-01-01"
VALID_END = "2016-12-31"
TEST_START = "2017-01-01"
TEST_END = "2020-08-01"

print(f"Market: {MARKET}")
print(f"Training: {TRAIN_START} to {TRAIN_END}")
print(f"Validation: {VALID_START} to {VALID_END}")
print(f"Testing: {TEST_START} to {TEST_END}")
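
The three periods above are meant to be chronological and non-overlapping, so that validation and test data never leak into training. A minimal sketch of a check for that property, using only the standard library; the helper name `check_periods` is an assumption made for this example and is not part of Qlib:

```python
from datetime import date

def check_periods(periods):
    """Return True if every (start, end) pair is ordered and the pairs do not overlap.

    `periods` is a list of (start, end) ISO date strings in their intended order.
    """
    parsed = [(date.fromisoformat(s), date.fromisoformat(e)) for s, e in periods]
    for start, end in parsed:
        if start > end:
            return False
    # Each period must begin strictly after the previous one ends.
    for (_, prev_end), (next_start, _) in zip(parsed, parsed[1:]):
        if next_start <= prev_end:
            return False
    return True

periods = [
    ("2008-01-01", "2014-12-31"),  # train
    ("2015-01-01", "2016-12-31"),  # valid
    ("2017-01-01", "2020-08-01"),  # test
]
print(check_periods(periods))
```

Running a check like this before building the dataset catches a mistyped boundary early, before any model has been trained on leaked data.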

## Part 1: Model Fundamentals / 模型基础

## 1. Model Interface and Architecture / 模型接口与架构 <a id='model-interface'></a>

### Qlib Model Architecture / Qlib模型架构

```
┌─────────────────────────────────────────────────────────────┐
│                    Qlib Model Interface                     │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   Base Model Class                                         │
│   ├── fit(dataset)          # Training / 训练             │
│   ├── predict(dataset)      # Prediction / 预测           │
│   └── finetune(dataset)     # Fine-tuning / 微调         │
│                                                             │
│   Model Types / 模型类型:                                  │
│   ├── Tabular Models        # Traditional ML / 传统机器学习 │
│   ├── Time Series Models    # Sequential / 序列模型        │
│   └── Graph Models          # Relational / 关系模型        │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

In [None]:
# Custom model base class / 自定义模型基类

from qlib.model.base import Model
from qlib.data.dataset.handler import DataHandlerLP

class CustomModelBase(Model):
    """Custom base model class with enhanced features
    带增强功能的自定义基础模型类
    """
    
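    def __init__(self, **kwargs):
        # Qlib's Model base class takes no constructor arguments, so keep
        # any hyperparameters on the instance instead of forwarding them.
        super().__init__()
        self.params = kwargs
        self.fitted = False
        self.training_history = []
        self.feature_importance = None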
    
    def fit(self, dataset: DatasetH, **kwargs):
        """Enhanced fit method with logging
        带日志记录的增强训练方法
        """
        print(f"Starting training at {datetime.now()}")
        start_time = time.time()
        
        # Get data
        df_train, df_valid = self._prepare_data(dataset)
        
        # Actual training (to be implemented in subclasses)
        self._fit_model(df_train, df_valid, **kwargs)
        
        # Record training time
        training_time = time.time() - start_time
        self.training_history.append({
            'timestamp': datetime.now(),
            'training_time': training_time,
            'train_samples': len(df_train),
            'valid_samples': len(df_valid) if df_valid is not None else 0
        })
        
        self.fitted = True
        print(f"Training completed in {training_time:.2f} seconds")
    
    def predict(self, dataset: DatasetH, segment: str = "test"):
        """Enhanced predict method
        增强预测方法
        """
        if not self.fitted:
            raise ValueError("Model must be fitted before prediction")
        
        # Get data for the requested segment
        df_test = dataset.prepare(segment, col_set=['feature', 'label'])
        
        # Make predictions (to be implemented in subclasses)
        predictions = self._predict_model(df_test)
        
        return predictions
    
    def _prepare_data(self, dataset: DatasetH):
        """Prepare training and validation data
        准备训练和验证数据
        """
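        # Use the "learn" view so that label processors (for example, dropping NaN labels) apply
        df_train = dataset.prepare("train", col_set=["feature", "label"], data_key=DataHandlerLP.DK_L)
        df_valid = (
            dataset.prepare("valid", col_set=["feature", "label"], data_key=DataHandlerLP.DK_L)
            if "valid" in dataset.segments else None
        )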
        return df_train, df_valid
    
    def _fit_model(self, df_train, df_valid, **kwargs):
        """To be implemented by subclasses
        由子类实现
        """
        raise NotImplementedError
    
    def _predict_model(self, df_test):
        """To be implemented by subclasses
        由子类实现
        """
        raise NotImplementedError
    
    def get_feature_importance(self):
        """Get feature importance if available
        获取特征重要性（如果可用）
        """
        return self.feature_importance

print("✅ Custom model base class defined")
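
The base class above follows a template-method pattern: `fit` and `predict` handle bookkeeping, while subclasses supply `_fit_model` and `_predict_model`. The sketch below reproduces that pattern without Qlib so it can run anywhere. `MiniBase` and `MeanModel` are hypothetical names, and plain lists stand in for the dataset.

```python
import time

class MiniBase:
    """Stand-in for the base class: bookkeeping lives here, learning in subclasses."""

    def __init__(self):
        self.fitted = False
        self.training_history = []

    def fit(self, labels):
        start = time.perf_counter()
        self._fit_model(labels)  # hook implemented by the subclass
        self.training_history.append({
            "training_time": time.perf_counter() - start,
            "train_samples": len(labels),
        })
        self.fitted = True

    def predict(self, n_rows):
        if not self.fitted:
            raise ValueError("Model must be fitted before prediction")
        return self._predict_model(n_rows)

    def _fit_model(self, labels):
        raise NotImplementedError

    def _predict_model(self, n_rows):
        raise NotImplementedError


class MeanModel(MiniBase):
    """Toy subclass: always predicts the mean training label."""

    def _fit_model(self, labels):
        self.mean_ = sum(labels) / len(labels)

    def _predict_model(self, n_rows):
        return [self.mean_] * n_rows


model = MeanModel()
model.fit([0.01, -0.02, 0.04])
print(model.predict(2))
```

A subclass written this way only has to define the two hooks; timing, the fitted flag, and the pre-prediction check come from the base class.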

## 2. Dataset Preparation / 数据集准备 <a id='dataset-prep'></a>

In [None]:
# Prepare comprehensive dataset / 准备综合数据集

from qlib.contrib.data.handler import Alpha158

# Create data handler / 创建数据处理器
handler_config = {
    "start_time": TRAIN_START,
    "end_time": TEST_END,
    "fit_start_time": TRAIN_START,
    "fit_end_time": TRAIN_END,
    "instruments": MARKET,
}

# Initialize Alpha158 handler / 初始化Alpha158处理器
handler = Alpha158(**handler_config)

# Create dataset with segments / 创建带分段的数据集
dataset_config = {
    "handler": handler,
    "segments": {
        "train": (TRAIN_START, TRAIN_END),
        "valid": (VALID_START, VALID_END),
        "test": (TEST_START, TEST_END),
    },
}

dataset = DatasetH(**dataset_config)

# Display dataset information / 显示数据集信息
print("Dataset segments:")
for segment, (start, end) in dataset.segments.items():
    df_segment = dataset.prepare(segment, col_set=['feature', 'label'], data_key=DataHandlerLP.DK_L)
    print(f"  {segment:10}: {start} to {end}, Shape: {df_segment.shape}")

# Sample data preview / 数据样本预览
df_sample = dataset.prepare("train", col_set=['feature', 'label'], data_key=DataHandlerLP.DK_L).head()
print(f"\nFeature columns: {df_sample.columns[:10].tolist()}...")
print(f"Total features: {len(df_sample.columns) - 1}")
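
`DatasetH` slices the handler's data into the named segments by date. As a rough illustration of that idea (not Qlib's implementation), the sketch below assigns dated rows to segments using inclusive date ranges. `split_by_segment` and the sample rows are made up for the example.

```python
from datetime import date

segments = {
    "train": ("2008-01-01", "2014-12-31"),
    "valid": ("2015-01-01", "2016-12-31"),
    "test": ("2017-01-01", "2020-08-01"),
}

def split_by_segment(rows, segments):
    """Group (iso_date, value) rows by the segment whose inclusive range contains the date."""
    bounds = {
        name: (date.fromisoformat(start), date.fromisoformat(end))
        for name, (start, end) in segments.items()
    }
    out = {name: [] for name in segments}
    for iso_day, value in rows:
        day = date.fromisoformat(iso_day)
        for name, (start, end) in bounds.items():
            if start <= day <= end:
                out[name].append((iso_day, value))
                break  # ranges do not overlap, so the first match is the only one
    return out

rows = [("2010-05-03", 1.0), ("2014-12-31", 2.0), ("2015-01-05", 3.0), ("2019-07-01", 4.0)]
parts = split_by_segment(rows, segments)
print({name: len(items) for name, items in parts.items()})
```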

## 3. Training Pipeline / 训练管道 <a id='training-pipeline'></a>

In [None]:
# Comprehensive training pipeline / 综合训练管道

class TrainingPipeline:
    """Complete training pipeline with experiment tracking
    带实验跟踪的完整训练管道
    """
    
    def __init__(self, experiment_name: str):
        self.experiment_name = experiment_name
        self.models = {}
        self.results = {}
        
    def add_model(self, name: str, model: Model):
        """Add a model to the pipeline
        向管道添加模型
        """
        self.models[name] = model
        print(f"Added model: {name}")
    
    def train_all(self, dataset: DatasetH, save_models: bool = True):
        """Train all models in the pipeline
        训练管道中的所有模型
        """
        print(f"\nStarting training pipeline with {len(self.models)} models")
        print("="*60)
        
        for name, model in self.models.items():
            print(f"\nTraining {name}...")
            
            # Start experiment
            with R.start(experiment_name=self.experiment_name):
                # Train model
                start_time = time.time()
                model.fit(dataset)
                training_time = time.time() - start_time
                
                # Make predictions
                pred_train = model.predict(dataset, segment="train")
                pred_valid = model.predict(dataset, segment="valid")
                pred_test = model.predict(dataset, segment="test")
                
                # Calculate metrics
                metrics = self._calculate_metrics(dataset, pred_train, pred_valid, pred_test)
                
                # Store results
                self.results[name] = {
                    'model': model,
                    'predictions': {
                        'train': pred_train,
                        'valid': pred_valid,
                        'test': pred_test
                    },
                    'metrics': metrics,
                    'training_time': training_time
                }
                
                # Save model if requested
                if save_models:
                    R.save_objects(
                        trained_model=model,
                        predictions=pred_test,
                        metrics=metrics
                    )
                
                print(f"  ✅ Training completed in {training_time:.2f}s")
                print(f"  Test IC: {metrics['test_ic']:.4f}")
                print(f"  Test Rank IC: {metrics['test_rank_ic']:.4f}")
        
        print("\n" + "="*60)
        print("Pipeline completed successfully!")
        return self.results
    
    def _calculate_metrics(self, dataset, pred_train, pred_valid, pred_test):
        """Calculate evaluation metrics
        计算评估指标
        """
        from scipy.stats import spearmanr
        
        metrics = {}
        
        # Get labels
        label_train = dataset.prepare("train", col_set=['label'], data_key=DataHandlerLP.DK_L)
        label_valid = dataset.prepare("valid", col_set=['label'], data_key=DataHandlerLP.DK_L)
        label_test = dataset.prepare("test", col_set=['label'], data_key=DataHandlerLP.DK_L)
        
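        # Align predictions with labels on their shared (datetime, instrument) index
        def calc_ic(pred, label):
            # `label` is a one-column DataFrame; take that column as a Series
            df = pd.DataFrame({'pred': pred, 'label': label.iloc[:, 0]}).dropna()
            if len(df) > 1:
                ic = df['pred'].corr(df['label'])
                rank_ic = spearmanr(df['pred'], df['label'])[0]
                return ic, rank_ic
            return np.nan, np.nan  # correlation is undefined without enough data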
        
        # Calculate metrics for each segment
        metrics['train_ic'], metrics['train_rank_ic'] = calc_ic(pred_train, label_train)
        metrics['valid_ic'], metrics['valid_rank_ic'] = calc_ic(pred_valid, label_valid)
        metrics['test_ic'], metrics['test_rank_ic'] = calc_ic(pred_test, label_test)
        
        return metrics
    
    def compare_models(self):
        """Compare all trained models
        比较所有训练过的模型
        """
        rows = []
        
        for name, result in self.results.items():
            metrics = result['metrics']
            rows.append({
                'Model': name,
                'Train IC': metrics['train_ic'],
                'Valid IC': metrics['valid_ic'],
                'Test IC': metrics['test_ic'],
                'Test Rank IC': metrics['test_rank_ic'],
                'Training Time (s)': result['training_time']
            })
        
        if not rows:
            # Nothing trained yet: return an empty frame instead of failing on the sort
            return pd.DataFrame()
        
        return pd.DataFrame(rows).sort_values('Test IC', ascending=False)

# Create pipeline / 创建管道
pipeline = TrainingPipeline(experiment_name=EXP_NAME)
print("✅ Training pipeline created")
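
`_calculate_metrics` pools every (date, instrument) pair into a single correlation. A common alternative in factor research is to compute the correlation cross-sectionally for each day and then average across days. The sketch below computes both the Pearson IC and the Spearman Rank IC that way with the standard library only. The helper names and toy data are assumptions for illustration, and the ranking does not handle ties.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def ranks(values):
    """Rank positions (0 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result

def daily_ic(records):
    """records: {day: [(pred, label), ...]} -> (mean IC, mean Rank IC) across days."""
    ics, rank_ics = [], []
    for pairs in records.values():
        preds = [p for p, _ in pairs]
        labels = [l for _, l in pairs]
        ics.append(pearson(preds, labels))
        rank_ics.append(pearson(ranks(preds), ranks(labels)))
    return sum(ics) / len(ics), sum(rank_ics) / len(rank_ics)

records = {
    "2017-01-03": [(0.1, 0.01), (0.2, 0.02), (0.3, 0.03)],  # predictions ordered like labels
    "2017-01-04": [(0.1, 0.03), (0.2, 0.02), (0.3, 0.01)],  # predictions ordered opposite to labels
}
ic, rank_ic = daily_ic(records)
print(round(ic, 6), round(rank_ic, 6))
```

In this toy data one day is perfectly ranked and the other perfectly reversed, so the daily values cancel in the average. Pooled and per-day IC can differ noticeably on real data when label dispersion changes from day to day.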

## Part 2: Tree-Based Models / 树模型

## 4. LightGBM Models / LightGBM模型 <a id='lightgbm'></a>

In [None]:
# LightGBM model configuration and training / LightGBM模型配置和训练

from qlib.contrib.model.gbdt import LGBModel

# Basic LightGBM configuration / 基础LightGBM配置
lgb_basic_config = {
    "loss": "mse",
    "colsample_bytree": 0.8879,
    "learning_rate": 0.0421,
    "subsample": 0.8789,
    "lambda_l1": 205.6999,
    "lambda_l2": 580.9768,
    "max_depth": 8,
    "num_leaves": 210,
    "num_threads": 20,
    "verbosity": -1,
    "early_stopping_rounds": 50,
}

# Advanced LightGBM configuration / 高级LightGBM配置
lgb_advanced_config = {
    "loss": "mse",  # LGBModel maps this to the LightGBM objective
    "objective": "regression",
    "metric": "rmse",
    "boosting_type": "gbdt",
    "num_boost_round": 1000,
    "learning_rate": 0.05,
    "num_leaves": 256,
    "max_depth": 10,
    "min_child_samples": 20,
    "subsample": 0.9,
    "subsample_freq": 1,
    "colsample_bytree": 0.9,
    "reg_alpha": 100.0,
    "reg_lambda": 100.0,
    "seed": 42,
    "n_jobs": -1,
    # "silent" and "importance_type" belong to LightGBM's scikit-learn wrapper,
    # not to lgb.train(), which LGBModel uses; "verbosity" is the native setting.
    "verbosity": -1,
    "early_stopping_rounds": 100,
}

# Create LightGBM models / 创建LightGBM模型
lgb_basic = LGBModel(**lgb_basic_config)
lgb_advanced = LGBModel(**lgb_advanced_config)

# Add to pipeline / 添加到管道
pipeline.add_model("LightGBM_Basic", lgb_basic)
pipeline.add_model("LightGBM_Advanced", lgb_advanced)

print("LightGBM configurations:")
print(f"  Basic: {len(lgb_basic_config)} parameters")
print(f"  Advanced: {len(lgb_advanced_config)} parameters")
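
In LightGBM, a tree limited to depth `d` can have at most `2**d` leaves, so `num_leaves` is only an effective control when it does not exceed that bound. A small sanity check along those lines follows. `leaf_budget` is a hypothetical helper, and the values are copied from the two configurations above.

```python
def leaf_budget(max_depth, num_leaves):
    """Return (max possible leaves at this depth, whether num_leaves fits within it)."""
    max_leaves = 2 ** max_depth
    return max_leaves, num_leaves <= max_leaves

for name, depth, leaves in [("basic", 8, 210), ("advanced", 10, 256)]:
    max_leaves, fits = leaf_budget(depth, leaves)
    print(f"{name}: num_leaves={leaves}, depth limit allows {max_leaves}, fits={fits}")
```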

In [None]:
# Custom LightGBM with feature engineering / 带特征工程的自定义LightGBM

class CustomLGBModel(LGBModel):
    """LightGBM with custom feature engineering
    带自定义特征工程的LightGBM
    """
    
    def __init__(self, feature_engineer=None, **kwargs):
        super().__init__(**kwargs)
        self.feature_engineer = feature_engineer
        self.feature_names = None
    
    def fit(self, dataset, **kwargs):
        """Fit with feature engineering
        带特征工程的训练
        """
        # Apply feature engineering if provided
        if self.feature_engineer:
            dataset = self._apply_feature_engineering(dataset)
        
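        # Call parent fit method
        super().fit(dataset, **kwargs)
        
        # Record feature names so importances can be reported by name
        self.feature_names = dataset.prepare(
            "train", col_set="feature", data_key=DataHandlerLP.DK_L
        ).columns.tolist()
        
        # Extract feature importance
        self._extract_feature_importance()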
    
    def _apply_feature_engineering(self, dataset):
        """Apply custom feature engineering
        应用自定义特征工程
        """
        # Example: Add rolling features
        print("Applying feature engineering...")
        # Implementation would go here
        return dataset
    
    def _extract_feature_importance(self):
        """Extract and store feature importance
        提取并存储特征重要性
        """
        if hasattr(self.model, 'feature_importance'):
            importance = self.model.feature_importance(importance_type='gain')
            if self.feature_names:
                self.feature_importance = dict(zip(self.feature_names, importance))
            else:
                self.feature_importance = importance
    
    def get_top_features(self, n=20):
        """Get top n important features
        获取前n个重要特征
        """
        # Check the type first: truth-testing a NumPy array raises ValueError
        importance = getattr(self, "feature_importance", None)
        if isinstance(importance, dict) and importance:
            sorted_features = sorted(importance.items(), 
                                   key=lambda x: x[1], reverse=True)
            return sorted_features[:n]
        return None

# Create custom LightGBM model / 创建自定义LightGBM模型
custom_lgb = CustomLGBModel(**lgb_basic_config)
pipeline.add_model("LightGBM_Custom", custom_lgb)

print("✅ Custom LightGBM model created")
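
Ranking features by importance comes down to pairing names with scores and sorting. The sketch below mirrors what `get_top_features` does once both names and scores are available. The gain values are invented for the example.

```python
def top_features(names, scores, n=3):
    """Pair feature names with importance scores and return the n largest."""
    if len(names) != len(scores):
        raise ValueError("names and scores must have the same length")
    ranked = sorted(zip(names, scores), key=lambda item: item[1], reverse=True)
    return ranked[:n]

# Hypothetical gain values for a handful of Alpha158-style feature names.
names = ["KMID", "ROC5", "MA10", "STD20", "VSTD5"]
gains = [12.5, 40.0, 3.2, 27.1, 8.8]
print(top_features(names, gains, n=3))
```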

## 5. XGBoost Models / XGBoost模型 <a id='xgboost'></a>

In [None]:
# XGBoost model configuration / XGBoost模型配置

from qlib.contrib.model.xgboost import XGBModel

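# XGBoost configuration / XGBoost配置
# Qlib's XGBModel passes this dict to xgb.train(). The number of boosting
# rounds and early stopping are arguments of XGBModel.fit() (defaults:
# num_boost_round=1000, early_stopping_rounds=50), not booster parameters,
# so they are not set here.
xgb_config = {
    "max_depth": 8,
    "learning_rate": 0.05,
    "subsample": 0.9,
    "colsample_bytree": 0.9,
    "reg_alpha": 100,
    "reg_lambda": 100,
    "min_child_weight": 5,
    "objective": "reg:squarederror",
    "eval_metric": "rmse",
    "random_state": 42,
    "n_jobs": -1,
    "verbosity": 0,
}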

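# GPU-accelerated XGBoost configuration / GPU加速的XGBoost配置
# XGBoost >= 2.0 selects the GPU with "device"; the older "gpu_hist",
# "gpu_id", and "predictor" settings are deprecated there. On XGBoost < 2.0,
# use "tree_method": "gpu_hist" and "gpu_id": 0 instead.
xgb_gpu_config = {
    **xgb_config,
    "tree_method": "hist",
    "device": "cuda:0",  # Use GPU 0
}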

# Create XGBoost models / 创建XGBoost模型
xgb_model = XGBModel(**xgb_config)
# xgb_gpu_model = XGBModel(**xgb_gpu_config)  # Uncomment if GPU available

# Add to pipeline / 添加到管道
pipeline.add_model("XGBoost", xgb_model)
# pipeline.add_model("XGBoost_GPU", xgb_gpu_model)

print(f"XGBoost model configured with {len(xgb_config)} parameters")
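
The tree configurations in this part rely on early stopping: training halts once the validation loss has not improved for a set number of rounds, and the best round is kept. A library-independent sketch of that rule follows. `early_stop_round` and the loss curve are illustrative, and real libraries differ in details such as tie handling.

```python
def early_stop_round(valid_losses, patience):
    """Return (best_round, stopped_round) for a sequence of validation losses.

    Training stops at the first round where `patience` rounds have passed
    without a new best loss; otherwise it runs to the end.
    """
    best_loss = float("inf")
    best_round = -1
    for rnd, loss in enumerate(valid_losses):
        if loss < best_loss:
            best_loss, best_round = loss, rnd
        elif rnd - best_round >= patience:
            return best_round, rnd
    return best_round, len(valid_losses) - 1

losses = [0.90, 0.80, 0.75, 0.76, 0.77, 0.74, 0.78, 0.79, 0.80, 0.81]
print(early_stop_round(losses, patience=3))
```

A small `patience` stops at the first local minimum, while a larger one tolerates short plateaus at the cost of extra rounds. This is the trade-off behind the values of 50 and 100 used in this chapter.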

## 6. CatBoost Models / CatBoost模型 <a id='catboost'></a>

In [None]:
# CatBoost model configuration / CatBoost模型配置

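from qlib.contrib.model.catboost_model import CatBoostModel

# CatBoost configuration / CatBoost配置
# CatBoostModel.fit() sets "iterations", "early_stopping_rounds", and
# "verbose_eval" from its own arguments (defaults: 1000, 50, 20), so the first
# two are listed here only to document the intended values.
catboost_config = {
    "iterations": 1000,
    "learning_rate": 0.05,
    "depth": 8,
    "l2_leaf_reg": 100,
    "subsample": 0.9,
    "colsample_bylevel": 0.9,
    "random_seed": 42,
    "loss_function": "RMSE",
    "eval_metric": "RMSE",
    "use_best_model": True,
    # "verbose" is omitted: CatBoost accepts only one of verbose, verbose_eval,
    # logging_level, and silent, and Qlib already passes verbose_eval.
    "early_stopping_rounds": 50,
    "thread_count": -1,
}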

# CatBoost with categorical features / 带分类特征的CatBoost
catboost_categorical_config = {
    **catboost_config,
    "cat_features": [],  # Specify categorical feature indices
    "one_hot_max_size": 10,
    "has_time": True,  # For time series
}

# Create CatBoost models / 创建CatBoost模型
catboost_model = CatBoostModel(**catboost_config)

# Add to pipeline / 添加到管道
pipeline.add_model("CatBoost", catboost_model)

print(f"CatBoost model configured with {len(catboost_config)} parameters")

## 7. Tree Model Comparison / 树模型对比 <a id='tree-comparison'></a>

In [None]:
# Compare tree-based models / 比较树模型

def compare_tree_models(models_dict, dataset, test_size=1000):
    """Compare different tree-based models
    比较不同的树模型
    """
    comparison_results = []
    
    # Get a small test sample for quick comparison
    test_data = dataset.prepare("test", col_set=['feature', 'label'], data_key=DataHandlerLP.DK_L).head(test_size)
    
    for name, model_config in models_dict.items():
        print(f"\nEvaluating {name}...")
        
        # Train model
        start_time = time.time()
        # model.fit(dataset)  # Would run actual training
        train_time = time.time() - start_time
        
        # Make predictions
        start_time = time.time()
        # predictions = model.predict(dataset, segment="test")  # Would make actual predictions
        predict_time = time.time() - start_time
        
        # Store results
        comparison_results.append({
            'Model': name,
            'Parameters': len(model_config),
            'Training Time': train_time,
            'Prediction Time': predict_time,
            'Memory Usage': 0,  # Would measure actual memory
        })
    
    return pd.DataFrame(comparison_results)

# Define models for comparison / 定义用于比较的模型
tree_models = {
    'LightGBM': lgb_basic_config,
    'XGBoost': xgb_config,
    'CatBoost': catboost_config,
}

# Compare models / 比较模型
comparison_df = compare_tree_models(tree_models, dataset)
print("\nTree Model Comparison:")
print(comparison_df)

# Visualize comparison / 可视化比较
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Hyperparameter count comparison / 超参数数量比较
# This counts entries in each config dict; it does not measure model complexity.
axes[0].bar(comparison_df['Model'], comparison_df['Parameters'])
axes[0].set_title('Configured Hyperparameters per Model')
axes[0].set_xlabel('Model')
axes[0].set_ylabel('Number of Hyperparameters')

# Feature importance comparison (placeholder) / 特征重要性比较（占位符）
# The values below are random and only demonstrate the plot layout.
feature_importance = pd.DataFrame({
    'Feature': ['f1', 'f2', 'f3', 'f4', 'f5'],
    'LightGBM': np.random.random(5),
    'XGBoost': np.random.random(5),
    'CatBoost': np.random.random(5),
})
feature_importance.set_index('Feature').plot(kind='bar', ax=axes[1])
axes[1].set_title('Feature Importance Comparison (random placeholder values)')
axes[1].set_xlabel('Feature')
axes[1].set_ylabel('Importance')

plt.tight_layout()
plt.show()
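
The comparison function above leaves training time, prediction time, and memory as placeholders. One way to fill them in is sketched here with the standard library: `time.perf_counter` for wall-clock time and `tracemalloc` for peak Python-level allocations. `measure` is a hypothetical helper. `tracemalloc` only sees memory allocated through Python, so native allocations inside LightGBM, XGBoost, or CatBoost would not be counted.

```python
import time
import tracemalloc

def measure(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds, peak_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak

def build_squares(n):
    """Stand-in workload for a model's fit() call."""
    return [i * i for i in range(n)]

result, seconds, peak_bytes = measure(build_squares, 100_000)
print(f"{len(result)} items, {seconds:.4f}s, peak {peak_bytes / 1e6:.2f} MB")
```

In the comparison loop, the same wrapper could be applied to `model.fit` and `model.predict` to populate the table with measured values.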

## Part 3: Deep Learning Models / 深度学习模型

## 8. Neural Network Basics / 神经网络基础 <a id='nn-basics'></a>

In [None]:
# Neural network base configuration / 神经网络基础配置

import torch
import torch.nn as nn

# Check GPU availability / 检查GPU可用性
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Base neural network configuration / 基础神经网络配置
nn_base_config = {
    "input_dim": 158,  # Alpha158 features
    "hidden_dims": [256, 128, 64],
    "output_dim": 1,
    "dropout": 0.3,
    "activation": "relu",
    "batch_norm": True,
    "learning_rate": 0.001,
    "batch_size": 1024,
    "epochs": 100,
    "early_stopping_patience": 10,
    "optimizer": "adam",
    "loss_fn": "mse",
    "device": device,
}

print("Neural network base configuration:")
for key, value in nn_base_config.items():
    print(f"  {key}: {value}")
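
The size of the network implied by this configuration can be worked out by hand: a linear layer from `m` to `n` units has `m*n + n` parameters, and each batch-norm layer adds a scale and a shift per unit. A short sketch of that arithmetic follows. `count_mlp_params` is a helper written for this example.

```python
def count_mlp_params(input_dim, hidden_dims, output_dim, batch_norm=True):
    """Count trainable parameters of a fully connected network."""
    total = 0
    prev = input_dim
    for width in hidden_dims:
        total += prev * width + width  # weights + biases
        if batch_norm:
            total += 2 * width  # per-unit scale and shift
        prev = width
    total += prev * output_dim + output_dim  # output layer
    return total

print(count_mlp_params(158, [256, 128, 64], 1))
print(count_mlp_params(158, [256, 128, 64], 1, batch_norm=False))
```

Most of the parameters sit in the first layer (158 × 256 weights), so the input width dominates model size far more than the later hidden layers do.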

## 9. MLP for Stock Prediction / 用于股票预测的MLP <a id='mlp'></a>

In [None]:
# MLP model implementation / MLP模型实现

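# Qlib's MLP wrapper is DNNModelPytorch; alias it to the name used below.
from qlib.contrib.model.pytorch_nn import DNNModelPytorch as DNNModel

# MLP configuration / MLP配置
# Argument names follow DNNModelPytorch in recent Qlib releases; check the
# signature in your installed version, since it has changed over time.
# Neural models also need NaN-free, normalized features, which Alpha158 does
# not provide unless processors such as RobustZScoreNorm and Fillna are added.
mlp_config = {
    "batch_size": 1024,
    "max_steps": 8000,
    "lr": 0.001,
    "weight_decay": 0.0001,
    "early_stop_rounds": 50,
    "eval_steps": 20,
    "optimizer": "adam",
    "loss": "mse",
    "GPU": 0 if torch.cuda.is_available() else None,
    "seed": 42,
    "pt_model_kwargs": {"input_dim": 158},  # Alpha158 feature count
}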

# Advanced MLP with custom architecture / 带自定义架构的高级MLP
class CustomMLP(nn.Module):
    """Custom MLP architecture for stock prediction
    用于股票预测的自定义MLP架构
    """
    
    def __init__(self, input_dim=158, hidden_dims=(512, 256, 128, 64),
                 dropout=0.3, use_batch_norm=True):
        super(CustomMLP, self).__init__()
        
        layers = []
        prev_dim = input_dim
        
        for hidden_dim in hidden_dims:
            # Linear layer
            layers.append(nn.Linear(prev_dim, hidden_dim))
            
            # Batch normalization
            if use_batch_norm:
                layers.append(nn.BatchNorm1d(hidden_dim))
            
            # Activation
            layers.append(nn.ReLU())
            
            # Dropout
            layers.append(nn.Dropout(dropout))
            
            prev_dim = hidden_dim
        
        # Output layer
        layers.append(nn.Linear(prev_dim, 1))
        
        self.model = nn.Sequential(*layers)
    
    def forward(self, x):
        return self.model(x)

# Create MLP models / 创建MLP模型
mlp_model = DNNModel(**mlp_config)

# Add to pipeline / 添加到管道
pipeline.add_model("MLP", mlp_model)

print("✅ MLP model created")

# Display architecture / 显示架构
custom_mlp = CustomMLP()
print("\nCustom MLP Architecture:")
print(custom_mlp)

## 10. LSTM and GRU Models / LSTM和GRU模型 <a id='lstm-gru'></a>

In [None]:
# LSTM and GRU models for time series / 用于时间序列的LSTM和GRU模型

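# Qlib's trainable wrappers are named LSTM and GRU; LSTMModel and GRUModel in
# those modules are the bare nn.Module networks. Alias the wrappers so the
# names used below refer to objects with fit() and predict().
from qlib.contrib.model.pytorch_lstm import LSTM as LSTMModel
from qlib.contrib.model.pytorch_gru import GRU as GRUModel

# LSTM configuration / LSTM配置
# Note: these wrappers reshape each flat feature row to (d_feat, time steps).
# With Alpha158's 158 flat features and d_feat=158 the sequence length is 1;
# Qlib's own benchmarks pair these models with Alpha360 and d_feat=6.
lstm_config = {
    "d_feat": 158,  # Input dimension
    "hidden_size": 128,
    "num_layers": 2,
    "dropout": 0.3,
    "batch_size": 800,
    "early_stop": 20,
    "lr": 0.001,  # the wrappers name the learning rate "lr"
    "metric": "loss",
    "loss": "mse",
    "n_epochs": 100,
    "GPU": 0 if torch.cuda.is_available() else None,
    "seed": 42,
}

# GRU configuration / GRU配置
gru_config = {
    "d_feat": 158,
    "hidden_size": 128,
    "num_layers": 2,
    "dropout": 0.3,
    "batch_size": 800,
    "early_stop": 20,
    "lr": 0.001,
    "metric": "loss",
    "loss": "mse",
    "n_epochs": 100,
    "GPU": 0 if torch.cuda.is_available() else None,
    "seed": 42,
}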

# Custom LSTM with attention mechanism / 带注意力机制的自定义LSTM
class LSTMWithAttention(nn.Module):
    """LSTM with attention mechanism for better feature extraction
    带注意力机制的LSTM用于更好的特征提取
    """
    
    def __init__(self, input_dim, hidden_dim, num_layers, dropout=0.3):
        super(LSTMWithAttention, self).__init__()
        
        self.lstm = nn.LSTM(
            input_dim, hidden_dim, num_layers,
            batch_first=True, dropout=dropout
        )
        
        # Attention layers
        self.attention = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim // 2),
            nn.Tanh(),
            nn.Linear(hidden_dim // 2, 1)
        )
        
        self.output = nn.Linear(hidden_dim, 1)
    
    def forward(self, x):
        # LSTM forward pass
        lstm_out, _ = self.lstm(x)
        
        # Attention mechanism
        attention_weights = torch.softmax(self.attention(lstm_out), dim=1)
        context_vector = torch.sum(attention_weights * lstm_out, dim=1)
        
        # Output
        output = self.output(context_vector)
        return output

# Create LSTM and GRU models / 创建LSTM和GRU模型
lstm_model = LSTMModel(**lstm_config)
gru_model = GRUModel(**gru_config)

# Add to pipeline / 添加到管道
pipeline.add_model("LSTM", lstm_model)
pipeline.add_model("GRU", gru_model)

print("✅ LSTM and GRU models created")

# Display custom architecture / 显示自定义架构
lstm_attention = LSTMWithAttention(158, 128, 2)
print("\nLSTM with Attention Architecture:")
print(lstm_attention)
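
The attention step in `LSTMWithAttention` turns one score per time step into softmax weights and returns the weighted sum of the hidden states. The same computation is shown below in plain Python for a single sequence. The scores and hidden states are made-up numbers, whereas in the model the scores come from the small `attention` network.

```python
from math import exp

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, scores):
    """Weighted sum of per-step hidden vectors using softmax(scores) as weights."""
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return weights, context

# Three time steps, hidden size 2.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 0.0]  # equal scores give uniform weights
weights, context = attention_pool(hidden_states, scores)
print(weights, context)
```

With equal scores the result is a plain average over time steps. Training the scoring network lets the model move weight toward the steps that matter most for the prediction.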

## 11. Transformer Models / Transformer模型 <a id='transformer'></a>

In [None]:
# Transformer model for stock prediction / 用于股票预测的Transformer模型

class StockTransformer(nn.Module):
    """Transformer model for stock prediction
    用于股票预测的Transformer模型
    """
    
    def __init__(self, input_dim=158, d_model=128, nhead=8, 
                 num_layers=3, dropout=0.1):
        super(StockTransformer, self).__init__()
        
        # Input projection
        self.input_projection = nn.Linear(input_dim, d_model)
        
        # Positional encoding
        self.positional_encoding = PositionalEncoding(d_model, dropout)
        
        # Transformer encoder
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model,
            nhead=nhead,
            dropout=dropout,
            batch_first=True
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers)
        
        # Output layers
        self.output = nn.Sequential(
            nn.Linear(d_model, d_model // 2),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(d_model // 2, 1)
        )
    
    def forward(self, x):
        # Input projection
        x = self.input_projection(x)
        
        # Add positional encoding
        x = self.positional_encoding(x)
        
        # Transformer encoding
        x = self.transformer(x)
        
        # Global pooling (take mean across sequence)
        x = x.mean(dim=1)
        
        # Output
        return self.output(x)

class PositionalEncoding(nn.Module):
    """Positional encoding for transformer
    Transformer的位置编码
    """
    
    def __init__(self, d_model, dropout=0.1, max_len=5000):
        super(PositionalEncoding, self).__init__()
        self.dropout = nn.Dropout(p=dropout)
        
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * 
                           (-np.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
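        pe = pe.unsqueeze(0)  # shape (1, max_len, d_model), matching batch_first=True
        self.register_buffer('pe', pe)
    
    def forward(self, x):
        # x: (batch, seq_len, d_model); add the encoding for each time step
        x = x + self.pe[:, :x.size(1), :]
        return self.dropout(x)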

# Create transformer model / 创建Transformer模型
transformer_model = StockTransformer()
print("✅ Transformer model created")
print(f"\nModel parameters: {sum(p.numel() for p in transformer_model.parameters()):,}")
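
The sinusoidal encoding built in `PositionalEncoding` assigns position `pos` and channel pair `i` the values `sin(pos / 10000^(2i/d_model))` and `cos(pos / 10000^(2i/d_model))`. A dependency-free sketch of the same table, useful for checking values by hand, follows. `sinusoidal_table` is a name chosen for this example, and it assumes an even `d_model`.

```python
from math import sin, cos, exp, log

def sinusoidal_table(max_len, d_model):
    """Return a max_len x d_model table of sinusoidal position encodings."""
    table = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        # `i` walks the even channel indices, so i / d_model equals 2 * pair / d_model.
        for i in range(0, d_model, 2):
            freq = exp(-log(10000.0) * i / d_model)
            table[pos][i] = sin(pos * freq)
            table[pos][i + 1] = cos(pos * freq)
    return table

pe = sinusoidal_table(max_len=4, d_model=6)
print([round(v, 4) for v in pe[1]])
```

Position 0 always encodes to alternating 0 and 1, and the first channel pair oscillates fastest, which is what lets the encoder distinguish nearby time steps.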

## Part 4: Model Optimization / 模型优化

## 13. Hyperparameter Tuning / 超参数调优 <a id='hyperparameter'></a>

In [None]:
# Hyperparameter tuning with Optuna / 使用Optuna进行超参数调优

import optuna
from optuna.samplers import TPESampler

class HyperparameterTuner:
    """Automated hyperparameter tuning framework
    自动超参数调优框架
    """
    
    def __init__(self, model_class, dataset, n_trials=50):
        self.model_class = model_class
        self.dataset = dataset
        self.n_trials = n_trials
        self.best_params = None
        self.best_score = None
    
    def objective_lgb(self, trial):
        """Objective function for LightGBM
        LightGBM的目标函数
        """
        params = {
            'num_leaves': trial.suggest_int('num_leaves', 20, 300),
            'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
            'max_depth': trial.suggest_int('max_depth', 3, 12),
            'min_child_samples': trial.suggest_int('min_child_samples', 5, 100),
            'subsample': trial.suggest_float('subsample', 0.5, 1.0),
            'colsample_bytree': trial.suggest_float('colsample_bytree', 0.5, 1.0),
            'reg_alpha': trial.suggest_float('reg_alpha', 1e-8, 100.0, log=True),
            'reg_lambda': trial.suggest_float('reg_lambda', 1e-8, 100.0, log=True),
            'loss': 'mse',
            'early_stopping_rounds': 50,
            'num_threads': 20,
        }
        
        # Train model with suggested parameters
        model = LGBModel(**params)
        model.fit(self.dataset)
        
        # Evaluate on validation set
        pred_valid = model.predict(self.dataset, segment="valid")
        label_valid = self.dataset.prepare("valid", col_set=['label'], data_key=DataHandlerLP.DK_L)
        
        # Calculate IC as objective
        # The label frame has a single column; take it as a Series
        df = pd.DataFrame({'pred': pred_valid, 'label': label_valid.iloc[:, 0]}).dropna()
        ic = df['pred'].corr(df['label'])
        
        return ic  # Maximized because the study is created with direction='maximize'
    
    def objective_nn(self, trial):
        """Objective function for neural networks
        神经网络的目标函数
        """
        params = {
            'hidden_size': trial.suggest_categorical('hidden_size', [64, 128, 256, 512]),
            'num_layers': trial.suggest_int('num_layers', 1, 4),
            'dropout': trial.suggest_float('dropout', 0.1, 0.5),
            'learning_rate': trial.suggest_float('learning_rate', 1e-4, 1e-2, log=True),
            'batch_size': trial.suggest_categorical('batch_size', [256, 512, 1024, 2048]),
            'weight_decay': trial.suggest_float('weight_decay', 1e-8, 1e-3, log=True),
        }
        
        # Train and evaluate model
        # ... (implementation)
        
        return 0  # Placeholder
    
    def tune(self, model_type='lgb'):
        """Run hyperparameter tuning
        运行超参数调优
        """
        print(f"Starting hyperparameter tuning for {model_type}...")
        
        # Select objective function
        if model_type == 'lgb':
            objective = self.objective_lgb
        elif model_type == 'nn':
            objective = self.objective_nn
        else:
            raise ValueError(f"Unknown model type: {model_type}")
        
        # Create study
        study = optuna.create_study(
            direction='maximize',
            sampler=TPESampler(seed=42)
        )
        
        # Optimize
        study.optimize(objective, n_trials=self.n_trials)
        
        # Store best parameters
        self.best_params = study.best_params
        self.best_score = study.best_value
        
        print(f"\nBest score: {self.best_score:.4f}")
        print(f"Best parameters: {self.best_params}")
        
        return study

# Example: no tuning is run here; this cell only prints the search space
print("Hyperparameter tuning framework ready")
print("\nExample parameter search space for LightGBM:")
print("  num_leaves: [20, 300]")
print("  learning_rate: [0.01, 0.3]")
print("  max_depth: [3, 12]")
print("  subsample: [0.5, 1.0]")
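The LightGBM objective above scores each trial with a single Pearson correlation pooled over the whole validation segment. A common alternative is to compute the correlation separately for each trading day and average the results, optionally with rank (Spearman) correlation. The sketch below illustrates that idea on synthetic data. The helper name `daily_ic`, the synthetic panel, and the `datetime`/`instrument` index names are assumptions for illustration and are not part of the tuner class.

```python
import numpy as np
import pandas as pd


def daily_ic(pred: pd.Series, label: pd.Series, method: str = "pearson") -> float:
    """Mean of per-day cross-sectional correlations (hypothetical helper)."""
    df = pd.DataFrame({"pred": pred, "label": label}).dropna()
    per_day = df.groupby(level="datetime").apply(
        lambda g: g["pred"].corr(g["label"], method=method)
    )
    return float(per_day.mean())


# Synthetic (datetime, instrument) panel: 20 days x 50 instruments
rng = np.random.default_rng(0)
index = pd.MultiIndex.from_product(
    [
        pd.date_range("2024-01-01", periods=20, freq="D"),
        [f"S{i:03d}" for i in range(50)],
    ],
    names=["datetime", "instrument"],
)
label = pd.Series(rng.normal(size=len(index)), index=index)
# Predictions carry part of the label signal plus independent noise
pred = 0.5 * label + pd.Series(rng.normal(size=len(index)), index=index)

ic = daily_ic(pred, label)
rank_ic = daily_ic(pred, label, method="spearman")
print(f"IC: {ic:.4f}, Rank IC: {rank_ic:.4f}")
```

Returning such a value from an objective function keeps the study direction at `maximize`. Whether the per-day mean or the pooled correlation is the better tuning target depends on the strategy being evaluated.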

## 14. Feature Selection / 特征选择 <a id='feature-selection'></a>

In [None]:
# Feature selection methods / 特征选择方法

from sklearn.feature_selection import SelectKBest, mutual_info_regression, f_regression
from sklearn.ensemble import RandomForestRegressor

class FeatureSelector:
    """Comprehensive feature selection framework
    综合特征选择框架
    """
    
    def __init__(self, dataset):
        self.dataset = dataset
        self.selected_features = {}
        self.feature_scores = {}
    
    def select_by_importance(self, model, top_k=50):
        """Select features by model importance
        通过模型重要性选择特征
        """
        # Train model
        model.fit(self.dataset)
        
        # Get feature importance (assumes the model exposes a
        # feature_importance() method returning one score per feature)
        if not hasattr(model, 'feature_importance'):
            raise AttributeError(
                f"{type(model).__name__} does not expose feature_importance()"
            )
        importance = np.asarray(model.feature_importance())
        
        # Sort and select top features
        sorted_idx = np.argsort(importance)[::-1]
        selected_idx = sorted_idx[:top_k]
        
        # Get feature names
        train_data = self.dataset.prepare("train", col_set=['feature'])
        feature_names = train_data.columns.tolist()
        
        self.selected_features['importance'] = [feature_names[i] for i in selected_idx]
        self.feature_scores['importance'] = importance[selected_idx]
        
        return self.selected_features['importance']
    
    def select_by_mutual_info(self, top_k=50):
        """Select features by mutual information
        通过互信息选择特征
        """
        # Get data (drop rows with NaN, which mutual_info_regression does not accept)
        train_data = self.dataset.prepare("train", col_set=['feature', 'label']).dropna()
        X = train_data.iloc[:, :-1]
        y = train_data.iloc[:, -1]
        
        # Calculate mutual information
        selector = SelectKBest(mutual_info_regression, k=top_k)
        selector.fit(X, y)
        
        # Get selected features
        selected_mask = selector.get_support()
        self.selected_features['mutual_info'] = X.columns[selected_mask].tolist()
        self.feature_scores['mutual_info'] = selector.scores_[selected_mask]
        
        return self.selected_features['mutual_info']
    
    def select_by_correlation(self, threshold=0.95):
        """Remove highly correlated features
        移除高度相关的特征
        """
        # Get data
        train_data = self.dataset.prepare("train", col_set=['feature'])
        
        # Calculate correlation matrix
        corr_matrix = train_data.corr().abs()
        
        # Find features to remove
        upper_tri = corr_matrix.where(
            np.triu(np.ones(corr_matrix.shape), k=1).astype(bool)
        )
        
        to_drop = [column for column in upper_tri.columns 
                  if any(upper_tri[column] > threshold)]
        
        # Keep features
        self.selected_features['correlation'] = [
            col for col in train_data.columns if col not in to_drop
        ]
        
        return self.selected_features['correlation']
    
    def select_by_recursive_elimination(self, estimator=None, n_features=50):
        """Recursive feature elimination
        递归特征消除
        """
        from sklearn.feature_selection import RFE
        
        # Get data (drop rows with NaN so any scikit-learn estimator can be used)
        train_data = self.dataset.prepare("train", col_set=['feature', 'label']).dropna()
        X = train_data.iloc[:, :-1]
        y = train_data.iloc[:, -1]
        
        # Use default estimator if not provided
        if estimator is None:
            estimator = RandomForestRegressor(n_estimators=50, random_state=42)
        
        # RFE
        selector = RFE(estimator, n_features_to_select=n_features)
        selector.fit(X, y)
        
        # Get selected features
        self.selected_features['rfe'] = X.columns[selector.support_].tolist()
        
        return self.selected_features['rfe']
    
    def combine_selections(self, methods=('importance', 'mutual_info'), min_votes=2):
        """Combine multiple feature selection methods
        组合多种特征选择方法
        """
        from collections import Counter
        
        # Count votes for each feature
        all_features = []
        for method in methods:
            if method in self.selected_features:
                all_features.extend(self.selected_features[method])
        
        feature_votes = Counter(all_features)
        
        # Select features with enough votes
        combined_features = [
            feature for feature, votes in feature_votes.items() 
            if votes >= min_votes
        ]
        
        return combined_features

# Example usage
print("Feature selection framework ready")
print("\nAvailable methods:")
print("  1. Model importance-based selection")
print("  2. Mutual information selection")
print("  3. Correlation-based removal")
print("  4. Recursive feature elimination")
print("  5. Ensemble voting combination")
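To make the correlation filter and the voting step concrete, the sketch below applies the same upper-triangle logic and `Counter`-based vote to a small synthetic feature table, without a Qlib dataset. The feature names `f1` to `f4` and the two hand-written selection lists are assumptions made up for this illustration.

```python
from collections import Counter

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
base = rng.normal(size=500)
features = pd.DataFrame({
    "f1": base,
    "f2": base + rng.normal(scale=0.01, size=500),  # near-duplicate of f1
    "f3": rng.normal(size=500),
    "f4": rng.normal(size=500),
})

# Correlation filter: inspect only the strict upper triangle so that each
# pair is checked once and the earlier column of a correlated pair is kept.
corr = features.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
kept = [col for col in features.columns if col not in to_drop]
print("Dropped:", to_drop)
print("Kept:", kept)

# Voting: keep features chosen by at least `min_votes` methods.
# These selections are hypothetical stand-ins for real method outputs.
selections = {
    "importance": ["f1", "f3"],
    "mutual_info": ["f1", "f4"],
}
min_votes = 2
votes = Counter(name for names in selections.values() for name in names)
combined = [name for name, count in votes.items() if count >= min_votes]
print("Combined:", combined)
```

With only two methods and `min_votes=2`, the vote reduces to an intersection. Lowering `min_votes` to 1 would turn it into a union.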

## 15. Ensemble Methods / 集成方法 <a id='ensemble'></a>

In [None]:
# Ensemble methods for model combination / 模型组合的集成方法

class EnsembleModel:
    """Ensemble model framework
    集成模型框架
    """
    
    def __init__(self, models, weights=None, method='average'):
        """
        Parameters:
        - models: List of trained models
        - weights: Model weights for weighted average
        - method: 'average', 'weighted', 'stacking', 'blending'
        """
        self.models = models
        # Explicit None check: `weights or ...` raises for NumPy arrays
        self.weights = weights if weights is not None else [1/len(models)] * len(models)
        self.method = method
        self.meta_model = None
    
    def predict_average(self, dataset, segment="test"):
        """Simple average ensemble
        简单平均集成
        """
        predictions = []
        
        for model in self.models:
            pred = model.predict(dataset, segment)
            predictions.append(pred)
        
        # Stack predictions and take mean
        pred_matrix = np.column_stack(predictions)
        ensemble_pred = pred_matrix.mean(axis=1)
        
        return ensemble_pred
    
    def predict_weighted(self, dataset, segment="test"):
        """Weighted average ensemble
        加权平均集成
        """
        predictions = []
        
        for model, weight in zip(self.models, self.weights):
            pred = model.predict(dataset, segment)
            predictions.append(pred * weight)
        
        # Sum weighted predictions
        ensemble_pred = np.sum(predictions, axis=0)
        
        return ensemble_pred
    
    def train_stacking(self, dataset):
        """Train stacking ensemble
        训练堆叠集成
        """
        # Get base model predictions on validation set
        valid_predictions = []
        
        for model in self.models:
            pred = model.predict(dataset, segment="valid")
            valid_predictions.append(pred)
        
        # Stack predictions as features
        X_meta = np.column_stack(valid_predictions)
        
        # Get validation labels as a 1-D array (the label frame has a single column)
        y_valid = dataset.prepare("valid", col_set=['label'])['label'].values.ravel()
        
        # Train meta-model
        from sklearn.linear_model import Ridge
        self.meta_model = Ridge(alpha=1.0)
        self.meta_model.fit(X_meta, y_valid)
        
        print("Stacking meta-model trained")
    
    def predict_stacking(self, dataset, segment="test"):
        """Predict with stacking ensemble
        使用堆叠集成预测
        """
        if self.meta_model is None:
            raise ValueError("Must train stacking model first")
        
        # Get base model predictions
        test_predictions = []
        
        for model in self.models:
            pred = model.predict(dataset, segment)
            test_predictions.append(pred)
        
        # Stack predictions
        X_meta = np.column_stack(test_predictions)
        
        # Meta-model prediction
        ensemble_pred = self.meta_model.predict(X_meta)
        
        return ensemble_pred
    
    def optimize_weights(self, dataset):
        """Optimize ensemble weights
        优化集成权重
        """
        from scipy.optimize import minimize
        
        # Get validation predictions
        valid_predictions = []
        for model in self.models:
            pred = model.predict(dataset, segment="valid")
            valid_predictions.append(pred)
        
        pred_matrix = np.column_stack(valid_predictions)
        
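        # Get validation labels as a 1-D array; a (n, 1) array would broadcast
        # against the (n,) predictions into an (n, n) matrix and corrupt the MSE
        y_valid = dataset.prepare("valid", col_set=['label'])['label'].values.ravel()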
        
        # Objective function
        def objective(weights):
            weighted_pred = np.dot(pred_matrix, weights)
            mse = np.mean((weighted_pred - y_valid) ** 2)
            return mse
        
        # Constraints: weights sum to 1, each weight bounded to [0, 1]
        constraints = {'type': 'eq', 'fun': lambda w: np.sum(w) - 1}
        bounds = [(0, 1)] * len(self.models)
        
        # Initial weights
        init_weights = [1/len(self.models)] * len(self.models)
        
        # Optimize
        result = minimize(objective, init_weights, method='SLSQP',
                        bounds=bounds, constraints=constraints)
        
        self.weights = result.x
        print(f"Optimized weights: {self.weights}")
        
        return self.weights

print("Ensemble framework ready")
print("\nAvailable ensemble methods:")
print("  1. Simple averaging")
print("  2. Weighted averaging")
print("  3. Stacking with meta-learner")
print("  4. Weight optimization")
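The weight optimization can be tried in isolation on synthetic predictions. The sketch below reproduces the same SLSQP setup (sum-to-one equality constraint, bounds of [0, 1]) for three hypothetical base models with different noise levels. The data and the noise scales are assumptions chosen only to make the outcome easy to interpret.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
y = rng.normal(size=2000)  # 1-D target, shape (n,)

# Three hypothetical base models: the target plus increasing noise
preds = np.column_stack(
    [y + rng.normal(scale=s, size=y.size) for s in (0.5, 1.0, 2.0)]
)


def mse(weights):
    """Mean squared error of the weighted ensemble prediction."""
    return np.mean((preds @ weights - y) ** 2)


n_models = preds.shape[1]
equal_weights = np.full(n_models, 1.0 / n_models)

result = minimize(
    mse,
    x0=equal_weights,
    method="SLSQP",
    bounds=[(0.0, 1.0)] * n_models,
    constraints={"type": "eq", "fun": lambda w: np.sum(w) - 1.0},
)
weights = result.x

print("Weights:", np.round(weights, 3))
print("Equal-weight MSE:", round(mse(equal_weights), 4))
print("Optimized MSE:", round(mse(weights), 4))
```

The least noisy model should receive the largest weight. Note that weights fitted on the validation segment can overfit it, so a held-out check is advisable before using them on test data.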

## Part 5: Production Deployment / 生产部署

## 17. Model Persistence / 模型持久化 <a id='persistence'></a>

In [None]:
# Model persistence and versioning / 模型持久化和版本控制

import joblib
from datetime import datetime
import hashlib

class ModelManager:
    """Model management system for production
    生产环境的模型管理系统
    """
    
    def __init__(self, base_dir="./models"):
        self.base_dir = Path(base_dir)
        self.base_dir.mkdir(parents=True, exist_ok=True)
        self.model_registry = {}
    
    def save_model(self, model, name, version=None, metadata=None):
        """Save model with versioning
        带版本控制的模型保存
        """
        # Generate version if not provided
        if version is None:
            version = datetime.now().strftime("%Y%m%d_%H%M%S")
        
        # Create model directory
        model_dir = self.base_dir / name / version
        model_dir.mkdir(parents=True, exist_ok=True)
        
        # Save model
        model_path = model_dir / "model.pkl"
        joblib.dump(model, model_path)
        
        # Save metadata (copy so the caller's dict is not mutated)
        metadata = dict(metadata) if metadata else {}
        
        metadata.update({
            'name': name,
            'version': version,
            'saved_at': datetime.now().isoformat(),
            'model_class': model.__class__.__name__,
            'file_hash': self._calculate_hash(model_path)
        })
        
        metadata_path = model_dir / "metadata.json"
        with open(metadata_path, 'w') as f:
            json.dump(metadata, f, indent=2)
        
        # Update registry
        self.model_registry[f"{name}:{version}"] = {
            'path': model_path,
            'metadata': metadata
        }
        
        print(f"Model saved: {name}:{version}")
        return model_path
    
    def load_model(self, name, version="latest"):
        """Load model by name and version
        按名称和版本加载模型
        """
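        # Find model version ("latest" picks the lexicographically last
        # directory, which is chronological for timestamp-style versions)
        if version == "latest":
            model_versions = [p for p in (self.base_dir / name).glob("*") if p.is_dir()]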
            if not model_versions:
                raise ValueError(f"No versions found for model {name}")
            version = sorted(model_versions)[-1].name
        
        # Load model
        model_path = self.base_dir / name / version / "model.pkl"
        if not model_path.exists():
            raise FileNotFoundError(f"Model not found: {model_path}")
        
        model = joblib.load(model_path)
        
        # Load metadata
        metadata_path = self.base_dir / name / version / "metadata.json"
        if metadata_path.exists():
            with open(metadata_path, 'r') as f:
                metadata = json.load(f)
        else:
            metadata = {}
        
        print(f"Model loaded: {name}:{version}")
        return model, metadata
    
    def list_models(self):
        """List all available models
        列出所有可用模型
        """
        models = []
        
        for model_dir in self.base_dir.glob("*"):
            if model_dir.is_dir():
                for version_dir in model_dir.glob("*"):
                    if version_dir.is_dir():
                        models.append({
                            'name': model_dir.name,
                            'version': version_dir.name,
                            'path': version_dir
                        })
        
        return pd.DataFrame(models)
    
    def _calculate_hash(self, file_path):
        """Calculate file hash for integrity check
        计算文件哈希值以进行完整性检查
        """
        hash_md5 = hashlib.md5()
        with open(file_path, "rb") as f:
            for chunk in iter(lambda: f.read(4096), b""):
                hash_md5.update(chunk)
        return hash_md5.hexdigest()

# Example usage
model_manager = ModelManager()
print("Model manager initialized")
print(f"Model directory: {model_manager.base_dir}")
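`save_model` records a `file_hash` in the metadata, but `load_model` does not compare it against the file on disk. The sketch below shows one way such a check could look. It is a standalone illustration using `pickle` from the standard library instead of `joblib`, and the helpers `file_md5` and `load_verified` are hypothetical names, not methods of `ModelManager`.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path


def file_md5(path: Path) -> str:
    """MD5 of a file, read in chunks (detects accidental corruption only)."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_verified(model_dir: Path):
    """Load model.pkl only if its hash matches metadata.json."""
    metadata = json.loads((model_dir / "metadata.json").read_text())
    model_path = model_dir / "model.pkl"
    if file_md5(model_path) != metadata["file_hash"]:
        raise ValueError(f"Hash mismatch for {model_path}")
    with open(model_path, "rb") as f:
        return pickle.load(f), metadata


with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "demo" / "20250101_000000"
    model_dir.mkdir(parents=True)

    model = {"coef": [0.1, 0.2]}  # stand-in for a trained model object
    with open(model_dir / "model.pkl", "wb") as f:
        pickle.dump(model, f)
    meta = {"file_hash": file_md5(model_dir / "model.pkl")}
    (model_dir / "metadata.json").write_text(json.dumps(meta))

    loaded, _ = load_verified(model_dir)
    print("Round trip ok:", loaded == model)

    # Simulate corruption by appending bytes to the model file
    with open(model_dir / "model.pkl", "ab") as f:
        f.write(b"corrupted")
    try:
        load_verified(model_dir)
        tamper_detected = False
    except ValueError:
        tamper_detected = True
    print("Tampering detected:", tamper_detected)
```

MD5 is adequate for catching accidental corruption. If the model store may be written by untrusted parties, a stronger hash such as SHA-256 would be a safer choice, and unpickling untrusted files should be avoided altogether.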

## 19. Model Monitoring / 模型监控 <a id='monitoring'></a>

In [None]:
# Model monitoring system / 模型监控系统

class ModelMonitor:
    """Production model monitoring
    生产模型监控
    """
    
    def __init__(self, model_name):
        self.model_name = model_name
        self.metrics_history = []
        self.alerts = []
        self.thresholds = {
            'ic_min': 0.01,
            'prediction_std_max': 0.1,
            'null_rate_max': 0.05
        }
    
    def monitor_prediction_quality(self, predictions, labels=None):
        """Monitor prediction quality metrics
        监控预测质量指标
        """
        metrics = {}
        
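        # Basic statistics (NaN-aware, so missing predictions do not turn
        # every statistic into NaN)
        predictions = np.asarray(predictions, dtype=float)
        metrics['mean'] = np.nanmean(predictions)
        metrics['std'] = np.nanstd(predictions)
        metrics['min'] = np.nanmin(predictions)
        metrics['max'] = np.nanmax(predictions)
        metrics['null_rate'] = np.isnan(predictions).mean()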
        
        # IC if labels available
        if labels is not None:
            valid_idx = ~(np.isnan(predictions) | np.isnan(labels))
            if valid_idx.sum() > 1:  # correlation needs at least two points
                metrics['ic'] = np.corrcoef(
                    predictions[valid_idx], 
                    labels[valid_idx]
                )[0, 1]
        
        # Check alerts
        self._check_alerts(metrics)
        
        # Store metrics
        metrics['timestamp'] = datetime.now()
        self.metrics_history.append(metrics)
        
        return metrics
    
    def _check_alerts(self, metrics):
        """Check if any alerts should be triggered
        检查是否应触发任何警报
        """
        # IC degradation
        if 'ic' in metrics and metrics['ic'] < self.thresholds['ic_min']:
            self.alerts.append({
                'type': 'IC_DEGRADATION',
                'message': f"IC {metrics['ic']:.4f} below threshold {self.thresholds['ic_min']}",
                'timestamp': datetime.now()
            })
        
        # High null rate
        if metrics['null_rate'] > self.thresholds['null_rate_max']:
            self.alerts.append({
                'type': 'HIGH_NULL_RATE',
                'message': f"Null rate {metrics['null_rate']:.2%} exceeds threshold",
                'timestamp': datetime.now()
            })
        
        # Excessive prediction dispersion
        if metrics['std'] > self.thresholds['prediction_std_max']:
            self.alerts.append({
                'type': 'HIGH_PREDICTION_STD',
                'message': f"Prediction std {metrics['std']:.4f} exceeds threshold {self.thresholds['prediction_std_max']}",
                'timestamp': datetime.now()
            })
    
    def get_monitoring_report(self):
        """Generate monitoring report
        生成监控报告
        """
        if not self.metrics_history:
            return "No metrics available"
        
        report = f"\nModel Monitoring Report: {self.model_name}\n"
        report += "="*50 + "\n"
        
        # Recent metrics
        recent = self.metrics_history[-1]
        report += "\nRecent Metrics:\n"
        for key, value in recent.items():
            if key != 'timestamp':
                report += f"  {key}: {value:.4f}\n"
        
        # Alerts
        if self.alerts:
            report += f"\nAlerts ({len(self.alerts)}):\n"
            for alert in self.alerts[-5:]:  # Last 5 alerts
                report += f"  [{alert['type']}] {alert['message']}\n"
        
        return report

# Example monitoring
monitor = ModelMonitor("LightGBM_Production")

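# Simulate monitoring with random data (IC hovers around zero here,
# so IC_DEGRADATION alerts are likely to appear in the report)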
for i in range(5):
    predictions = np.random.randn(1000) * 0.01
    labels = np.random.randn(1000) * 0.01
    metrics = monitor.monitor_prediction_quality(predictions, labels)

print(monitor.get_monitoring_report())
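A single batch IC is noisy, so an alert on every batch below the threshold can fire often even for a healthy model. One possible refinement is to smooth the stored metrics with a rolling mean before comparing against a threshold. The sketch below does this on a simulated history in which the signal disappears halfway through. The window length, the threshold `ic_rolling_min`, and the simulated data are assumptions for illustration and are not part of `ModelMonitor`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical metrics history: 30 daily batches, signal vanishes at day 15
history = []
for day in range(30):
    signal = 0.3 if day < 15 else 0.0
    labels = rng.normal(size=1000)
    preds = signal * labels + rng.normal(size=1000)
    preds[rng.random(1000) < 0.02] = np.nan  # inject ~2% missing predictions

    valid = ~np.isnan(preds)
    history.append({
        "day": day,
        "ic": np.corrcoef(preds[valid], labels[valid])[0, 1],
        "null_rate": np.isnan(preds).mean(),
        "mean": np.nanmean(preds),  # NaN-aware statistic
    })

df = pd.DataFrame(history).set_index("day")

# Smooth the IC over a 5-batch window before checking the threshold
ic_rolling_min = 0.1  # hypothetical threshold for this synthetic example
df["ic_rolling"] = df["ic"].rolling(window=5).mean()
degraded_days = df.index[df["ic_rolling"] < ic_rolling_min].tolist()

print(df[["ic", "ic_rolling", "null_rate"]].round(4).tail())
print("Days flagged as degraded:", degraded_days)
```

The rolling window trades detection speed for fewer false alarms: a longer window reacts later to a real degradation but is less sensitive to one unlucky batch.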

## Summary / 总结

### What we covered / 本章内容

#### Part 1: Model Fundamentals / 模型基础
- Model interface and architecture / 模型接口与架构
- Dataset preparation / 数据集准备
- Training pipeline / 训练管道

#### Part 2: Tree-Based Models / 树模型
- **LightGBM**: Fast and efficient gradient boosting / 快速高效的梯度提升
- **XGBoost**: Scalable gradient boosting / 可扩展的梯度提升
- **CatBoost**: Handling categorical features / 处理分类特征

#### Part 3: Deep Learning Models / 深度学习模型
- **MLP**: Multi-layer perceptrons / 多层感知器
- **LSTM/GRU**: Sequential models / 序列模型
- **Transformer**: Attention-based models / 基于注意力的模型
- **GNN**: Graph neural networks / 图神经网络

#### Part 4: Model Optimization / 模型优化
- **Hyperparameter Tuning**: Optuna optimization / Optuna优化
- **Feature Selection**: Multiple methods / 多种方法
- **Ensemble Methods**: Model combination / 模型组合
- **Cross-Validation**: Robust evaluation / 稳健评估

#### Part 5: Production Deployment / 生产部署
- **Model Persistence**: Versioning and storage / 版本控制和存储
- **Online Learning**: Incremental updates / 增量更新
- **Model Monitoring**: Performance tracking / 性能跟踪
- **Best Practices**: Production guidelines / 生产指南

### Key Takeaways / 关键要点

1. **Model Selection**: Choose models based on data characteristics / 根据数据特征选择模型
2. **Feature Engineering**: Critical for performance / 对性能至关重要
3. **Hyperparameter Tuning**: Systematic optimization / 系统化优化
4. **Ensemble Methods**: Often improve performance / 通常能提高性能
5. **Production Readiness**: Monitor and maintain models / 监控和维护模型

### Next Steps / 下一步

Continue with **[04_evaluation_module.ipynb](./04_evaluation_module.ipynb)** to learn about:
- Backtesting strategies / 回测策略
- Performance evaluation / 性能评估
- Risk analysis / 风险分析