# Deep Reinforcement Learning for Portfolio Optimization

**IEDA4000F - Deep Learning for Decision Analytics**  
**The Hong Kong University of Science and Technology (HKUST)**

---

This notebook demonstrates the complete pipeline for training and evaluating deep reinforcement learning (DRL) agents for portfolio optimization.

## Table of Contents
1. [Setup and Data Loading](#1-setup-and-data-loading)
2. [Environment Creation](#2-environment-creation)
3. [Training DRL Agents](#3-training-drl-agents)
4. [Running Benchmarks](#4-running-benchmarks)
5. [Performance Evaluation](#5-performance-evaluation)
6. [Visualization and Analysis](#6-visualization-and-analysis)
7. [Conclusions](#7-conclusions)

## 1. Setup and Data Loading

In [None]:
# Import required libraries
import os
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

warnings.filterwarnings('ignore')

# Add src to path
sys.path.insert(0, os.path.join(os.getcwd(), '..'))

from src.data_loader import DataLoader
from src.portfolio_env import PortfolioEnv
from src.agents import create_agent, train_agent
from src.benchmarks import run_all_benchmarks
from src.metrics import PerformanceMetrics, compare_strategies
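from src.visualization import (
    plot_cumulative_returns,
    plot_portfolio_values,
    plot_return_distribution,
    plot_metrics_comparison,
    plot_drawdown,
    plot_weights_stacked,
    plot_turnover_analysis,
)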

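# Seed NumPy's global RNG for reproducibility (agent training and
# action_space.sample() draw from their own RNGs, so results may still vary between runs)
np.random.seed(42)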

print("✓ All libraries imported successfully!")

### 1.1 Define Asset Universe and Parameters

In [None]:
# Define assets for portfolio
ASSETS = ['AAPL', 'NVDA', 'TSLA', 'MSFT', 'GOOGL', 'AMZN', 'SPY', 'GLD']

# Date range
START_DATE = '2015-01-01'
END_DATE = '2024-12-31'

# Training parameters
TRAIN_RATIO = 0.7
TRANSACTION_COST = 0.001  # 0.1%
INITIAL_BALANCE = 100000.0

print(f"Asset Universe: {ASSETS}")
print(f"Number of Assets: {len(ASSETS)}")
print(f"Date Range: {START_DATE} to {END_DATE}")
print(f"Transaction Cost: {TRANSACTION_COST*100}%")

### 1.2 Download and Process Data

In [None]:
# Initialize data loader
loader = DataLoader(
    assets=ASSETS,
    start_date=START_DATE,
    end_date=END_DATE,
    data_dir='../data'
)

# Download data
print("Downloading data from Yahoo Finance...")
prices = loader.download_data()

print(f"\nData shape: {prices.shape}")
print(f"Date range: {prices.index[0]} to {prices.index[-1]}")
print("\nFirst few rows:")
prices.head()

In [None]:
# Compute returns
returns, log_returns = loader.compute_returns()

print(f"Returns shape: {returns.shape}")
print(f"\nBasic statistics:")
print(returns.describe())
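For reference, simple and log returns are usually derived from adjusted prices as shown below. This is a minimal sketch that assumes `loader.compute_returns()` follows the standard definitions r_t = P_t / P_{t-1} - 1 and ln(P_t / P_{t-1}); the helper `simple_and_log_returns` is hypothetical and not part of `src.data_loader`.

```python
import numpy as np
import pandas as pd


def simple_and_log_returns(prices: pd.DataFrame):
    """Hypothetical helper: daily simple and log returns from a price table.

    Assumes one column per asset and rows sorted by date.
    """
    ratio = prices / prices.shift(1)   # P_t / P_{t-1}; first row is NaN
    simple = (ratio - 1).dropna()      # r_t = P_t / P_{t-1} - 1
    log = np.log(ratio).dropna()       # ln(P_t / P_{t-1}) = ln(1 + r_t)
    return simple, log


# Toy example: two assets over three days
toy_prices = pd.DataFrame({"A": [100.0, 110.0, 99.0], "B": [50.0, 50.0, 55.0]})
toy_simple, toy_log = simple_and_log_returns(toy_prices)
print(toy_simple)
```

Log returns add up over time, while simple returns aggregate linearly across assets (a portfolio's simple return is the weighted average of its assets' simple returns), which is why both are commonly kept.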

### 1.3 Feature Engineering

In [None]:
# Build features with technical indicators
print("Building features with technical indicators...")
features = loader.build_features(
    sma_periods=[5, 10, 20],
    ema_periods=[5, 10, 20],
    momentum_periods=[5, 10, 20],
    include_volatility=True,
    normalize=True,
    normalize_method='zscore',
    rolling_window=60
)

print(f"\nFeature set shape: {features.shape}")
print(f"Number of features: {features.shape[1]}")
print("\nFeature columns (first 10):")
print(features.columns[:10].tolist())
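The exact formulas inside `build_features` are not shown here. The sketch below illustrates one plausible version for a single asset, assuming SMA and EMA features are expressed as relative distance from the price, momentum is the p-day return, and the z-score uses only a trailing 60-row window (matching `rolling_window=60`). The helper `build_price_features` and its column names are hypothetical. The key property is that every feature on a given date depends only on data up to that date.

```python
import numpy as np
import pandas as pd


def build_price_features(close: pd.Series, periods=(5, 10, 20), window=60) -> pd.DataFrame:
    """Hypothetical helper: indicator features for one asset, z-scored on a trailing window."""
    feats = {}
    for p in periods:
        feats[f"sma_{p}"] = close.rolling(p).mean() / close - 1                # distance from SMA
        feats[f"ema_{p}"] = close.ewm(span=p, adjust=False).mean() / close - 1  # distance from EMA
        feats[f"mom_{p}"] = close / close.shift(p) - 1                          # p-day momentum
    daily_ret = close / close.shift(1) - 1
    feats["vol_20"] = daily_ret.rolling(20).std()                              # 20-day volatility
    raw = pd.DataFrame(feats)
    # Rolling z-score: each row is standardized with statistics of the trailing
    # `window` rows only, so later prices never leak into earlier features.
    z = (raw - raw.rolling(window).mean()) / raw.rolling(window).std()
    return z.dropna()


# Synthetic random-walk prices for illustration
rng = np.random.default_rng(0)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 200))))
features_demo = build_price_features(close)
print(features_demo.shape)
```

A quick leakage check is to recompute the features on a prefix of the series: the overlapping dates should produce identical values.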

### 1.4 Train-Test Split

In [None]:
# Split data into training and testing sets
train_data, test_data = loader.train_test_split(train_ratio=TRAIN_RATIO)

print("Data split summary:")
print(f"Train period: {train_data['prices'].index[0]} to {train_data['prices'].index[-1]}")
print(f"Train samples: {len(train_data['prices'])}")
print(f"\nTest period: {test_data['prices'].index[0]} to {test_data['prices'].index[-1]}")
print(f"Test samples: {len(test_data['prices'])}")
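Financial time series should be split chronologically rather than shuffled, so the test period lies strictly after the training period and no future information reaches the agent during training. A minimal sketch, assuming `loader.train_test_split` splits by date in this way; `chronological_split` is a hypothetical helper.

```python
import pandas as pd


def chronological_split(df: pd.DataFrame, train_ratio: float = 0.7):
    """Hypothetical helper: first `train_ratio` of rows for training, the rest for testing."""
    split = int(len(df) * train_ratio)
    return df.iloc[:split], df.iloc[split:]


dates = pd.date_range("2020-01-01", periods=10, freq="D")
toy = pd.DataFrame({"x": range(10)}, index=dates)
train, test = chronological_split(toy, 0.7)
print(train.index[-1], test.index[0])  # last training date precedes first test date
```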

### 1.5 Asset Statistics

In [None]:
# Get asset statistics
stats = loader.get_asset_statistics()

print("Asset Statistics:")
print(stats.round(4))

## 2. Environment Creation

In [None]:
# Create training environment
train_env = PortfolioEnv(
    prices=train_data['prices'],
    returns=train_data['returns'],
    features=train_data['features'],
    initial_balance=INITIAL_BALANCE,
    transaction_cost=TRANSACTION_COST,
    lookback_window=20,
    reward_type='risk_adjusted',
    risk_penalty_lambda=0.5,
    allow_short=False,
)

print("Training Environment:")
print(f"  Number of assets: {train_env.n_assets}")
print(f"  Max steps: {train_env.max_steps}")
print(f"  Observation space: {train_env.observation_space.shape}")
print(f"  Action space: {train_env.action_space.shape}")
print(f"  Transaction cost: {train_env.transaction_cost*100}%")
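The environment is configured with `reward_type='risk_adjusted'` and `risk_penalty_lambda=0.5`, but the exact reward is defined in `src.portfolio_env` and is not shown here. The sketch below is one common construction, assuming that (1) turnover is the sum of absolute weight changes, (2) the transaction cost is proportional to turnover, and (3) the risk penalty is λ times the squared net return, used as a one-step variance proxy. `step_reward` is a hypothetical name.

```python
import numpy as np


def step_reward(w_old, w_new, asset_returns, cost=0.001, lam=0.5):
    """Hypothetical one-step reward: net portfolio return minus a variance-style penalty.

    Assumes long-only weights that sum to 1 and a cost proportional to turnover.
    """
    w_old, w_new, asset_returns = map(np.asarray, (w_old, w_new, asset_returns))
    turnover = float(np.abs(w_new - w_old).sum())  # total fraction of the portfolio traded
    gross = float(w_new @ asset_returns)           # portfolio return before costs
    net = gross - cost * turnover                   # subtract proportional transaction cost
    reward = net - lam * net**2                     # assumed risk penalty (squared return)
    return reward, net, turnover


r, net, to = step_reward([0.5, 0.5], [0.8, 0.2], [0.01, -0.02])
print(r, net, to)
```

Some codebases define turnover as half this sum, so that switching fully from one asset to another counts as 1 rather than 2. Check the convention in `src.portfolio_env` before comparing numbers.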

In [None]:
# Test environment with random actions
print("Testing environment with random actions...")
obs, info = train_env.reset()
print(f"Initial observation shape: {obs.shape}")
print(f"Initial portfolio value: ${info['portfolio_value']:,.2f}")

# Take a few random steps
for i in range(5):
    action = train_env.action_space.sample()
    obs, reward, terminated, truncated, info = train_env.step(action)
    print(f"Step {i+1}: Reward={reward:.4f}, Value=${info['portfolio_value']:,.2f}, Turnover={info['turnover']:.4f}")
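`action_space.sample()` returns an arbitrary vector inside the action bounds, not a valid set of portfolio weights. With `allow_short=False`, the environment must map each action to non-negative weights that sum to 1. A softmax is one common mapping. The helper below is a hypothetical sketch of that idea, not necessarily what `PortfolioEnv` does internally.

```python
import numpy as np


def action_to_weights(action) -> np.ndarray:
    """Hypothetical mapping from a raw action vector to long-only portfolio weights.

    Numerically stable softmax: every weight is positive and the weights sum to 1.
    """
    a = np.asarray(action, dtype=float)
    z = np.exp(a - a.max())  # subtract the max to avoid overflow in exp
    return z / z.sum()


weights = action_to_weights([0.0, 1.0, -1.0, 0.5])
print(weights.round(3), weights.sum())
```

A softmax never assigns exactly zero weight. The alternative of clipping negatives to zero and renormalizing allows zero weights, but it needs a fallback (for example, equal weights) when every component is non-positive.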

## 3. Training DRL Agents

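We train a PPO (Proximal Policy Optimization) agent. PPO supports continuous action spaces such as the portfolio-weight vector used here, and its clipped objective limits the size of each policy update, which tends to make training more stable.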
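The `clip_range=0.2` argument passed to `create_agent` sets ε in PPO's clipped surrogate objective. This objective is the mean over sampled steps of min(ρ·A, clip(ρ, 1 - ε, 1 + ε)·A), where ρ is the probability ratio between the new and old policy and A is the advantage estimate. The NumPy sketch below only evaluates this objective for given ratios and advantages. It is illustrative and is not the training code behind `create_agent`; libraries such as Stable-Baselines3 minimize the negative of this quantity with a gradient-based optimizer.

```python
import numpy as np


def ppo_clip_objective(ratio, advantage, clip_range=0.2) -> float:
    """Clipped surrogate objective from PPO (to be maximized).

    ratio     : pi_new(a|s) / pi_old(a|s) for each sampled step
    advantage : advantage estimate for each sampled step
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - clip_range, 1 + clip_range) * advantage
    # The elementwise minimum is a pessimistic bound: pushing the ratio outside
    # [1 - eps, 1 + eps] in the direction the advantage favors earns no extra credit.
    return float(np.minimum(unclipped, clipped).mean())


print(ppo_clip_objective([1.5, 0.5], [1.0, -1.0]))
```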

In [None]:
# Create PPO agent
print("Creating PPO agent...")
ppo_agent = create_agent(
    agent_type='ppo',
    env=train_env,
    learning_rate=0.0003,
    n_steps=2048,
    batch_size=64,
    n_epochs=10,
    gamma=0.99,
    gae_lambda=0.95,
    clip_range=0.2,
    ent_coef=0.01,
    net_arch=[128, 128],
    verbose=1
)

print("✓ PPO agent created successfully!")
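The `gamma=0.99` and `gae_lambda=0.95` arguments are the discount factor γ and the λ of Generalized Advantage Estimation (GAE), which PPO implementations commonly use to compute advantages. The recursion is δ_t = r_t + γ·V(s_{t+1}) - V(s_t) and A_t = δ_t + γλ·A_{t+1}, with bootstrapping cut off at episode ends. The function below is a hypothetical single-rollout sketch of that backward recursion, not the library's implementation. It assumes `dones[t] = 1` means the episode ended at step t.

```python
import numpy as np


def compute_gae(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout, computed backwards in time.

    rewards, values, dones : length-T sequences (dones[t] = 1 if the episode ended at step t)
    last_value             : value estimate of the state after the final step
    Returns (advantages, returns); returns = advantages + values are the critic targets.
    """
    rewards, values, dones = (np.asarray(x, dtype=float) for x in (rewards, values, dones))
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        not_done = 1.0 - dones[t]  # stop bootstrapping across episode boundaries
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    return advantages, advantages + values
```

With λ = 1 the advantage becomes the discounted return (bootstrapped at the end of the rollout) minus V(s_t). With λ = 0 it is the one-step TD error δ_t. `gae_lambda` therefore trades bias against variance.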

In [None]:
# Train PPO agent
# Note: a small timestep budget keeps this demo fast; increase it for better convergence.
TRAIN_TIMESTEPS = 50000  # Consider 100000+ for more thorough training

print(f"Training PPO agent for {TRAIN_TIMESTEPS} timesteps...")
print("This may take several minutes...\n")

ppo_agent.learn(total_timesteps=TRAIN_TIMESTEPS, log_interval=10)

print("\n✓ Training completed!")

In [None]:
# Save trained model
os.makedirs('../models', exist_ok=True)
model_path = '../models/ppo_demo.zip'
ppo_agent.save(model_path)
print(f"Model saved to: {model_path}")

## 4. Running Benchmarks

In [None]:
# Run benchmark strategies on test data
print("Running benchmark strategies on test data...")

benchmark_results = run_all_benchmarks(
    returns=test_data['returns'],
    transaction_cost=TRANSACTION_COST,
    initial_value=INITIAL_BALANCE,
    mv_lookback=60,
    momentum_lookback=20,
    momentum_top_k=3,
)

print("\n✓ Benchmark strategies completed!")
print(f"\nBenchmarks run: {list(benchmark_results.keys())}")
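The internals of `run_all_benchmarks` are not shown here. The simplest baseline, an equally weighted portfolio rebalanced daily, illustrates how proportional transaction costs enter a backtest. Between rebalances, weights drift with prices, and restoring 1/N costs `cost × turnover`. This is a hypothetical sketch (`equal_weight_backtest`) that assumes daily rebalancing, turnover equal to the sum of absolute weight changes, and an initial purchase from cash counted as turnover 1.

```python
import numpy as np


def equal_weight_backtest(returns, cost=0.001, initial_value=100_000.0):
    """Hypothetical daily-rebalanced 1/N backtest with proportional transaction costs.

    returns : array of shape (T, N) with simple daily asset returns.
    Returns (portfolio_values, daily_turnover), each of length T.
    """
    R = np.asarray(returns, dtype=float)
    T, N = R.shape
    target = np.full(N, 1.0 / N)
    w_prev = np.zeros(N)                         # start fully in cash
    value = initial_value
    values, turnovers = np.empty(T), np.empty(T)
    for t in range(T):
        turnover = np.abs(target - w_prev).sum()
        value *= 1.0 - cost * turnover           # pay costs when rebalancing to 1/N
        gross = 1.0 + target @ R[t]
        value *= gross                           # apply the day's portfolio return
        w_prev = target * (1.0 + R[t]) / gross   # weights drift with prices
        values[t], turnovers[t] = value, turnover
    return values, turnovers
```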

## 5. Performance Evaluation

### 5.1 Evaluate PPO Agent on Test Set

In [None]:
# Create test environment
test_env = PortfolioEnv(
    prices=test_data['prices'],
    returns=test_data['returns'],
    features=test_data['features'],
    initial_balance=INITIAL_BALANCE,
    transaction_cost=TRANSACTION_COST,
    lookback_window=20,
    reward_type='risk_adjusted',
    risk_penalty_lambda=0.5,
    allow_short=False,
)

# Evaluate PPO agent
print("Evaluating PPO agent on test set...")
obs, info = test_env.reset()
done = False

while not done:
    action, _ = ppo_agent.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = test_env.step(action)
    done = terminated or truncated

# Get PPO results
ppo_history = test_env.get_portfolio_history()
ppo_results = {
    'returns': ppo_history['returns'],
    'values': ppo_history['values'],
    'weights': ppo_history['weights'],
    'turnover': ppo_history['turnover'],
}

print("✓ PPO evaluation completed!")
final_value = np.asarray(ppo_results['values'])[-1]  # positional access works for lists, arrays, and Series
print(f"Final portfolio value: ${final_value:,.2f}")
print(f"Total return: {(final_value / INITIAL_BALANCE - 1) * 100:.2f}%")

### 5.2 Compare All Strategies

In [None]:
# Combine all results
all_results = {'PPO': ppo_results, **benchmark_results}

# Calculate metrics
metrics_df = compare_strategies(
    all_results,
    risk_free_rate=0.02,
    periods_per_year=252
)

print("\n" + "="*80)
print("Performance Metrics Comparison")
print("="*80)
print(metrics_df.to_string())
print("="*80)
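`compare_strategies` receives `risk_free_rate=0.02` (annual) and `periods_per_year=252`. The sketch below shows common definitions for three of the reported columns: geometric annualized return, annualized volatility as the daily standard deviation times √252, and Sharpe ratio as the mean daily excess return over its standard deviation times √252, with a daily risk-free rate of 0.02 / 252. `annualized_metrics` is hypothetical, and `PerformanceMetrics` may use slightly different conventions (for example, arithmetic annualization or a compounded risk-free rate).

```python
import numpy as np


def annualized_metrics(daily_returns, risk_free_rate=0.02, periods_per_year=252):
    """Hypothetical helper computing common annualized metrics from simple daily returns."""
    r = np.asarray(daily_returns, dtype=float)
    n = len(r)
    ann_return = np.prod(1.0 + r) ** (periods_per_year / n) - 1.0  # geometric (CAGR)
    daily_std = r.std(ddof=1)
    ann_vol = daily_std * np.sqrt(periods_per_year)
    excess = r - risk_free_rate / periods_per_year                  # simple per-period risk-free rate
    sharpe = excess.mean() / daily_std * np.sqrt(periods_per_year)
    return {
        "Annualized Return": ann_return,
        "Annualized Volatility": ann_vol,
        "Sharpe Ratio": sharpe,
    }
```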

### 5.3 Detailed Metrics for Each Strategy

In [None]:
# Print detailed metrics
for name, data in all_results.items():
    metrics = PerformanceMetrics(
        returns=data['returns'],
        values=data['values'],
        weights=data.get('weights'),
        turnover=data.get('turnover'),
        risk_free_rate=0.02,
        periods_per_year=252
    )
    metrics.print_metrics(name)

## 6. Visualization and Analysis

### 6.1 Cumulative Returns

In [None]:
# Plot cumulative returns
returns_dict = {name: data['returns'] for name, data in all_results.items()}
plot_cumulative_returns(returns_dict, figsize=(14, 7))
plt.show()

### 6.2 Portfolio Values

In [None]:
# Plot portfolio values
values_dict = {name: data['values'] for name, data in all_results.items()}
plot_portfolio_values(values_dict, figsize=(14, 7))
plt.show()

### 6.3 Return Distribution

In [None]:
# Plot return distributions
plot_return_distribution(returns_dict, figsize=(14, 6))
plt.show()

### 6.4 Performance Metrics Comparison

In [None]:
# Plot metrics comparison
plot_metrics_comparison(metrics_df, figsize=(16, 10))
plt.show()

### 6.5 Drawdown Analysis

In [None]:
# Plot drawdown for PPO agent
plot_drawdown(ppo_results['values'], title="Drawdown Analysis - PPO Agent", figsize=(14, 8))
plt.show()
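`plot_drawdown` visualizes the drawdown of the PPO portfolio values. Drawdown at step t is V_t / max_{s≤t} V_s - 1, and the maximum drawdown is its most negative value. The helpers below are a short hypothetical NumPy sketch; `PerformanceMetrics` may instead report the magnitude as a positive number.

```python
import numpy as np


def drawdown_series(values):
    """Drawdown at each step: fractional decline from the running peak (0 at new highs)."""
    v = np.asarray(values, dtype=float)
    running_peak = np.maximum.accumulate(v)
    return v / running_peak - 1.0


def max_drawdown(values) -> float:
    """Largest peak-to-trough decline, reported here as a negative fraction."""
    return float(drawdown_series(values).min())


print(max_drawdown([100, 120, 90, 130, 117]))  # -0.25 (peak 120 -> trough 90)
```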

### 6.6 Portfolio Allocation Over Time

In [None]:
# Plot PPO agent's portfolio allocation
plot_weights_stacked(
    ppo_results['weights'],
    test_data['prices'].columns.tolist(),
    title="PPO Agent - Portfolio Allocation Over Time",
    figsize=(14, 7)
)
plt.show()

### 6.7 Turnover Analysis

In [None]:
# Plot turnover analysis
turnover_dict = {name: data['turnover'] for name, data in all_results.items() if 'turnover' in data}
plot_turnover_analysis(turnover_dict, figsize=(14, 6))
plt.show()

## 7. Conclusions

In [None]:
# Summary statistics
print("\n" + "="*80)
print("SUMMARY AND CONCLUSIONS")
print("="*80)

print("\n1. Best Performing Strategy (by Sharpe Ratio):")
best_sharpe = metrics_df['Sharpe Ratio'].idxmax()
print(f"   Strategy: {best_sharpe}")
print(f"   Sharpe Ratio: {metrics_df.loc[best_sharpe, 'Sharpe Ratio']:.4f}")
print(f"   Annualized Return: {metrics_df.loc[best_sharpe, 'Annualized Return']:.2%}")
print(f"   Max Drawdown: {metrics_df.loc[best_sharpe, 'Max Drawdown']:.2%}")

print("\n2. Highest Return Strategy:")
best_return = metrics_df['Annualized Return'].idxmax()
print(f"   Strategy: {best_return}")
print(f"   Annualized Return: {metrics_df.loc[best_return, 'Annualized Return']:.2%}")
print(f"   Volatility: {metrics_df.loc[best_return, 'Annualized Volatility']:.2%}")

print("\n3. Lowest Risk Strategy (by Max Drawdown):")
lowest_dd = metrics_df['Max Drawdown'].abs().idxmin()  # smallest drawdown magnitude, regardless of sign convention
print(f"   Strategy: {lowest_dd}")
print(f"   Max Drawdown: {metrics_df.loc[lowest_dd, 'Max Drawdown']:.2%}")
print(f"   Annualized Return: {metrics_df.loc[lowest_dd, 'Annualized Return']:.2%}")

print("\n4. PPO Agent Performance:")
if 'PPO' in metrics_df.index:
    ppo_metrics = metrics_df.loc['PPO']
    print(f"   Sharpe Ratio: {ppo_metrics['Sharpe Ratio']:.4f}")
    print(f"   Annualized Return: {ppo_metrics['Annualized Return']:.2%}")
    print(f"   Max Drawdown: {ppo_metrics['Max Drawdown']:.2%}")
    print(f"   Average Turnover: {ppo_metrics['Average Turnover']:.4f}")
    
    # Rank among all strategies
    sharpe_rank = (metrics_df['Sharpe Ratio'] > ppo_metrics['Sharpe Ratio']).sum() + 1
    print(f"   Sharpe Ratio Rank: {sharpe_rank} out of {len(metrics_df)}")

print("\n5. Key Observations:")
print("   - DRL agents learn adaptive trading policies from data")
print("   - Transaction costs significantly impact strategy performance")
print("   - Risk-adjusted metrics (e.g. Sharpe ratio) are more informative than raw returns")
print("   - Portfolio diversification helps manage risk")

print("\n" + "="*80)
print("✓ Analysis Complete!")
print("="*80)

## Additional Experiments

Try these experiments to further explore the framework:

1. **Different Assets**: Change the `ASSETS` list to include different stocks or ETFs
2. **Transaction Costs**: Vary `TRANSACTION_COST` to see its impact on each strategy
3. **Training Duration**: Increase `TRAIN_TIMESTEPS` for better convergence
4. **Network Architecture**: Modify `net_arch` in agent creation
5. **Reward Functions**: Try different `reward_type` values
6. **DDPG Agent**: Train and evaluate DDPG alongside PPO
7. **Market Regimes**: Analyze performance in bull vs bear markets
8. **Hyperparameter Tuning**: Optimize learning rates, batch sizes, etc.

---

**Disclaimer**: This is an academic research project for educational purposes only. Not financial advice. Do not use for real trading without proper validation.