# Stock NeurIPS2018 Part 3. Backtest
This series reproduces the process described in the paper *Practical Deep Reinforcement Learning Approach for Stock Trading*.

This is the third and last part of the NeurIPS2018 series. It shows how to use the agents we trained to run a backtest and how to compare them with baselines such as Mean Variance Optimization and the DJIA index.

Other demos can be found in the [FinRL-Tutorials](https://github.com/AI4Finance-Foundation/FinRL-Tutorials) repo.

# Part 1. Install Packages

In [1]:
## install finrl library
!pip install git+https://github.com/AI4Finance-Foundation/FinRL.git



In [2]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from stable_baselines3 import A2C, DDPG, PPO, SAC, TD3

from finrl.agents.stablebaselines3.models import DRLAgent
from finrl.config import INDICATORS, TRAINED_MODEL_DIR

import sys, pathlib
sys.path.insert(0, str(pathlib.Path.cwd().parents[1]))
from finai_contest.env_stock_trading.env_stock_trading_meta import StockTradingEnv_FinRLMeta
from finai_contest.env_stock_trading.env_stock_trading_gym_anytrading import StockTradingEnv_gym_anytrading
from finrl.meta.preprocessor.yahoodownloader import YahooDownloader

# Part 2. Backtesting

To backtest the agents, make `train_data.csv` and `trade_data.csv` available in the `data` folder next to this notebook, which is where the next cell reads them from. Colab users can upload the files to a `data` folder in the default directory.

In [3]:
train = pd.read_csv('./data/train_data.csv')
trade = pd.read_csv('./data/trade_data.csv')

# If you are not using the data generated in Part 1 of this tutorial, make sure
# it has the columns and index that the environment expects.
# In that case you can comment out and skip the following lines.
train = train.set_index(train.columns[0])
train.index.names = ['']
trade = trade.set_index(trade.columns[0])
trade.index.names = ['']
trade_aapl = trade[trade["tic"] == "AAPL"]
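When the CSV files do not come from Part 1, a quick structural check can catch problems before the environment is built. The sketch below is only an illustration: `REQUIRED_COLUMNS`, `missing_columns`, and `rows_per_ticker` are hypothetical names, and the minimal schema is an assumption, since the real environment also needs one column per technical indicator.

```python
import pandas as pd

# Hypothetical minimal schema; the real environment may need more columns
# (for example, one column per technical indicator).
REQUIRED_COLUMNS = ("date", "tic", "close")


def missing_columns(df: pd.DataFrame, required=REQUIRED_COLUMNS) -> list:
    """Return the required column names that are absent from df."""
    return [col for col in required if col not in df.columns]


def rows_per_ticker(df: pd.DataFrame) -> pd.Series:
    """Count rows per ticker; unequal counts usually mean missing trading days."""
    return df.groupby("tic")["date"].count()


# Tiny synthetic frame standing in for trade_data.csv.
demo = pd.DataFrame(
    {
        "date": ["2023-01-03", "2023-01-03", "2023-01-04"],
        "tic": ["AAPL", "MSFT", "AAPL"],
        "close": [123.3, 239.4, 124.6],
    }
)
print(missing_columns(demo))
print(rows_per_ticker(demo).to_dict())
```

An empty list from `missing_columns` and equal counts from `rows_per_ticker` suggest the frame has the basic shape the later cells rely on.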


Then place the trained agent in `TRAINED_MODEL_DIR`, choose the environment it was trained on, and set the corresponding flag to True.

In [4]:
env_used = "finrl"
# env_used = "gym_anytrading"

In [5]:
if_using_a2c = False
if_using_ddpg = False
if_using_ppo = True
if_using_td3 = False
if_using_sac = False

Load the agents

In [6]:
trained_a2c = A2C.load(TRAINED_MODEL_DIR + "/agent_a2c") if if_using_a2c else None
trained_ddpg = DDPG.load(TRAINED_MODEL_DIR + "/agent_ddpg") if if_using_ddpg else None
trained_ppo = PPO.load(TRAINED_MODEL_DIR + "/agent_ppo_"+env_used) if if_using_ppo else None
trained_td3 = TD3.load(TRAINED_MODEL_DIR + "/agent_td3") if if_using_td3 else None
trained_sac = SAC.load(TRAINED_MODEL_DIR + "/agent_sac") if if_using_sac else None

### Trading (Out-of-sample Performance)

We retrain periodically (for example quarterly, monthly, or weekly) to take full advantage of the data, and we tune the parameters along the way. In this notebook the parameters are tuned only once, using the in-sample data from 2009-01 to 2020-07, so some alpha decay is expected as the trading period lengthens.

Numerous hyperparameters – e.g. the learning rate, the total number of samples to train on – influence the learning process and are usually determined by testing some variations.
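The periodic retraining described above can be sketched as a walk-forward schedule. This is a minimal illustration, not part of FinRL: `walk_forward_windows` is a hypothetical helper, the expanding training window is an assumption, and the dates are only examples.

```python
import pandas as pd


def walk_forward_windows(train_start, trade_start, trade_end, freq="QS"):
    """Return a list of (train_start, train_end, trade_start, trade_end) tuples."""
    first, last = pd.Timestamp(trade_start), pd.Timestamp(trade_end)
    # Retraining dates: every period boundary inside the trade range,
    # plus the two end points of the range itself.
    inner = pd.date_range(first, last, freq=freq)
    edges = sorted(set(inner) | {first, last})
    origin = pd.Timestamp(train_start)
    # Expanding window: each retraining uses all data before its trade period.
    return [(origin, start, start, end) for start, end in zip(edges[:-1], edges[1:])]


windows = walk_forward_windows("2009-01-01", "2023-01-01", "2023-12-31")
for window in windows:
    print([ts.date().isoformat() for ts in window])
```

With `freq="QS"` (quarter start) the example year is split into four trade periods, each preceded by a retraining on all earlier data.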

In [7]:
stock_dimension = len(trade_aapl.tic.unique())
state_space = 1 + 2 * stock_dimension + len(INDICATORS) * stock_dimension
print(f"Stock Dimension: {stock_dimension}, State Space: {state_space}")

Stock Dimension: 1, State Space: 11
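The state-space formula can be read term by term. The breakdown below assumes the layout used by FinRL's stock trading environment (cash balance, then prices, then holdings, then indicators); the custom environment imported in this notebook may order things differently. `state_space_size` is a hypothetical helper.

```python
def state_space_size(stock_dim: int, n_indicators: int) -> int:
    """Length of the observation vector for stock_dim assets."""
    cash = 1                                # one entry for the cash balance
    prices = stock_dim                      # one price per asset
    holdings = stock_dim                    # one share count per asset
    indicators = n_indicators * stock_dim   # every indicator for every asset
    return cash + prices + holdings + indicators


# One stock with eight indicators reproduces the value printed above.
print(state_space_size(stock_dim=1, n_indicators=8))
# The same formula for a 29-stock universe.
print(state_space_size(stock_dim=29, n_indicators=8))
```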


In [8]:
buy_cost_list = sell_cost_list = [0.001] * stock_dimension
num_stock_shares = [0] * stock_dimension

if env_used =="finrl":
    env_kwargs = {
        "hmax": 100,
        "initial_amount": 1000000,
        "num_stock_shares": num_stock_shares,
        "buy_cost_pct": buy_cost_list,
        "sell_cost_pct": sell_cost_list,
        "state_space": state_space,
        "stock_dim": stock_dimension,
        "tech_indicator_list": INDICATORS,
        "action_space": stock_dimension,
        "reward_scaling": 1e-4
    }


    e_trade_gym = StockTradingEnv_FinRLMeta(df = trade_aapl, **env_kwargs)

elif env_used =="gym_anytrading":
    env_kwargs = {
    "hmax": np.inf,
    "initial_amount": 1000000,
    "num_stock_shares": num_stock_shares,
    "buy_cost_pct": buy_cost_list,
    "sell_cost_pct": sell_cost_list,
    "state_space": state_space,
    "stock_dim": stock_dimension,
    "tech_indicator_list": INDICATORS,
    "action_space": 2*stock_dimension,
    "reward_scaling": 1e-4,
    "window_size": 30
    }

    e_trade_gym = StockTradingEnv_gym_anytrading(df = trade_aapl, **env_kwargs)

In [9]:
env_trade, obs_trade = e_trade_gym.get_sb_env()

In [10]:
# trade_aapl = trade[trade["tic"] == "AAPL"]
# trade_gym_anytrade = pd.DataFrame()
# trade_gym_anytrade['Time'] = pd.to_datetime(trade_aapl['date'])  # Convert to datetime
# trade_gym_anytrade['Open'] = trade_aapl['open']
# trade_gym_anytrade['High'] = trade_aapl['high']
# trade_gym_anytrade['Low'] = trade_aapl['low']
# trade_gym_anytrade['Close'] = trade_aapl['close']
# trade_gym_anytrade['Volume'] = trade_aapl['volume']
# trade_aapl


In [11]:
# import os
# import sys
# module_path = os.path.abspath(os.path.join('..'))
# if module_path not in sys.path:
#     sys.path.append(module_path)

# import gymnasium as gym
# import gym_anytrading

# e_trade_gym = gym.make(
#     'stocks-v0',
#     df=trade_gym_anytrade,
#     window_size=30,
#     frame_bound=(30, len(trade_gym_anytrade))
# )

# from stable_baselines3.common.vec_env import DummyVecEnv

# def get_sb_env(self):
#     e = DummyVecEnv([lambda: self])
#     obs = e.reset()
#     return e, obs


# def save_asset_memory(self):
#     if "total_profit" not in self.history:
#         print("Warning: 'total_profit' not found in history. Returning empty DataFrame.")
#         return pd.DataFrame({"date": [], "account_value": []})
#     dates = self.df['Time'][self._start_tick:self._current_tick].dt.strftime('%Y-%m-%d')
#     profits = self.history["total_profit"]

#     assert len(dates) == len(profits), f"Length mismatch: {len(dates)} dates vs {len(profits)} profits"

#     df_account_value = pd.DataFrame({
#         "date": dates,
#         "account_value": profits
#     })
#     return df_account_value


# def save_action_memory(self):
#     if "total_profit" not in self.history:
#         print("Warning: 'total_profit' not found in history. Returning empty DataFrame.")
#         return pd.DataFrame({"date": [], "account_value": []})
#     dates = self.df['Time'][self._start_tick:self._current_tick].dt.strftime('%Y-%m-%d')
#     actions = self.history["position"]

#     assert len(dates) == len(actions), f"Length mismatch: {len(dates)} dates vs {len(actions)} profits"

#     df_account_value = pd.DataFrame({
#         "date": dates,
#         "action": actions
#     })
#     return df_account_value

# # Patch the method
# e_trade_gym = e_trade_gym.env.env
# e_trade_gym.get_sb_env = get_sb_env.__get__(e_trade_gym)
# e_trade_gym.save_asset_memory = save_asset_memory.__get__(e_trade_gym)
# e_trade_gym.save_action_memory = save_action_memory.__get__(e_trade_gym)
# e_trade_gym.get_sb_env

In [10]:
df_account_value_a2c, df_actions_a2c = DRLAgent.DRL_prediction(
    model=trained_a2c, 
    environment = e_trade_gym) if if_using_a2c else (None, None)

In [11]:
df_account_value_ddpg, df_actions_ddpg = DRLAgent.DRL_prediction(
    model=trained_ddpg, 
    environment = e_trade_gym) if if_using_ddpg else (None, None)

In [12]:
df_account_value_ppo_list = []
df_actions_ppo_list = []
for i in range(1):
    df_account_value_ppo, df_actions_ppo = DRLAgent.DRL_prediction(
        model=trained_ppo, 
        environment = e_trade_gym, deterministic=False) if if_using_ppo else (None, None)
    df_account_value_ppo_list.append(df_account_value_ppo)
    df_actions_ppo_list.append(df_actions_ppo)
    # print(df_account_value_ppo_list)

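df_account_value_all = pd.concat([df.set_index("date")["account_value"] for df in df_account_value_ppo_list],axis=1)
run_cols = [f"run_{i+1}" for i in range(len(df_account_value_ppo_list))]
df_account_value_all.columns = run_cols
# Aggregate over the run columns only, so the std does not include the 'mean' column.
df_account_value_all['mean'] = df_account_value_all[run_cols].mean(axis=1)
# Population std (ddof=0) across runs; this is 0 when there is a single run.
df_account_value_all['std'] = df_account_value_all[run_cols].std(axis=1, ddof=0)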



hit end!


In [13]:
df_account_value_ppo = df_account_value_all[['mean', 'std']].reset_index()
df_account_value_ppo.columns = ['date', 'account_value', 'std']

In [14]:
import os
os.makedirs("./results_csv", exist_ok=True)
df_account_value_ppo.to_csv("./results_csv/account_value_ppo_"+env_used+".csv", index=False)
print(df_account_value_ppo)

           date  account_value  std
0    2023-01-03   1.000000e+06  0.0
1    2023-01-04   1.000115e+06  0.0
2    2023-01-05   9.999234e+05  0.0
3    2023-01-06   1.000097e+06  0.0
4    2023-01-09   1.000115e+06  0.0
..          ...            ...  ...
495  2024-12-20   1.483083e+06  0.0
496  2024-12-23   1.487406e+06  0.0
497  2024-12-24   1.503829e+06  0.0
498  2024-12-26   1.508444e+06  0.0
499  2024-12-27   1.489090e+06  0.0

[500 rows x 3 columns]
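The `std` column is 0.0 on every row because only one run was collected. The sketch below uses synthetic paths (not results from this notebook) to show how the mean and the two common standard-deviation conventions behave across several runs and for a single run.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
dates = pd.date_range("2023-01-03", periods=5, freq="B")

# Three synthetic account-value paths standing in for repeated stochastic runs.
paths = pd.DataFrame(
    {
        f"run_{i + 1}": 1_000_000 * np.cumprod(1 + rng.normal(0, 0.01, len(dates)))
        for i in range(3)
    },
    index=dates,
)

summary = pd.DataFrame(
    {
        "mean": paths.mean(axis=1),
        "std_population": paths.std(axis=1, ddof=0),
        "std_sample": paths.std(axis=1, ddof=1),
    }
)
print(summary)

# With one run, the population std is 0 and the sample std is undefined (NaN).
single = paths[["run_1"]]
single_pop = single.std(axis=1, ddof=0)
single_sample = single.std(axis=1, ddof=1)
```

Raising the loop count in the prediction cell is what makes these dispersion columns informative.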


In [15]:
df_account_value_td3, df_actions_td3 = DRLAgent.DRL_prediction(
    model=trained_td3, 
    environment = e_trade_gym) if if_using_td3 else (None, None)

In [16]:
df_account_value_sac, df_actions_sac = DRLAgent.DRL_prediction(
    model=trained_sac, 
    environment = e_trade_gym) if if_using_sac else (None, None)

In [17]:
def _calculate_sharpe_ratio(total_profits):
    total_profits = np.array(total_profits)
    
    # Calculate daily returns (percentage change)
    daily_returns = np.diff(total_profits) / total_profits[:-1]
    
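    # Check the length first so that std() is never taken over an empty array.
    if len(daily_returns) < 2 or daily_returns.std() == 0:
        return 0.0
    
    # Note: this annualizes with 255 periods and NumPy's population std (ddof=0).
    # The metrics cell in Part 6 uses 252 periods and pandas' sample std,
    # so the two Sharpe values differ slightly.
    sharpe = (255 ** 0.5) * daily_returns.mean() / daily_returns.std()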
    return sharpe

print("Sharpe Ratio:",_calculate_sharpe_ratio(df_account_value_ppo["account_value"]))

Sharpe Ratio: 1.2302588903410787
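The Sharpe ratio is sensitive to two conventions: the number of periods used for annualization and the degrees of freedom in the standard deviation. The sketch below is a hypothetical variant (`sharpe_ratio` is not a FinRL function) that exposes both as parameters, with a zero risk-free rate assumed as in the cell above.

```python
import numpy as np


def sharpe_ratio(values, periods_per_year=252, ddof=1):
    """Annualized Sharpe ratio of an account-value series (risk-free rate 0)."""
    values = np.asarray(values, dtype=float)
    returns = np.diff(values) / values[:-1]  # simple period-over-period returns
    if returns.size < 2:
        return 0.0
    sigma = returns.std(ddof=ddof)
    if sigma == 0:
        return 0.0
    return float(np.sqrt(periods_per_year) * returns.mean() / sigma)


values = [100.0, 101.0, 100.5, 102.0, 103.0]  # synthetic account values
for periods in (252, 255):
    print(periods, sharpe_ratio(values, periods_per_year=periods))
```

Because the annualization factor enters as a square root, switching between 252 and 255 periods changes the result by well under one percent.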


In [18]:
import matplotlib.pyplot as plt

# Ensure 'date' is in datetime format (optional, for better x-axis formatting)
df_account_value_ppo['date'] = pd.to_datetime(df_account_value_ppo['date'])

# Plot
plt.figure(figsize=(12, 6))
plt.plot(df_account_value_ppo['date'], df_account_value_ppo['account_value'], label='Account Value (PPO)', linewidth=2)
plt.xlabel('Date')
plt.ylabel('Account Value')
plt.title('PPO Trading Strategy: Account Value Over Time')
plt.grid(True)
plt.legend()
plt.tight_layout()
plt.show()



# Part 3: Mean Variance Optimization

Mean-variance optimization (MVO) is a classic portfolio-management strategy. Here we walk through the full optimization and add the result as a baseline for comparison.

First, reshape the dataframe into the form needed for the MVO weight calculation.

In [19]:
def process_df_for_mvo(df):
  return df.pivot(index="date", columns="tic", values="close")

### Helper functions for mean returns and variance-covariance matrix

In [20]:
# The code in this section is partially based on work by Dr G A Vijayalakshmi Pai
# https://www.kaggle.com/code/vijipai/lesson-5-mean-variance-optimization-of-portfolios/notebook

def StockReturnsComputing(StockPrice, Rows, Columns):
  # Daily percentage returns: one row per day (after the first), one column per asset.
  StockReturn = np.zeros([Rows-1, Columns])
  for j in range(Columns):        # j: assets
    for i in range(Rows-1):     # i: daily prices
      StockReturn[i,j]=((StockPrice[i+1, j]-StockPrice[i,j])/StockPrice[i,j])* 100

  return StockReturn
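The double loop above can also be written as a single array expression. The sketch below is an equivalent formulation under the same definition of percentage returns; `stock_returns_vectorized` and `stock_returns_loop` are illustrative names, and the loop version is repeated only to check that both agree.

```python
import numpy as np


def stock_returns_vectorized(prices):
    """Percentage returns between consecutive rows, computed without loops."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(prices, axis=0) / prices[:-1] * 100


def stock_returns_loop(prices):
    """Reference implementation mirroring the element-by-element loop."""
    prices = np.asarray(prices, dtype=float)
    rows, cols = prices.shape
    out = np.zeros((rows - 1, cols))
    for j in range(cols):
        for i in range(rows - 1):
            out[i, j] = (prices[i + 1, j] - prices[i, j]) / prices[i, j] * 100
    return out


prices = np.array([[100.0, 50.0], [110.0, 45.0], [99.0, 54.0]])  # synthetic prices
print(stock_returns_vectorized(prices))
```

On a few thousand rows and a few dozen assets the difference in speed is small, so the choice is mostly about readability.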

### Calculate the weights for mean-variance

In [21]:
StockData = process_df_for_mvo(train)
TradeData = process_df_for_mvo(trade)

TradeData.to_numpy()

array([[123.33065033, 239.42544556, 142.09277344, ...,  33.34642029,
         32.07088852,  46.24145508],
       [124.60271454, 241.93273926, 145.39587402, ...,  34.18589401,
         32.32959366,  46.29298401],
       [123.28132629, 244.19287109, 141.91654968, ...,  34.65966034,
         30.34617996,  46.13518906],
       ...,
       [257.28668213, 257.97912598, 300.98654175, ...,  37.87639999,
          9.18999958,  91.98840332],
       [258.10372925, 256.70135498, 301.51220703, ...,  38.02866745,
          9.68000031,  92.09758759],
       [254.68586731, 256.18441772, 298.59619141, ...,  37.99059677,
          9.61999989,  90.97602844]], shape=(500, 29))

In [22]:
#compute asset returns
arStockPrices = np.asarray(StockData)
[Rows, Cols]=arStockPrices.shape
arReturns = StockReturnsComputing(arStockPrices, Rows, Cols)

#compute mean returns and variance covariance matrix of returns
meanReturns = np.mean(arReturns, axis = 0)
covReturns = np.cov(arReturns, rowvar=False)
 
#set precision for printing results
np.set_printoptions(precision=3, suppress = True)

#display mean returns and variance-covariance matrix of returns
print('Mean returns of assets in k-portfolio 1\n', meanReturns)
print('Variance-Covariance matrix of returns\n', covReturns)

Mean returns of assets in k-portfolio 1
 [0.143 0.061 0.095 0.08  0.076 0.131 0.065 0.044 0.075 0.072 0.108 0.078
 0.037 0.066 0.049 0.083 0.048 0.063 0.055 0.053 0.108 0.097 0.048 0.059
 0.11  0.102 0.04  0.047 0.045]
Variance-Covariance matrix of returns
 [[3.225 1.03  1.593 1.588 1.536 1.81  1.489 1.174 1.2   1.662 1.26  1.33
  1.103 1.622 0.698 1.643 0.677 0.845 1.124 0.76  1.613 1.263 0.711 0.952
  1.217 1.399 0.578 0.973 0.633]
 [1.03  2.517 1.203 1.049 1.106 1.227 1.059 0.984 0.965 1.156 1.001 1.036
  0.841 1.164 0.895 1.261 0.655 0.684 0.958 1.108 1.122 0.902 0.757 0.898
  1.208 1.053 0.649 1.038 0.644]
 [1.593 1.203 4.794 2.805 2.542 1.935 1.824 2.192 2.165 3.027 1.735 2.229
  1.518 1.902 0.937 3.581 1.096 1.226 1.737 1.125 1.712 1.764 0.923 1.913
  1.715 2.246 0.932 1.432 0.638]
 [1.588 1.049 2.805 5.092 2.344 1.833 1.625 2.202 1.991 2.375 1.618 2.293
  1.553 1.786 0.901 2.584 1.129 1.236 1.569 0.975 1.507 1.688 0.804 1.678
  1.577 1.764 0.776 1.437 0.613]
 [1.536 1.106 2.542

### Use PyPortfolioOpt

In [23]:
from pypfopt.efficient_frontier import EfficientFrontier

ef_mean = EfficientFrontier(meanReturns, covReturns, weight_bounds=(0, 0.5))
raw_weights_mean = ef_mean.max_sharpe()
cleaned_weights_mean = ef_mean.clean_weights()
mvo_weights = np.array([1000000 * cleaned_weights_mean[i] for i in range(len(cleaned_weights_mean))])
mvo_weights

array([361430.,      0.,      0.,      0.,      0.,  56980.,      0.,
            0.,      0.,      0., 246140.,      0.,      0.,      0.,
            0.,      0.,      0.,      0.,      0.,      0.,  58030.,
        49750.,      0.,      0., 173330.,  54350.,      0.,      0.,
            0.])
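For intuition about what a maximum-Sharpe optimizer is looking for, the unconstrained problem has a closed form: weights proportional to the inverse covariance matrix times the excess mean returns. The sketch below implements only that textbook case. It is not what PyPortfolioOpt does internally, and it ignores the long-only and 0.5 weight cap used above; `tangency_weights` is a hypothetical helper.

```python
import numpy as np


def tangency_weights(mean_returns, cov_returns, risk_free_rate=0.0):
    """Unconstrained maximum-Sharpe weights, normalized to sum to 1.

    Assumes the raw solution has a positive sum; otherwise the
    normalization flips signs and the result is not meaningful.
    """
    excess = np.asarray(mean_returns, dtype=float) - risk_free_rate
    cov = np.asarray(cov_returns, dtype=float)
    raw = np.linalg.solve(cov, excess)  # proportional to inv(cov) @ excess
    return raw / raw.sum()


# Two uncorrelated synthetic assets: the lower-variance one gets more weight.
mu = [0.10, 0.05]
cov = np.diag([0.04, 0.01])
print(tangency_weights(mu, cov))
```

With bounds such as `(0, 0.5)` the optimum generally has to be found numerically, which is why a solver-based library is used in the cell above.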

In [24]:
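# Despite its name, LastPrice holds the reciprocal of each asset's last
# training-set close, so multiplying by the dollar allocation gives share counts.
LastPrice = np.array([1/p for p in StockData.tail(1).to_numpy()[0]])
Initial_Portfolio = np.multiply(mvo_weights, LastPrice)
Initial_Portfolio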

array([2076.108,    0.   ,    0.   ,    0.   ,    0.   ,  226.575,
          0.   ,    0.   ,    0.   ,    0.   ,  651.987,    0.   ,
          0.   ,    0.   ,    0.   ,    0.   ,    0.   ,    0.   ,
          0.   ,    0.   ,  177.992,  316.127,    0.   ,    0.   ,
        366.832,  257.959,    0.   ,    0.   ,    0.   ])

In [25]:
Portfolio_Assets = TradeData @ Initial_Portfolio
MVO_result = pd.DataFrame(Portfolio_Assets, columns=["Mean Var"])
MVO_result

| date | Mean Var |
|---|---|
| 2023-01-03 | 7.896225e+05 |
| 2023-01-04 | 7.909572e+05 |
| 2023-01-05 | 7.780722e+05 |
| 2023-01-06 | 7.930048e+05 |
| 2023-01-09 | 7.964514e+05 |
| ... | ... |
| 2024-12-20 | 1.218321e+06 |
| 2024-12-23 | 1.221487e+06 |
| 2024-12-24 | 1.231781e+06 |
| 2024-12-26 | 1.233944e+06 |
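The MVO series does not start at 1,000,000 because the share counts are sized with the last training-set prices, while the portfolio is valued with trade-period prices. The sketch below reproduces that mechanism with synthetic numbers; `shares_from_dollars` and `portfolio_values` are hypothetical helpers.

```python
import numpy as np


def shares_from_dollars(dollar_allocation, prices):
    """Convert a dollar amount per asset into a (fractional) share count."""
    return np.asarray(dollar_allocation, dtype=float) / np.asarray(prices, dtype=float)


def portfolio_values(price_matrix, shares):
    """Value a fixed share vector on every row (date) of a price matrix."""
    return np.asarray(price_matrix, dtype=float) @ shares


allocation = np.array([600_000.0, 400_000.0])  # dollars per asset
sizing_prices = np.array([150.0, 80.0])        # prices used to size the position
shares = shares_from_dollars(allocation, sizing_prices)

# Valuing at the sizing prices gives back the full capital;
# valuing at different prices does not.
later_prices = np.array([[150.0, 80.0], [120.0, 100.0]])
print(portfolio_values(later_prices, shares))
```

This is a buy-and-hold valuation: the shares are fixed once and never rebalanced during the trade period.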


# Part 4: DJIA index

Add DJIA index as a baseline to compare with.

In [26]:
# TRAIN_START_DATE = '2009-01-01'
# TRAIN_END_DATE = '2020-07-01'
# TRADE_START_DATE = '2020-07-01'
# TRADE_END_DATE = '2021-10-29'
TRAIN_START_DATE = '2009-01-01'
TRAIN_END_DATE = '2023-01-01'
TRADE_START_DATE = '2023-01-01'
TRADE_END_DATE = '2024-12-31'

In [27]:
df_dji = YahooDownloader(
    start_date=TRADE_START_DATE, end_date=TRADE_END_DATE, ticker_list=["^DJI"]
).fetch_data()



[*********************100%***********************]  1 of 1 completed

Shape of DataFrame:  (501, 8)





In [28]:
df_dji = df_dji[["date", "close"]]
fst_day = df_dji["close"].iloc[0]  # positional access; does not depend on the index labels
dji = pd.merge(
    df_dji["date"],
    df_dji["close"].div(fst_day).mul(1000000),
    how="outer",
    left_index=True,
    right_index=True,
).set_index("date")
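The cell above rescales the index so that it starts at the same capital as the agents. An equivalent and somewhat shorter formulation is sketched below on synthetic prices; `normalize_to_capital` is a hypothetical helper, not part of FinRL.

```python
import pandas as pd


def normalize_to_capital(df, initial_capital=1_000_000, date_col="date", price_col="close"):
    """Scale a price series so that its first value equals initial_capital."""
    prices = df.set_index(date_col)[price_col]
    return (prices / prices.iloc[0] * initial_capital).to_frame(price_col)


# Synthetic index levels standing in for the downloaded ^DJI data.
demo = pd.DataFrame(
    {
        "date": ["2023-01-03", "2023-01-04", "2023-01-05"],
        "close": [33000.0, 33132.0, 32800.0],
    }
)
dji_demo = normalize_to_capital(demo)
print(dji_demo)
```

The result has the same shape as `dji` above: a single `close` column indexed by `date`.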

# Part 5: Equally Weighted Strategy


In [29]:
def process_df(df):
  return df.pivot(index="date", columns="tic", values="close")

In [30]:
StockData = process_df(train)
TradeData = process_df(trade)

TradeData.to_numpy()

array([[123.331, 239.425, 142.093, ...,  33.346,  32.071,  46.241],
       [124.603, 241.933, 145.396, ...,  34.186,  32.33 ,  46.293],
       [123.281, 244.193, 141.917, ...,  34.66 ,  30.346,  46.135],
       ...,
       [257.287, 257.979, 300.987, ...,  37.876,   9.19 ,  91.988],
       [258.104, 256.701, 301.512, ...,  38.029,   9.68 ,  92.098],
       [254.686, 256.184, 298.596, ...,  37.991,   9.62 ,  90.976]],
      shape=(500, 29))

In [31]:
trade_data = TradeData.to_numpy()
T, N = trade_data.shape

# Initialize portfolio
portfolio_value = [1000000]
weights = np.ones(N) / N  # equal weights

for t in range(1, T):
    # Previous prices and today's prices
    prev_prices = trade_data[t - 1]
    curr_prices = trade_data[t]

    # How many shares of each asset we held yesterday
    shares = (portfolio_value[-1] * weights) / prev_prices

    # Today's value = shares * today's price
    new_value = np.sum(shares * curr_prices)

    portfolio_value.append(new_value)

TradeData.index = pd.to_datetime(TradeData.index)

# Convert portfolio_value into a DataFrame indexed by trade date
EWS_result = pd.DataFrame(
    portfolio_value, 
    index=TradeData.index,   # align with dates
    columns=["Equal Weight"] # name the strategy
)
EWS_result

| date | Equal Weight |
|---|---|
| 2023-01-03 | 1.000000e+06 |
| 2023-01-04 | 1.008939e+06 |
| 2023-01-05 | 9.994740e+05 |
| 2023-01-06 | 1.023346e+06 |
| 2023-01-09 | 1.020309e+06 |
| ... | ... |
| 2024-12-20 | 1.293422e+06 |
| 2024-12-23 | 1.293978e+06 |
| 2024-12-24 | 1.304220e+06 |
| 2024-12-26 | 1.308777e+06 |
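Because the loop above resets the weights to 1/N every day, each day's portfolio growth equals the average of the assets' gross returns. That allows a loop-free formulation, sketched below next to a loop version for comparison. Both functions are illustrative, and transaction costs are ignored as in the cell above.

```python
import numpy as np


def equal_weight_loop(prices, initial=1_000_000.0):
    """Daily-rebalanced equal-weight portfolio, computed step by step."""
    prices = np.asarray(prices, dtype=float)
    weights = np.ones(prices.shape[1]) / prices.shape[1]
    values = [initial]
    for t in range(1, len(prices)):
        shares = values[-1] * weights / prices[t - 1]
        values.append(float(np.sum(shares * prices[t])))
    return np.array(values)


def equal_weight_vectorized(prices, initial=1_000_000.0):
    """Same portfolio via the cumulative product of mean gross returns."""
    prices = np.asarray(prices, dtype=float)
    gross = (prices[1:] / prices[:-1]).mean(axis=1)  # average growth per day
    return initial * np.concatenate(([1.0], np.cumprod(gross)))


prices = np.array([[100.0, 50.0], [110.0, 45.0], [99.0, 54.0]])  # synthetic prices
print(equal_weight_vectorized(prices))
```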


<a id='4'></a>
# Part 6: Backtesting Results
Backtesting plays a key role in evaluating the performance of a trading strategy. An automated backtesting tool is preferred because it reduces human error. We usually use the Quantopian pyfolio package to backtest our trading strategies: it is easy to use and provides a set of individual plots that together give a comprehensive picture of a strategy's performance.

In [32]:
df_result_a2c = (
    df_account_value_a2c.set_index(df_account_value_a2c.columns[0])
    if if_using_a2c
    else None
)
df_result_ddpg = (
    df_account_value_ddpg.set_index(df_account_value_ddpg.columns[0])
    if if_using_ddpg
    else None
)
df_result_ppo = (
    df_account_value_ppo.set_index(df_account_value_ppo.columns[0])
    if if_using_ppo
    else None
)
df_result_td3 = (
    df_account_value_td3.set_index(df_account_value_td3.columns[0])
    if if_using_td3
    else None
)
df_result_sac = (
    df_account_value_sac.set_index(df_account_value_sac.columns[0])
    if if_using_sac
    else None
)
df_result_ppo.index = pd.to_datetime(df_result_ppo.index)
MVO_result.index = pd.to_datetime(MVO_result.index)
dji.index = pd.to_datetime(dji.index)
print(df_result_ppo["account_value"])
print(MVO_result["Mean Var"])
print(df_result_ppo["account_value"].apply(type).unique())
print(MVO_result["Mean Var"].apply(type).unique())
print(dji["close"].apply(type).unique())
result = pd.DataFrame(
    {
        "a2c": df_result_a2c["account_value"] if if_using_a2c else None,
        "ddpg": df_result_ddpg["account_value"] if if_using_ddpg else None,
        "ppo": df_result_ppo["account_value"] if if_using_ppo else None,
        "td3": df_result_td3["account_value"] if if_using_td3 else None,
        "sac": df_result_sac["account_value"] if if_using_sac else None,
        "mvo": MVO_result["Mean Var"],
        "dji": dji["close"],
        "ews": EWS_result["Equal Weight"]
    }
)

date
2023-01-03    1.000000e+06
2023-01-04    1.000115e+06
2023-01-05    9.999234e+05
2023-01-06    1.000097e+06
2023-01-09    1.000115e+06
                  ...     
2024-12-20    1.483083e+06
2024-12-23    1.487406e+06
2024-12-24    1.503829e+06
2024-12-26    1.508444e+06
2024-12-27    1.489090e+06
Name: account_value, Length: 500, dtype: float64
date
2023-01-03    7.896225e+05
2023-01-04    7.909572e+05
2023-01-05    7.780722e+05
2023-01-06    7.930048e+05
2023-01-09    7.964514e+05
                  ...     
2024-12-20    1.218321e+06
2024-12-23    1.221487e+06
2024-12-24    1.231781e+06
2024-12-26    1.233944e+06
2024-12-27    1.222152e+06
Name: Mean Var, Length: 500, dtype: float64
[<class 'float'>]
[<class 'float'>]
[<class 'float'>]


In [33]:
import numpy as np
import pandas as pd

trading_days_per_year = 252

# === Utility Functions ===
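def compute_annualized_return(v0, vT, T):
    # Note: callers pass T as the number of rows (trading days), while the
    # exponent uses 365 calendar days. For a gain, this gives a higher figure
    # than annualizing the same series with trading_days_per_year.
    return (1+((vT - v0)/ v0))** (365 / T) - 1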

def compute_sharpe_ratio(daily_returns):
    if daily_returns.std() == 0 or len(daily_returns) < 2:
        return 0.0
    return daily_returns.mean() / daily_returns.std() * np.sqrt(trading_days_per_year)

def compute_annualized_volatility(daily_returns):
    return daily_returns.std() * np.sqrt(trading_days_per_year)

def compute_max_drawdown(daily_returns):
    r = np.asarray(daily_returns, dtype=float)
    if r.size == 0:
        return 0.0
    r = r[~np.isnan(r)]
    if r.size == 0:
        return 0.0
    equity = np.cumprod(1.0 + r)
    peaks = np.maximum.accumulate(equity)
    drawdowns = equity / peaks - 1.0  # ≤ 0
    return float(np.min(drawdowns) * 100.0)
# === Initialize Metrics Dictionary ===
metrics = {}

# === Process Baselines ===
for strategy in result.columns.drop('ppo'):
    series = result[strategy].dropna()
    if len(series) < 2:
        continue

    daily_returns = series.pct_change().dropna()
    final_value = series.iloc[-1]
    v0 = series.iloc[0]
    T = len(series)
    print("baseline trade",T)
    print(strategy,v0,final_value)
    annual_return = compute_annualized_return(v0, final_value, T)
    volatility = compute_annualized_volatility(daily_returns)
    sharpe = compute_sharpe_ratio(daily_returns)
    max_drawdown = compute_max_drawdown(daily_returns)
    metrics[strategy] = {
        "Final Value": final_value,
        "Annualized Return": annual_return,
        "Annualized Volatility": volatility,
        "Annualized StdErr": np.nan,
        "Sharpe Ratio": sharpe,
        "Maximum Drawdown":max_drawdown,
    }

# === Process PPO Multi-run ===
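# Collect every run column instead of hard-coding the number of runs.
ppo_runs = [df_account_value_all[c] for c in df_account_value_all.columns if str(c).startswith("run_")]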
ppo_final_values = []
ppo_annual_returns = []
ppo_sharpe_ratios = []
ppo_volatilities = []
ppo_max_drawdown = []
for series in ppo_runs:
    series = series.dropna()
    v0 = series.iloc[0]
    vT = series.iloc[-1]
    T = len(series)
    print("agent trade",T)

    daily_returns = series.pct_change().dropna()

    ppo_final_values.append(vT)
    ppo_annual_returns.append(compute_annualized_return(v0, vT, T))
    ppo_sharpe_ratios.append(compute_sharpe_ratio(daily_returns))
    ppo_volatilities.append(daily_returns.std() * np.sqrt(252)) 
    ppo_max_drawdown.append(compute_max_drawdown(daily_returns))
# Convert to arrays
ppo_annual_returns = np.array(ppo_annual_returns)
ppo_sharpe_ratios = np.array(ppo_sharpe_ratios)
ppo_final_values = np.array(ppo_final_values)
ppo_volatilities = np.array(ppo_volatilities)
ppo_max_drawdown = np.array(ppo_max_drawdown)

# Compute stats
mean_volatility = ppo_volatilities.mean()
std_volatility = ppo_volatilities.std(ddof=1)

mean_final_value = ppo_final_values.mean()
std_final_value = ppo_final_values.std(ddof=1)
mean_annual_return = ppo_annual_returns.mean()
std_annual_return = ppo_annual_returns.std(ddof=1)
stderr_annual_return = std_annual_return / np.sqrt(len(ppo_annual_returns))
mean_sharpe = ppo_sharpe_ratios.mean()
std_sharpe = ppo_sharpe_ratios.std(ddof=1)
mean_max_drawdown = ppo_max_drawdown.mean()

# Save PPO to metrics
metrics['ppo'] = {
    "Final Value": f"{mean_final_value:.2f}",
    "Annualized Return": f"{mean_annual_return:.4f}",
    "Annualized Volatility": f"{mean_volatility:.4f}",
    "Annualized StdErr": f"{stderr_annual_return:.4f}",
    "Sharpe Ratio": f"{mean_sharpe:.4f}",
    "Maximum Drawdown": f"{mean_max_drawdown:.4f}"

}


# === Final DataFrame ===
df_metrics = pd.DataFrame(metrics)



baseline trade 500
mvo 789622.490278661 1222152.0565361185
baseline trade 501
dji 1000000.0 1284803.6481816208
baseline trade 500
ews 1000000.0 1300317.8524948538
agent trade 500




In [34]:
df_metrics

| | mvo | dji | ews | ppo |
|---|---|---|---|---|
| Final Value | 1222152.0 | 1284804.0 | 1300318.0 | 1489089.78 |
| Annualized Return | 0.3755779 | 0.2003068 | 0.2113124 | 0.3373 |
| Annualized Volatility | 0.1372538 | 0.1139912 | 0.1124035 | 0.1775 |
| Annualized StdErr | | | | |
| Sharpe Ratio | 1.676379 | 1.165158 | 1.236259 | 1.2218 |
| Maximum Drawdown | -9.858092 | -9.017762 | -8.675963 | -16.5661 |
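The annualized returns in this table depend on how many periods are counted as one year. The sketch below recomputes the PPO figure under two conventions, using the start value, final value, and row count reported above; `annualized_return` is a hypothetical helper that takes the convention as an explicit argument.

```python
def annualized_return(v0, vT, n_periods, periods_per_year):
    """Geometric annualization of a total return observed over n_periods."""
    return (vT / v0) ** (periods_per_year / n_periods) - 1


# PPO figures from the outputs above: 500 rows, from 1,000,000 to 1,489,089.78.
v0, vT, n_days = 1_000_000.0, 1_489_089.78, 500

r_365 = annualized_return(v0, vT, n_days, 365)  # exponent used in the metrics cell
r_252 = annualized_return(v0, vT, n_days, 252)  # trading-day convention
print(f"365/T: {r_365:.4f}   252/T: {r_252:.4f}")
```

The first figure matches the 0.3373 in the table; the second, about 0.22, is what a trading-day convention would report for the same account-value path. The same difference applies to the baseline columns, so the ranking between strategies is unaffected.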


In [37]:
result

| date | a2c | ddpg | ppo | td3 | sac | mvo | dji | ews |
|---|---|---|---|---|---|---|---|---|
| 2023-01-03 | | | 1.000000e+06 | | | 7.896225e+05 | 1.000000e+06 | 1.000000e+06 |
| 2023-01-04 | | | 1.000115e+06 | | | 7.909572e+05 | 1.004026e+06 | 1.008939e+06 |
| 2023-01-05 | | | 9.999234e+05 | | | 7.780722e+05 | 9.937744e+05 | 9.994740e+05 |
| 2023-01-06 | | | 1.000097e+06 | | | 7.930048e+05 | 1.014915e+06 | 1.023346e+06 |
| 2023-01-09 | | | 1.000115e+06 | | | 7.964514e+05 | 1.011506e+06 | 1.020309e+06 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2024-12-23 | | | 1.487406e+06 | | | 1.221487e+06 | 1.294860e+06 | 1.293978e+06 |
| 2024-12-24 | | | 1.503829e+06 | | | 1.231781e+06 | 1.306632e+06 | 1.304220e+06 |
| 2024-12-26 | | | 1.508444e+06 | | | 1.233944e+06 | 1.307500e+06 | 1.308777e+06 |
| 2024-12-27 | | | 1.489090e+06 | | | 1.222152e+06 | 1.297433e+06 | 1.300318e+06 |


Now that everything is ready, we can plot the backtest results.

In [38]:
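plt.rcParams["figure.figsize"] = (15,5)
# DataFrame.plot() creates its own figure, so a separate plt.figure() call
# would only leave an empty figure behind.
result.plot()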

<Axes: xlabel='date'>
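Since the MVO series starts below 1,000,000 while the other curves start at exactly that value, the raw plot mixes different starting capitals. One optional way to compare shapes is to rebase every series to a common start before plotting. The sketch below shows the idea on synthetic data; `rebase` is a hypothetical helper, and dropping the empty agent columns is an assumption about what one would want to display.

```python
import numpy as np
import pandas as pd


def rebase(result, base=1_000_000.0):
    """Scale every non-empty column so that its first valid value equals base."""
    active = result.dropna(axis=1, how="all")  # drop strategies that were not run
    first_valid = active.bfill().iloc[0]       # first non-missing value per column
    return active / first_valid * base


# Synthetic frame with one unused agent column and two strategies.
demo = pd.DataFrame(
    {
        "a2c": [np.nan, np.nan],
        "ppo": [1_000_000.0, 1_100_000.0],
        "mvo": [789_622.0, 868_584.2],
    },
    index=pd.to_datetime(["2023-01-03", "2023-01-04"]),
)
rebased = rebase(demo)
print(rebased)
```

Calling `rebase(result).plot()` would then draw all curves from the same starting point. The final values in the metrics table remain the unscaled ones.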