<a href="https://colab.research.google.com/github/AI4Finance-Foundation/FinRL-Tutorials/blob/master/1-Introduction/Stock_NeurIPS2018_SB3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Deep Reinforcement Learning for Stock Trading from Scratch: Multiple Stock Trading

* **PyTorch Version**



# Content

* [1. Task Description](#0)
* [2. Install Python packages](#1)
    * [2.1. Install Packages](#1.1)    
    * [2.2. A List of Python Packages](#1.2)
    * [2.3. Import Packages](#1.3)
    * [2.4. Create Folders](#1.4)
* [3. Download Data](#2)
* [4. Preprocess Data](#3)        
    * [4.1. Technical Indicators](#3.1)
    * [4.2. Perform Feature Engineering](#3.2)
* [5. Build Market Environment in OpenAI Gym-style](#4)  
    * [5.1. Data Split](#4.1)  
    * [5.2. Environment for Training](#4.2)
* [6. Train DRL Agents](#5)
* [7. Backtesting Performance](#6)  
    * [7.1. BackTestStats](#6.1)
    * [7.2. BackTestPlot](#6.2)   
  

<a id='0'></a>
# Part 1. Task Description

We train a DRL agent for stock trading. This task is modeled as a Markov Decision Process (MDP), and the objective function is maximizing (expected) cumulative return.

We specify the state-action-reward as follows:

* **State s**: The state space represents an agent's perception of the market environment. Just like a human trader analyzing various information, here our agent passively observes many features and learns by interacting with the market environment (usually by replaying historical data).

* **Action a**: The action space includes the actions that an agent is allowed to take at each state. For example, a ∈ {−1, 0, 1}, where −1, 0, 1 represent selling, holding, and buying. When an action can trade multiple shares, a ∈ {−k, ..., −1, 0, 1, ..., k}; e.g., "Buy 10 shares of AAPL" and "Sell 10 shares of AAPL" correspond to 10 and −10, respectively.

* **Reward function r(s, a, s′)**: The reward is the incentive that drives an agent to learn a better policy. For example, it can be the change in portfolio value when taking action a at state s and arriving at the new state s′, i.e., r(s, a, s′) = v′ − v, where v′ and v are the portfolio values at states s′ and s, respectively.
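
As a small worked example of this reward definition, the sketch below computes the portfolio value before and after one trading step and takes the difference. The function names (`portfolio_value`, `step_reward`) and all numbers are illustrative assumptions, not part of FinRL, and transaction costs are ignored.

```python
def portfolio_value(cash, prices, shares):
    """Cash plus the market value of all held shares."""
    return cash + sum(p * n for p, n in zip(prices, shares))


def step_reward(cash, shares, actions, prices, next_prices):
    """Reward r(s, a, s') = v' - v for one trading step.

    actions[i] > 0 buys that many shares of stock i, actions[i] < 0 sells.
    Trades execute at the current prices; costs are ignored in this sketch.
    """
    v = portfolio_value(cash, prices, shares)
    new_shares = [n + a for n, a in zip(shares, actions)]
    new_cash = cash - sum(p * a for p, a in zip(prices, actions))
    v_next = portfolio_value(new_cash, next_prices, new_shares)
    return v_next - v


# Buy 10 shares of the first stock and sell 10 shares of the second
reward = step_reward(
    cash=10_000.0,
    shares=[0, 20],
    actions=[10, -10],
    prices=[100.0, 50.0],
    next_prices=[102.0, 49.0],
)
print(reward)
```

Here the portfolio is worth 11,000 before the step and 11,010 after the prices move, so the reward is 10.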


**Market environment**: the 30 constituent stocks of the Dow Jones Industrial Average (DJIA) index, as listed at the start date of the testing period.


The data for this case study is obtained from the Yahoo Finance API. It contains Open-High-Low-Close (OHLC) prices and volume.


<a id='1'></a>
# Part 2. Install Python Packages

<a id='1.1'></a>
## 2.1. Install packages



<a id='1.2'></a>
## 2.2. A list of Python packages 
* Yahoo Finance API
* pandas
* numpy
* matplotlib
* stockstats
* OpenAI gym
* stable-baselines3
* tensorflow
* pyfolio

<a id='1.3'></a>
## 2.3. Import Packages

In [327]:
int(len(trade)/29)

658

In [281]:
import sys
sys.path.append("../STOCK_DRL")

import warnings
warnings.filterwarnings("ignore")


import itertools
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# matplotlib.use('Agg')
import datetime

from Processed import get_processed_data

from module.yahoodownloader import YahooDownloader
from module.preprocessor import FeatureEngineer, data_split
from module.efficient_frontier import EfficientFrontier
from module import helper
from module.config_tickers import DOW_30_TICKER
from module.env_stocktrading import StockTradingEnv
from module.models import DRLAgent
from module.logger import configure
from module.config import (
    DATA_SAVE_DIR,
    TRAINED_MODEL_DIR,
    TENSORBOARD_LOG_DIR,
    RESULTS_DIR,
    INDICATORS,
    TRAIN_START_DATE,
    TRAIN_END_DATE,
    TEST_START_DATE,
    TEST_END_DATE,
    TRADE_START_DATE,
    TRADE_END_DATE,
    
)


<a id='1.4'></a>
## 2.4. Create Folders

In [71]:
from finrl.main import check_and_make_directories
check_and_make_directories([DATA_SAVE_DIR, TRAINED_MODEL_DIR, TENSORBOARD_LOG_DIR, RESULTS_DIR])



<a id='2'></a>
# Part 3. Download Data
Yahoo Finance provides stock data, financial news, financial reports, etc. Yahoo Finance is free.
* FinRL uses a class **YahooDownloader** in FinRL-Meta to fetch data via Yahoo Finance API
* Call Limit: Using the Public API (without authentication), you are limited to 2,000 requests per hour per IP (or up to a total of 48,000 requests a day).



-----
class YahooDownloader:
    Retrieving daily stock data from
    Yahoo Finance API

    Attributes
    ----------
        start_date : str
            start date of the data (modified from config.py)
        end_date : str
            end date of the data (modified from config.py)
        ticker_list : list
            a list of stock tickers (modified from config.py)

    Methods
    -------
    fetch_data()


In [72]:
# from config.py, TRAIN_START_DATE is a string
TRAIN_START_DATE
# from config.py, TRAIN_END_DATE is a string
TRAIN_END_DATE

'2020-07-01'

In [73]:
df = YahooDownloader(start_date = TRAIN_START_DATE,
                     end_date = TRADE_END_DATE,
                     ticker_list = DOW_30_TICKER).fetch_data()

[*********************100%***********************]  1 of 1 completed

In [74]:
print(DOW_30_TICKER)

['AXP', 'AMGN', 'AAPL', 'BA', 'CAT', 'CSCO', 'CVX', 'GS', 'HD', 'HON', 'IBM', 'INTC', 'JNJ', 'KO', 'JPM', 'MCD', 'MMM', 'MRK', 'MSFT', 'NKE', 'PG', 'TRV', 'UNH', 'CRM', 'VZ', 'V', 'WBA', 'WMT', 'DIS', 'DOW']


In [75]:
df.shape

(103961, 8)

In [76]:
df.sort_values(['date','tic'],ignore_index=True).head()

Unnamed: 0,date,open,high,low,close,volume,tic,day
0,2009-01-02,3.067143,3.251429,3.041429,2.758535,746015200,AAPL,4
1,2009-01-02,58.59,59.080002,57.75,43.832634,6547900,AMGN,4
2,2009-01-02,18.57,19.52,18.4,15.365306,10955700,AXP,4
3,2009-01-02,42.799999,45.560001,42.779999,33.94109,7010200,BA,4
4,2009-01-02,44.91,46.98,44.709999,31.579327,7117200,CAT,4


<a id='3'></a>
# Part 4. Preprocess Data
We need to check for missing data and perform feature engineering to convert the data points into states.
* **Adding technical indicators**. In practical trading, various information needs to be taken into account, such as historical prices, current holding shares, technical indicators, etc. Here, we highlight two widely used technical indicators: MACD (trend-following) and RSI (a momentum oscillator). The full set used is the `INDICATORS` list imported from the config module.
* **Adding turbulence index**. Risk aversion reflects whether an investor prefers to protect capital. It also influences one's trading strategy when facing different levels of market volatility. To control risk in a worst-case scenario, such as the financial crisis of 2007–2008, FinRL employs the turbulence index, which measures extreme fluctuations in asset prices.
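
As an illustration of the turbulence idea, the sketch below computes a Mahalanobis-style distance between today's cross-section of returns and its historical distribution. This is a common formulation of a turbulence index. The function name and the use of a pseudo-inverse are assumptions made for this sketch, and FinRL's own implementation may differ in details such as the lookback window.

```python
import numpy as np


def turbulence(current_returns, historical_returns):
    """Distance of today's returns from their history, scaled by covariance.

    current_returns:    1-D sequence, one return per asset
    historical_returns: 2-D sequence, shape (days, assets)
    """
    hist = np.asarray(historical_returns, dtype=float)
    mu = hist.mean(axis=0)                     # historical mean per asset
    cov = np.cov(hist, rowvar=False)           # asset-by-asset covariance
    diff = np.asarray(current_returns, dtype=float) - mu
    # The pseudo-inverse keeps the sketch usable if cov is singular
    return float(diff @ np.linalg.pinv(cov) @ diff)


history = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
print(turbulence([0.0, 0.0], history))  # a typical day: small value
print(turbulence([2.0, 0.0], history))  # an unusual day: larger value
```

A day whose returns sit far from the historical mean, relative to historical variability, gets a large value, which is what lets a threshold on this index flag extreme market conditions.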

In [77]:
fe = FeatureEngineer(
                    use_technical_indicator=True,
                    tech_indicator_list = INDICATORS,
                    use_vix=True,
                    use_turbulence=True,
                    user_defined_feature = False)

processed = fe.preprocess_data(df)

Successfully added technical indicators
[*********************100%***********************]  1 of 1 completed
Shape of DataFrame:  (3551, 8)
Successfully added vix
Successfully added turbulence index


In [78]:
list_ticker = processed["tic"].unique().tolist()
list_date = list(pd.date_range(processed['date'].min(),processed['date'].max()).astype(str))
combination = list(itertools.product(list_date,list_ticker))

processed_full = pd.DataFrame(combination,columns=["date","tic"]).merge(processed,on=["date","tic"],how="left")
processed_full = processed_full[processed_full['date'].isin(processed['date'])]
processed_full = processed_full.sort_values(['date','tic'])

processed_full = processed_full.fillna(0)

In [79]:
processed_full.sort_values(['date','tic'],ignore_index=True).head(10)

Unnamed: 0,date,tic,open,high,low,close,volume,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,vix,turbulence
0,2009-01-02,AAPL,3.067143,3.251429,3.041429,2.758535,746015200.0,4.0,0.0,2.981391,2.6521,100.0,66.666667,100.0,2.758535,2.758535,39.189999,0.0
1,2009-01-02,AMGN,58.59,59.080002,57.75,43.832634,6547900.0,4.0,0.0,2.981391,2.6521,100.0,66.666667,100.0,43.832634,43.832634,39.189999,0.0
2,2009-01-02,AXP,18.57,19.52,18.4,15.365306,10955700.0,4.0,0.0,2.981391,2.6521,100.0,66.666667,100.0,15.365306,15.365306,39.189999,0.0
3,2009-01-02,BA,42.799999,45.560001,42.779999,33.94109,7010200.0,4.0,0.0,2.981391,2.6521,100.0,66.666667,100.0,33.94109,33.94109,39.189999,0.0
4,2009-01-02,CAT,44.91,46.98,44.709999,31.579327,7117200.0,4.0,0.0,2.981391,2.6521,100.0,66.666667,100.0,31.579327,31.579327,39.189999,0.0
5,2009-01-02,CRM,8.025,8.55,7.9125,8.505,4069200.0,4.0,0.0,2.981391,2.6521,100.0,66.666667,100.0,8.505,8.505,39.189999,0.0
6,2009-01-02,CSCO,16.41,17.0,16.25,11.948338,40980600.0,4.0,0.0,2.981391,2.6521,100.0,66.666667,100.0,11.948338,11.948338,39.189999,0.0
7,2009-01-02,CVX,74.230003,77.300003,73.580002,43.677189,13695900.0,4.0,0.0,2.981391,2.6521,100.0,66.666667,100.0,43.677189,43.677189,39.189999,0.0
8,2009-01-02,DIS,22.76,24.030001,22.5,20.597494,9796600.0,4.0,0.0,2.981391,2.6521,100.0,66.666667,100.0,20.597494,20.597494,39.189999,0.0
9,2009-01-02,GS,84.019997,87.620003,82.190002,69.747612,14088500.0,4.0,0.0,2.981391,2.6521,100.0,66.666667,100.0,69.747612,69.747612,39.189999,0.0


In [80]:
mvo_df = processed_full.sort_values(['date','tic'],ignore_index=True)[['date','tic','close']]

In [323]:
int(len(mvo_df)/29)

3551

<a id='4'></a>
# Part 5. Build A Market Environment in OpenAI Gym-style
The training process involves observing stock price changes, taking an action, and calculating the reward. By interacting with the market environment, the agent will eventually derive a trading strategy that may maximize (expected) rewards.

Our market environment, based on OpenAI Gym, simulates stock markets with historical market data.
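
To illustrate the observe, act, and reward loop described above, here is a deliberately tiny single-stock environment with a Gym-like `reset`/`step` interface. `ToyTradingEnv` is a hypothetical class written for this sketch; it has no transaction costs and does not stop the agent from overspending, unlike the `StockTradingEnv` used in this notebook.

```python
class ToyTradingEnv:
    """Single-stock toy environment that replays a list of historical prices."""

    def __init__(self, prices, initial_cash=1000.0):
        self.prices = list(prices)
        self.initial_cash = initial_cash

    def reset(self):
        self.t = 0
        self.cash = self.initial_cash
        self.shares = 0
        return self._state()

    def _state(self):
        return (self.cash, self.prices[self.t], self.shares)

    def _value(self):
        return self.cash + self.shares * self.prices[self.t]

    def step(self, action):
        """action: number of shares to buy (>0) or sell (<0) at today's price."""
        value_before = self._value()
        self.cash -= action * self.prices[self.t]
        self.shares += action
        self.t += 1                               # move to the next historical price
        reward = self._value() - value_before     # change in portfolio value
        done = self.t == len(self.prices) - 1
        return self._state(), reward, done


env = ToyTradingEnv(prices=[10.0, 11.0, 9.0])
state = env.reset()
total_reward = 0.0
done = False
while not done:
    action = 5 if state[2] == 0 else 0   # buy once, then hold
    state, reward, done = env.step(action)
    total_reward += reward
print(state, total_reward)
```

The fixed "buy once, then hold" rule stands in for a policy; a DRL agent would instead choose `action` from the observed state and learn from the rewards.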

## Data Split
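We split the data into a training set and a trading (out-of-sample) set as follows:

Training data period: 2009-01-01 to 2020-07-01 (the end date is excluded, so the last training day is 2020-06-30)

Trading data period: 2020-07-01 to `TRADE_END_DATE` (the outputs in this notebook run through 2023-02-09)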
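
A minimal sketch of such a date-based split is shown below, assuming a half-open interval (start included, end excluded) and an index that counts trading days. The function `split_by_date` is a hypothetical stand-in written for illustration; the `data_split` helper used in the next cell may differ in its details.

```python
import pandas as pd


def split_by_date(df, start, end, date_col="date"):
    """Return rows with start <= date < end, re-indexed by trading day."""
    mask = (df[date_col] >= start) & (df[date_col] < end)
    out = df[mask].sort_values([date_col, "tic"], ignore_index=True)
    # All tickers on the same date share one integer index (0, 1, 2, ...)
    out.index = out[date_col].factorize()[0]
    return out


toy = pd.DataFrame({
    "date": ["2020-06-29", "2020-06-29", "2020-06-30", "2020-06-30",
             "2020-07-01", "2020-07-01"],
    "tic": ["AAPL", "MSFT"] * 3,
    "close": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})
train_toy = split_by_date(toy, "2020-06-29", "2020-07-01")
trade_toy = split_by_date(toy, "2020-07-01", "2020-07-02")
print(train_toy.index.tolist(), trade_toy["date"].unique().tolist())
```

Because the end date is excluded, using the same date as the end of one split and the start of the next gives two sets that do not overlap.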


In [82]:
train = data_split(processed_full, TRAIN_START_DATE,TRAIN_END_DATE)
trade = data_split(processed_full, TRADE_START_DATE,TRADE_END_DATE)
print(len(train))
print(len(trade))

83897
19082


In [83]:
train.tail()

Unnamed: 0,date,tic,open,high,low,close,volume,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,vix,turbulence
2892,2020-06-30,UNH,288.570007,296.450012,287.660004,284.97879,2932900.0,1.0,-0.019287,300.970827,268.613865,52.413059,-25.914719,1.846804,285.22027,278.269271,30.43,12.918757
2892,2020-06-30,V,191.490005,193.75,190.160004,189.604019,9040100.0,1.0,1.042556,197.569696,183.942024,53.021034,-51.608286,2.013358,190.347375,180.598288,30.43,12.918757
2892,2020-06-30,VZ,54.919998,55.290001,54.360001,48.16901,17414800.0,1.0,-0.417955,51.555486,46.593788,48.097037,-51.186723,8.508886,48.776547,49.20927,30.43,12.918757
2892,2020-06-30,WBA,42.119999,42.580002,41.759998,37.630482,4782100.0,1.0,-0.080963,41.075411,35.173593,48.830183,-14.613203,1.500723,37.726358,37.5335,30.43,12.918757
2892,2020-06-30,WMT,119.220001,120.129997,118.540001,115.184006,6836400.0,1.0,-0.87941,118.508925,112.593777,48.159674,-69.964752,3.847271,116.836409,118.756425,30.43,12.918757


In [84]:
trade.head()

Unnamed: 0,date,tic,open,high,low,close,volume,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,vix,turbulence
0,2020-07-01,AAPL,91.279999,91.839996,90.977501,89.494553,110737200.0,2.0,3.000854,92.276538,79.814266,62.807128,107.498985,29.730532,83.550962,77.363088,28.620001,53.068112
0,2020-07-01,AMGN,235.520004,256.230011,232.580002,234.614304,6575800.0,2.0,3.552511,227.036704,195.594648,61.279625,270.848998,46.806139,209.902523,210.950772,28.620001,53.068112
0,2020-07-01,AXP,95.25,96.959999,93.639999,91.078468,3301000.0,2.0,-0.384903,109.215321,86.798779,48.504819,-66.306151,3.142448,96.180265,89.702836,28.620001,53.068112
0,2020-07-01,BA,185.880005,190.610001,180.039993,180.320007,49036700.0,2.0,5.443193,220.721139,160.932863,50.925771,24.220608,15.93292,176.472335,155.614168,28.620001,53.068112
0,2020-07-01,CAT,129.380005,129.399994,125.879997,118.455788,2807800.0,2.0,1.249466,128.246931,111.290116,52.865418,35.692958,14.457404,117.239535,111.578318,28.620001,53.068112


In [85]:
INDICATORS

['macd',
 'boll_ub',
 'boll_lb',
 'rsi_30',
 'cci_30',
 'dx_30',
 'close_30_sma',
 'close_60_sma']

In [270]:
stock_dimension = len(train.tic.unique())
state_space = 1 + 2*stock_dimension + len(INDICATORS)*stock_dimension
print(f"Stock Dimension: {stock_dimension}, State Space: {state_space}")

Stock Dimension: 29, State Space: 291
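
To make the state-space arithmetic concrete, the sketch below assembles a state vector in the layout implied by the formula: one cash balance, then one price and one holding per stock, then one value per indicator per stock. The function `build_state` and the exact ordering are illustrative assumptions; the actual layout is defined inside `StockTradingEnv`.

```python
def build_state(cash, prices, holdings, indicators):
    """Flatten one trading day into a single state vector.

    indicators: dict mapping indicator name -> list of per-stock values.
    Layout: [cash] + prices + holdings + values of each indicator in turn.
    """
    state = [cash] + list(prices) + list(holdings)
    for values in indicators.values():
        state.extend(values)
    return state


n_stocks = 29
indicator_names = ["macd", "boll_ub", "boll_lb", "rsi_30",
                   "cci_30", "dx_30", "close_30_sma", "close_60_sma"]
state = build_state(
    cash=1_000_000,
    prices=[1.0] * n_stocks,
    holdings=[0] * n_stocks,
    indicators={name: [0.0] * n_stocks for name in indicator_names},
)
print(len(state))  # 1 + 2*29 + 8*29 = 291
```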


In [271]:
buy_cost_list = sell_cost_list = [0.001] * stock_dimension
num_stock_shares = [0] * stock_dimension

env_kwargs = {
    "hmax": 100,
    "initial_amount": 1000000,
    "num_stock_shares": num_stock_shares,
    "buy_cost_pct": buy_cost_list,
    "sell_cost_pct": sell_cost_list,
    "state_space": state_space,
    "stock_dim": stock_dimension,
    "tech_indicator_list": INDICATORS,
    "action_space": stock_dimension,
    "reward_scaling": 1e-4
}


e_train_gym = StockTradingEnv(df = train, **env_kwargs)

## Environment for Training



In [272]:
env_train, _ = e_train_gym.get_sb_env()
print(type(env_train))

<class 'stable_baselines3.common.vec_env.dummy_vec_env.DummyVecEnv'>


<a id='5'></a>
# Part 6. Train DRL Agents
* The DRL algorithms are from **Stable Baselines 3**. Users are also encouraged to try **ElegantRL** and **Ray RLlib**.
* FinRL includes fine-tuned standard DRL algorithms, such as DQN, DDPG, Multi-Agent DDPG, PPO, SAC, A2C and TD3. We also allow users to design their own DRL algorithms by adapting these DRL algorithms.

In [89]:
agent = DRLAgent(env = env_train)

if_using_a2c = True
if_using_ddpg = True
if_using_ppo = True
if_using_td3 = True
if_using_sac = True


### Agent Training: 5 algorithms (A2C, DDPG, PPO, TD3, SAC)


### Agent 1: A2C


In [90]:
agent = DRLAgent(env = env_train)
model_a2c = agent.get_model("a2c")

if if_using_a2c:
  # set up logger
  tmp_path = RESULTS_DIR + '/a2c'
  new_logger_a2c = configure(tmp_path, ["stdout", "csv", "tensorboard"])
  # Set new logger
  model_a2c.set_logger(new_logger_a2c)


{'n_steps': 5, 'ent_coef': 0.01, 'learning_rate': 0.0007}
Using cpu device




Logging to results/a2c


In [91]:
trained_a2c = agent.train_model(model=model_a2c, 
                             tb_log_name='a2c',
                             total_timesteps=50000) if if_using_a2c else None

--------------------------------------
| time/                 |            |
|    fps                | 107        |
|    iterations         | 100        |
|    time_elapsed       | 4          |
|    total_timesteps    | 500        |
| train/                |            |
|    entropy_loss       | -41.4      |
|    explained_variance | 0.0175     |
|    learning_rate      | 0.0007     |
|    n_updates          | 99         |
|    policy_loss        | -38.9      |
|    reward             | 0.45019418 |
|    std                | 1.01       |
|    value_loss         | 1.45       |
--------------------------------------
--------------------------------------
| time/                 |            |
|    fps                | 110        |
|    iterations         | 200        |
|    time_elapsed       | 9          |
|    total_timesteps    | 1000       |
| train/                |            |
|    entropy_loss       | -41.5      |
|    explained_variance | -0.187     |
|    learning_rate      |

### Agent 2: DDPG

In [92]:
agent = DRLAgent(env = env_train)
model_ddpg = agent.get_model("ddpg")

if if_using_ddpg:
  # set up logger
  tmp_path = RESULTS_DIR + '/ddpg'
  new_logger_ddpg = configure(tmp_path, ["stdout", "csv", "tensorboard"])
  # Set new logger
  model_ddpg.set_logger(new_logger_ddpg)

{'batch_size': 128, 'buffer_size': 50000, 'learning_rate': 0.001}
Using cpu device
Logging to results/ddpg


In [93]:
trained_ddpg = agent.train_model(model=model_ddpg, 
                             tb_log_name='ddpg',
                             total_timesteps=50000) if if_using_ddpg else None

day: 2892, episode: 20
begin_total_asset: 1000000.00
end_total_asset: 3702468.75
total_reward: 2702468.75
total_cost: 5120.89
total_trades: 36610
Sharpe: 0.751
----------------------------------
| time/              |           |
|    episodes        | 4         |
|    fps             | 54        |
|    time_elapsed    | 213       |
|    total_timesteps | 11572     |
| train/             |           |
|    actor_loss      | -46       |
|    critic_loss     | 169       |
|    learning_rate   | 0.001     |
|    n_updates       | 8679      |
|    reward          | 2.2028034 |
----------------------------------
----------------------------------
| time/              |           |
|    episodes        | 8         |
|    fps             | 52        |
|    time_elapsed    | 441       |
|    total_timesteps | 23144     |
| train/             |           |
|    actor_loss      | -31.6     |
|    critic_loss     | 6.29      |
|    learning_rate   | 0.001     |
|    n_updates       | 20251     |


### Agent 3: PPO

In [94]:
agent = DRLAgent(env = env_train)
PPO_PARAMS = {
    "n_steps": 2048,
    "ent_coef": 0.01,
    "learning_rate": 0.00025,
    "batch_size": 128,
}
model_ppo = agent.get_model("ppo",model_kwargs = PPO_PARAMS)

if if_using_ppo:
  # set up logger
  tmp_path = RESULTS_DIR + '/ppo'
  new_logger_ppo = configure(tmp_path, ["stdout", "csv", "tensorboard"])
  # Set new logger
  model_ppo.set_logger(new_logger_ppo)

{'n_steps': 2048, 'ent_coef': 0.01, 'learning_rate': 0.00025, 'batch_size': 128}
Using cpu device
Logging to results/ppo


In [96]:
trained_ppo = agent.train_model(model=model_ppo, 
                             tb_log_name='ppo',
                             total_timesteps=50000) if if_using_ppo else None

-----------------------------------------
| time/                   |             |
|    fps                  | 110         |
|    iterations           | 1           |
|    time_elapsed         | 18          |
|    total_timesteps      | 2048        |
| train/                  |             |
|    approx_kl            | 0.024190363 |
|    clip_fraction        | 0.195       |
|    clip_range           | 0.2         |
|    entropy_loss         | -42.2       |
|    explained_variance   | 0.0116      |
|    learning_rate        | 0.00025     |
|    loss                 | 44.4        |
|    n_updates            | 190         |
|    policy_gradient_loss | -0.0121     |
|    reward               | 0.49708298  |
|    std                  | 1.04        |
|    value_loss           | 58.6        |
-----------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 110         |
|    iterations           | 2     

### Agent 4: TD3

In [97]:
agent = DRLAgent(env = env_train)
TD3_PARAMS = {"batch_size": 100, 
              "buffer_size": 1000000, 
              "learning_rate": 0.001}

model_td3 = agent.get_model("td3",model_kwargs = TD3_PARAMS)

if if_using_td3:
  # set up logger
  tmp_path = RESULTS_DIR + '/td3'
  new_logger_td3 = configure(tmp_path, ["stdout", "csv", "tensorboard"])
  # Set new logger
  model_td3.set_logger(new_logger_td3)

{'batch_size': 100, 'buffer_size': 1000000, 'learning_rate': 0.001}
Using cpu device
Logging to results/td3


In [98]:
trained_td3 = agent.train_model(model=model_td3, 
                             tb_log_name='td3',
                             total_timesteps=50000) if if_using_td3 else None

---------------------------------
| time/              |          |
|    episodes        | 4        |
|    fps             | 57       |
|    time_elapsed    | 199      |
|    total_timesteps | 11572    |
| train/             |          |
|    actor_loss      | 162      |
|    critic_loss     | 1.24e+04 |
|    learning_rate   | 0.001    |
|    n_updates       | 8679     |
|    reward          | 3.151815 |
---------------------------------
---------------------------------
| time/              |          |
|    episodes        | 8        |
|    fps             | 52       |
|    time_elapsed    | 438      |
|    total_timesteps | 23144    |
| train/             |          |
|    actor_loss      | 42.6     |
|    critic_loss     | 3.05e+03 |
|    learning_rate   | 0.001    |
|    n_updates       | 20251    |
|    reward          | 3.151815 |
---------------------------------
day: 2892, episode: 80
begin_total_asset: 1000000.00
end_total_asset: 4548562.25
total_reward: 3548562.25
total_cost

### Agent 5: SAC

In [99]:
agent = DRLAgent(env = env_train)
SAC_PARAMS = {
    "batch_size": 128,
    "buffer_size": 100000,
    "learning_rate": 0.0001,
    "learning_starts": 100,
    "ent_coef": "auto_0.1",
}

model_sac = agent.get_model("sac",model_kwargs = SAC_PARAMS)

if if_using_sac:
  # set up logger
  tmp_path = RESULTS_DIR + '/sac'
  new_logger_sac = configure(tmp_path, ["stdout", "csv", "tensorboard"])
  # Set new logger
  model_sac.set_logger(new_logger_sac)

{'batch_size': 128, 'buffer_size': 100000, 'learning_rate': 0.0001, 'learning_starts': 100, 'ent_coef': 'auto_0.1'}
Using cpu device
Logging to results/sac


In [100]:
trained_sac = agent.train_model(model=model_sac, 
                             tb_log_name='sac',
                             total_timesteps=50000) if if_using_sac else None

day: 2892, episode: 90
begin_total_asset: 1000000.00
end_total_asset: 3228770.27
total_reward: 2228770.27
total_cost: 35476.68
total_trades: 44660
Sharpe: 0.556
----------------------------------
| time/              |           |
|    episodes        | 4         |
|    fps             | 46        |
|    time_elapsed    | 246       |
|    total_timesteps | 11572     |
| train/             |           |
|    actor_loss      | 644       |
|    critic_loss     | 256       |
|    ent_coef        | 0.131     |
|    ent_coef_loss   | -61       |
|    learning_rate   | 0.0001    |
|    n_updates       | 11471     |
|    reward          | 4.6760244 |
----------------------------------
---------------------------------
| time/              |          |
|    episodes        | 8        |
|    fps             | 46       |
|    time_elapsed    | 493      |
|    total_timesteps | 23144    |
| train/             |          |
|    actor_loss      | 251      |
|    critic_loss     | 55.4     |
|    ent

## In-sample Performance

Assume that the initial capital is $1,000,000.

### Set turbulence threshold
Set the turbulence threshold to be greater than the maximum of the in-sample turbulence data. If the current turbulence index exceeds the threshold, we assume that the current market is volatile. The cells below inspect the in-sample distributions of both `vix` and `turbulence`.
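
One way such a threshold can be applied is sketched below, under the assumption that the environment liquidates all holdings and blocks purchases whenever the risk indicator reaches the threshold. The function name and signature are hypothetical and are not FinRL's API.

```python
def apply_risk_threshold(actions, holdings, risk_value, threshold):
    """Override the agent's actions when the market is deemed too volatile.

    actions:  proposed share changes per stock (positive = buy)
    holdings: shares currently held per stock
    Returns the share changes that should actually be executed.
    """
    if risk_value >= threshold:
        # Risk-off: sell every position and buy nothing
        return [-h for h in holdings]
    return list(actions)


calm = apply_risk_threshold([10, -5], [20, 30], risk_value=28.6, threshold=70)
stressed = apply_risk_threshold([10, -5], [20, 30], risk_value=82.7, threshold=70)
print(calm, stressed)
```

On a calm day the agent's proposed trades pass through unchanged; on a stressed day they are replaced by a full liquidation.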

In [101]:
data_risk_indicator = processed_full[(processed_full.date<TRAIN_END_DATE) & (processed_full.date>=TRAIN_START_DATE)]
insample_risk_indicator = data_risk_indicator.drop_duplicates(subset=['date'])

In [102]:
insample_risk_indicator.vix.describe()

count    2893.000000
mean       18.824245
std         8.489311
min         9.140000
25%        13.330000
50%        16.139999
75%        21.309999
max        82.690002
Name: vix, dtype: float64

In [103]:
insample_risk_indicator.vix.quantile(0.996)

57.40400183105453

In [104]:
insample_risk_indicator.turbulence.describe()

count    2893.000000
mean       34.567962
std        43.790810
min         0.000000
25%        14.962540
50%        24.123943
75%        39.162579
max       652.505565
Name: turbulence, dtype: float64

In [105]:
insample_risk_indicator.turbulence.quantile(0.996)

276.4524526553459

### Trading (Out-of-sample Performance)

We update the model periodically in order to take full advantage of the data, e.g., retraining quarterly, monthly, or weekly, and we tune the parameters along the way. In this notebook, we use the in-sample data from 2009-01 to 2020-07 to tune the parameters only once, so some alpha decay is expected as the trading period lengthens.

Numerous hyperparameters – e.g. the learning rate, the total number of samples to train on – influence the learning process and are usually determined by testing some variations.

In [106]:
e_trade_gym = StockTradingEnv(df = trade, turbulence_threshold = 70,risk_indicator_col='vix', **env_kwargs)
env_trade, obs_trade = e_trade_gym.get_sb_env()


In [108]:
trained_model = trained_a2c
df_account_value_a2c, df_actions_a2c = DRLAgent.DRL_prediction(
    model=trained_model,
    environment = e_trade_gym)

hit end!


In [109]:
trained_model = trained_ddpg
df_account_value_ddpg, df_actions_ddpg = DRLAgent.DRL_prediction(
    model=trained_model,
    environment = e_trade_gym)

hit end!


In [110]:
trained_model = trained_ppo
df_account_value_ppo, df_actions_ppo = DRLAgent.DRL_prediction(
    model=trained_model,
    environment = e_trade_gym)

hit end!


In [111]:
trained_model = trained_td3
df_account_value_td3, df_actions_td3 = DRLAgent.DRL_prediction(
    model=trained_model,
    environment = e_trade_gym)

hit end!


In [112]:
trained_model = trained_sac
df_account_value_sac, df_actions_sac = DRLAgent.DRL_prediction(
    model=trained_model,
    environment = e_trade_gym)

hit end!


In [113]:
df_account_value_a2c.shape

(658, 2)

<a id='7'></a>
# Part 6.5. Mean-Variance Optimization

In [114]:
mvo_df.head()

Unnamed: 0,date,tic,close
0,2009-01-02,AAPL,2.758535
1,2009-01-02,AMGN,43.832634
2,2009-01-02,AXP,15.365306
3,2009-01-02,BA,33.94109
4,2009-01-02,CAT,31.579327


In [115]:
mvo_df

Unnamed: 0,date,tic,close
0,2009-01-02,AAPL,2.758535
1,2009-01-02,AMGN,43.832634
2,2009-01-02,AXP,15.365306
3,2009-01-02,BA,33.941090
4,2009-01-02,CAT,31.579327
...,...,...,...
102974,2023-02-09,UNH,485.730011
102975,2023-02-09,V,229.350006
102976,2023-02-09,VZ,39.810001
102977,2023-02-09,WBA,35.342278


In [116]:
fst = mvo_df
fst = fst.iloc[0*29:0*29+29, :]
tic = fst['tic'].tolist()

mvo = pd.DataFrame()

for k in range(len(tic)):
  mvo[tic[k]] = 0

for i in range(mvo_df.shape[0]//29):
  n = mvo_df
  n = n.iloc[i*29:i*29+29, :]
  date = n['date'][i*29]
  mvo.loc[date] = n['close'].tolist()
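
The loop above assumes that `mvo_df` is sorted by date and ticker and that every date has exactly 29 rows. Under those assumptions, a more compact way to build the same date-by-ticker price table is `DataFrame.pivot`, sketched here on a toy frame with the same column names (the values are made up for illustration):

```python
import pandas as pd

long_df = pd.DataFrame({
    "date": ["2009-01-02", "2009-01-02", "2009-01-05", "2009-01-05"],
    "tic": ["AAPL", "AMGN", "AAPL", "AMGN"],
    "close": [2.76, 43.83, 2.87, 44.32],
})

# One row per date, one column per ticker, close prices as values
wide = long_df.pivot(index="date", columns="tic", values="close")
print(wide)
```

Unlike the positional loop, `pivot` matches rows by their `date` and `tic` values, so it does not depend on row order and raises an error if a (date, ticker) pair appears twice.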

In [117]:
mvo.shape[0]

3551

In [118]:
mvo

Unnamed: 0,AAPL,AMGN,AXP,BA,CAT,CRM,CSCO,CVX,DIS,GS,...,MRK,MSFT,NKE,PG,TRV,UNH,V,VZ,WBA,WMT
2009-01-02,2.758535,43.832634,15.365306,33.941090,31.579327,8.505000,11.948338,43.677189,20.597494,69.747612,...,17.947050,15.162146,11.144301,41.015411,32.030632,22.703100,12.079287,16.240593,17.517094,41.618896
2009-01-05,2.874957,44.323048,15.858130,34.631161,31.020584,8.337500,12.054011,43.757122,20.235834,71.371521,...,17.674953,15.303852,11.224109,40.721523,31.555828,22.332808,12.165179,15.227900,18.401514,41.138504
2009-01-06,2.827537,43.349655,16.748425,34.736183,30.832092,8.650000,12.533072,44.150948,20.933327,71.315254,...,17.350746,15.482843,10.997277,40.603954,30.592075,21.806171,13.021853,14.984101,18.312380,40.774574
2009-01-07,2.766439,43.245621,16.042887,33.573555,29.398203,8.000000,12.201953,42.215961,19.960279,67.930779,...,17.072865,14.550591,10.598215,39.892052,29.380297,21.641596,12.739308,15.174374,18.531778,40.425209
2009-01-08,2.817809,44.033253,16.066935,33.596058,29.633816,8.227500,12.356951,42.375786,19.719173,68.662315,...,16.997597,15.005531,10.793545,39.454475,29.925962,21.978970,12.603684,15.407464,18.401514,37.397320
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2023-02-03,154.264465,243.026794,178.860001,206.009995,247.759995,171.039993,48.630001,167.965149,110.709999,369.950012,...,102.940002,257.704529,127.610001,142.610001,182.759995,472.019989,229.680145,41.509998,36.605560,141.710007
2023-02-06,151.498688,241.718353,176.479996,206.809998,251.419998,169.050003,47.570000,168.153488,109.870003,370.799988,...,104.029999,256.128448,125.730003,141.399994,185.990005,475.239990,228.991501,41.279999,35.806137,140.679993
2023-02-07,154.414230,241.867035,178.699997,214.759995,249.660004,171.279999,47.840000,172.564484,111.629997,374.399994,...,105.680000,266.891510,125.330002,140.020004,189.009995,476.880005,230.867828,40.549999,36.260132,140.979996
2023-02-08,151.688400,238.100250,179.000000,213.500000,248.869995,169.630005,46.959999,168.510330,111.779999,375.100006,...,106.639999,266.063599,122.910004,138.570007,187.389999,483.220001,229.750000,40.520000,36.082481,140.220001


### Helper functions

### Calculate mean returns and variance-covariance matrix

In [119]:
# Obtain optimal portfolio sets that maximize return and minimize risk

# Dependencies
import numpy as np
import pandas as pd

portfolioSize = 29                    # number of stocks in the portfolio
Rows = len(mvo_df) // portfolioSize   # number of trading days in mvo_df

# `trade` holds one row per (date, ticker) pair, so count distinct dates
# to get the number of trading days in the out-of-sample period
n_trade_days = trade['date'].nunique()

# Split the price matrix by day: in-sample rows for estimating the
# statistics, the remaining rows for the trading period
StockData = mvo.head(mvo.shape[0] - n_trade_days)
TradeData = mvo.tail(n_trade_days)
TradeData.to_numpy()


array([[ 89.495, 234.614,  91.078, ...,  47.767,  36.29 , 115.097],
       [ 89.495, 237.484,  91.35 , ...,  47.872,  37.267, 114.636],
       [ 91.889, 235.654,  93.529, ...,  48.265,  38.314, 114.328],
       ...,
       [154.414, 241.867, 178.7  , ...,  40.55 ,  36.26 , 140.98 ],
       [151.688, 238.1  , 179.   , ...,  40.52 ,  36.082, 140.22 ],
       [150.64 , 237.902, 179.37 , ...,  39.81 ,  35.342, 141.52 ]])

In [120]:
#compute asset returns
arStockPrices = np.asarray(StockData)
[Rows, Cols]=arStockPrices.shape
arReturns = helper.StockReturnsComputing(arStockPrices, Rows, Cols)


#compute mean returns and variance covariance matrix of returns
meanReturns = np.mean(arReturns, axis = 0)
covReturns = np.cov(arReturns, rowvar=False)
 
#set precision for printing results
np.set_printoptions(precision=3, suppress = True)

#display mean returns and variance-covariance matrix of returns
print('Mean returns of assets in k-portfolio 1\n', meanReturns)
print('Variance-Covariance matrix of returns\n', covReturns)


Mean returns of assets in k-portfolio 1
 [0.136 0.068 0.086 0.083 0.066 0.134 0.06  0.035 0.072 0.056 0.103 0.073
 0.033 0.076 0.047 0.073 0.042 0.056 0.054 0.056 0.103 0.089 0.041 0.053
 0.104 0.11  0.044 0.042 0.042]
Variance-Covariance matrix of returns
 [[3.156 1.066 1.768 1.669 1.722 1.814 1.569 1.302 1.302 1.811 1.303 1.432
  1.218 1.674 0.74  1.839 0.719 0.884 1.241 0.823 1.561 1.324 0.752 1.027
  1.298 1.466 0.657 1.078 0.631]
 [1.066 2.571 1.306 1.123 1.193 1.319 1.116 1.053 1.045 1.269 1.068 1.089
  0.899 1.218 0.926 1.391 0.682 0.727 1.025 1.156 1.166 0.984 0.798 0.956
  1.259 1.111 0.688 1.091 0.682]
 [1.768 1.306 4.847 2.73  2.6   2.128 1.944 2.141 2.17  3.142 1.932 2.283
  1.56  2.012 0.993 3.707 1.094 1.319 1.845 1.236 1.899 1.894 1.041 1.921
  1.823 2.314 0.986 1.421 0.707]
 [1.669 1.123 2.73  4.892 2.363 1.979 1.7   2.115 1.959 2.387 1.773 2.319
  1.571 1.797 0.968 2.597 1.144 1.298 1.643 1.071 1.615 1.775 0.91  1.666
  1.707 1.784 0.82  1.345 0.647]
 [1.722 1.193 2.6 
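
`helper.StockReturnsComputing` is not shown in this section. A minimal sketch, assuming it computes simple day-over-day returns in percent (the function name `simple_pct_returns` and the toy prices are illustrative, not part of the notebook):

```python
import numpy as np

def simple_pct_returns(prices):
    """Day-over-day simple returns in percent.

    prices: array of shape (days, assets).
    Returns an array of shape (days - 1, assets).
    """
    prices = np.asarray(prices, dtype=float)
    return (prices[1:] - prices[:-1]) / prices[:-1] * 100.0

# Toy prices for two hypothetical assets over three days
toy_prices = np.array([[100.0, 50.0],
                       [110.0, 45.0],
                       [121.0, 49.5]])
toy_returns = simple_pct_returns(toy_prices)

# Same statistics as in the cell above: per-asset mean and covariance matrix
toy_mean = np.mean(toy_returns, axis=0)
toy_cov = np.cov(toy_returns, rowvar=False)
```

With `rowvar=False`, each column is treated as one asset, so the covariance matrix is square with one row and column per asset.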

In [121]:
from module.efficient_frontier import EfficientFrontier

ef_mean = EfficientFrontier(meanReturns, covReturns, weight_bounds=(0, 0.5))
raw_weights_mean = ef_mean.max_sharpe()
cleaned_weights_mean = ef_mean.clean_weights()
mvo_weights = np.array([1000000 * cleaned_weights_mean[i] for i in range(29)])
mvo_weights

array([424250.,      0.,      0.,      0.,      0., 108650.,      0.,
            0.,      0.,      0., 181450.,      0.,      0.,      0.,
            0.,      0.,      0.,      0.,      0.,      0.,  16960.,
            0.,      0.,      0., 133540., 135150.,      0.,      0.,
            0.])
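
`max_sharpe()` presumably searches, within the weight bounds, for the weights with the highest Sharpe ratio. The quantity being maximized can be sketched as below, assuming a risk-free rate of zero (the rate used by `module.efficient_frontier` is not shown here; `portfolio_sharpe` and the toy inputs are hypothetical):

```python
import numpy as np

def portfolio_sharpe(weights, mean_returns, cov_returns, risk_free=0.0):
    """Sharpe ratio of a portfolio: excess expected return over volatility."""
    weights = np.asarray(weights, dtype=float)
    mean_returns = np.asarray(mean_returns, dtype=float)
    cov_returns = np.asarray(cov_returns, dtype=float)
    expected = weights @ mean_returns                      # w' mu
    volatility = np.sqrt(weights @ cov_returns @ weights)  # sqrt(w' Sigma w)
    return (expected - risk_free) / volatility

# Toy inputs: two uncorrelated hypothetical assets
toy_mean = np.array([0.10, 0.20])
toy_cov = np.array([[0.04, 0.00],
                    [0.00, 0.09]])

equal_weight = portfolio_sharpe([0.5, 0.5], toy_mean, toy_cov)
concentrated = portfolio_sharpe([1.0, 0.0], toy_mean, toy_cov)
```

In this toy case the diversified portfolio has the higher ratio, which is the kind of trade-off the optimizer exploits when it spreads capital over several stocks.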

In [122]:
StockData.tail(1)

Unnamed: 0,AAPL,AMGN,AXP,BA,CAT,CRM,CSCO,CVX,DIS,GS,...,MRK,MSFT,NKE,PG,TRV,UNH,V,VZ,WBA,WMT
2020-06-30,89.664169,216.90239,91.775711,183.300003,118.86924,187.330002,42.848248,78.808449,111.510002,187.060165,...,67.950195,198.444733,95.878021,111.771385,107.692001,284.97879,189.604019,48.16901,37.630482,115.184006


In [123]:
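# Reciprocal of each stock's price on the last training day (shares per dollar)
LastPrice = np.array([1/p for p in StockData.tail(1).to_numpy()[0]])
# Shares bought per stock = dollars allocated * (1 / price)
Initial_Portfolio = np.multiply(mvo_weights, LastPrice)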
Initial_Portfolio

array([4731.544,    0.   ,    0.   ,    0.   ,    0.   ,  579.993,
          0.   ,    0.   ,    0.   ,    0.   ,  766.211,    0.   ,
          0.   ,    0.   ,    0.   ,    0.   ,    0.   ,    0.   ,
          0.   ,    0.   ,   85.465,    0.   ,    0.   ,    0.   ,
        468.596,  712.801,    0.   ,    0.   ,    0.   ])

In [124]:
Portfolio_Assets = TradeData @ Initial_Portfolio
MVO_result = pd.DataFrame(Portfolio_Assets, columns=["Mean Var"])
MVO_result

Unnamed: 0,Mean Var
2020-07-01,1.001917e+06
2020-07-02,1.004234e+06
2020-07-06,1.023225e+06
2020-07-07,1.014021e+06
2020-07-08,1.029461e+06
...,...
2023-02-03,1.490038e+06
2023-02-06,1.474972e+06
2023-02-07,1.489967e+06
2023-02-08,1.474837e+06
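
The two cells above convert the dollar allocation into share counts at the last training-day price, then value that fixed basket of shares on every trading day. A small sketch of the same arithmetic on toy numbers (all names and prices here are made up for illustration):

```python
import numpy as np

# Dollars allocated to two hypothetical stocks
dollars = np.array([600.0, 400.0])
# Their prices on the last training day
last_train_price = np.array([20.0, 50.0])

# Shares held = dollars / price (fractional shares allowed)
shares = dollars / last_train_price

# Prices over two trading days, one row per day
trade_prices = np.array([[20.0, 50.0],
                         [22.0, 45.0]])

# Buy-and-hold value per day: each day's price row times the fixed share counts
portfolio_value = trade_prices @ shares
```

Because the share counts never change, this is a buy-and-hold baseline: the weights are chosen once from the training data and are not rebalanced during the trading period.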


<a id='6'></a>
# Part 7: Backtesting Results
Backtesting plays a key role in evaluating the performance of a trading strategy. An automated backtesting tool is preferred because it reduces human error. We usually use the Quantopian pyfolio package to backtest our trading strategies. It is easy to use and produces a set of individual plots that together give a comprehensive picture of a strategy's performance.

In [125]:
df_result_a2c = df_account_value_a2c.set_index(df_account_value_a2c.columns[0])
df_result_ddpg = df_account_value_ddpg.set_index(df_account_value_ddpg.columns[0])
df_result_td3 = df_account_value_td3.set_index(df_account_value_td3.columns[0])
df_result_ppo = df_account_value_ppo.set_index(df_account_value_ppo.columns[0])
df_result_sac = df_account_value_sac.set_index(df_account_value_sac.columns[0])

result = pd.merge(df_result_a2c, df_result_ddpg, left_index=True, right_index=True)
result = pd.merge(result, df_result_td3, left_index=True, right_index=True)
result = pd.merge(result, df_result_ppo, left_index=True, right_index=True)
result = pd.merge(result, df_result_sac, left_index=True, right_index=True)
result = pd.merge(result, MVO_result, left_index=True, right_index=True)
result.columns = ['a2c', 'ddpg', 'td3', 'ppo', 'sac', 'mean var']
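
The chain of `pd.merge` calls above performs an inner join on the date index, so only dates present in every frame survive. One possible compact alternative, sketched on toy frames, is `pd.concat` with `join="inner"` (the helper `combine_account_values` is hypothetical and assumes each frame has a single value column):

```python
import pandas as pd

def combine_account_values(frames):
    """Inner-join the first column of each frame on the shared index.

    frames: dict mapping a column label to a DataFrame indexed by date.
    """
    series = {name: frame.iloc[:, 0] for name, frame in frames.items()}
    # Dict keys become the column names of the combined frame
    return pd.concat(series, axis=1, join="inner")

# Toy frames whose date ranges only partly overlap
toy_a2c = pd.DataFrame({"account_value": [100.0, 101.0, 103.0]},
                       index=["2020-07-01", "2020-07-02", "2020-07-06"])
toy_mvo = pd.DataFrame({"Mean Var": [100.5, 102.0, 104.0]},
                       index=["2020-07-02", "2020-07-06", "2020-07-07"])

combined = combine_account_values({"a2c": toy_a2c, "mean var": toy_mvo})
```

Naming the columns through the dict keys also removes the need to assign `result.columns` by hand afterwards.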

In [126]:
plt.rcParams["figure.figsize"] = (15,5)
plt.figure()
result.plot()
plt.savefig("trained_models/models_" + ".jpg")

In [127]:
result.tail(1)

date,a2c,ddpg,td3,ppo,sac,mean var
2023-02-09,1383442.0,1251458.0,1328701.0,1417191.0,1493857.0,1468825.0


In [222]:
MVO_result

date,account_value
2020-07-01,1.001917e+06
2020-07-02,1.004234e+06
2020-07-06,1.023225e+06
2020-07-07,1.014021e+06
2020-07-08,1.029461e+06
...,...
2023-02-03,1.490038e+06
2023-02-06,1.474972e+06
2023-02-07,1.489967e+06
2023-02-08,1.474837e+06


In [None]:
MVO_result['account_value'] = MVO_result['Mean Var']
MVO_result = MVO_result.drop("Mean Var", axis=1)
MVO_result.index.name='date'
MVO_result['date'] = MVO_result.index

In [224]:

now = datetime.datetime.now().strftime('%Y%m%d-%Hh%M')

perf_stats_sac = helper.backtest_stats(account_value = df_account_value_sac)
perf_stats_sac = pd.DataFrame(perf_stats_sac)

perf_stats_ddpg = helper.backtest_stats(account_value = df_account_value_ddpg)
perf_stats_ddpg = pd.DataFrame(perf_stats_ddpg)

perf_stats_ppo = helper.backtest_stats(account_value = df_account_value_ppo)
perf_stats_ppo = pd.DataFrame(perf_stats_ppo)

perf_stats_td3 = helper.backtest_stats(account_value = df_account_value_td3)
perf_stats_td3 = pd.DataFrame(perf_stats_td3)

perf_stats_a2c = helper.backtest_stats(account_value = df_account_value_a2c)
perf_stats_a2c = pd.DataFrame(perf_stats_a2c)

perf_stats_mvo = helper.backtest_stats(account_value = MVO_result)
perf_stats_mvo = pd.DataFrame(perf_stats_mvo)


Annual return          0.166156
Cumulative returns     0.493857
Annual volatility      0.185149
Sharpe ratio           0.924061
Calmar ratio           0.721552
Stability              0.348040
Max drawdown          -0.230276
Omega ratio            1.165591
Sortino ratio          1.373467
Skew                        NaN
Kurtosis                    NaN
Tail ratio             1.034576
Daily value at risk   -0.022648
dtype: float64
Annual return          0.089703
Cumulative returns     0.251458
Annual volatility      0.170235
Sharpe ratio           0.590576
Calmar ratio           0.390584
Stability              0.112480
Max drawdown          -0.229665
Omega ratio            1.104479
Sortino ratio          0.839012
Skew                        NaN
Kurtosis                    NaN
Tail ratio             0.940607
Daily value at risk   -0.021049
dtype: float64
Annual return          0.142862
Cumulative returns     0.417191
Annual volatility      0.213350
Sharpe ratio           0.733649
Calmar rat

In [159]:
baseline_df = YahooDownloader(
        ticker_list =["^DJI"], 
        start_date = TRADE_START_DATE,
        end_date = TRADE_END_DATE).fetch_data()

stats = helper.backtest_stats(baseline_df, value_col_name = 'close')

[*********************100%***********************]  1 of 1 completed
Shape of DataFrame:  (658, 8)
Annual return          0.108788
Cumulative returns     0.309497
Annual volatility      0.164130
Sharpe ratio           0.712314
Calmar ratio           0.495826
Stability              0.243378
Max drawdown          -0.219408
Omega ratio            1.127036
Sortino ratio          1.016757
Skew                        NaN
Kurtosis                    NaN
Tail ratio             0.968538
Daily value at risk   -0.020214
dtype: float64
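
`helper.backtest_stats` is defined elsewhere, but the first few statistics it prints follow standard definitions. A rough sketch, assuming 252 trading days per year, a zero risk-free rate, and the sample standard deviation (the helper may differ in details; `basic_stats` is a hypothetical name):

```python
import numpy as np

def basic_stats(account_values, periods_per_year=252):
    """Common backtest statistics from a series of daily account values."""
    values = np.asarray(account_values, dtype=float)
    daily = values[1:] / values[:-1] - 1.0          # daily simple returns
    cumulative = values[-1] / values[0] - 1.0       # total return over the period
    # Geometric annualization of the total return
    annual_return = (1.0 + cumulative) ** (periods_per_year / len(daily)) - 1.0
    daily_std = daily.std(ddof=1)
    annual_vol = daily_std * np.sqrt(periods_per_year)
    sharpe = daily.mean() / daily_std * np.sqrt(periods_per_year)
    # Largest peak-to-trough loss relative to the running maximum
    running_peak = np.maximum.accumulate(values)
    max_drawdown = (values / running_peak - 1.0).min()
    return {
        "Annual return": annual_return,
        "Cumulative returns": cumulative,
        "Annual volatility": annual_vol,
        "Sharpe ratio": sharpe,
        "Max drawdown": max_drawdown,
    }

# Toy account values: up 10%, down 10%, up 10%
toy_stats = basic_stats([100.0, 110.0, 99.0, 108.9])
```

On the toy series the cumulative return is 8.9% and the maximum drawdown is -10%, the drop from the peak of 110 to 99.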


In [238]:

Annual_return_sac = perf_stats_sac.T['Annual return'][0]
Annual_return_ppo = perf_stats_ppo.T['Annual return'][0]
Annual_return_td3 = perf_stats_td3.T['Annual return'][0]
Annual_return_ddpg = perf_stats_ddpg.T['Annual return'][0]
Annual_return_a2c = perf_stats_a2c.T['Annual return'][0]
Annual_return_mvo = perf_stats_mvo.T['Annual return'][0]

In [285]:
def get_best_model():

    Models = []
    if Annual_return_sac >= Annual_return_mvo:
        Models.append(trained_sac)
        
    elif Annual_return_a2c >= Annual_return_mvo:
        Models.append(trained_a2c)

    elif Annual_return_ddpg >= Annual_return_mvo:
        Models.append(trained_ddpg)

    elif Annual_return_ppo >= Annual_return_mvo:
        Models.append(trained_ppo)

    elif Annual_return_td3 >= Annual_return_mvo:
        Models.append(trained_td3)

    return Models
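
As written, the `elif` chain stops at the first agent (in the fixed order SAC, A2C, DDPG, PPO, TD3) whose annual return matches or exceeds the mean-variance baseline, so it returns at most one model and not necessarily the strongest one. If the intent is to pick the top performer among those that beat the baseline, one possible sketch is below (the function `pick_best` and the toy returns are illustrative assumptions):

```python
def pick_best(annual_returns, benchmark_return):
    """Name of the highest-return entry that is at least the benchmark, else None.

    annual_returns: dict mapping a model name to its annual return.
    """
    candidates = {
        name: value
        for name, value in annual_returns.items()
        if value >= benchmark_return
    }
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

# Hypothetical annual returns, not taken from the results above
toy_returns = {"sac": 0.12, "a2c": 0.15, "ddpg": 0.05}

best = pick_best(toy_returns, benchmark_return=0.10)
none_beat = pick_best(toy_returns, benchmark_return=0.20)
```

Returning a name instead of the model object keeps the selection logic independent of the trained agents; the caller can map the name back to `trained_sac`, `trained_a2c`, and so on.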

In [315]:
YESTERDAY = datetime.datetime.today() - datetime.timedelta(days=1)
# Number of days from 2009-01-01 to yesterday (%m is month; %M would parse minutes)
DELTA = -(datetime.datetime.strptime('2009-01-01', '%Y-%m-%d') - YESTERDAY).days

In [321]:
YESTERDAY - datetime.timedelta(days=DELTA*0.2)

datetime.datetime(2020, 4, 26, 8, 21, 46, 308468)
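
The two cells above measure the number of days from 2009-01-01 to yesterday and then step back 20% of that span, which looks like a way to place a train/trade boundary so that the last fifth of the history is held out. A sketch of that calculation with fixed dates, so the result is reproducible (`holdout_start` is a hypothetical helper and the 20% split is read off the cell above):

```python
import datetime

def holdout_start(start, end, holdout_fraction=0.2):
    """Date at which the final `holdout_fraction` of [start, end] begins."""
    total_days = (end - start).days
    return end - datetime.timedelta(days=total_days * holdout_fraction)

# Fixed toy dates: a 10-day span with the last 20% (2 days) held out
start = datetime.datetime(2020, 1, 1)
end = datetime.datetime(2020, 1, 11)
boundary = holdout_start(start, end)
```

Using fixed dates instead of `datetime.today()` makes the boundary stable across runs, which matters when results from different days need to be compared.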