<a href="https://colab.research.google.com/github/AI4Finance-Foundation/FinRL/blob/master/FinRL_StockTrading_NeurIPS_2018.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Deep Reinforcement Learning for Stock Trading from Scratch: Multiple Stock Trading

* **Pytorch Version** 



# Content

* [1. Problem Definition](#0)
* [2. Getting Started - Load Python packages](#1)
    * [2.1. Install Packages](#1.1)    
    * [2.2. Check Additional Packages](#1.2)
    * [2.3. Import Packages](#1.3)
    * [2.4. Create Folders](#1.4)
* [3. Download Data](#2)
* [4. Preprocess Data](#3)        
    * [4.1. Technical Indicators](#3.1)
    * [4.2. Perform Feature Engineering](#3.2)
* [5.Build Environment](#4)  
    * [5.1. Training & Trade Data Split](#4.1)
    * [5.2. User-defined Environment](#4.2)   
    * [5.3. Initialize Environment](#4.3)    
* [6.Implement DRL Algorithms](#5)  
* [7.Backtesting Performance](#6)  
    * [7.1. BackTestStats](#6.1)
    * [7.2. BackTestPlot](#6.2)   
    * [7.3. Baseline Stats](#6.3)   
    * [7.3. Compare to Stock Market Index](#6.4)   
* [RLlib Section](#7)            

<a id='0'></a>
# Part 1. Problem Definition

This problem is to design an automated trading solution for single stock trading. We model the stock trading process as a Markov Decision Process (MDP). We then formulate our trading goal as a maximization problem.

The algorithm is trained using Deep Reinforcement Learning (DRL) algorithms and the components of the reinforcement learning environment are:


* Action: The action space describes the allowed actions that the agent interacts with the
environment. Normally, a ∈ A includes three actions: a ∈ {−1, 0, 1}, where −1, 0, 1 represent
selling, holding, and buying one stock. Also, an action can be carried upon multiple shares. We use
an action space {−k, ..., −1, 0, 1, ..., k}, where k denotes the number of shares. For example, "Buy
10 shares of AAPL" or "Sell 10 shares of AAPL" are 10 or −10, respectively

* Reward function: r(s, a, s′) is the incentive mechanism for an agent to learn a better action. The change of the portfolio value when action a is taken at state s and arriving at new state s',  i.e., r(s, a, s′) = v′ − v, where v′ and v represent the portfolio
values at state s′ and s, respectively

* State: The state space describes the observations that the agent receives from the environment. Just as a human trader needs to analyze various information before executing a trade, so
our trading agent observes many different features to better learn in an interactive environment.

* Environment: Dow 30 consituents


The data of the single stock that we will be using for this case study is obtained from Yahoo Finance API. The data contains Open-High-Low-Close price and volume.


<a id='1'></a>
# Part 2. Getting Started- Load Python Packages

<a id='1.1'></a>
## 2.1. Install all the packages through FinRL library


In [1]:
from finrl import config
from finrl import config_tickers
import os
if not os.path.exists("./" + config.DATA_SAVE_DIR):
    os.makedirs("./" + config.DATA_SAVE_DIR)
if not os.path.exists("./" + config.TRAINED_MODEL_DIR):
    os.makedirs("./" + config.TRAINED_MODEL_DIR)
if not os.path.exists("./" + config.TENSORBOARD_LOG_DIR):
    os.makedirs("./" + config.TENSORBOARD_LOG_DIR)
if not os.path.exists("./" + config.RESULTS_DIR):
    os.makedirs("./" + config.RESULTS_DIR)


<a id='1.2'></a>
## 2.2. Check if the additional packages needed are present, if not install them. 
* Yahoo Finance API
* pandas
* numpy
* matplotlib
* stockstats
* OpenAI gym
* stable-baselines
* tensorflow
* pyfolio

<a id='1.3'></a>
## 2.3. Import Packages

In [208]:
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
# matplotlib.use('Agg')
import datetime

%matplotlib inline
from finrl.finrl_meta.preprocessor.yahoodownloader import YahooDownloader
from finrl.finrl_meta.preprocessor.preprocessors import FeatureEngineer, data_split
from finrl.finrl_meta.env_stock_trading.env_stocktrading import StockTradingEnv
from finrl.agents.stablebaselines3.models import DRLAgent
from finrl.finrl_meta.data_processor import DataProcessor

from finrl.plot import backtest_stats, backtest_plot, get_daily_return, get_baseline
from pprint import pprint

import sys
sys.path.append("../FinRL-Library")

import itertools

import warnings
warnings.filterwarnings('ignore')

import yfinance as yf

<a id='1.4'></a>
## 2.4. Create Folders

<a id='2'></a>
# Part 3. Download Data
Yahoo Finance is a website that provides stock data, financial news, financial reports, etc. All the data provided by Yahoo Finance is free.
* FinRL uses a class **YahooDownloader** to fetch data from Yahoo Finance API
* Call Limit: Using the Public API (without authentication), you are limited to 2,000 requests per hour per IP (or up to a total of 48,000 requests a day).




-----
class YahooDownloader:
    Provides methods for retrieving daily stock data from
    Yahoo Finance API

    Attributes
    ----------
        start_date : str
            start date of the data (modified from config.py)
        end_date : str
            end date of the data (modified from config.py)
        ticker_list : list
            a list of stock tickers (modified from config.py)

    Methods
    -------
    fetch_data()
        Fetches data from yahoo API


In [363]:
# from config.py TRAIN_START_DATE is a string
config.TRAIN_START_DATE

'2014-01-01'

In [364]:
# from config.py TRAIN_END_DATE is a string
config.TRAIN_END_DATE

'2020-07-31'

In [232]:
CRYPTO_TICKER = ['BTC-USD', 'ETH-USD', 'USDT-USD', 'BNB-USD']
#CRYPTO_TICKER = ['ETH-USD', 'BTC-USD']
GLOBAL_TICKER = ['^STOXX50E', '^GSPC']

TOTAL_TICKER = CRYPTO_TICKER + GLOBAL_TICKER

In [349]:
df = YahooDownloader(start_date = '2017-11-09',
                     end_date = '2022-04-30',
                     ticker_list = CRYPTO_TICKER).fetch_data()

# We start gather data on the 2017-11-09 since it is the first day where we have data on eth on yfinance

[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
Shape of DataFrame:  (6533, 8)


In [350]:
print(CRYPTO_TICKER)

['BTC-USD', 'ETH-USD', 'USDT-USD', 'BNB-USD']


In [351]:
df.shape

(6533, 8)

In [352]:
df.sort_values(['date','tic'],ignore_index=True).head()

Unnamed: 0,date,open,high,low,close,volume,tic,day
0,2017-11-08,7141.379883,7776.419922,7114.02002,7459.689941,4602200064,BTC-USD,2
1,2017-11-09,2.05314,2.17423,1.89394,1.99077,19192200,BNB-USD,3
2,2017-11-09,7446.830078,7446.830078,7101.52002,7143.580078,3226249984,BTC-USD,3
3,2017-11-09,308.644989,329.451996,307.056,320.884003,893249984,ETH-USD,3
4,2017-11-09,1.01087,1.01327,0.996515,1.00818,358188000,USDT-USD,3


In [353]:
df[df['tic'] == 'ETH-USD'].sort_values(['date','tic'],ignore_index=True).head()

Unnamed: 0,date,open,high,low,close,volume,tic,day
0,2017-11-09,308.644989,329.451996,307.056,320.884003,893249984,ETH-USD,3
1,2017-11-10,320.67099,324.717987,294.541992,299.252991,885985984,ETH-USD,4
2,2017-11-11,298.585999,319.453003,298.191986,314.681,842300992,ETH-USD,5
3,2017-11-12,314.690002,319.153015,298.513,307.90799,1613479936,ETH-USD,6
4,2017-11-13,307.024994,328.415009,307.024994,316.716003,1041889984,ETH-USD,0


In [354]:
df[df['tic'] == 'BTC-USD'].sort_values(['date','tic'],ignore_index=True).head()

Unnamed: 0,date,open,high,low,close,volume,tic,day
0,2017-11-08,7141.379883,7776.419922,7114.02002,7459.689941,4602200064,BTC-USD,2
1,2017-11-09,7446.830078,7446.830078,7101.52002,7143.580078,3226249984,BTC-USD,3
2,2017-11-10,7173.72998,7312.0,6436.870117,6618.140137,5208249856,BTC-USD,4
3,2017-11-11,6618.609863,6873.149902,6204.220215,6357.600098,4908680192,BTC-USD,5
4,2017-11-12,6295.450195,6625.049805,5519.009766,5950.069824,8957349888,BTC-USD,6


In [355]:
df['tic'].iloc[0]

'BTC-USD'

In [356]:
def get_ticker_indicator_df(df, ticker_to_trade, ticker_indicator, 
                            stock_indicators = ['open', 'high', 'low', 'close', 'volume']):
    
    df_indicator = df[df['tic'] == ticker_indicator].sort_values(['date','tic'],ignore_index=True)
    for indicator in stock_indicators:
        df_indicator = df_indicator.rename(columns = {indicator : f'{indicator}_{df_indicator.tic.iloc[0]}'})
    df_indicator = df_indicator.drop(['tic', 'day'], axis = 1)
    
    df = df[df['tic'] == ticker_to_trade]
    df = pd.merge(df, df_indicator, on = 'date')
    return df

In [357]:
def add_ticker_indicators_df(df, ticker_to_trade, tickers_indicators,
                            stock_indicators = ['open', 'high', 'low', 'close', 'volume']):
    
    dfs_to_merge = []
    for ticker_indicator in tickers_indicators:
        if(ticker_indicator != ticker_to_trade):
            dfs_to_merge.append(get_ticker_indicator_df(df, ticker_to_trade, ticker_indicator))

    df_indicators = dfs_to_merge[0]
    for counter in range(1, len(dfs_to_merge)):
        df_indicators = pd.merge(df_indicators, dfs_to_merge[counter], on = ['date', 'open', 'high', 'low', 'close', 'volume', 'tic', 'day'])
    return df_indicators


In [358]:
df_indicators = add_ticker_indicators_df(df, 'ETH-USD', CRYPTO_TICKER)

In [360]:
df_indicators

Unnamed: 0,date,open,high,low,close,volume,tic,day,open_BTC-USD,high_BTC-USD,...,open_USDT-USD,high_USDT-USD,low_USDT-USD,close_USDT-USD,volume_USDT-USD,open_BNB-USD,high_BNB-USD,low_BNB-USD,close_BNB-USD,volume_BNB-USD
0,2017-11-09,308.644989,329.451996,307.056000,320.884003,893249984,ETH-USD,3,7446.830078,7446.830078,...,1.010870,1.013270,0.996515,1.008180,358188000,2.053140,2.174230,1.893940,1.990770,19192200
1,2017-11-10,320.670990,324.717987,294.541992,299.252991,885985984,ETH-USD,4,7173.729980,7312.000000,...,1.006500,1.024230,0.995486,1.006010,756446016,2.007730,2.069470,1.644780,1.796840,11155000
2,2017-11-11,298.585999,319.453003,298.191986,314.681000,842300992,ETH-USD,5,6618.609863,6873.149902,...,1.005980,1.026210,0.995799,1.008990,746227968,1.786280,1.917750,1.614290,1.670470,8178150
3,2017-11-12,314.690002,319.153015,298.513000,307.907990,1613479936,ETH-USD,6,6295.450195,6625.049805,...,1.006020,1.105910,0.967601,1.012470,1466060032,1.668890,1.672800,1.462560,1.519690,15298700
4,2017-11-13,307.024994,328.415009,307.024994,316.716003,1041889984,ETH-USD,0,5938.250000,6811.189941,...,1.004480,1.029290,0.975103,1.009350,767884032,1.526010,1.735020,1.517600,1.686620,12238800
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1628,2022-04-25,2922.990234,3018.415527,2804.507080,3009.393555,22332690614,ETH-USD,0,39472.605469,40491.753906,...,1.000346,1.000393,1.000103,1.000145,73256647805,399.129913,404.873901,383.190674,404.350281,1910707893
1629,2022-04-26,3008.946289,3026.415039,2786.253174,2808.298340,19052045399,ETH-USD,1,40448.421875,40713.890625,...,1.000131,1.000265,1.000040,1.000073,69068297656,404.268860,407.084503,382.080994,385.483063,1671963898
1630,2022-04-27,2808.645996,2911.877441,2802.273438,2888.929688,17419284041,ETH-USD,2,38120.300781,39397.917969,...,1.000059,1.000284,1.000013,1.000153,61043327313,385.562164,394.458527,384.077698,391.445831,1512587369
1631,2022-04-28,2888.849854,2973.135010,2861.821533,2936.940918,18443524633,ETH-USD,3,39241.429688,40269.464844,...,1.000155,1.000292,1.000028,1.000191,68392086439,391.438660,408.232452,388.877655,406.718201,2116381172


In [359]:
for ticker in CRYPTO_TICKER:
    print(f'{ticker} : {len(df[df.tic == ticker])}')

BTC-USD : 1634
ETH-USD : 1633
USDT-USD : 1633
BNB-USD : 1633


We have one more value for BTC (it starts one day before) we will delete it.
For the global ticker we don't have the values on the week-end days so we will interpolate these values.
There is many ways to interpolate these values, we will start by taking the values of Friday for Saturday and the values of Monday for Sunday

In [186]:
# For now, we only start with crypto data so we don't have to do the interpolation
# We just drop the first row of the df

#df = df.drop([0])

# Part 4: Preprocess Data
Data preprocessing is a crucial step for training a high quality machine learning model. We need to check for missing data and do feature engineering in order to convert the data into a model-ready state.
* Add technical indicators. In practical trading, various information needs to be taken into account, for example the historical stock prices, current holding shares, technical indicators, etc. In this article, we demonstrate two trend-following technical indicators: MACD and RSI.
* Add turbulence index. Risk-aversion reflects whether an investor will choose to preserve the capital. It also influences one's trading strategy when facing different market volatility level. To control the risk in a worst-case scenario, such as financial crisis of 2007–2008, FinRL employs the financial turbulence index that measures extreme asset price fluctuation.

## STILL NEED TO ADD FE ON THE NEW TICKERS

In [366]:
df_indicators.sort_values(by=["tic", "date"])

Unnamed: 0,date,open,high,low,close,volume,tic,day,open_BTC-USD,high_BTC-USD,...,open_USDT-USD,high_USDT-USD,low_USDT-USD,close_USDT-USD,volume_USDT-USD,open_BNB-USD,high_BNB-USD,low_BNB-USD,close_BNB-USD,volume_BNB-USD
0,2017-11-09,308.644989,329.451996,307.056000,320.884003,893249984,ETH-USD,3,7446.830078,7446.830078,...,1.010870,1.013270,0.996515,1.008180,358188000,2.053140,2.174230,1.893940,1.990770,19192200
1,2017-11-10,320.670990,324.717987,294.541992,299.252991,885985984,ETH-USD,4,7173.729980,7312.000000,...,1.006500,1.024230,0.995486,1.006010,756446016,2.007730,2.069470,1.644780,1.796840,11155000
2,2017-11-11,298.585999,319.453003,298.191986,314.681000,842300992,ETH-USD,5,6618.609863,6873.149902,...,1.005980,1.026210,0.995799,1.008990,746227968,1.786280,1.917750,1.614290,1.670470,8178150
3,2017-11-12,314.690002,319.153015,298.513000,307.907990,1613479936,ETH-USD,6,6295.450195,6625.049805,...,1.006020,1.105910,0.967601,1.012470,1466060032,1.668890,1.672800,1.462560,1.519690,15298700
4,2017-11-13,307.024994,328.415009,307.024994,316.716003,1041889984,ETH-USD,0,5938.250000,6811.189941,...,1.004480,1.029290,0.975103,1.009350,767884032,1.526010,1.735020,1.517600,1.686620,12238800
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1628,2022-04-25,2922.990234,3018.415527,2804.507080,3009.393555,22332690614,ETH-USD,0,39472.605469,40491.753906,...,1.000346,1.000393,1.000103,1.000145,73256647805,399.129913,404.873901,383.190674,404.350281,1910707893
1629,2022-04-26,3008.946289,3026.415039,2786.253174,2808.298340,19052045399,ETH-USD,1,40448.421875,40713.890625,...,1.000131,1.000265,1.000040,1.000073,69068297656,404.268860,407.084503,382.080994,385.483063,1671963898
1630,2022-04-27,2808.645996,2911.877441,2802.273438,2888.929688,17419284041,ETH-USD,2,38120.300781,39397.917969,...,1.000059,1.000284,1.000013,1.000153,61043327313,385.562164,394.458527,384.077698,391.445831,1512587369
1631,2022-04-28,2888.849854,2973.135010,2861.821533,2936.940918,18443524633,ETH-USD,3,39241.429688,40269.464844,...,1.000155,1.000292,1.000028,1.000191,68392086439,391.438660,408.232452,388.877655,406.718201,2116381172


In [361]:
fe = FeatureEngineer(
                    use_technical_indicator=True,
                    tech_indicator_list = config.INDICATORS,
                    use_vix=True,
                    use_turbulence=True,
                    user_defined_feature = False)

#processed = fe.preprocess_data(df)
processed = fe.preprocess_data(df_indicators)

Invalid number of return arguments after parsing column name: 'date'


KeyError: "None of [Index(['tic', 'date', 'date'], dtype='object')] are in the [columns]"

In [342]:
list_ticker = processed["tic"].unique().tolist()
list_date = list(pd.date_range(processed['date'].min(),processed['date'].max()).astype(str))
combination = list(itertools.product(list_date,list_ticker))

processed_full = pd.DataFrame(combination,columns=["date","tic"]).merge(processed,on=["date","tic"],how="left")
processed_full = processed_full[processed_full['date'].isin(processed['date'])]
processed_full = processed_full.sort_values(['date','tic'])

processed_full = processed_full.fillna(0)

# Change 'date' to index of the df

#processed_full = processed_full.set_index('date')

In [348]:
processed_full['2017-11-13']

KeyError: '2017-11-13'

In [344]:
processed_full.sort_values(['date','tic'],ignore_index=False).head(10)

Unnamed: 0_level_0,tic,open,high,low,close,volume,day,open_BTC-USD,high_BTC-USD,low_BTC-USD,...,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,vix,turbulence
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2017-11-09,ETH-USD,308.644989,329.451996,307.056,320.884003,893250000.0,3.0,7446.830078,7446.830078,7101.52002,...,0.0,340.659367,279.477626,0.0,-66.666667,100.0,320.884003,320.884003,10.5,0.0
2017-11-10,ETH-USD,320.67099,324.717987,294.541992,299.252991,885986000.0,4.0,7173.72998,7312.0,6436.870117,...,-0.485311,340.659367,279.477626,0.0,-66.666667,100.0,310.068497,310.068497,11.29,0.0
2017-11-13,ETH-USD,307.024994,328.415009,307.024994,316.716003,1041890000.0,0.0,5938.25,6811.189941,5844.290039,...,0.085548,328.844885,294.93191,47.098193,70.924692,5.03849,311.888397,311.888397,11.5,0.0
2017-11-14,ETH-USD,316.763,340.177002,316.763,337.631012,1069680000.0,1.0,6561.47998,6764.97998,6461.75,...,1.164355,342.098056,290.25961,63.231222,149.767947,36.129409,316.178833,316.178833,11.59,0.0
2017-11-15,ETH-USD,337.963989,340.911987,329.812988,333.356995,722666000.0,2.0,6634.759766,7342.25,6634.759766,...,1.642529,345.62287,291.642842,59.40169,125.701937,37.758889,318.632856,318.632856,13.13,0.0
2017-11-16,ETH-USD,333.442993,336.158997,323.605988,330.924011,797254000.0,3.0,7323.240234,7967.379883,7176.580078,...,1.821038,346.625465,293.713036,57.356094,75.067931,11.970775,320.16925,320.16925,11.76,0.0
2017-11-17,ETH-USD,330.166992,334.963989,327.52301,332.394012,621733000.0,4.0,7853.569824,8004.589844,7561.089844,...,1.989876,347.582495,295.47262,58.254622,72.305814,11.970775,321.527557,321.527557,11.43,0.0
2017-11-20,ETH-USD,354.093994,372.136993,353.289001,366.730011,807027000.0,0.0,8039.069824,8336.860352,7949.359863,...,5.142005,369.551898,290.860771,72.633794,163.332724,62.328294,330.206334,330.206334,10.65,0.0
2017-11-21,ETH-USD,367.442993,372.470001,350.692993,360.401001,949912000.0,1.0,8205.740234,8348.660156,7762.709961,...,5.698686,373.755156,291.302846,68.002359,128.200588,55.075995,332.529001,332.529001,9.73,0.0
2017-11-22,ETH-USD,360.312012,381.420013,360.147003,380.652008,800819000.0,2.0,8077.950195,8302.259766,8075.470215,...,7.215027,383.194757,288.73796,73.578867,148.189742,61.427047,335.966359,335.966359,9.88,0.0


<a id='4'></a>
# Part 5. Design Environment
Considering the stochastic and interactive nature of the automated stock trading tasks, a financial task is modeled as a **Markov Decision Process (MDP)** problem. The training process involves observing stock price change, taking an action and reward's calculation to have the agent adjusting its strategy accordingly. By interacting with the environment, the trading agent will derive a trading strategy with the maximized rewards as time proceeds.

Our trading environments, based on OpenAI Gym framework, simulate live stock markets with real market data according to the principle of time-driven simulation.

The action space describes the allowed actions that the agent interacts with the environment. Normally, action a includes three actions: {-1, 0, 1}, where -1, 0, 1 represent selling, holding, and buying one share. Also, an action can be carried upon multiple shares. We use an action space {-k,…,-1, 0, 1, …, k}, where k denotes the number of shares to buy and -k denotes the number of shares to sell. For example, "Buy 10 shares of AAPL" or "Sell 10 shares of AAPL" are 10 or -10, respectively. The continuous action space needs to be normalized to [-1, 1], since the policy is defined on a Gaussian distribution, which needs to be normalized and symmetric.

To apply this environment to the trade only on the Ethereum crypto, we will allow to trade only on 'ETH-USD

In [192]:
##### CHANGES ADDED TO STOCKTRADINGENV :
##### - change cost percetage of trades


import gym
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from gym import spaces
from gym.utils import seeding
from stable_baselines3.common.vec_env import DummyVecEnv
from typing import List
matplotlib.use("Agg")

# from stable_baselines3.common.logger import Logger, KVWriter, CSVOutputFormat


class StockTradingEnv(gym.Env):
    """A stock trading environment for OpenAI gym"""

    metadata = {"render.modes": ["human"]}

    def __init__(
        self,
        df: pd.DataFrame,
        stock_dim: int,
        hmax: int,
        initial_amount: int,
        num_stock_shares: List[int],
        buy_cost_pct: List[float],
        sell_cost_pct: List[float],
        reward_scaling: float,
        state_space: int,
        action_space: int,
        tech_indicator_list: List[str],
        turbulence_threshold=None,
        risk_indicator_col="turbulence",
        make_plots: bool =False,
        print_verbosity=10,
        day=0,
        initial=True,
        previous_state=[],
        model_name="",
        mode="",
        iteration="",
    ):
        self.day = day
        self.df = df
        self.stock_dim = stock_dim
        self.hmax = hmax
        self.num_stock_shares=num_stock_shares
        self.initial_amount = initial_amount # get the initial cash
        self.buy_cost_pct = buy_cost_pct
        self.sell_cost_pct = sell_cost_pct
        self.reward_scaling = reward_scaling
        self.state_space = state_space
        self.action_space = action_space
        self.tech_indicator_list = tech_indicator_list
        self.action_space = spaces.Box(low=-1, high=1, shape=(self.action_space,))
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf, shape=(self.state_space,)
        )
        self.data = self.df.loc[self.day, :]
        self.terminal = False
        self.make_plots = make_plots
        self.print_verbosity = print_verbosity
        self.turbulence_threshold = turbulence_threshold
        self.risk_indicator_col = risk_indicator_col
        self.initial = initial
        self.previous_state = previous_state
        self.model_name = model_name
        self.mode = mode
        self.iteration = iteration
        # initalize state
        self.state = self._initiate_state()

        # initialize reward
        self.reward = 0
        self.turbulence = 0
        self.cost = 0
        self.trades = 0
        self.episode = 0
        # memorize all the total balance change
        self.asset_memory = [self.initial_amount+np.sum(np.array(self.num_stock_shares)*np.array(self.state[1:1+self.stock_dim]))] # the initial total asset is calculated by cash + sum (num_share_stock_i * price_stock_i)
        self.rewards_memory = []
        self.actions_memory = []
        self.state_memory=[] # we need sometimes to preserve the state in the middle of trading process 
        self.date_memory = [self._get_date()]
        #         self.logger = Logger('results',[CSVOutputFormat])
        # self.reset()
        self._seed()

    def _sell_stock(self, index, action):
        def _do_sell_normal():
            if self.state[index + 2*self.stock_dim + 1]!=True : # check if the stock is able to sell, for simlicity we just add it in techical index
            # if self.state[index + 1] > 0: # if we use price<0 to denote a stock is unable to trade in that day, the total asset calculation may be wrong for the price is unreasonable
                # Sell only if the price is > 0 (no missing data in this particular date)
                # perform sell action based on the sign of the action
                if self.state[index + self.stock_dim + 1] > 0:
                    # Sell only if current asset is > 0
                    sell_num_shares = min(
                        abs(action), self.state[index + self.stock_dim + 1]
                    )
                    sell_amount = (
                        self.state[index + 1]
                        * sell_num_shares
                        * (1 - self.sell_cost_pct[index])
                    )
                    # update balance
                    self.state[0] += sell_amount

                    self.state[index + self.stock_dim + 1] -= sell_num_shares
                    self.cost += (
                        self.state[index + 1] * sell_num_shares * self.sell_cost_pct[index]
                    )
                    self.trades += 1
                else:
                    sell_num_shares = 0
            else:
                sell_num_shares = 0

            return sell_num_shares

        # perform sell action based on the sign of the action
        if self.turbulence_threshold is not None:
            if self.turbulence >= self.turbulence_threshold:
                if self.state[index + 1] > 0:
                    # Sell only if the price is > 0 (no missing data in this particular date)
                    # if turbulence goes over threshold, just clear out all positions
                    if self.state[index + self.stock_dim + 1] > 0:
                        # Sell only if current asset is > 0
                        sell_num_shares = self.state[index + self.stock_dim + 1]
                        sell_amount = (
                            self.state[index + 1]
                            * sell_num_shares
                            * (1 - self.sell_cost_pct[index])
                        )
                        # update balance
                        self.state[0] += sell_amount
                        self.state[index + self.stock_dim + 1] = 0
                        self.cost += (
                            self.state[index + 1] * sell_num_shares * self.sell_cost_pct
                        )
                        self.trades += 1
                    else:
                        sell_num_shares = 0
                else:
                    sell_num_shares = 0
            else:
                sell_num_shares = _do_sell_normal()
        else:
            sell_num_shares = _do_sell_normal()

        return sell_num_shares

    def _buy_stock(self, index, action):
        def _do_buy():
            if self.state[index + 2*self.stock_dim+ 1] !=True: # check if the stock is able to buy
            # if self.state[index + 1] >0:
                # Buy only if the price is > 0 (no missing data in this particular date)
                available_amount = self.state[0] / (self.state[index + 1]*(1 + self.buy_cost_pct[index])) # when buying stocks, we should consider the cost of trading when calculating available_amount, or we may be have cash<0
                # print('available_amount:{}'.format(available_amount))

                # update balance
                buy_num_shares = min(available_amount, action)
                buy_amount = (
                    self.state[index + 1] * buy_num_shares * (1 + self.buy_cost_pct[index])
                )
                self.state[0] -= buy_amount

                self.state[index + self.stock_dim + 1] += buy_num_shares

                self.cost += self.state[index + 1] * buy_num_shares * self.buy_cost_pct[index]
                self.trades += 1
            else:
                buy_num_shares = 0

            return buy_num_shares

        # perform buy action based on the sign of the action
        if self.turbulence_threshold is None:
            buy_num_shares = _do_buy()
        else:
            if self.turbulence < self.turbulence_threshold:
                buy_num_shares = _do_buy()
            else:
                buy_num_shares = 0
                pass

        return buy_num_shares

    def _make_plot(self):
        plt.plot(self.asset_memory, "r")
        plt.savefig("results/account_value_trade_{}.png".format(self.episode))
        plt.close()

    def step(self, actions):
        self.terminal = self.day >= len(self.df.index.unique()) - 1
        if self.terminal:
            # print(f"Episode: {self.episode}")
            if self.make_plots:
                self._make_plot()
            end_total_asset = self.state[0] + sum(
                np.array(self.state[1 : (self.stock_dim + 1)])
                * np.array(self.state[(self.stock_dim + 1) : (self.stock_dim * 2 + 1)])
            )
            df_total_value = pd.DataFrame(self.asset_memory)
            tot_reward = (
                self.state[0]
                + sum(
                    np.array(self.state[1 : (self.stock_dim + 1)])
                    * np.array(
                        self.state[(self.stock_dim + 1) : (self.stock_dim * 2 + 1)]
                    )
                )
                - self.asset_memory[0]
            ) # initial_amount is only cash part of our initial asset
            df_total_value.columns = ["account_value"]
            df_total_value["date"] = self.date_memory
            df_total_value["daily_return"] = df_total_value["account_value"].pct_change(
                1
            )
            if df_total_value["daily_return"].std() != 0:
                sharpe = (
                    (252 ** 0.5)
                    * df_total_value["daily_return"].mean()
                    / df_total_value["daily_return"].std()
                )
            df_rewards = pd.DataFrame(self.rewards_memory)
            df_rewards.columns = ["account_rewards"]
            df_rewards["date"] = self.date_memory[:-1]
            if self.episode % self.print_verbosity == 0:
                print(f"day: {self.day}, episode: {self.episode}")
                print(f"begin_total_asset: {self.asset_memory[0]:0.2f}")
                print(f"end_total_asset: {end_total_asset:0.2f}")
                print(f"total_reward: {tot_reward:0.2f}")
                print(f"total_cost: {self.cost:0.2f}")
                print(f"total_trades: {self.trades}")
                if df_total_value["daily_return"].std() != 0:
                    print(f"Sharpe: {sharpe:0.3f}")
                print("=================================")

            if (self.model_name != "") and (self.mode != ""):
                df_actions = self.save_action_memory()
                df_actions.to_csv(
                    "results/actions_{}_{}_{}.csv".format(
                        self.mode, self.model_name, self.iteration
                    )
                )
                df_total_value.to_csv(
                    "results/account_value_{}_{}_{}.csv".format(
                        self.mode, self.model_name, self.iteration
                    ),
                    index=False,
                )
                df_rewards.to_csv(
                    "results/account_rewards_{}_{}_{}.csv".format(
                        self.mode, self.model_name, self.iteration
                    ),
                    index=False,
                )
                plt.plot(self.asset_memory, "r")
                plt.savefig(
                    "results/account_value_{}_{}_{}.png".format(
                        self.mode, self.model_name, self.iteration
                    ),
                    index=False,
                )
                plt.close()

            # Add outputs to logger interface
            # logger.record("environment/portfolio_value", end_total_asset)
            # logger.record("environment/total_reward", tot_reward)
            # logger.record("environment/total_reward_pct", (tot_reward / (end_total_asset - tot_reward)) * 100)
            # logger.record("environment/total_cost", self.cost)
            # logger.record("environment/total_trades", self.trades)

            return self.state, self.reward, self.terminal, {}

        else:
            actions = actions * self.hmax  # actions initially is scaled between 0 to 1
            actions = actions.astype(
                int
            )  # convert into integer because we can't by fraction of shares
            if self.turbulence_threshold is not None:
                if self.turbulence >= self.turbulence_threshold:
                    actions = np.array([-self.hmax] * self.stock_dim)
            begin_total_asset = self.state[0] + sum(
                np.array(self.state[1 : (self.stock_dim + 1)])
                * np.array(self.state[(self.stock_dim + 1) : (self.stock_dim * 2 + 1)])
            )
            # print("begin_total_asset:{}".format(begin_total_asset))

            argsort_actions = np.argsort(actions)
            sell_index = argsort_actions[: np.where(actions < 0)[0].shape[0]]
            buy_index = argsort_actions[::-1][: np.where(actions > 0)[0].shape[0]]

            for index in sell_index:
                # print(f"Num shares before: {self.state[index+self.stock_dim+1]}")
                # print(f'take sell action before : {actions[index]}')
                actions[index] = self._sell_stock(index, actions[index]) * (-1)
                # print(f'take sell action after : {actions[index]}')
                # print(f"Num shares after: {self.state[index+self.stock_dim+1]}")

            for index in buy_index:
                # print('take buy action: {}'.format(actions[index]))
                actions[index] = self._buy_stock(index, actions[index])

            self.actions_memory.append(actions)

            # state: s -> s+1
            self.day += 1
            self.data = self.df.loc[self.day, :]
            if self.turbulence_threshold is not None:
                if len(self.df.tic.unique()) == 1:
                    self.turbulence = self.data[self.risk_indicator_col]
                elif len(self.df.tic.unique()) > 1:
                    self.turbulence = self.data[self.risk_indicator_col].values[0]
            self.state = self._update_state()

            end_total_asset = self.state[0] + sum(
                np.array(self.state[1 : (self.stock_dim + 1)])
                * np.array(self.state[(self.stock_dim + 1) : (self.stock_dim * 2 + 1)])
            )
            self.asset_memory.append(end_total_asset)
            self.date_memory.append(self._get_date())
            self.reward = end_total_asset - begin_total_asset
            self.rewards_memory.append(self.reward)
            self.reward = self.reward * self.reward_scaling
            self.state_memory.append(self.state) # add current state in state_recorder for each step

        return self.state, self.reward, self.terminal, {}

    def reset(self):
        # initiate state
        self.state = self._initiate_state()

        if self.initial:
            self.asset_memory = [self.initial_amount+np.sum(np.array(self.num_stock_shares)*np.array(self.state[1:1+self.stock_dim]))]
        else:
            previous_total_asset = self.previous_state[0] + sum(
                np.array(self.state[1 : (self.stock_dim + 1)])
                * np.array(
                    self.previous_state[(self.stock_dim + 1) : (self.stock_dim * 2 + 1)]
                )
            )
            self.asset_memory = [previous_total_asset]

        self.day = 0
        self.data = self.df.loc[self.day, :]
        self.turbulence = 0
        self.cost = 0
        self.trades = 0
        self.terminal = False
        # self.iteration=self.iteration
        self.rewards_memory = []
        self.actions_memory = []
        self.date_memory = [self._get_date()]

        self.episode += 1

        return self.state

    def render(self, mode="human", close=False):
        return self.state

    def _initiate_state(self):
        if self.initial:
            # For Initial State
            if len(self.df.tic.unique()) > 1:
                # for multiple stock
                state = (
                    [self.initial_amount]
                    + self.data.close.values.tolist()
                    + self.num_stock_shares
                    + sum(
                        [
                            self.data[tech].values.tolist()
                            for tech in self.tech_indicator_list
                        ],
                        [],
                    )
                ) # append initial stocks_share to initial state, instead of all zero 
            else:
                # for single stock
                state = (
                    [self.initial_amount]
                    + [self.data.close]
                    + [0] * self.stock_dim
                    + sum([[self.data[tech]] for tech in self.tech_indicator_list], [])
                )
        else:
            # Using Previous State
            if len(self.df.tic.unique()) > 1:
                # for multiple stock
                state = (
                    [self.previous_state[0]]
                    + self.data.close.values.tolist()
                    + self.previous_state[
                        (self.stock_dim + 1) : (self.stock_dim * 2 + 1)
                    ]
                    + sum(
                        [
                            self.data[tech].values.tolist()
                            for tech in self.tech_indicator_list
                        ],
                        [],
                    )
                )
            else:
                # for single stock
                state = (
                    [self.previous_state[0]]
                    + [self.data.close]
                    + self.previous_state[
                        (self.stock_dim + 1) : (self.stock_dim * 2 + 1)
                    ]
                    + sum([[self.data[tech]] for tech in self.tech_indicator_list], [])
                )
        return state

    def _update_state(self):
        if len(self.df.tic.unique()) > 1:
            # for multiple stock
            state = (
                [self.state[0]]
                + self.data.close.values.tolist()
                + list(self.state[(self.stock_dim + 1) : (self.stock_dim * 2 + 1)])
                + sum(
                    [
                        self.data[tech].values.tolist()
                        for tech in self.tech_indicator_list
                    ],
                    [],
                )
            )

        else:
            # for single stock
            state = (
                [self.state[0]]
                + [self.data.close]
                + list(self.state[(self.stock_dim + 1) : (self.stock_dim * 2 + 1)])
                + sum([[self.data[tech]] for tech in self.tech_indicator_list], [])
            )

        return state

    def _get_date(self):
        if len(self.df.tic.unique()) > 1:
            date = self.data.date.unique()[0]
        else:
            date = self.data.date
        return date

    # add save_state_memory to preserve state in the trading process 
    def save_state_memory(self):
        if len(self.df.tic.unique()) > 1:
            # date and close price length must match actions length
            date_list = self.date_memory[:-1]
            df_date = pd.DataFrame(date_list)
            df_date.columns = ["date"]

            state_list = self.state_memory
            df_states = pd.DataFrame(state_list,columns=['cash','Bitcoin_price','Gold_price','Bitcoin_num','Gold_num','Bitcoin_Disable','Gold_Disable'])
            df_states.index = df_date.date
            # df_actions = pd.DataFrame({'date':date_list,'actions':action_list})
        else:
            date_list = self.date_memory[:-1]
            state_list = self.state_memory
            df_states = pd.DataFrame({"date": date_list, "states": state_list})
        # print(df_states)
        return df_states

    def save_asset_memory(self):
        date_list = self.date_memory
        asset_list = self.asset_memory
        # print(len(date_list))
        # print(len(asset_list))
        df_account_value = pd.DataFrame(
            {"date": date_list, "account_value": asset_list}
        )
        return df_account_value

    def save_action_memory(self):
        if len(self.df.tic.unique()) > 1:
            # date and close price length must match actions length
            date_list = self.date_memory[:-1]
            df_date = pd.DataFrame(date_list)
            df_date.columns = ["date"]

            action_list = self.actions_memory
            df_actions = pd.DataFrame(action_list)
            df_actions.columns = self.data.tic.values
            df_actions.index = df_date.date
            # df_actions = pd.DataFrame({'date':date_list,'actions':action_list})
        else:
            date_list = self.date_memory[:-1]
            action_list = self.actions_memory
            df_actions = pd.DataFrame({"date": date_list, "actions": action_list})
        return df_actions

    def _seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]

    def get_sb_env(self):
        e = DummyVecEnv([lambda: self])
        obs = e.reset()
        return e, obs


## Training data split: 2017-11-09 to 2021-04-30
## Trade data split: 2021-04-30 to 2022-04-30

In [345]:
train = data_split(processed_full, '2017-11-09','2021-04-30')
trade = data_split(processed_full, '2021-04-30','2022-04-30')
print(len(train))
print(len(trade))

KeyError: 'date'

In [318]:
train.tail()

Unnamed: 0,date,tic,open,high,low,close,volume,day,open_BTC-USD,high_BTC-USD,...,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,vix,turbulence
867,2021-04-23,ETH-USD,2401.256348,2439.537109,2117.039551,2363.586182,55413930000.0,4.0,51739.808594,52120.792969,...,121.276468,2544.539819,1933.436316,59.978773,75.175189,10.103289,2107.293962,1896.099239,17.33,0.221329
868,2021-04-26,ETH-USD,2319.478027,2536.337402,2308.315186,2534.481689,35208330000.0,0.0,49077.792969,54288.003906,...,111.422086,2586.263595,1965.990348,61.8049,122.753614,22.91154,2175.542997,1935.927299,17.639999,0.981106
869,2021-04-27,ETH-USD,2534.03125,2676.392822,2485.375,2662.865234,32275970000.0,1.0,54030.304688,55416.964844,...,129.738387,2631.721397,1989.711342,63.860057,170.897929,30.988732,2207.92664,1956.207825,17.559999,0.39809
870,2021-04-28,ETH-USD,2664.685547,2757.477295,2564.081543,2746.380127,34269030000.0,2.0,55036.636719,56227.207031,...,149.272422,2701.459989,1985.75339,65.122918,187.115312,35.191218,2238.816479,1977.647941,17.280001,0.098852
871,2021-04-29,ETH-USD,2748.649658,2797.972412,2672.106689,2756.876953,32578130000.0,3.0,54858.089844,55115.84375,...,163.713109,2756.739902,1998.950283,65.280664,190.283286,37.231803,2269.177922,1999.995074,17.610001,0.019053


In [319]:
trade.head()

Unnamed: 0,date,tic,open,high,low,close,volume,day,open_BTC-USD,high_BTC-USD,...,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,vix,turbulence
0,2021-04-30,ETH-USD,2757.734131,2796.054932,2728.169922,2773.207031,29777180000.0,4.0,53568.664062,57900.71875,...,174.464045,2809.551665,2009.86501,65.531558,178.777421,37.231803,2297.672754,2020.136731,18.610001,0.009635
1,2021-05-03,ETH-USD,2951.175781,3450.037842,2951.175781,3431.086182,49174290000.0,0.0,56620.273438,58973.308594,...,255.668805,3172.448543,1920.251848,73.820994,251.430755,61.774865,2403.676428,2098.781047,18.309999,13.716108
2,2021-05-04,ETH-USD,3431.131592,3523.585938,3180.742676,3253.629395,62402050000.0,1.0,57214.179688,57214.179688,...,275.333805,3285.599999,1888.952833,69.101079,225.053368,63.54682,2442.359981,2127.453619,19.48,1.072426
3,2021-05-05,ETH-USD,3240.554688,3541.462646,3213.101562,3522.783203,48334200000.0,2.0,53252.164062,57911.363281,...,309.074107,3450.061169,1824.858363,71.918229,214.699672,63.98065,2489.523181,2158.587646,19.15,1.251543
4,2021-05-06,ETH-USD,3524.930908,3598.895996,3386.23999,3490.880371,44300390000.0,3.0,57441.308594,58363.316406,...,329.44165,3580.888165,1799.92475,71.123115,195.560187,65.39502,2535.27323,2188.049756,18.389999,0.125952


In [328]:
stock_indicators = ['date,', 'tic', 'open', 'high', 'low', 'close', 'volume', 'vix', 'turbulence']

for indicator in train.columns:
    if((indicator not in stock_indicators) and (indicator not in config.INDICATORS)):
        config.INDICATORS.append(indicator)

In [335]:
config.INDICATORS.remove('turbulence')

In [336]:
config.INDICATORS

['macd',
 'boll_ub',
 'boll_lb',
 'rsi_30',
 'cci_30',
 'dx_30',
 'close_30_sma',
 'close_60_sma',
 'date',
 'day',
 'open_BTC-USD',
 'high_BTC-USD',
 'low_BTC-USD',
 'close_BTC-USD',
 'volume_BTC-USD',
 'open_USDT-USD',
 'high_USDT-USD',
 'low_USDT-USD',
 'close_USDT-USD',
 'volume_USDT-USD',
 'open_BNB-USD',
 'high_BNB-USD',
 'low_BNB-USD',
 'close_BNB-USD',
 'volume_BNB-USD']

In [337]:
stock_dimension = len(train.tic.unique())
state_space = 1 + 2*stock_dimension + len(config.INDICATORS)*stock_dimension
print(f"Stock Dimension: {stock_dimension}, State Space: {state_space}")


Stock Dimension: 1, State Space: 28


In [339]:
# CARE, HERE I MODIFIED 0.001 to 0 to annualate trade fees for our model to trade more

buy_cost_list = sell_cost_list = [0] * stock_dimension
num_stock_shares = [0] * stock_dimension

env_kwargs = {
    "hmax": 100,
    "initial_amount": 1000000,
    "num_stock_shares": num_stock_shares,
    "buy_cost_pct": buy_cost_list,
    "sell_cost_pct": sell_cost_list,
    "state_space": state_space,
    "stock_dim": stock_dimension,
    "tech_indicator_list": config.INDICATORS,
    "action_space": stock_dimension,
    "reward_scaling": 1e-4
}


e_train_gym = StockTradingEnv(df = train, **env_kwargs)

In [340]:
e_train_gym.action_space

Box([-1.], [1.], (1,), float32)

## Environment for Training



In [341]:
env_train, _ = e_train_gym.get_sb_env()
print(type(env_train))

ValueError: could not convert string to float: '2017-11-09'

<a id='5'></a>
# Part 6: Implement DRL Algorithms
* The implementation of the DRL algorithms are based on **OpenAI Baselines** and **Stable Baselines**. Stable Baselines is a fork of OpenAI Baselines, with a major structural refactoring, and code cleanups.
* FinRL library includes fine-tuned standard DRL algorithms, such as DQN, DDPG,
Multi-Agent DDPG, PPO, SAC, A2C and TD3. We also allow users to
design their own DRL algorithms by adapting these DRL algorithms.

In [132]:
agent = DRLAgent(env = env_train)

### Model Training: 5 models, A2C DDPG, PPO, TD3, SAC


### Model 1: A2C


In [133]:
agent = DRLAgent(env = env_train)
model_a2c = agent.get_model("a2c")

{'n_steps': 5, 'ent_coef': 0.01, 'learning_rate': 0.0007}
Using cpu device


In [134]:
trained_a2c = agent.train_model(model=model_a2c, 
                             tb_log_name='a2c',
                             total_timesteps=50000)

--------------------------------------
| time/                 |            |
|    fps                | 459        |
|    iterations         | 100        |
|    time_elapsed       | 1          |
|    total_timesteps    | 500        |
| train/                |            |
|    entropy_loss       | -1.4       |
|    explained_variance | -0.161     |
|    learning_rate      | 0.0007     |
|    n_updates          | 99         |
|    policy_loss        | 1.52       |
|    reward             | 0.29479045 |
|    std                | 0.978      |
|    value_loss         | 4.37       |
--------------------------------------
---------------------------------------
| time/                 |             |
|    fps                | 459         |
|    iterations         | 200         |
|    time_elapsed       | 2           |
|    total_timesteps    | 1000        |
| train/                |             |
|    entropy_loss       | -1.39       |
|    explained_variance | 0.101       |
|    learning_ra

--------------------------------------
| time/                 |            |
|    fps                | 437        |
|    iterations         | 1500       |
|    time_elapsed       | 17         |
|    total_timesteps    | 7500       |
| train/                |            |
|    entropy_loss       | -1.42      |
|    explained_variance | 0          |
|    learning_rate      | 0.0007     |
|    n_updates          | 1499       |
|    policy_loss        | -1.03      |
|    reward             | -0.3409762 |
|    std                | 1          |
|    value_loss         | 0.64       |
--------------------------------------
day: 871, episode: 10
begin_total_asset: 1000000.00
end_total_asset: 489134.35
total_reward: -510865.65
total_cost: 18271.13
total_trades: 794
Sharpe: 0.003
---------------------------------------
| time/                 |             |
|    fps                | 436         |
|    iterations         | 1600        |
|    time_elapsed       | 18          |
|    total_timestep

--------------------------------------
| time/                 |            |
|    fps                | 419        |
|    iterations         | 2800       |
|    time_elapsed       | 33         |
|    total_timesteps    | 14000      |
| train/                |            |
|    entropy_loss       | -1.47      |
|    explained_variance | 0.000484   |
|    learning_rate      | 0.0007     |
|    n_updates          | 2799       |
|    policy_loss        | 27.1       |
|    reward             | -6.1943913 |
|    std                | 1.06       |
|    value_loss         | 613        |
--------------------------------------
---------------------------------------
| time/                 |             |
|    fps                | 420         |
|    iterations         | 2900        |
|    time_elapsed       | 34          |
|    total_timesteps    | 14500       |
| train/                |             |
|    entropy_loss       | -1.49       |
|    explained_variance | 0.311       |
|    learning_ra

-------------------------------------
| time/                 |           |
|    fps                | 427       |
|    iterations         | 4100      |
|    time_elapsed       | 47        |
|    total_timesteps    | 20500     |
| train/                |           |
|    entropy_loss       | -1.51     |
|    explained_variance | 0.0289    |
|    learning_rate      | 0.0007    |
|    n_updates          | 4099      |
|    policy_loss        | -3.79     |
|    reward             | 3.1882489 |
|    std                | 1.1       |
|    value_loss         | 6.98      |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 428       |
|    iterations         | 4200      |
|    time_elapsed       | 49        |
|    total_timesteps    | 21000     |
| train/                |           |
|    entropy_loss       | -1.51     |
|    explained_variance | -0.00353  |
|    learning_rate      | 0.0007    |
|    n_updat

-------------------------------------
| time/                 |           |
|    fps                | 438       |
|    iterations         | 5400      |
|    time_elapsed       | 61        |
|    total_timesteps    | 27000     |
| train/                |           |
|    entropy_loss       | -1.55     |
|    explained_variance | -0.00284  |
|    learning_rate      | 0.0007    |
|    n_updates          | 5399      |
|    policy_loss        | 50        |
|    reward             | 2.5970955 |
|    std                | 1.14      |
|    value_loss         | 2.49e+03  |
-------------------------------------
--------------------------------------
| time/                 |            |
|    fps                | 438        |
|    iterations         | 5500       |
|    time_elapsed       | 62         |
|    total_timesteps    | 27500      |
| train/                |            |
|    entropy_loss       | -1.55      |
|    explained_variance | 6.5e-06    |
|    learning_rate      | 0.0007     |
| 

--------------------------------------
| time/                 |            |
|    fps                | 445        |
|    iterations         | 6800       |
|    time_elapsed       | 76         |
|    total_timesteps    | 34000      |
| train/                |            |
|    entropy_loss       | -1.57      |
|    explained_variance | 0.00232    |
|    learning_rate      | 0.0007     |
|    n_updates          | 6799       |
|    policy_loss        | 5.59       |
|    reward             | 0.16734162 |
|    std                | 1.17       |
|    value_loss         | 19.9       |
--------------------------------------
day: 871, episode: 40
begin_total_asset: 1000000.00
end_total_asset: 906274.25
total_reward: -93725.75
total_cost: 25451.06
total_trades: 823
Sharpe: 0.192
-------------------------------------
| time/                 |           |
|    fps                | 445       |
|    iterations         | 6900      |
|    time_elapsed       | 77        |
|    total_timesteps    | 3450

--------------------------------------
| time/                 |            |
|    fps                | 440        |
|    iterations         | 8200       |
|    time_elapsed       | 92         |
|    total_timesteps    | 41000      |
| train/                |            |
|    entropy_loss       | -1.6       |
|    explained_variance | 0          |
|    learning_rate      | 0.0007     |
|    n_updates          | 8199       |
|    policy_loss        | 14         |
|    reward             | 0.41813537 |
|    std                | 1.2        |
|    value_loss         | 61.6       |
--------------------------------------
--------------------------------------
| time/                 |            |
|    fps                | 440        |
|    iterations         | 8300       |
|    time_elapsed       | 94         |
|    total_timesteps    | 41500      |
| train/                |            |
|    entropy_loss       | -1.59      |
|    explained_variance | -0.169     |
|    learning_rate      |

------------------------------------
| time/                 |          |
|    fps                | 433      |
|    iterations         | 9500     |
|    time_elapsed       | 109      |
|    total_timesteps    | 47500    |
| train/                |          |
|    entropy_loss       | -1.6     |
|    explained_variance | -0.00244 |
|    learning_rate      | 0.0007   |
|    n_updates          | 9499     |
|    policy_loss        | 11.2     |
|    reward             | 2.247777 |
|    std                | 1.2      |
|    value_loss         | 77.3     |
------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 433       |
|    iterations         | 9600      |
|    time_elapsed       | 110       |
|    total_timesteps    | 48000     |
| train/                |           |
|    entropy_loss       | -1.6      |
|    explained_variance | -0.00337  |
|    learning_rate      | 0.0007    |
|    n_updates          | 95

### Model 2: DDPG

In [None]:
agent = DRLAgent(env = env_train)
model_ddpg = agent.get_model("ddpg")

In [None]:
trained_ddpg = agent.train_model(model=model_ddpg, 
                             tb_log_name='ddpg',
                             total_timesteps=50000)

### Model 3: PPO

In [None]:
agent = DRLAgent(env = env_train)
PPO_PARAMS = {
    "n_steps": 2048,
    "ent_coef": 0.01,
    "learning_rate": 0.00025,
    "batch_size": 128,
}
model_ppo = agent.get_model("ppo",model_kwargs = PPO_PARAMS)

In [None]:
trained_ppo = agent.train_model(model=model_ppo, 
                             tb_log_name='ppo',
                             total_timesteps=50000)

### Model 4: TD3

In [None]:
agent = DRLAgent(env = env_train)
TD3_PARAMS = {"batch_size": 100, 
              "buffer_size": 1000000, 
              "learning_rate": 0.001}

model_td3 = agent.get_model("td3",model_kwargs = TD3_PARAMS)

In [None]:
trained_td3 = agent.train_model(model=model_td3, 
                             tb_log_name='td3',
                             total_timesteps=30000)

### Model 5: SAC

In [158]:
agent = DRLAgent(env = env_train)
SAC_PARAMS = {
    "batch_size": 128,
    "buffer_size": 1000000,
    "learning_rate": 0.0001,
    "learning_starts": 100,
    "ent_coef": "auto_0.1",
}

model_sac = agent.get_model("sac",model_kwargs = SAC_PARAMS)

{'batch_size': 128, 'buffer_size': 1000000, 'learning_rate': 0.0001, 'learning_starts': 100, 'ent_coef': 'auto_0.1'}
Using cpu device


In [159]:
trained_sac = agent.train_model(model=model_sac, 
                             tb_log_name='sac',
                             total_timesteps=60000)

day: 871, episode: 80
begin_total_asset: 1000000.00
end_total_asset: 1000000.00
total_reward: 0.00
total_cost: 0.00
total_trades: 0
---------------------------------
| time/              |          |
|    episodes        | 4        |
|    fps             | 47       |
|    time_elapsed    | 73       |
|    total_timesteps | 3488     |
| train/             |          |
|    actor_loss      | 2.18e+04 |
|    critic_loss     | 4.59e+06 |
|    ent_coef        | 0.14     |
|    ent_coef_loss   | 62.7     |
|    learning_rate   | 0.0001   |
|    n_updates       | 3387     |
|    reward          | 0.0      |
---------------------------------
---------------------------------
| time/              |          |
|    episodes        | 8        |
|    fps             | 46       |
|    time_elapsed    | 150      |
|    total_timesteps | 6976     |
| train/             |          |
|    actor_loss      | 1.83e+04 |
|    critic_loss     | 2.34e+04 |
|    ent_coef        | 0.199    |
|    ent_coef_loss

day: 871, episode: 140
begin_total_asset: 1000000.00
end_total_asset: 1000000.00
total_reward: 0.00
total_cost: 0.00
total_trades: 0
---------------------------------
| time/              |          |
|    episodes        | 64       |
|    fps             | 35       |
|    time_elapsed    | 1589     |
|    total_timesteps | 55808    |
| train/             |          |
|    actor_loss      | 3.11e+04 |
|    critic_loss     | 3.42e+05 |
|    ent_coef        | 26.2     |
|    ent_coef_loss   | -104     |
|    learning_rate   | 0.0001   |
|    n_updates       | 55707    |
|    reward          | 0.0      |
---------------------------------
---------------------------------
| time/              |          |
|    episodes        | 68       |
|    fps             | 34       |
|    time_elapsed    | 1713     |
|    total_timesteps | 59296    |
| train/             |          |
|    actor_loss      | 4.29e+04 |
|    critic_loss     | 3.61e+04 |
|    ent_coef        | 37.2     |
|    ent_coef_los

## Trading
Assume that we have $1,000,000 initial capital at 2020-07-01. We use the DDPG model to trade Dow jones 30 stocks.

### Set turbulence threshold
Set the turbulence threshold to be greater than the maximum of insample turbulence data, if current turbulence index is greater than the threshold, then we assume that the current market is volatile

In [160]:
data_risk_indicator = processed_full[(processed_full.date<'2021-04-30') & (processed_full.date>='2017-11-09')]
insample_risk_indicator = data_risk_indicator.drop_duplicates(subset=['date'])

In [161]:
insample_risk_indicator.vix.describe()

count    872.000000
mean      20.167087
std        9.645215
min        9.150000
25%       13.530000
50%       17.460000
75%       23.450001
max       82.690002
Name: vix, dtype: float64

In [162]:
insample_risk_indicator.vix.quantile(0.996)

73.79052062988268

In [163]:
insample_risk_indicator.turbulence.describe()

count    872.000000
mean       0.783612
std        3.473756
min        0.000000
25%        0.000000
50%        0.040732
75%        0.425810
max       79.096379
Name: turbulence, dtype: float64

In [139]:
insample_risk_indicator.turbulence.quantile(0.996)

12.651610411007068

### Trade

DRL model needs to update periodically in order to take full advantage of the data, ideally we need to retrain our model yearly, quarterly, or monthly. We also need to tune the parameters along the way, in this notebook I only use the in-sample data from 2009-01 to 2020-07 to tune the parameters once, so there is some alpha decay here as the length of trade date extends. 

Numerous hyperparameters – e.g. the learning rate, the total number of samples to train on – influence the learning process and are usually determined by testing some variations.

In [164]:
#trade = data_split(processed_full, '2020-07-01','2021-10-31')
e_trade_gym = StockTradingEnv(df = trade, turbulence_threshold = 70,risk_indicator_col='vix', **env_kwargs)
# env_trade, obs_trade = e_trade_gym.get_sb_env()

In [165]:
trade.head()

Unnamed: 0,date,tic,open,high,low,close,volume,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,vix,turbulence
0,2021-04-30,ETH-USD,2757.734131,2796.054932,2728.169922,2773.207031,29777180000.0,4.0,174.464045,2809.551665,2009.86501,65.531558,178.777421,37.231803,2297.672754,2020.136731,18.610001,0.009635
1,2021-05-03,ETH-USD,2951.175781,3450.037842,2951.175781,3431.086182,49174290000.0,0.0,255.668805,3172.448543,1920.251848,73.820994,251.430755,61.774865,2403.676428,2098.781047,18.309999,13.716108
2,2021-05-04,ETH-USD,3431.131592,3523.585938,3180.742676,3253.629395,62402050000.0,1.0,275.333805,3285.599999,1888.952833,69.101079,225.053368,63.54682,2442.359981,2127.453619,19.48,1.072426
3,2021-05-05,ETH-USD,3240.554688,3541.462646,3213.101562,3522.783203,48334200000.0,2.0,309.074107,3450.061169,1824.858363,71.918229,214.699672,63.98065,2489.523181,2158.587646,19.15,1.251543
4,2021-05-06,ETH-USD,3524.930908,3598.895996,3386.23999,3490.880371,44300390000.0,3.0,329.44165,3580.888165,1799.92475,71.123115,195.560187,65.39502,2535.27323,2188.049756,18.389999,0.125952


In [166]:
df_account_value, df_actions = DRLAgent.DRL_prediction(
    model=trained_sac, 
    environment = e_trade_gym)

hit end!


In [167]:
df_account_value.shape

(252, 2)

In [168]:
df_account_value.tail()

Unnamed: 0,date,account_value
247,2022-04-22,1000000.0
248,2022-04-25,1000000.0
249,2022-04-26,1000000.0
250,2022-04-27,1000000.0
251,2022-04-28,1000000.0


In [169]:
df_actions.head(10)

Unnamed: 0,date,actions
0,2021-04-30,[0]
1,2021-05-03,[0]
2,2021-05-04,[0]
3,2021-05-05,[0]
4,2021-05-06,[0]
5,2021-05-07,[0]
6,2021-05-10,[0]
7,2021-05-11,[0]
8,2021-05-12,[0]
9,2021-05-13,[0]


<a id='6'></a>
# Part 7: Backtest Our Strategy
Backtesting plays a key role in evaluating the performance of a trading strategy. Automated backtesting tool is preferred because it reduces the human error. We usually use the Quantopian pyfolio package to backtest our trading strategies. It is easy to use and consists of various individual plots that provide a comprehensive image of the performance of a trading strategy.

<a id='6.1'></a>
## 7.1 BackTestStats
pass in df_account_value, this information is stored in env class


In [149]:
print("==============Get Backtest Results===========")
now = datetime.datetime.now().strftime('%Y%m%d-%Hh%M')

perf_stats_all = backtest_stats(account_value=df_account_value)
perf_stats_all = pd.DataFrame(perf_stats_all)
perf_stats_all.to_csv("./"+config.RESULTS_DIR+"/perf_stats_all_"+now+'.csv')

Annual return         -0.151936
Cumulative returns    -0.151936
Annual volatility      0.861414
Sharpe ratio           0.250939
Calmar ratio          -0.265488
Stability              0.050855
Max drawdown          -0.572290
Omega ratio            1.042311
Sortino ratio          0.340269
Skew                        NaN
Kurtosis                    NaN
Tail ratio             1.107311
Daily value at risk   -0.107670
dtype: float64


In [150]:
#baseline stats
print("==============Get Baseline Stats===========")
baseline_df = get_baseline(
        ticker="ETH-USD", 
        start = df_account_value.loc[0,'date'],
        end = df_account_value.loc[len(df_account_value)-1,'date'])

stats = backtest_stats(baseline_df, value_col_name = 'close')


[*********************100%***********************]  1 of 1 completed
Shape of DataFrame:  (364, 8)
Annual return          0.032922
Cumulative returns     0.047899
Annual volatility      0.796761
Sharpe ratio           0.442387
Calmar ratio           0.057635
Stability              0.062495
Max drawdown          -0.571207
Omega ratio            1.079601
Sortino ratio          0.632828
Skew                        NaN
Kurtosis                    NaN
Tail ratio             1.126503
Daily value at risk   -0.098984
dtype: float64


In [151]:
df_account_value.loc[0,'date']

'2021-04-30'

In [152]:
df_account_value.loc[len(df_account_value)-1,'date']

'2022-04-28'

<a id='6.2'></a>
## 7.2 BackTestPlot

In [153]:
print("==============Compare to ETH===========")
%matplotlib inline
# S&P 500: ^GSPC
# Dow Jones Index: ^DJI
# NASDAQ 100: ^NDX
backtest_plot(df_account_value, 
             baseline_ticker = 'ETH-USD', 
             baseline_start = df_account_value.loc[0,'date'],
             baseline_end = df_account_value.loc[len(df_account_value)-1,'date'])

[*********************100%***********************]  1 of 1 completed
Shape of DataFrame:  (364, 8)


Start date,2021-04-30,2021-04-30
End date,2022-04-28,2022-04-28
Total months,12,12
Unnamed: 0_level_3,Backtest,Unnamed: 2_level_3
Annual return,-15.2%,
Cumulative returns,-15.2%,
Annual volatility,86.1%,
Sharpe ratio,0.25,
Calmar ratio,-0.27,
Stability,0.05,
Max drawdown,-57.2%,
Omega ratio,1.04,
Sortino ratio,0.34,
Skew,,


AttributeError: 'numpy.int64' object has no attribute 'to_pydatetime'