# "Portfolio Management with Deep Reinforcement Learning"

> DRL is used to provide dynamic asset allocation for a portfolio
- toc: true
- branch: master
- badges: false
- comments: true
- hide: false
- search_exclude: true
- metadata_key1: metadata_value1
- metadata_key2: metadata_value2
- image: images/PortfolioManagement.png
- categories: [Investment_Industry,  Reinforcement_Learning,A2C,PPO,  OpenAI,Gym,finrl,stable_baselines3,pyfolio]
- show_tags: true

In [1]:
# hide
# Based on FinRL--explainable deep reinforcement learning for portfolio management--an empirical approach_ANNO.ipynb
# Remove the 'explainable' content - could not find a notebook where only the portfolio management tutorial is captured.
# Based on this tutorial:
# https://towardsdatascience.com/finrl-for-quantitative-finance-tutorial-for-portfolio-allocation-9b417660c7cd

In [100]:
# hide
# from IPython.display import Math
# from google.colab.output._publish import javascript
# url = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/latest.js?config=default"
# javascript(url=url)
# Math(r"e^\alpha")

from IPython.display import HTML, Math
display(HTML("<script src='https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/"
             "latest.js?config=default'></script>"))
Math(r"e^\alpha")

<IPython.core.display.Math object>

## 1. Problem: Dynamic Asset Allocation

*Asset allocation* refers to the partitioning of available funds in a portfolio among various investment products. Investment products may be categorized in multiple ways. One common, high-level classification contains three classes:

* Stocks
* Bonds (fixed income)
* Cash and cash equivalents

Other classification systems divide each of these categories further, for example stocks into US and international stocks, or into small-cap, mid-cap, and large-cap stocks.

Another aspect of asset allocation concerns the underlying *strategy*. Some strategies are:

* Tactical asset allocation
* Strategic asset allocation
* Constant weight asset allocation
* Integrated asset allocation
* Insured asset allocation
* Dynamic asset allocation

We will focus on the last of these strategies: dynamic asset allocation. With this approach the combination of assets is adjusted at regular intervals to capitalize on the strengthening and weakening of the economy and the rise and fall of markets. This strategy normally depends on the decisions of a portfolio manager. In this project, however, we will attempt to replace the portfolio manager with an *AI agent*. The agent will be trained on past market behavior by means of *Deep Reinforcement Learning*.

For this proof-of-concept (POC) project we will keep things as simple as possible. The portfolio will consist of the 30 stocks in the Dow Jones Industrial Average (DJIA). The fixed allocation in the DJIA also provides a handy reference against which the performance of our agent's dynamic behavior can be compared. Our agent will have the opportunity to adjust the mix of stocks on a daily basis. In practice, an agent managing a 401(k), for example, might be set up to make monthly adjustments to reduce transaction costs or to avoid constraints related to trading. For simplicity, trading costs will not be taken into account for now.

Our agent will have $1,000,000 when the project starts.

To implement this POC we will make use of the FinRL framework, including its `YahooDownloader`, which acquires financial data from Yahoo Finance.

## 2. Solution Proposal  

Investment decisions are sequential by nature. Furthermore, a decision that is optimal in the present may turn out not to be optimal over the longer term. Then there are the complexities of the investment landscape, such as varying market conditions, disruptive political events, and other economic uncertainties. A natural tool for this kind of problem is **Reinforcement Learning**.

The solution requires the setup of a **digital twin** for the investor's portfolio. In RL parlance this model of the portfolio is called an **environment**. The environment contains **states** which are modified by applying **actions** to it.

We will choose the following **states** (measured daily in our case) for the environment:

$$ 
\begin{align}
& s_1=\text{Value of AAPL holdings} \\
& s_2=\text{Value of AMGN holdings} \\
& s_3=\text{Value of AXP holdings} \\
& \text{...} \\
& s_{29}=\text{Value of WBA holdings} \\
& s_{30}=\text{Value of WMT holdings} \\
\end{align}
$$

The following **actions** (applied daily in our case) will be set up to influence the environment/portfolio:

$$ 
\begin{align}
& a_1=\text{Fraction to be invested in AAPL this cycle} \\
& a_2=\text{Fraction to be invested in AMGN this cycle} \\
& a_3=\text{Fraction to be invested in AXP this cycle} \\
& \text{...} \\
& a_{29}=\text{Fraction to be invested in WBA this cycle} \\
& a_{30}=\text{Fraction to be invested in WMT this cycle} \\
\end{align}
$$

All action values are in the [0, 1] interval.
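
The environment code later in this post passes the raw actions through a `softmax_normalization` method before using them as portfolio weights. The sketch below shows what such a step could look like, assuming a standard softmax; the function name `softmax_weights` and the sample action values are illustrative and are not taken from FinRL.

```python
import math

def softmax_weights(actions):
    """Map raw action values to positive weights that sum to 1."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(actions)
    exps = [math.exp(a - peak) for a in actions]
    total = sum(exps)
    return [e / total for e in exps]

raw_actions = [0.2, 0.9, 0.5]   # illustrative action values in [0, 1]
weights = softmax_weights(raw_actions)
print([round(w, 3) for w in weights], sum(weights))
```

Under this assumed normalization, actions confined to [0, 1] can make the largest weight at most $e \approx 2.72$ times the smallest, so the agent can tilt the portfolio but never fully exit a position.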

The *reward* $r$ is given by

$$
\begin{align}
r(s,a,s')=\log(v'/v)
\end{align}
$$

where $v$ and $v'$ are the portfolio values at states $s$ and $s'$, respectively.
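
Because the reward is the log of a ratio of consecutive portfolio values, daily rewards add up: their sum over an episode equals the log of the overall growth factor. The short check below illustrates this with made-up portfolio values (they are not results from this project).

```python
import math

# Hypothetical portfolio values on four consecutive days (illustrative only).
values = [1_000_000, 1_010_000, 999_900, 1_019_898]

# Daily reward: log of the new value divided by the previous value.
rewards = [math.log(v_next / v_prev) for v_prev, v_next in zip(values, values[1:])]

total_reward = sum(rewards)
overall_log_growth = math.log(values[-1] / values[0])
print(total_reward, overall_log_growth)
```

Maximizing the cumulative reward is therefore equivalent to maximizing the final portfolio value, which is why a log-return reward is a convenient choice here.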

The model of the portfolio/environment will have the following **parameters**:

$$ 
\begin{align}
& \theta_1=\text{Initial Amount} \\
& \theta_2=\text{Transaction Cost Pct} \\
& \theta_3=\text{Size of State and Action Spaces} \\
& \theta_4=\text{Reward Scaling} \\
& \theta_5=\text{Technical Indicator List} \\
\end{align}
$$

Rather than having the investor's salary directly as a parameter, we will have two parameters, one for the low value and one for the high value. This makes the model more flexible, in particular during training. The same applies for the two parameters used for the house price. In addition, we will add some variability for the interest rates. The latter variabilities may be considered *disturbances* experienced by the investor. The agent has the ability to incorporate such disturbances during its operation.


## 3. Implementation of the Solution

To implement the *environment* we use the OpenAI Gym tools. We will use the FinRL Python library to implement the *agent*. Two algorithms will be investigated for training the agent:

* Advantage Actor-Critic (A2C)
* Proximal Policy Optimization (PPO)

This implementation allows the agent to allocate the investor's funds among the 30 DJIA instruments. The goal is to maximize net worth at the end of the investment horizon.

The plotly library will be used for visualization. The code will run on the Google Colab platform. To start with, we install the Python packages needed.

In [1]:
# hide
# install plotly and finrl library
!pip install plotly==4.4.1
!wget https://github.com/plotly/orca/releases/download/v1.2.1/orca-1.2.1-x86_64.AppImage -O /usr/local/bin/orca
!chmod +x /usr/local/bin/orca
!apt-get install xvfb libgtk2.0-0 libgconf-2-4
!pip install git+https://github.com/AI4Finance-LLC/FinRL-Library.git
!pip install PyPortfolioOpt



Import the packages needed:

In [2]:
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
matplotlib.use('Agg')
%matplotlib inline
import datetime

In [3]:
from finrl.apps import config
from finrl.neo_finrl.preprocessor.yahoodownloader import YahooDownloader
from finrl.neo_finrl.preprocessor.preprocessors import FeatureEngineer, data_split
from finrl.neo_finrl.env_portfolio_allocation.env_portfolio import StockPortfolioEnv
from finrl.drl_agents.stablebaselines3.models import DRLAgent
from finrl.plot import backtest_stats, backtest_plot, get_daily_return, get_baseline,convert_daily_return_to_pyfolio_ts

import gym
from gym.utils import seeding
from gym import spaces
from stable_baselines3.common.vec_env import DummyVecEnv

from pyfolio import timeseries
import plotly
import plotly.graph_objs as go

  'Module "zipline.assets" not found; multipliers will not be applied'


In [4]:
import sys
sys.path.append("../FinRL-Library")

In [5]:
# hide
pd.set_option('display.max_rows', 100)

Set up some directories:

In [6]:
import os
# Create the working directories if they do not exist yet
for directory in (config.DATA_SAVE_DIR, config.TRAINED_MODEL_DIR,
                  config.TENSORBOARD_LOG_DIR, config.RESULTS_DIR):
    os.makedirs("./" + directory, exist_ok=True)

### 3.0 PARAMETERS

In [62]:
DATA_START = '2008-01-01' 
TRAIN_START = '2009-01-01'
TRADE_START = '2020-07-01'
DATA_END = '2021-09-02'
LOOKBACK = 252 #trading days in one year
INITIAL_AMOUNT = 1_000_000 #dollars
TRANSACTION_COST_PCT = 0
REWARD_SCALING = 1e-1 #scaling factor applied to the reward signal

### 3.1 Download Data
We use the data from Yahoo Finance.


In [8]:
df = YahooDownloader(start_date = DATA_START,
                     end_date = DATA_END,
                     ticker_list = config.DOW_30_TICKER).fetch_data()

[*********************100%***********************]  1 of 1 completed

In [9]:
df

Unnamed: 0,date,open,high,low,close,volume,tic,day
0,2008-01-02,7.116786,7.152143,6.876786,5.966037,1079178800,AAPL,2
1,2008-01-02,46.599998,47.040001,46.259998,36.337456,7934400,AMGN,2
2,2008-01-02,52.090000,52.320000,50.790001,40.694550,8053700,AXP,2
3,2008-01-02,87.570000,87.839996,86.000000,63.481632,4303000,BA,2
4,2008-01-02,72.559998,72.669998,70.050003,47.633057,6337800,CAT,2
...,...,...,...,...,...,...,...,...
100380,2021-09-01,416.540009,419.869995,411.239990,415.890198,2034400,UNH,2
100381,2021-09-01,229.100006,230.779999,228.770004,229.715607,8177600,V,2
100382,2021-09-01,55.000000,55.150002,54.799999,54.295189,12647300,VZ,2
100383,2021-09-01,50.730000,50.810001,49.480000,50.290001,5212400,WBA,2


In [10]:
# hide
print(df['day'].unique())
# df.loc[100:150, 'day'] #assume day-of-week: 0 - 4

[2 3 4 0 1]
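
The `day` column appears to encode the day of the week with Monday as 0, which is the convention of Python's `datetime.date.weekday()`. A quick check against a few date and `day` pairs that appear in the tables of this post supports that reading:

```python
import datetime

# Date and `day` pairs copied from rows shown in this post.
samples = {"2008-01-02": 2, "2009-01-02": 4, "2020-06-29": 0}

for text, expected_day in samples.items():
    weekday = datetime.date.fromisoformat(text).weekday()  # Monday == 0
    print(text, weekday, weekday == expected_day)
```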


In [12]:
# 
# Verify 30 unique tickers
# https://www.investopedia.com/ask/answers/who-or-what-is-dow-jones/
lst = list(df['tic'].unique())
print(len(lst), lst)

30 ['AAPL', 'AMGN', 'AXP', 'BA', 'CAT', 'CRM', 'CSCO', 'CVX', 'DIS', 'GS', 'HD', 'HON', 'IBM', 'INTC', 'JNJ', 'JPM', 'KO', 'MCD', 'MMM', 'MRK', 'MSFT', 'NKE', 'PG', 'TRV', 'UNH', 'VZ', 'WBA', 'WMT', 'V', 'DOW']


### 3.2 Data Understanding and Preparation
We will keep showing snippets of the data set as it evolves to assist with understanding. There is a need to check for missing data and also to do some feature engineering. We rely on the FeatureEngineer class to take care of these needs. Some indicators used are:

* Moving Average Convergence Divergence (MACD)

The MACD is primarily used to gauge the strength of stock price movement. It does this by measuring the divergence of two exponential moving averages (EMAs), commonly a 12-period EMA and a 26-period EMA.

* Relative Strength Index (RSI)

The RSI aims to indicate whether a market is considered to be overbought or oversold in relation to recent price levels. 

* Commodity Channel Index (CCI)

The CCI is a momentum-based oscillator used to help determine when an investment vehicle is reaching a condition of being overbought or oversold.

FinRL can also compute a financial turbulence index, which measures extreme asset price fluctuations. It is not used here (`use_turbulence=False` in the next cell).
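
To make the MACD definition concrete, here is a minimal sketch in plain Python. It assumes the common convention of a 12-period EMA minus a 26-period EMA, a smoothing factor of 2 / (span + 1), and an EMA seeded with the first price. The helper names `ema` and `macd` and the price series are illustrative; the values FinRL produces may differ in seeding and adjustment details.

```python
def ema(prices, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2 / (span + 1)
    averages = [prices[0]]              # seed with the first observation
    for price in prices[1:]:
        averages.append(alpha * price + (1 - alpha) * averages[-1])
    return averages

def macd(prices, fast=12, slow=26):
    """MACD line: fast EMA minus slow EMA, one value per price."""
    fast_ema = ema(prices, fast)
    slow_ema = ema(prices, slow)
    return [f - s for f, s in zip(fast_ema, slow_ema)]

# Illustrative, steadily rising price series (not market data).
prices = [100 + 0.5 * day for day in range(60)]
line = macd(prices)
print(line[0], line[-1] > 0)
```

In this sketch the first MACD value is zero because both EMAs start from the same seed, and a steadily rising price keeps the fast EMA above the slow one, so the line stays positive.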

#### 3.2.1 Add technical indicators

In [13]:
%%time
fe = FeatureEngineer(
  use_technical_indicator=True,
  use_turbulence=False,
  user_defined_feature=False)
df = fe.preprocess_data(df)

Successfully added technical indicators


In [14]:
# df.head(100)
df

Unnamed: 0,date,open,high,low,close,volume,tic,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma
0,2008-01-02,7.116786,7.152143,6.876786,5.966037,1079178800,AAPL,2,0.000000,5.971309,5.963519,100.000000,-66.666667,100.000000,5.966037,5.966037
3442,2008-01-02,46.599998,47.040001,46.259998,36.337456,7934400,AMGN,2,0.000000,5.971309,5.963519,100.000000,-66.666667,100.000000,36.337456,36.337456
6884,2008-01-02,52.090000,52.320000,50.790001,40.694550,8053700,AXP,2,0.000000,5.971309,5.963519,100.000000,-66.666667,100.000000,40.694550,40.694550
10326,2008-01-02,87.570000,87.839996,86.000000,63.481632,4303000,BA,2,0.000000,5.971309,5.963519,100.000000,-66.666667,100.000000,63.481632,63.481632
13768,2008-01-02,72.559998,72.669998,70.050003,47.633057,6337800,CAT,2,0.000000,5.971309,5.963519,100.000000,-66.666667,100.000000,47.633057,47.633057
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
82607,2021-09-01,159.429993,160.229996,158.210007,158.397003,1133800,TRV,2,2.126652,163.483054,150.694548,55.446787,62.496692,22.987009,154.028030,152.401430
86049,2021-09-01,416.540009,419.869995,411.239990,415.890198,2034400,UNH,2,1.544983,428.508670,402.801383,53.065126,-4.030115,15.845681,415.105289,409.407493
89491,2021-09-01,55.000000,55.150002,54.799999,54.295189,12647300,VZ,2,-0.246229,55.496104,53.821636,44.243162,-102.777095,25.659221,54.825886,55.061444
92933,2021-09-01,50.730000,50.810001,49.480000,50.290001,5212400,WBA,2,0.490625,50.485316,46.328237,54.782463,155.821606,20.132924,47.760327,48.719896


We see that the FeatureEngineer has added some features:

* macd
* boll_ub (upper Bollinger Band)
* boll_lb (lower Bollinger Band)
* rsi_30 (with a lookback of 30)
* cci_30 (with a lookback of 30)
* dx_30 (with a lookback of 30)
* close_30_sma (close price simple moving average with a lookback of 30)
* close_60_sma (close price simple moving average with a lookback of 60)

In [None]:
# hide
# on stockstats library: (seems like FeatureEngineer makes use of it)
# https://medium.com/codex/this-python-library-will-help-you-get-stock-technical-indicators-in-one-line-of-code-c11ed2c8e45f

#### 3.2.2 Add covariance matrix as a feature

Adding the portfolio's covariance matrix as a feature has some advantages. It can be used to quantify the risk (standard deviation) associated with a portfolio.
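
For a weight vector $w$ and a covariance matrix $\Sigma$ of asset returns, the portfolio variance is $w^\top \Sigma w$ and the risk is its square root. The sketch below works this out for two hypothetical assets; the matrix entries and the function name `portfolio_volatility` are made up for illustration.

```python
import math

def portfolio_volatility(weights, cov):
    """Return sqrt(w^T * cov * w) for a list of weights and a square matrix."""
    n = len(weights)
    variance = sum(
        weights[i] * cov[i][j] * weights[j]
        for i in range(n)
        for j in range(n)
    )
    return math.sqrt(variance)

# Hypothetical daily-return covariance matrix for two assets.
cov = [
    [0.0004, 0.0001],
    [0.0001, 0.0009],
]
equal_weights = [0.5, 0.5]
volatility = portfolio_volatility(equal_weights, cov)
print(volatility)
```

Here the equally weighted portfolio has a volatility of about 0.0194, lower than either asset on its own (0.02 and 0.03). This diversification effect is the kind of information the covariance matrix makes available to the agent.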

In [15]:
df = df.sort_values(['date','tic'], ignore_index=True)

In [16]:
df

Unnamed: 0,date,open,high,low,close,volume,tic,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma
0,2008-01-02,7.116786,7.152143,6.876786,5.966037,1079178800,AAPL,2,0.000000,5.971309,5.963519,100.000000,-66.666667,100.000000,5.966037,5.966037
1,2008-01-02,46.599998,47.040001,46.259998,36.337456,7934400,AMGN,2,0.000000,5.971309,5.963519,100.000000,-66.666667,100.000000,36.337456,36.337456
2,2008-01-02,52.090000,52.320000,50.790001,40.694550,8053700,AXP,2,0.000000,5.971309,5.963519,100.000000,-66.666667,100.000000,40.694550,40.694550
3,2008-01-02,87.570000,87.839996,86.000000,63.481632,4303000,BA,2,0.000000,5.971309,5.963519,100.000000,-66.666667,100.000000,63.481632,63.481632
4,2008-01-02,72.559998,72.669998,70.050003,47.633057,6337800,CAT,2,0.000000,5.971309,5.963519,100.000000,-66.666667,100.000000,47.633057,47.633057
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96371,2021-09-01,159.429993,160.229996,158.210007,158.397003,1133800,TRV,2,2.126652,163.483054,150.694548,55.446787,62.496692,22.987009,154.028030,152.401430
96372,2021-09-01,416.540009,419.869995,411.239990,415.890198,2034400,UNH,2,1.544983,428.508670,402.801383,53.065126,-4.030115,15.845681,415.105289,409.407493
96373,2021-09-01,55.000000,55.150002,54.799999,54.295189,12647300,VZ,2,-0.246229,55.496104,53.821636,44.243162,-102.777095,25.659221,54.825886,55.061444
96374,2021-09-01,50.730000,50.810001,49.480000,50.290001,5212400,WBA,2,0.490625,50.485316,46.328237,54.782463,155.821606,20.132924,47.760327,48.719896


In [17]:
df.index = df.date.factorize()[0]  # all rows for the same date now share one integer index

In [18]:
df

Unnamed: 0,date,open,high,low,close,volume,tic,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma
0,2008-01-02,7.116786,7.152143,6.876786,5.966037,1079178800,AAPL,2,0.000000,5.971309,5.963519,100.000000,-66.666667,100.000000,5.966037,5.966037
0,2008-01-02,46.599998,47.040001,46.259998,36.337456,7934400,AMGN,2,0.000000,5.971309,5.963519,100.000000,-66.666667,100.000000,36.337456,36.337456
0,2008-01-02,52.090000,52.320000,50.790001,40.694550,8053700,AXP,2,0.000000,5.971309,5.963519,100.000000,-66.666667,100.000000,40.694550,40.694550
0,2008-01-02,87.570000,87.839996,86.000000,63.481632,4303000,BA,2,0.000000,5.971309,5.963519,100.000000,-66.666667,100.000000,63.481632,63.481632
0,2008-01-02,72.559998,72.669998,70.050003,47.633057,6337800,CAT,2,0.000000,5.971309,5.963519,100.000000,-66.666667,100.000000,47.633057,47.633057
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3441,2021-09-01,159.429993,160.229996,158.210007,158.397003,1133800,TRV,2,2.126652,163.483054,150.694548,55.446787,62.496692,22.987009,154.028030,152.401430
3441,2021-09-01,416.540009,419.869995,411.239990,415.890198,2034400,UNH,2,1.544983,428.508670,402.801383,53.065126,-4.030115,15.845681,415.105289,409.407493
3441,2021-09-01,55.000000,55.150002,54.799999,54.295189,12647300,VZ,2,-0.246229,55.496104,53.821636,44.243162,-102.777095,25.659221,54.825886,55.061444
3441,2021-09-01,50.730000,50.810001,49.480000,50.290001,5212400,WBA,2,0.490625,50.485316,46.328237,54.782463,155.821606,20.132924,47.760327,48.719896


In [21]:
# hide
# len(df.index.unique())
# range(lookback, len(df.index.unique()))
lst = [i for i in range(LOOKBACK, len(df.index.unique()))]
lst[:20], lst[-1]

([252,
  253,
  254,
  255,
  256,
  257,
  258,
  259,
  260,
  261,
  262,
  263,
  264,
  265,
  266,
  267,
  268,
  269,
  270,
  271],
 3441)

In [22]:
%%time
cov_list = []
return_list = []
for i in range(LOOKBACK, len(df.index.unique())):
  data_lookback = df.loc[i-LOOKBACK:i, :]
  price_lookback = data_lookback.pivot_table(index='date', columns='tic', values='close')
  return_lookback = price_lookback.pct_change().dropna()
  return_list.append(return_lookback)
  covs = return_lookback.cov().values 
  cov_list.append(covs)

CPU times: user 1min 23s, sys: 1min 10s, total: 2min 34s
Wall time: 1min 19s
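
The loop above builds one covariance matrix per trading day from a trailing window of returns. The same idea is sketched below in plain Python on a toy price history, so it can be followed without pandas. The tickers `AAA` and `BBB`, the prices, and the window of 3 days are all illustrative. The sample covariance uses an n - 1 denominator, which is also the default of pandas' `DataFrame.cov()`.

```python
def pct_changes(prices):
    """Simple returns between consecutive prices."""
    return [b / a - 1 for a, b in zip(prices, prices[1:])]

def sample_cov(xs, ys):
    """Sample covariance with an n - 1 denominator."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Toy closing prices for two hypothetical tickers (not market data).
closes = {
    "AAA": [10.0, 10.2, 10.1, 10.4, 10.3, 10.6],
    "BBB": [50.0, 49.5, 50.5, 50.0, 51.0, 50.8],
}
lookback = 3                      # the notebook uses 252 trading days
tickers = sorted(closes)
n_days = len(closes["AAA"])

cov_by_day = []
for i in range(lookback, n_days):
    # Prices for days i - lookback .. i inclusive give `lookback` returns.
    returns = {t: pct_changes(closes[t][i - lookback:i + 1]) for t in tickers}
    cov = [[sample_cov(returns[a], returns[b]) for b in tickers] for a in tickers]
    cov_by_day.append(cov)

print(len(cov_by_day))            # one matrix per day from index `lookback` on
```

As in the notebook, the first `lookback` days have no matrix because there is not yet a full window of history behind them.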


In [23]:
len(df['date'].unique()), len(df['date'].unique()[LOOKBACK:])

(3442, 3190)

In [26]:
len(cov_list), len(return_list)

(3190, 3190)

In [79]:
# hide
# cov_list[0]
# return_list[:1]

In [27]:
# 
# form a dataframe with the cov_list and return_list
df_cov = pd.DataFrame({'date':df.date.unique()[LOOKBACK:], 'cov_list':cov_list, 'return_list':return_list})
df_cov

Unnamed: 0,date,cov_list,return_list
0,2008-12-31,"[[0.00134896721940808, 0.0004284132636499008, ...",tic AAPL AMGN AXP ... ...
1,2009-01-02,"[[0.0013661487263709707, 0.0004339387269211268...",tic AAPL AMGN AXP ... ...
2,2009-01-05,"[[0.0013520199451860352, 0.0004294713932227298...",tic AAPL AMGN AXP ... ...
3,2009-01-06,"[[0.0013523434523507877, 0.0004313717163616539...",tic AAPL AMGN AXP ... ...
4,2009-01-07,"[[0.001349260814510138, 0.00043429999408710015...",tic AAPL AMGN AXP ... ...
...,...,...,...
3185,2021-08-26,"[[0.0003928453300803985, 9.896438258199282e-05...",tic AAPL AMGN AXP ... ...
3186,2021-08-27,"[[0.0003923489896636871, 9.967088913301946e-05...",tic AAPL AMGN AXP ... ...
3187,2021-08-30,"[[0.000395776633559146, 0.00010042885429991909...",tic AAPL AMGN AXP ... ...
3188,2021-08-31,"[[0.0003917984085520445, 0.0001000497334010198...",tic AAPL AMGN AXP ... ...


In [28]:
# 
# merge df_cov with the main dataframe
df = df.merge(df_cov, on='date')
df

Unnamed: 0,date,open,high,low,close,volume,tic,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,cov_list,return_list
0,2008-12-31,3.070357,3.133571,3.047857,2.613433,607541200,AAPL,2,-0.083547,3.128999,2.482334,42.254781,-80.488215,16.129793,2.780976,2.894368,"[[0.00134896721940808, 0.0004284132636499008, ...",tic AAPL AMGN AXP ... ...
1,2008-12-31,57.110001,58.220001,57.060001,45.031933,6287200,AMGN,2,0.168717,45.965627,43.970344,51.060623,51.616207,10.432018,44.190812,43.701895,"[[0.00134896721940808, 0.0004284132636499008, ...",tic AAPL AMGN AXP ... ...
2,2008-12-31,17.969999,18.750000,17.910000,14.988312,9625600,AXP,2,-0.961734,19.168084,13.014357,42.554864,-75.368734,25.776759,16.184140,18.108784,"[[0.00134896721940808, 0.0004284132636499008, ...",tic AAPL AMGN AXP ... ...
3,2008-12-31,41.590000,43.049999,41.500000,32.005878,5443100,BA,2,-0.279801,32.174382,28.867831,47.440223,156.994494,5.366299,30.327212,32.389915,"[[0.00134896721940808, 0.0004284132636499008, ...",tic AAPL AMGN AXP ... ...
4,2008-12-31,43.700001,45.099998,43.700001,30.925043,6277400,CAT,2,0.684759,31.697350,26.587392,51.205311,98.425695,26.331746,27.876155,27.598371,"[[0.00134896721940808, 0.0004284132636499008, ...",tic AAPL AMGN AXP ... ...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
89315,2021-09-01,159.429993,160.229996,158.210007,158.397003,1133800,TRV,2,2.126652,163.483054,150.694548,55.446787,62.496692,22.987009,154.028030,152.401430,"[[0.00038578149289071194, 0.000101384676644404...",tic AAPL AMGN AXP ... ...
89316,2021-09-01,416.540009,419.869995,411.239990,415.890198,2034400,UNH,2,1.544983,428.508670,402.801383,53.065126,-4.030115,15.845681,415.105289,409.407493,"[[0.00038578149289071194, 0.000101384676644404...",tic AAPL AMGN AXP ... ...
89317,2021-09-01,55.000000,55.150002,54.799999,54.295189,12647300,VZ,2,-0.246229,55.496104,53.821636,44.243162,-102.777095,25.659221,54.825886,55.061444,"[[0.00038578149289071194, 0.000101384676644404...",tic AAPL AMGN AXP ... ...
89318,2021-09-01,50.730000,50.810001,49.480000,50.290001,5212400,WBA,2,0.490625,50.485316,46.328237,54.782463,155.821606,20.132924,47.760327,48.719896,"[[0.00038578149289071194, 0.000101384676644404...",tic AAPL AMGN AXP ... ...


In [29]:
df = df.sort_values(['date','tic']).reset_index(drop=True)

In [30]:
df

Unnamed: 0,date,open,high,low,close,volume,tic,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,cov_list,return_list
0,2008-12-31,3.070357,3.133571,3.047857,2.613433,607541200,AAPL,2,-0.083547,3.128999,2.482334,42.254781,-80.488215,16.129793,2.780976,2.894368,"[[0.00134896721940808, 0.0004284132636499008, ...",tic AAPL AMGN AXP ... ...
1,2008-12-31,57.110001,58.220001,57.060001,45.031933,6287200,AMGN,2,0.168717,45.965627,43.970344,51.060623,51.616207,10.432018,44.190812,43.701895,"[[0.00134896721940808, 0.0004284132636499008, ...",tic AAPL AMGN AXP ... ...
2,2008-12-31,17.969999,18.750000,17.910000,14.988312,9625600,AXP,2,-0.961734,19.168084,13.014357,42.554864,-75.368734,25.776759,16.184140,18.108784,"[[0.00134896721940808, 0.0004284132636499008, ...",tic AAPL AMGN AXP ... ...
3,2008-12-31,41.590000,43.049999,41.500000,32.005878,5443100,BA,2,-0.279801,32.174382,28.867831,47.440223,156.994494,5.366299,30.327212,32.389915,"[[0.00134896721940808, 0.0004284132636499008, ...",tic AAPL AMGN AXP ... ...
4,2008-12-31,43.700001,45.099998,43.700001,30.925043,6277400,CAT,2,0.684759,31.697350,26.587392,51.205311,98.425695,26.331746,27.876155,27.598371,"[[0.00134896721940808, 0.0004284132636499008, ...",tic AAPL AMGN AXP ... ...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
89315,2021-09-01,159.429993,160.229996,158.210007,158.397003,1133800,TRV,2,2.126652,163.483054,150.694548,55.446787,62.496692,22.987009,154.028030,152.401430,"[[0.00038578149289071194, 0.000101384676644404...",tic AAPL AMGN AXP ... ...
89316,2021-09-01,416.540009,419.869995,411.239990,415.890198,2034400,UNH,2,1.544983,428.508670,402.801383,53.065126,-4.030115,15.845681,415.105289,409.407493,"[[0.00038578149289071194, 0.000101384676644404...",tic AAPL AMGN AXP ... ...
89317,2021-09-01,55.000000,55.150002,54.799999,54.295189,12647300,VZ,2,-0.246229,55.496104,53.821636,44.243162,-102.777095,25.659221,54.825886,55.061444,"[[0.00038578149289071194, 0.000101384676644404...",tic AAPL AMGN AXP ... ...
89318,2021-09-01,50.730000,50.810001,49.480000,50.290001,5212400,WBA,2,0.490625,50.485316,46.328237,54.782463,155.821606,20.132924,47.760327,48.719896,"[[0.00038578149289071194, 0.000101384676644404...",tic AAPL AMGN AXP ... ...


### 3.3 Modeling
The portfolio within the market will be modeled by the OpenAI Gym framework. This is referred to as the *environment*. For the *agent*, we will use the FinRL framework. As the agent interacts with the environment it will gradually learn a trading strategy based on the reward function. The agent is rewarded according to the total value of the portfolio.
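
The interaction pattern between agent and environment can be sketched without Gym or FinRL. `ToyPortfolioEnv` below is a hypothetical, heavily simplified stand-in for the real environment: it follows the `reset()` / `step()` convention, accepts portfolio weights as the action, and returns the log return as the reward. A fixed equal-weight policy takes the place of a learning agent, and the prices are made up.

```python
import math

class ToyPortfolioEnv:
    """Hypothetical two-asset environment with a Gym-like reset/step API."""

    def __init__(self, prices, initial_amount=1_000_000):
        self.prices = prices              # list of [price_a, price_b] per day
        self.initial_amount = initial_amount

    def reset(self):
        self.day = 0
        self.value = self.initial_amount
        return self.prices[self.day]

    def step(self, weights):
        previous, current = self.prices[self.day], self.prices[self.day + 1]
        # Growth factor of the portfolio: weighted sum of price ratios.
        growth = sum(w * c / p for w, c, p in zip(weights, current, previous))
        self.value *= growth
        self.day += 1
        done = self.day == len(self.prices) - 1
        return self.prices[self.day], math.log(growth), done

prices = [[10.0, 20.0], [11.0, 20.0], [11.0, 19.0], [12.1, 19.0]]
env = ToyPortfolioEnv(prices)
state, done, total_reward = env.reset(), False, 0.0
while not done:
    action = [0.5, 0.5]                   # fixed policy standing in for the agent
    state, reward, done = env.step(action)
    total_reward += reward

print(round(env.value, 2), total_reward)
```

A trained agent would replace the fixed `action` with weights chosen from the observed state, and a real Gym environment would also return an `info` dictionary from `step()`.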


#### 3.3.1 Training data

In [37]:
TRAIN_START, TRADE_START

('2009-01-01', '2020-06-30')

In [36]:
train = data_split(df, TRAIN_START, TRADE_START)
train

Unnamed: 0,date,open,high,low,close,volume,tic,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,cov_list,return_list
0,2009-01-02,3.067143,3.251429,3.041429,2.778781,746015200,AAPL,4,-0.070954,3.115322,2.480241,45.440189,-32.098565,2.140064,2.781833,2.895179,"[[0.0013661487263709707, 0.0004339387269211268...",tic AAPL AMGN AXP ... ...
0,2009-01-02,58.590000,59.080002,57.750000,45.998848,6547900,AMGN,4,0.249877,46.122330,43.932167,52.756857,93.201882,0.814217,44.259692,43.796767,"[[0.0013661487263709707, 0.0004339387269211268...",tic AAPL AMGN AXP ... ...
0,2009-01-02,18.570000,19.520000,18.400000,15.618539,10955700,AXP,4,-0.855306,18.979336,12.997873,43.957549,-42.761378,16.335101,16.182793,17.988662,"[[0.0013661487263709707, 0.0004339387269211268...",tic AAPL AMGN AXP ... ...
0,2009-01-02,42.799999,45.560001,42.779999,33.941086,7010200,BA,4,-0.002010,32.948623,28.452128,50.822019,272.812545,20.494464,30.469477,32.344131,"[[0.0013661487263709707, 0.0004339387269211268...",tic AAPL AMGN AXP ... ...
0,2009-01-02,44.910000,46.980000,44.709999,32.475796,7117200,CAT,4,0.870226,32.221773,26.565579,53.661249,129.733673,34.637448,28.123536,27.598979,"[[0.0013661487263709707, 0.0004339387269211268...",tic AAPL AMGN AXP ... ...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2891,2020-06-29,112.389999,113.419998,110.889999,109.515297,1161200,TRV,0,1.979405,121.852482,103.617419,52.214112,11.966602,5.469698,107.830948,101.884511,"[[0.0006532274198047049, 0.0004064266826074971...",tic AAPL AMGN AXP ... ...
2891,2020-06-29,288.450012,292.279999,286.619995,284.474396,2356700,UNH,0,-0.260890,306.917214,272.829835,50.851001,-60.105349,8.757684,289.721983,281.664996,"[[0.0006532274198047049, 0.0004064266826074971...",tic AAPL AMGN AXP ... ...
2891,2020-06-29,53.310001,54.709999,53.310001,51.192635,15930300,VZ,0,-0.479121,55.253466,50.014836,46.997136,-94.749134,16.640620,52.273772,52.703160,"[[0.0006532274198047049, 0.0004064266826074971...",tic AAPL AMGN AXP ... ...
2891,2020-06-29,41.439999,42.639999,41.380001,40.211544,5225900,WBA,0,-0.100693,44.020085,37.703802,48.712778,-17.069637,1.500723,40.317714,40.140577,"[[0.0006532274198047049, 0.0004064266826074971...",tic AAPL AMGN AXP ... ...


#### 3.3.2 Portfolio (Environment)

In [39]:
# class StockPortfolioEnv(gym.Env):
class Portfolio(gym.Env):
    """A portfolio/market environment
    Attributes
    ----------
        df: DataFrame
            input data
        stock_dim : int
            number of unique stocks
        hmax : int
            maximum number of shares to trade
        initial_amount : int
            start money
        transaction_cost_pct: float
            transaction cost percentage per trade
        reward_scaling: float
            scaling factor for reward, good for training
        state_space: int
            the dimension of input features
        action_space: int
            equals stock dimension
        tech_indicator_list: list
            a list of technical indicator names
        turbulence_threshold: int
            a threshold to control risk aversion
        day: int
            an increment number to control date
    Methods
    -------
    _sell_stock()
        perform sell action based on the sign of the action
    _buy_stock()
        perform buy action based on the sign of the action
    step()
        at each step the agent will return actions, then 
        we will calculate the reward, and return the next observation.
    reset()
        reset the environment
    render()
        use render to return other functions
    save_asset_memory()
        return account value at each time step
    save_action_memory()
        return actions/positions at each time step
    """
    metadata = {'render.modes': ['human']}

    def __init__(self, 
                df,
                stock_dim,
                hmax,
                initial_amount,
                transaction_cost_pct,
                reward_scaling,
                state_space,
                action_space,
                tech_indicator_list,
                turbulence_threshold=None,
                lookback=LOOKBACK,
                day=0):
        #super(StockEnv, self).__init__()
        #money = 10 , scope = 1
        self.day = day
        self.lookback = lookback
        self.df = df
        self.stock_dim = stock_dim
        self.hmax = hmax
        self.initial_amount = initial_amount
        self.transaction_cost_pct = transaction_cost_pct
        self.reward_scaling = reward_scaling
        self.state_space = state_space
        self.action_space = action_space
        self.tech_indicator_list = tech_indicator_list

        # action_space normalization and shape is self.stock_dim
        self.action_space = spaces.Box(low=0, high=1, shape=(self.action_space,)) 
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(self.state_space + len(self.tech_indicator_list), self.state_space))

        # load data from a pandas dataframe
        self.data = self.df.loc[self.day, :]
        self.covs = self.data['cov_list'].values[0]
        self.state =  np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list ], axis=0)
        self.terminal = False
        self.turbulence_threshold = turbulence_threshold
        # initialize state: initial portfolio return + individual stock return + individual weights
        self.portfolio_value = self.initial_amount

        # memorize portfolio value each step
        self.asset_memory = [self.initial_amount]
        # memorize portfolio return each step
        self.portfolio_return_memory = [0]
        self.actions_memory = [[1/self.stock_dim]*self.stock_dim]
        self.date_memory = [self.data.date.unique()[0]]

    def step(self, actions):
        self.terminal = self.day >= len(self.df.index.unique()) - 1

        if self.terminal:
            df = pd.DataFrame(self.portfolio_return_memory)
            df.columns = ['daily_return']
            plt.plot(df.daily_return.cumsum(), 'r')
            plt.savefig('results/cumulative_reward.png')
            plt.close()
            
            plt.plot(self.portfolio_return_memory, 'r')
            plt.savefig('results/rewards.png')
            plt.close()

            print("=================================")
            print("begin_total_asset:{}".format(self.asset_memory[0]))           
            print("end_total_asset:{}".format(self.portfolio_value))

            df_daily_return = pd.DataFrame(self.portfolio_return_memory)
            df_daily_return.columns = ['daily_return']
            if df_daily_return['daily_return'].std() !=0:
              sharpe = (252**0.5)*df_daily_return['daily_return'].mean()/ \
                       df_daily_return['daily_return'].std()
              print("Sharpe: ",sharpe)
            print("=================================")
            
            return self.state, self.reward, self.terminal, {}
        else:
            weights = self.softmax_normalization(actions) 
            self.actions_memory.append(weights)
            last_day_memory = self.data

            #load next state
            self.day += 1
            self.data = self.df.loc[self.day,:]
            self.covs = self.data['cov_list'].values[0]
            self.state =  np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list ], axis=0)
            portfolio_return = sum(((self.data.close.values / last_day_memory.close.values)-1)*weights)
            log_portfolio_return = np.log(sum((self.data.close.values / last_day_memory.close.values)*weights))
            # update portfolio value
            new_portfolio_value = self.portfolio_value*(1+portfolio_return)
            self.portfolio_value = new_portfolio_value

            # save into memory
            self.portfolio_return_memory.append(portfolio_return)
            self.date_memory.append(self.data.date.unique()[0])            
            self.asset_memory.append(new_portfolio_value)

            # the reward is the new portfolio value or end portfolio value
            self.reward = new_portfolio_value
        return self.state, self.reward, self.terminal, {}

    def reset(self):
        self.asset_memory = [self.initial_amount]
        self.day = 0
        self.data = self.df.loc[self.day,:]
        # load states
        self.covs = self.data['cov_list'].values[0]
        self.state =  np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list ], axis=0)
        self.portfolio_value = self.initial_amount
        #self.cost = 0
        #self.trades = 0
        self.terminal = False 
        self.portfolio_return_memory = [0]
        self.actions_memory=[[1/self.stock_dim]*self.stock_dim]
        self.date_memory=[self.data.date.unique()[0]] 
        return self.state
    
    def render(self, mode='human'):
        return self.state
        
    def softmax_normalization(self, actions):
        # subtract the max before exponentiating for numerical stability;
        # the resulting weights are mathematically unchanged
        exps = np.exp(actions - np.max(actions))
        return exps / np.sum(exps)

    def save_asset_memory(self):
        date_list = self.date_memory
        portfolio_return = self.portfolio_return_memory
        #print(len(date_list))
        #print(len(asset_list))
        df_account_value = pd.DataFrame({'date':date_list,'daily_return':portfolio_return})
        return df_account_value

    def save_action_memory(self):
        # date and close price length must match actions length
        date_list = self.date_memory
        df_date = pd.DataFrame(date_list)
        df_date.columns = ['date']
        
        action_list = self.actions_memory
        df_actions = pd.DataFrame(action_list)
        df_actions.columns = self.data.tic.values
        df_actions.index = df_date.date
        #df_actions = pd.DataFrame({'date':date_list,'actions':action_list})
        return df_actions

    def _seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]

    def get_sb_env(self):
        e = DummyVecEnv([lambda: self])
        obs = e.reset()
        return e, obs
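
The value update inside `step()` can be checked by hand on a tiny example. The sketch below reproduces only that arithmetic (weighted simple return, then compounding) in plain Python; the function name `portfolio_step`, the two-asset prices, and the weights are made up for illustration and are not part of the environment.

```python
def portfolio_step(value, prev_close, close, weights):
    """Apply one day's price move to the portfolio value.

    Mirrors the arithmetic in step(): per-asset simple returns are
    weighted and summed, then the portfolio value is compounded.
    """
    simple_returns = [c / p - 1 for c, p in zip(close, prev_close)]
    portfolio_return = sum(r * w for r, w in zip(simple_returns, weights))
    return value * (1 + portfolio_return), portfolio_return


# Hypothetical two-asset day: asset 1 gains 10%, asset 2 loses 2%.
new_value, ret = portfolio_step(
    value=1_000_000,
    prev_close=[100.0, 50.0],
    close=[110.0, 49.0],
    weights=[0.5, 0.5],
)
print(round(ret, 6), round(new_value, 2))
```

With equal weights the portfolio return is the average of the two asset returns, 0.5 * 0.10 + 0.5 * (-0.02) = 0.04, so the value grows by 4%.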

In [45]:
# hide
# was 29 in the original notebook
stock_dimension = len(train['tic'].unique())
state_space = stock_dimension
print(f"Stock Dimension: {stock_dimension}, State Space: {state_space}")

Stock Dimension: 28, State Space: 28


In [46]:
# hide
config.TECHNICAL_INDICATORS_LIST

['macd',
 'boll_ub',
 'boll_lb',
 'rsi_30',
 'cci_30',
 'dx_30',
 'close_30_sma',
 'close_60_sma']

In [55]:
env_kwargs = {
    "hmax": 100, 
    "initial_amount": INITIAL_AMOUNT, 
    "transaction_cost_pct": TRANSACTION_COST_PCT, 
    "state_space": state_space, 
    "stock_dim": stock_dimension, 
    "tech_indicator_list": config.TECHNICAL_INDICATORS_LIST, 
    "action_space": stock_dimension, 
    "reward_scaling": REWARD_SCALING,
}
# e_train_gym = StockPortfolioEnv(df=train, **env_kwargs)
e_train_gym = Portfolio(df=train, **env_kwargs)

In [56]:
env_train, _ = e_train_gym.get_sb_env()
print(type(env_train))

<class 'stable_baselines3.common.vec_env.dummy_vec_env.DummyVecEnv'>


#### 3.3.3 Portfolio Manager (Agent)
* We investigate the performance of two models for the *agent*:

  * A2C
  * PPO

Both models use the algorithm implementations in the *Stable Baselines3* library, the successor to the *OpenAI Baselines* and *Stable Baselines* libraries.

##### 3.3.3.1 A2C


In [57]:
agent = DRLAgent(env=env_train)
A2C_PARAMS = {
  "n_steps": 10, 
  "ent_coef": 0.005, 
  "learning_rate": 0.0004,
}
model_a2c = agent.get_model(model_name="a2c", model_kwargs=A2C_PARAMS)
model_a2c

{'n_steps': 10, 'ent_coef': 0.005, 'learning_rate': 0.0004}
Using cuda device


<stable_baselines3.a2c.a2c.A2C at 0x7fb1c057a510>

In [58]:
%%time
# train A2C agent
# trained_a2c = agent.train_model(model=model_a2c, tb_log_name='a2c', total_timesteps=40000)
trained_a2c = agent.train_model(model=model_a2c, tb_log_name='a2c', total_timesteps=20_000)

Logging to tensorboard_log/a2c/a2c_1
-------------------------------------
| time/                 |           |
|    fps                | 82        |
|    iterations         | 100       |
|    time_elapsed       | 12        |
|    total_timesteps    | 1000      |
| train/                |           |
|    entropy_loss       | -39.6     |
|    explained_variance | 0         |
|    learning_rate      | 0.0004    |
|    n_updates          | 99        |
|    policy_loss        | 4.31e+08  |
|    reward             | 2037159.8 |
|    std                | 0.997     |
|    value_loss         | 1.45e+14  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 121       |
|    iterations         | 200       |
|    time_elapsed       | 16        |
|    total_timesteps    | 2000      |
| train/                |           |
|    entropy_loss       | -39.6     |
|    explained_variance | 1.79e-07  |
|    learning

##### 3.3.3.2 PPO

In [59]:
agent = DRLAgent(env = env_train)
PPO_PARAMS = {
  "n_steps": 2048,
  "ent_coef": 0.005,
  "learning_rate": 0.001,
  "batch_size": 128,
}
model_ppo = agent.get_model("ppo",model_kwargs = PPO_PARAMS)
model_ppo

{'n_steps': 2048, 'ent_coef': 0.005, 'learning_rate': 0.001, 'batch_size': 128}
Using cuda device


<stable_baselines3.ppo.ppo.PPO at 0x7fb145e6f390>

In [60]:
%%time
# train PPO agent
# trained_ppo = agent.train_model(model=model_ppo, tb_log_name='ppo', total_timesteps=40000)
trained_ppo = agent.train_model(model=model_ppo, tb_log_name='ppo', total_timesteps=20_000)

Logging to tensorboard_log/ppo/ppo_1
----------------------------------
| time/              |           |
|    fps             | 281       |
|    iterations      | 1         |
|    time_elapsed    | 7         |
|    total_timesteps | 2048      |
| train/             |           |
|    reward          | 4340690.0 |
----------------------------------
begin_total_asset:1000000
end_total_asset:6072898.409734788
Sharpe:  0.9517677390580576
-------------------------------------------
| time/                   |               |
|    fps                  | 252           |
|    iterations           | 2             |
|    time_elapsed         | 16            |
|    total_timesteps      | 4096          |
| train/                  |               |
|    approx_kl            | 8.6729415e-09 |
|    clip_fraction        | 0             |
|    clip_range           | 0.2           |
|    entropy_loss         | -39.7         |
|    explained_variance   | 0             |
|    learning_rate        | 0.00

### 3.4 Evaluation
We now evaluate both trained agents on the most recent data, which the training process has never seen. This step is also referred to as *back-testing* or simply *trading*. The start date of this data is held in the parameter `TRADE_START`.
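
To make the hold-out idea concrete, here is a minimal sketch of a date-based split in plain Python. It is not the `data_split` function used in this notebook; the helper name `split_by_date`, the half-open interval convention, the training start date, and the sample rows are all assumptions made for illustration.

```python
def split_by_date(rows, start, end):
    """Keep rows whose ISO date lies in the half-open interval [start, end).

    ISO-8601 date strings (YYYY-MM-DD) sort chronologically, so plain
    string comparison is enough here.
    """
    return [row for row in rows if start <= row["date"] < end]


# Hypothetical price rows; only the dates matter for the split.
rows = [
    {"date": "2020-06-29", "tic": "AAPL"},
    {"date": "2020-06-30", "tic": "AAPL"},
    {"date": "2020-07-01", "tic": "AAPL"},
    {"date": "2020-07-02", "tic": "AAPL"},
]

trade_start = "2020-07-01"
train_rows = split_by_date(rows, "2020-01-01", trade_start)
trade_rows = split_by_date(rows, trade_start, "2021-09-02")

# No date appears in both sets, so the agent is evaluated on unseen data.
train_dates = {r["date"] for r in train_rows}
trade_dates = {r["date"] for r in trade_rows}
print(train_dates & trade_dates)
```

Using the same boundary date as the exclusive end of training and the inclusive start of trading keeps the two sets disjoint without leaving a gap.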

In [63]:
TRADE_START

'2020-07-01'

In [100]:
# hide
env_kwargs

{'action_space': 28,
 'hmax': 100,
 'initial_amount': 1000000,
 'reward_scaling': 0.1,
 'state_space': 28,
 'stock_dim': 28,
 'tech_indicator_list': ['macd',
  'boll_ub',
  'boll_lb',
  'rsi_30',
  'cci_30',
  'dx_30',
  'close_30_sma',
  'close_60_sma'],
 'transaction_cost_pct': 0}

In [64]:
# trade = data_split(df,'2020-07-01', '2021-09-02')
trade = data_split(df, TRADE_START, DATA_END)
# e_trade_gym = StockPortfolioEnv(df=trade, **env_kwargs)
e_trade_gym = Portfolio(df=trade, **env_kwargs)
e_trade_gym

<__main__.Portfolio at 0x7fb145070e10>

In [65]:
baseline_df = get_baseline(
        ticker="^DJI", 
        start='2020-07-01',
        end='2021-09-01')

baseline_df_stats = backtest_stats(baseline_df, value_col_name = 'close')
baseline_returns = get_daily_return(baseline_df, value_col_name="close")

dji_cumpod =(baseline_returns + 1).cumprod() - 1

[*********************100%***********************]  1 of 1 completed
Shape of DataFrame:  (295, 8)
Annual return          0.311845
Cumulative returns     0.374034
Annual volatility      0.140762
Sharpe ratio           2.006165
Calmar ratio           3.491806
Stability              0.950106
Max drawdown          -0.089308
Omega ratio            1.397014
Sortino ratio          2.988706
Skew                        NaN
Kurtosis                    NaN
Tail ratio             1.094883
Daily value at risk   -0.016614
dtype: float64


In [68]:
df_daily_return_a2c, df_actions_a2c = DRLAgent.DRL_prediction(model=trained_a2c, environment = e_trade_gym)
df_daily_return_ppo, df_actions_ppo = DRLAgent.DRL_prediction(model=trained_ppo, environment = e_trade_gym)
time_ind = pd.Series(df_daily_return_a2c.date)
a2c_cumpod =(df_daily_return_a2c.daily_return + 1).cumprod() - 1
ppo_cumpod =(df_daily_return_ppo.daily_return + 1).cumprod() - 1
DRL_strat_a2c = convert_daily_return_to_pyfolio_ts(df_daily_return_a2c)
DRL_strat_ppo = convert_daily_return_to_pyfolio_ts(df_daily_return_ppo)

perf_func = timeseries.perf_stats
perf_stats_all_a2c = perf_func(returns=DRL_strat_a2c, factor_returns=DRL_strat_a2c, positions=None, transactions=None, turnover_denom="AGB")
perf_stats_all_ppo = perf_func(returns=DRL_strat_ppo, factor_returns=DRL_strat_ppo, positions=None, transactions=None, turnover_denom="AGB")

begin_total_asset:1000000
end_total_asset:1399571.9069731692
Sharpe:  2.157741096379009
hit end!
begin_total_asset:1000000
end_total_asset:1395976.243353623
Sharpe:  2.0377778142744063
hit end!
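
The `Sharpe` figures printed above come from the environment's terminal step, which annualizes the ratio of mean to standard deviation of the daily returns with a factor of sqrt(252) and no risk-free rate. The sketch below restates that formula in plain Python; the function name `annualized_sharpe` is hypothetical, and `statistics.stdev` is used because, like the pandas `.std()` default, it computes the sample standard deviation.

```python
import statistics


def annualized_sharpe(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio with a zero risk-free rate.

    Returns None when the standard deviation is zero, matching the
    environment, which skips the printout in that case.
    """
    sd = statistics.stdev(daily_returns)  # sample std (ddof=1)
    if sd == 0:
        return None
    return (periods_per_year ** 0.5) * statistics.mean(daily_returns) / sd


# Hypothetical daily returns, for illustration only.
example = [0.01, -0.005, 0.002, 0.007, -0.003]
print(annualized_sharpe(example))
```

Note that the environment's return memory starts with a 0 for the first day, so that entry is included in both the mean and the standard deviation.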


In [71]:
# hide
len(df_actions_a2c.columns)

28

#### 3.4.1 Inspect actions
For the sake of interest, we inspect some of the actions taken by the A2C agent. Each row holds the portfolio weights (the softmax-normalized actions) for one trading day.

In [73]:
# Inspect the actions taken by the A2C agent
df_actions_a2c

date,AAPL,AMGN,AXP,BA,CAT,CRM,CSCO,CVX,DIS,GS,HD,HON,IBM,INTC,JNJ,JPM,KO,MCD,MMM,MRK,MSFT,NKE,PG,TRV,UNH,VZ,WBA,WMT
2020-07-01,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714,0.035714
2020-07-02,0.024021,0.024021,0.024021,0.038833,0.031895,0.039480,0.036538,0.065297,0.024021,0.028713,0.024517,0.065297,0.065297,0.048245,0.024021,0.024021,0.034142,0.024021,0.024021,0.024021,0.024021,0.024021,0.065297,0.065297,0.024021,0.046597,0.024021,0.032278
2020-07-06,0.023105,0.062806,0.023105,0.023105,0.059748,0.062806,0.029078,0.062806,0.062806,0.023105,0.023105,0.023105,0.062806,0.023105,0.023105,0.023105,0.023105,0.023105,0.023105,0.023105,0.023105,0.023105,0.062806,0.041030,0.037308,0.062806,0.023105,0.023507
2020-07-07,0.023382,0.023382,0.023382,0.023382,0.023382,0.063559,0.063559,0.063559,0.023382,0.023382,0.023382,0.063559,0.063559,0.063559,0.023382,0.023382,0.027503,0.025064,0.025307,0.023382,0.023382,0.023382,0.063559,0.035947,0.039830,0.050707,0.023382,0.023382
2020-07-08,0.025965,0.070581,0.044694,0.025965,0.025965,0.070581,0.025965,0.070581,0.025965,0.025965,0.025965,0.025965,0.028193,0.025965,0.025965,0.042503,0.038366,0.025965,0.025965,0.025965,0.025965,0.025965,0.070581,0.025965,0.025965,0.070581,0.025965,0.025965
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2021-08-26,0.022128,0.060151,0.022128,0.060151,0.022128,0.022128,0.022128,0.028515,0.022128,0.022128,0.022128,0.022128,0.060151,0.022128,0.060151,0.060151,0.022128,0.022128,0.053942,0.022128,0.022128,0.022128,0.060151,0.060151,0.060151,0.060151,0.022128,0.022128
2021-08-27,0.024457,0.059850,0.024457,0.057124,0.066480,0.024457,0.037117,0.024457,0.042555,0.024457,0.024457,0.024457,0.054486,0.024457,0.024457,0.024457,0.024457,0.066480,0.024457,0.024457,0.032891,0.024457,0.024457,0.059408,0.024457,0.066480,0.024457,0.041368
2021-08-30,0.024208,0.065804,0.024208,0.024208,0.024208,0.024208,0.061345,0.024208,0.024208,0.024208,0.024208,0.059168,0.024208,0.024208,0.024208,0.065804,0.065804,0.024272,0.024208,0.024208,0.063400,0.024208,0.024208,0.024208,0.065452,0.024208,0.065804,0.027401
2021-08-31,0.025266,0.025266,0.025266,0.026724,0.047380,0.025266,0.068680,0.053696,0.025266,0.025266,0.025266,0.068680,0.031668,0.028940,0.051544,0.025266,0.043039,0.032060,0.025266,0.025266,0.025266,0.025266,0.048233,0.068680,0.029846,0.032170,0.040205,0.025266
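
Two patterns in the table follow from the softmax normalization. The first row (0.035714 everywhere) is the uniform weight 1/28 that the environment stores on reset. In later rows the largest weight is about 2.718 times the smallest (for example 0.065297 versus 0.024021 on 2020-07-02), which is what softmax produces when raw actions are confined to [0, 1]. The sketch below checks both in plain Python. That the raw actions are clipped to the `Box(0, 1)` bounds before they reach the environment is an assumption about the training wrapper, not something shown in this notebook.

```python
import math


def softmax(actions):
    """Softmax with the usual max-shift for numerical stability."""
    shift = max(actions)
    exps = [math.exp(a - shift) for a in actions]
    total = sum(exps)
    return [e / total for e in exps]


n = 28  # number of stocks in this notebook

# Equal raw actions give uniform weights of 1/28.
uniform = softmax([0.0] * n)
print(round(uniform[0], 6))

# One action at the upper bound (1) and the rest at the lower bound (0):
# the weight ratio reaches its maximum, exp(1 - 0) = e.
mixed = softmax([1.0] + [0.0] * (n - 1))
ratio = mixed[0] / mixed[1]
print(round(ratio, 5))
```

Under that assumption the agent cannot concentrate the portfolio in a few stocks: no weight can exceed e times any other weight.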


Here is a visualization of the A2C agent's actions specifically on AAPL:

In [92]:
fig = go.Figure()
fig.update_layout(width=900, height=600)
fig.add_trace(go.Scatter(x=time_ind, y=df_actions_a2c['AAPL'], mode='lines', name='AAPL A2C'))
# fig.add_trace(go.Scatter(x=time_ind, y=-df_actions_ppo['AAPL'], mode='lines', name='AAPL PPO'))
fig.update_layout(
    legend=dict(
        x=0,
        y=1,
        traceorder="normal",
        font=dict(
            family="sans-serif",
            size=20,
            color="black"
        ),
        bgcolor="White",
        bordercolor="white",
        borderwidth=2))
fig.update_layout(title={
        'text': "AAPL actions of A2C",
        'y': 0.87,
        'x': 0.48,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.update_layout(
    paper_bgcolor='rgba(1, 1, 0, 0)',
    plot_bgcolor='rgba(1, 1, 0, 0)',
    xaxis_title="Date",
    yaxis = dict(titlefont=dict(size=26), title="Daily Actions"),
    font=dict(size=15))
# fig.update_layout(font_size = 20)
fig.update_traces(line=dict(width=2))
fig.update_xaxes(showline=True, linecolor='black', showgrid=False, gridwidth=1, gridcolor='Black', mirror=True)
fig.update_yaxes(showline=True,linecolor='black', showgrid=False, gridwidth=1, gridcolor='Black', mirror=True)
fig.update_yaxes(zeroline=True, zerolinewidth=1, zerolinecolor='Grey')
fig.show()

#### 3.4.2 Inspect daily return

Here is a visualization of the daily returns of the portfolio under each agent:

In [93]:
fig = go.Figure()
fig.update_layout(width=900, height=600)
# fig.add_trace(go.Scatter(x=time_ind, y=df_actions_a2c['AAPL'], mode='lines', name='AAPL'))
fig.add_trace(go.Scatter(x=time_ind, y=DRL_strat_a2c, mode='lines', name='A2C'))
fig.add_trace(go.Scatter(x=time_ind, y=DRL_strat_ppo, mode='lines', name='PPO'))
fig.update_layout(
    legend=dict(
        x=0,
        y=1,
        traceorder="normal",
        font=dict(
            family="sans-serif",
            size=20,
            color="black"
        ),
        bgcolor="White",
        bordercolor="white",
        borderwidth=2))
fig.update_layout(title={
        'text': "Daily Return of A2C & PPO",
        'y': 0.87,
        'x': 0.48,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.update_layout(
    paper_bgcolor='rgba(1, 1, 0, 0)',
    plot_bgcolor='rgba(1, 1, 0, 0)',
    xaxis_title="Date",
    yaxis = dict(titlefont=dict(size=26), title="Daily Return"),
    font=dict(size=15))
# fig.update_layout(font_size = 20)
fig.update_traces(line=dict(width=2))
fig.update_xaxes(showline=True, linecolor='black', showgrid=False, gridwidth=1, gridcolor='Black', mirror=True)
fig.update_yaxes(showline=True,linecolor='black', showgrid=False, gridwidth=1, gridcolor='Black', mirror=True)
fig.update_yaxes(zeroline=True, zerolinewidth=1, zerolinecolor='Grey')
fig.show()

#### 3.4.3 Inspect cumulative return

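Finally, we inspect the cumulative return of the portfolio achieved by each agent, with the DJIA index as a baseline for reference. Both agents end the trading period with a larger cumulative return than the DJIA: about 40.0% for A2C and 39.6% for PPO (from the begin and end asset values printed above), versus 37.4% for the index.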

In [96]:
fig = go.Figure()
fig.update_layout(width=900, height=900)
fig.add_trace(go.Scatter(x=time_ind, y=dji_cumpod, mode='lines', name='DJIA', line=dict(color="#d3d3d3")))
fig.add_trace(go.Scatter(x=time_ind, y=a2c_cumpod, mode='lines', name='A2C'))
fig.add_trace(go.Scatter(x=time_ind, y=ppo_cumpod, mode='lines', name='PPO'))
fig.update_layout(
    legend=dict(
        x=0,
        y=1,
        traceorder="normal",
        font=dict(
            family="sans-serif",
            size=16,
            color="black"
        ),
        bgcolor="White",
        bordercolor="white",
        borderwidth=2))
fig.update_layout(title={
        'text': "Cumulative Return of A2C & PPO against DJIA",
        'y': 0.92,
        'x': 0.48,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.update_layout(
    paper_bgcolor='rgba(1, 1, 0, 0)',
    plot_bgcolor='rgba(1, 1, 0, 0)',
    xaxis_title="Date",
    yaxis = dict(titlefont=dict(size=26), title="Cumulative Return"),
    font=dict(size=15))
# fig.update_layout(font_size = 20)
fig.update_traces(line=dict(width=2))
fig.update_xaxes(showline=True, linecolor='black', showgrid=False, gridwidth=1, gridcolor='Black', mirror=True)
fig.update_yaxes(showline=True,linecolor='black', showgrid=False, gridwidth=1, gridcolor='Black', mirror=True)
fig.update_yaxes(zeroline=True, zerolinewidth=1, zerolinecolor='Grey')
fig.show()
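
The cumulative series plotted above are built with `(daily_return + 1).cumprod() - 1`. As a minimal sketch of what that expression computes, here is the same calculation in plain Python; the function name `cumulative_returns` and the sample returns are made up for illustration.

```python
from itertools import accumulate
from operator import mul


def cumulative_returns(daily_returns):
    """Compound daily simple returns into a cumulative return series."""
    growth = accumulate((1 + r for r in daily_returns), mul)
    return [g - 1 for g in growth]


# Hypothetical returns: flat, then +10%, then -5%.
series = cumulative_returns([0.0, 0.10, -0.05])
print([round(x, 4) for x in series])
```

Returns compound rather than add: +10% followed by -5% gives 1.10 * 0.95 - 1 = 4.5%, not 5%. The last element of the series equals the end portfolio value divided by the initial value, minus 1.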