# "Dynamic Portfolio Management with Deep Reinforcement Learning (Portfolio of Fidelity® Funds)"

> DRL is used to provide dynamic asset allocation for a portfolio
- toc: true
- branch: master
- badges: false
- comments: true
- hide: false
- search_exclude: true
- image: images/PortfolioManagement-FidelityFunds.png
- categories: [Investment_Industry,  Reinforcement_Learning,A2C,PPO,  OpenAI,Gym,finrl,stable_baselines3,pyfolio]
- show_tags: true

In [None]:
# hide
# Based on FinRL--explainable deep reinforcement learning for portfolio management--an empirical approach_ANNO.ipynb
# Remove the 'explainable' content - could not find a notebook where only the portfolio management tutorial is captured.
# Based on this tutorial:
# https://towardsdatascience.com/finrl-for-quantitative-finance-tutorial-for-portfolio-allocation-9b417660c7cd

## 1. Problem: Dynamic Asset Allocation

*Asset allocation* refers to the partitioning of available funds in a portfolio among various investment products. Investment products may be categorized in multiple ways. One common (and high level) classification contains three classes: 

* Stocks
* Bonds (fixed income)
* Cash and cash equivalents

Other classification systems subdivide these categories further, for example stocks into US and International stocks, or into small-cap, medium-cap, and large-cap stocks.

Another aspect of asset allocation concerns the underlying *strategy*. Some strategies are:

* Tactical asset allocation
* Strategic asset allocation
* Constant weight asset allocation
* Integrated asset allocation
* Insured asset allocation
* Dynamic asset allocation

We will focus on the last of these strategies: dynamic asset allocation. With this approach the combination of assets is adjusted at regular intervals to capitalize on the strengthening and weakening of the economy and the rise and fall of markets. This strategy depends on the decisions of a portfolio manager. In this project, however, we will attempt to replace the portfolio manager with an *AI agent*. The agent will be trained on past market behavior by means of *Deep Reinforcement Learning*.

For this Proof-Of-Concept (POC) project we will make things as simple as possible. The portfolio will consist of 9 Fidelity mutual funds. We will use the DJIA as a handy reference against which the performance of our agent's dynamic behavior can be compared. Our agent will have the opportunity to adjust the mix of these funds on a daily basis. In practice, a 401(k) agent, for example, might be set up to make monthly adjustments to reduce transaction costs or to avoid constraints related to trading. For simplicity, trading costs will not be taken into account for now.

We have selected a mutual fund from a number of fund categories:

* Large Value
  * Fidelity® Blue Chip Value Fund (FBCVX)

* Small/Mid Value
  * Fidelity® Mid-Cap Value Fund (FSMVX)

* Income-Oriented
  * Fidelity® Dividend Growth Fund (FDGFX)

* Large Blend
  * Fidelity® US Low Volatility Equity Fund (FULVX)

* Small/Mid Blend
  * Fidelity® Stock Selector Small-Cap Fund (FDSCX)

* Go-Anywhere
  * Fidelity® Capital Appreciation Fund (FDCAX)

* Large Growth
  * Fidelity® Growth Discovery Fund (FDSVX)

* Small/Mid Growth
  * Fidelity® Small-Cap Growth Fund (FCPGX)

* Diversifiers
  * Fidelity® Founders Fund (FIFNX)

Our agent will have $1,000,000 when the project starts.

To implement this POC we will make use of the FinRL framework, including its `YahooDownloader`, which acquires financial data from Yahoo Finance.

## 2. Solution Proposal  

Investment decisions are sequential by nature, and a decision that is optimal in the present may turn out not to be optimal over the longer term. The investment landscape adds further complexity: varying market conditions, disruptive political events, and other economic uncertainties. **Reinforcement Learning** is well suited to this kind of problem.

The solution requires the setup of a **digital twin** for the investor's portfolio. In RL terms this model of the portfolio is called an **environment**. The environment contains **states** which are modified by applying **actions** to it.

We will choose the following *state vector* (measured daily in our case) for the environment:

$$ \Large
\begin{aligned}
s_1 &= \text{Value of FBCVX holdings} \\
s_2 &= \text{Value of FSMVX holdings} \\
s_3 &= \text{Value of FDGFX holdings} \\
\text{...} \\
s_8 &= \text{Value of FCPGX holdings} \\
s_9 &= \text{Value of FIFNX holdings}
\end{aligned}
$$

The following *action vector* (applied daily in our case) will be set up to influence the environment/portfolio:

$$ \Large
\begin{aligned}
a_1 &= \text{Fraction to be invested in FBCVX this cycle} \\
a_2 &= \text{Fraction to be invested in FSMVX this cycle} \\
a_3 &= \text{Fraction to be invested in FDGFX this cycle} \\
\text{...} \\
a_8 &= \text{Fraction to be invested in FCPGX this cycle} \\
a_9 &= \text{Fraction to be invested in FIFNX this cycle}
\end{aligned}
$$

All action values are in the [0, 1] interval.
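Values in [0, 1] do not by themselves sum to 1, so they still have to be turned into portfolio weights. One common way to do this is a softmax normalization. The sketch below assumes that approach (this section does not state which normalization is used), and the function name `softmax_weights` is hypothetical.

```python
import math

def softmax_weights(actions):
    """Map raw action values to non-negative weights that sum to 1."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(actions)
    exps = [math.exp(a - peak) for a in actions]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw actions for three funds, each in [0, 1].
weights = softmax_weights([0.2, 0.9, 0.5])
print(weights, sum(weights))
```

A larger action value always receives a larger weight, and every fund keeps a strictly positive share of the portfolio.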

The *reward* *r* is given by

$$ \Large
\begin{aligned}
r(s,a,s') &= \log(v'/v)
\end{aligned}
$$

where *v* and *v'* are the portfolio values at states $s$ and $s'$ respectively.
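A small worked example may help to see why a log ratio is a convenient reward: log returns add up over time, so the sum of the daily rewards equals the log of the overall growth of the portfolio. The portfolio values below are made up for illustration, and `log_return` is a hypothetical helper name.

```python
import math

def log_return(value_before, value_after):
    """Reward for one step: log of the ratio of portfolio values."""
    return math.log(value_after / value_before)

# Hypothetical portfolio values on three consecutive days.
values = [1_000_000.0, 1_010_000.0, 999_900.0]
rewards = [log_return(v, v_next) for v, v_next in zip(values, values[1:])]

# Log returns are additive: their sum equals the log of the overall growth.
total_reward = sum(rewards)
overall = math.log(values[-1] / values[0])
print(rewards, total_reward, overall)
```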

The model of the portfolio/environment will have the following **parameters**:

$$ \Large 
\begin{aligned}
\theta_1 &= \text{Initial Amount} \\
\theta_2 &= \text{Transaction Cost Pct} \\
\theta_3 &= \text{Size of State and Action Spaces} \\
\theta_4 &= \text{Reward Scaling} \\
\theta_5 &= \text{Technical Indicator List}
\end{aligned}
$$

## 3. Implementation of the Solution

To implement the *environment* we use the OpenAI Gym tools. We will use the FinRL Python library to implement the *agent*. Two algorithms will be investigated for training the agent:

* Advantage Actor-Critic (A2C)
* Proximal Policy Optimization (PPO)

This implementation allows the agent to allocate the investor's funds among the Fidelity funds selected above. The goal is to maximize net worth at the end of the investment horizon.

The plotly library will be used for visualization. The code will run on the Google Colab platform. To start with, we install the python packages needed.

In [None]:
# hide
# install plotly and finrl library
!pip install plotly==4.4.1
!wget https://github.com/plotly/orca/releases/download/v1.2.1/orca-1.2.1-x86_64.AppImage -O /usr/local/bin/orca
!chmod +x /usr/local/bin/orca
!apt-get install xvfb libgtk2.0-0 libgconf-2-4
!pip install git+https://github.com/AI4Finance-LLC/FinRL-Library.git
!pip install PyPortfolioOpt


Import the packages needed:

In [None]:
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
matplotlib.use('Agg')
%matplotlib inline
import datetime

In [None]:
from finrl.apps import config
from finrl.neo_finrl.preprocessor.yahoodownloader import YahooDownloader
from finrl.neo_finrl.preprocessor.preprocessors import FeatureEngineer, data_split
from finrl.neo_finrl.env_portfolio_allocation.env_portfolio import StockPortfolioEnv
from finrl.drl_agents.stablebaselines3.models import DRLAgent
from finrl.plot import backtest_stats, backtest_plot, get_daily_return, get_baseline,convert_daily_return_to_pyfolio_ts

import gym
from gym.utils import seeding
from gym import spaces
from stable_baselines3.common.vec_env import DummyVecEnv

from pyfolio import timeseries
import plotly
import plotly.graph_objs as go

  'Module "zipline.assets" not found; multipliers will not be applied'


In [None]:
import sys
sys.path.append("../FinRL-Library")

In [None]:
# hide
pd.set_option('display.max_rows', 100)

Set up some directories:

In [None]:
import os
# create the working directories if they do not exist yet
for directory in (config.DATA_SAVE_DIR,
                  config.TRAINED_MODEL_DIR,
                  config.TENSORBOARD_LOG_DIR,
                  config.RESULTS_DIR):
    os.makedirs("./" + directory, exist_ok=True)

### 3.0 Parameters

In [None]:
DATA_START = '2008-01-01' 
TRAIN_START = '2009-01-01'
TRADE_START = '2020-07-01'
DATA_END = '2021-09-01'
LOOKBACK = 252 #trading days in one year
INITIAL_AMOUNT = 1_000_000 #dollars
TRANSACTION_COST_PCT = 0
REWARD_SCALING = 1e-1 #scaling factor applied to the reward signal

### 3.1 Download Data
We use the data from Yahoo Finance.


In [None]:
# hide
config

<module 'finrl.apps.config' from '/usr/local/lib/python3.7/dist-packages/finrl/apps/config.py'>

In [None]:
# hide
config.DOW_30_TICKER

['AXP',
 'AMGN',
 'AAPL',
 'BA',
 'CAT',
 'CSCO',
 'CVX',
 'GS',
 'HD',
 'HON',
 'IBM',
 'INTC',
 'JNJ',
 'KO',
 'JPM',
 'MCD',
 'MMM',
 'MRK',
 'MSFT',
 'NKE',
 'PG',
 'TRV',
 'UNH',
 'CRM',
 'VZ',
 'V',
 'WBA',
 'WMT',
 'DIS',
 'DOW']

In [None]:
# https://www.fidelity.com/mutual-funds/fidelity-funds/overview
my_tickers = [
  # Large Value
  'FBCVX', #Fidelity® Blue Chip Value Fund
  # 'FLVEX', #Fidelity® Large-Cap Value Enhanced Index Fund
  # 'FSLVX', #Fidelity® Stock Selector Large-Cap Value Fund
  # 'FVDFX', #Fidelity® Value Discovery Fund

  # Small/Mid Value
  # 'FLPSX', #Fidelity® Low-Priced Stock Fund
  'FSMVX', #Fidelity® Mid-Cap Value Fund
  # 'FDVLX', #Fidelity® Value Fund
  # 'FSLSX', #Fidelity® Value Strategies Fund
  # 'FCPVX', #Fidelity® Small-Cap Value Fund

  # Income-Oriented
  # 'FEQTX', #Fidelity® Equity Dividend Income Fund
  # 'FEQIX', #Fidelity® Equity-Income Fund
  # 'FGRIX', #Fidelity® Growth & Income Portfolio Fund
  'FDGFX', #Fidelity® Dividend Growth Fund

  # Large Blend
  # 'FSEBX', #Fidelity® Sustainability U.S. Equity Fund, NEW
  'FULVX', #Fidelity® US Low Volatility Equity Fund
  # 'FDEQX', #Fidelity® Disciplined Equity Fund
  # 'FLCEX', #Fidelity® Large-Cap Core Enhanced Index Fund
  # 'FLCSX', #Fidelity® Large-Cap Stock Fund
  # 'FGRTX', #Fidelity® Mega-Cap Stock Fund

  # Small/Mid Blend
  # 'FMEIX', #Fidelity® Mid-Cap Enhanced Index Fund
  # 'FCPEX', #Fidelity® Small-Cap Enhanced Index Fund
  # 'FSLCX', #Fidelity® Small-Cap Stock Fund
  'FDSCX', #Fidelity® Stock Selector Small-Cap Fund
  # 'FSCRX', #Fidelity® Small-Cap Discovery Fund

  # Go-Anywhere
  'FDCAX', #Fidelity® Capital Appreciation Fund
  # 'FCNTX', #Fidelity® Contrafund®
  # 'FMAGX', #Fidelity® Magellan® Fund
  # 'FMILX', #Fidelity® New Millennium Fund

  # Large Growth
  # 'FBGRX', #Fidelity® Blue Chip Growth Fund
  # 'FEXPX', #Fidelity® Export & Multinational Fund
  # 'FTQGX', #Fidelity® Focused Stock Fund
  # 'FFIDX', #Fidelity® Fund
  'FDSVX', #Fidelity® Growth Discovery Fund
  # 'FDGRX', #Fidelity® Growth Company Fund
  # 'FLGEX', #Fidelity® Large-Cap Growth Enhanced Index Fund
  # 'FOCPX', #Fidelity® OTC Portfolio
  # 'FDSSX', #Fidelity® Stock Selector All Cap Fund
  # 'FTRNX', #Fidelity® Trend Fund

  # Small/Mid Growth
  # 'FDEGX', #Fidelity® Growth Strategies Fund
  # 'FMCSX', #Fidelity® Mid-Cap Stock Fund
  # 'FSSMX', #Fidelity® Stock Selector Mid-Cap Fund
  'FCPGX', #Fidelity® Small-Cap Growth Fund

  # Diversifiers
  # 'FWOMX', #Fidelity® Women's Leadership Fund
  'FIFNX', #Fidelity® Founders Fund
  # 'FLVCX', #Fidelity® Leveraged Company Stock Fund
]

In [None]:
df = YahooDownloader(start_date = DATA_START,
                     end_date = DATA_END,
                     # ticker_list = config.DOW_30_TICKER).fetch_data()
                     ticker_list = my_tickers).fetch_data()

[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
Shape of DataFrame:  (25183, 8)


In [None]:
df

Unnamed: 0,date,open,high,low,close,volume,tic,day
0,2008-01-02,14.420000,14.420000,14.420000,11.721002,0,FBCVX,2
1,2008-01-02,15.660000,15.660000,15.660000,6.663240,0,FCPGX,2
2,2008-01-02,26.370001,26.370001,26.370001,11.040798,0,FDCAX,2
3,2008-01-02,28.930000,28.930000,28.930000,10.950327,0,FDGFX,2
4,2008-01-02,19.670000,19.670000,19.670000,11.192782,0,FDSCX,2
...,...,...,...,...,...,...,...,...
25178,2021-08-31,36.700001,36.700001,36.700001,36.700001,0,FDSCX,1
25179,2021-08-31,56.790001,56.790001,56.790001,56.790001,0,FDSVX,1
25180,2021-08-31,19.110001,19.110001,19.110001,19.110001,0,FIFNX,1
25181,2021-08-31,29.379999,29.379999,29.379999,29.379999,0,FSMVX,1


In [None]:
# hide
print(df['day'].unique())
# df.loc[100:150, 'day'] #assume day-of-week: 0 - 4

[2 3 4 0 1]


In [None]:
# Verify 9 unique tickers
lst = list(df['tic'].unique())
print(len(lst), lst)

9 ['FBCVX', 'FCPGX', 'FDCAX', 'FDGFX', 'FDSCX', 'FDSVX', 'FSMVX', 'FIFNX', 'FULVX']


### 3.2 Data Understanding and Preparation
We will keep showing snippets of the data set as it evolves to assist with understanding. There is a need to check for missing data and also to do some feature engineering. We rely on the FeatureEngineer class to take care of these needs. Some indicators used are:

* Moving Average Convergence Divergence (MACD)

The MACD is primarily used to gauge the strength of stock price movement. It does this by measuring the divergence of two exponential moving averages (EMAs), commonly a 12-period EMA and a 26-period EMA.

* Relative Strength Index (RSI)

The RSI aims to indicate whether a market is considered to be overbought or oversold in relation to recent price levels. 

* Commodity Channel Index (CCI)

The Commodity Channel Index (CCI) is a momentum-based oscillator used to help determine when an investment vehicle is reaching a condition of being overbought or oversold.

FinRL can also compute a financial turbulence index, which measures extreme asset price fluctuation. It is switched off in this project (`use_turbulence=False` in the next cell).
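To make the MACD description above concrete, here is a minimal sketch of the MACD line as the difference between a 12-period and a 26-period EMA. It uses a common recursive form of the EMA seeded with the first price. The library behind `FeatureEngineer` may differ in details, so treat the helper names `ema` and `macd_line` and the price series as illustrative assumptions.

```python
def ema(prices, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [prices[0]]  # seed the average with the first price
    for p in prices[1:]:
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out

def macd_line(prices, fast=12, slow=26):
    """MACD line: fast EMA minus slow EMA, one value per price."""
    fast_ema = ema(prices, fast)
    slow_ema = ema(prices, slow)
    return [f - s for f, s in zip(fast_ema, slow_ema)]

# Hypothetical, steadily rising price series.
prices = [10.0 + 0.1 * i for i in range(60)]
macd = macd_line(prices)
print(macd[0], macd[-1])
```

With this seeding the first MACD value is zero, and for a steadily rising price the fast EMA sits above the slow EMA, so the MACD line is positive.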

#### 3.2.1 Add technical indicators

In [None]:
%%time
fe = FeatureEngineer(
  use_technical_indicator=True,
  use_turbulence=False,
  user_defined_feature=False)
df = fe.preprocess_data(df)

Successfully added technical indicators
CPU times: user 15.6 s, sys: 1.76 s, total: 17.4 s
Wall time: 15.7 s


In [None]:
# df.head(100)
df

Unnamed: 0,date,open,high,low,close,volume,tic,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma
0,2008-01-02,14.420000,14.420000,14.420000,11.721002,0,FBCVX,2,0.000000,11.743293,11.674326,0.000000,-66.666667,100.000000,11.721002,11.721002
3441,2008-01-02,15.660000,15.660000,15.660000,6.663240,0,FCPGX,2,0.000000,11.743293,11.674326,0.000000,-66.666667,100.000000,6.663240,6.663240
6882,2008-01-02,26.370001,26.370001,26.370001,11.040798,0,FDCAX,2,0.000000,11.743293,11.674326,0.000000,-66.666667,100.000000,11.040798,11.040798
10323,2008-01-02,28.930000,28.930000,28.930000,10.950327,0,FDGFX,2,0.000000,11.743293,11.674326,0.000000,-66.666667,100.000000,10.950327,10.950327
13764,2008-01-02,19.670000,19.670000,19.670000,11.192782,0,FDSCX,2,0.000000,11.743293,11.674326,0.000000,-66.666667,100.000000,11.192782,11.192782
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10322,2021-08-31,50.099998,50.099998,50.099998,50.099998,0,FDCAX,1,0.485560,50.233594,47.837405,63.290963,186.262675,33.935129,48.798333,48.066166
13763,2021-08-31,37.290001,37.290001,37.290001,35.175362,0,FDGFX,1,0.160866,35.429981,34.401928,56.769455,108.062182,14.914749,34.818795,34.498705
17204,2021-08-31,36.700001,36.700001,36.700001,36.700001,0,FDSCX,1,0.355394,36.956232,34.422768,56.811784,158.442705,22.592461,35.431666,35.356833
20645,2021-08-31,56.790001,56.790001,56.790001,56.790001,0,FDSVX,1,0.616046,56.891673,53.754767,63.869985,13.372480,1.499019,55.064085,54.107896


We see that the FeatureEngineer has added some features:

* macd
* boll_ub (upper Bollinger Band)
* boll_lb (lower Bollinger Band)
* rsi_30 (with a lookback of 30)
* cci_30 (with a lookback of 30)
* dx_30 (with a lookback of 30)
* close_30_sma (close price simple moving average with a lookback of 30)
* close_60_sma (close price simple moving average with a lookback of 60)

In [None]:
# hide
# on stockstats library: (seems like FeatureEngineer makes use of it)
# https://medium.com/codex/this-python-library-will-help-you-get-stock-technical-indicators-in-one-line-of-code-c11ed2c8e45f

#### 3.2.2 Add covariance matrix as a feature

Adding the covariance matrix of the fund returns as a feature gives the agent information about risk. Together with the portfolio weights, it determines the portfolio's variance and therefore its risk (standard deviation).
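The link between the covariance matrix and portfolio risk can be sketched as follows: for weights $w$ and covariance matrix $\Sigma$, the standard deviation of the portfolio return is $\sqrt{w^T \Sigma w}$. The two-fund covariance matrix below is made up for illustration, and `portfolio_volatility` is a hypothetical helper name.

```python
import math

def portfolio_volatility(weights, cov):
    """Return sqrt(w^T * Sigma * w) for weights w and covariance matrix Sigma."""
    n = len(weights)
    variance = sum(weights[i] * cov[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    return math.sqrt(variance)

# Hypothetical daily covariance matrix for two funds.
cov = [[0.0004, 0.0001],
       [0.0001, 0.0009]]

concentrated = portfolio_volatility([1.0, 0.0], cov)  # all money in fund 1
balanced = portfolio_volatility([0.5, 0.5], cov)      # equal split
print(concentrated, balanced)
```

In this example the equal split has a lower standard deviation than holding only the less volatile fund, which is the diversification effect that the covariance feature lets the agent observe.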

In [None]:
df = df.sort_values(['date','tic'], ignore_index=True)

In [None]:
df

Unnamed: 0,date,open,high,low,close,volume,tic,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma
0,2008-01-02,14.420000,14.420000,14.420000,11.721002,0,FBCVX,2,0.000000,11.743293,11.674326,0.000000,-66.666667,100.000000,11.721002,11.721002
1,2008-01-02,15.660000,15.660000,15.660000,6.663240,0,FCPGX,2,0.000000,11.743293,11.674326,0.000000,-66.666667,100.000000,6.663240,6.663240
2,2008-01-02,26.370001,26.370001,26.370001,11.040798,0,FDCAX,2,0.000000,11.743293,11.674326,0.000000,-66.666667,100.000000,11.040798,11.040798
3,2008-01-02,28.930000,28.930000,28.930000,10.950327,0,FDGFX,2,0.000000,11.743293,11.674326,0.000000,-66.666667,100.000000,10.950327,10.950327
4,2008-01-02,19.670000,19.670000,19.670000,11.192782,0,FDSCX,2,0.000000,11.743293,11.674326,0.000000,-66.666667,100.000000,11.192782,11.192782
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
24082,2021-08-31,50.099998,50.099998,50.099998,50.099998,0,FDCAX,1,0.485560,50.233594,47.837405,63.290963,186.262675,33.935129,48.798333,48.066166
24083,2021-08-31,37.290001,37.290001,37.290001,35.175362,0,FDGFX,1,0.160866,35.429981,34.401928,56.769455,108.062182,14.914749,34.818795,34.498705
24084,2021-08-31,36.700001,36.700001,36.700001,36.700001,0,FDSCX,1,0.355394,36.956232,34.422768,56.811784,158.442705,22.592461,35.431666,35.356833
24085,2021-08-31,56.790001,56.790001,56.790001,56.790001,0,FDSVX,1,0.616046,56.891673,53.754767,63.869985,13.372480,1.499019,55.064085,54.107896


In [None]:
df.index = df.date.factorize()[0] # all rows of the same date now share one integer index (0, 1, 2, ...)

In [None]:
df

Unnamed: 0,date,open,high,low,close,volume,tic,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma
0,2008-01-02,14.420000,14.420000,14.420000,11.721002,0,FBCVX,2,0.000000,11.743293,11.674326,0.000000,-66.666667,100.000000,11.721002,11.721002
0,2008-01-02,15.660000,15.660000,15.660000,6.663240,0,FCPGX,2,0.000000,11.743293,11.674326,0.000000,-66.666667,100.000000,6.663240,6.663240
0,2008-01-02,26.370001,26.370001,26.370001,11.040798,0,FDCAX,2,0.000000,11.743293,11.674326,0.000000,-66.666667,100.000000,11.040798,11.040798
0,2008-01-02,28.930000,28.930000,28.930000,10.950327,0,FDGFX,2,0.000000,11.743293,11.674326,0.000000,-66.666667,100.000000,10.950327,10.950327
0,2008-01-02,19.670000,19.670000,19.670000,11.192782,0,FDSCX,2,0.000000,11.743293,11.674326,0.000000,-66.666667,100.000000,11.192782,11.192782
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3440,2021-08-31,50.099998,50.099998,50.099998,50.099998,0,FDCAX,1,0.485560,50.233594,47.837405,63.290963,186.262675,33.935129,48.798333,48.066166
3440,2021-08-31,37.290001,37.290001,37.290001,35.175362,0,FDGFX,1,0.160866,35.429981,34.401928,56.769455,108.062182,14.914749,34.818795,34.498705
3440,2021-08-31,36.700001,36.700001,36.700001,36.700001,0,FDSCX,1,0.355394,36.956232,34.422768,56.811784,158.442705,22.592461,35.431666,35.356833
3440,2021-08-31,56.790001,56.790001,56.790001,56.790001,0,FDSVX,1,0.616046,56.891673,53.754767,63.869985,13.372480,1.499019,55.064085,54.107896


In [None]:
# hide
# len(df.index.unique())
# range(lookback, len(df.index.unique()))
lst = [i for i in range(LOOKBACK, len(df.index.unique()))]
lst[:20], lst[-1]

([252,
  253,
  254,
  255,
  256,
  257,
  258,
  259,
  260,
  261,
  262,
  263,
  264,
  265,
  266,
  267,
  268,
  269,
  270,
  271],
 3440)

In [None]:
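%%time
cov_list = []
return_list = []
for i in range(LOOKBACK, len(df.index.unique())):
  # rows for date indices i-LOOKBACK..i (.loc slicing includes both ends)
  data_lookback = df.loc[i-LOOKBACK:i, :]
  # one row per date, one column per fund
  price_lookback = data_lookback.pivot_table(index='date', columns='tic', values='close')
  # daily returns; the first row is NaN and is dropped
  return_lookback = price_lookback.pct_change().dropna()
  return_list.append(return_lookback)
  # covariance matrix of the daily returns over the lookback window
  covs = return_lookback.cov().values
  cov_list.append(covs)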

CPU times: user 43.9 s, sys: 581 ms, total: 44.4 s
Wall time: 43.8 s


In [None]:
len(df['date'].unique()), len(df['date'].unique()[LOOKBACK:])

(3441, 3189)

In [None]:
len(cov_list), len(return_list)

(3189, 3189)
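The counts reported above can be checked with a little arithmetic. The sketch below reuses `LOOKBACK = 252` and the 3441 unique dates, and assumes that every fund has a price on every date in a window, so that only the first row is lost when returns are computed.

```python
LOOKBACK = 252   # trading days in one year, as in section 3.0
n_dates = 3441   # number of unique dates reported above

# One window is built per date index i in range(LOOKBACK, n_dates).
n_windows = len(range(LOOKBACK, n_dates))

# Label-based slicing with .loc includes both endpoints, so the window
# for index i covers the date indices i-LOOKBACK .. i.
first_window = list(range(LOOKBACK - LOOKBACK, LOOKBACK + 1))
n_prices = len(first_window)  # price rows per fund in each window
n_returns = n_prices - 1      # pct_change() leaves one NaN row, which is dropped

print(n_windows, n_prices, n_returns)
```

Each covariance matrix is therefore estimated from one year of daily returns, and the first date that has a matrix is the one at index 252.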

In [None]:
# hide
# cov_list[0]
# return_list[:1]

In [None]:
# 
# form a dataframe with the cov_list and return_list
df_cov = pd.DataFrame({'date':df.date.unique()[LOOKBACK:], 'cov_list':cov_list, 'return_list':return_list})
df_cov

Unnamed: 0,date,cov_list,return_list
0,2008-12-31,"[[0.0008916783690050918, 0.0007199340713349982...",tic FBCVX FCPGX FDCAX ... ...
1,2009-01-02,"[[0.0008961331319844811, 0.0007229547753450831...",tic FBCVX FCPGX FDCAX ... ...
2,2009-01-05,"[[0.0008942290687322837, 0.0007206030648001816...",tic FBCVX FCPGX FDCAX ... ...
3,2009-01-06,"[[0.0008951474693283972, 0.0007217056039965993...",tic FBCVX FCPGX FDCAX ... ...
4,2009-01-07,"[[0.0008975149225319093, 0.0007241915142094865...",tic FBCVX FCPGX FDCAX ... ...
...,...,...,...
3184,2021-08-25,"[[8.139554607177268e-05, 7.076161408798078e-05...",tic FBCVX FCPGX FDCAX ... ...
3185,2021-08-26,"[[8.15840000747442e-05, 7.098311940863228e-05,...",tic FBCVX FCPGX FDCAX ... ...
3186,2021-08-27,"[[8.155510926610815e-05, 7.162918231818502e-05...",tic FBCVX FCPGX FDCAX ... ...
3187,2021-08-30,"[[8.162658991218995e-05, 7.155445795151861e-05...",tic FBCVX FCPGX FDCAX ... ...


In [None]:
# 
# merge df_cov with the main dataframe
df = df.merge(df_cov, on='date')
df

Unnamed: 0,date,open,high,low,close,volume,tic,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,cov_list,return_list
0,2008-12-31,7.900000,7.900000,7.900000,6.548186,0,FBCVX,2,-0.003032,6.625188,6.032323,48.240961,102.573711,6.437515,6.187183,6.453092,"[[0.0008916783690050918, 0.0007199340713349982...",tic FBCVX FCPGX FDCAX ... ...
1,2008-12-31,8.690000,8.690000,8.690000,3.697545,0,FCPGX,2,0.015371,3.687956,3.319084,48.638490,137.594713,11.647507,3.427356,3.594576,"[[0.0008916783690050918, 0.0007199340713349982...",tic FBCVX FCPGX FDCAX ... ...
2,2008-12-31,15.730000,15.730000,15.730000,6.669960,0,FDCAX,2,0.025542,6.761376,6.140492,48.908868,110.240063,8.794697,6.289111,6.487707,"[[0.0008916783690050918, 0.0007199340713349982...",tic FBCVX FCPGX FDCAX ... ...
3,2008-12-31,15.790000,15.790000,15.790000,6.350790,0,FDGFX,2,0.025277,6.407065,5.710326,48.573492,115.808269,8.665272,5.894904,6.192935,"[[0.0008916783690050918, 0.0007199340713349982...",tic FBCVX FCPGX FDCAX ... ...
4,2008-12-31,10.530000,10.530000,10.530000,6.006501,0,FDSCX,2,0.025076,6.001624,5.402930,48.564497,128.462223,10.677362,5.559625,5.839951,"[[0.0008916783690050918, 0.0007199340713349982...",tic FBCVX FCPGX FDCAX ... ...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
22318,2021-08-31,50.099998,50.099998,50.099998,50.099998,0,FDCAX,1,0.485560,50.233594,47.837405,63.290963,186.262675,33.935129,48.798333,48.066166,"[[8.137781480231036e-05, 7.150181363650654e-05...",tic FBCVX FCPGX FDCAX ... ...
22319,2021-08-31,37.290001,37.290001,37.290001,35.175362,0,FDGFX,1,0.160866,35.429981,34.401928,56.769455,108.062182,14.914749,34.818795,34.498705,"[[8.137781480231036e-05, 7.150181363650654e-05...",tic FBCVX FCPGX FDCAX ... ...
22320,2021-08-31,36.700001,36.700001,36.700001,36.700001,0,FDSCX,1,0.355394,36.956232,34.422768,56.811784,158.442705,22.592461,35.431666,35.356833,"[[8.137781480231036e-05, 7.150181363650654e-05...",tic FBCVX FCPGX FDCAX ... ...
22321,2021-08-31,56.790001,56.790001,56.790001,56.790001,0,FDSVX,1,0.616046,56.891673,53.754767,63.869985,13.372480,1.499019,55.064085,54.107896,"[[8.137781480231036e-05, 7.150181363650654e-05...",tic FBCVX FCPGX FDCAX ... ...


In [None]:
df = df.sort_values(['date','tic']).reset_index(drop=True)

In [None]:
df

Unnamed: 0,date,open,high,low,close,volume,tic,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,cov_list,return_list
0,2008-12-31,7.900000,7.900000,7.900000,6.548186,0,FBCVX,2,-0.003032,6.625188,6.032323,48.240961,102.573711,6.437515,6.187183,6.453092,"[[0.0008916783690050918, 0.0007199340713349982...",tic FBCVX FCPGX FDCAX ... ...
1,2008-12-31,8.690000,8.690000,8.690000,3.697545,0,FCPGX,2,0.015371,3.687956,3.319084,48.638490,137.594713,11.647507,3.427356,3.594576,"[[0.0008916783690050918, 0.0007199340713349982...",tic FBCVX FCPGX FDCAX ... ...
2,2008-12-31,15.730000,15.730000,15.730000,6.669960,0,FDCAX,2,0.025542,6.761376,6.140492,48.908868,110.240063,8.794697,6.289111,6.487707,"[[0.0008916783690050918, 0.0007199340713349982...",tic FBCVX FCPGX FDCAX ... ...
3,2008-12-31,15.790000,15.790000,15.790000,6.350790,0,FDGFX,2,0.025277,6.407065,5.710326,48.573492,115.808269,8.665272,5.894904,6.192935,"[[0.0008916783690050918, 0.0007199340713349982...",tic FBCVX FCPGX FDCAX ... ...
4,2008-12-31,10.530000,10.530000,10.530000,6.006501,0,FDSCX,2,0.025076,6.001624,5.402930,48.564497,128.462223,10.677362,5.559625,5.839951,"[[0.0008916783690050918, 0.0007199340713349982...",tic FBCVX FCPGX FDCAX ... ...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
22318,2021-08-31,50.099998,50.099998,50.099998,50.099998,0,FDCAX,1,0.485560,50.233594,47.837405,63.290963,186.262675,33.935129,48.798333,48.066166,"[[8.137781480231036e-05, 7.150181363650654e-05...",tic FBCVX FCPGX FDCAX ... ...
22319,2021-08-31,37.290001,37.290001,37.290001,35.175362,0,FDGFX,1,0.160866,35.429981,34.401928,56.769455,108.062182,14.914749,34.818795,34.498705,"[[8.137781480231036e-05, 7.150181363650654e-05...",tic FBCVX FCPGX FDCAX ... ...
22320,2021-08-31,36.700001,36.700001,36.700001,36.700001,0,FDSCX,1,0.355394,36.956232,34.422768,56.811784,158.442705,22.592461,35.431666,35.356833,"[[8.137781480231036e-05, 7.150181363650654e-05...",tic FBCVX FCPGX FDCAX ... ...
22321,2021-08-31,56.790001,56.790001,56.790001,56.790001,0,FDSVX,1,0.616046,56.891673,53.754767,63.869985,13.372480,1.499019,55.064085,54.107896,"[[8.137781480231036e-05, 7.150181363650654e-05...",tic FBCVX FCPGX FDCAX ... ...


### 3.3 Modeling
The portfolio within the market will be modeled by the OpenAI Gym framework. This is referred to as the *environment*. For the *agent*, we will use the FinRL framework. As the agent interacts with the environment it will gradually learn a trading strategy based on the reward function. The agent is rewarded according to the total value of the portfolio.
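The interaction between agent and environment can be sketched with a toy example. The class `ToyPortfolioEnv` below is hypothetical and is not the FinRL or Gym implementation. It assumes that the portfolio value grows each day by the weighted sum of the fund returns, uses the log-ratio reward from section 2, and replaces the trained agent with a fixed equal-weight policy.

```python
import math
import random

class ToyPortfolioEnv:
    """A toy stand-in for a portfolio environment (hypothetical, not FinRL)."""

    def __init__(self, daily_returns, initial_amount=1_000_000.0):
        self.daily_returns = daily_returns  # one list of fund returns per day
        self.initial_amount = initial_amount
        self.reset()

    def reset(self):
        self.day = 0
        self.value = self.initial_amount
        return self.daily_returns[self.day]

    def step(self, weights):
        # Advance one day and apply that day's weighted fund returns.
        self.day += 1
        fund_returns = self.daily_returns[self.day]
        growth = 1.0 + sum(w * r for w, r in zip(weights, fund_returns))
        new_value = self.value * growth
        reward = math.log(new_value / self.value)
        self.value = new_value
        done = self.day >= len(self.daily_returns) - 1
        return fund_returns, reward, done

# Made-up daily returns for three funds over five days.
random.seed(0)
returns = [[random.uniform(-0.01, 0.01) for _ in range(3)] for _ in range(5)]

env = ToyPortfolioEnv(returns)
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    weights = [1.0 / 3] * 3  # placeholder policy: equal weights every day
    obs, reward, done = env.step(weights)
    total_reward += reward

print(env.value, total_reward)
```

A learning agent would replace the fixed weights with the output of its policy and use the rewards to improve that policy over many episodes.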


#### 3.3.1 Training data

In [None]:
TRAIN_START, TRADE_START

('2009-01-01', '2020-07-01')

In [None]:
train = data_split(df, TRAIN_START, TRADE_START)
train

Unnamed: 0,date,open,high,low,close,volume,tic,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,cov_list,return_list
0,2009-01-02,8.150000,8.150000,8.150000,6.755406,0,FBCVX,4,0.030432,6.702175,6.010826,49.991970,152.551464,13.863211,6.209512,6.445367,"[[0.0008961331319844811, 0.0007229547753450831...",tic FBCVX FCPGX FDCAX ... ...
0,2009-01-02,8.870000,8.870000,8.870000,3.774134,0,FCPGX,4,0.032454,3.737249,3.305107,49.880034,162.366452,16.518011,3.442107,3.588619,"[[0.0008961331319844811, 0.0007229547753450831...",tic FBCVX FCPGX FDCAX ... ...
0,2009-01-02,16.280001,16.280001,16.280001,6.903177,0,FDCAX,4,0.058199,6.834997,6.136275,51.154864,170.139325,17.925843,6.316292,6.487900,"[[0.0008961331319844811, 0.0007229547753450831...",tic FBCVX FCPGX FDCAX ... ...
0,2009-01-02,16.370001,16.370001,16.370001,6.584068,0,FDGFX,4,0.060919,6.498272,5.695824,50.606719,165.116529,16.838559,5.921930,6.184977,"[[0.0008961331319844811, 0.0007229547753450831...",tic FBCVX FCPGX FDCAX ... ...
0,2009-01-02,10.740000,10.740000,10.740000,6.126287,0,FDSCX,4,0.052369,6.074467,5.390190,49.704302,150.989986,15.152331,5.582884,5.829957,"[[0.0008961331319844811, 0.0007229547753450831...",tic FBCVX FCPGX FDCAX ... ...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2892,2020-06-30,35.930000,35.930000,35.930000,33.073399,0,FDCAX,1,0.501019,33.705111,31.615998,57.598799,82.309417,17.475881,32.193710,30.489567,"[[0.000538045795431353, 0.0004554362595919771,...",tic FBCVX FCPGX FDCAX ... ...
2892,2020-06-30,25.700001,25.700001,25.700001,23.743460,0,FDGFX,1,0.034722,26.677363,22.296985,50.881596,-25.411227,0.081819,24.063118,22.812814,"[[0.000538045795431353, 0.0004554362595919771,...",tic FBCVX FCPGX FDCAX ... ...
2892,2020-06-30,22.870001,22.870001,22.870001,22.532381,0,FDSCX,1,0.174035,23.630273,21.395077,53.642964,31.031452,8.899931,22.311030,21.079976,"[[0.000538045795431353, 0.0004554362595919771,...",tic FBCVX FCPGX FDCAX ... ...
2892,2020-06-30,45.220001,45.220001,45.220001,37.346981,0,FDSVX,1,0.708226,37.957114,35.252713,59.415280,99.055416,21.233245,36.033532,33.922269,"[[0.000538045795431353, 0.0004554362595919771,...",tic FBCVX FCPGX FDCAX ... ...


#### 3.3.2 Portfolio (Environment)

In [None]:
class Portfolio(gym.Env):
    """A portfolio/market environment
    Attributes
    ----------
        df: DataFrame
            input data
        stock_dim : int
            number of unique stocks
        hmax : int
            maximum number of shares to trade
        initial_amount : int
            start money
        transaction_cost_pct: float
            transaction cost percentage per trade
        reward_scaling: float
            scaling factor for reward, good for training
        state_space: int
            the dimension of input features
        action_space: int
            equals stock dimension
        tech_indicator_list: list
            a list of technical indicator names
        turbulence_threshold: int
            a threshold to control risk aversion
        day: int
            an increment number to control date
    Methods
    -------
    _sell_stock()
        perform sell action based on the sign of the action
    _buy_stock()
        perform buy action based on the sign of the action
    step()
        at each step the agent will return actions, then 
        we will calculate the reward, and return the next observation.
    reset()
        reset the environment
    render()
        use render to return other functions
    save_asset_memory()
        return account value at each time step
    save_action_memory()
        return actions/positions at each time step
    """
    metadata = {'render.modes': ['human']}

    def __init__(self, 
                df,
                stock_dim,
                hmax,
                initial_amount,
                transaction_cost_pct,
                reward_scaling,
                state_space,
                action_space,
                tech_indicator_list,
                turbulence_threshold=None,
                lookback=LOOKBACK,
                day=0):
        #super(StockEnv, self).__init__()
        #money = 10 , scope = 1
        self.day = day
        self.lookback = lookback
        self.df = df
        self.stock_dim = stock_dim
        self.hmax = hmax
        self.initial_amount = initial_amount
        self.transaction_cost_pct = transaction_cost_pct
        self.reward_scaling = reward_scaling
        self.state_space = state_space
        self.action_space = action_space
        self.tech_indicator_list = tech_indicator_list

        # action_space normalization and shape is self.stock_dim
        self.action_space = spaces.Box(low=0, high=1, shape=(self.action_space,)) 
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(self.state_space + len(self.tech_indicator_list), self.state_space))

        # load data from a pandas dataframe
        self.data = self.df.loc[self.day, :]
        self.covs = self.data['cov_list'].values[0]
        self.state =  np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list ], axis=0)
        self.terminal = False
        self.turbulence_threshold = turbulence_threshold
        # initialize state: initial portfolio return + individual stock return + individual weights
        self.portfolio_value = self.initial_amount

        # memorize portfolio value each step
        self.asset_memory = [self.initial_amount]
        # memorize portfolio return each step
        self.portfolio_return_memory = [0]
        self.actions_memory = [[1/self.stock_dim]*self.stock_dim]
        self.date_memory = [self.data.date.unique()[0]]

    def step(self, actions):
        self.terminal = self.day >= len(self.df.index.unique()) - 1

        if self.terminal:
            df = pd.DataFrame(self.portfolio_return_memory)
            df.columns = ['daily_return']
            plt.plot(df.daily_return.cumsum(), 'r')
            plt.savefig('results/cumulative_reward.png')
            plt.close()
            
            plt.plot(self.portfolio_return_memory, 'r')
            plt.savefig('results/rewards.png')
            plt.close()

            print("=================================")
            print("begin_total_asset:{}".format(self.asset_memory[0]))           
            print("end_total_asset:{}".format(self.portfolio_value))

            df_daily_return = pd.DataFrame(self.portfolio_return_memory)
            df_daily_return.columns = ['daily_return']
            if df_daily_return['daily_return'].std() !=0:
              sharpe = (252**0.5)*df_daily_return['daily_return'].mean()/ \
                       df_daily_return['daily_return'].std()
              print("Sharpe: ",sharpe)
            print("=================================")
            
            return self.state, self.reward, self.terminal, {}
        else:
            weights = self.softmax_normalization(actions) 
            self.actions_memory.append(weights)
            last_day_memory = self.data

            #load next state
            self.day += 1
            self.data = self.df.loc[self.day,:]
            self.covs = self.data['cov_list'].values[0]
            self.state =  np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list ], axis=0)
            portfolio_return = sum(((self.data.close.values / last_day_memory.close.values)-1)*weights)
            log_portfolio_return = np.log(sum((self.data.close.values / last_day_memory.close.values)*weights))
            # update portfolio value
            new_portfolio_value = self.portfolio_value*(1+portfolio_return)
            self.portfolio_value = new_portfolio_value

            # save into memory
            self.portfolio_return_memory.append(portfolio_return)
            self.date_memory.append(self.data.date.unique()[0])            
            self.asset_memory.append(new_portfolio_value)

            # the reward is the new portfolio value (reward_scaling is not applied here)
            self.reward = new_portfolio_value
        return self.state, self.reward, self.terminal, {}

    def reset(self):
        self.asset_memory = [self.initial_amount]
        self.day = 0
        self.data = self.df.loc[self.day,:]
        # load states
        self.covs = self.data['cov_list'].values[0]
        self.state =  np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list ], axis=0)
        self.portfolio_value = self.initial_amount
        #self.cost = 0
        #self.trades = 0
        self.terminal = False 
        self.portfolio_return_memory = [0]
        self.actions_memory=[[1/self.stock_dim]*self.stock_dim]
        self.date_memory=[self.data.date.unique()[0]] 
        return self.state
    
    def render(self, mode='human'):
        return self.state
        
    def softmax_normalization(self, actions):
        numerator = np.exp(actions)
        denominator = np.sum(np.exp(actions))
        softmax_output = numerator/denominator
        return softmax_output

    def save_asset_memory(self):
        date_list = self.date_memory
        portfolio_return = self.portfolio_return_memory
        #print(len(date_list))
        #print(len(asset_list))
        df_account_value = pd.DataFrame({'date':date_list,'daily_return':portfolio_return})
        return df_account_value

    def save_action_memory(self):
        # date and close price length must match actions length
        date_list = self.date_memory
        df_date = pd.DataFrame(date_list)
        df_date.columns = ['date']
        
        action_list = self.actions_memory
        df_actions = pd.DataFrame(action_list)
        df_actions.columns = self.data.tic.values
        df_actions.index = df_date.date
        #df_actions = pd.DataFrame({'date':date_list,'actions':action_list})
        return df_actions

    def _seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]

    def get_sb_env(self):
        e = DummyVecEnv([lambda: self])
        obs = e.reset()
        return e, obs
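
The core of `step()` can be traced by hand. The following is a minimal sketch in plain Python (no Gym or NumPy), assuming three hypothetical funds with made-up prices and raw actions. It mirrors `softmax_normalization` and the portfolio-value update only, not the full environment. The max-shift inside `softmax` is an addition for numerical safety and does not change the result.

```python
import math

def softmax(actions):
    """Map raw actions to non-negative weights that sum to 1."""
    shift = max(actions)  # subtracting the max avoids overflow; weights are unchanged
    exps = [math.exp(a - shift) for a in actions]
    total = sum(exps)
    return [e / total for e in exps]

def step_value(portfolio_value, weights, prev_close, close):
    """Apply one day's price change to the portfolio, as in Portfolio.step()."""
    portfolio_return = sum(
        (c / p - 1.0) * w for c, p, w in zip(close, prev_close, weights)
    )
    return portfolio_value * (1.0 + portfolio_return), portfolio_return

# Hypothetical inputs, for illustration only
raw_actions = [0.0, 1.0, 0.5]
prev_close = [10.0, 20.0, 40.0]
close = [10.1, 19.8, 40.0]

weights = softmax(raw_actions)
new_value, daily_return = step_value(1_000_000, weights, prev_close, close)
print(weights, daily_return, new_value)
```

Because the second fund carries the largest weight and falls by 1%, the portfolio loses value on this made-up day even though the first fund gains 1%.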

In [None]:
# hide
#. was 29 in original notebook !
stock_dimension = len(train['tic'].unique())
state_space = stock_dimension
print(f"Stock Dimension: {stock_dimension}, State Space: {state_space}")

Stock Dimension: 7, State Space: 7


In [None]:
# hide
config.TECHNICAL_INDICATORS_LIST

['macd',
 'boll_ub',
 'boll_lb',
 'rsi_30',
 'cci_30',
 'dx_30',
 'close_30_sma',
 'close_60_sma']
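
With 7 funds and 8 technical indicators, the observation built by the environment stacks a 7x7 covariance matrix on top of 8 indicator rows. The sketch below reproduces only that shape arithmetic in plain Python. The zeros are placeholders, not real covariances or indicator values.

```python
# Counts taken from the outputs above; the values themselves are placeholders
stock_dimension = 7
tech_indicators = ['macd', 'boll_ub', 'boll_lb', 'rsi_30',
                   'cci_30', 'dx_30', 'close_30_sma', 'close_60_sma']

# A 7x7 covariance matrix (zeros stand in for real covariances)
cov = [[0.0] * stock_dimension for _ in range(stock_dimension)]
# One row per indicator, one column per fund
indicator_rows = [[0.0] * stock_dimension for _ in tech_indicators]

# Stacking rows mirrors np.append(covs, indicators, axis=0) in the environment
state = cov + indicator_rows
state_shape = (len(state), len(state[0]))
print(state_shape)  # (15, 7)
```

This matches the `observation_space` shape `(state_space + len(tech_indicator_list), state_space)` declared in `Portfolio.__init__`.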

In [None]:
env_kwargs = {
    "hmax": 100, 
    "initial_amount": INITIAL_AMOUNT, 
    "transaction_cost_pct": TRANSACTION_COST_PCT, 
    "state_space": state_space, 
    "stock_dim": stock_dimension, 
    "tech_indicator_list": config.TECHNICAL_INDICATORS_LIST, 
    "action_space": stock_dimension, 
    "reward_scaling": REWARD_SCALING,
}
e_train_gym = Portfolio(df=train, **env_kwargs)

In [None]:
env_train, _ = e_train_gym.get_sb_env()
print(type(env_train))

<class 'stable_baselines3.common.vec_env.dummy_vec_env.DummyVecEnv'>


#### 3.3.3 Portfolio Manager (Agent)
* We investigate the performance of two models for the *agent*:

  * A2C
  * PPO

Both models use the algorithm implementations from *Stable Baselines3*, the successor to the *OpenAI Baselines* and *Stable Baselines* libraries, accessed here through FinRL's `DRLAgent`.

In [None]:
# hide
# https://towardsdatascience.com/finrl-for-quantitative-finance-tutorial-for-portfolio-allocation-9b417660c7cd

In [None]:
# hide
# agent = DRLAgent(env=env_train)
# DDPG_PARAMS = {
#   # "n_steps": 10, 
#   # "ent_coef": 0.005, 
#   "learning_rate": 0.0004,
# }
# model_ddpg = agent.get_model(model_name="ddpg", model_kwargs=DDPG_PARAMS)
# model_ddpg

In [None]:
# hide
# %%time
# trained_ddpg = agent.train_model(model=model_ddpg, tb_log_name='ddpg', total_timesteps=1_000)

In [None]:
# hide
# agent = DRLAgent(env=env_train)
# SAC_PARAMS = {
#   # "n_steps": 10, 
#   # "ent_coef": 0.005, 
#   "learning_rate": 0.0004,
# }
# model_sac = agent.get_model(model_name="sac", model_kwargs=SAC_PARAMS)
# model_sac

In [None]:
# hide
# %%time
# trained_sac = agent.train_model(model=model_sac, tb_log_name='sac', total_timesteps=1_000)

In [None]:
# hide
# agent = DRLAgent(env=env_train)
# TD3_PARAMS = {
#   # "n_steps": 10, 
#   # "ent_coef": 0.005, 
#   "learning_rate": 0.0004,
# }
# model_td3 = agent.get_model(model_name="td3", model_kwargs=TD3_PARAMS)
# model_td3

In [None]:
# hide
# %%time
# trained_td3 = agent.train_model(model=model_td3, tb_log_name='td3', total_timesteps=1_000)

In [None]:
# hide
# agent = DRLAgent(env=env_train)
# MADDDPG_PARAMS = {
#   # "n_steps": 10, 
#   # "ent_coef": 0.005, 
#   "learning_rate": 0.0004,
# }
# model_madddpg = agent.get_model(model_name="madddpg", model_kwargs=MADDDPG_PARAMS)
# model_madddpg

In [None]:
# hide
# %%time
# trained_madddpg = agent.train_model(model=model_madddpg, tb_log_name='madddpg', total_timesteps=1_000)

##### 3.3.3.1 A2C


In [None]:
agent = DRLAgent(env=env_train)
A2C_PARAMS = {
  "n_steps": 10, 
  "ent_coef": 0.005, 
  "learning_rate": 0.0004,
}
model_a2c = agent.get_model(model_name="a2c", model_kwargs=A2C_PARAMS)
model_a2c

{'n_steps': 10, 'ent_coef': 0.005, 'learning_rate': 0.0004}
Using cuda device


<stable_baselines3.a2c.a2c.A2C at 0x7fc9fc325d10>

In [None]:
%%time
trained_a2c = agent.train_model(model=model_a2c, tb_log_name='a2c', total_timesteps=50_000)

Logging to tensorboard_log/a2c/a2c_1
-------------------------------------
| time/                 |           |
|    fps                | 84        |
|    iterations         | 100       |
|    time_elapsed       | 11        |
|    total_timesteps    | 1000      |
| train/                |           |
|    entropy_loss       | -9.88     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0004    |
|    n_updates          | 99        |
|    policy_loss        | 9.6e+07   |
|    reward             | 1770969.9 |
|    std                | 0.993     |
|    value_loss         | 1.09e+14  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 126       |
|    iterations         | 200       |
|    time_elapsed       | 15        |
|    total_timesteps    | 2000      |
| train/                |           |
|    entropy_loss       | -9.87     |
|    explained_variance | -2.38e-07 |
|    learning

##### 3.3.3.2 PPO

In [None]:
agent = DRLAgent(env = env_train)
PPO_PARAMS = {
  "n_steps": 2048,
  "ent_coef": 0.005,
  "learning_rate": 0.001,
  "batch_size": 128,
}
model_ppo = agent.get_model("ppo",model_kwargs = PPO_PARAMS)
model_ppo

{'n_steps': 2048, 'ent_coef': 0.005, 'learning_rate': 0.001, 'batch_size': 128}
Using cuda device


<stable_baselines3.ppo.ppo.PPO at 0x7fc981276f10>

In [None]:
%%time
# train PPO agent
trained_ppo = agent.train_model(model=model_ppo, tb_log_name='ppo', total_timesteps=50_000)

Logging to tensorboard_log/ppo/ppo_1
----------------------------------
| time/              |           |
|    fps             | 307       |
|    iterations      | 1         |
|    time_elapsed    | 6         |
|    total_timesteps | 2048      |
| train/             |           |
|    reward          | 3199747.8 |
----------------------------------
begin_total_asset:1000000
end_total_asset:3913564.3893383564
Sharpe:  0.6788581038411543
------------------------------------------
| time/                   |              |
|    fps                  | 275          |
|    iterations           | 2            |
|    time_elapsed         | 14           |
|    total_timesteps      | 4096         |
| train/                  |              |
|    approx_kl            | 8.119969e-09 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -9.93        |
|    explained_variance   | 5.96e-08     |
|    learning_rate        | 0.001        |
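
The `iterations` and `total_timesteps` columns in the two logs are linked by `n_steps`. The sketch below works out that arithmetic, assuming a single environment (as with the `DummyVecEnv` used here) and assuming the on-policy training loop collects `n_steps` transitions per rollout and keeps collecting until `total_timesteps` is reached. `rollout_plan` is a hypothetical helper, not part of FinRL or Stable Baselines3.

```python
import math

def rollout_plan(total_timesteps, n_steps, n_envs=1):
    """Return (number of rollouts, timesteps actually collected)."""
    per_rollout = n_steps * n_envs
    rollouts = math.ceil(total_timesteps / per_rollout)
    return rollouts, rollouts * per_rollout

a2c_plan = rollout_plan(50_000, n_steps=10)    # (5000, 50000)
ppo_plan = rollout_plan(50_000, n_steps=2048)  # (25, 51200)
print(a2c_plan, ppo_plan)
```

Under these assumptions, A2C performs many small updates (10 timesteps each, consistent with 100 iterations at 1000 timesteps in its log), while PPO performs a few large ones (2048 timesteps each) and slightly overshoots the requested budget.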


### 3.4 Evaluation
We now evaluate the two trained agents on the most recent data, which the training process never saw. This step is also called *back-testing* or simply *trading*. The start date of the evaluation period is held in the parameter `TRADE_START`.
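
The idea of a chronological split can be sketched without any library. The helper `split_by_date` and the rows below are hypothetical and only illustrate the principle. It is not FinRL's `data_split`, and the half-open interval `[start, end)` is an assumption of this sketch.

```python
from datetime import date

def split_by_date(rows, start, end):
    """Keep rows whose date falls in the half-open interval [start, end)."""
    return [r for r in rows if start <= r["date"] < end]

# Hypothetical rows; only the trade start date matches the value used in this post
rows = [
    {"date": date(2020, 6, 29), "close": 22.5},
    {"date": date(2020, 6, 30), "close": 22.9},
    {"date": date(2020, 7, 1), "close": 23.0},
    {"date": date(2020, 7, 2), "close": 23.1},
]
trade_start = date(2020, 7, 1)

train_rows = split_by_date(rows, date.min, trade_start)
trade_rows = split_by_date(rows, trade_start, date.max)

# No date appears in both sets, so evaluation data stays unseen during training
overlap = {r["date"] for r in train_rows} & {r["date"] for r in trade_rows}
print(len(train_rows), len(trade_rows), overlap)
```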

In [None]:
TRADE_START

'2020-07-01'

In [None]:
# hide
env_kwargs

{'action_space': 7,
 'hmax': 100,
 'initial_amount': 1000000,
 'reward_scaling': 0.1,
 'state_space': 7,
 'stock_dim': 7,
 'tech_indicator_list': ['macd',
  'boll_ub',
  'boll_lb',
  'rsi_30',
  'cci_30',
  'dx_30',
  'close_30_sma',
  'close_60_sma'],
 'transaction_cost_pct': 0}

In [None]:
trade = data_split(df, TRADE_START, DATA_END)
e_trade_gym = Portfolio(df=trade, **env_kwargs)
e_trade_gym

<__main__.Portfolio at 0x7fc9816700d0>

In [None]:
baseline_df = get_baseline(
        ticker="^DJI", 
        start=TRADE_START,
        end=DATA_END)
baseline_df_stats = backtest_stats(baseline_df, value_col_name = 'close')
baseline_returns = get_daily_return(baseline_df, value_col_name="close")
dji_cumpod =(baseline_returns + 1).cumprod() - 1

[*********************100%***********************]  1 of 1 completed
Shape of DataFrame:  (295, 8)
Annual return          0.311845
Cumulative returns     0.374034
Annual volatility      0.140762
Sharpe ratio           2.006165
Calmar ratio           3.491806
Stability              0.950106
Max drawdown          -0.089308
Omega ratio            1.397014
Sortino ratio          2.988706
Skew                        NaN
Kurtosis                    NaN
Tail ratio             1.094883
Daily value at risk   -0.016614
dtype: float64


In [None]:
df_daily_return_a2c, df_actions_a2c = DRLAgent.DRL_prediction(model=trained_a2c, environment = e_trade_gym)
df_daily_return_ppo, df_actions_ppo = DRLAgent.DRL_prediction(model=trained_ppo, environment = e_trade_gym)
time_ind = pd.Series(df_daily_return_a2c.date)
a2c_cumpod =(df_daily_return_a2c.daily_return + 1).cumprod() - 1
ppo_cumpod =(df_daily_return_ppo.daily_return + 1).cumprod() - 1
DRL_strat_a2c = convert_daily_return_to_pyfolio_ts(df_daily_return_a2c)
DRL_strat_ppo = convert_daily_return_to_pyfolio_ts(df_daily_return_ppo)

perf_func = timeseries.perf_stats
perf_stats_all_a2c = perf_func(returns=DRL_strat_a2c, factor_returns=DRL_strat_a2c, positions=None, transactions=None, turnover_denom="AGB")
perf_stats_all_ppo = perf_func(returns=DRL_strat_ppo, factor_returns=DRL_strat_ppo, positions=None, transactions=None, turnover_denom="AGB")

begin_total_asset:1000000
end_total_asset:1524583.336446113
Sharpe:  2.369076062885316
hit end!
begin_total_asset:1000000
end_total_asset:1507488.2263027485
Sharpe:  2.2708997438017855
hit end!
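
The `Sharpe` values printed above come from the terminal branch of `Portfolio.step()`, which scales the mean daily return over its standard deviation by the square root of 252 trading days. No risk-free rate is subtracted. The following sketch repeats that formula in plain Python on made-up returns. `annualized_sharpe` is a hypothetical helper name.

```python
import statistics

def annualized_sharpe(daily_returns, periods_per_year=252):
    """Mean over sample standard deviation of daily returns, scaled by sqrt(periods)."""
    std = statistics.stdev(daily_returns)  # sample std, like the pandas default
    if std == 0:
        return None  # the environment skips the Sharpe printout in this case
    return (periods_per_year ** 0.5) * statistics.mean(daily_returns) / std

# Hypothetical daily returns, for illustration only
returns = [0.0, 0.01, -0.005, 0.007, 0.002]
sharpe = annualized_sharpe(returns)
print(sharpe)
```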


In [None]:
# hide
len(df_actions_a2c.columns)

7

#### 3.4.1 Inspect actions
For the sake of interest, we inspect some of the actions taken by the A2C agent.

In [None]:
# A2C actions: the portfolio weights chosen by the agent on each trading day
df_actions_a2c

date,FBCVX,FCPGX,FDCAX,FDGFX,FDSCX,FDSVX,FSMVX
2020-07-01,0.142857,0.142857,0.142857,0.142857,0.142857,0.142857,0.142857
2020-07-02,0.197429,0.103983,0.103983,0.103983,0.103983,0.103983,0.282655
2020-07-06,0.242351,0.089156,0.089156,0.242351,0.158675,0.089156,0.089156
2020-07-07,0.250948,0.230227,0.092319,0.149551,0.092319,0.092319,0.092319
2020-07-08,0.112299,0.214671,0.078973,0.214671,0.214671,0.078973,0.085741
...,...,...,...,...,...,...,...
2021-08-25,0.257359,0.106575,0.094677,0.094677,0.094677,0.094677,0.257359
2021-08-26,0.270076,0.151744,0.099356,0.099356,0.099356,0.099356,0.180758
2021-08-27,0.222533,0.081865,0.081865,0.222533,0.081865,0.086806,0.222533
2021-08-30,0.124118,0.186260,0.085769,0.233143,0.124187,0.085769,0.160754
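
Each row of this table is a set of softmax weights, so it should sum to 1. In addition, if the raw actions are confined to the `Box(low=0, high=1)` action space before they reach the environment (an assumption here; Stable Baselines3 clips actions for bounded `Box` spaces), each weight for 7 funds is bounded as worked out below.

```python
import math

n_funds = 7

# Extreme cases for raw actions confined to [0, 1]
max_weight = math.e / (math.e + (n_funds - 1))     # one action at 1, the rest at 0
min_weight = 1.0 / (1.0 + (n_funds - 1) * math.e)  # one action at 0, the rest at 1
print(round(min_weight, 4), round(max_weight, 4))  # 0.0578 0.3118

# Row for 2020-07-02, copied from the table above
row = [0.197429, 0.103983, 0.103983, 0.103983, 0.103983, 0.103983, 0.282655]
row_sums_to_one = abs(sum(row) - 1.0) < 1e-5
row_within_bounds = all(min_weight <= w <= max_weight for w in row)
print(row_sums_to_one, row_within_bounds)          # True True
```

Under that assumption the agent can never put more than about 31% or less than about 6% of the portfolio into a single fund, which limits how concentrated the allocation can become.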


Here is a visualization of the A2C agent's actions specifically on two of the funds:

In [None]:
# hide
my_tickers

['FBCVX',
 'FSMVX',
 'FDGFX',
 'FULVX',
 'FDSCX',
 'FDCAX',
 'FDSVX',
 'FCPGX',
 'FIFNX']

In [None]:
fig = go.Figure()
fig.update_layout(width=900, height=600)
fig.add_trace(go.Scatter(x=time_ind, y=df_actions_a2c['FBCVX'], mode='lines', name='FBCVX A2C'))
# The FSMVX weights are negated, so this trace is drawn below the zero line
fig.add_trace(go.Scatter(x=time_ind, y=-df_actions_a2c['FSMVX'], mode='lines', name='FSMVX A2C'))
fig.update_layout(
    legend=dict(
        x=0,
        y=1,
        traceorder="normal",
        font=dict(
            family="sans-serif",
            size=20,
            color="black"
        ),
        bgcolor="White",
        bordercolor="white",
        borderwidth=2))
fig.update_layout(title={
        'text': "A2C Actions on FBCVX and FSMVX",
        'y': 0.87,
        'x': 0.48,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.update_layout(
    paper_bgcolor='rgba(1, 1, 0, 0)',
    plot_bgcolor='rgba(1, 1, 0, 0)',
    xaxis_title="Date",
    yaxis = dict(titlefont=dict(size=26), title="Daily Actions"),
    font=dict(size=15))
# fig.update_layout(font_size = 20)
fig.update_traces(line=dict(width=2))
fig.update_xaxes(showline=True, linecolor='black', showgrid=False, gridwidth=1, gridcolor='Black', mirror=True)
fig.update_yaxes(showline=True,linecolor='black', showgrid=False, gridwidth=1, gridcolor='Black', mirror=True)
fig.update_yaxes(zeroline=True, zerolinewidth=1, zerolinecolor='Grey')
fig.show()

#### 3.4.2 Inspect daily return

Here is a visualization of the daily return of the portfolio:

In [None]:
fig = go.Figure()
fig.update_layout(width=900, height=600)
fig.add_trace(go.Scatter(x=time_ind, y=DRL_strat_a2c, mode='lines', name='A2C'))
fig.add_trace(go.Scatter(x=time_ind, y=DRL_strat_ppo, mode='lines', name='PPO'))
fig.update_layout(
    legend=dict(
        x=0,
        y=1,
        traceorder="normal",
        font=dict(
            family="sans-serif",
            size=20,
            color="black"
        ),
        bgcolor="White",
        bordercolor="white",
        borderwidth=2))
fig.update_layout(title={
        'text': "Daily Return of A2C & PPO",
        'y': 0.87,
        'x': 0.48,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.update_layout(
    paper_bgcolor='rgba(1, 1, 0, 0)',
    plot_bgcolor='rgba(1, 1, 0, 0)',
    xaxis_title="Date",
    yaxis = dict(titlefont=dict(size=26), title="Daily Return"),
    font=dict(size=15))
# fig.update_layout(font_size = 20)
fig.update_traces(line=dict(width=2))
fig.update_xaxes(showline=True, linecolor='black', showgrid=False, gridwidth=1, gridcolor='Black', mirror=True)
fig.update_yaxes(showline=True,linecolor='black', showgrid=False, gridwidth=1, gridcolor='Black', mirror=True)
fig.update_yaxes(zeroline=True, zerolinewidth=1, zerolinecolor='Grey')
fig.show()

#### 3.4.3 Inspect cumulative return

Finally, we inspect the cumulative return of the portfolio produced by each agent, with the DJIA index as a baseline for reference. In this run both agents finish ahead of the DJIA: A2C ends with a cumulative return of about 52.5% and PPO with about 50.7%, compared with about 37.4% for the DJIA over the same period.

In [None]:
fig = go.Figure()
fig.update_layout(width=900, height=900)
fig.add_trace(go.Scatter(x=time_ind, y=dji_cumpod, mode='lines', name='DJIA', line=dict(color="#a9a9a9")))
fig.add_trace(go.Scatter(x=time_ind, y=a2c_cumpod, mode='lines', name='A2C'))
fig.add_trace(go.Scatter(x=time_ind, y=ppo_cumpod, mode='lines', name='PPO'))
fig.update_layout(
    legend=dict(
        x=0,
        y=1,
        traceorder="normal",
        font=dict(
            family="sans-serif",
            size=16,
            color="black"
        ),
        bgcolor="White",
        bordercolor="white",
        borderwidth=2))
fig.update_layout(title={
        'text': "Cumulative Return of A2C & PPO against DJIA",
        'y': 0.92,
        'x': 0.48,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.update_layout(
    paper_bgcolor='rgba(1, 1, 0, 0)',
    plot_bgcolor='rgba(1, 1, 0, 0)',
    xaxis_title="Date",
    yaxis = dict(titlefont=dict(size=26), title="Cumulative Return"),
    font=dict(size=15))
# fig.update_layout(font_size = 20)
fig.update_traces(line=dict(width=2))
fig.update_xaxes(showline=True, linecolor='black', showgrid=False, gridwidth=1, gridcolor='Black', mirror=True)
fig.update_yaxes(showline=True,linecolor='black', showgrid=False, gridwidth=1, gridcolor='Black', mirror=True)
fig.update_yaxes(zeroline=True, zerolinewidth=1, zerolinecolor='Grey')
fig.show()
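
The cumulative-return curves plotted above are built with `(returns + 1).cumprod() - 1`. The sketch below shows the same compounding in plain Python on made-up daily returns, then checks the end-of-period figure implied by the A2C account values printed in section 3.4. `cumulative_returns` is a hypothetical helper name.

```python
from itertools import accumulate
from operator import mul

def cumulative_returns(daily_returns):
    """Compound daily returns: running product of (1 + r), minus 1."""
    growth = accumulate((1.0 + r for r in daily_returns), mul)
    return [g - 1.0 for g in growth]

# Hypothetical daily returns, for illustration only
curve = cumulative_returns([0.0, 0.10, -0.10, 0.05])
print(curve)

# End-of-period return implied by the A2C begin/end asset values in section 3.4
a2c_total = 1524583.336446113 / 1000000 - 1
print(round(a2c_total, 4))  # 0.5246
```

Note that a 10% gain followed by a 10% loss leaves the curve at -1%, not 0%, which is why the returns are compounded rather than summed.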

### Disclaimer

This content is only meant for research purposes and is not meant to be used in any form of trading. Past performance is no guarantee of future results. If you suffer losses from making use of this content, directly or indirectly, you are the sole person responsible for the losses. The author will not be held responsible in any way.