# FinRL
FinRL is the first open-source framework to demonstrate the great potential of applying deep reinforcement learning in quantitative finance. We help practitioners establish the development pipeline of trading strategies using deep reinforcement learning (DRL). A DRL agent learns by continuously interacting with an environment in a trial-and-error manner, making sequential decisions under uncertainty, and achieving a balance between exploration and exploitation.
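The exploration/exploitation trade-off described above can be illustrated without FinRL at all. The sketch below is a hypothetical ε-greedy agent on a three-armed bandit. The arm probabilities, `epsilon` and the step count are illustrative assumptions. With probability `epsilon` the agent explores a random arm; otherwise it exploits the arm with the highest estimated reward, and it updates its estimates from the rewards it observes. It is a toy for intuition, not how FinRL's DRL agents are implemented.

```python
import random


def epsilon_greedy_bandit(probs, epsilon=0.1, steps=5000, seed=0):
    """Toy trial-and-error learner on a Bernoulli multi-armed bandit.

    probs   -- hypothetical win probability of each arm (unknown to the agent)
    epsilon -- fraction of steps spent exploring a random arm
    Returns the agent's estimated value of each arm and its pull counts.
    """
    rng = random.Random(seed)
    n_arms = len(probs)
    estimates = [0.0] * n_arms   # running mean reward per arm
    counts = [0] * n_arms        # how often each arm was pulled

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                            # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])   # exploit
        reward = 1.0 if rng.random() < probs[arm] else 0.0          # environment feedback
        counts[arm] += 1
        # incremental update of the running mean for the pulled arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
best_arm = max(range(3), key=lambda a: estimates[a])
print("estimated values:", [round(v, 2) for v in estimates])
print("pull counts:", counts, "best arm:", best_arm)
```

Exploration keeps every arm's estimate improving, while exploitation spends most pulls on the arm that currently looks best.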

In [None]:
## install finrl library
!pip install git+https://github.com/AI4Finance-LLC/FinRL-Library.git

In [2]:
import sys
sys.path.append("../FinRL-Library")  # fall back to a local FinRL-Library checkout; must run before the finrl imports

import datetime
import calendar

import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

from finrl.apps import config
from finrl.neo_finrl.preprocessor.yahoodownloader import YahooDownloader
from finrl.neo_finrl.preprocessor.preprocessors import FeatureEngineer, data_split
from finrl.neo_finrl.env_portfolio_allocation.env_portfolio import StockPortfolioEnv
from finrl.drl_agents.stablebaselines3.models import DRLAgent
from finrl.plot import backtest_stats, backtest_plot, get_daily_return, get_baseline, convert_daily_return_to_pyfolio_ts

  'Module "zipline.assets" not found; multipliers will not be applied'


In [3]:
import os

# create the output folders used by FinRL if they do not exist yet
for folder in [config.DATA_SAVE_DIR, config.TRAINED_MODEL_DIR,
               config.TENSORBOARD_LOG_DIR, config.RESULTS_DIR]:
    os.makedirs("./" + folder, exist_ok=True)

## Download Data

- Stock data is downloaded with FinRL:
  - Yahoo Finance is a website that provides stock data, financial news, financial reports, etc. All the data provided by Yahoo Finance is free. FinRL uses its `YahooDownloader` class to fetch data from the Yahoo Finance API.
  - Call limit: using the public API (without authentication), you are limited to 2,000 requests per hour per IP (or up to a total of 48,000 requests a day).
- Gold and long-volatility data are downloaded as Excel sheets.

> Date range: 2004-12-01 to 2021-09-01

### Stocks from Yahoo Finance

In [4]:
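# tickers required from Yahoo Finance: ^BCOM (Bloomberg Commodity Index), ^SP500TR (S&P 500 Total Return),
# EEM (iShares MSCI Emerging Markets ETF), IEF (iShares 7-10 Year Treasury Bond ETF),
# AGG (iShares Core U.S. Aggregate Bond ETF)
tickers = ['^BCOM', '^SP500TR', 'EEM', 'IEF', 'AGG']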

In [5]:
df_stocks = YahooDownloader(start_date = '2004-12-01',
                     end_date = '2021-09-01',
                     ticker_list = tickers).fetch_data()

Shape of DataFrame:  (21083, 8)


In [6]:
df_stocks.tic = df_stocks.tic.replace({'^BCOM': 'COM', '^SP500TR': 'SNP'})

*Note: based on the above, AGG dividends are not included.

In [7]:
# get dates of stocks valuations
dates = df_stocks.date.unique()

# extract data from git repo
url = [
       'https://github.com/changjulian17/DataSciencePortfolio/blob/main/Investment_Portfolio/data/gold.xlsx?raw=true',
       'https://github.com/changjulian17/DataSciencePortfolio/blob/main/Investment_Portfolio/data/long-vol.xlsx?raw=true'
]
# extract gold data and format
df_gold = pd.read_excel(url[0], sheet_name='Daily_Indexed')
df_gold = df_gold[['Name', 'US dollar']].copy()
df_gold.columns = ['date', 'close']                   # match col names to stocks
df_gold['tic'] = 'GLD'                                 # add ticker data
df_gold = df_gold[df_gold.date.isin(dates)].copy()     # keep only the stock trading dates
df_gold.date = df_gold.date.dt.strftime('%Y-%m-%d')    # convert date to string

# extract long-vol data and format
# the percentage change is already in the Excel sheet, so one step can be skipped
df_lv = pd.read_excel(url[1])
df_lv.columns = df_lv.iloc[2]                          # use the row holding 'ReturnDate'/'Index' as header
df_lv = df_lv[3:].set_index('ReturnDate')['Index']
df_lv.index = pd.to_datetime(df_lv.index)              # set date as index
df_lv = df_lv.resample('24h').ffill()                  # upsample monthly values to daily by forward-filling (no averaging)
df_lv = df_lv.reset_index()                            # set date as column
df_lv.columns = ['date', 'close']                      # match col names to stocks
df_lv = df_lv[df_lv.date.isin(dates)].copy()           # keep only the stock trading dates
df_lv['tic'] = 'LOV'                                   # add ticker data
df_lv.date = df_lv.date.dt.strftime('%Y-%m-%d')        # convert date to string
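The long-volatility series is monthly, and `resample('24h').ffill()` turns it into a daily series by repeating each month's value until the next observation; it does not average. A minimal sketch on made-up index levels shows the effect, followed by the same alignment to trading dates used above. The dates and values are illustrative assumptions, not taken from the Excel sheet.

```python
import pandas as pd

# Hypothetical month-end index levels (illustrative values only)
monthly = pd.Series(
    [100.0, 102.0, 101.0],
    index=pd.to_datetime(["2021-01-31", "2021-02-28", "2021-03-31"]),
)

# Upsample to daily frequency; each day carries the last known monthly value
daily = monthly.resample("24h").ffill()

# Keep only days that also appear in a (hypothetical) list of trading dates
trading_dates = ["2021-02-01", "2021-02-26", "2021-03-01", "2021-03-31"]
aligned = daily[daily.index.isin(pd.to_datetime(trading_dates))]
print(aligned)
```

Because of the forward fill, every trading day in February reports the January month-end level until the February value becomes available on 2021-02-28.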

## Preprocess Data
Data preprocessing is a crucial step for training a high-quality machine learning model. We need to check for missing data and do feature engineering in order to convert the data into a model-ready state.

- Add technical indicators. In practical trading, various information needs to be taken into account, for example historical stock prices, current holdings and technical indicators. FinRL's `FeatureEngineer` adds its default indicator set, which includes the trend-following MACD and the momentum oscillator RSI, along with Bollinger bands, CCI, DX and 30/60-day moving averages (see the columns in the output below).
- Add turbulence index. Risk aversion reflects whether an investor chooses to preserve capital. It also influences one's trading strategy when facing different levels of market volatility. To control risk in a worst-case scenario, such as the financial crisis of 2007–2008, FinRL can compute a financial turbulence index that measures extreme asset price fluctuation. This notebook does not use it (`use_turbulence=False` below).
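For intuition, here is a minimal pandas/NumPy sketch of MACD, RSI and a turbulence index using common textbook definitions: the MACD line as the 12-period minus the 26-period EMA (signal line omitted), RSI with Wilder smoothing, and turbulence as the Mahalanobis distance of today's returns from their trailing mean. These window lengths and the helper names are assumptions. FinRL's `FeatureEngineer` computes its indicators through the `stockstats` package (e.g. a 30-period `rsi_30`), so values will not match its output exactly.

```python
import numpy as np
import pandas as pd


def macd(close: pd.Series, fast: int = 12, slow: int = 26) -> pd.Series:
    """MACD line: fast EMA minus slow EMA of the closing price."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    return ema_fast - ema_slow


def rsi(close: pd.Series, window: int = 14) -> pd.Series:
    """Relative Strength Index using Wilder smoothing (alpha = 1 / window)."""
    delta = close.diff()
    gain = delta.clip(lower=0)       # upward moves, 0 otherwise
    loss = (-delta).clip(lower=0)    # downward moves as positive numbers
    avg_gain = gain.ewm(alpha=1 / window, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1 / window, adjust=False).mean()
    rs = avg_gain / avg_loss         # inf when there were no losses, giving RSI = 100
    return 100 - 100 / (1 + rs)


def turbulence(returns: pd.DataFrame, lookback: int = 252) -> pd.Series:
    """Mahalanobis distance of each day's return vector from the trailing mean,
    measured with the trailing covariance; NaN for the first `lookback` days."""
    values = returns.to_numpy()
    out = np.full(len(values), np.nan)
    for t in range(lookback, len(values)):
        hist = values[t - lookback:t]                         # previous `lookback` days only
        dev = values[t] - hist.mean(axis=0)
        cov_inv = np.linalg.pinv(np.cov(hist, rowvar=False))  # pseudo-inverse for robustness
        out[t] = dev @ cov_inv @ dev
    return pd.Series(out, index=returns.index)


# Usage on synthetic prices for three hypothetical assets
rng = np.random.default_rng(0)
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=(400, 3)), axis=0)),
    columns=["A", "B", "C"],
)
print(macd(prices["A"]).tail(3))
print(rsi(prices["A"], window=30).tail(3))
print(turbulence(prices.pct_change().dropna()).dropna().tail(3))
```

A large turbulence value means the day's joint return pattern is unusual relative to the past year, which is what makes it useful as a risk switch.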

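Every dataset must contain the same dates, one row per ticker for each day the model runs. This ensures that all per-day tensor calculations (such as the covariance matrices below) are possible and produce equally shaped results.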
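One way to check this requirement, and to find gaps like the two BCOM dates handled below, is to count how many tickers report each date. The helper `find_incomplete_dates` is hypothetical (not part of FinRL) and assumes the long format used in this notebook, with `date` and `tic` columns; the toy rows are made up.

```python
import pandas as pd


def find_incomplete_dates(df: pd.DataFrame) -> pd.Series:
    """Return, for each date not covered by every ticker, the number of tickers present.

    Assumes a long-format frame with one row per (date, tic).
    """
    n_tickers = df["tic"].nunique()
    per_date = df.groupby("date")["tic"].nunique()
    return per_date[per_date < n_tickers]


# Toy example: ticker 'COM' has no row on 2005-11-25
toy = pd.DataFrame({
    "date": ["2005-11-23", "2005-11-23", "2005-11-25", "2005-11-28", "2005-11-28"],
    "tic": ["COM", "SNP", "SNP", "COM", "SNP"],
    "close": [170.0, 1800.0, 1805.0, 169.0, 1810.0],
})
missing = find_incomplete_dates(toy)
print(missing)                                     # dates with fewer than 2 tickers
complete = toy[~toy["date"].isin(missing.index)]   # drop those dates for all tickers
```

Dropping an incomplete date for every ticker, rather than filling it, keeps all days equally shaped without inventing prices.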

In [8]:
df_stocks.date.nunique(),df_gold.date.nunique(),df_lv.date.nunique()

(4217, 4217, 4217)

In [9]:
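# combine the datasets; gold and long-vol only have closing prices,
# so their missing open/high/low/volume/day values are filled with 0
df = pd.concat([df_stocks, df_gold, df_lv], axis=0).fillna(0)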

In [12]:
# ^BCOM (renamed COM) has no data on '2005-11-25' and '2014-09-08',
# so these dates are removed for all tickers
dates = [date for date in df_stocks.date.unique() if date not in ['2005-11-25', '2014-09-08']]
df = df[df['date'].isin(dates)]

In [14]:
df.date.nunique()

4215

## Get Standard Indicators

In [15]:
fe = FeatureEngineer(
                    use_technical_indicator=True,
                    use_turbulence=False,
                    user_defined_feature = False)

# Note: an earlier workaround processed BCOM separately because FeatureEngineer
# omitted it; the COM rows in the output below show it is now included,
# so the whole frame is processed in one call
df = fe.preprocess_data(df).sort_values(by='date').reset_index(drop=True)

Successfully added technical indicators


In [16]:
df.shape

(29505, 16)

In [17]:
df.head(7)

| | date | open | high | low | close | volume | tic | day | macd | boll_ub | boll_lb | rsi_30 | cci_30 | dx_30 | close_30_sma | close_60_sma |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2004-12-01 | 101.949997 | 101.949997 | 101.510002 | 59.411003 | 49000.0 | AGG | 2.0 | 0.0 | 59.623912 | 59.30932 | 100.0 | -66.666667 | 100.0 | 59.411003 | 59.411003 |
| 1 | 2004-12-01 | 153.320007 | 153.809998 | 149.880005 | 149.880005 | 0.0 | COM | 2.0 | 0.0 | 59.623912 | 59.30932 | 100.0 | -66.666667 | 100.0 | 149.880005 | 149.880005 |
| 2 | 2004-12-01 | 21.799999 | 21.872223 | 21.73889 | 15.839629 | 6652800.0 | EEM | 2.0 | 0.0 | 59.623912 | 59.30932 | 100.0 | -66.666667 | 100.0 | 15.839629 | 15.839629 |
| 3 | 2004-12-01 | 0.0 | 0.0 | 0.0 | 157.35 | 0.0 | GLD | 0.0 | 0.0 | 59.623912 | 59.30932 | 100.0 | -66.666667 | 100.0 | 157.35 | 157.35 |
| 4 | 2004-12-01 | 84.459999 | 84.459999 | 84.099998 | 54.318985 | 257600.0 | IEF | 2.0 | 0.0 | 59.623912 | 59.30932 | 100.0 | -66.666667 | 100.0 | 54.318985 | 54.318985 |
| 5 | 2004-12-01 | 0.0 | 0.0 | 0.0 | 100.0 | 0.0 | LOV | 0.0 | 0.0 | 59.623912 | 59.30932 | 100.0 | -66.666667 | 100.0 | 100.0 | 100.0 |
| 6 | 2004-12-01 | 1766.900024 | 1766.900024 | 1766.900024 | 1766.900024 | 0.0 | SNP | 2.0 | 0.0 | 59.623912 | 59.30932 | 100.0 | -66.666667 | 100.0 | 1766.900024 | 1766.900024 |


## Get Covariance Matrix as States

The covariance matrix is computed from daily returns over a one-year lookback window of 252 trading days. Because it cannot be computed for the first year of data, the first 252 trading days (December 2004 to November 2005) are dropped, and the data now starts on 2005-12-01.
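Because `.loc` slicing on the integer day index is inclusive, each window `df.loc[i-lookback:i]` in the cell below spans 253 trading days, which yields 252 daily returns after `pct_change().dropna()`. The sketch below reproduces one such window on a toy price table (made-up numbers, three of the notebook's tickers used only as labels) and shows that `DataFrame.cov()` is the same sample-covariance estimator as `numpy.cov` on the return matrix.

```python
import numpy as np
import pandas as pd

# Toy close prices: rows are dates, columns are tickers (illustrative values)
rng = np.random.default_rng(42)
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=(253, 3)), axis=0)),
    columns=["AGG", "EEM", "SNP"],
)

# 252 daily returns from 253 prices, as in one lookback window
returns = prices.pct_change().dropna()
cov_pandas = returns.cov().values                  # pandas sample covariance (ddof=1)
cov_numpy = np.cov(returns.values, rowvar=False)   # same estimator in NumPy

print(returns.shape, cov_pandas.shape)
print(np.allclose(cov_pandas, cov_numpy))
```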

In [18]:
# add covariance matrix as states
df = df.sort_values(['date', 'tic'], ignore_index=True)
df.index = df.date.factorize()[0]   # integer day index shared by all tickers on a date

cov_list = []
return_list = []

# look back is one year
lookback = 252  # 252 trading days in a year
for i in range(lookback, len(df.index.unique())):
    data_lookback = df.loc[i - lookback:i, :]
    price_lookback = data_lookback.pivot_table(index='date', columns='tic', values='close')
    return_lookback = price_lookback.pct_change().dropna()
    return_list.append(return_lookback)

    covs = return_lookback.cov().values
    cov_list.append(covs)

df_cov = pd.DataFrame({'date': df.date.unique()[lookback:], 'cov_list': cov_list, 'return_list': return_list})
df = df.merge(df_cov, on='date')
df = df.sort_values(['date', 'tic']).reset_index(drop=True)
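As far as we can tell from FinRL's portfolio-allocation tutorial code, `StockPortfolioEnv` builds a day's observation by appending one row per technical indicator beneath that day's n×n covariance matrix, giving shape (n + k, n). The sketch below rebuilds such a state from one day's rows of a frame shaped like `df` above. `build_state` and the toy values are hypothetical, so check the layout against the FinRL version you have installed.

```python
import numpy as np
import pandas as pd


def build_state(day_rows: pd.DataFrame, tech_indicators: list) -> np.ndarray:
    """Stack a day's covariance matrix (n x n) with k indicator rows into an (n + k, n) array.

    Assumes `day_rows` has one row per ticker, sorted by 'tic', and that every
    row carries the same 'cov_list' matrix, as produced by the merge above.
    """
    cov = np.asarray(day_rows["cov_list"].iloc[0])
    indicator_rows = [day_rows[name].to_numpy() for name in tech_indicators]
    return np.vstack([cov] + indicator_rows)


# Toy day with two tickers and two indicators (illustrative numbers only)
cov = np.array([[4e-6, -2e-6], [-2e-6, 1e-4]])
day = pd.DataFrame({
    "tic": ["AGG", "SNP"],
    "cov_list": [cov, cov],
    "macd": [0.02, 22.5],
    "rsi_30": [47.0, 61.5],
})
state = build_state(day, ["macd", "rsi_30"])
print(state.shape)
```

Keeping one column per ticker means the agent sees each asset's risk relationships and indicators aligned in the same position every day.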

In [19]:
df.shape

(27741, 18)

In [20]:
df.head(7)

| | date | open | high | low | close | volume | tic | day | macd | boll_ub | boll_lb | rsi_30 | cci_30 | dx_30 | close_30_sma | close_60_sma | cov_list | return_list |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2005-12-01 | 99.900002 | 100.110001 | 99.519997 | 60.696129 | 520700.0 | AGG | 3.0 | 0.022095 | 61.12589 | 60.233496 | 46.969058 | -60.765433 | 26.937726 | 60.7023 | 60.953128 | [[4.456546252710798e-06, -2.026681109035418e-0... | tic AGG COM EEM ... |
| 1 | 2005-12-01 | 166.789993 | 170.270004 | 166.789993 | 169.940002 | 0.0 | COM | 3.0 | -1.074083 | 168.988715 | 162.73829 | 52.292436 | 44.65902 | 7.052643 | 167.457001 | 170.427833 | [[4.456546252710798e-06, -2.026681109035418e-0... | tic AGG COM EEM ... |
| 2 | 2005-12-01 | 28.283333 | 28.833332 | 28.283333 | 21.119377 | 7673100.0 | EEM | 3.0 | 0.304364 | 21.03207 | 19.770905 | 60.168543 | 122.724279 | 28.843042 | 19.964131 | 19.96927 | [[4.456546252710798e-06, -2.026681109035418e-0... | tic AGG COM EEM ... |
| 3 | 2005-12-01 | 0.0 | 0.0 | 0.0 | 173.64 | 0.0 | GLD | 0.0 | 2.939711 | 176.091027 | 155.306973 | 66.773536 | 155.922795 | 28.843042 | 164.670333 | 162.871333 | [[4.456546252710798e-06, -2.026681109035418e-0... | tic AGG COM EEM ... |
| 4 | 2005-12-01 | 83.099998 | 83.239998 | 82.75 | 55.536125 | 321100.0 | IEF | 3.0 | 0.045126 | 56.085167 | 54.809773 | 47.859808 | -13.470797 | 18.743677 | 55.456889 | 55.809508 | [[4.456546252710798e-06, -2.026681109035418e-0... | tic AGG COM EEM ... |
| 5 | 2005-12-01 | 0.0 | 0.0 | 0.0 | 105.212552 | 0.0 | LOV | 0.0 | 0.625454 | 104.578579 | 103.450777 | 84.378224 | 100.822624 | 18.743677 | 102.936849 | 102.3968 | [[4.456546252710798e-06, -2.026681109035418e-0... | tic AGG COM EEM ... |
| 6 | 2005-12-01 | 1910.22998 | 1910.22998 | 1910.22998 | 1910.22998 | 0.0 | SNP | 3.0 | 22.514496 | 1923.324009 | 1815.232022 | 61.531653 | 124.37693 | 34.557673 | 1845.098348 | 1834.943337 | [[4.456546252710798e-06, -2.026681109035418e-0... | tic AGG COM EEM ... |


In [21]:
df.to_pickle('processed_data.pkl')