# AI Alpaca Trading Bot
## Introduction
This notebook demonstrates how to build an ensemble trading strategy and backtest it. The workflow is written for the Dow Jones 30 constituents, but the cells below download and trade a single ticker, NVDA. The ensemble is composed of three Deep Reinforcement Learning (DRL) algorithms: Advantage Actor-Critic (A2C), Proximal Policy Optimization (PPO), and Deep Deterministic Policy Gradient (DDPG). The code is based on the [FinRL-Library](https://github.com/AI4Finance-Foundation/FinRL), a Python library for financial reinforcement learning developed by AI4Finance-LLC.

### Install Required Packages
We begin by installing the packages required to run this notebook. These packages are:

- `setuptools==64.0.2`: A package for building and installing Python packages, pinned here to a specific version.
- `swig`: A tool required by the `wrds` package.
- `wrds`: A package for downloading data from the Wharton Research Data Services.
- `git+https://github.com/AI4Finance-LLC/FinRL-Library.git`: The FinRL-Library package.

In [None]:
!pip3 install setuptools==64.0.2
!apt-get install swig
!pip3 install wrds
!pip3 install git+https://github.com/AI4Finance-LLC/FinRL-Library.git

### Importing Libraries
The first line of the script imports the warnings module, which provides a way to handle warnings that may be encountered during the execution of the script. The second line of the script filters out warnings to avoid clutter in the output.

The next lines of the script import the following libraries:

- `pandas` (`pd`) and `numpy` (`np`) for data analysis and manipulation.
- `matplotlib` for creating visualizations of the data.
- `datetime` for handling date and time information.

### Importing Required Modules
The following modules are then imported:

- `DOW_30_TICKER` from `finrl.config_tickers` to specify a list of tickers for the Dow Jones Industrial Average.
- `YahooDownloader` from `finrl.meta.preprocessor.yahoodownloader` to download financial data from Yahoo Finance.
- `FeatureEngineer` and `data_split` from `finrl.meta.preprocessor.preprocessors` for data pre-processing.
- `StockTradingEnv` from `finrl.meta.env_stock_trading.env_stocktrading` to define a custom environment for stock trading.
- `DRLAgent` and `DRLEnsembleAgent` from `finrl.agents.stablebaselines3.models` for reinforcement learning agents.
- `backtest_stats`, `backtest_plot`, `get_daily_return`, and `get_baseline` from `finrl.plot` for creating plots and calculating performance metrics.
- `pprint` for pretty-printing objects.

### Setting Configuration Variables
The last few lines of the script set configuration variables for data pre-processing, model training, and testing. These include:

- `sys.path.append("../FinRL-Library")` to add the FinRL-Library directory to the system path.
- `check_and_make_directories` from `finrl.main` to create directories for data storage, model training, and testing results.
- `DATA_SAVE_DIR`, `TRAINED_MODEL_DIR`, `TENSORBOARD_LOG_DIR`, and `RESULTS_DIR` for specifying the paths to the data storage, model training, and testing results directories.
- `INDICATORS` to specify a list of technical indicators to be used in feature engineering.
- `TRAIN_START_DATE`, `TRAIN_END_DATE`, `TEST_START_DATE`, `TEST_END_DATE`, `TRADE_START_DATE`, and `TRADE_END_DATE` to specify the start and end dates for training, testing, and trading periods.

In [8]:
import warnings
warnings.filterwarnings("ignore")
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
matplotlib.use('Agg')
import datetime

from finrl.config_tickers import DOW_30_TICKER
from finrl.meta.preprocessor.yahoodownloader import YahooDownloader
from finrl.meta.preprocessor.preprocessors import FeatureEngineer, data_split
from finrl.meta.env_stock_trading.env_stocktrading import StockTradingEnv
from finrl.agents.stablebaselines3.models import DRLAgent,DRLEnsembleAgent
from finrl.plot import backtest_stats, backtest_plot, get_daily_return, get_baseline

from pprint import pprint

import sys
sys.path.append("../FinRL-Library")

import itertools

import os
from finrl.main import check_and_make_directories
from finrl.config import (
    DATA_SAVE_DIR,
    TRAINED_MODEL_DIR,
    TENSORBOARD_LOG_DIR,
    RESULTS_DIR,
    INDICATORS,
    TRAIN_START_DATE,
    TRAIN_END_DATE,
    TEST_START_DATE,
    TEST_END_DATE,
    TRADE_START_DATE,
    TRADE_END_DATE,
)

check_and_make_directories([DATA_SAVE_DIR, TRAINED_MODEL_DIR, TENSORBOARD_LOG_DIR, RESULTS_DIR])

The `DOW_30_TICKER` variable is imported from FinRL and contains the 30 stock tickers of the Dow Jones Industrial Average, but the cell below does not use it. Instead, it defines its own `ticker_list` containing a single ticker, `NVDA`.

The code then overrides four of the imported date variables, `TRAIN_START_DATE`, `TRAIN_END_DATE`, `TEST_START_DATE`, and `TEST_END_DATE`, with explicit values.

Next, the code creates a DataFrame `df` using the `YahooDownloader` class from the `finrl` package. The constructor takes three parameters: `start_date`, `end_date`, and `ticker_list`. The `start_date` and `end_date` parameters are set to `TRAIN_START_DATE` and `TEST_END_DATE`, respectively, so a single download covers both the training and the test period. The `ticker_list` parameter is set to the `ticker_list` variable defined at the top of the cell. Calling the `fetch_data()` method on the resulting object fetches historical stock price data from Yahoo Finance for the specified ticker list and date range.

After creating `df`, the following cells display the first five rows with the `head()` method, the last five rows with the `tail()` method, and the dimensions of the DataFrame with the `shape` attribute.

Next, `df` is sorted by date and ticker using the `sort_values()` method, and the first five rows of the sorted DataFrame are displayed.

Finally, the `unique()` method applied to the `tic` column returns the distinct tickers in the DataFrame, and the `value_counts()` method applied to the same column returns the number of rows per ticker. Because both calls sit in the same cell, the notebook displays only the result of the last one, `value_counts()`.

In [9]:
ticker_list = ['NVDA']
print(ticker_list)

TRAIN_START_DATE = '2009-04-01'
TRAIN_END_DATE = '2022-01-01'
TEST_START_DATE = '2022-01-01'
TEST_END_DATE = '2023-04-01'

df = YahooDownloader(start_date = TRAIN_START_DATE,
                     end_date = TEST_END_DATE,
                     ticker_list = ticker_list).fetch_data()

df.head()

['NVDA']
[*********************100%***********************]  1 of 1 completed
Shape of DataFrame:  (3525, 8)


Unnamed: 0,date,open,high,low,close,volume,tic,day
0,2009-04-01,2.435,2.55,2.3425,2.319224,88792000,NVDA,2
1,2009-04-02,2.6225,2.6925,2.585,2.427042,100286000,NVDA,3
2,2009-04-03,2.6475,2.8375,2.6025,2.596798,100320800,NVDA,4
3,2009-04-06,2.7825,2.8625,2.745,2.60368,88728800,NVDA,0
4,2009-04-07,2.7825,2.8125,2.7225,2.514215,60780400,NVDA,1
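The `day` column in the output above is consistent with Python's weekday numbering, where Monday is 0 and Friday is 4. The short check below is only a sketch that recomputes the weekday for the five dates shown; the helper name `weekday_index` is hypothetical and not part of FinRL.

```python
from datetime import date

# Dates and `day` values copied from the first five rows of the output above.
rows = [
    ("2009-04-01", 2),
    ("2009-04-02", 3),
    ("2009-04-03", 4),
    ("2009-04-06", 0),
    ("2009-04-07", 1),
]


def weekday_index(iso_date: str) -> int:
    """Return Python's weekday index for an ISO date (Monday = 0, Sunday = 6)."""
    return date.fromisoformat(iso_date).weekday()


# Recompute the weekday for each date and compare it with the `day` column.
computed = [weekday_index(d) for d, _ in rows]
matches = all(weekday_index(d) == day for d, day in rows)
print(computed, matches)
```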


In [10]:
df.tail()

Unnamed: 0,date,open,high,low,close,volume,tic,day
3520,2023-03-27,268.369995,270.0,263.649994,265.309998,36102600,NVDA,0
3521,2023-03-28,264.470001,265.130005,258.5,264.100006,35610400,NVDA,1
3522,2023-03-29,268.25,270.779999,265.970001,269.839996,39369400,NVDA,2
3523,2023-03-30,272.290009,274.98999,271.019989,273.829987,36451600,NVDA,3
3524,2023-03-31,271.399994,278.339996,271.049988,277.769989,43324300,NVDA,4


In [11]:
df.shape

(3525, 8)

In [12]:
df.sort_values(['date','tic']).head()

Unnamed: 0,date,open,high,low,close,volume,tic,day
0,2009-04-01,2.435,2.55,2.3425,2.319224,88792000,NVDA,2
1,2009-04-02,2.6225,2.6925,2.585,2.427042,100286000,NVDA,3
2,2009-04-03,2.6475,2.8375,2.6025,2.596798,100320800,NVDA,4
3,2009-04-06,2.7825,2.8625,2.745,2.60368,88728800,NVDA,0
4,2009-04-07,2.7825,2.8125,2.7225,2.514215,60780400,NVDA,1


In [13]:
df.tic.unique()
df.tic.value_counts()

NVDA    3525
Name: tic, dtype: int64

The following code block redefines the `INDICATORS` list, overriding the value imported from `finrl.config`, with the names of four technical indicators: `macd`, `rsi_30`, `cci_30`, and `dx_30`.

Next, an instance of the `FeatureEngineer` class is created with the following parameters:

- `use_technical_indicator=True` to specify that technical indicators will be used in feature engineering.
- `tech_indicator_list=INDICATORS` to specify the list of technical indicators to be used.
- `use_turbulence=False` to skip computing the turbulence index. The cell adds a constant `turbulence` column by hand afterwards.
- `user_defined_feature=False` to specify that no additional user-defined features will be used.

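The `preprocess_data` method of the `FeatureEngineer` instance is then called with `df`, which contains the downloaded price data as a Pandas `DataFrame`. The preprocessed result is copied to a new `DataFrame`, missing values are filled with zeros using `fillna(0)`, and infinite values are replaced with zeros using `replace(np.inf,0)`. Note that this call replaces only positive infinity; negative infinity, if present, is left unchanged. Because the turbulence index was not computed, the cell then sets the `turbulence` column to the constant value 50 for every row, presumably so that later steps that expect this column still find it.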

The `sample` method is then called on the processed `DataFrame` to display a random sample of five rows of the preprocessed data.

The `stock_dimension` variable is then initialized to the number of unique stock tickers in the processed `DataFrame`, while `state_space` is initialized to a calculated value based on the number of stocks, technical indicators, and other features used. The `print` statement at the end of the script outputs the values of `stock_dimension` and `state_space`.
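The state-space arithmetic can be checked by hand. The sketch below assumes the usual reading of the formula (one cash balance, then a price and a share count per stock, then one value per indicator per stock); the function name `state_space_size` is hypothetical.

```python
def state_space_size(stock_dim: int, n_indicators: int) -> int:
    """Size of the observation vector under the formula used in this notebook.

    Assumed layout: 1 cash balance + (price, shares held) per stock
    + one value per technical indicator per stock.
    """
    return 1 + 2 * stock_dim + n_indicators * stock_dim


# One ticker (NVDA) with the four indicators used in this notebook.
single_ticker = state_space_size(stock_dim=1, n_indicators=4)

# The same formula applied to a 30-stock universe, for comparison.
thirty_tickers = state_space_size(stock_dim=30, n_indicators=4)

print(single_ticker, thirty_tickers)
```

With one ticker and four indicators the result is 7, which agrees with the `State Space: 7` line printed by the cell.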

In [14]:
INDICATORS = ['macd',
               'rsi_30',
               'cci_30',
               'dx_30']

print("==============Preprocessing Data===========")

fe = FeatureEngineer(use_technical_indicator=True,
                     tech_indicator_list = INDICATORS,
                     use_turbulence=False,
                     user_defined_feature = False)

processed = fe.preprocess_data(df)
processed = processed.copy()
processed = processed.fillna(0)
processed = processed.replace(np.inf,0)
processed['turbulence'] = 50
print(processed.sample(5))

stock_dimension = len(processed.tic.unique())
state_space = 1 + 2*stock_dimension + len(INDICATORS)*stock_dimension
print(f"Stock Dimension: {stock_dimension}, State Space: {state_space}")

#print(max(processed['turbulence']))

Successfully added technical indicators
            date      open      high         low       close    volume   tic  \
1689  2015-12-15    8.2350    8.3000    8.190000    8.048919  30729600  NVDA   
3295  2022-05-03  194.0000  198.2500  191.330002  195.836990  47575100  NVDA   
351   2010-08-23    2.5000    2.5625    2.447500    2.252699  74862000  NVDA   
1633  2015-09-25    5.9325    6.0175    5.845000    5.742071  36969200  NVDA   
873   2012-09-17    3.4675    3.4675    3.327500    3.087711  49334800  NVDA   

      day       macd     rsi_30      cci_30      dx_30  turbulence  
1689    1   0.239482  64.338068   95.056617  27.873623          50  
3295    1 -15.051834  42.289397  -84.430495  26.575841          50  
351     0  -0.047301  43.976644   14.796426   1.003341          50  
1633    4   0.101654  57.649957  131.609190  14.029180          50  
873     0  -0.026802  48.233042  -91.147643   7.425155          50  
Stock Dimension: 1, State Space: 7


The `env_kwargs` dictionary contains the configuration of the `StockTradingEnv`. Here are the definitions of the variables in the dictionary:

- `hmax`: The maximum number of shares that can be traded per action.
- `initial_amount`: The amount of cash with which the agent starts trading.
- `buy_cost_pct`: The cost of buying stocks. This is a percentage of the total value of the stocks purchased.
- `sell_cost_pct`: The cost of selling stocks. This is a percentage of the total value of the stocks sold.
- `state_space`: The dimension of the state space of the environment. It is calculated as `1 + 2 * stock_dimension + len(INDICATORS) * stock_dimension`, where `stock_dimension` is the number of unique stock tickers in the dataset and `INDICATORS` is the list of technical indicators used to preprocess the data.
- `stock_dim`: The number of unique stock tickers in the dataset.
- `tech_indicator_list`: The list of technical indicators used to preprocess the data.
- `action_space`: The dimension of the action space of the environment. It is equal to `stock_dimension`.
- `reward_scaling`: A scaling factor used to normalize the reward.
- `print_verbosity`: The level of verbosity of the environment.
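To make `hmax`, `buy_cost_pct`, and `sell_cost_pct` concrete, the sketch below works through one buy and one sell. It assumes that the agent emits a normalized action in [-1, 1] that is scaled by `hmax` into a whole number of shares, and that costs are charged proportionally on the traded value. This is a simplified illustration with hypothetical helper names, not the environment's actual code.

```python
def shares_from_action(action: float, hmax: int) -> int:
    """Scale a normalized action in [-1, 1] to a whole number of shares."""
    return int(action * hmax)


def buy_cash_needed(price: float, shares: int, buy_cost_pct: float) -> float:
    """Cash spent when buying: traded value plus a proportional cost."""
    return price * shares * (1 + buy_cost_pct)


def sell_cash_received(price: float, shares: int, sell_cost_pct: float) -> float:
    """Cash received when selling: traded value minus a proportional cost."""
    return price * shares * (1 - sell_cost_pct)


# Illustrative numbers only: an action of 0.5 with hmax = 100 at a price of 200.
shares = shares_from_action(0.5, hmax=100)
spent = buy_cash_needed(200.0, shares, buy_cost_pct=0.001)
received = sell_cash_received(200.0, shares, sell_cost_pct=0.001)
print(shares, round(spent, 2), round(received, 2))
```

Under these assumptions, a round trip at an unchanged price loses about 0.2% of the traded value to costs.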

The `rebalance_window` and `validation_window` variables determine the duration of the rebalance and validation windows, respectively. The rebalance window is the number of days after which the model is retrained, while the validation window is the number of days used for validation and trading.
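The interaction of the two windows can be sketched with plain index arithmetic. The layout below is an assumption: it reuses the `range(...)` from the result-loading loop later in this notebook and supposes that each iteration validates on the `validation_window` dates just before the trading slice. The function `rolling_windows` is hypothetical.

```python
def rolling_windows(n_dates: int, rebalance_window: int, validation_window: int):
    """List (validation, trade) index ranges for each rebalancing iteration.

    Ranges are half-open [start, end) positions into the list of trade dates.
    """
    windows = []
    first = rebalance_window + validation_window
    for i in range(first, n_dates + 1, rebalance_window):
        val_start = i - rebalance_window - validation_window
        val_end = i - rebalance_window
        windows.append(
            {"iter": i, "validation": (val_start, val_end), "trade": (val_end, i)}
        )
    return windows


# 12 trading dates with the window sizes used in this notebook (3 and 3).
schedule = rolling_windows(n_dates=12, rebalance_window=3, validation_window=3)
for window in schedule:
    print(window)
```

In this sketch each trading slice becomes the validation slice of the next iteration, so the models are always evaluated on the most recent data before trading.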

The `DRLEnsembleAgent` object is used to train and evaluate the ensemble trading strategy. It takes in the preprocessed data, training and validation periods, rebalance and validation windows, and environment configuration as input arguments.

The `A2C_model_kwargs`, `PPO_model_kwargs`, and `DDPG_model_kwargs` dictionaries contain the hyperparameters for the A2C, PPO, and DDPG models, respectively. Depending on the model, these include the learning rate, batch size, number of steps per update, entropy coefficient, and replay buffer size.

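The `timesteps_dict` dictionary contains the number of training timesteps for each model. It is set to 1 in this example, which is only enough to check that the pipeline runs end to end; the inline comment suggests 10,000 timesteps per model for an actual training run. The PPO `n_steps` and DDPG `buffer_size` values are reduced in the same way (the comments suggest 2048 and 10,000).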

The `df_summary` `DataFrame` contains the summary returned by the ensemble run. In FinRL's implementation it has one row per rebalancing iteration and records the validation period, the validation Sharpe ratio of each model, and which model was selected for trading.

The `df_trade_date` `DataFrame` contains the unique trade dates for the trading period. The `df_account_value` `DataFrame` is assembled from the CSV files that the agent writes to the `results` directory, one per rebalancing period, and contains the account value for each trading day.


In [15]:
print(processed.dtypes)

date           object
open          float64
high          float64
low           float64
close         float64
volume          int64
tic            object
day             int64
macd          float64
rsi_30        float64
cci_30        float64
dx_30         float64
turbulence      int64
dtype: object


In [None]:
env_kwargs = {
    "hmax": 100, 
    "initial_amount": 100000, 
    "buy_cost_pct": 0.001, 
    "sell_cost_pct": 0.001, 
    "state_space": state_space, 
    "stock_dim": stock_dimension, 
    "tech_indicator_list": INDICATORS,
    "action_space": stock_dimension, 
    "reward_scaling": 1e-4,
    "print_verbosity":5
}

rebalance_window = 3 # rebalance_window is the number of days to retrain the model
validation_window = 3 # validation_window is the number of days to do validation and trading (e.g. if validation_window=63, then both validation and trading period will be 63 days)

ensemble_agent = DRLEnsembleAgent(df=processed,
                 train_period=(TRAIN_START_DATE,TRAIN_END_DATE),
                 val_test_period=(TEST_START_DATE,TEST_END_DATE),
                 rebalance_window=rebalance_window, 
                 validation_window=validation_window, 
                 **env_kwargs)


A2C_model_kwargs = {
                    'n_steps': 5,
                    'ent_coef': 0.005,
                    'learning_rate': 0.0007
                    }

PPO_model_kwargs = {
                    "ent_coef":0.01,
                    "n_steps": 2, #2048
                    "learning_rate": 0.00025,
                    "batch_size": 128
                    }

DDPG_model_kwargs = {
                      #"action_noise":"ornstein_uhlenbeck",
                      "buffer_size": 1, #10_000
                      "learning_rate": 0.0005,
                      "batch_size": 64
                    }

timesteps_dict = {'a2c' : 1, #10_000 each
                 'ppo' : 1, 
                 'ddpg' : 1
                 }

The code block runs the ensemble strategy using `ensemble_agent`, the `DRLEnsembleAgent` instance created above. Rather than averaging the models' outputs, FinRL's ensemble strategy retrains A2C, PPO, and DDPG on a rolling basis, evaluates each of them on a validation window, and trades the following period with the model that achieved the best validation Sharpe ratio.

The `run_ensemble_strategy` method of the `DRLEnsembleAgent` instance is called with the following parameters:

- `A2C_model_kwargs`, `PPO_model_kwargs`, and `DDPG_model_kwargs`: dictionaries containing keyword arguments that will be used to instantiate A2C, PPO, and DDPG models, respectively. These arguments can include hyperparameters such as learning rate, discount factor, number of hidden layers, etc.
- `timesteps_dict`: a dictionary specifying the number of training timesteps for each model. Different values per model make it possible to compare models with different training lengths.

The `run_ensemble_strategy` method executes the ensemble strategy run and returns a summary of the results, which is stored in the `df_summary` DataFrame. In FinRL's implementation this summary lists, for each rebalancing iteration, the validation period, the validation Sharpe ratio of each model, and the model that was selected.
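The selection step at the heart of the ensemble can be sketched in a few lines. This is a simplified illustration that assumes the model with the highest validation Sharpe ratio is chosen for the next trading window; the function name and the Sharpe values are made up for the example.

```python
def pick_best_model(validation_sharpes: dict) -> str:
    """Return the name of the model with the highest validation Sharpe ratio."""
    return max(validation_sharpes, key=validation_sharpes.get)


# Made-up validation Sharpe ratios for one rebalancing iteration.
example_sharpes = {"a2c": 0.12, "ppo": 0.31, "ddpg": -0.05}
chosen = pick_best_model(example_sharpes)
print(chosen)
```

Repeating this choice at every rebalancing iteration is what lets the strategy switch between algorithms as market conditions change.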

In [None]:
df_summary = ensemble_agent.run_ensemble_strategy(A2C_model_kwargs,
                                                 PPO_model_kwargs,
                                                 DDPG_model_kwargs,
                                                 timesteps_dict)

df_summary

This code block performs an analysis of the performance of the trading strategy over a test period. The first step is to identify the unique trading dates within the test period using the `unique_trade_date` variable. This is achieved by filtering the processed DataFrame to include only dates that are greater than `TEST_START_DATE` and less than or equal to `TEST_END_DATE`, and then selecting only the unique dates using the `unique()` method.

The `df_trade_date` DataFrame is then created to store these unique trading dates in a column named `datadate`. An empty DataFrame called `df_account_value` is also initialized to store the account value data from each rebalancing period.

A loop then reads the account value data for each rebalancing period and concatenates it into `df_account_value`. The loop index `i` starts at `rebalance_window + validation_window` and advances in steps of `rebalance_window` up to and including `len(unique_trade_date)`; each value of `i` identifies one rebalancing period whose results were saved under that index. The `pd.read_csv()` function reads the corresponding CSV file into a temporary DataFrame called `temp`, and `pd.concat()` appends `temp` to `df_account_value`. The `ignore_index=True` parameter discards the indices of the individual DataFrames, so the combined DataFrame is renumbered from zero.

Finally, the Sharpe ratio of the trading strategy is calculated using the formula `sharpe = (252**0.5)*df_account_value.account_value.pct_change(1).mean()/df_account_value.account_value.pct_change(1).std()`. The Sharpe ratio is a measure of risk-adjusted return that is commonly used to evaluate investment strategies. It is defined as the average return in excess of the risk-free rate divided by the standard deviation of returns. The formula here uses the daily returns of the account value and subtracts no risk-free rate, so it implicitly assumes a risk-free rate of zero. The factor `252**0.5` annualizes the daily ratio, based on 252 trading days per year. The result is printed using the `print()` function.
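The same calculation can be written with the standard library alone, which makes each step explicit. The sketch below mirrors the pandas expression under the same zero risk-free-rate assumption and uses the sample standard deviation, as pandas does by default. The account values are illustrative, not results from this notebook.

```python
import math
import statistics


def daily_returns(values):
    """Simple returns between consecutive values (like `pct_change(1)`)."""
    return [curr / prev - 1 for prev, curr in zip(values, values[1:])]


def annualized_sharpe(values, periods_per_year=252):
    """Annualized Sharpe ratio with a risk-free rate of zero."""
    rets = daily_returns(values)
    return math.sqrt(periods_per_year) * statistics.mean(rets) / statistics.stdev(rets)


# Illustrative account values only.
account_values = [100_000, 100_500, 100_200, 101_000, 100_800]
sharpe = annualized_sharpe(account_values)
print("Sharpe Ratio:", sharpe)
```

Because the ratio depends only on returns, scaling every account value by the same factor leaves it unchanged.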

In [None]:
unique_trade_date = processed[(processed.date > TEST_START_DATE)&(processed.date <= TEST_END_DATE)].date.unique()

df_trade_date = pd.DataFrame({'datadate':unique_trade_date})
df_account_value = pd.DataFrame()

for i in range(rebalance_window+validation_window, len(unique_trade_date)+1,rebalance_window):
    temp = pd.read_csv('results/account_value_trade_{}_{}.csv'.format('ensemble',i))
    df_account_value = pd.concat([df_account_value, temp], ignore_index=True)

sharpe=(252**0.5)*df_account_value.account_value.pct_change(1).mean()/df_account_value.account_value.pct_change(1).std()

print('Sharpe Ratio: ',sharpe)

The following code block prepares `df_account_value` for plotting by attaching the trade dates to it. The first `validation_window` rows of `df_trade_date` are skipped using the `df_trade_date[validation_window:]` slice, and `reset_index(drop=True)` renumbers the remaining rows from zero while discarding the old index. `df_account_value` is then joined with this sliced DataFrame. Because `join()` aligns rows by index rather than by a column, each account value is paired with the trade date at the same position, and the dates appear in a new `datadate` column.

The resulting DataFrame is stored back in `df_account_value`, so every row now carries both an account value and its trade date.
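The positional pairing can be illustrated without pandas. The sketch below is a plain-Python analogue of a left join on position: every account value is kept and paired with the trade date at the same position after the first `validation_window` dates are skipped. The helper name and the values are hypothetical.

```python
def join_by_position(account_values, trade_dates, validation_window):
    """Pair each account value with the date at the same position.

    The first `validation_window` dates are skipped first. Values without a
    matching date get None, similar to the missing values of a left join.
    """
    dates = trade_dates[validation_window:]
    return [
        (value, dates[i] if i < len(dates) else None)
        for i, value in enumerate(account_values)
    ]


# Illustrative values only.
trade_dates = ["2022-01-04", "2022-01-05", "2022-01-06", "2022-01-07", "2022-01-10"]
account_values = [100000.0, 100250.0, 99900.0]
joined = join_by_position(account_values, trade_dates, validation_window=3)
print(joined)
```

If the two inputs have different lengths after slicing, some rows end up without a date, which is worth checking before plotting.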

Next, the `head()` method is called on `df_account_value` to display the first few rows of the DataFrame. This provides an overview of the data, including the account value and the corresponding dates for each rebalancing period.

Finally, the `account_value` column of `df_account_value` is selected and plotted using the `plot()` method. This generates a line plot of the account value over time. Because `df_account_value` keeps its default integer index, the x-axis shows the row number (one row per trading day) rather than calendar dates, and the y-axis shows the account value. The plot gives a visual impression of how the strategy performed over the rebalancing periods and helps to spot trends and anomalies in the account value.

In [None]:
df_account_value=df_account_value.join(df_trade_date[validation_window:].reset_index(drop=True))

df_account_value.head()

In [None]:
%matplotlib inline
df_account_value.account_value.plot()

Backtesting is the process of evaluating a trading strategy using historical data to see how it would have performed in the past. It is an essential step in developing and refining trading strategies and can help traders to identify potential risks and opportunities.

The `backtest_stats()` function is called on the `df_account_value` DataFrame to calculate the performance statistics for the trading strategy. This function takes the account value data as input and calculates various performance metrics such as total return, annualized return, Sharpe ratio, and maximum drawdown. The resulting performance statistics are stored in the `perf_stats_all` variable.
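Two of these metrics are simple enough to compute by hand, which helps when interpreting the reported numbers. The sketch below is a standalone illustration of cumulative return and maximum drawdown on made-up values; it is not how `backtest_stats()` is implemented.

```python
def cumulative_return(values):
    """Total return from the first to the last value."""
    return values[-1] / values[0] - 1


def max_drawdown(values):
    """Largest peak-to-trough decline, as a negative fraction (0.0 if none)."""
    peak = values[0]
    worst = 0.0
    for value in values:
        peak = max(peak, value)               # highest value seen so far
        worst = min(worst, value / peak - 1)  # decline relative to that peak
    return worst


# Illustrative account values only.
values = [100, 120, 90, 110]
print(cumulative_return(values), max_drawdown(values))
```

Here the account ends about 10% above its starting value, but at its worst point it was 25% below the earlier peak of 120.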

The `perf_stats_all` variable is then converted to a pandas DataFrame using the `pd.DataFrame()` function. This converts the performance statistics into a tabular format that is easier to read and analyze.

The cell itself prints only a header line with the `print()` function; the computed statistics are available in `perf_stats_all` for inspection. The cell also builds a timestamp string, `now`, using `datetime.datetime.now()` and the `strftime()` method, although it is not used in the cells shown here.

In [None]:
print("==============Get Backtest Results===========")
now = datetime.datetime.now().strftime('%Y%m%d-%Hh%M')

perf_stats_all = backtest_stats(account_value=df_account_value)
perf_stats_all = pd.DataFrame(perf_stats_all)

This code block calculates performance statistics for a baseline so that they can be compared with the statistics of the ensemble strategy from the previous code block.

The `get_baseline()` function is called to download historical price data for the baseline ticker. Although the Dow Jones Industrial Average (`^DJI`) is commonly used as a market benchmark, this cell passes `ticker="NVDA"`, so the baseline is a buy-and-hold position in the same stock that the agent trades. The function takes the ticker and the start and end dates as input, here the first and last dates in `df_account_value`, and returns a DataFrame containing the historical price data for that period.

The `backtest_stats()` function is then called on the `baseline_df` DataFrame with `value_col_name = 'close'`, so the metrics are computed from closing prices rather than from an account value. As before, they include total return, annualized return, Sharpe ratio, and maximum drawdown. The resulting performance statistics are stored in the `stats` variable.

Comparing the baseline results with those of the trading strategy shows whether the agent added value over simply holding the stock. If the trading strategy outperforms the baseline, the agent's trading decisions helped during this period, although a single backtest is weak evidence of a lasting edge. Conversely, if the trading strategy underperforms the baseline, it may need further optimization or refinement.
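For a like-for-like comparison it helps to express the baseline as the value of the same starting capital invested at the first price and held. The sketch below does this rebasing on made-up prices and account values; the function name `rebase` is hypothetical.

```python
def rebase(prices, initial_amount):
    """Value over time of `initial_amount` invested at prices[0] and held."""
    return [initial_amount * price / prices[0] for price in prices]


# Illustrative numbers only: baseline closing prices and strategy account values.
baseline_prices = [50, 55, 45]
strategy_values = [100_000.0, 104_000.0, 98_000.0]

baseline_values = rebase(baseline_prices, initial_amount=strategy_values[0])
outperformed = strategy_values[-1] > baseline_values[-1]
print(baseline_values, outperformed)
```

In this toy example the strategy loses 2% while buy-and-hold loses 10%, so the strategy outperforms the baseline even though both end below the starting capital.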

In [None]:
print("==============Get Baseline Stats===========")
baseline_df = get_baseline(
        ticker="NVDA", 
        start = df_account_value.loc[0,'date'],
        end = df_account_value.loc[len(df_account_value)-1,'date'])

stats = backtest_stats(baseline_df, value_col_name = 'close')

This code block compares the backtest results obtained from the trading strategy to the performance of the Dow Jones Industrial Average (DJIA) over the same period.

The `backtest_plot` function is called with the following arguments:

- `df_account_value`: A DataFrame containing the account value over the period of the backtest.
- `baseline_ticker`: A string indicating the ticker symbol for the baseline index. In this case, it is set to '^DJI', which represents the DJIA.
- `baseline_start` and `baseline_end`: Strings representing the start and end dates for the baseline index data. In this case, they are set to the start and end dates of the trading period.

The function plots two lines on the same graph:

- The first line represents the account value of the trading strategy over the backtest period.
- The second line represents the value of the baseline index (DJIA) over the same period.

This allows for a direct comparison of the performance of the trading strategy with that of the benchmark index.

In [None]:
print("==============Compare to DJIA===========")
# S&P 500: ^GSPC
# Dow Jones Index: ^DJI
# NASDAQ 100: ^NDX
backtest_plot(df_account_value, 
              baseline_ticker = '^DJI', 
              baseline_start = df_account_value.loc[0,'date'],
              baseline_end = df_account_value.loc[len(df_account_value)-1,'date'])