<a target="_blank" href="https://colab.research.google.com/github/Aaronau667/FinRL-Tutorials/blob/master/1-Introduction/Stock_NeurIPS2018_ElegantRL.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Deep Reinforcement Learning for Stock Trading from Scratch: Multiple Stock Trading

* **Pytorch Version**



# Content

* [1. Task Description](#0)
* [2. Install Python packages](#1)
    * [2.1. Install Packages](#1.1)    
    * [2.2. A List of Python Packages](#1.2)
    * [2.3. Import Packages](#1.3)
    * [2.4. Create Folders](#1.4)
* [3. Download and Preprocess Data](#2)
* [4. Preprocess Data](#3)        
    * [4.1. Technical Indicators](#3.1)
    * [4.2. Perform Feature Engineering](#3.2)
* [5. Build Market Environment in OpenAI Gym-style](#4)  
    * [5.1. Data Split](#4.1)  
    * [5.3. Environment for Training](#4.2)    
* [6. Train DRL Agents](#5)
* [7. Backtesting Performance](#6)  
    * [7.1. BackTestStats](#6.1)
    * [7.2. BackTestPlot](#6.2)   
  

<a id='0'></a>
# Part 1. Task Discription

We train a DRL agent for stock trading. This task is modeled as a Markov Decision Process (MDP), and the objective function is maximizing (expected) cumulative return.

We specify the state-action-reward as follows:

* **State s**: The state space represents an agent's perception of the market environment. Just like a human trader analyzing various information, here our agent passively observes many features and learns by interacting with the market environment (usually by replaying historical data).

* **Action a**: The action space includes allowed actions that an agent can take at each state. For example, a ∈ {−1, 0, 1}, where −1, 0, 1 represent
selling, holding, and buying. When an action operates multiple shares, a ∈{−k, ..., −1, 0, 1, ..., k}, e.g.. "Buy
10 shares of AAPL" or "Sell 10 shares of AAPL" are 10 or −10, respectively

* **Reward function r(s, a, s′)**: Reward is an incentive for an agent to learn a better policy. For example, it can be the change of the portfolio value when taking a at state s and arriving at new state s',  i.e., r(s, a, s′) = v′ − v, where v′ and v represent the portfolio values at state s′ and s, respectively


**Market environment**: 30 consituent stocks of Dow Jones Industrial Average (DJIA) index. Accessed at the starting date of the testing period.


The data for this case study is obtained from Yahoo Finance API. The data contains Open-High-Low-Close price and volume.


<a id='1'></a>
# Part 2. Install Python Packages

<a id='1.1'></a>
## 2.1. Install packages


In [1]:
## install finrl library
!pip install wrds
!pip install swig
!pip install ccxt==1.66.1
!pip install git+https://github.com/AI4Finance-Foundation/FinRL.git
!pip install pyportfolioopt
!pip install pandas-market-calendars
!pip install zipline-reloaded
## instal elegantrl
!pip install elegantrl



Collecting wrds
  Downloading wrds-3.3.0-py3-none-any.whl.metadata (5.7 kB)
Collecting psycopg2-binary<2.10,>=2.9 (from wrds)
  Downloading psycopg2_binary-2.9.10-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.9 kB)
Downloading wrds-3.3.0-py3-none-any.whl (13 kB)
Downloading psycopg2_binary-2.9.10-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m3.0/3.0 MB[0m [31m34.9 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: psycopg2-binary, wrds
Successfully installed psycopg2-binary-2.9.10 wrds-3.3.0
Collecting swig
  Downloading swig-4.3.1-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.metadata (3.5 kB)
Downloading swig-4.3.1-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.9 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.9/1.9 MB[0m [31m29.0 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: swig
Successfully i


<a id='1.2'></a>
## 2.2. A list of Python packages
* Yahoo Finance API
* pandas
* numpy
* matplotlib
* stockstats
* OpenAI gym
* stable-baselines
* tensorflow
* pyfolio

<a id='1.3'></a>
## 2.3. Import Packages

In [2]:
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
matplotlib.use('Agg')
import datetime

%matplotlib inline
from finrl.meta.preprocessor.yahoodownloader import YahooDownloader
from finrl.meta.preprocessor.preprocessors import FeatureEngineer, data_split
from finrl.meta.env_stock_trading.env_stocktrading import StockTradingEnv
from finrl.agents.elegantrl.models import DRLAgent
from stable_baselines3.common.logger import configure
from finrl.meta.data_processor import DataProcessor

from finrl.plot import backtest_stats, backtest_plot, get_daily_return, get_baseline
from pprint import pprint

from elegantrl.agents import AgentDDPG
from elegantrl.agents import AgentPPO
from elegantrl.agents import AgentTD3
from elegantrl.agents import AgentSAC

import sys
sys.path.append("../FinRL")

import itertools

  PANDAS_VERSION = LooseVersion(pd.__version__)


<a id='1.4'></a>
## 2.4. Create Folders

In [3]:
from finrl import config
from finrl import config_tickers
import os
from finrl.main import check_and_make_directories
from finrl.config import (
    DATA_SAVE_DIR,
    TRAINED_MODEL_DIR,
    TENSORBOARD_LOG_DIR,
    RESULTS_DIR,
    INDICATORS,
    TRAIN_START_DATE,
    TRAIN_END_DATE,
    TEST_START_DATE,
    TEST_END_DATE,
    TRADE_START_DATE,
    TRADE_END_DATE,
)
check_and_make_directories([DATA_SAVE_DIR, TRAINED_MODEL_DIR, TENSORBOARD_LOG_DIR, RESULTS_DIR])



<a id='2'></a>
# Part 3. Download Data
Yahoo Finance provides stock data, financial news, financial reports, etc. Yahoo Finance is free.
* FinRL uses a class **YahooDownloader** in FinRL-Meta to fetch data via Yahoo Finance API
* Call Limit: Using the Public API (without authentication), you are limited to 2,000 requests per hour per IP (or up to a total of 48,000 requests a day).



-----
class YahooDownloader:
    Retrieving daily stock data from
    Yahoo Finance API

    Attributes
    ----------
        start_date : str
            start date of the data (modified from config.py)
        end_date : str
            end date of the data (modified from config.py)
        ticker_list : list
            a list of stock tickers (modified from config.py)

    Methods
    -------
    fetch_data()


In [4]:
# from config.py, TRAIN_START_DATE is a string
TRAIN_START_DATE
# from config.py, TRAIN_END_DATE is a string
TRAIN_END_DATE

'2020-07-31'

In [5]:
TRAIN_START_DATE = '2009-01-01'
TRAIN_END_DATE = '2020-07-01'
TRADE_START_DATE = '2020-07-01'
TRADE_END_DATE = '2021-10-31'


In [6]:
import os
import pandas as pd
from datetime import datetime
import time
import random
import requests

def download_and_save_data(start_date, end_date, ticker_list, data_dir='./data'):

    # Create data storage directory
    if not os.path.exists(data_dir):
        os.makedirs(data_dir)

    # Generate filename (using date range)
    file_name = f"dow30_data_{start_date}_{end_date}.csv"
    file_path = os.path.join(data_dir, file_name)

    # Check if file already exists
    if os.path.exists(file_path):
        print(f"Loading existing data from {file_path}")
        return pd.read_csv(file_path, index_col=0, parse_dates=True)

    print("Downloading data from Yahoo Finance...")
    # Use modified download function
    def download_with_retry(ticker, start_date, end_date, max_retries=3, delay=2):
        for attempt in range(max_retries):
            try:
                data = YahooDownloader(start_date=start_date,
                                     end_date=end_date,
                                     ticker_list=[ticker]).fetch_data()
                return data
            except Exception as e:
                if attempt < max_retries - 1:
                    sleep_time = delay + random.uniform(0, 1)
                    print(f"Download failed for {ticker}, retrying in {sleep_time:.2f} seconds...")
                    time.sleep(sleep_time)
                else:
                    print(f"Failed to download {ticker} after {max_retries} attempts")
                    raise e

    # Download all stock data
    all_data = []
    for ticker in ticker_list:
        try:
            data = download_with_retry(ticker, start_date, end_date)
            all_data.append(data)
            print(f"Successfully downloaded {ticker}")
        except Exception as e:
            print(f"Error downloading {ticker}: {str(e)}")

    # Merge all data
    if all_data:
        final_df = pd.concat(all_data, axis=0)
        # Save to CSV file
        final_df.to_csv(file_path)
        print(f"Data saved to {file_path}")
        return final_df
    else:
        raise Exception("No data was downloaded successfully")

# Use modified function
df = download_and_save_data(
    start_date=TRAIN_START_DATE,
    end_date=TRADE_END_DATE,
    ticker_list=config_tickers.DOW_30_TICKER
)

Downloading data from Yahoo Finance...


[*********************100%***********************]  1 of 1 completed


Shape of DataFrame:  (3230, 8)
Successfully downloaded AXP


[*********************100%***********************]  1 of 1 completed


Shape of DataFrame:  (3230, 8)
Successfully downloaded AMGN


[*********************100%***********************]  1 of 1 completed


Shape of DataFrame:  (3230, 8)
Successfully downloaded AAPL


[*********************100%***********************]  1 of 1 completed


Shape of DataFrame:  (3230, 8)
Successfully downloaded BA


[*********************100%***********************]  1 of 1 completed


Shape of DataFrame:  (3230, 8)
Successfully downloaded CAT


[*********************100%***********************]  1 of 1 completed


Shape of DataFrame:  (3230, 8)
Successfully downloaded CSCO


[*********************100%***********************]  1 of 1 completed


Shape of DataFrame:  (3230, 8)
Successfully downloaded CVX


[*********************100%***********************]  1 of 1 completed


Shape of DataFrame:  (3230, 8)
Successfully downloaded GS


[*********************100%***********************]  1 of 1 completed


Shape of DataFrame:  (3230, 8)
Successfully downloaded HD


[*********************100%***********************]  1 of 1 completed


Shape of DataFrame:  (3230, 8)
Successfully downloaded HON


[*********************100%***********************]  1 of 1 completed


Shape of DataFrame:  (3230, 8)
Successfully downloaded IBM


[*********************100%***********************]  1 of 1 completed


Shape of DataFrame:  (3230, 8)
Successfully downloaded INTC


[*********************100%***********************]  1 of 1 completed


Shape of DataFrame:  (3230, 8)
Successfully downloaded JNJ


[*********************100%***********************]  1 of 1 completed


Shape of DataFrame:  (3230, 8)
Successfully downloaded KO


[*********************100%***********************]  1 of 1 completed


Shape of DataFrame:  (3230, 8)
Successfully downloaded JPM


[*********************100%***********************]  1 of 1 completed


Shape of DataFrame:  (3230, 8)
Successfully downloaded MCD


[*********************100%***********************]  1 of 1 completed


Shape of DataFrame:  (3230, 8)
Successfully downloaded MMM


[*********************100%***********************]  1 of 1 completed


Shape of DataFrame:  (3230, 8)
Successfully downloaded MRK


[*********************100%***********************]  1 of 1 completed


Shape of DataFrame:  (3230, 8)
Successfully downloaded MSFT


[*********************100%***********************]  1 of 1 completed


Shape of DataFrame:  (3230, 8)
Successfully downloaded NKE


[*********************100%***********************]  1 of 1 completed


Shape of DataFrame:  (3230, 8)
Successfully downloaded PG


[*********************100%***********************]  1 of 1 completed


Shape of DataFrame:  (3230, 8)
Successfully downloaded TRV


[*********************100%***********************]  1 of 1 completed


Shape of DataFrame:  (3230, 8)
Successfully downloaded UNH


[*********************100%***********************]  1 of 1 completed


Shape of DataFrame:  (3230, 8)
Successfully downloaded CRM


[*********************100%***********************]  1 of 1 completed


Shape of DataFrame:  (3230, 8)
Successfully downloaded VZ


[*********************100%***********************]  1 of 1 completed


Shape of DataFrame:  (3230, 8)
Successfully downloaded V


[*********************100%***********************]  1 of 1 completed


Shape of DataFrame:  (3230, 8)
Successfully downloaded WBA


[*********************100%***********************]  1 of 1 completed


Shape of DataFrame:  (3230, 8)
Successfully downloaded WMT


[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed

Shape of DataFrame:  (3230, 8)
Successfully downloaded DIS





Shape of DataFrame:  (661, 8)
Successfully downloaded DOW
Data saved to ./data/dow30_data_2009-01-01_2021-10-31.csv


In [8]:
print(config_tickers.DOW_30_TICKER)

['AXP', 'AMGN', 'AAPL', 'BA', 'CAT', 'CSCO', 'CVX', 'GS', 'HD', 'HON', 'IBM', 'INTC', 'JNJ', 'KO', 'JPM', 'MCD', 'MMM', 'MRK', 'MSFT', 'NKE', 'PG', 'TRV', 'UNH', 'CRM', 'VZ', 'V', 'WBA', 'WMT', 'DIS', 'DOW']


In [9]:
df.shape

(94331, 8)

In [10]:
df.sort_values(['date','tic'],ignore_index=True).head()

Price,date,close,high,low,open,volume,tic,day
0,2009-01-02,2.727417,2.736134,2.559415,2.581054,746015200,AAPL,4
1,2009-01-02,40.791451,40.853685,39.933992,40.51485,6547900,AMGN,4
2,2009-01-02,14.929294,15.076038,14.211019,14.342317,10955700,AXP,4
3,2009-01-02,33.941097,34.173623,32.0884,32.103402,7010200,BA,4
4,2009-01-02,30.344687,30.389967,28.921571,29.050946,7117200,CAT,4


# Part 4: Preprocess Data
We need to check for missing data and do feature engineering to convert the data point into a state.
* **Adding technical indicators**. In practical trading, various information needs to be taken into account, such as historical prices, current holding shares, technical indicators, etc. Here, we demonstrate two trend-following technical indicators: MACD and RSI.
* **Adding turbulence index**. Risk-aversion reflects whether an investor prefers to protect the capital. It also influences one's trading strategy when facing different market volatility level. To control the risk in a worst-case scenario, such as financial crisis of 2007–2008, FinRL employs the turbulence index that measures extreme fluctuation of asset price.

In [11]:
import os

# Create data directory if it doesn't exist
if not os.path.exists("data"):
    os.makedirs("data")


processed_file_path = f"data/dow30_data_{TRAIN_START_DATE}_{TRADE_END_DATE}_processed.csv"

# Check if processed data already exists
if os.path.exists(processed_file_path):
    print(f"Loading existing processed data from {processed_file_path}")
    processed = pd.read_csv(processed_file_path)
else:
    print("Processing data...")
    fe = FeatureEngineer(
                        use_technical_indicator=True,
                        tech_indicator_list = INDICATORS,
                        use_vix=True,
                        use_turbulence=True,
                        user_defined_feature = False)

    processed = fe.preprocess_data(df)

    # Save processed data to data directory
    processed.to_csv(processed_file_path, index=False)
    print(f"Processed data saved to {processed_file_path}")

Processing data...
Successfully added technical indicators


[*********************100%***********************]  1 of 1 completed


Shape of DataFrame:  (3229, 8)
Successfully added vix
Successfully added turbulence index
Processed data saved to data/dow30_data_2009-01-01_2021-10-31_processed.csv


In [12]:
list_ticker = processed["tic"].unique().tolist()
list_date = list(pd.date_range(processed['date'].min(),processed['date'].max()).astype(str))
combination = list(itertools.product(list_date,list_ticker))

processed_full = pd.DataFrame(combination,columns=["date","tic"]).merge(processed,on=["date","tic"],how="left")
processed_full = processed_full[processed_full['date'].isin(processed['date'])]
processed_full = processed_full.sort_values(['date','tic'])

processed_full = processed_full.fillna(0)

In [None]:
processed_full.sort_values(['date','tic'],ignore_index=True).head(10)

Unnamed: 0,date,tic,close,high,low,open,volume,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,vix,turbulence
0,2009-01-02,AAPL,2.727418,2.736134,2.559416,2.581054,746015200.0,4.0,0.0,2.947757,2.622185,100.0,66.666667,100.0,2.727418,2.727418,39.189999,0.0
1,2009-01-02,AMGN,40.791458,40.853693,39.934,40.514858,6547900.0,4.0,0.0,2.947757,2.622185,100.0,66.666667,100.0,40.791458,40.791458,39.189999,0.0
2,2009-01-02,AXP,14.92929,15.076034,14.211015,14.342313,10955700.0,4.0,0.0,2.947757,2.622185,100.0,66.666667,100.0,14.92929,14.92929,39.189999,0.0
3,2009-01-02,BA,33.941101,34.173627,32.088404,32.103406,7010200.0,4.0,0.0,2.947757,2.622185,100.0,66.666667,100.0,33.941101,33.941101,39.189999,0.0
4,2009-01-02,CAT,30.344677,30.389958,28.921562,29.050937,7117200.0,4.0,0.0,2.947757,2.622185,100.0,66.666667,100.0,30.344677,30.344677,39.189999,0.0
5,2009-01-02,CRM,8.444491,8.489171,7.856207,7.967906,4069200.0,4.0,0.0,2.947757,2.622185,100.0,66.666667,100.0,8.444491,8.444491,39.189999,0.0
6,2009-01-02,CSCO,11.166082,11.192418,10.698635,10.803975,40980600.0,4.0,0.0,2.947757,2.622185,100.0,66.666667,100.0,11.166082,11.166082,39.189999,0.0
7,2009-01-02,CVX,39.716721,40.121573,38.190754,38.528129,13695900.0,4.0,0.0,2.947757,2.622185,100.0,66.666667,100.0,39.716721,39.716721,39.189999,0.0
8,2009-01-02,DIS,20.346144,20.439709,19.138304,19.359458,9796600.0,4.0,0.0,2.947757,2.622185,100.0,66.666667,100.0,20.346144,20.346144,39.189999,0.0
9,2009-01-02,GS,65.680351,66.331402,62.220702,63.606071,14088500.0,4.0,0.0,2.947757,2.622185,100.0,66.666667,100.0,65.680351,65.680351,39.189999,0.0


<a id='4'></a>
# Part 5. Build A Market Environment in OpenAI Gym-style
The training process involves observing stock price change, taking an action and reward's calculation. By interacting with the market environment, the agent will eventually derive a trading strategy that may maximize (expected) rewards.

Our market environment, based on OpenAI Gym, simulates stock markets with historical market data.

## Data Split
We split the data into training set and testing set as follows:

Training data period: 2009-01-01 to 2020-07-01

Trading data period: 2020-07-01 to 2021-10-31


In [13]:
train = data_split(processed_full, TRAIN_START_DATE,TRAIN_END_DATE)
trade = data_split(processed_full, TRADE_START_DATE,TRADE_END_DATE)
print(len(train))
print(len(trade))



83897
9744


In [14]:
train.tail()

Unnamed: 0,date,tic,close,high,low,open,volume,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,vix,turbulence
2892,2020-06-30,UNH,275.247742,276.647544,268.444696,269.293913,2932900.0,1.0,-0.018627,290.693727,259.441664,52.413051,-20.026341,0.598858,275.481006,268.767353,30.43,12.918767
2892,2020-06-30,V,186.425095,186.984845,183.5202,184.803762,9040100.0,1.0,1.025076,194.257212,180.858031,53.021033,-51.428421,2.103785,187.155987,177.570355,30.43,12.918767
2892,2020-06-30,VZ,41.323421,41.443351,40.746257,41.166011,17414800.0,1.0,-0.358557,44.228631,39.972071,48.097015,-50.671864,8.321557,41.844625,42.215852,30.43,12.918767
2892,2020-06-30,WBA,32.880199,33.027577,32.391533,32.670771,4782100.0,1.0,-0.070742,35.890252,30.733451,48.83019,-14.266547,0.948832,32.963969,32.795457,30.43,12.918767
2892,2020-06-30,WMT,37.177689,37.286321,36.792813,37.003876,20509200.0,1.0,-0.283846,38.250863,36.341647,48.159682,-69.838615,3.557864,37.71103,38.330749,30.43,12.918767


In [15]:
trade.head()

Unnamed: 0,date,tic,close,high,low,open,volume,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,vix,turbulence
0,2020-07-01,AAPL,88.485008,89.274814,88.436409,88.730458,110737200.0,2.0,2.967003,91.235603,78.913903,62.807139,107.472255,29.811397,82.608454,76.490386,28.620001,53.06782
0,2020-07-01,AMGN,218.33638,219.286352,199.04624,201.562348,6575800.0,2.0,3.306031,211.284524,182.02397,61.279632,272.794887,47.010065,195.339143,196.314668,28.620001,53.06782
0,2020-07-01,AXP,88.493973,91.23206,88.10819,89.62308,3301000.0,2.0,-0.373984,106.116186,84.335737,48.504808,-62.638451,1.752174,93.451014,87.157393,28.620001,53.06782
0,2020-07-01,BA,180.320007,190.610001,180.039993,185.880005,49036700.0,2.0,5.443193,220.721139,160.932863,50.925771,24.220608,15.93644,176.472335,155.614168,28.620001,53.06782
0,2020-07-01,CAT,113.824577,116.840393,113.662048,116.822344,2807800.0,2.0,1.20062,123.232905,106.939033,52.865432,35.366645,14.542298,112.655857,107.215974,28.620001,53.06782


In [16]:
INDICATORS

['macd',
 'boll_ub',
 'boll_lb',
 'rsi_30',
 'cci_30',
 'dx_30',
 'close_30_sma',
 'close_60_sma']

In [17]:
stock_dimension = len(train.tic.unique())
state_space = 1 + 2*stock_dimension + len(INDICATORS)*stock_dimension
print(f"Stock Dimension: {stock_dimension}, State Space: {state_space}")


Stock Dimension: 29, State Space: 291


In [18]:
buy_cost_list = sell_cost_list = [0.001] * stock_dimension
num_stock_shares = [0] * stock_dimension




## Environment for Training



In [19]:
from finrl.meta.env_stock_trading.env_stocktrading_np import StockTradingEnv
e_train_gym = StockTradingEnv

<a id='5'></a>
# Part 6: Train DRL Agents
* The DRL algorithms are from **Stable Baselines 3**. Users are also encouraged to try **ElegantRL** and **Ray RLlib**.
* FinRL includes fine-tuned standard DRL algorithms, such as DQN, DDPG, Multi-Agent DDPG, PPO, SAC, A2C and TD3. We also allow users to
design their own DRL algorithms by adapting these DRL algorithms.

In [20]:


# Assuming 'processed' DataFrame is already loaded and contains 'date', 'tic', 'close', INDICATORS, and 'turbulence' columns

# Pivot close prices to get a 2D array: rows are dates, columns are tickers
price_array = train.pivot(index="date", columns="tic", values="close").values

# Pivot technical indicators: stack all indicators for all stocks
tech_list = []
for indicator in INDICATORS:
    tech_list.append(train.pivot(index="date", columns="tic", values=indicator).values)
# Stack along the last axis and flatten stocks and indicators
# The reshape here assumes that for each date, all tech indicators for all stocks are concatenated
tech_array = np.stack(tech_list, axis=2)  # shape: (dates, stocks, indicators)
tech_array = tech_array.reshape(tech_array.shape[0], -1)  # shape: (dates, stocks * indicators)


# Pivot turbulence: one value per date
# Use .values to get a numpy array from the pandas Series
turbulence_array = train.groupby("date")["turbulence"].first().values
print("Price array shape:", price_array.shape)
print("Tech array shape:", tech_array.shape)
print("Turbulence array shape:", turbulence_array.shape)

print("\nFirst few rows of price array:")
print(price_array[:5])

print("\nFirst few rows of tech array:")
print(tech_array[:5])

print("\nFirst few values of turbulence array:")
print(turbulence_array[:5])




Price array shape: (2893, 29)
Tech array shape: (2893, 232)
Turbulence array shape: (2893,)

First few rows of price array:
[[ 2.72741723 40.7914505  14.92929363 33.94109726 30.34468651  8.44449234
  11.16607761 39.7167244  20.34614182 65.68034363 16.1486187  22.74680328
  47.71528625  9.5169487  37.49147034 20.9731102  13.85558891 40.46824265
  29.90019608 16.849123   14.89742565 10.76207447 38.793396   30.62788391
  21.92787743 11.87676144 13.93254471 15.30581665 13.43323708]
 [ 2.84252524 41.24783707 15.40813732 34.63116455 29.80778313  8.27818298
  11.2648344  39.78940964 19.98890305 67.20951843 16.53677177 22.62867355
  47.41493988  9.33537483 37.12057495 19.56821823 13.71673203 40.34762192
  29.55162621 16.59367371 15.03665447 10.83914948 38.51541138 30.17388153
  21.57022095 11.96121311 13.06377888 16.07860184 13.27817535]
 [ 2.79564095 40.3419838  16.27315521 34.73618317 29.62666702  8.58845901
  11.71253014 40.14752197 20.67787552 67.15653992 16.90485382 23.65247536
  48.73109

In [21]:
# Import required classes from FinRL and ElegantRL
# Note: Import path may need to be adjusted based on your FinRL installation
from finrl.agents.elegantrl.models import DRLAgent

# Environment setup using FinRL's approach
# Dictionary containing data arrays and environment configuration parameters
env_kwargs_with_data = {
    "hmax": 100,
    "initial_amount": 1000000,
    "num_stock_shares": num_stock_shares,
    "buy_cost_pct": buy_cost_list,
    "sell_cost_pct": sell_cost_list,
    "price_array": price_array,
    "tech_array": tech_array,
    "turbulence_array": turbulence_array,
    "if_train": True,
    "state_space": state_space,
    "stock_dim": stock_dimension,
    "action_space": stock_dimension,
    "tech_indicator_list": INDICATORS,
    "reward_scaling": 1e-4,
}

# Create agent instance
agent = DRLAgent(
    env=e_train_gym,
    price_array=price_array,
    tech_array=tech_array,
    turbulence_array=turbulence_array
)

# Define ElegantRL DDPG algorithm hyperparameters
erl_params = {
    "learning_rate": 3e-5,
    "gamma": 0.99,
    "tau": 0.01,
    "buffer_size": 50000,
    "net_dims": [512, 512],
    "batch_size": 4096,
    "target_step": 5000,
    "eval_gap": 60,
    "eval_times": 1,
    "gpu_id": 0,
}


In [22]:

if_using_a2c = False
if_using_ddpg = True
if_using_ppo = False
if_using_td3 = False
if_using_sac = False

### Agent Training: 5 algorithms (A2C, DDPG, PPO, TD3, SAC)


In [23]:
import os

cpu_count = os.cpu_count()
print(f"Colab 分配的 CPU 线程数: {cpu_count}")

!nvidia-smi

Colab 分配的 CPU 线程数: 2


### Agent 1: A2C


### Agent 2: DDPG

In [24]:

# Get ElegantRL DDPG model instance
# Call get_model method of DRLAgent_erl instance with algorithm name ("ddpg") and ElegantRL hyperparameters (erl_params)
model_name = "ddpg" # Specify the ElegantRL algorithm to train
model = agent.get_model(model_name, model_kwargs = erl_params)

# Define training results save directory
# cwd will be created relative to your current working directory
cwd = RESULTS_DIR + '/' + model_name + '_finrl_integration' # Save to a new folder under RESULTS_DIR

# Set total training steps
total_timesteps = 100000 # Adjust total training steps as needed

# --- Start Training ---
import time
print(f"=== Starting {model_name.upper()} Training ===")
start_time = time.time()

# Call train_model method of DRLAgent instance to start training
# This method will handle the ElegantRL internal training loop
trained_model = agent.train_model(
    model = model, # Pass the ElegantRL model instance obtained above
    cwd = cwd, # Pass the save directory
    total_timesteps = total_timesteps # Pass the total steps parameter
)

print(f"=== {model_name.upper()} Training Complete ===")
print(f"Total time: {time.time() - start_time:.2f} seconds")

=== Starting DDPG Training ===
| train_agent_multiprocessing() with GPU_ID 0
| Arguments Remove cwd: results/ddpg_finrl_integration
=== DDPG Training Complete ===
Total time: 964.84 seconds


### Agent 3: PPO

### Agent 4: TD3

### Agent 5: SAC

In [None]:
import numpy as np

# Data processing
price_array = processed.pivot(index="date", columns="tic", values="close").values

# Process technical indicators
tech_list = []
for indicator in INDICATORS:
    tech_list.append(processed.pivot(index="date", columns="tic", values=indicator).values)
tech_array = np.stack(tech_list, axis=2)  # shape: (dates, stocks, indicators)
tech_array = tech_array.reshape(tech_array.shape[0], -1)  # shape: (dates, stocks*indicators)

# Process turbulence
turbulence_array = processed.groupby("date")["turbulence"].first().values

# Initialize DRLAgent
agent = DRLAgent(
    env=env_train,
    price_array=price_array,
    tech_array=tech_array,
    turbulence_array=turbulence_array
)

# Optimized SAC parameters
model_kwargs = {
    "learning_rate": 0.001,        # Increased learning rate for faster convergence
    "buffer_size": 50000,          # Reduced buffer size to decrease memory usage
    "learning_starts": 500,        # Reduced warm-up steps
    "batch_size": 128,             # Increased batch size for better training efficiency
    "tau": 0.01,                   # Target network update rate
    "gamma": 0.99,                 # Discount factor
    "train_freq": 4,               # Increased training frequency
    "gradient_steps": 4,           # Increased gradient steps per update
    "action_noise": None,          # No action noise
    "optimize_memory_usage": True, # Enabled memory optimization
    "ent_coef": "auto",            # Automatic entropy coefficient
    "target_update_interval": 500, # Reduced target network update interval
    "target_entropy": "auto",      # Automatic target entropy
    "use_sde": False,              # No state-dependent exploration
    "sde_sample_freq": -1,         # No SDE sampling
    "use_sde_at_warmup": False,    # No SDE at warmup
    "policy_kwargs": {
        "net_arch": [128, 128]     # Simplified network architecture
    },
    "verbose": 1                   # Verbosity level
}

# Get SAC model
model_sac = agent.get_model("sac", model_kwargs)

# Set training flag
if_using_sac = True  # Set to True to train SAC model



In [None]:
# Train SAC model
trained_sac = agent.train_model(
    model=model_sac,
    cwd="results/sac",         # Directory to save model and logs
    total_timesteps=50000      # Total number of timesteps to train
) if if_using_sac else None    # Only train if if_using_sac is True

| train_agent_multiprocessing() with GPU_ID 0
| Arguments Remove cwd: results/sac


## In-sample Performance

Assume that the initial capital is $1,000,000.

### Set turbulence threshold
Set the turbulence threshold to be greater than the maximum of insample turbulence data. If current turbulence index is greater than the threshold, then we assume that the current market is volatile

In [25]:
data_risk_indicator = processed_full[(processed_full.date<TRAIN_END_DATE) & (processed_full.date>=TRAIN_START_DATE)]
insample_risk_indicator = data_risk_indicator.drop_duplicates(subset=['date'])

In [26]:
insample_risk_indicator.vix.describe()

Unnamed: 0,vix
count,2893.0
mean,18.824245
std,8.489311
min,9.14
25%,13.33
50%,16.139999
75%,21.309999
max,82.690002


In [27]:
insample_risk_indicator.vix.quantile(0.996)

np.float64(57.40400183105453)

In [28]:
insample_risk_indicator.turbulence.describe()

Unnamed: 0,turbulence
count,2893.0
mean,34.56796
std,43.790811
min,0.0
25%,14.962499
50%,24.124662
75%,39.161588
max,652.505641


In [29]:
insample_risk_indicator.turbulence.quantile(0.996)

np.float64(276.45263915168357)

### Trading (Out-of-sample Performance)

We update periodically in order to take full advantage of the data, e.g., retrain quarterly, monthly or weekly. We also tune the parameters along the way, in this notebook we use the in-sample data from 2009-01 to 2020-07 to tune the parameters once, so there is some alpha decay here as the length of trade date extends.

Numerous hyperparameters – e.g. the learning rate, the total number of samples to train on – influence the learning process and are usually determined by testing some variations.

In [None]:
#e_trade_gym = StockTradingEnv(df = trade, turbulence_threshold = 70,risk_indicator_col='vix', **env_kwargs)
# env_trade, obs_trade = e_trade_gym.get_sb_env()

In [30]:
trade.head()

Unnamed: 0,date,tic,close,high,low,open,volume,day,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,vix,turbulence
0,2020-07-01,AAPL,88.485008,89.274814,88.436409,88.730458,110737200.0,2.0,2.967003,91.235603,78.913903,62.807139,107.472255,29.811397,82.608454,76.490386,28.620001,53.06782
0,2020-07-01,AMGN,218.33638,219.286352,199.04624,201.562348,6575800.0,2.0,3.306031,211.284524,182.02397,61.279632,272.794887,47.010065,195.339143,196.314668,28.620001,53.06782
0,2020-07-01,AXP,88.493973,91.23206,88.10819,89.62308,3301000.0,2.0,-0.373984,106.116186,84.335737,48.504808,-62.638451,1.752174,93.451014,87.157393,28.620001,53.06782
0,2020-07-01,BA,180.320007,190.610001,180.039993,185.880005,49036700.0,2.0,5.443193,220.721139,160.932863,50.925771,24.220608,15.93644,176.472335,155.614168,28.620001,53.06782
0,2020-07-01,CAT,113.824577,116.840393,113.662048,116.822344,2807800.0,2.0,1.20062,123.232905,106.939033,52.865432,35.366645,14.542298,112.655857,107.215974,28.620001,53.06782


In [32]:

# Pivot close prices to get a 2D array: rows are dates, columns are tickers
price_array = trade.pivot(index="date", columns="tic", values="close").values

# Pivot technical indicators: stack all indicators for all stocks
tech_list = []
for indicator in INDICATORS:
    tech_list.append(trade.pivot(index="date", columns="tic", values=indicator).values)
# Stack along the last axis and flatten stocks and indicators
# The reshape here assumes that for each date, all tech indicators for all stocks are concatenated
tech_array = np.stack(tech_list, axis=2)  # shape: (dates, stocks, indicators)
tech_array = tech_array.reshape(tech_array.shape[0], -1)  # shape: (dates, stocks * indicators)


# Pivot turbulence: one value per date
# Use .values to get a numpy array from the pandas Series
turbulence_array = trade.groupby("date")["turbulence"].first().values
print("Price array shape:", price_array.shape)
print("Tech array shape:", tech_array.shape)
print("Turbulence array shape:", turbulence_array.shape)

print("\nFirst few rows of price array:")
print(price_array[:5])

print("\nFirst few rows of tech array:")
print(tech_array[:5])

print("\nFirst few values of turbulence array:")
print(turbulence_array[:5])


e_trade_gym = StockTradingEnv

Price array shape: (336, 29)
Tech array shape: (336, 232)
Turbulence array shape: (336,)

First few rows of price array:
[[ 88.48500824 218.33638     88.49397278 180.32000732 113.82457733
  190.52481079  39.54502106  70.36942291 111.63095856 176.1073761
  221.37142944 130.24783325  90.75138855  52.42576599 122.84657288
   81.15140533  38.7870903  165.65065002 107.14508057  64.44498444
  196.12014771  91.97579956 106.07865143 101.57627106 277.8420105
  187.01377869  40.97862625  31.70895576  37.14975739]
 [ 88.48500824 221.00654602  88.75743866 180.80999756 115.32344055
  191.16023254  39.48443985  70.92355347 110.81108093 175.95581055
  221.68363953 131.07032776  91.63946533  52.71102524 123.36286163
   81.41501617  38.83901978 164.62797546 108.25492859  64.98946381
  197.61474609  92.94841766 106.87437439 101.54014587 278.3366394
  188.83781433  41.06856918  32.56217575  37.00076675]
 [ 90.85202026 219.30349731  90.87451172 187.91000366 116.86745453
  196.31330872  40.16804886  71.132

<a id='6'></a>
# Part 7: Backtesting Results
Backtesting plays a key role in evaluating the performance of a trading strategy. Automated backtesting tool is preferred because it reduces the human error. We usually use the Quantopian pyfolio package to backtest our trading strategies. It is easy to use and consists of various individual plots that provide a comprehensive image of the performance of a trading strategy.

In [34]:
import torch
from elegantrl.agents import *
from elegantrl.train.config import Config
from elegantrl.train.run import train_agent

MODELS = {
    "ddpg": AgentDDPG,
    "td3": AgentTD3,
    "sac": AgentSAC,
    "ppo": AgentPPO,
    "a2c": AgentA2C,
}

model_name = "ddpg"
stock_dim = price_array.shape[1] # Number of stocks
state_dim = 1 + 2 + 3 * stock_dim + tech_array.shape[1] # State space dimension
action_dim = stock_dim  # Action space dimension

print(f"stock_dim: {stock_dim}")
print(f"tech_array.shape: {tech_array.shape}")
print(f"state_dim: {state_dim}")

# Build environment configuration
config = {
    "price_array": price_array,
    "tech_array": tech_array,
    "turbulence_array": turbulence_array,
    "if_train": False  # Test mode
}

# Create environment instance
e_trade_gym = StockTradingEnv(
    config=config,
    initial_account=1000000,
    max_stock=100,
    buy_cost_pct = buy_cost_list,
    sell_cost_pct = sell_cost_list,
    reward_scaling=1e-4
)
state, _ = e_trade_gym.reset()
# Build env_args needed for DRL_prediction
env_args = {
    "env_num": 1,
    "env_name": "StockEnv",
    "state_dim": state.shape[0],
    "action_dim": action_dim,
    "if_discrete": False,
    "max_step": price_array.shape[0] - 1,
    "price_array": price_array,
    "tech_array": tech_array,
    "turbulence_array": turbulence_array,
    "if_train": False  # Test mode
}
# --- Start Training ---
import time
print(f"=== Starting {model_name.upper()} Test ===")
start_time = time.time()
state, _ = e_trade_gym.reset()
print(f"Initial state shape: {state.shape}")
# Load elegantrl needs state dim, action dim and net dim

cwd = RESULTS_DIR + '/' + model_name + '_finrl_integration'
print("cwd:", cwd)

print("price_array: ",len(price_array))

# Check model file
print(f"Model file exists: {os.path.exists(cwd)}")
print(f"Model file size: {os.path.getsize(cwd) if os.path.exists(cwd) else 'N/A'}")
environment=e_trade_gym
import torch
gpu_id = 0  # >=0 means GPU ID, -1 means CPU
agent_class = MODELS[model_name]
stock_dim = env_args["price_array"].shape[1]
state_dim = 1 + 2 + 3 * stock_dim + env_args["tech_array"].shape[1]
action_dim = stock_dim
env_args = {
    "env_num": 1,
    "env_name": "StockEnv",
    "state_dim": state_dim,
    "action_dim": action_dim,
    "if_discrete": False,
    "max_step": env_args["price_array"].shape[0] - 1,
    "config": env_args,
}

actor_path = f"{cwd}/act.pth"
net_dim = [512, 512]

"""init"""
env = environment
env_class = env
args = Config(agent_class=agent_class, env_class=env_class, env_args=env_args)
args.cwd = cwd
act = agent_class(
    net_dim, env.state_dim, env.action_dim, gpu_id=gpu_id, args=args
).act
parameters_dict = {}
act = torch.load(actor_path, weights_only=False)
for name, param in act.named_parameters():
    parameters_dict[name] = torch.tensor(param.detach().cpu().numpy())

act.load_state_dict(parameters_dict)

if_discrete = env.if_discrete
device = next(act.parameters()).device
state = env.reset()
episode_returns = []  # the cumulative_return / initial_account
episode_total_assets = [env.initial_total_asset]
max_step = env.max_step
state, _ = env.reset()
for steps in range(max_step):
    #print(f"step {steps} before state shape: {state.shape}")
    assert state.shape[0] == 322, f"step {steps} state shape error: {state.shape}"
    s_tensor = torch.as_tensor(
        state, dtype=torch.float32, device=device
    ).unsqueeze(0)
    a_tensor = act(s_tensor).argmax(dim=1) if if_discrete else act(s_tensor)
    #print("a_tensor type:", type(a_tensor), "a_tensor shape:", a_tensor.shape)
    #print("a_tensor:", a_tensor)
    action = a_tensor.detach().cpu().numpy()
    #print("action type:", type(action), "action shape:", action.shape)
    #print("action:", action)

    action = np.asarray(action).flatten().astype(float)
    #print("action after flatten/astype:", action)
    state, reward, done, _, _ = env.step(action)
    #print(f"step {steps} after state shape: {state.shape}")
    total_asset = env.amount + (env.price_ary[env.day] * env.stocks).sum()
    episode_total_assets.append(total_asset)
    episode_return = total_asset / env.initial_total_asset
    episode_returns.append(episode_return)
    if done:
        break
print("Test Finished!")
print("episode_return", episode_return)

# --- Calculate and Print Metrics ---
import pandas as pd
import numpy as np

# Convert total assets to pandas Series for easier calculations
assets_series = pd.Series(episode_total_assets)

# Calculate initial and final capital
initial_capital = assets_series.iloc[0]
final_capital = assets_series.iloc[-1]

# Calculate daily returns
daily_returns = assets_series.pct_change().dropna()

# Calculate Cumulative Return (Return Rate)
cumulative_return = (final_capital / initial_capital) - 1

# Calculate Annualized Return (assuming 252 trading days per year)
annualized_return = (1 + cumulative_return) ** (252 / len(daily_returns)) - 1 if len(daily_returns) > 0 else 0

# Calculate Annualized Volatility (assuming 252 trading days per year)
annualized_volatility = daily_returns.std() * np.sqrt(252) if len(daily_returns) > 0 else 0

# Calculate Sharpe Ratio (assuming 0 risk-free rate)
sharpe_ratio = annualized_return / annualized_volatility if annualized_volatility > 0 else 0

# Calculate Max Drawdown
peak = assets_series.cummax()
drawdown = (assets_series - peak) / peak
max_drawdown = drawdown.min() if len(drawdown) > 0 else 0

print("\n--- Test Results ---")
print(f"Initial Capital: {initial_capital:.2f}")
print(f"Final Capital: {final_capital:.2f}")
print(f"Return Rate (Cumulative Return): {cumulative_return:.4f}")
print(f"Annualized Return: {annualized_return:.4f}")
print(f"Annualized Volatility: {annualized_volatility:.4f}")
print(f"Sharpe Ratio: {sharpe_ratio:.4f}")
print(f"Max Drawdown: {max_drawdown:.4f}") # This is the rate

print(f"=== {model_name.upper()} Test Completed ===")
print(f"Total time: {time.time() - start_time:.2f} seconds")

stock_dim: 29
tech_array.shape: (336, 232)
state_dim: 322
=== Starting DDPG Test ===
Initial state shape: (322,)
cwd: results/ddpg_finrl_integration
price_array:  336
Model file exists: True
Model file size: 4096


TypeError: unsupported operand type(s) for -: 'int' and 'list'

<a id='6.1'></a>
## 7.1 BackTestStats
pass in df_account_value, this information is stored in env class


In [None]:
print("==============Get Backtest Results===========")
now = datetime.datetime.now().strftime('%Y%m%d-%Hh%M')

perf_stats_all = backtest_stats(account_value=df_account_value)
perf_stats_all = pd.DataFrame(perf_stats_all)
perf_stats_all.to_csv("./"+RESULTS_DIR+"/perf_stats_all_"+now+'.csv')

In [None]:
#baseline stats
print("==============Get Baseline Stats===========")
baseline_df = get_baseline(
        ticker="^DJI",
        start = df_account_value.loc[0,'date'],
        end = df_account_value.loc[len(df_account_value)-1,'date'])

stats = backtest_stats(baseline_df, value_col_name = 'close')


In [None]:
df_account_value.loc[0,'date']

In [None]:
df_account_value.loc[len(df_account_value)-1,'date']

<a id='6.2'></a>
## 7.2 BackTestPlot

In [None]:
print("==============Compare to DJIA===========")
%matplotlib inline
# S&P 500: ^GSPC
# Dow Jones Index: ^DJI
# NASDAQ 100: ^NDX
backtest_plot(df_account_value,
             baseline_ticker = '^DJI',
             baseline_start = df_account_value.loc[0,'date'],
             baseline_end = df_account_value.loc[len(df_account_value)-1,'date'])