In [1]:
%matplotlib inline

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

<h1>FinRL - A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance<sup>1</sup></h1>
<hr/>
<h2>Authors: Xiao-Yang Liu, Hongyang Yang, Qian Chen, Runjia Zhang, Liuqing Yang, Bowen Xiao, Christina Dan Wang</h2>
<h3>Scientific paper summary</h3>

<h3>Abstract</h3>
<h5>Problem Statement</h5>
<p>Deep reinforcement learning (DRL)<sup>2</sup> has been recognized as an effective approach in quantitative finance<sup>3</sup>. 
However, training a practical DRL trading agent that decides where to trade, at what price, and in what quantity involves error-prone and arduous development and debugging. In short, it is very hard to implement an entirely new DRL algorithm for stock trading.</p> 
<h5>Problem Solution</h5>    
<p>The paper introduces FinRL<sup>4</sup>, a DRL library that helps
beginners gain exposure to quantitative finance and develop their own
stock trading strategies. Along with easily reproducible tutorials, the FinRL library
allows users to streamline their own development and to compare their results with existing
schemes easily.
FinRL features completeness, hands-on tutorials
and reproducibility, all of which favor beginners: 
    <ol>
        <li>At multiple levels of time granularity,
FinRL simulates trading environments across various stock markets, including
            NASDAQ-100<sup>5</sup>, DJIA<sup>6</sup>, S&P 500<sup>7</sup>, HSI<sup>8</sup>, SSE 50<sup>9</sup>, and CSI 300<sup>10</sup>;</li>
        <li>Organized in a layered architecture with a modular structure, FinRL provides fine-tuned state-of-the-art DRL algorithms (DQN<sup>11</sup>, DDPG<sup>12</sup>, PPO<sup>13</sup>, SAC<sup>14</sup>, A2C<sup>15</sup>, TD3<sup>16</sup>, etc.), commonly used reward functions and standard evaluation baselines to reduce the debugging workload and promote reproducibility;</li>
        <li>Being highly extendable, FinRL reserves a complete set of user-import interfaces (single stock trading, multiple
stock trading, and portfolio allocation).</li>
        </ol></p>

<h3>Introduction</h3>
<h5>Target - quantitative finance</h5>
<p>In quantitative finance, stock trading is essentially dynamic decision making, namely deciding where to trade, at what price, and in what quantity, over a highly stochastic and complex stock market. As a result, DRL provides useful toolkits for stock trading, because it balances exploration (of uncharted territory) and exploitation (of current knowledge).</p>
<h5>What is solved and why?</h5>
<p>Taking many complex financial factors into account, DRL trading agents build a multi-factor model and provide algorithmic trading strategies, which are difficult for human traders to produce. The solution the paper provides is a beginner-friendly library (FinRL) with fine-tuned standard DRL algorithms. It has been developed under three primary principles:
    <ul>
        <li>Completeness</li>
        <li>Hands-on tutorials</li>
        <li>Reproducibility</li>
    </ul>
</p>
<h5>Definitions, names, models, approaches, etc.</h5>
<p>The models used are several variants of DRL algorithms. DRL has been applied to sentiment analysis for portfolio allocation and to liquidation strategy analysis, showing the potential of DRL on various financial tasks. However, implementing a DRL- or RL-driven trading strategy is far from easy. The development and debugging processes are arduous and error-prone. 
The proposed three-layer FinRL library streamlines the development of stock trading strategies. 
    <ol>
        <li>First layer - <b>environment</b>, which simulates the financial market environment</li>
        <li>Second layer - <b>agent</b> layer that provides fine-tuned standard DRL algorithms (DQN, DDPG, Adaptive DDPG<sup>17</sup>, Multi-Agent DDPG<sup>18</sup>, PPO, SAC, A2C and TD3), commonly used reward functions and standard evaluation baselines</li>
        <li>Third layer - <b>applications</b> in automated stock trading, where three use cases are demonstrated (single stock trading, multiple stock trading and portfolio allocation)</li>
    </ol>
    
The contributions of this paper are summarized as follows:
<ul>
<li>FinRL is an open source library specifically designed and implemented for quantitative
finance.</li>
<li>Trading tasks accompanied by hands-on tutorials with built-in DRL agents are available
in a beginner-friendly and reproducible fashion using Jupyter notebook.</li>
<li>FinRL has good scalability, with a broad range of fine-tuned state-of-the-art DRL algorithms.
Adjusting the implementations to the rapidly changing stock market is well supported.</li>
<li>Typical use cases are selected and used to establish a benchmark for the quantitative finance
community. Standard backtesting and evaluation metrics are also provided for easy and
effective performance evaluation.</li>
    </ul></p>

<h3>Related Works</h3>
<h5>State-of-the-Art Algorithms</h5>
<p>Recent works can be categorized into three approaches: value-based, policy-based, and actor-critic algorithms.
The proposed FinRL consolidates and elaborates upon those algorithms
to build financial DRL models. A number of machine learning libraries share
similar features with the FinRL library:
    <ul>
        <li>OpenAI Gym (open source library).</li>
        <li>Google Dopamine (features pluggability and reusability).</li>
        <li>RLlib (a modular and very well-maintained framework).</li>
        <li>Horizon (DL-focused framework dominated by PyTorch).</li>
    </ul>
</p>
<h5>DRL in Finance</h5>
<p>Recent works show that DRL has many applications in quantitative finance. Stock trading is usually considered one of the most challenging applications because stock data is noisy and volatile. Some examples of DRL usage in quantitative finance: 
    <ol>
        <li>Volatility scaling to trade futures.</li>
        <li>Combining headline sentiment and knowledge graphs with time-series stock data to learn an optimal policy.</li>
        <li>High-frequency trading.</li>
        <li>Deep hedging.</li>
    </ol>
The DRL approach has two key advantages in quantitative finance: <strong>scalability and model independence</strong>.
</p>

<h3>FinRL Architecture</h3>
<p><b><i>Three-layer architecture:</i></b> The three layers of the FinRL library are the stock market environment,
the DRL trading agent, and the stock trading applications. The agent layer interacts with the environment layer in an exploration-exploitation manner, choosing whether to repeat decisions that worked well before or to try new actions in the hope of greater rewards. The lower layer provides APIs for the upper layer, making the lower layer transparent to the upper layer.
<img src="https://raw.githubusercontent.com/AI4Finance-Foundation/FinRL/master/docs/source/image/FinRL-Architecture.png" alt="FinRL Architecture" width = 100%/>
    
<b><i>Modularity:</i></b> Each layer includes several modules and each module defines a separate function.
    
<b><i>Simplicity, Applicability and Extendibility:</i></b> Specifically designed for automated stock
trading, FinRL presents DRL algorithms as modules. Each layer includes reserved interfaces that allow users to develop new modules.
    
<b><i>Better Market Environment Modeling:</i></b> A trading simulator replicates the live
stock market and provides backtesting support that incorporates important market frictions
such as transaction cost, market liquidity and the investor’s degree of risk-aversion, all of
which are key determinants of net returns.

<b><i>Environment: Time-driven Trading Simulator:</i></b> The financial task in FinRL is modeled as a Markov Decision Process (MDP)<sup>19</sup> problem. The FinRL library strives to provide trading environments constructed from six datasets across five stock exchanges.

<b><i>State Space, Action Space, and Reward Function</i></b>
    
<i><u>State space $S$.</u></i> The state space describes the observations that the agent receives from the environment. FinRL provides various features for users:
    <ul>
        <li>Balance $b_t \in \mathbb{R}_+$: the amount of money left in the account at the current time step $t$.</li>
        <li>Shares owned $h_t \in \mathbb{Z}_+^n$: the current shares held for each stock, where $n$ is the number of stocks.</li>
        <li>Closing price $p_t \in \mathbb{R}^n_+$: one of the most commonly used features.</li>
        <li>Opening/high/low prices $o_t, h_t, l_t \in \mathbb{R}^n_+$: used to track stock price changes.</li>
        <li>Trading volume $v_t \in \mathbb{R}^n_+$: total quantity of shares traded during a trading slot.</li>
        <li>Technical indicators: Moving Average Convergence Divergence (MACD) $M_t \in \mathbb{R}^n$ and Relative Strength Index (RSI) $R_t \in \mathbb{R}^n_+$, etc.</li>
        <li>Multiple levels of granularity: the data frequency of the above features can be daily, hourly or per minute.</li>
    </ul>
    
<i><u>Action space $A$.</u></i> The action space describes the allowed actions through which the agent interacts with the environment. Normally, $a \in A$ includes three actions: $a \in \{-1, 0, 1\}$, where $-1, 0, 1$ represent
selling, holding, and buying one share. An action can also be carried out on multiple shares, giving the action space $\{-k, \dots, -1, 0, 1, \dots, k\}$, where $k$ denotes the number of shares. For example, "Buy 10 shares of AAPL" or "Sell 10 shares of AAPL" are $10$ or $-10$, respectively.
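
The signed share counts described above can be read as a simple update of holdings and cash. The following is a minimal sketch for a single stock, not FinRL's implementation: the helper name `apply_action` and the rule that an order is clipped to what the current balance or holdings allow are assumptions made for illustration.

```python
def apply_action(action, shares_held, balance, price):
    """Apply a signed share count: negative sells, zero holds, positive buys.

    Hypothetical helper. Orders are clipped so that the agent never sells
    more shares than it holds or spends more cash than it has.
    """
    if action < 0:
        qty = min(-action, shares_held)           # cannot sell more than held
        return shares_held - qty, balance + qty * price
    if action > 0:
        qty = min(action, int(balance // price))  # cannot overspend the balance
        return shares_held + qty, balance - qty * price
    return shares_held, balance                   # hold: nothing changes


# "Buy 10 shares" followed by "Sell 10 shares" at a price of 150
shares, cash = apply_action(10, shares_held=0, balance=2000.0, price=150.0)
shares, cash = apply_action(-10, shares_held=shares, balance=cash, price=150.0)
```

With a price of 150 and a balance of 2000, action 10 buys 10 shares and leaves 500 in cash; the following action -10 restores the starting position because this sketch ignores transaction costs.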
    
<i><u>Reward function</u></i> $r(s, a, s^\prime)$ is the incentive mechanism that drives an agent to learn better actions. Reward functions
take many forms. FinRL provides commonly used ones (see Fig. 2)<sup>20</sup>:
    <ul>
        <li>The change of the portfolio value when action $a$ is taken at state $s$ and the agent arrives at new state $s^\prime$, i.e., $r(s, a, s^\prime) = v^\prime - v$, where $v^\prime$ and $v$ represent the portfolio values at state $s^\prime$ and $s$, respectively.</li>
        <li>The portfolio log return when action $a$ is taken at state $s$ and the agent arrives at new state $s^\prime$, i.e., $r(s, a, s^\prime) = \log\left( \frac{v^\prime}{v} \right)$.</li>
        <li>The Sharpe ratio for periods $t = \{1, \dots, T\}$, i.e., $S_T = \frac{\mathrm{mean}(R_t)}{\mathrm{std}(R_t)}$, where $R_t = v_t - v_{t-1}$.</li>
        <li>FinRL also supports user-defined reward functions that include a risk factor or a transaction cost term.</li>
    </ul>
<img src = "images\finrl_portfolio.png" alt = "use cases tables" />
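
The three built-in reward forms listed above translate directly into a few lines of NumPy. This is a minimal sketch rather than FinRL's code: the function names are hypothetical, and using the population standard deviation (NumPy's default) in the Sharpe ratio is an assumption, since the text does not say which estimator is meant.

```python
import numpy as np


def value_change_reward(v, v_next):
    """r(s, a, s') = v' - v: change of the portfolio value."""
    return v_next - v


def log_return_reward(v, v_next):
    """r(s, a, s') = log(v' / v): portfolio log return."""
    return float(np.log(v_next / v))


def sharpe_ratio(portfolio_values):
    """S_T = mean(R_t) / std(R_t) with R_t = v_t - v_{t-1}."""
    changes = np.diff(np.asarray(portfolio_values, dtype=float))
    # Assumption: population standard deviation (ddof=0)
    return float(changes.mean() / changes.std())


values = [100.0, 102.0, 101.0, 105.0]  # made-up portfolio values
step_reward = value_change_reward(values[0], values[1])
log_reward = log_return_reward(values[0], values[1])
episode_sharpe = sharpe_ratio(values)
```

The first two rewards are computed per step, while the Sharpe ratio needs the whole sequence of portfolio values, so it suits evaluation over a period better than a per-step signal.
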
<b><i>Standard and User Import Datasets</i></b>

The application of DRL in finance differs from that in other fields, such as playing chess and card
games, which inherently have clearly defined rules for their environments. Different financial markets require different DRL algorithms to obtain the most appropriate automated trading agent. Because setting up a training environment is time-consuming and laborious, FinRL provides six environments based on representative listings, including NASDAQ-100, DJIA, S&P 500, SSE 50, CSI 300, and HSI, plus one user-defined environment. This frees users
from the tedious and time-consuming data pre-processing workload. For more flexibility, FinRL also supports user-imported data and lets users adjust the granularity of time steps. Users only need to pre-process their data sets
according to FinRL's data format instructions.

<b><i>Deep Reinforcement Learning Agents</i></b>

The FinRL library includes fine-tuned standard DRL algorithms, namely DQN, DDPG, Multi-Agent DDPG, PPO, SAC, A2C and TD3. Users can also design their own DRL algorithms by adapting these, e.g., Adaptive DDPG, or by employing ensemble methods. The implementation of the DRL algorithms is based on OpenAI Baselines and Stable Baselines.

Comparison of the DRL algorithms:
<img src="https://raw.githubusercontent.com/AI4Finance-Foundation/FinRL/master/docs/source/image/alg_compare.png" alt="DRL Comparison" width = 100% /></p>

<h3>Data and results</h3>
<p><b><i>Training-Validation-Testing Flow</i></b>
The stock market trading data is split into three phases: 
    <ol>
        <li>Training dataset is the sample of data to fit the DRL model.</li>
        <li>Validation dataset is used for parameter tuning and to avoid overfitting.</li>
        <li>Testing (trading) dataset is the sample of data to provide an unbiased evaluation of a fine-tuned model.</li>
    </ol>
<img src="https://raw.githubusercontent.com/AI4Finance-Foundation/FinRL/master/figs/example_data.PNG" alt="Data split" width = 50% />    
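
The three-phase split above is a split by date, not a random shuffle, so that the model never sees future prices during training. A minimal pandas sketch, assuming a `date` column and half-open intervals (start included, end excluded); the helper `split_by_date` and the toy data are hypothetical:

```python
import pandas as pd


def split_by_date(df, start, end, date_col="date"):
    """Return rows with start <= date < end, re-indexed from zero."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    mask = (df[date_col] >= start) & (df[date_col] < end)
    return df.loc[mask].reset_index(drop=True)


# Toy price series: ten consecutive days
prices = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=10, freq="D"),
    "close": [100.0 + i for i in range(10)],
})

train = split_by_date(prices, "2020-01-01", "2020-01-07")       # fit the model
validation = split_by_date(prices, "2020-01-07", "2020-01-09")  # tune parameters
trade = split_by_date(prices, "2020-01-09", "2020-01-11")       # unbiased evaluation
```

Because each interval excludes its end date, the three parts never overlap and together cover the whole series.
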
<b><i>Metrics:</i></b> FinRL provides five evaluation metrics to help users evaluate the stock trading performance directly:
final portfolio value, annualized return, annualized standard deviation, maximum drawdown ratio,
and Sharpe ratio.
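
The five metrics can all be derived from a series of account values. The sketch below is an illustration, not the code FinRL or pyfolio runs: the function name `evaluate_account`, the 252 trading periods per year, the sample standard deviation and a zero risk-free rate are all assumptions.

```python
import numpy as np


def evaluate_account(account_values, periods_per_year=252):
    """Compute the five evaluation metrics from a series of account values."""
    values = np.asarray(account_values, dtype=float)
    returns = values[1:] / values[:-1] - 1.0       # simple per-period returns
    years = len(returns) / periods_per_year
    running_max = np.maximum.accumulate(values)    # highest value seen so far
    return {
        "final_value": float(values[-1]),
        "annualized_return": float((values[-1] / values[0]) ** (1.0 / years) - 1.0),
        "annualized_std": float(returns.std(ddof=1) * np.sqrt(periods_per_year)),
        # Most negative relative drop from a previous peak
        "max_drawdown": float(((values - running_max) / running_max).min()),
        # Assumes a zero risk-free rate
        "sharpe_ratio": float(np.sqrt(periods_per_year) * returns.mean() / returns.std(ddof=1)),
    }


stats = evaluate_account([100.0, 110.0, 99.0, 120.0])  # made-up account values
```

In this toy series the account falls from a peak of 110 to 99, so the maximum drawdown is -10%.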
    
<b><i>Baseline Trading Strategies:</i></b>
    <ol>
        <li>passive buy-and-hold trading strategy</li>
        <li>mean-variance strategy</li>
        <li>min-variance strategy</li>
        <li>momentum trading strategy</li>
        <li>equal-weighted strategy</li>
    </ol>

<b><i>Backtesting with Trading Constraints</i></b>

In order to better simulate practical trading, FinRL incorporates trading constraints, risk-aversion and automated backtesting tools.

<i><u>Automated Backtesting.</u></i> Backtesting plays a key role in performance evaluation. An automated backtesting tool is preferable because it reduces human error. The FinRL library uses the Quantopian pyfolio package to backtest trading strategies. This package is easy to use and consists of various individual plots that provide a comprehensive picture of the performance of a trading strategy.

<i><u>Incorporating Trading Constraints.</u></i> Transaction costs are incurred when executing a trade. There are many types of transaction costs, such as broker commissions and SEC fees. FinRL allows users to treat transaction costs as a parameter in the provided environments:
    <ul>
        <li>Flat fee: a fixed dollar amount per trade regardless of how many shares are traded.</li>
        <li>Per-share percentage: a per-share rate for every share traded; for example, 1/1000 and 2/1000 are the most commonly used transaction cost rates.</li>
    </ul>
Market liquidity, such as the bid-ask spread, is also considered and can be applied as a parameter to the
stock closing price to simulate a real-world trading experience. 
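
The two cost models differ only in what the fee scales with. A minimal sketch, with hypothetical function names and made-up prices, assuming the percentage rate is applied to the traded value (price times number of shares):

```python
def flat_fee_cost(fee_per_trade):
    """Flat fee: the same dollar amount for every trade, whatever its size."""
    return fee_per_trade


def percentage_cost(price, shares, rate=0.001):
    """Percentage fee: a rate (here 1/1000) on the traded value.

    The absolute value makes buys (positive) and sells (negative) cost the same.
    """
    return price * abs(shares) * rate


buy_cost = percentage_cost(price=150.0, shares=10)    # buy 10 shares
sell_cost = percentage_cost(price=150.0, shares=-10)  # sell 10 shares
flat_cost = flat_fee_cost(4.95)                       # independent of size
```

At a price of 150, trading 10 shares at a rate of 1/1000 costs 1.5 either way, while the flat fee stays the same for 1 share or 1000.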

<i><u>Risk-aversion.</u></i> Risk-aversion reflects whether an investor will choose to preserve capital. It also
influences one’s trading strategy when facing different levels of market volatility. To control risk in a worst-case scenario, such as the financial crisis of 2007–2008, FinRL employs the financial turbulence index $\text{turbulence}_t$, which measures extreme asset price fluctuation:
    $$\text{turbulence}_t = (y_t - \mu) \Sigma^{-1} (y_t - \mu)^\prime \in \mathbb{R},$$ 
where $y_t \in \mathbb{R}^n$ denotes the stock returns for the current period $t$, $\mu \in \mathbb{R}^n$ denotes the average of historical returns, and $\Sigma \in \mathbb{R}^{n \times n}$ denotes the covariance of historical returns. It is used as a parameter that controls buying and selling: for example, if the turbulence index reaches a pre-defined threshold,
the agent halts buying and starts selling its holdings gradually.
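
The turbulence index is a squared Mahalanobis distance between today's returns and their historical average. The sketch below computes it with NumPy under stated assumptions: the function name, the toy return history and the threshold value are hypothetical, and the pseudo-inverse is used in place of a plain inverse so that a near-singular covariance matrix does not break the calculation.

```python
import numpy as np


def turbulence_index(current_returns, historical_returns):
    """(y_t - mu) Sigma^{-1} (y_t - mu)' for one period.

    current_returns: shape (n,), returns of n stocks in the current period.
    historical_returns: shape (T, n), past returns used for mu and Sigma.
    """
    y = np.asarray(current_returns, dtype=float)
    hist = np.asarray(historical_returns, dtype=float)
    mu = hist.mean(axis=0)
    sigma = np.atleast_2d(np.cov(hist, rowvar=False))
    diff = y - mu
    # Assumption: pseudo-inverse instead of a plain inverse
    return float(diff @ np.linalg.pinv(sigma) @ diff)


# Made-up history of daily returns for two stocks
history = np.array([[0.01, 0.02], [0.02, 0.01], [-0.01, 0.00], [0.00, -0.02]])
calm = turbulence_index(history.mean(axis=0), history)  # returns equal to the average
shock = turbulence_index([-0.10, -0.12], history)       # crash-like returns

threshold = 50.0                     # hypothetical pre-defined threshold
halt_buying = shock > threshold      # the agent would stop buying and start selling
```

Returns equal to the historical average give an index of zero, and the index grows quadratically as returns move away from that average.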

<b><i>Demonstration of Three Use Cases</i></b>
    <ol>
        <li>single stock trading</li>
        <li>multiple stock trading</li>
        <li>portfolio allocation</li>
    </ol>
The FinRL library provides practical and reproducible solutions for each use case, with online walk-through tutorials in Jupyter notebooks (e.g., the configurations of the running environment and commands).

Fig. 4 and Table 1 demonstrate the performance evaluation of single stock trading. The paper picks large-cap ETFs such as SPDR S&P 500 ETF Trust (SPY) and Invesco QQQ Trust Series 1 (QQQ), and stocks such as Google (GOOGL), Amazon (AMZN), Apple (AAPL), and Microsoft (MSFT). The PPO algorithm in FinRL is used to train a trading agent. The maximum drawdown in Table 1 is large
due to the COVID-19 market crash.
Fig. 5 and Table 2 show the performance of multiple stock trading and portfolio allocation over the
Dow Jones 30 constituents. The DDPG and TD3 algorithms are used to trade multiple stocks and to allocate the portfolio.
<img src = "https://raw.githubusercontent.com/AI4Finance-Foundation/FinRL/master/figs/performance.PNG" alt = "use cases performance" />
<img src = "images\finrl_table1_table2.png" alt = "use cases tables" />
</p>

<h3>Code experiment<sup>21</sup></h3>
<p>My code experiment (executed in Google Colab) is in the notebook FinRL_Ensemble_StockTrading_ICAIF_2020_my_try.ipynb and starts after the text "Let's do some practice with train and test/trade data and try some parameter tuning and see the results." I decided to build on the notebook already prepared by the FinRL developers. Below I copy only the code that I tested. N.B.: it cannot be executed here.</p>

In [None]:
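# Assumes YahooDownloader, FeatureEngineer, DRLEnsembleAgent, config and the backtest
# helpers are already imported, as in the original FinRL notebook.
# Create a new dataset with more data than initially provided.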
df_train_trade = YahooDownloader(start_date = '2009-01-01',
                     end_date = '2022-02-08',
                     ticker_list = config.DOW_30_TICKER).fetch_data().sort_values(['date','tic'])
# Dataset head
df_train_trade.head()
# Dataset tail
df_train_trade.tail()
# Check the number of traded stocks and the row count per ticker
len(df_train_trade.tic.unique()), df_train_trade.tic.value_counts()

# Stockstats technical indicator column names
# Check https://pypi.org/project/stockstats/ for different names
tech_indicators_train_trade = [
    "macd",
    "boll_ub",
    "boll_lb",
    "rsi_30",
    "cci_30",
    "dx_30",
    "close_30_sma",
    "close_60_sma",
]

# Dataset Feature Engineering
fe = FeatureEngineer(
                    use_technical_indicator=True,
                    tech_indicator_list = tech_indicators_train_trade,
                    use_turbulence=True,
                    user_defined_feature = False)

processed_train_trade = fe.preprocess_data(df_train_trade)
processed_train_trade = processed_train_trade.copy()
processed_train_trade = processed_train_trade.fillna(0)
processed_train_trade = processed_train_trade.replace(np.inf,0)

# Sample of the FE dataset
processed_train_trade.sample(10)

In [None]:
# Stock dimension and State Space
stock_dimension_train_trade = len(processed_train_trade.tic.unique())
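# State = balance (1) + prices and holdings (2 per stock) + technical indicators (per stock)
state_space_train_trade = 1 + 2*stock_dimension_train_trade + len(tech_indicators_train_trade)*stock_dimension_train_trade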

print(f"Stock Dimension Train Trade: {stock_dimension_train_trade}, State Space Train Trade: {state_space_train_trade}")

# Kwargs/initial setup/configs for the DRLEnsembleAgent
env_kwargs_train_trade = {
    "hmax": 100, 
    "initial_amount": 10000, 
    "buy_cost_pct": 0.0001, 
    "sell_cost_pct": 0.0001, 
    "state_space": state_space_train_trade, 
    "stock_dim": stock_dimension_train_trade, 
    "tech_indicator_list": tech_indicators_train_trade,
    "action_space": stock_dimension_train_trade, 
    "reward_scaling": 1e-5,
    "print_verbosity":7
}

# Creation of the DRLEnsembleAgent
rebalance_window_train_trade = 42 # rebalance_window is the number of days after which the model is retrained
validation_window_train_trade = 90 # validation_window is the number of days used for validation and for trading (e.g. if validation_window=90, then both the validation and trading periods are 90 days)
train_start_train_trade = '2009-01-01'
train_end_train_trade = '2019-01-01'
val_test_start_train_trade = '2019-01-01'
val_test_end_train_trade = '2022-02-08'

ensemble_agent_train_trade = DRLEnsembleAgent(df=processed_train_trade,
                 train_period=(train_start_train_trade,train_end_train_trade),
                 val_test_period=(val_test_start_train_trade,val_test_end_train_trade),
                 rebalance_window=rebalance_window_train_trade, 
                 validation_window=validation_window_train_trade, 
                 **env_kwargs_train_trade)

In [None]:
# Parameters setup of the few DRL models
A2C_model_train_trade_kwargs = {
                    'n_steps': 10,
                    'ent_coef': 0.001,
                    'learning_rate': 0.005
                    }

PPO_model_train_trade_kwargs = {
                    "ent_coef":0.001,
                    "n_steps": 1024,
                    "learning_rate": 0.000025,
                    "batch_size": 42
                    }

DDPG_model_train_trade_kwargs = {
                      #"action_noise":"ornstein_uhlenbeck",
                      "buffer_size": 100_000,
                      "learning_rate": 0.00005,
                      "batch_size": 42
                    }

timesteps_train_trade_dict = {'a2c' : 10_000, 
                 'ppo' : 10_000, 
                 'ddpg' : 10_000
                 }

In [None]:
# Apply the picked ensemble strategy
df_train_trade_summary = ensemble_agent_train_trade.run_ensemble_strategy(A2C_model_train_trade_kwargs,
                                                 PPO_model_train_trade_kwargs,
                                                 DDPG_model_train_trade_kwargs,
                                                 timesteps_train_trade_dict)
# Check the output of the strategy
df_train_trade_summary

In [None]:
unique_trade_date_train_trade = processed_train_trade[(processed_train_trade.date > val_test_start_train_trade)&(processed_train_trade.date <= val_test_end_train_trade)].date.unique()
df_trade_date_train_trade = pd.DataFrame({'datadate':unique_trade_date_train_trade})

# Collect the per-window account value files written by the ensemble run
# (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
account_value_frames = []
for i in range(rebalance_window_train_trade+validation_window_train_trade, len(unique_trade_date_train_trade)+1, rebalance_window_train_trade):
    account_value_frames.append(pd.read_csv('results/account_value_trade_{}_{}.csv'.format('ensemble',i)))
df_account_value_train_trade = pd.concat(account_value_frames, ignore_index=True)

# Annualized Sharpe ratio from daily returns (252 trading days per year)
daily_returns = df_account_value_train_trade.account_value.pct_change(1)
sharpe = (252**0.5)*daily_returns.mean()/daily_returns.std()
print('Sharpe Ratio: ',sharpe)
df_account_value_train_trade=df_account_value_train_trade.join(df_trade_date_train_trade[validation_window_train_trade:].reset_index(drop=True))

df_account_value_train_trade.account_value.plot()
plt.title("Portfolio value performance")
plt.show()

In [None]:
import datetime

print("==============Get Backtest Results===========")
now = datetime.datetime.now().strftime('%Y%m%d-%Hh%M')

perf_stats_all = backtest_stats(account_value=df_account_value_train_trade)
perf_stats_all = pd.DataFrame(perf_stats_all)

In [None]:
#baseline stats
print("==============Get Baseline Stats===========")
baseline_df = get_baseline(
        ticker="^DJI", # "^DJI" - Dow Jones Index, "^GSPC" - S&P 500, "^NDX" - NASDAQ 100
        start = df_account_value_train_trade.loc[0,'date'],
        end = df_account_value_train_trade.loc[len(df_account_value_train_trade)-1,'date'])

stats = backtest_stats(baseline_df, value_col_name = 'close')

In [None]:
print("==============Compare to DJIA===========") 
backtest_plot(df_account_value_train_trade, 
             baseline_ticker = '^DJI', # "^DJI" - Dow Jones Index, "^GSPC" - S&P 500, "^NDX" - NASDAQ 100
             baseline_start = df_account_value_train_trade.loc[0,'date'],
             baseline_end = df_account_value_train_trade.loc[len(df_account_value_train_trade)-1,'date'])

<h3>Conclusions</h3>
<p>FinRL is a DRL library designed specifically for automated stock trading, with an emphasis on educational and demonstrative purposes. It is characterized by its extendability, a more-than-basic market environment and extensive performance evaluation tools that also serve quantitative investors and strategy builders. Customization is easily accessible on all
layers, from the market simulator and the trading agents’ learning algorithms up to profitable strategies. With the FinRL library, implementing powerful DRL-driven trading strategies becomes
an accessible, efficient and delightful experience.

In my code experiment, the ensemble strategy with the chosen parameters ended with a significant loss for the investor/trader. For a better result, the strategy should be configured with different parameters and tested again.</p>

<h3>Resources</h3>
<ol>
    <li><a href="https://arxiv.org/pdf/2011.09607v1.pdf">Arxiv Paper Source</a></li>
    <li><a href="https://en.wikipedia.org/wiki/Deep_reinforcement_learning">DRL Wikipedia</a></li>
    <li><a href="https://corporatefinanceinstitute.com/resources/knowledge/finance/quantitative-finance/">Quantitative Finance</a></li>
    <li><a href="https://github.com/AI4Finance-Foundation/FinRL">FinRL Library at GitHub</a></li>
    <li><a href="https://www.nasdaq.com/nasdaq-100">NASDAQ-100 Index</a></li>
    <li><a href="https://en.wikipedia.org/wiki/Dow_Jones_Industrial_Average">Dow Jones Industrial Average (DJIA) Wikipedia</a></li>
    <li><a href="https://en.wikipedia.org/wiki/S%26P_500">S&P 500 Wikipedia</a></li>
    <li><a href="https://en.wikipedia.org/wiki/Hang_Seng_Index">Hang Seng Index (HSI) Wikipedia</a></li>
    <li><a href="https://en.wikipedia.org/wiki/SSE_50_Index">Shanghai Stock Exchange Index (SSE 50) Wikipedia</a></li>
    <li><a href="https://en.wikipedia.org/wiki/CSI_300_Index">Capitalization-Weighted Stock Market Index (CSI 300) Wikipedia</a></li>
    <li><a href="https://en.wikipedia.org/wiki/Q-learning#Deep_Q-learning">Q-learning Wikipedia</a></li>
    <li><a href="https://towardsdatascience.com/deep-deterministic-policy-gradients-explained-2d94655a9b7b">Deep Deterministic Policy Gradient (DDPG)</a></li>
    <li><a href="https://paperswithcode.com/method/ppo">Proximal Policy Optimization (PPO) Paperswithcode</a></li>
    <li><a href="https://spinningup.openai.com/en/latest/algorithms/sac.html">Soft Actor-Critic (SAC)</a></li>
    <li><a href="https://paperswithcode.com/method/a2c">Advantage Actor-Critic (A2C) Paperswithcode</a></li>
    <li><a href="https://spinningup.openai.com/en/latest/algorithms/td3.html">Twin Delayed DDPG (TD3)</a></li>
    <li><a href="https://arxiv.org/pdf/1907.01503.pdf">Adaptive DDPG Arxiv</a></li>
    <li><a href="https://paperswithcode.com/method/maddpg">Multi-Agent DDPG Paperswithcode</a></li>
    <li><a href="https://en.wikipedia.org/wiki/Markov_decision_process">Markov Decision Process Wikipedia</a></li>
    <li><a href="https://deliverypdf.ssrn.com/delivery.php?ID=400020126087064097122093093072124010096068026065069063076084017102112030127118097127028021100056061044043005101114105122080105049022017012058023009065124074123064030073087060027102079119127027031000115081068069122108007111007120124081074010065090017123&EXT=pdf&INDEX=TRUE">Portfolio State - Deep Reinforcement Learning for Automated
Stock Trading: An Ensemble Strategy Paper</a></li>
    <li><a href="./code_tests/">Code Experiment</a></li>
</ol>