In [1]:
%matplotlib inline

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

<h1>FinRL-Meta: A Universe of Near-Real Market Environments for Data-Driven Deep Reinforcement Learning in Quantitative Finance<sup>1</sup></h1>
<hr/>
<h2>Authors: Xiao-Yang Liu, Jingyang Rui, Jiechao Gao, Liuqing Yang, Hongyang Yang, Zhaoran Wang, Christina Dan Wang, Jian Guo</h2>
<h3>Scientific paper summary</h3>

<h3>Abstract</h3>
<h5>Problem Statement</h5>
<p>Deep reinforcement learning (DRL)<sup>2</sup> has recently shown great potential for building financial market simulators. However, the accuracy of DRL-based market simulators relies heavily on numerous and diverse DRL agents, which increases the demand
for a universe of market environments and poses a challenge for simulation speed.</p>
<h5>Problem Solution</h5>    
<p>The paper introduces FinRL-Meta<sup>3</sup>, a framework that builds a universe of market environments for data-driven financial reinforcement learning. The framework:
    <ol>
        <li>separates financial data processing from the design pipeline of DRL-based strategy and provides open-source data engineering tools for financial big data;</li>
        <li>provides hundreds of market environments for various trading tasks;</li>
        <li>enables multiprocessing simulation and training by exploiting thousands of GPU cores.</li>
    </ol>
</p>

<h3>Introduction</h3>
<p>In quantitative finance<sup>4</sup>, market simulators play an important role in studying complex market phenomena and investigating financial regulations. Compared to traditional simulation models, deep reinforcement learning (DRL) has shown great potential for building financial market simulators through multi-agent systems. However, because real-world markets are highly complex, raw historical financial data contain significant noise and may not reflect the future of the markets. This usually degrades the fidelity of DRL-based simulation, so the potential of DRL-based market simulators has not yet been fully explored. The previously proposed FinRL library contains a DRL framework, but it focuses on developing trading strategies rather than building market simulations.
    
The authors introduce FinRL-Meta, a new framework that provides a universe of near-real market environments
for data-driven financial reinforcement learning:
    <ol>
        <li>The DataOps paradigm is applied through a unified, automated data processor for data access, data cleaning and feature engineering;</li>
        <li>Hundreds of near-real market DRL environments (connected to the data processor of the FinRL-Meta framework) for various trading tasks (high-frequency trading, cryptocurrency trading, stock portfolio allocation, etc.);</li>
        <li>Utilization of thousands of GPU cores to perform multiprocessing training.</li>
    </ol>
<img src="https://raw.githubusercontent.com/AI4Finance-Foundation/FinRL-Meta/master/figs/neofinrl_overview.png" alt = "Overview of FinRL-Meta Architecture" width=100%/>
</p>
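<p>The data-processor idea above (access, clean, extract features) can be illustrated with a minimal pandas sketch. This is not the FinRL-Meta data processor: the function names <code>clean</code> and <code>add_features</code>, the column names, and the sample prices are all hypothetical and chosen only for illustration.</p>

```python
import numpy as np
import pandas as pd


def clean(raw: pd.DataFrame) -> pd.DataFrame:
    """Sort by date, drop duplicated dates, and forward-fill missing closes."""
    df = raw.sort_values("date", kind="stable")
    df = df.drop_duplicates(subset="date", keep="last").set_index("date")
    df["close"] = df["close"].ffill()
    # Rows that are still missing (e.g. a gap at the very start) are dropped.
    return df.dropna(subset=["close"])


def add_features(df: pd.DataFrame, window: int = 3) -> pd.DataFrame:
    """Add two simple features: daily return and a simple moving average."""
    out = df.copy()
    out["return"] = out["close"].pct_change()
    out["sma"] = out["close"].rolling(window).mean()
    return out


# Hypothetical raw data: unsorted, with one duplicated date and one missing value.
raw = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2021-06-03", "2021-06-01", "2021-06-02", "2021-06-02", "2021-06-04"]
        ),
        "close": [102.0, 100.0, 101.0, 101.0, np.nan],
    }
)

features = add_features(clean(raw))
print(features)
```

<p>Keeping cleaning and feature extraction as separate steps mirrors the separation of data processing from strategy design described above: an environment only needs the resulting feature table, not the raw source.</p>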

<h3>FinRL-Meta Framework Overview</h3>
<ul>
    <li><b>MDP (Markov Decision Process)<sup>5</sup> Model for Trading Tasks</b> - model a trading task as an MDP
$(S, A, P, r, \gamma)$, where $S$ and $A$ denote the state space and action space, respectively, $P (s^\prime|s, a)$
denotes the transition probability, $r(s, a)$ is a reward function, and $\gamma \in (0, 1)$ is a discount factor.
Specifically, the state denotes an observation that a DRL agent receives from a market environment; the action space consists of actions that an agent is allowed to take at a state; the reward function $r(s, a)$ is the incentive for agents to learn a better policy. A trading agent aims to learn a policy $\pi(a_t|s_t)$ that maximizes the expected discounted return $R = \sum_{t=0}^{\infty} \gamma^t r(s_t, a_t)$.</li>
    <li><b>Overview of the framework</b> - FinRL-Meta consists of three layers: data layer, environment layer, and agent layer. Each layer executes its functions and is relatively independent.</li>
    <li><b>DataOps for Data-Driven DRL in Finance</b> - FinRL-Meta follows the DataOps paradigm in the data layer:
        <ol>
            <li>establishment of a standard pipeline for financial data engineering, ensuring data of different
                formats from different sources can be incorporated in a unified RL framework.</li>
            <li>automation of the pipeline with a data processor, which can access and clean data, and extract features from various data sources with high quality and efficiency. Supported data sources include Yahoo!Finance<sup>6</sup>, CCXT<sup>7</sup>, WRDS.TAQ<sup>8</sup>, Alpaca<sup>9</sup>, RiceQuant<sup>10</sup>, JoinQuant<sup>11</sup> and QuantConnect<sup>12</sup>.</li>
        </ol>
        <img src="https://raw.githubusercontent.com/AI4Finance-Foundation/FinRL-Meta/master/figs/finrl_meta_dataops.png" alt="FinRL-Meta DataOps" width=75%/>
    </li>                    
    <li><b>Multiprocessing Training</b> - utilization of thousands of GPU cores (CUDA cores) to perform multiprocessing training, which significantly accelerates the training process and improves the performance of DRL trading agents on large datasets.</li>
    <li><b>Plug-and-Play</b> - separation of market environments from the data layer and the agent layer. Any DRL agent can be directly plugged into market environments, then trained and tested.</li>
    <li><b>Training-Testing-Trading Pipeline</b> - the DRL agent first learns from the training environment and is then validated in the validation environment for further adjustment. Then the validated agent is tested on historical datasets. Finally, the tested agent will be deployed in paper trading or live trading markets.
    <img src="https://raw.githubusercontent.com/AI4Finance-Foundation/FinRL-Meta/master/figs/timeline.png" alt="Training-Testing-Trading Pipeline" width=75%/></li>
    <li><b>Supported Trading Tasks</b> - the framework reports satisfactory trading performance on tasks such as stock trading, cryptocurrency trading, and portfolio allocation. Derivatives such as futures and forex are also supported.</li>
        </ul>
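<p>To make the MDP formulation and the plug-and-play idea concrete, here is a minimal sketch of a toy single-asset environment with a Gym-like <code>reset</code>/<code>step</code> interface. It is not one of the FinRL-Meta environments: the class <code>ToyTradingEnv</code>, the policy <code>buy_and_hold</code>, the price series, and the choice of reward (change in portfolio value) are assumptions made only for illustration.</p>

```python
import numpy as np


class ToyTradingEnv:
    """Hypothetical single-asset environment (illustration only).

    State:  (current price, shares held, cash).
    Action: -1 sell one share, 0 hold, +1 buy one share.
    Reward: change in portfolio value between consecutive steps.
    """

    def __init__(self, prices, initial_cash=1000.0):
        self.prices = np.asarray(prices, dtype=float)
        self.initial_cash = initial_cash

    def _state(self):
        return np.array([self.prices[self.t], self.shares, self.cash])

    def _value(self):
        return self.cash + self.shares * self.prices[self.t]

    def reset(self):
        self.t, self.shares, self.cash = 0, 0, self.initial_cash
        return self._state()

    def step(self, action):
        price = self.prices[self.t]
        value_before = self._value()
        if action == 1 and self.cash >= price:
            self.shares += 1
            self.cash -= price
        elif action == -1 and self.shares > 0:
            self.shares -= 1
            self.cash += price
        self.t += 1  # the market moves to the next price
        reward = self._value() - value_before
        done = self.t == len(self.prices) - 1
        return self._state(), reward, done


def discounted_return(rewards, gamma):
    """R = sum_t gamma^t * r_t for one finite episode."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def run_episode(env, policy):
    """Any callable mapping a state to an action can be plugged in as the policy."""
    state, done, rewards = env.reset(), False, []
    while not done:
        state, reward, done = env.step(policy(state))
        rewards.append(reward)
    return rewards


def buy_and_hold(state):
    """Buy one share if none is held, otherwise hold."""
    return 1 if state[1] == 0 else 0


env = ToyTradingEnv([10.0, 11.0, 12.0, 11.0])
rewards = run_episode(env, buy_and_hold)
print(rewards, discounted_return(rewards, gamma=0.9))
```

<p>Because <code>run_episode</code> only relies on <code>reset</code> and <code>step</code>, the policy can be swapped for a trained DRL agent without touching the environment, which is the separation the plug-and-play item describes.</p>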

<h3>Performance Evaluation</h3>
<p>Here are some results of stock trading and cryptocurrency trading using the FinRL-Meta framework.</p>
<h5>Experiment Settings</h5>

<p><b><i>Stock trading task</i></b> - the 30 constituent stocks of the Dow Jones Industrial Average (DJIA) are selected. The agents use the Proximal Policy Optimization (PPO) algorithm as implemented in ElegantRL, Stable-baselines3 and RLlib. Data from 06/01/2021 to 08/14/2021 is used for training and data from 08/15/2021 to 08/31/2021 for validation (backtesting). The agent is then retrained on data from 06/01/2021 to 08/31/2021 and trades from 09/03/2021 to 09/16/2021. Source - Alpaca's database and paper trading APIs.</p>

<p><b><i>Cryptocurrency trading task</i></b> - the top 10 cryptocurrencies by market capitalization are selected. The agent again uses the PPO algorithm (ElegantRL implementation), with the Bitcoin (BTC) price as the baseline. Data from 06/01/2021 to 08/14/2021 is used for training and data from 08/15/2021 to 08/31/2021 for validation (backtesting). The agent is then retrained on data from 06/01/2021 to 08/31/2021 and trades from 09/01/2021 to 09/15/2021. Source - Binance.</p>
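<p>The date windows of the stock trading task can be expressed as a small splitting helper. This is only a sketch: the helper <code>split_by_date</code> and the placeholder price table are hypothetical, and only the date ranges come from the experiment settings above.</p>

```python
import pandas as pd

# Date ranges of the stock trading task (start and end are both inclusive).
SPLITS = {
    "train": ("2021-06-01", "2021-08-14"),
    "validation": ("2021-08-15", "2021-08-31"),
    "retrain": ("2021-06-01", "2021-08-31"),
    "trade": ("2021-09-03", "2021-09-16"),
}


def split_by_date(df, splits):
    """Slice a DataFrame with a sorted DatetimeIndex into named windows."""
    return {name: df.loc[start:end] for name, (start, end) in splits.items()}


# Hypothetical placeholder data: one dummy value per business day.
index = pd.bdate_range("2021-06-01", "2021-09-16")
prices = pd.DataFrame({"close": range(len(index))}, index=index)

parts = split_by_date(prices, SPLITS)
for name, part in parts.items():
    print(name, part.index.min().date(), part.index.max().date(), len(part))
```

<p>The retraining window is the union of the training and validation windows, so the agent that finally trades has seen all data up to 08/31/2021.</p>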

<h5>Trading Performance</h5>

<p><b><i>Stock trading</i></b> - in the backtesting stage, both ElegantRL and Stable-baselines3 agents
outperform DJIA (baseline) in annual return and Sharpe ratio, as shown in Fig. 2 and Table 2. The ElegantRL
agent achieves an annual return of 22.425% and a Sharpe ratio of 1.457. The Stable-baselines3 agent
achieves an annual return of 32.106% and a Sharpe ratio of 1.621. In the paper trading stage, the
results are consistent with the backtesting results.
<img src="images/finrl_meta_stock_trading_graphics.png" alt="stock trading graphics" />
<img src="images/finrl_meta_stock_trading_table.png" alt="stock trading table" /></p>

<p><b><i>Cryptocurrency trading</i></b> - in the backtesting stage, the ElegantRL agent outperforms the benchmark
(BTC price) in most performance metrics, as shown in Fig. 3 and Table 3. It achieves an annual return
of 360.823% and a Sharpe ratio of 2.992. The ElegantRL agent also outperforms the benchmark
(BTC price) in the paper trading stage, which is consistent with the backtesting results.
<img src="images/finrl_meta_cryptocurrency_trading_graphics.png" alt="cryptocurrency trading graphics" />
<img src="images/finrl_meta_cryptocurrency_trading_table.png" alt="cryptocurrency trading table" /></p>
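<p>For reference, annual return and Sharpe ratio are commonly derived from a series of periodic returns as sketched below. The summary does not state how the paper computes them, so the formulas, the 252-periods-per-year convention and the zero risk-free rate are assumptions, and the sample returns are made up.</p>

```python
import numpy as np


def annual_return(returns, periods_per_year=252):
    """Compound the periodic returns and scale the growth to one year."""
    r = np.asarray(returns, dtype=float)
    total_growth = np.prod(1.0 + r)
    return total_growth ** (periods_per_year / len(r)) - 1.0


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized mean excess return divided by annualized volatility."""
    r = np.asarray(returns, dtype=float)
    excess = r - risk_free_rate / periods_per_year
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)


# Made-up daily returns, for illustration only.
sample = [0.010, -0.005, 0.007, 0.002, -0.003]
print(annual_return(sample), sharpe_ratio(sample))
```

<p>Over a window of only a few weeks, annualization magnifies small differences, so annualized figures from short backtests should be read with caution. For markets that trade every day, such as cryptocurrencies, <code>periods_per_year=365</code> may be the more natural choice.</p>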


<h3>Code experiment<sup>13</sup></h3>
<p>My attempts to run the code (executed on Google Colab) were unsuccessful, because all of the files provided by the authors ended with errors. There is also no official guide on how to test or implement the FinRL-Meta solution, unlike for the authors' other library, FinRL (see my second scientific paper summary for more on it).</p>

<h3>Conclusions</h3>
<p>In this paper, the authors followed the DataOps paradigm and developed the FinRL-Meta framework. FinRL-Meta provides open-source data engineering tools and hundreds of market environments with multiprocessing simulation. For future work, the authors plan to build a multi-agent based market simulator consisting of over ten thousand agents, namely, a FinRL-Metaverse. First, FinRL-Metaverse aims to build a universe of market environments. To improve performance for large-scale markets, GPU-based massively parallel simulation, as in Isaac Gym, will be employed. Moreover, it will be interesting to explore an evolutionary perspective for simulating the markets. The authors believe that FinRL-Metaverse will provide insights into complex market phenomena and offer guidance for financial regulations.</p>

<h3>Resources</h3>
<ol>
    <li><a href="https://arxiv.org/pdf/2112.06753v1.pdf">Arxiv Paper Source</a></li>
    <li><a href="https://en.wikipedia.org/wiki/Deep_reinforcement_learning">DRL Wikipedia</a></li>
    <li><a href="https://github.com/AI4Finance-Foundation/FinRL-Meta">FinRL-Meta Framework at GitHub</a></li>
    <li><a href="https://corporatefinanceinstitute.com/resources/knowledge/finance/quantitative-finance/">Quantitative Finance</a></li>
    <li><a href="https://en.wikipedia.org/wiki/Markov_decision_process">Markov Decision Process Wikipedia</a></li>
    <li><a href="https://finance.yahoo.com/">Yahoo!Finance</a></li>
    <li><a href="https://github.com/ccxt/ccxt">CCXT</a></li>
    <li><a href="https://libguides.babson.edu/c.php?g=26412&p=161316">WRDS.TAQ</a></li>
    <li><a href="https://alpaca.markets/">Alpaca</a></li>
    <li><a href="https://rinkeby.etherscan.io/token/0x60bfa41fa438c96efb0df5904f6e23288cb86910">RiceQuant</a></li>
    <li><a href="https://github.com/JoinQuant">JoinQuant</a></li>
    <li><a href="https://www.quantconnect.com/">QuantConnect</a></li>
    <li><a href="./code_tests/">Code Experiment</a></li>
</ol>
    