DRL Portfolio Optimization

A portfolio optimization framework leveraging Deep Reinforcement Learning (DRL)

This document gives an overview of the project and contains many links to other resources. Developers are directed to the Wiki, which provides much more execution and implementation detail. For a high-level discussion of this project, please see report 1 and report 2.

There is a YouTube playlist, found here, that includes videos on the following topics:

  1. Starting an AWS SageMaker Instance, link
  2. Data Processing for LSTM, link
  3. LSTM Signal Generation, link
  4. Data Processing for Technical & Fundamental signals, link
  5. Reinforcement Learning on AWS SageMaker, link

Motivation

This repo was created during an Independent Study by Daniel Fudge with Professor Yelena Larkin as part of a concurrent Master of Business Administration (MBA) and a Diploma in Financial Engineering from the Schulich School of Business. The study was originally broken into 3 terms, as described below, with a 4th enhancement term added later.

Term 1 - Initial Investigation (Winter 2019)

The first term was a general investigation into the "Application of Machine Learning to Portfolio Optimization". Here we reviewed the different aspects of machine learning and their possible applications to Portfolio Optimization. During this investigation we highlighted Reinforcement Learning (RL) as an especially promising area to research and proposed the development of a Reinforcement Learning framework to better understand its possible applications.

Please see the 1st term report for the detailed discussion.

Term 2 - Architectural Design (Fall 2019)

Udacity Deep Reinforcement Learning Nanodegree

The 1st term report identified the Udacity "Deep Reinforcement Learning" Nanodegree as a first step toward gaining a better understanding of this topic. Both the syllabus and the site provide a detailed description of the Nanodegree.
This Nanodegree involved developing DRL networks to solve 3 different Unity ML-Agents environments. The solutions to these environments can be found in separate GitHub repositories.

Architectural Design Report

With the Udacity Nanodegree complete and a greater understanding of DRL obtained, the 2nd term report, found here, was generated to detail the proposed "Deep Reinforcement Learning Architecture for Portfolio Optimization". This report also described the future research required prior to the 3rd term implementation and suggested future work to follow once that implementation was complete.

Term 3 - Implementation (Winter 2020)

With the problem statement and architecture defined in the term 2 report, term 3 fully implemented the architecture. Note that the intent of this implementation is not to advance the state-of-the-art in DRL or its application to portfolio optimization. Instead, it is to generate a functioning DRL platform for portfolio optimization.
From this conventional implementation, we can experiment with the more advanced techniques highlighted in the 2nd term report. Term 3 documents not only the implementation but also the development environment, to lower the barrier to entry for researchers new to DRL and to the deployment of cloud-based applications.

Term 4 - Enhancement (Summer 2020)

The training process discussed in the future work below was addressed in a 4th term, captured in a new repo.

Udacity - Deep Learning Nanodegree

The term 2 report identified the Udacity "Deep Learning" Nanodegree as a good way to gain a better understanding of deep learning networks, with a special focus on PyTorch, LSTM networks and AWS SageMaker. This was completed in January 2020 and laid the foundation for this project. For more information on this nanodegree, please see this site.

A Cloud Guru - AWS Certified Machine Learning - Specialty 2020

During the implementation of the DRL architecture it became evident that a deeper understanding of AWS SageMaker was required. To satisfy this need, the "AWS Certified Machine Learning - Specialty 2020" course from A Cloud Guru was completed. The content was excellent and is highly recommended.

AWS SageMaker - Example Notebooks

One of the great aspects of developing on AWS is the large community and the resources available. The YouTube videos illustrate how to access the library of SageMaker examples, which can also be accessed on GitHub here.

AWS Execution

It is highly recommended that training of the deep learning models be executed on AWS to leverage its computing power. Many other parts may be executed locally to avoid AWS charges, or on AWS, since the charges for light instances are only pennies per hour. A simple notebook instance on SageMaker is extremely cheap; cost really only accumulates when you train the model (don't forget to shut down when finished!!!). If you are new to AWS and SageMaker, I recommend the AWS SageMaker tutorial.
For instructions on how to setup and run on AWS, please see the associated Wiki Page.
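To give a feel for what running on AWS involves, below is a minimal sketch of launching a training job from a SageMaker notebook instance. It assumes the SageMaker Python SDK v2; the entry-point script, source directory, S3 prefix, instance type and hyper-parameters are placeholders for illustration, not the exact values used in this repo.

```python
# Sketch of launching a SageMaker training job (SDK v2 assumed).
# The script name, source_dir, instance type and hyperparameters are placeholders.
import sagemaker
from sagemaker.pytorch import PyTorch

session = sagemaker.Session()
role = sagemaker.get_execution_role()      # IAM role attached to the notebook instance

estimator = PyTorch(
    entry_point="train.py",                # hypothetical training script
    source_dir="src",                      # hypothetical source directory
    role=role,
    framework_version="1.6.0",
    py_version="py3",
    instance_count=1,
    instance_type="ml.p3.2xlarge",         # GPU instance; use a small CPU instance for cheap tests
    hyperparameters={"epochs": 50, "lr": 1e-4},
)

# Training data previously uploaded to S3; the prefix is a placeholder.
estimator.fit({"training": f"s3://{session.default_bucket()}/drl-portfolio/training"})
```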

Local Execution

Although it is not recommended to train on a local PC, you may want to run locally to debug. If so, please see the instructions on the associated Wiki page.

Data Preparation

The first, and one of the most important, stages of this project is data preparation. It was actually completed twice in this project. Please see the LSTM data-preparation and the technical and fundamental signals data-preparation-no-memory notebooks for the data preparation process.
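As a flavour of the kind of transformations performed in those notebooks, here is a minimal pandas sketch that builds 1-day log returns plus a couple of simple technical signals. The column names and window lengths are assumptions for illustration, not the exact features used in this repo.

```python
# Minimal sketch of price-based signal construction (column names and windows assumed).
import numpy as np
import pandas as pd

def build_signals(prices: pd.DataFrame) -> pd.DataFrame:
    """prices: DataFrame indexed by date with one close-price column per ticker."""
    signals = pd.DataFrame(index=prices.index)
    for ticker in prices.columns:
        close = prices[ticker]
        log_ret = np.log(close / close.shift(1))
        signals[f"{ticker}_log_ret"] = log_ret                 # 1-day log return
        signals[f"{ticker}_mom_20"] = close.pct_change(20)     # 20-day momentum
        signals[f"{ticker}_vol_20"] = log_ret.rolling(20).std()  # 20-day realized volatility
    return signals.dropna()
```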

LSTM Development

Figure 8 on page 18 of report 2, copied below, illustrates how the 1st layer of both the actor and critic networks is a Long Short-Term Memory (LSTM) network that processes the price history into signals that are passed to the Fully Connected (FC) Feed-Forward (FF) neural networks.

[Figure 8 from report 2: DDPG actor-critic architecture, with an LSTM first layer feeding fully connected feed-forward networks]

Before beginning training of the full Deep Deterministic Policy Gradient (DDPG) architecture above, we experimented with different LSTM architectures, hyper-parameters and possible predictions. For instance, predicting the stock movement 1 day in the future is much easier than 1 month ahead. Also, predicting the direction of the move is much easier than predicting the actual price. The signal-processing notebook and associated YouTube video captured this experimentation. Unfortunately, this experimentation indicated that the signals from the LSTM were not as strong as expected, so we switched to the development of technical and fundamental signals, as captured in the data-preparation-no-memory notebook and the associated video.
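For readers new to LSTMs, here is a minimal PyTorch sketch of the kind of model experimented with: it consumes a window of price-history features and outputs a logit for the probability that the next-day return is positive. The layer sizes, window length and feature count are illustrative only, not the configuration tested in the signal-processing notebook.

```python
# Minimal sketch of a next-day direction-prediction LSTM (sizes are illustrative).
import torch
import torch.nn as nn

class DirectionLSTM(nn.Module):
    def __init__(self, n_features: int, hidden_size: int = 64, num_layers: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, num_layers, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden_size, 32),
            nn.ReLU(),
            nn.Linear(32, 1),          # logit for P(next-day return > 0)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, window_length, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])   # use only the final hidden state

# Example forward pass: a batch of 8 windows of 60 days x 5 features.
model = DirectionLSTM(n_features=5)
probs = torch.sigmoid(model(torch.randn(8, 60, 5)))
```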

Deep Reinforcement Learning (DRL)

The two critical inputs to the DRL model are the signals that define the environment and the reward function. The fundamental and technical signals defined the environment, and the 1-day log-returns defined the reward function. The drl-portfolio-optimization notebook and associated video capture this training and evaluation process.
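To make the environment/reward split concrete, here is a minimal, gym-style sketch of how these pieces could fit together: the state is the day's row of fundamental and technical signals, the action is a set of portfolio weights, and the reward is the 1-day log return net of trading costs. The array shapes and cost rate are placeholders, not the exact setup in the notebook.

```python
# Gym-style sketch of a portfolio environment with a log-return-minus-costs reward.
import numpy as np

class PortfolioEnv:
    def __init__(self, signals: np.ndarray, log_returns: np.ndarray, cost_rate: float = 0.001):
        self.signals = signals            # shape: (n_days, n_signals)
        self.log_returns = log_returns    # shape: (n_days, n_assets)
        self.cost_rate = cost_rate
        self.t = 0
        self.weights = np.zeros(log_returns.shape[1])

    def reset(self) -> np.ndarray:
        self.t = 0
        self.weights = np.zeros_like(self.weights)
        return self.signals[self.t]

    def step(self, action: np.ndarray):
        new_weights = action / (np.abs(action).sum() + 1e-8)   # normalize to a valid allocation
        turnover = np.abs(new_weights - self.weights).sum()
        reward = float(new_weights @ self.log_returns[self.t]) - self.cost_rate * turnover
        self.weights = new_weights
        self.t += 1
        done = self.t >= len(self.signals)
        next_state = self.signals[self.t] if not done else None
        return next_state, reward, done, {}
```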

Conclusions and Future Work

The intent of this project was to gain a better understanding of how machine learning could be used to perform portfolio optimization. Report 1 began this journey with a broad review of machine learning and its applications. It identified Deep Reinforcement Learning (DRL) as a promising area of research. Report 2 extended this research by performing a deep dive into DRL and proposing a development approach and DRL architecture.
This culminated in the 3rd term development and testing of a DRL process on AWS SageMaker.
As demonstrated in the training video, the DRL algorithm reliably beat the market (buy and hold). This is a good first step, but I believe it can be greatly improved with research into the following areas:

  1. Deep Learning Architectures
  2. Training Process (Continued in 4th term)
  3. Signal Generation
  4. Reward Function

Deep Learning Architectures

The LSTM tested in this project was one attempt to use deep learning to extract signals from the price history, effectively generating a new and hopefully superior form of technical analysis. Although we were not successful in extracting meaningful signals with an LSTM, a better and/or larger architecture may have been successful.
We also used a Proximal Policy Optimization (PPO) algorithm from the Intel AI Labs RL Coach. Although this is a well-respected algorithm, it is quite possible that a different algorithm, or even a PPO with different hyper-parameters, could have performed better.
Considering the intense research in this area from many fields outside finance, I believe this is an area best left to the experts in the field and should not be a focus of finance practitioners.

Training Process

As discussed in the training video, the standard train-test procedures for machine learning are not well suited for portfolio optimization. The fundamental issue is that, in portfolio optimization, the rules of the game keep evolving. An interesting analogy is a game-playing DRL model. For instance, when a DRL model is trained for chess, the rules never change. A model that could beat a grand master at chess would fail to beat a child at checkers because the rules changed. However, there is a great deal of research into models that learn new games, effectively adapting to a new set of rules. In the game-playing setting the rules change very abruptly and the model is given time to adapt.
In portfolio optimization the rules are constantly in flux, and the true measure of a model is how quickly it adapts.
This is equivalent to a trader who notices a change in the market and rapidly switches strategies. Training a model on 10 years of data and then running it for 2 years, as was proposed in report 2, is simply not realistic. The true test would be to train extensively on all historical data up to and including yesterday, then make your trades. Once the feedback from the environment is received, retrain and make more trades.
This is effectively the RL feedback loop but when testing the algorithm you only get one pass through the "timeline".
At time T you can use a prioritized replay of the price history from t = [0, T-1] to retrain as much as possible. However, the performance of your model can only be assessed at t = T. If assessed before T, there is data leakage because the model knows how the rules have changed. If assessed after T, the model will be unfairly penalized because it is missing information on [T, t], which it would have in real-world trading. To go back to the game-playing analogy, the model is only evaluated on its first game of checkers after mastering chess, and every day it has to play a new game.
I believe this is a very novel application of DRL and one worth further investigation.
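A minimal sketch of this walk-forward scheme is given below: at each day T the agent may retrain on everything up to T-1, is evaluated only on day T, and then day T is folded into the history for the next step. The `retrain` and `evaluate_day` callables are hypothetical placeholders for the actual training and trading code.

```python
# Sketch of the walk-forward evaluation loop described above.
# `retrain(agent, past)` and `evaluate_day(agent, today)` are supplied by the caller.
def walk_forward(agent, history, start_day, retrain, evaluate_day):
    realized_rewards = []
    for T in range(start_day, len(history)):
        retrain(agent, history[:T])                    # replay of t in [0, T-1], as much as time allows
        reward = evaluate_day(agent, history[T])       # a single out-of-sample day: no leakage, no penalty
        realized_rewards.append(reward)
    return realized_rewards
```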

Signal Generation

Signal generation is an area where financial experts can truly add value. All deep learning models live and die on the quantity and quality of the input data. Here we can bring the concepts of value investing, first outlined by Graham and Dodd in Security Analysis, into the model. In our highly connected world we can replicate, with more accuracy each year, an analyst laboriously reviewing financial statements to uncover the true value of a company. A simple example is the p/e ratio used in this model. As shown below, there is a massive inversion of the p/e ratio generated by a violent swing in the net income reported by Activision Blizzard, Inc. (ATVI) in the 4th quarter of 2009. An investigation into this swing may uncover a better way to represent the net income and the resulting p/e ratio.

[Figure: p/e ratio of Activision Blizzard (ATVI), showing the inversion driven by the Q4 2009 net income swing]
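One simple way to dampen a single-quarter swing like the one above would be to compute the p/e ratio on trailing-twelve-month (four-quarter) net income rather than a single quarter. The sketch below illustrates the idea; the column names are assumptions and this is not the representation used in this repo.

```python
# Sketch of a trailing-twelve-month p/e ratio (column names assumed).
import pandas as pd

def trailing_pe(quarterly: pd.DataFrame) -> pd.Series:
    """quarterly: DataFrame indexed by quarter-end with 'price' and 'net_income' columns."""
    ttm_earnings = quarterly["net_income"].rolling(4).sum()   # trailing 4 quarters of earnings
    return quarterly["price"] / ttm_earnings
```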

Reward Function

The reward function in this report was simply the log return of the portfolio before the next set of trades, minus trading costs. This is another area where the financial expert can add significant value by adding other positive or negative rewards. For instance, we may care more about the long-term reward, or want to add an extra negative reward for a large negative return.
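As one example of this kind of reward shaping, the sketch below starts from the net log return and adds an extra penalty when the one-day return is a large loss. The threshold and penalty weight are arbitrary illustrations, not tuned values from this project.

```python
# Sketch of a shaped reward: net log return plus an extra penalty for large losses.
def shaped_reward(log_return: float, trading_cost: float,
                  loss_threshold: float = -0.02, penalty_weight: float = 2.0) -> float:
    reward = log_return - trading_cost
    if log_return < loss_threshold:                      # penalize large single-day losses extra
        reward += penalty_weight * (log_return - loss_threshold)
    return reward
```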

License

This code is licensed under the MIT License.

Contributions

Please feel free to raise issues against this repo if you have any questions or suggestions for improvement.
