<a id='0'></a>
# Part 1. Problem Definition

The problem is to design an automated solution for single-stock trading. We model the stock trading process as a Markov Decision Process (MDP) and then formulate our trading goal as a maximization problem.
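
One common way to write such a maximization objective is as the expected discounted sum of rewards under a policy. The notebook does not state its exact objective, so the form and every symbol below are assumptions made for illustration: $\pi$ is the trading policy, $s_t$ and $a_t$ are the state and action at time step $t$, $r$ is the reward function, $\gamma \in (0, 1]$ is a discount factor, and $T$ is the trading horizon.

$$
\max_{\pi} \; \mathbb{E}_{\pi}\left[ \sum_{t=0}^{T} \gamma^{t} \, r(s_t, a_t, s_{t+1}) \right]
$$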

The trading agent is trained with Deep Reinforcement Learning (DRL) algorithms. The components of the reinforcement learning environment are:


* Action: 
* Reward function: 
* State: 
* Environment: 

<a id='1'></a>
# Part 2. Getting Started: Load Python Packages

<a id='1.1'></a>
## 2.1. Install all the packages

In [1]:
!pip install -r requirements.txt --user



<a id='1.2'></a>
## 2.2. Import Packages

In [2]:
from config import config
from dataset.download_dataset.cryptodownloader import CryptoDownloader

<a id='1.3'></a>
## 2.3. Create Folders

In [3]:
import os

# Download the data only if the data folder does not exist yet.
download_data = False
data_dir = os.path.join(".", config.DATA_SAVE_DIR)
if not os.path.exists(data_dir):
    os.makedirs(data_dir)
    download_data = True

# if not os.path.exists("./" + config.TRAINED_MODEL_DIR):
#     os.makedirs("./" + config.TRAINED_MODEL_DIR)
# if not os.path.exists("./" + config.TENSORBOARD_LOG_DIR):
#     os.makedirs("./" + config.TENSORBOARD_LOG_DIR)
# if not os.path.exists("./" + config.RESULTS_DIR):
#     os.makedirs("./" + config.RESULTS_DIR)

<a id='2'></a>
# Part 3. Download Data

In [4]:
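# Build the downloader for the configured date range.
data_downloader = CryptoDownloader(config.START_DATE, config.END_DATE, config.test)
# Fetch the data only if the data folder was just created above.
if download_data:
    data_downloader.download_data()
# Load the saved data.
df = data_downloader.load()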

In [5]:
df.head()

| idx1 | idx2 | open | close | high | low | volume | ticker | datetime |
|---|---|---|---|---|---|---|---|---|
| ada | 2021-05-01 00:01:00 | 1.35 | 1.3463 | 1.35 | 1.3463 | 2577.832092 | ada | 2021-05-01 00:01:00 |
| alg | 2021-05-01 00:01:00 | 1.3942 | 1.3976 | 1.3976 | 1.3942 | 158.42366 | alg | 2021-05-01 00:01:00 |
| amp | 2021-05-01 00:01:00 | 1.1148 | 1.1148 | 1.1148 | 1.1148 | 10.436055 | amp | 2021-05-01 00:01:00 |
| ato | 2021-05-01 00:01:00 | 22.623 | 22.636 | 22.636 | 22.622 | 180.148645 | ato | 2021-05-01 00:01:00 |
| ada | 2021-05-01 00:02:00 | 1.3528 | 1.3531 | 1.3531 | 1.3528 | 29.3 | ada | 2021-05-01 00:02:00 |


In [6]:
df.tail()

| idx1 | idx2 | open | close | high | low | volume | ticker | datetime |
|---|---|---|---|---|---|---|---|---|
| ada | 2021-05-01 23:55:00 | 1.3539 | 1.3559 | 1.3559 | 1.3539 | 22000.0 | ada | 2021-05-01 23:55:00 |
| alg | 2021-05-01 23:56:00 | 1.3936 | 1.3946 | 1.3946 | 1.3936 | 160.0 | alg | 2021-05-01 23:56:00 |
| amp | 2021-05-01 23:56:00 | 1.1411 | 1.1442 | 1.1442 | 1.1411 | 4.474008 | amp | 2021-05-01 23:56:00 |
| bal | 2021-05-01 23:58:00 | 63.517 | 63.517 | 63.517 | 63.517 | 143.514936 | bal | 2021-05-01 23:58:00 |
| bal | 2021-05-01 23:59:00 | 63.399 | 63.399 | 63.399 | 63.399 | 0.188671 | bal | 2021-05-01 23:59:00 |


# Part 4. Preprocess Data

<a id='4'></a>
# Part 5. Design Environment
Considering the stochastic and interactive nature of automated stock trading, we model the financial task as a **Markov Decision Process (MDP)**. Training consists of observing the change in stock price, taking an action, and calculating the reward so that the agent can adjust its strategy accordingly. By interacting with the environment, the trading agent derives a trading strategy that maximizes rewards over time.

Our trading environments, based on the OpenAI Gym framework, simulate live stock markets with real market data, following the principle of time-driven simulation.
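
To make the time-driven idea concrete, here is a minimal sketch of a single-asset environment that mirrors the classic Gym `reset`/`step` convention without depending on the `gym` package. It is not the environment used in this notebook: the class name `SingleAssetTradingEnv`, the state layout (price, shares held, cash), and the reward (change in portfolio value over one time step) are all assumptions chosen for illustration.

```python
class SingleAssetTradingEnv:
    """Toy time-driven trading environment (hypothetical, for illustration only)."""

    def __init__(self, prices, initial_cash=1000.0):
        # `prices` is a sequence of at least two prices, one per time step.
        self.prices = list(prices)
        self.initial_cash = initial_cash
        self.reset()

    def _state(self):
        # Assumed observation: (current price, shares held, cash balance).
        return (self.prices[self.t], self.shares, self.cash)

    def _portfolio_value(self):
        return self.cash + self.shares * self.prices[self.t]

    def reset(self):
        self.t = 0
        self.cash = self.initial_cash
        self.shares = 0
        return self._state()

    def step(self, action):
        """Apply `action`, a signed integer share count (buy > 0, sell < 0)."""
        price = self.prices[self.t]
        # Clip the trade: never sell more than we hold or buy more than cash allows.
        max_buy = int(self.cash // price)
        trade = max(-self.shares, min(action, max_buy))
        value_before = self._portfolio_value()
        self.cash -= trade * price
        self.shares += trade
        # Time-driven simulation: every step advances the clock by one bar.
        self.t += 1
        # Assumed reward: change in portfolio value caused by the price move.
        reward = self._portfolio_value() - value_before
        done = self.t == len(self.prices) - 1
        return self._state(), reward, done, {}


env = SingleAssetTradingEnv(prices=[10, 12, 11])
state = env.reset()
state, reward, done, info = env.step(5)  # buy 5 shares at 10; the price then moves to 12
```

The clock advances on every call to `step` regardless of whether a trade happens, which is what distinguishes a time-driven simulation from an event-driven one.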

The action space describes the actions through which the agent interacts with the environment. Normally, an action a takes one of three values: {-1, 0, 1}, where -1, 0, and 1 represent selling, holding, and buying one share. An action can also be carried out on multiple shares. We use an action space {-k, …, -1, 0, 1, …, k}, where a positive value is the number of shares to buy and a negative value is the number of shares to sell, up to k shares per trade. For example, "Buy 10 shares of AAPL" and "Sell 10 shares of AAPL" are 10 and -10, respectively. The continuous action space needs to be normalized to [-1, 1], since the policy is defined on a Gaussian distribution, which needs to be normalized and symmetric.
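
The mapping between the normalized range [-1, 1] and the share range {-k, …, k} can be sketched as follows. The helper name `action_to_shares` and the choice to clip and then round to the nearest integer are assumptions for illustration; the notebook does not specify how the conversion is done.

```python
def action_to_shares(action, k):
    """Map a normalized action in [-1, 1] to a signed share count in [-k, k].

    Hypothetical helper: values outside [-1, 1] (possible when sampling
    from a Gaussian policy) are clipped before scaling.
    """
    clipped = max(-1.0, min(1.0, action))
    return int(round(clipped * k))


k = 10  # assumed maximum number of shares per trade
buy_all = action_to_shares(1.0, k)    # full-size buy
sell_all = action_to_shares(-1.0, k)  # full-size sell
partial = action_to_shares(0.26, k)   # scaled and rounded to a whole share
```

Scaling by k after clipping keeps the policy output symmetric around zero, so "hold" stays at 0 and buy and sell sizes share the same limit.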

<a id='5'></a>
# Part 6. Implement DRL Algorithms

<a id='6'></a>
# Part 7. Backtest