<a id='0'></a>
# Part 1. Problem Definition

The goal is to design an automated trading solution for multiple cryptocurrencies. We model the crypto trading process as a Markov Decision Process (MDP) and formulate the trading goal as maximizing the value of the portfolio.

The agent is trained with Deep Reinforcement Learning (DRL) algorithms. The components of the reinforcement learning environment are:

* Action: Every hour we can rebalance the portfolio, deciding what percentage of the money to hold in each cryptocurrency considered and in dollars. (TODO: implement the dollar part.)
* Reward function: The difference in total portfolio value with respect to the previous hour. (TODO: check whether we want to modify this.)
* Environment: For every hour and every crypto we consider the following variables: <br>
    at the moment -> just the covariance and the technical indicators for the eight cryptos <br>
    to implement -> covariance for the day, past 2(?) months of data for all the cryptos <br>
* State: How much money we have in the portfolio.
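
The reward described in the list above can be sketched in a few lines. This is only an illustration, assuming the action is a weight vector over dollars plus each crypto and ignoring transaction costs; the helper name `step_portfolio` and the prices are hypothetical and are not taken from the project's environment code.

```python
import numpy as np

def step_portfolio(value, weights, prices_now, prices_next):
    """Return (new_value, reward) after holding `weights` for one hour.

    weights[0] is the dollar share; weights[1:] are the crypto shares.
    Hypothetical helper: transaction costs are ignored.
    """
    weights = np.asarray(weights, dtype=float)
    # Dollars have a gross return of 1; each crypto returns next / now.
    crypto_gross = np.asarray(prices_next, dtype=float) / np.asarray(prices_now, dtype=float)
    gross = np.concatenate(([1.0], crypto_gross))
    new_value = value * float(weights @ gross)
    # Reward: change in total portfolio value with respect to the previous hour.
    reward = new_value - value
    return new_value, reward

# Half in dollars, a quarter in each of two cryptos; the first crypto gains 10%.
new_value, reward = step_portfolio(
    value=100_000.0,
    weights=[0.5, 0.25, 0.25],
    prices_now=[100.0, 10.0],
    prices_next=[110.0, 10.0],
)
```

Because the reward is a raw dollar difference, its scale grows with the portfolio value, which is one reason a scaling factor such as `reward_scaling` is commonly applied.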

<a id='1'></a>
# Part 2. Getting Started: Load Python Packages

<a id='1.1'></a>
## 2.1. Install all the packages

In [9]:
!pip install -r requirements.txt --user



<a id='1.2'></a>
## 2.2. Import Packages

In [1]:
import pandas as pd
from config import config
from dataset.download_dataset.cryptodownloader_binance import CryptoDownloader_binance
from preprocessing.preprocessors import FeatureEngineer
from preprocessing.preprocessors import add_covariance
from preprocessing.data import data_split
from env.env_portfolio import StockPortfolioEnv
from env.env_portfolio import StockPortfolioEnv2
from model.models import DRLAgent
# from trade.backtest import backtest_stats, backtest_plot, get_daily_return, get_baseline,convert_daily_return_to_pyfolio_ts


<a id='1.3'></a>
## 2.3 Create Folders

In [2]:
import os

# Download the data only when the data directory does not exist yet.
download_data = not os.path.exists(config.DATA_SAVE_DIR)

# Create every output directory that is still missing.
for directory in (config.DATA_SAVE_DIR, config.TRAINED_MODEL_DIR,
                  config.TENSORBOARD_LOG_DIR, config.RESULTS_DIR):
    os.makedirs(directory, exist_ok=True)

<a id='2'></a>
# Part 3. Download Data

In [3]:
data_downloader = CryptoDownloader_binance(config.START_DATE, config.END_DATE, config.MULTIPLE_TICKER_8, config.DATA_SAVE_DIR, config.DATA_GRANULARITY)
if download_data:    
    data_downloader.download_data()
df = data_downloader.load()

In [4]:
df

Unnamed: 0,date,open,high,low,close,volume,tic
0,2020-01-01 00:00:00,7195.24,7196.25,7175.46,7177.02,511.814901,btc
1,2020-01-01 01:00:00,7176.47,7230.00,7175.71,7216.27,883.052603,btc
2,2020-01-01 02:00:00,7215.52,7244.87,7211.41,7242.85,655.156809,btc
3,2020-01-01 03:00:00,7242.66,7245.00,7220.00,7225.01,783.724867,btc
4,2020-01-01 04:00:00,7225.00,7230.00,7215.03,7217.27,467.812578,btc
...,...,...,...,...,...,...,...
87547,2021-03-31 19:00:00,193.98,195.25,192.35,193.05,38115.533510,ltc
87548,2021-03-31 20:00:00,193.03,196.39,192.48,195.66,32785.188130,ltc
87549,2021-03-31 21:00:00,195.65,196.90,194.63,195.29,20567.010810,ltc
87550,2021-03-31 22:00:00,195.31,197.00,195.23,195.80,8967.744990,ltc


# Part 4: Preprocess Data

We have added 8 of the most important technical indicators. 

In [None]:
fe = FeatureEngineer(
    use_technical_indicator=True,
    use_turbulence=False,
    user_defined_feature=False,
)

df = fe.preprocess_data(df)

In [None]:
df

## Add covariance matrix as states

For a given hour, we consider the previous six months and compute the covariance matrix of the close values of the eight cryptocurrencies. This quantity is particularly interesting in the "crypto world", where all the altcoins (the name for every cryptocurrency other than bitcoin) are strongly correlated with the behaviour of bitcoin.

In [None]:
# Add a column holding the covariance matrix computed over the specified lookback period.
# The rows are ordered by hour and then by crypto.
# The lookback covers roughly six months of hourly rows: 24 hours * 30 days * 6 months.
df = add_covariance(df, lookback=24 * 30 * 6)
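
The covariance step above can be illustrated with a small stand-alone sketch. It is not the project's `add_covariance`: the function name `rolling_covariances` is hypothetical, and computing the covariance on hourly percentage returns rather than on raw close prices is an assumption made here for illustration.

```python
import pandas as pd

def rolling_covariances(df, lookback):
    """Map each timestamp to the covariance matrix of the previous `lookback` hourly returns.

    Hypothetical helper; expects the columns `date`, `tic` and `close`.
    """
    # One column per ticker, one row per hour.
    prices = df.pivot_table(index="date", columns="tic", values="close")
    # Assumption: covariance is taken over percentage returns, not raw prices.
    returns = prices.pct_change().dropna()
    covs = {}
    for i in range(lookback, len(returns)):
        window = returns.iloc[i - lookback:i]
        covs[returns.index[i]] = window.cov().values
    return covs

# Tiny synthetic example: two tickers, six hours, 3-hour lookback.
dates = pd.date_range("2020-01-01", periods=6, freq="60min")
demo = pd.DataFrame({
    "date": list(dates) * 2,
    "tic": ["btc"] * 6 + ["eth"] * 6,
    "close": [100, 101, 103, 102, 105, 107, 10, 10.5, 10.2, 10.8, 11, 10.9],
})
covs = rolling_covariances(demo, lookback=3)
```

Each value in `covs` is a square matrix with one row and column per ticker. With the notebook's lookback of `24 * 30 * 6`, the first matrix only becomes available after about six months of data, and a plain Python loop like this one would be slow on the full dataset.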

In [None]:
df

<a id='4'></a>
# Part 5. Design Environment

Considering the stochastic and interactive nature of cryptocurrency trading, we model the financial task as a **Markov Decision Process (MDP)**. The training process involves observing cryptocurrency price changes, taking an action, and computing the reward so that the agent adjusts its strategy accordingly. By interacting with the environment, the trading agent derives a trading strategy that maximizes rewards over time.

Our trading environments, based on the OpenAI Gym framework, simulate live crypto markets with real data according to the principle of time-driven simulation.

The action space has dimension equal to the number of cryptocurrencies plus 1, the extra entry being the money kept in dollars. Each value lies between 0 and 1 and the values sum to one, so an action represents the percentage distribution of the portfolio across the different cryptos and dollars.
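
One common way to turn unconstrained policy outputs into weights that lie in [0, 1] and sum to one is a softmax. The sketch below assumes this choice for illustration only; the normalization actually used inside `StockPortfolioEnv` is not shown in this notebook, and `softmax_weights` is a hypothetical name.

```python
import numpy as np

def softmax_weights(raw_actions):
    """Map raw action values to non-negative portfolio weights that sum to one."""
    raw = np.asarray(raw_actions, dtype=float)
    # Subtract the maximum for numerical stability; the result is unchanged.
    exp = np.exp(raw - raw.max())
    return exp / exp.sum()

n_cryptos = 8
# One entry per crypto plus one for dollars; identical raw values give equal weights.
raw = np.zeros(n_cryptos + 1)
weights = softmax_weights(raw)
```

With identical raw values every asset, including dollars, receives the same share, which is a natural starting allocation before the agent has learned any preference.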

## Training data 

For training data we use the last six months of 2020.

In [2]:
train = data_split(df, '2020-07-01','2020-12-31')

NameError: name 'df' is not defined

In [2]:
import pickle
# pickle.dump(train, open("train", "wb"))  # run once to cache the training split
with open("train", "rb") as f:  # load the cached training split
    train = pickle.load(f)

## Environment for Portfolio Allocation


In [3]:
stock_dimension = len(train.tic.unique())
state_space = stock_dimension
print(f"Stock Dimension: {stock_dimension}, State Space: {state_space}")

env_kwargs = {
    "initial_amount": 100000, 
    "transaction_cost_pct": 0.001, 
    "state_space": state_space, 
    "stock_dim": stock_dimension, 
    "tech_indicator_list": config.TECHNICAL_INDICATORS_LIST, 
    "action_space": stock_dimension, 
    "reward_scaling": 1e-4
    
}

e_train_gym = StockPortfolioEnv(df = train, **env_kwargs)

env_train, _ = e_train_gym.get_sb_env()

Stock Dimension: 8, State Space: 8


<a id='5'></a>
# Part 6: Implement DRL Algorithms

In [13]:
# initialize
agent = DRLAgent(env = env_train)

### Model 1: **A2C**


In [14]:
A2C_PARAMS = {"n_steps": 5, "ent_coef": 0.005, "learning_rate": 0.0002}
model_a2c = agent.get_model(model_name="a2c",model_kwargs = A2C_PARAMS)

{'n_steps': 5, 'ent_coef': 0.005, 'learning_rate': 0.0002}
Using cpu device


In [15]:
trained_a2c = agent.train_model(model=model_a2c, 
                                tb_log_name='a2c',
                                total_timesteps=60000)

Logging to ./tensorboard_log/a2c\a2c_2
------------------------------------
| time/                 |          |
|    fps                | 212      |
|    iterations         | 100      |
|    time_elapsed       | 2        |
|    total_timesteps    | 500      |
| train/                |          |
|    entropy_loss       | -11.3    |
|    explained_variance | 0        |
|    learning_rate      | 0.0002   |
|    n_updates          | 99       |
|    policy_loss        | 4.43e+06 |
|    std                | 0.995    |
|    value_loss         | 1.85e+11 |
------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 308       |
|    iterations         | 200       |
|    time_elapsed       | 3         |
|    total_timesteps    | 1000      |
| train/                |           |
|    entropy_loss       | -11.3     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0002    |
|    n_updates          | 

------------------------------------
| time/                 |          |
|    fps                | 498      |
|    iterations         | 1600     |
|    time_elapsed       | 16       |
|    total_timesteps    | 8000     |
| train/                |          |
|    entropy_loss       | -11.2    |
|    explained_variance | 1.79e-07 |
|    learning_rate      | 0.0002   |
|    n_updates          | 1599     |
|    policy_loss        | 6.24e+06 |
|    std                | 0.985    |
|    value_loss         | 4.69e+11 |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 501      |
|    iterations         | 1700     |
|    time_elapsed       | 16       |
|    total_timesteps    | 8500     |
| train/                |          |
|    entropy_loss       | -11.2    |
|    explained_variance | 5.96e-08 |
|    learning_rate      | 0.0002   |
|    n_updates          | 1699     |
|    policy_loss        | 8.65e+06 |
|

------------------------------------
| time/                 |          |
|    fps                | 517      |
|    iterations         | 3000     |
|    time_elapsed       | 28       |
|    total_timesteps    | 15000    |
| train/                |          |
|    entropy_loss       | -11.2    |
|    explained_variance | 0        |
|    learning_rate      | 0.0002   |
|    n_updates          | 2999     |
|    policy_loss        | 5.52e+06 |
|    std                | 0.979    |
|    value_loss         | 2.55e+11 |
------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 518       |
|    iterations         | 3100      |
|    time_elapsed       | 29        |
|    total_timesteps    | 15500     |
| train/                |           |
|    entropy_loss       | -11.2     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0002    |
|    n_updates          | 3099      |
|    policy_loss        | 6

------------------------------------
| time/                 |          |
|    fps                | 522      |
|    iterations         | 4500     |
|    time_elapsed       | 43       |
|    total_timesteps    | 22500    |
| train/                |          |
|    entropy_loss       | -11.1    |
|    explained_variance | 5.96e-08 |
|    learning_rate      | 0.0002   |
|    n_updates          | 4499     |
|    policy_loss        | 4.21e+06 |
|    std                | 0.972    |
|    value_loss         | 1.75e+11 |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 522      |
|    iterations         | 4600     |
|    time_elapsed       | 44       |
|    total_timesteps    | 23000    |
| train/                |          |
|    entropy_loss       | -11.1    |
|    explained_variance | 0        |
|    learning_rate      | 0.0002   |
|    n_updates          | 4599     |
|    policy_loss        | 4.89e+06 |
|

------------------------------------
| time/                 |          |
|    fps                | 528      |
|    iterations         | 6000     |
|    time_elapsed       | 56       |
|    total_timesteps    | 30000    |
| train/                |          |
|    entropy_loss       | -11      |
|    explained_variance | 0        |
|    learning_rate      | 0.0002   |
|    n_updates          | 5999     |
|    policy_loss        | 8.12e+06 |
|    std                | 0.962    |
|    value_loss         | 7.44e+11 |
------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 528       |
|    iterations         | 6100      |
|    time_elapsed       | 57        |
|    total_timesteps    | 30500     |
| train/                |           |
|    entropy_loss       | -11       |
|    explained_variance | -2.38e-07 |
|    learning_rate      | 0.0002    |
|    n_updates          | 6099      |
|    policy_loss        | 1

------------------------------------
| time/                 |          |
|    fps                | 525      |
|    iterations         | 7400     |
|    time_elapsed       | 70       |
|    total_timesteps    | 37000    |
| train/                |          |
|    entropy_loss       | -11      |
|    explained_variance | 5.96e-08 |
|    learning_rate      | 0.0002   |
|    n_updates          | 7399     |
|    policy_loss        | 5.19e+06 |
|    std                | 0.955    |
|    value_loss         | 2.61e+11 |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 525      |
|    iterations         | 7500     |
|    time_elapsed       | 71       |
|    total_timesteps    | 37500    |
| train/                |          |
|    entropy_loss       | -11      |
|    explained_variance | 1.19e-07 |
|    learning_rate      | 0.0002   |
|    n_updates          | 7499     |
|    policy_loss        | 4.39e+06 |
|

------------------------------------
| time/                 |          |
|    fps                | 524      |
|    iterations         | 8900     |
|    time_elapsed       | 84       |
|    total_timesteps    | 44500    |
| train/                |          |
|    entropy_loss       | -10.9    |
|    explained_variance | 0        |
|    learning_rate      | 0.0002   |
|    n_updates          | 8899     |
|    policy_loss        | 4.27e+06 |
|    std                | 0.94     |
|    value_loss         | 1.75e+11 |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 524      |
|    iterations         | 9000     |
|    time_elapsed       | 85       |
|    total_timesteps    | 45000    |
| train/                |          |
|    entropy_loss       | -10.9    |
|    explained_variance | 0        |
|    learning_rate      | 0.0002   |
|    n_updates          | 8999     |
|    policy_loss        | 5.41e+06 |
|

------------------------------------
| time/                 |          |
|    fps                | 524      |
|    iterations         | 10400    |
|    time_elapsed       | 99       |
|    total_timesteps    | 52000    |
| train/                |          |
|    entropy_loss       | -10.8    |
|    explained_variance | 0        |
|    learning_rate      | 0.0002   |
|    n_updates          | 10399    |
|    policy_loss        | 1.02e+07 |
|    std                | 0.931    |
|    value_loss         | 8.61e+11 |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 524      |
|    iterations         | 10500    |
|    time_elapsed       | 100      |
|    total_timesteps    | 52500    |
| train/                |          |
|    entropy_loss       | -10.8    |
|    explained_variance | 0        |
|    learning_rate      | 0.0002   |
|    n_updates          | 10499    |
|    policy_loss        | 7.75e+06 |
|

------------------------------------
| time/                 |          |
|    fps                | 521      |
|    iterations         | 11800    |
|    time_elapsed       | 113      |
|    total_timesteps    | 59000    |
| train/                |          |
|    entropy_loss       | -10.7    |
|    explained_variance | 1.79e-07 |
|    learning_rate      | 0.0002   |
|    n_updates          | 11799    |
|    policy_loss        | 4.73e+06 |
|    std                | 0.92     |
|    value_loss         | 2.69e+11 |
------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 521       |
|    iterations         | 11900     |
|    time_elapsed       | 114       |
|    total_timesteps    | 59500     |
| train/                |           |
|    entropy_loss       | -10.7     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0002    |
|    n_updates          | 11899     |
|    policy_loss        | 5

### Test new environment

In [3]:
stock_dimension = len(train.tic.unique())
state_space = stock_dimension
print(f"Stock Dimension: {stock_dimension}, State Space: {state_space}")

env_kwargs = {
    "initial_amount": 100000, 
    "transaction_cost_pct": 0.001, 
    "state_space": state_space, 
    "stock_dim": stock_dimension, 
    "tech_indicator_list": config.TECHNICAL_INDICATORS_LIST, 
    "action_space": stock_dimension, 
    "reward_scaling": 1e-4
    
}

e_train_gym = StockPortfolioEnv2(df = train, **env_kwargs)

env_train, _ = e_train_gym.get_sb_env()
# initialize
agent = DRLAgent(env = env_train)
A2C_PARAMS = {"n_steps": 5, "ent_coef": 0.005, "learning_rate": 0.0002}
model_a2c = agent.get_model(model_name="a2c",model_kwargs = A2C_PARAMS)
trained_a2c = agent.train_model(model=model_a2c, 
                                tb_log_name='a2c',
                                total_timesteps=60000)

Stock Dimension: 8, State Space: 8
{'n_steps': 5, 'ent_coef': 0.005, 'learning_rate': 0.0002}
Using cpu device
Logging to ./tensorboard_log/a2c\a2c_10
------------------------------------
| time/                 |          |
|    fps                | 211      |
|    iterations         | 100      |
|    time_elapsed       | 2        |
|    total_timesteps    | 500      |
| train/                |          |
|    entropy_loss       | -12.8    |
|    explained_variance | 1.37e-06 |
|    learning_rate      | 0.0002   |
|    n_updates          | 99       |
|    policy_loss        | 4.3e+06  |
|    std                | 0.999    |
|    value_loss         | 1.26e+11 |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 300      |
|    iterations         | 200      |
|    time_elapsed       | 3        |
|    total_timesteps    | 1000     |
| train/                |          |
|    entropy_loss       | -12.7    

------------------------------------
| time/                 |          |
|    fps                | 457      |
|    iterations         | 1500     |
|    time_elapsed       | 16       |
|    total_timesteps    | 7500     |
| train/                |          |
|    entropy_loss       | -12.7    |
|    explained_variance | 0        |
|    learning_rate      | 0.0002   |
|    n_updates          | 1499     |
|    policy_loss        | 3.7e+06  |
|    std                | 0.989    |
|    value_loss         | 8.66e+10 |
------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 459      |
|    iterations         | 1600     |
|    time_elapsed       | 17       |
|    total_timesteps    | 8000     |
| train/                |          |
|    entropy_loss       | -12.7    |
|    explained_variance | 2.38e-07 |
|    learning_rate      | 0.0002   |
|    n_updates          | 1599     |
|    policy_loss        | 4.02e+06 |
|

------------------------------------
| time/                 |          |
|    fps                | 469      |
|    iterations         | 2900     |
|    time_elapsed       | 30       |
|    total_timesteps    | 14500    |
| train/                |          |
|    entropy_loss       | -12.6    |
|    explained_variance | 0        |
|    learning_rate      | 0.0002   |
|    n_updates          | 2899     |
|    policy_loss        | 4.65e+06 |
|    std                | 0.979    |
|    value_loss         | 1.45e+11 |
------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 469       |
|    iterations         | 3000      |
|    time_elapsed       | 31        |
|    total_timesteps    | 15000     |
| train/                |           |
|    entropy_loss       | -12.6     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0002    |
|    n_updates          | 2999      |
|    policy_loss        | 3

begin_total_asset:100000
end_total_asset:96813.46812683782
end portfolio distribution:[0.20150247 0.11897066 0.07412862 0.07412862 0.07412862 0.07412862
 0.11686609 0.18325597 0.08289036]
Sharpe:  -0.0150728079231386
-------------------------------------
| time/                 |           |
|    fps                | 473       |
|    iterations         | 4400      |
|    time_elapsed       | 46        |
|    total_timesteps    | 22000     |
| train/                |           |
|    entropy_loss       | -12.5     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0002    |
|    n_updates          | 4399      |
|    policy_loss        | 3.52e+06  |
|    std                | 0.971     |
|    value_loss         | 1.11e+11  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 474       |
|    iterations         | 4500      |
|    time_elapsed       | 47        |
|    total_timesteps   

-------------------------------------
| time/                 |           |
|    fps                | 475       |
|    iterations         | 5800      |
|    time_elapsed       | 60        |
|    total_timesteps    | 29000     |
| train/                |           |
|    entropy_loss       | -12.5     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0002    |
|    n_updates          | 5799      |
|    policy_loss        | 3.28e+06  |
|    std                | 0.967     |
|    value_loss         | 8.47e+10  |
-------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 475      |
|    iterations         | 5900     |
|    time_elapsed       | 61       |
|    total_timesteps    | 29500    |
| train/                |          |
|    entropy_loss       | -12.5    |
|    explained_variance | 0        |
|    learning_rate      | 0.0002   |
|    n_updates          | 5899     |
|    policy_loss       

-------------------------------------
| time/                 |           |
|    fps                | 473       |
|    iterations         | 7200      |
|    time_elapsed       | 75        |
|    total_timesteps    | 36000     |
| train/                |           |
|    entropy_loss       | -12.4     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0002    |
|    n_updates          | 7199      |
|    policy_loss        | 4.93e+06  |
|    std                | 0.957     |
|    value_loss         | 2.01e+11  |
-------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 474      |
|    iterations         | 7300     |
|    time_elapsed       | 76       |
|    total_timesteps    | 36500    |
| train/                |          |
|    entropy_loss       | -12.4    |
|    explained_variance | 0        |
|    learning_rate      | 0.0002   |
|    n_updates          | 7299     |
|    policy_loss       

------------------------------------
| time/                 |          |
|    fps                | 472      |
|    iterations         | 8700     |
|    time_elapsed       | 92       |
|    total_timesteps    | 43500    |
| train/                |          |
|    entropy_loss       | -12.3    |
|    explained_variance | 5.96e-08 |
|    learning_rate      | 0.0002   |
|    n_updates          | 8699     |
|    policy_loss        | 3.04e+06 |
|    std                | 0.944    |
|    value_loss         | 8.31e+10 |
------------------------------------
begin_total_asset:100000
end_total_asset:95424.47168652649
end portfolio distribution:[0.06545775 0.06545775 0.13891335 0.17793262 0.17793262 0.06545775
 0.06545775 0.17793262 0.06545775]
Sharpe:  -0.021495029912909556
-------------------------------------
| time/                 |           |
|    fps                | 471       |
|    iterations         | 8800      |
|    time_elapsed       | 93        |
|    total_timesteps    | 44000     

------------------------------------
| time/                 |          |
|    fps                | 470      |
|    iterations         | 10100    |
|    time_elapsed       | 107      |
|    total_timesteps    | 50500    |
| train/                |          |
|    entropy_loss       | -12.2    |
|    explained_variance | 0        |
|    learning_rate      | 0.0002   |
|    n_updates          | 10099    |
|    policy_loss        | 3.71e+06 |
|    std                | 0.935    |
|    value_loss         | 9.88e+10 |
------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 470       |
|    iterations         | 10200     |
|    time_elapsed       | 108       |
|    total_timesteps    | 51000     |
| train/                |           |
|    entropy_loss       | -12.2     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0002    |
|    n_updates          | 10199     |
|    policy_loss        | 3

-------------------------------------
| time/                 |           |
|    fps                | 469       |
|    iterations         | 11500     |
|    time_elapsed       | 122       |
|    total_timesteps    | 57500     |
| train/                |           |
|    entropy_loss       | -12.1     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0002    |
|    n_updates          | 11499     |
|    policy_loss        | 3.92e+06  |
|    std                | 0.928     |
|    value_loss         | 1.3e+11   |
-------------------------------------
------------------------------------
| time/                 |          |
|    fps                | 469      |
|    iterations         | 11600    |
|    time_elapsed       | 123      |
|    total_timesteps    | 58000    |
| train/                |          |
|    entropy_loss       | -12.1    |
|    explained_variance | 0        |
|    learning_rate      | 0.0002   |
|    n_updates          | 11599    |
|    policy_loss       

### Model 2: **PPO**

In [98]:
# agent = DRLAgent(env = env_train)
# PPO_PARAMS = {
#     "n_steps": 2048,
#     "ent_coef": 0.005,
#     "learning_rate": 0.0001,
#     "batch_size": 128,
# }
# model_ppo = agent.get_model("ppo",model_kwargs = PPO_PARAMS)

In [99]:
# trained_ppo = agent.train_model(model=model_ppo, 
#                              tb_log_name='ppo',
#                              total_timesteps=80000)

## Trading
Assume that we have $1,000,000 of initial capital on 2021-01-01.

In [110]:
trade = data_split(df,'2021-01-01', config.END_DATE)
e_trade_gym = StockPortfolioEnv(df = trade, **env_kwargs)

In [111]:
trade.shape

(17280, 16)

In [102]:
trade

Unnamed: 0,date,open,high,low,close,volume,tic,macd,boll_ub,boll_lb,rsi_30,cci_30,dx_30,close_30_sma,close_60_sma,cov_list
0,2021-01-01 00:00:00,0.181340,0.181460,0.178310,0.180510,1.919492e+07,ada,-0.000153,0.184657,0.176516,50.750459,-27.165052,6.100618,0.180782,0.182191,"[[0.00023491493831790994, 7.732551281244585e-0..."
0,2021-01-01 00:00:00,37.359600,37.442300,36.963600,37.376400,9.511383e+04,bnb,-0.083237,37.946399,36.719461,51.231566,-33.515795,11.715168,37.421230,37.590240,"[[0.00023491493831790994, 7.732551281244585e-0..."
0,2021-01-01 00:00:00,28923.630000,29031.340000,28690.170000,28995.130000,2.311811e+03,btc,133.938215,29333.932277,28372.364723,57.001762,31.346237,5.617701,28857.221667,28217.099500,"[[0.00023491493831790994, 7.732551281244585e-0..."
0,2021-01-01 00:00:00,0.004672,0.004701,0.004601,0.004679,2.768207e+07,doge,0.000018,0.004729,0.004558,53.910140,48.169530,6.372370,0.004639,0.004574,"[[0.00023491493831790994, 7.732551281244585e-0..."
0,2021-01-01 00:00:00,736.420000,739.000000,729.330000,734.070000,2.793270e+04,eth,-0.496311,752.068407,727.317593,50.347614,-92.128309,16.290326,741.657333,735.789500,"[[0.00023491493831790994, 7.732551281244585e-0..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2159,2021-03-31 23:00:00,0.053143,0.053911,0.053000,0.053770,4.448111e+07,doge,-0.000103,0.054076,0.052762,50.006619,-10.311768,28.977404,0.053646,0.053938,"[[0.0001184825404164931, 7.097453732794227e-05..."
2159,2021-03-31 23:00:00,1903.970000,1924.210000,1901.620000,1919.370000,2.122478e+04,eth,26.214167,1948.647647,1762.109353,64.618395,164.610885,23.659415,1851.563000,1833.205500,"[[0.0001184825404164931, 7.097453732794227e-05..."
2159,2021-03-31 23:00:00,28.571700,29.440100,28.550100,29.416700,4.557268e+05,link,0.276832,29.035767,26.359743,64.396467,226.662894,36.167462,27.746403,27.922328,"[[0.0001184825404164931, 7.097453732794227e-05..."
2159,2021-03-31 23:00:00,194.620000,197.630000,194.470000,196.700000,2.863109e+04,ltc,0.332601,198.176985,188.501015,55.327678,76.382622,7.269085,194.139000,194.302667,"[[0.0001184825404164931, 7.097453732794227e-05..."


In [112]:
df_daily_return, df_actions = DRLAgent.DRL_prediction(model=trained_a2c,
                        environment = e_trade_gym)

begin_total_asset:1000000
end_total_asset:3955745.7869483377
Sharpe:  0.8952605536069735
hit end!


In [104]:
df_daily_return

Unnamed: 0,date,daily_return
0,2021-01-01 00:00:00,0.000000
1,2021-01-01 01:00:00,0.022631
2,2021-01-01 02:00:00,0.003002
3,2021-01-01 03:00:00,0.006520
4,2021-01-01 04:00:00,0.001370
...,...,...
2155,2021-03-31 19:00:00,-0.005520
2156,2021-03-31 20:00:00,0.011206
2157,2021-03-31 21:00:00,-0.000456
2158,2021-03-31 22:00:00,0.001301


In [105]:
df_actions

date,ada,bnb,btc,doge,eth,link,ltc,xrp
2021-01-01 00:00:00,0.125000,0.125000,0.125000,0.125000,0.125000,0.125000,0.125000,0.125000
2021-01-01 01:00:00,0.066554,0.076688,0.180913,0.066554,0.066554,0.180913,0.180913,0.180913
2021-01-01 02:00:00,0.077391,0.130851,0.077391,0.077391,0.198717,0.121681,0.175248,0.141330
2021-01-01 03:00:00,0.099662,0.099662,0.201063,0.139609,0.161020,0.099662,0.099662,0.099662
2021-01-01 04:00:00,0.181079,0.104423,0.104423,0.145047,0.104423,0.104423,0.124155,0.132029
...,...,...,...,...,...,...,...,...
2021-03-31 19:00:00,0.162344,0.116857,0.070699,0.069601,0.189195,0.189195,0.077495,0.124615
2021-03-31 20:00:00,0.219924,0.080906,0.080906,0.080906,0.080906,0.219063,0.156485,0.080906
2021-03-31 21:00:00,0.194661,0.194661,0.071612,0.071612,0.071612,0.071612,0.129571,0.194661
2021-03-31 22:00:00,0.066529,0.093833,0.164047,0.180844,0.180844,0.180844,0.066529,0.066529


In [113]:
df_actions.to_csv('df_actions.csv')

<a id='6'></a>
# Part 7: Backtest

<a id='6.1'></a>
## 7.1 BackTestStats
Pass in df_account_value; this information is stored in the env class.


In [107]:
# from pyfolio import timeseries
# DRL_strat = convert_daily_return_to_pyfolio_ts(df_daily_return)
# perf_func = timeseries.perf_stats 
# perf_stats_all = perf_func( returns=DRL_strat, 
#                               factor_returns=DRL_strat, 
#                                 positions=None, transactions=None, turnover_denom="AGB")

In [108]:
# print("==============DRL Strategy Stats===========")
# perf_stats_all
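
While the pyfolio cells above are commented out, basic statistics can be computed directly from a return series. The following is a minimal sketch, not the project's backtest code: `summarize_returns` is a hypothetical helper, and the annualization factor assumes hourly rows and a market that trades around the clock (note that the `daily_return` column in this notebook actually holds hourly returns).

```python
import numpy as np
import pandas as pd

HOURS_PER_YEAR = 24 * 365  # assumption: hourly rows, market open all year

def summarize_returns(returns, periods_per_year=HOURS_PER_YEAR):
    """Compute cumulative return, annualized Sharpe ratio and maximum drawdown."""
    returns = pd.Series(returns, dtype=float)
    # Equity curve of one unit of capital compounded over the periods.
    equity = (1.0 + returns).cumprod()
    std = returns.std()
    sharpe = np.sqrt(periods_per_year) * returns.mean() / std if std > 0 else float("nan")
    # Drawdown: relative distance from the running maximum of the equity curve.
    drawdown = equity / equity.cummax() - 1.0
    return {
        "cumulative_return": float(equity.iloc[-1] - 1.0),
        "annualized_sharpe": float(sharpe),
        "max_drawdown": float(drawdown.min()),
    }

# First five rows of the return table shown earlier in this notebook.
stats = summarize_returns([0.0, 0.022631, 0.003002, 0.006520, 0.001370])
```

On the full series one would call `summarize_returns(df_daily_return["daily_return"])`. The Sharpe ratio here uses a zero risk-free rate, so it may not match the value printed by the environment.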

<a id='6.2'></a>
## 7.2 BackTestPlot

In [109]:
# print("==============Compare to IHSG===========")
# %matplotlib inline
# BackTestPlot(df_account_value, 
#              baseline_ticker = '^JKSE', 
#              baseline_start = df_account_value.loc[0,'date'],
#              baseline_end = df_account_value.loc[len(df_account_value)-1,'date'])