# Classical RL for Portfolio Management on Foreign Exchange
This tutorial demonstrates how to use classical reinforcement learning for portfolio management on foreign exchange data.
## Set Up the Experiment Environment

In [1]:
from IPython.display import clear_output
import argparse
import sys
import os
import numpy as np
import pandas as pd
import torch
from torch import nn
import yaml

# Make the TradeMaster root directory (one level up) importable
module_path = os.path.abspath(os.path.join('..'))
sys.path.append(module_path)

# Install the project requirements with the pip of the running kernel
requirements_path = os.path.join(module_path, "requirements.txt")
print(requirements_path)
os.system(f'"{sys.executable}" -m pip install -r "{requirements_path}"')
clear_output(wait=True)

# Install PyTorch with CUDA 11.3; -y skips the interactive confirmation prompt
! conda install -y pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
clear_output(wait=True)


## Download and Preprocess the Data
The `Dataconfig` component of TradeMaster is shared by all other components. Note that for algorithmic trading, only the BTC dataset is supported.

The following code downloads the data into the folder [./data/data/exchange](https://github.com/qinmoelei/TradeMaster_reframe/tree/master/tutorial/data/data/exchange), where four files can be found: the full dataset and the train, validation and test splits, which are later used to construct the RL environment for the agent.
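The `split_proportion` argument in the next cell defaults to `[0.8, 0.1, 0.1]`. How `Dataconfig` applies it is not shown here. A minimal sketch of a chronological split could look like the block below, assuming the data is in long format (one row per date and ticker, as in the preview further down) and that the cut should fall on date boundaries so that every ticker of a given day lands in the same subset. `split_by_date` is a hypothetical helper, not part of TradeMaster, and the toy frame is made up.

```python
import pandas as pd


def split_by_date(df: pd.DataFrame, proportions=(0.8, 0.1, 0.1)):
    """Hypothetical helper: split a long-format frame chronologically by date."""
    if len(proportions) != 3 or abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("expected three proportions that sum to 1")
    # ISO-formatted date strings sort chronologically
    dates = sorted(df["date"].unique())
    n = len(dates)
    train_end = round(n * proportions[0])
    valid_end = train_end + round(n * proportions[1])
    # Every row of a given date goes to exactly one subset (no look-ahead leakage)
    train = df[df["date"].isin(dates[:train_end])]
    valid = df[df["date"].isin(dates[train_end:valid_end])]
    test = df[df["date"].isin(dates[valid_end:])]
    return train, valid, test


# Toy example: 10 trading days x 2 currencies (made-up values)
days = [f"2000-01-{d:02d}" for d in range(1, 11)]
toy = pd.DataFrame({
    "date": [d for d in days for _ in range(2)],
    "tic": ["AUSTRALIAN DOLLAR", "EURO"] * 10,
    "close": [float(i) for i in range(20)],
})
train, valid, test = split_by_date(toy)
print(len(train), len(valid), len(test))  # 16 2 2
```

Splitting on dates rather than on row counts keeps the test period strictly after the training period, which matters for time series.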

In [None]:
from data.download_data import Dataconfig
parser = argparse.ArgumentParser()

# where we store the dataset
parser.add_argument("--data_path",
                    type=str,
                    default="./data/data/",
                    help="the path for storing the downloaded data")
# where we store the config file
parser.add_argument(
    "--output_config_path",
    type=str,
    default="./config/output_config/data",
    help="the path for storing the generated config file for data")
parser.add_argument(
    "--dataset",
    choices=["exchange","dj30","sz50","crypto"],
    default="exchange",
    help="the name of the dataset",
)
parser.add_argument("--split_proportion",
                    type=float,
                    nargs=3,
                    default=[0.8, 0.1, 0.1],
                    help="the split proportion for train, valid and test")
parser.add_argument(
    "--generate_config",
    action="store_true",
    help=
    "whether to generate a yaml file that records the dicts of the train, valid and test data"
)
parser.add_argument(
    "--input_config",
    action="store_true",
    help=
    "whether to use a yaml file as the overall input of the Dataconfig (needed for datasets in other formats)"
)

parser.add_argument(
    "--input_config_path",
    type=str,
    default="config/input_config/data/custom.yml",
    help=
    "determine the location of a yaml file used to initialize the Dataconfig Class"
)
args = parser.parse_args(args=[])
a = Dataconfig(args)
clear_output(wait=True)


The preprocessed data has the following structure:


In [None]:
data=pd.read_csv("data/data/exchange/exchange.csv",index_col=0)
data.head(5)

| Unnamed: 0 | date | close | tic | open | high | low | adjcp | zopen | zhigh | zlow | zadjcp | zclose | zd_5 | zd_10 | zd_15 | zd_20 | zd_25 | zd_30 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2000-01-03 | 0.659109 | AUSTRALIAN DOLLAR | 0.659109 | 0.659109 | 0.659109 | 0.659109 | 0.0 | 0.0 | 0.0 | 0.0 | -0.004397 | 0.001568 | -0.011881 | 0.00816 | 0.027933 | 0.02804 | 0.032544 |
| 1 | 2000-01-04 | 0.656211 | AUSTRALIAN DOLLAR | 0.656211 | 0.656211 | 0.656211 | 0.656211 | 0.0 | 0.0 | 0.0 | 0.0 | -0.004397 | 0.001568 | -0.011881 | 0.00816 | 0.027933 | 0.02804 | 0.032544 |
| 2 | 2000-01-05 | 0.655008 | AUSTRALIAN DOLLAR | 0.655008 | 0.655008 | 0.655008 | 0.655008 | 0.0 | 0.0 | 0.0 | 0.0 | -0.001834 | 0.001568 | -0.011881 | 0.00816 | 0.027933 | 0.02804 | 0.032544 |
| 3 | 2000-01-06 | 0.653979 | AUSTRALIAN DOLLAR | 0.653979 | 0.653979 | 0.653979 | 0.653979 | 0.0 | 0.0 | 0.0 | 0.0 | -0.00157 | 0.001568 | -0.011881 | 0.00816 | 0.027933 | 0.02804 | 0.032544 |
| 4 | 2000-01-07 | 0.654793 | AUSTRALIAN DOLLAR | 0.654793 | 0.654793 | 0.654793 | 0.654793 | 0.0 | 0.0 | 0.0 | 0.0 | 0.001244 | 0.001568 | -0.011881 | 0.00816 | 0.027933 | 0.02804 | 0.032544 |


Each value of the index column corresponds to one date, and the dataset contains multiple tickers (the `tic` column), one per currency.
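With several currencies stored in long format (one row per date and ticker), a portfolio environment needs, at each step, the prices of all currencies on that day. How `TradingEnv` organizes this internally is not shown in this tutorial; one common way to get that cross-sectional view with pandas is `DataFrame.pivot`, which turns the long frame into a date × ticker matrix from which per-currency daily returns follow directly. The `AUSTRALIAN DOLLAR` closes below come from the preview; the `EURO` rows are made up for illustration.

```python
import pandas as pd

# Toy long-format data: one row per (date, tic); EURO prices are made up
long_df = pd.DataFrame({
    "date": ["2000-01-03", "2000-01-03", "2000-01-04", "2000-01-04"],
    "tic": ["AUSTRALIAN DOLLAR", "EURO", "AUSTRALIAN DOLLAR", "EURO"],
    "close": [0.659109, 1.0, 0.656211, 1.02],
})

# Wide view: rows are dates, columns are tickers
close_matrix = long_df.pivot(index="date", columns="tic", values="close")
# Simple daily return of each currency; the first date has no predecessor
daily_returns = (close_matrix / close_matrix.shift(1) - 1).dropna()
print(close_matrix.shape)  # (2, 2)
print(daily_returns)
```

The Australian dollar return on 2000-01-04 comes out at about -0.004397, the same value as the `zclose` entry in the preview.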

Besides the OHLC prices and the adjusted close price `adjcp`, the data also contains normalized features, whose names start with `z`.
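The exact formulas for these features are not given in this tutorial, so the following definitions are assumptions rather than TradeMaster's implementation. `zopen`, `zhigh` and `zlow` are taken as open, high and low divided by close, minus one (all zero in the preview because the four prices coincide). `zclose` is taken as the day-over-day close change, which matches the preview (row 1: 0.656211 / 0.659109 - 1 ≈ -0.004397). The `zd_k` columns are assumed to be the k-day moving average of `adjcp` divided by the current `adjcp`, minus one. `zadjcp` is left out because its definition cannot be read off the sample. The hypothetical helper `add_z_features` works on a single ticker sorted by date and back-fills leading gaps, which would explain why the first rows share identical values.

```python
import pandas as pd


def add_z_features(df: pd.DataFrame, windows=(5, 10, 15, 20, 25, 30)) -> pd.DataFrame:
    """Hypothetical re-creation of the normalized features for ONE ticker sorted by date."""
    out = df.copy()  # work on a copy to avoid pandas' SettingWithCopyWarning
    # Assumed: intraday prices relative to the close
    for col in ("open", "high", "low"):
        out[f"z{col}"] = out[col] / out["close"] - 1
    # Day-over-day change of the close
    out["zclose"] = out["close"] / out["close"].shift(1) - 1
    # Assumed: k-day moving average of adjcp relative to today's adjcp
    for k in windows:
        out[f"zd_{k}"] = out["adjcp"].rolling(k).mean() / out["adjcp"] - 1
    # Assumed: leading NaNs are back-filled from the first valid value
    return out.bfill()


# The five AUSTRALIAN DOLLAR closes from the preview (open = high = low = adjcp = close)
closes = [0.659109, 0.656211, 0.655008, 0.653979, 0.654793]
sample = pd.DataFrame({c: closes for c in ("close", "open", "high", "low", "adjcp")})
features = add_z_features(sample)
print(features[["zopen", "zclose", "zd_5"]])
```

Because the preview shows prices rounded to six decimals, later `zclose` values may differ from the table in the last digit.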

## RL Environment Construction, Agent Training, Model Picking and Testing
For simplicity, the configuration for constructing the RL environment is stored in YAML files, which can be found [here](https://github.com/qinmoelei/TradeMaster_reframe/tree/master/tutorial/config/input_config/env/AT/DeepScalper).
The training cell below imports the required packages, sets the arguments, trains the agent with validation, and finally tests it.
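`TradingEnv`'s internals are not shown in this tutorial. As a rough mental model only, a common portfolio-management formulation maps the agent's raw action to long-only weights with a softmax and grows the portfolio value by the weighted return of the assets. The sketch assumes no transaction costs and uses the one-step portfolio return as the reward; both are assumptions, and `step_portfolio` is a hypothetical function, not TradeMaster's API.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax: non-negative weights that sum to 1."""
    z = np.exp(x - np.max(x))
    return z / z.sum()


def step_portfolio(value: float, action, prev_close, close):
    """Hypothetical single environment step without transaction costs."""
    weights = softmax(np.asarray(action, dtype=float))
    asset_returns = np.asarray(close, dtype=float) / np.asarray(prev_close, dtype=float) - 1
    portfolio_return = float(weights @ asset_returns)
    new_value = value * (1 + portfolio_return)
    reward = portfolio_return  # assumed reward: one-step portfolio return
    return new_value, weights, reward


# Two assets; the action favours the second one 3:1 (softmax of [0, ln 3])
value, weights, reward = step_portfolio(
    1000.0, action=[0.0, np.log(3.0)], prev_close=[1.0, 1.0], close=[1.2, 1.0]
)
print(weights, reward, value)  # weights ≈ [0.25, 0.75], reward ≈ 0.05, value ≈ 1050
```

The softmax keeps every action valid (no shorting, fully invested), which is why it is a popular choice for continuous-action agents such as A2C in this setting.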

In [4]:
from agent.ClassicRL.SOTA import trader,env_creator,load_yaml,select_algorithms
from env.PM.portfolio_management import TradingEnv
parser = argparse.ArgumentParser()

parser.add_argument(
    "--env_name",
    choices=["portfolio"],
    default="portfolio",
    help="the name of the TradingEnv",
)
parser.add_argument(
    "--dict_trained_model",
    default="result/PM/SOTA/trained_model/",
    help="the directory for saving the trained model",
)

parser.add_argument(
    "--train_env_config_dict",
    default="config/input_config/env/portfolio/portfolio/train.yml",
    help="the path of the train config of TradingEnv",
)

parser.add_argument(
    "--valid_env_config_dict",
    default="config/input_config/env/portfolio/portfolio/valid.yml",
    help="the path of the valid config of TradingEnv",
)

parser.add_argument(
    "--test_env_config_dict",
    default="config/input_config/env/portfolio/portfolio/test.yml",
    help="the path of the test config of TradingEnv",
)

parser.add_argument(
    "--name_of_algorithms",
    choices=["PPO", "A2C", "SAC", "TD3", "PG", "DDPG"],
    type=str,
    default="A2C",
    help="the name of the RL algorithm",
)
parser.add_argument(
    "--num_epochs",
    type=int,
    default=10,
    help="the number of training epochs",
)

parser.add_argument(
    "--random_seed",
    type=int,
    default=12345,
    help="the random seed",
)

parser.add_argument(
    "--model_config_dict",
    type=str,
    default="config/input_config/agent/SOTA/A2C.yml",
    help="the path of the model config file",
)

parser.add_argument(
    "--result_dict",
    type=str,
    default="result/PM/SOTA/test_result/",
    help="the directory for storing the test results",
)


args = parser.parse_args(args=[])
a = trader(args)
a.train_with_valid()
a.test()

2022-08-20 23:55:00,169	INFO worker.py:973 -- Calling ray.init() again after it has already been called.


<class 'ray.rllib.agents.a3c.a2c.A2CTrainer'>




the profit margin is -2.657352222665521 %
the sharpe ratio is -0.27945782731776925
the Volatility is 0.0037477005240960917
the max drawdown is 0.11981696588375827
the Calmar Ratio is -0.19545547586237488
the Sortino Ratio is -0.40876710691513896
[2m[36m(RolloutWorker pid=3866172)[0m the profit margin is -51.61219528306229 %
[2m[36m(RolloutWorker pid=3866172)[0m the sharpe ratio is -2.634938863151262
[2m[36m(RolloutWorker pid=3866172)[0m the Volatility is 0.004140504909855382
[2m[36m(RolloutWorker pid=3866172)[0m the max drawdown is 0.5182227113652991
[2m[36m(RolloutWorker pid=3866172)[0m the Calmar Ratio is -1.334148759697776
[2m[36m(RolloutWorker pid=3866172)[0m the Sortino Ratio is -3.4944424536508794
the profit margin is -5.159756755010092 %
the sharpe ratio is -0.6094276604532715
the Volatility is 0.003643611694495803
the max drawdown is 0.08546099170063744
the Calmar Ratio is -0.580993566389602
the Sortino Ratio is -0.8519870108015014
[2m[36m(RolloutWorker pid=

2022-08-20 23:57:15,083	INFO trainable.py:589 -- Restored on 172.21.100.188 from checkpoint: /home/sunshuo/ray_results/A2CTrainer_portfolio_2022-08-20_23-55-00ypvuw33r/checkpoint_000009/checkpoint-9
2022-08-20 23:57:15,085	INFO trainable.py:597 -- Current state after restoring: {'_iteration': 9, '_timesteps_total': None, '_time_total': 94.79972052574158, '_episodes_total': 6}


the profit margin is -6.830574814325696 %
the sharpe ratio is -0.7348759446989664
the Volatility is 0.00405463608225716
the max drawdown is 0.14311833704354762
the Calmar Ratio is -0.4655385324381294
the Sortino Ratio is -0.9377936831335998
the profit margin is -14.081375922478955 %
the sharpe ratio is -2.3356888920887653
the Volatility is 0.0028718863556721807
the max drawdown is 0.1821742235412546
the Calmar Ratio is -0.8216937618087252
the Sortino Ratio is -3.342926920783735
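Each run above reports profit margin, Sharpe ratio, volatility, max drawdown, Calmar ratio and Sortino ratio. The tutorial does not state how TradeMaster computes them (for example, whether the ratios are annualized), so the following is only a sketch using common textbook definitions over a series of portfolio values, with the risk-free rate taken as zero and 252 trading periods per year assumed. Its numbers are not expected to reproduce the logged ones, and `portfolio_metrics` is a hypothetical function.

```python
import numpy as np


def portfolio_metrics(values, periods_per_year: int = 252) -> dict:
    """Hypothetical metrics from a series of portfolio values (textbook definitions)."""
    values = np.asarray(values, dtype=float)
    returns = values[1:] / values[:-1] - 1  # simple per-period returns
    mean, vol = returns.mean(), returns.std()
    # Max drawdown: largest relative drop from a running peak
    running_max = np.maximum.accumulate(values)
    max_drawdown = float(np.max((running_max - values) / running_max))
    # Downside deviation: root mean square of the negative part of the returns
    downside = np.sqrt(np.mean(np.minimum(returns, 0.0) ** 2))
    annual_return = mean * periods_per_year
    scale = np.sqrt(periods_per_year)
    return {
        "profit_margin": values[-1] / values[0] - 1,
        "volatility": vol,
        "sharpe": mean / vol * scale if vol > 0 else float("nan"),
        "max_drawdown": max_drawdown,
        "calmar": annual_return / max_drawdown if max_drawdown > 0 else float("nan"),
        "sortino": mean / downside * scale if downside > 0 else float("nan"),
    }


metrics = portfolio_metrics([100.0, 110.0, 99.0, 121.0])
for name, val in metrics.items():
    print(f"{name}: {val:.4f}")
```

Here the profit margin is 0.21 and the max drawdown is 0.1 (the drop from 110 to 99). The Sortino ratio penalizes only downside moves, so it exceeds the Sharpe ratio whenever most of the volatility comes from gains.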
