<a href="https://colab.research.google.com/github/brady-at-claradata/FinRL/blob/master/eRL_demo_StockTrading.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Stock Trading Application in ElegantRL**






In [1]:
import torch

In [2]:
torch.cuda.get_device_properties(0)


_CudaDeviceProperties(name='Tesla P100-PCIE-16GB', major=6, minor=0, total_memory=16280MB, multi_processor_count=56)

# **Part 1: Problem Formulation**
Formally, we model stock trading as a Markov Decision Process (MDP), and formulate the trading objective as maximization of expected return:



*   **State s = [b, p, h]**: a vector that includes the remaining balance b, the stock prices p, and the stock shares h. p and h are D-dimensional vectors, where D denotes the number of stocks.
*   **Action a**: a vector of actions over D stocks. The allowed actions on each stock include selling, buying, or holding, which result in decreasing, increasing, or no change of the stock shares in h, respectively.
*   **Reward r(s, a, s’)**: The asset value change of taking action a at state s and arriving at new state s’.
*   **Policy π(s)**: The trading strategy at state s, which is a probability distribution of actions at state s.
*   **Q-function Q(s, a)**: the expected return (reward) of taking action a at state s following policy π.
*   **State-transition**: After taking action a, the number of shares h is modified, as shown in Fig 3, and the new portfolio value is the sum of the balance and the total value of the stocks.
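
The state transition and reward described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the environment's actual code: the function names `total_asset` and `step` are hypothetical, sells are processed before buys, sells are capped by current holdings, and buys are capped by available cash.

```python
from typing import List


def total_asset(balance: float, prices: List[float], shares: List[float]) -> float:
    """Portfolio value: cash balance plus the market value of all holdings."""
    return balance + sum(p * h for p, h in zip(prices, shares))


def step(balance, prices, shares, action, next_prices):
    """Apply one action vector and return (new_balance, new_shares, reward).

    action[i] > 0 buys, action[i] < 0 sells, action[i] == 0 holds stock i.
    The reward is the change in total asset value between s and s'.
    """
    before = total_asset(balance, prices, shares)
    new_shares = list(shares)
    # Assumption: process sells first so the proceeds can fund buys.
    for i, a in enumerate(action):
        if a < 0:
            sold = min(-a, new_shares[i])  # cannot sell more than held
            new_shares[i] -= sold
            balance += sold * prices[i]
    for i, a in enumerate(action):
        if a > 0:
            affordable = int(balance // prices[i])  # cannot spend more than cash
            bought = min(a, affordable)
            new_shares[i] += bought
            balance -= bought * prices[i]
    after = total_asset(balance, next_prices, new_shares)
    return balance, new_shares, after - before


# Two stocks: buy 10 shares of the first, sell 2 shares of the second.
b, h, r = step(1000.0, [10.0, 20.0], [0, 5], [10, -2], [11.0, 19.0])
print(b, h, r)
```

In this toy run the reward is positive because the gain on the newly bought stock outweighs the loss on the remaining shares of the second one.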

# **Part 2: Stock Trading Environment Design**

**State Space and Action Space**


*   **State Space**: We use a 181-dimensional vector consisting of seven parts to represent the state of the multi-stock trading environment: [b, p, h, M, R, C, X], where b is the balance, p is the stock prices, h is the number of shares held, M is the Moving Average Convergence Divergence (MACD), R is the Relative Strength Index (RSI), C is the Commodity Channel Index (CCI), and X is the Average Directional Index (ADX).
*   **Action Space**: As a recap, there are three types of actions for a single stock: selling, buying, and holding. A negative value means selling, a positive value means buying, and zero means holding. The action space for each stock is therefore {-k, …, -1, 0, 1, …, k}, where k is the maximum number of shares to buy or sell in each transaction.
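
As a quick check of the numbers above: one balance entry plus six per-stock parts gives 1 + 6 × D entries, so 181 dimensions corresponds to D = 30 stocks (an inference from the arithmetic, not something stated in this notebook; a different ticker list or indicator list changes the size). The sketch below also shows one plausible way to turn a continuous policy output into an integer trade size in {-k, …, k}. The helper names and the [-1, 1] output range are assumptions for illustration.

```python
def state_dim(num_stocks: int, num_indicators: int) -> int:
    """1 balance entry + prices + shares + one entry per indicator per stock."""
    return 1 + num_stocks * (2 + num_indicators)


def to_share_delta(raw_action: float, k: int) -> int:
    """Map a policy output in [-1, 1] to an integer trade size in {-k, ..., k}."""
    clipped = max(-1.0, min(1.0, raw_action))
    return int(clipped * k)  # int() truncates toward zero, so tiny outputs mean "hold"


# Four indicators (MACD, RSI, CCI, ADX) over an assumed 30 stocks.
assert state_dim(num_stocks=30, num_indicators=4) == 181

print([to_share_delta(x, k=100) for x in (-1.5, -0.25, 0.004, 0.5)])
# → [-100, -25, 0, 50]
```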


**Easy-to-customize Features**


*   **initial_capital**: the initial capital that the user wants to invest.
*   **tickers**: the stocks that the user wants to trade with.
*   **initial_stocks**: the initial number of shares held of each stock; the default can be zero.
*   **buy_cost_pct, sell_cost_pct**: the transaction fee rate charged on each buy or sell transaction.
*   **max_stock**: the maximum number of shares that can be traded per transaction. Users can also set the maximum percentage of capital to invest in each stock.
*   **tech_indicator_list**: the list of financial indicators that are taken into account, which is used to define a state.
*   **start_date, start_eval_date, end_eval_date**: the training and backtesting time intervals. Three dates (or timestamps) are used: once the training period is specified, the rest is used for backtesting.
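
To make the three-date convention concrete, here is a small standard-library sketch that splits dated rows into a training part and a backtesting part. The function name `split_by_dates` is hypothetical, and the boundary handling (training is `[start_date, start_eval_date)`, backtesting is `[start_eval_date, end_eval_date]`) is an assumption; the environment may treat the boundaries differently.

```python
from datetime import date


def split_by_dates(rows, start_date, start_eval_date, end_eval_date):
    """Split (day, value) rows into training and backtesting lists.

    Assumed convention: training covers [start_date, start_eval_date)
    and backtesting covers [start_eval_date, end_eval_date].
    """
    start, split, end = (date.fromisoformat(d) for d in
                         (start_date, start_eval_date, end_eval_date))
    train = [r for r in rows if start <= r[0] < split]
    backtest = [r for r in rows if split <= r[0] <= end]
    return train, backtest


# Made-up rows; only the dates matter for the split.
rows = [(date(2020, 12, 30), 1.0), (date(2020, 12, 31), 1.1),
        (date(2021, 1, 4), 1.2), (date(2021, 7, 23), 1.3)]
train, backtest = split_by_dates(rows, '2008-03-19', '2021-01-01', '2021-07-23')
print(len(train), len(backtest))  # → 2 2
```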


# **Part 3: Install ElegantRL and related packages**

In [4]:
!pip install git+https://github.com/AI4Finance-LLC/ElegantRL.git
!pip install yfinance stockstats

Collecting git+https://github.com/AI4Finance-LLC/ElegantRL.git
  Cloning https://github.com/AI4Finance-LLC/ElegantRL.git to /tmp/pip-req-build-g_nu0u5_
  Running command git clone -q https://github.com/AI4Finance-LLC/ElegantRL.git /tmp/pip-req-build-g_nu0u5_
Collecting pybullet
  Downloading pybullet-3.1.7.tar.gz (79.0 MB)
     |████████████████████████████████| 79.0 MB 38 kB/s 
Collecting box2d-py
  Downloading box2d_py-2.3.8-cp37-cp37m-manylinux1_x86_64.whl (448 kB)
     |████████████████████████████████| 448 kB 58.8 MB/s 
Building wheels for collected packages: elegantrl, pybullet
  Building wheel for elegantrl (setup.py) ... done
  Created wheel for elegantrl: filename=elegantrl-0.3.1-py3-none-any.whl size=65662 sha256=b64f8bedeb60e0af458173849a4b591ae54e5a4b64a66f1c88b5d0613495153b
  Stored in directory: /tmp/pip-ephem-wheel-cache-e1ru6dyw/wheels/52/9a/b3/08c8a0b5be22a65da0132538c05e7e961b1253c90d6845e0c6
  Building wheel for pybullet (setup.py) ...

# **Part 4: Import Packages**


*   **ElegantRL**
*   **yfinance**: offers a threaded and Pythonic way to download historical market data from Yahoo! Finance.
*   **StockDataFrame**: stockstats inherits and extends pandas.DataFrame to support Stock Statistics and Stock Indicators.
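
To make concrete what a stock indicator is, the following pure-Python sketch computes a trailing simple moving average of close prices. It is only an illustration of the idea: it is not how stockstats implements its indicators, and the function name and warm-up handling (the first entries average only the values seen so far) are assumptions.

```python
def simple_moving_average(close, window):
    """Return the trailing mean of `close` over up to `window` values.

    The first window-1 entries average only the values seen so far
    (an assumption of this sketch; libraries differ on warm-up handling).
    """
    out = []
    running = 0.0
    for i, price in enumerate(close):
        running += price
        if i >= window:
            running -= close[i - window]  # drop the value that left the window
        out.append(running / min(i + 1, window))
    return out


print(simple_moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window=3))
# → [1.0, 1.5, 2.0, 3.0, 4.0]
```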



In [106]:
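import numpy as np  # used explicitly below (np.zeros)
from elegantrl.run import *  # Arguments, train_and_evaluate
from elegantrl.agent import *  # AgentPPO, AgentSAC, AgentTD3, AgentDDPG
from elegantrl.envs.FinRL.StockTrading import StockTradingEnv, check_stock_trading_env
import yfinance as yf
from stockstats import StockDataFrame as Sdf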

# **Part 5: Specify Agent and Environment**

*   **args.agent**: first, choose one DRL algorithm from agent.py. In this application, we prefer the DDPG and PPO agents; the cell below uses PPO.
*   **args.env**: creates the environment. Users can either define a custom environment or preprocess environments from OpenAI Gym and PyBullet Gym using env.py. In this application, we use the self-designed stock trading environment.


> Before finishing initialization of **args**, please see Arguments() in run.py for more details about adjustable hyper-parameters.




In [109]:
# Agent
args = Arguments(if_on_policy=True)
args.agent = AgentPPO() # AgentSAC(), AgentTD3(), AgentDDPG()
args.agent.if_use_gae = True
args.agent.lambda_entropy = 0.04

# Environment
tickers = [
  'AAPL', 'ADBE', 'ADI', 'ADP', 'ADSK', 'ALGN', 'ALXN', 'AMAT', 'AMD', 'AMGN',
  'AMZN', 'ASML', 'ATVI', 'BIIB', 'BKNG', 'BMRN', 'CDNS', 'CERN', 'CHKP', 'CMCSA',
  'COST', 'CSCO', 'CSX', 'CTAS', 'CTSH', 'CTXS', 'DLTR', 'EA', 'EBAY', 'FAST',
  'FISV', 'GILD', 'HAS', 'HSIC', 'IDXX', 'ILMN', 'INCY', 'INTC', 'INTU', 'ISRG',
  'JBHT', 'KLAC', 'LRCX', 'MAR', 'MCHP', 'MDLZ', 'MNST', 'MSFT', 'MU', 'MXIM',
  'NLOK', 'NTAP', 'NTES', 'NVDA', 'ORLY', 'PAYX', 'PCAR', 'PEP', 'QCOM', 'REGN',
  'ROST', 'SBUX', 'SIRI', 'SNPS', 'SWKS', 'TTWO', 'TXN', 'VRSN', 'VRTX', 'WBA',
  'WDC', 'WLTW', 'XEL', 'XLNX']  # finrl.config.NAS_74_TICKER

tech_indicator_list = [
  'macd', 'boll_ub', 'boll_lb', 'rsi_30', 'cci_30', 'dx_30',
  'close_30_sma', 'close_60_sma']  # finrl.config.TECHNICAL_INDICATORS_LIST

gamma = 0.99
max_stock = 1e2
initial_capital = 1e6
initial_stocks = np.zeros(len(tickers), dtype=np.float32)
buy_cost_pct = 0.
sell_cost_pct = 0.
start_date = '2008-03-19'
start_eval_date = '2021-01-01'
end_eval_date = '2021-07-23'

args.env = StockTradingEnv('./', gamma, max_stock, initial_capital, buy_cost_pct, 
                           sell_cost_pct, start_date, start_eval_date, 
                           end_eval_date, tickers, tech_indicator_list, 
                           initial_stocks, if_eval=False)
args.env_eval = StockTradingEnv('./', gamma, max_stock, initial_capital, buy_cost_pct, 
                           sell_cost_pct, start_date, start_eval_date, 
                           end_eval_date, tickers, tech_indicator_list, 
                           initial_stocks, if_eval=True)

args.env.target_reward = 3
args.env_eval.target_reward = 3

# Hyperparameters
args.gamma = gamma
args.break_step = int(2e5)
args.net_dim = 2 ** 9
args.max_step = args.env.max_step
args.max_memo = args.max_step * 4
args.batch_size = 2 ** 9
args.repeat_times = 2 ** 3
args.eval_gap = 2 ** 4
args.eval_times1 = 2 ** 3
args.eval_times2 = 2 ** 5
args.if_allow_break = False
args.rollout_num = 2 # the number of rollout workers (larger is not always faster)
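
The cell above sets `args.agent.if_use_gae = True`. Assuming this flag refers to generalized advantage estimation (GAE), the generic technique can be sketched as follows. This is not ElegantRL's implementation: the function name, the `lam` parameter, and its default value are illustrative, and the sketch assumes a single trajectory with no episode termination inside it.

```python
def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one trajectory.

    rewards[t] and values[t] are per-step quantities; last_value is the
    value estimate of the state reached after the final step.
    """
    advantages = [0.0] * len(rewards)
    next_value, running = last_value, 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        running = delta + gamma * lam * running  # discounted sum of residuals
        advantages[t] = running
        next_value = values[t]
    return advantages


# With lam=1 and zero value estimates, GAE reduces to the discounted return.
adv = gae_advantages([1.0, 1.0], [0.0, 0.0], last_value=0.0, gamma=0.5, lam=1.0)
print(adv)  # → [1.5, 1.0]
```

Smaller `lam` values lean more on the critic's value estimates (lower variance, more bias), while values near 1 lean more on observed rewards.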

In [110]:
train_and_evaluate(args)  # runs until args.break_step; early stopping at the target reward is disabled because args.if_allow_break = False

| GPU id: 0, cwd: ./AgentPPO/StockTradingEnv-v1_0
| Remove history
ID      Step      MaxR |    avgR      stdR       objA      objC |  avgS  stdS
0   5.43e+03      1.68 |
0   5.43e+03      1.68 |    1.68      0.02       0.11     13.95 |   504     0


KeyboardInterrupt: ignored

In [85]:
states = torch.as_tensor((args.env.reset(),), dtype=torch.float32, device=0).detach_()

a = args.agent.act(states)[0].detach().cpu().numpy()


In [111]:
args.env.stocks


array([2.8378e+04, 2.4199e+04, 8.2000e+01, 0.0000e+00, 2.0430e+03,
       3.3770e+03, 0.0000e+00, 0.0000e+00, 3.4400e+03, 0.0000e+00,
       2.5000e+01, 0.0000e+00, 6.0000e+00, 6.7920e+03, 0.0000e+00,
       0.0000e+00, 9.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
       0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
       0.0000e+00, 7.9000e+01, 0.0000e+00, 0.0000e+00, 0.0000e+00,
       0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
       0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
       0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
       0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
       0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
       0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
       0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
       0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
       0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00])

In [10]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [12]:
!cp -R /content/AgentPPO/ /content/drive/MyDrive/trading/

Understanding the training log printed by `train_and_evaluate` above:
*   **Step**: the total training steps.
*   **MaxR**: the maximum reward.
*   **avgR**: the average of the rewards.
*   **stdR**: the standard deviation of the rewards.
*   **objA**: the objective function value of Actor Network (Policy Network).
*   **objC**: the objective function value (Q-value) of Critic Network (Value Network).
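
The reward columns can be reproduced from a list of per-episode returns. The sketch below is a minimal illustration with made-up numbers; the function name is hypothetical, and two details are assumptions: that MaxR tracks the best average seen across evaluation rounds, and that the population standard deviation is used.

```python
from statistics import mean, pstdev


def summarize_returns(episode_returns, best_so_far=float('-inf')):
    """Return (MaxR, avgR, stdR) for one evaluation round.

    Assumptions: MaxR is the best average across rounds, and stdR is
    the population standard deviation of the episode returns.
    """
    avg_r = mean(episode_returns)
    std_r = pstdev(episode_returns)
    max_r = max(best_so_far, avg_r)
    return max_r, avg_r, std_r


# Made-up episode returns for one evaluation round.
max_r, avg_r, std_r = summarize_returns([1.66, 1.70, 1.68])
print(f'{max_r:.2f} {avg_r:.2f} {std_r:.2f}')
```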

# **Part 7: Backtest and Draw the Graph**

In [99]:
args = Arguments(if_on_policy=True)
args.agent = AgentPPO()
args.env  = StockTradingEnv('./', gamma, max_stock, initial_capital, buy_cost_pct, 
                           sell_cost_pct, start_date, start_eval_date, 
                           end_eval_date, tickers, tech_indicator_list, 
                           initial_stocks, if_eval=True)

args.if_remove = False
args.cwd = './drive/MyDrive/trading/AgentPPO/StockTradingEnv-v1_0'
args.init_before_training()

args.env.draw_cumulative_return(args, torch)

| GPU id: 0, cwd: ./drive/MyDrive/trading/AgentPPO/StockTradingEnv-v1_0
Loaded act: ./drive/MyDrive/trading/AgentPPO/StockTradingEnv-v1_0


[0.9929761097189177,
 1.0026411781482665,
 1.0071410167680068,
 1.010003536369691,
 1.011890564347327,
 1.0136561607108614,
 1.0119165341100123,
 1.0070209751501238,
 1.0189182925624132,
 1.0181976335784508,
 1.0236214142680304,
 1.0288889979936398,
 1.0145929949956287,
 1.01622519805538,
 1.0216623690418658,
 1.0325092740339217,
 1.0214611593670149,
 1.0118350494427943,
 1.0351606532099928,
 1.0524786547091747,
 1.0328398074125158,
 1.0404783231326808,
 1.0500750964144865,
 1.0448410347643504,
 1.0307317854805165,
 1.029391562709795,
 1.0316871298815955,
 1.0517199947730347,
 1.053625478811574,
 1.0510397113181238,
 1.0536833451263765,
 1.0569073023559163,
 1.0584332633713551,
 1.0522656262053645,
 1.0620798142012704,
 1.0638113402130454,
 1.0652391095789384,
 1.0663944435470765,
 1.0665524783949358,
 1.0793895529451523,
 1.076484688653107,
 1.074588251173291,
 1.0643903624434277,
 1.0507415918791285,
 1.0448476835795963,
 1.0673408301947465,
 1.0706132751257134,
 1.0793548416506922,


(1259, 74)
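
The list printed above appears to be a cumulative-return curve: values start near 1.0 and the last value shown is about 1.079. Under that reading, each entry would be the portfolio value divided by the initial capital. The sketch below shows this computation together with a maximum-drawdown measure on made-up asset values; both function names are hypothetical and this is not the code behind `draw_cumulative_return`.

```python
def cumulative_returns(asset_values, initial_capital):
    """Portfolio value at each step divided by the starting capital."""
    return [v / initial_capital for v in asset_values]


def max_drawdown(curve):
    """Largest peak-to-trough decline of the curve, as a fraction of the peak."""
    peak, worst = curve[0], 0.0
    for x in curve:
        peak = max(peak, x)
        worst = max(worst, (peak - x) / peak)
    return worst


# Made-up asset values for illustration only.
curve = cumulative_returns([990_000.0, 1_050_000.0, 1_008_000.0, 1_080_000.0], 1e6)
print(curve[-1] - 1.0, max_drawdown(curve))
```

Here the final return is about 8% and the maximum drawdown is about 4%, the drop from 1.05 to 1.008.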

In [101]:
# Agent
args = Arguments(if_on_policy=True)
args.agent = AgentPPO()  # on-policy agent, matching if_on_policy=True and the PPO checkpoint in args.cwd below

# Environment
tickers = [
  'AAPL', 'ADBE', 'ADI', 'ADP', 'ADSK', 'ALGN', 'ALXN', 'AMAT', 'AMD', 'AMGN',
  'AMZN', 'ASML', 'ATVI', 'BIIB', 'BKNG', 'BMRN', 'CDNS', 'CERN', 'CHKP', 'CMCSA',
  'COST', 'CSCO', 'CSX', 'CTAS', 'CTSH', 'CTXS', 'DLTR', 'EA', 'EBAY', 'FAST',
  'FISV', 'GILD', 'HAS', 'HSIC', 'IDXX', 'ILMN', 'INCY', 'INTC', 'INTU', 'ISRG',
  'JBHT', 'KLAC', 'LRCX', 'MAR', 'MCHP', 'MDLZ', 'MNST', 'MSFT', 'MU', 'MXIM',
  'NLOK', 'NTAP', 'NTES', 'NVDA', 'ORLY', 'PAYX', 'PCAR', 'PEP', 'QCOM', 'REGN',
  'ROST', 'SBUX', 'SIRI', 'SNPS', 'SWKS', 'TTWO', 'TXN', 'VRSN', 'VRTX', 'WBA',
  'WDC', 'WLTW', 'XEL', 'XLNX']  # finrl.config.NAS_74_TICKER

tech_indicator_list = [
  'macd', 'boll_ub', 'boll_lb', 'rsi_30', 'cci_30', 'dx_30',
  'close_30_sma', 'close_60_sma']  # finrl.config.TECHNICAL_INDICATORS_LIST

gamma = 0.99
max_stock = 1e2
initial_capital = 1e6
initial_stocks = np.zeros(len(tickers), dtype=np.float32)
buy_cost_pct = 1e-3
sell_cost_pct = 1e-3
start_date = '2019-01-01'
start_eval_date = '2021-01-01'
end_eval_date = '2021-07-23'

args.env = StockTradingEnv('./validation', gamma, max_stock, initial_capital, buy_cost_pct, 
                           sell_cost_pct, start_date, start_eval_date, 
                           end_eval_date, tickers, tech_indicator_list, 
                           initial_stocks, if_eval=True)

args.if_remove = False
args.cwd = './AgentPPO/StockTradingEnv-v1_0'
args.init_before_training()

args.env.draw_cumulative_return(args, torch)


NameError: ignored

In [47]:
args.env

AttributeError: ignored

In [100]:
!rm -rf /content/drive/MyDrive/trading/AgentPPO/ 
!cp -R ./AgentPPO /content/drive/MyDrive/trading/

In [51]:
!lscpu

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  2
Core(s) per socket:  1
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               79
Model name:          Intel(R) Xeon(R) CPU @ 2.20GHz
Stepping:            0
CPU MHz:             2199.998
BogoMIPS:            4399.99
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            56320K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_sin