# Deep Stock Trader (Advanced) - START HERE
*Built for TAMU Datathon 2020 by Seth Hamilton and Josiah Coad.*

If you haven't, go ahead and download the zip containing this notebook and more from our [challenges website](https://tamudatathon.com/challenges).

Includes usage examples. 

---

## Introduction

Predicting stock market performance is a centuries-old problem. Ambitious investors have designed and redesigned thousands of trading algorithms, custom indexes, and more to gain an edge in market prediction. Some of the biggest banks in the world hire large numbers of ML experts specifically to improve their investing strategies. Some companies even specialize in this approach (see https://www.twosigma.com/).

Reinforcement Learning opens the door to a whole new approach to predicting stock prices... in that we don't have to predict prices! Instead, we model trading stocks like a game and train an agent to maximize the reward function we care about: total money gained!

---

## Problem Description

In this challenge, it's your job to train an agent to make a trading decision (buy, sell, or hold) to execute at the opening of the following day. You're given an array of values (e.g. open, close, low, high, SMA_10, RSI_14, etc.). If you want, you can write logic to keep track of your current amount of cash and shares. Fractional shares are allowed.

In other words, for every row/day in the historical dataset, you'll be given the array:

> $[v1, v2, v3, ...]$.

The array contains daily historical values. Your job is to return a tuple containing one of

> $Action.BUY, Action.SELL, Action.HOLD$ 

and a value $frac$ where $frac$ represents the fraction of cash to spend or shares to sell in a trade 

> $0 \leq frac \leq 1$ 

Thus, your agent's step function should return something like

> $(Action.BUY, 0.5)$

which can be interpreted as the decision to invest half (0.5) your current amount of cash into the market. Again, you'll return an $(Action, frac)$ tuple for every row/day in the dataset. 
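
As a rough illustration of the return value described above, here is a minimal sketch of a step function. The `Action` enum below is a stand-in, and the `rsi_index` parameter and the RSI thresholds are assumptions made up for this example; check the sample main.py for the actual definitions and signature.

```python
from enum import Enum
from typing import Sequence, Tuple


class Action(Enum):
    # Stand-in for the challenge's Action type (assumed, not the official definition)
    BUY = 0
    SELL = 1
    HOLD = 2


def step(row: Sequence[float], rsi_index: int) -> Tuple[Action, float]:
    """Toy rule: buy when RSI is low, sell when it is high, otherwise hold."""
    rsi = row[rsi_index]
    if rsi < 30:
        return Action.BUY, 0.5   # spend half of the available cash
    if rsi > 70:
        return Action.SELL, 1.0  # sell all shares
    return Action.HOLD, 0.0      # frac is unused for HOLD in this sketch


# Hypothetical row: [open, close, RSI]
decision = step([41.25, 39.0, 25.0], rsi_index=2)
print(decision)
```

A trained agent would replace the hand-written rule, but the shape of the return value stays the same: one action plus a fraction between 0 and 1.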

---

## Submission Requirements/Details

- Load train.csv into a pandas dataframe
- Train a RL agent using our custom gym environment* 
- Save model to disk 
- Edit main.py to use model in step function (see sample main.py for details)
- Zip main.py and your model together and submit on [tamudatathon.com/koth](https://tamudatathon.com/koth)
- Note your score and try again!

*The custom env is provided in this notebook and in util.py. Feel free to modify the env implementation (such as the reward function) to improve performance.

---

## About This Notebook

This notebook does several things:
- **Creates** a custom gym environment to make RL agent training easy
- **Validates** and tests the custom gym environment
- **Downloads** sample data (not the stock actually used for challenge) and cleans it for use 
- **Trains** a basic agent to play the trading game 
- **Tests** the agent to see how much money it makes!

You can download this notebook and run it locally on the training dataset to train a model for your real submission.

One last note, this custom gym environment only accepts a basic BUY, SELL, or HOLD action, not a tuple containing both an action and a fraction. You'll have to modify your final implementation to make use of the fraction feature. (Or don't and simply set fraction = 1). 
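
One possible way to add the fraction feature is sketched below. The helper `apply_trade` is hypothetical (it is not part of the provided env or util.py); it reuses the env's integer action codes and returns the updated cash and share counts for a single trade.

```python
from typing import Tuple

BUY, SELL, HOLD = 0, 1, 2  # same integer codes as the env in this notebook


def apply_trade(action: int, frac: float, cash: float, n_shares: float,
                price: float) -> Tuple[float, float]:
    """Return (cash, n_shares) after trading `frac` of cash or shares at `price`."""
    if not 0.0 <= frac <= 1.0:
        raise ValueError(f"frac must be in [0, 1], got {frac}")
    if action == BUY:
        spend = cash * frac              # fraction of cash to invest
        return cash - spend, n_shares + spend / price
    if action == SELL:
        sold = n_shares * frac           # fraction of shares to sell
        return cash + sold * price, n_shares - sold
    if action == HOLD:
        return cash, n_shares
    raise ValueError(f"Illegal action value: {action}")


cash, n_shares = apply_trade(BUY, 0.5, cash=1000.0, n_shares=0.0, price=40.0)
print(cash, n_shares)  # 500.0 12.5
```

With `frac = 1` this reduces to the all-in buy and sell-everything behavior of the env's `step` method.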


Good luck!

## Custom Gym Environment

In [1]:
import numpy as np
import gym
from gym import spaces
import pandas as pd

In [2]:
class DeepStockTraderEnv(gym.Env):
  """
  Custom Environment that follows gym interface
  This environment enables agents to make a decision at every timestep in
  a historical stock environment.

  The reward function is defined by how much money the bot made in a particular 
  timestep. (This is 0 in cases where no shares are held)
  """

  metadata={ 'render.modes': ['console'] }

  BUY = 0
  SELL = 1
  HOLD = 2

  def __init__(self, pd_data):
    super(DeepStockTraderEnv, self).__init__()

    self.data = pd_data.values
    self.columns_map = {c.lower(): i for i, c in enumerate(pd_data.columns)}

    self.row_size = len(self.columns_map)

    min_val = np.min(self.data)
    low = np.array([min_val for i in range(self.row_size)])

    max_val = np.max(self.data)
    high = np.array([max_val for i in range(self.row_size)])

    self.observation_space = spaces.Box(low=low, 
                                            high=high, 
                                            shape=(self.row_size,), 
                                            dtype=np.float64)

    self.action_space = spaces.Discrete(3)

    # Variables that track the bot's current state
    self.n_shares = 0 # num of shares currently held
    self.cash = 1000  # starting cash
    self.timestep = 0 # cur index of row/timestep in dataset
    self.n_buys = 0   # num of buys
    self.n_sells = 0  # num of sells
    self.n_holds = 0  # num of holds
    self.account_vals = [] # list tracking the account performance over time

  def reset(self):
    self.n_shares = 0 
    self.cash = 1000
    self.timestep = 1 # + 1 since we return the first observation
    self.n_buys = 0
    self.n_sells = 0
    self.n_holds = 0
    self.account_vals = []

    return np.copy(self.data[0])

  def total(self, timestep=-1, open=True):
    return self.cash + self.n_shares * self.data[timestep, self.columns_map["open" if open else "close"]]

  def step(self, action):

    # ********************** EXECUTE ACTION **********************
    open_j = self.columns_map["open"]
    close_j = self.columns_map["close"]
    if action == self.BUY:
        self.n_shares += self.cash / self.data[self.timestep, open_j]
        self.cash = 0
        self.n_buys += 1
    elif action == self.SELL:
        self.cash += self.n_shares * self.data[self.timestep, open_j]
        self.n_shares = 0
        self.n_sells += 1
    elif action == self.HOLD:
        self.n_holds += 1
    else:
        raise ValueError(f"Illegal Action value: {action}")

    self.account_vals.append(self.total(self.timestep))
    # ************************************************************

    # IMPORTANT 
    # We define reward to be (total account value at close) - (total account value at open)
    # Basically your reward is the amount gained over the course of the day 
    reward = self.total(self.timestep, open=False) - self.total(self.timestep)
    done = self.timestep+1 == len(self.data)-1
    info = {
        "n_buys": self.n_buys,
        "n_sells": self.n_sells,
        "n_holds": self.n_holds,
        "cash": self.cash,
        "n_shares": self.n_shares
    }

    self.timestep += 1

    return np.copy(self.data[self.timestep]), reward, done, info

  def render(self, mode='console'):
    if mode != 'console':
        raise NotImplementedError()
    
    print(f"------------Step {self.timestep}------------")
    print(f'total:   \t{self.total(self.timestep)}')
    print(f'cash:    \t{self.cash}')
    print(f'n_shares:\t{self.n_shares}')
    print(f'n_buys:  \t{self.n_buys}')
    print(f'n_sells:\t{self.n_sells}')
    print(f'n_holds:\t{self.n_holds}')
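
To make the reward definition concrete, the sketch below recomputes it outside the env. `daily_reward` mirrors the formula used in `step` (account value at close minus account value at open). `daily_return` is a hypothetical alternative of the kind you might try when modifying the reward function; it is not part of the provided env.

```python
def daily_reward(cash: float, n_shares: float,
                 open_price: float, close_price: float) -> float:
    """Account value at close minus account value at open, as in the env's step()."""
    total_open = cash + n_shares * open_price
    total_close = cash + n_shares * close_price
    return total_close - total_open


def daily_return(cash: float, n_shares: float,
                 open_price: float, close_price: float) -> float:
    """Hypothetical alternative: reward as a fraction of the account value at open."""
    total_open = cash + n_shares * open_price
    if total_open == 0:
        return 0.0
    return daily_reward(cash, n_shares, open_price, close_price) / total_open


print(daily_reward(1000.0, 0.0, 40.0, 44.0))  # 0.0 (no shares held)
print(daily_reward(0.0, 25.0, 40.0, 44.0))    # 100.0
print(daily_return(0.0, 25.0, 40.0, 44.0))    # 0.1
```

Because cash cancels out, the env's reward is just `n_shares * (close - open)`, which is why it is always 0 when no shares are held.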

## Data Collection and Cleaning

In [3]:
%tensorflow_version 1.x
!pip install stable-baselines[mpi]==2.10.0
!pip install yfinance
!pip install pandas-ta

TensorFlow 1.x selected.
Collecting stable-baselines[mpi]==2.10.0
  Downloading https://files.pythonhosted.org/packages/e5/fe/db8159d4d79109c6c8942abe77c7ba6b6e008c32ae55870a35e73fa10db3/stable_baselines-2.10.0-py3-none-any.whl (248kB)
Installing collected packages: stable-baselines
  Found existing installation: stable-baselines 2.2.1
    Uninstalling stable-baselines-2.2.1:
      Successfully uninstalled stable-baselines-2.2.1
Successfully installed stable-baselines-2.10.0
Collecting yfinance
  Downloading https://files.pythonhosted.org/packages/7a/e8/b9d7104d3a4bf39924799067592d9e59119fcfc900a425a12e80a3123ec8/yfinance-0.1.55.tar.gz
Collecting lxml>=4.5.1
  Downloading https://files.pythonhosted.org/packages/7e/49/f7c5f4ec1913f37a2ecab69c42f95397416606b35ec3ed9373cc456833de/lxml-4.6.0-cp36-cp36m-manylinux1_x86_64.whl (5.5MB)
Building wheels for collected pa

In [4]:
from stable_baselines.common.env_checker import check_env

In [None]:
import pandas_ta as pdt
import yfinance as yf
from datetime import datetime, timedelta

# GET STOCK DATA
stonk = yf.Ticker('CANF')
df = stonk.history(start=datetime.now() - timedelta(days=2000), end=datetime.now())
df.ta.strategy("all")

# Clean data
percent_missing = df.isnull().sum() * 100 / len(df)
missing_value_df = pd.DataFrame({'column_name': df.columns,
                                 'percent_missing': percent_missing})
for row in missing_value_df.iterrows():
  if row[1].percent_missing > 0.1:
    df.drop(columns=[row[0]], inplace=True)
df = df.dropna()
df

date,open,high,low,close,volume,Dividends,Stock Splits,AD,ABER_ATR_5_15,DMP_14,DMN_14,AMAT_LR_2,AMAT_SR_2,OBV,OBV_min_2,OBV_max_2,AOBV_LR_2,AOBV_SR_2,BOP,CDL_DOJI_10_0.1,CDL_INSIDE,ATR_14,CMO_14,LDECAY_5,DEC_1,HL2,HLC3,HA_open,HA_high,HA_low,HA_close,INC_1,LOGRET_1,MIDPOINT_2,MIDPRICE_2,NATR_14,NVI_1,OHLC4,PDIST,PCTRET_1,PSARaf_0.02_0.2,PSARr_0.02_0.2,PVI_1,PVOL,PVT,RMA_10,RSI_14,SLOPE_1,SQZ_ON,SQZ_OFF,SQZ_NO,THERMO_20_2_0.5,THERMOl_20_2_0.5,THERMOs_20_2_0.5,TRUERANGE_1,TTM_TRND_6,SUPERT_7_3.0,SUPERTd_7_3.0,VWAP,WCP
2015-04-29,41.400002,47.250000,40.650002,43.799999,57600,0,0.0,-9.061821e+04,8.218964,0.000000,0.000000,0,0,145600.0,88000.0,145600.0,0,0,3.636361e-01,0,1,8.216665,100.000000,43.799999,0,43.950001,43.900000,44.193750,47.250000,40.650002,43.275001,1,0.093257,41.850000,44.475000,18.759509,1009.774430,43.275001,12.299999,0.097744,0.02,False,1000.000000,2.522880e+06,5.630072e+05,41.952632,100.000000,3.899998,0,0,1,1.799999,0,0,7.349998,-1,19.407699,1,43.325825,43.8750
2015-04-30,44.849998,45.299999,40.200001,40.950001,22200,0,0.0,-1.062888e+05,7.106813,0.000000,3.286434,0,0,123400.0,123400.0,145600.0,0,0,-7.647057e-01,0,0,7.099907,90.078328,43.599999,1,42.750000,42.150000,43.734375,45.299999,40.200001,42.825000,0,-0.067282,42.375000,43.725000,17.337989,1003.267584,42.825000,7.349998,-0.065068,0.02,True,1000.000000,9.090900e+05,4.185552e+05,41.582657,55.960264,-2.849998,0,0,1,1.950001,0,0,5.099998,-1,21.773627,1,43.170263,41.8500
2015-05-01,41.250000,42.150002,38.250000,39.000000,20300,0,0.0,-1.187811e+05,6.220337,0.000000,13.664932,0,0,103100.0,103100.0,123400.0,0,0,-5.769229e-01,0,0,6.208934,83.408307,40.750001,1,40.200001,39.800001,43.279688,43.279688,38.250000,40.162500,0,-0.048790,39.975000,41.775000,15.920343,998.505677,40.162500,5.850002,-0.047619,0.02,False,1000.000000,7.917000e+05,3.218885e+05,40.831667,42.249989,-1.950001,0,0,1,3.149998,0,0,3.900002,-1,22.103080,1,42.806540,39.6000
2015-05-04,39.750000,40.500000,38.849998,39.750000,7800,0,0.0,-1.180720e+05,5.176003,0.000000,11.870783,0,0,110900.0,103100.0,110900.0,0,0,1.345724e-16,0,1,5.157262,83.644812,39.750000,0,39.674999,39.699999,41.721094,41.721094,38.849998,39.712500,1,0.019048,39.375000,40.200001,12.974244,1000.428754,39.712500,4.050003,0.019231,0.04,False,1000.000000,3.100500e+05,3.368885e+05,40.567530,47.570499,0.750000,0,0,1,1.650002,0,0,1.650002,-1,25.073344,1,42.682849,39.7125
2015-05-05,41.849998,41.849998,35.549999,36.150002,17300,0,0.0,-1.320768e+05,5.397063,0.000000,22.884223,0,0,93600.0,93600.0,110900.0,0,0,-9.047615e-01,0,0,5.384659,71.101633,39.550000,1,38.699999,37.850000,40.716797,41.849998,35.549999,38.849999,0,-0.094933,37.950001,38.699999,14.895321,1000.428754,38.849999,9.000000,-0.090566,0.04,False,990.943400,6.253950e+05,1.802093e+05,39.624740,32.224070,-3.599998,0,0,1,3.299999,0,0,6.299999,-1,25.073344,1,42.290690,37.4250
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2020-10-12,1.850000,1.850000,1.770000,1.810000,134400,0,0.0,-5.723683e+07,0.114509,28.161195,19.695376,1,0,73255300.0,73120900.0,73255300.0,1,0,-5.000007e-01,0,0,0.111434,-6.242337,1.810000,0,1.810000,1.810000,1.791483,1.850000,1.770000,1.820000,1,0.005540,1.805000,1.805000,6.156564,608.849302,1.820000,0.170000,0.005556,0.04,False,1300.952786,2.432640e+05,1.969373e+09,1.809469,47.234580,0.010000,1,0,0,0.020000,1,0,0.080000,1,2.001567,-1,5.992758,1.8100
2020-10-13,1.860000,1.900000,1.770000,1.880000,352100,0,0.0,-5.699307e+07,0.115542,29.009419,18.073474,1,0,73607400.0,73255300.0,73607400.0,1,0,1.538460e-01,0,0,0.112760,3.465310,1.880000,0,1.835000,1.850000,1.805741,1.900000,1.770000,1.852500,1,0.037945,1.845000,1.835000,5.997870,608.849302,1.852500,0.290000,0.038674,0.04,False,1304.820193,6.619480e+05,1.970735e+09,1.816522,52.529246,0.070000,1,0,0,0.050000,1,1,0.130000,1,2.001567,-1,5.988026,1.8575
2020-10-14,1.950000,2.100000,1.830000,2.040000,1319400,0,0.0,-5.626007e+07,0.125839,36.018808,15.262313,1,0,74926800.0,73607400.0,74926800.0,1,0,3.333332e-01,0,0,0.123991,21.116941,2.040000,0,1.965000,1.990000,1.829121,2.100000,1.829121,1.980000,1,0.081678,1.960000,1.935000,6.078009,608.849302,1.980000,0.520000,0.085106,0.04,False,1313.330829,2.691576e+06,1.981963e+09,1.838870,61.932008,0.160000,0,1,0,0.200000,0,1,0.270000,1,1.608421,1,5.970985,2.0025
2020-10-15,2.110000,2.220000,2.040000,2.150000,1091600,0,0.0,-5.601749e+07,0.129450,39.097474,13.729172,1,0,76018400.0,74926800.0,76018400.0,1,0,2.222233e-01,0,0,0.127992,30.480570,2.150000,0,2.130000,2.136667,1.904560,2.220000,1.904560,2.130000,1,0.052518,2.095000,2.025000,5.953116,614.241465,2.130000,0.390000,0.053922,0.06,False,1313.330829,2.346940e+06,1.987850e+09,1.869983,66.800738,0.110000,0,1,0,0.210000,0,1,0.180000,1,1.747218,1,5.957512,2.1400
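
The cleaning step above can also be written without the explicit loop. The sketch below is one way to do it, shown on a small made-up DataFrame; the function name `drop_sparse_columns` and the toy column `mostly_missing` are assumptions for illustration. It keeps the same 0.1 percent threshold as the cell above.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, max_missing_pct: float = 0.1) -> pd.DataFrame:
    """Drop columns with more than `max_missing_pct` percent NaNs, then drop NaN rows."""
    percent_missing = df.isnull().mean() * 100  # percent of missing values per column
    keep = percent_missing[percent_missing <= max_missing_pct].index
    return df[keep].dropna()


toy = pd.DataFrame({
    "open": [1.0, 2.0, 3.0, 4.0],
    "close": [1.5, 2.5, 3.5, 4.5],
    "mostly_missing": [np.nan, np.nan, np.nan, 1.0],  # 75% missing, gets dropped
})
cleaned = drop_sparse_columns(toy)
print(list(cleaned.columns))  # ['open', 'close']
```

Dropping sparse indicator columns first matters here: calling `dropna()` directly would throw away every row in which any indicator is still undefined.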


## Env Validation and Testing

In [None]:
env = DeepStockTraderEnv(df)
# If the environment doesn't follow the gym interface, an error will be thrown
check_env(env, warn=True)

In [None]:
import random
BUY = 0
SELL = 1
HOLD = 2

obs = env.reset()
env.render()

print(env.observation_space)
print(env.action_space)
print(env.action_space.sample())

# Random agent: pick a random action at every step
n_steps = 20
for step in range(n_steps):
  print("Step {}".format(step + 1))
  obs, reward, done, info = env.step(random.randint(0, 2))
  # print('obs=', obs, 'reward=', reward, 'done=', done)
  env.render()
  if done:
    print("Episode finished!", "reward=", reward)
    break

env.reset();

------------Step 1------------
total:   	1000.0
cash:    	1000
n_shares:	0
n_buys:  	0
n_sells:	0
n_holds:	0
Box(60,)
Discrete(3)
1
Step 1
------------Step 2------------
total:   	1000.0
cash:    	1000
n_shares:	0
n_buys:  	0
n_sells:	0
n_holds:	1
Step 2
------------Step 3------------
total:   	963.6363636363636
cash:    	0
n_shares:	24.242424242424242
n_buys:  	1
n_sells:	0
n_holds:	1
Step 3
------------Step 4------------
total:   	1014.5454175544508
cash:    	0
n_shares:	24.242424242424242
n_buys:  	2
n_sells:	0
n_holds:	1
Step 4
------------Step 5------------
total:   	840.0000369910038
cash:    	0
n_shares:	24.242424242424242
n_buys:  	2
n_sells:	0
n_holds:	2
Step 5
------------Step 6------------
total:   	869.0908720999053
cash:    	0
n_shares:	24.242424242424242
n_buys:  	2
n_sells:	0
n_holds:	3
Step 6
------------Step 7------------
total:   	861.8181633226799
cash:    	0
n_shares:	24.242424242424242
n_buys:  	3
n_sells:	0
n_holds:	3
Step 7
------------Step 8------------
total:  

## Sample Training Loop

*See trainer.py for a PyTorch example built by Seth Hamilton*

In [None]:
from stable_baselines import DQN, PPO2, A2C, ACKTR
from stable_baselines.common.cmd_util import make_vec_env

# Instantiate the env
env = DeepStockTraderEnv(df)
# wrap it
env = make_vec_env(lambda: env, n_envs=1)

In [None]:
# Train the agent
model = DQN('MlpPolicy', env, verbose=1).learn(10000)

In [None]:
# Test the trained agent
obs = env.reset()
while True:
  action, _ = model.predict(obs, deterministic=True)
  # env is a vectorized wrapper, so reward, done, and info hold one entry per env
  obs, reward, done, info = env.step(action)

  env.render(mode='console')
  if done:
    # The vectorized env resets itself automatically once the episode ends
    print("Episode finished!", "reward=", reward)
    break
env.render(mode='console')

Streaming output truncated to the last 5000 lines.
------------Step 666------------
total:   	454.4222928263937
cash:    	0
n_shares:	21.952766960278343
n_buys:  	662
n_sells:	1
n_holds:	2
------------Step 667------------
total:   	493.93725660626274
cash:    	0
n_shares:	21.952766960278343
n_buys:  	663
n_sells:	1
n_holds:	2
------------Step 668------------
total:   	540.0380755971632
cash:    	0
n_shares:	21.952766960278343
n_buys:  	664
n_sells:	1
n_holds:	2
------------Step 669------------
total:   	507.1089251567457
cash:    	0
n_shares:	21.952766960278343
n_buys:  	665
n_sells:	1
n_holds:	2
------------Step 670------------
total:   	516.9876451659229
cash:    	0
n_shares:	21.952766960278343
n_buys:  	666
n_sells:	1
n_holds:	2
------------Step 671------------
total:   	507.1089251567457
cash:    	0
n_shares:	21.952766960278343
n_buys:  	667
n_sells:	1
n_holds:	2
------------Step 672------------
total:   	536.7451689274375
cash:    	0
n_shares:	21.952766960278343
n_bu

In [None]:
# Annualized growth factor implied by turning 1000 into 1908.20 over 2000 days
big_gain = np.exp(np.log(1908.20/1000)/(2000/365))
big_gain

1.1251589010429977
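
The calculation above can be wrapped in a small helper for comparing runs. This is a sketch only: the function name `annualized_growth` is made up here, and it assumes that 1908.20 is a final account value reached from the 1000 starting cash over the 2000-day window used when downloading the data.

```python
import math


def annualized_growth(final_value: float, start_value: float, n_days: float) -> float:
    """Geometric average yearly growth factor over `n_days` calendar days."""
    years = n_days / 365
    return math.exp(math.log(final_value / start_value) / years)


growth = annualized_growth(1908.20, 1000, 2000)
print(f"{(growth - 1) * 100:.2f}% per year")
```

A factor above 1 means the account grew on average each year; subtracting 1 gives the yearly rate.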