<a href="https://colab.research.google.com/github/PSoysauce/Stock-Prediction-TAMU-Datathon-2020/blob/main/deep_stock_trader_custom_environment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Deep Stock Trader (Advanced)
*Built for TAMU Datathon 2020 with care by Seth Hamilton and Josiah Coad.*

Includes usage examples. 

---

Predicting stock market performance is a centuries-old problem. Ambitious investors have designed and redesigned thousands of trading algorithms, custom indexes, and more to gain an edge in market prediction. Some of the biggest banks in the world hire large numbers of ML experts specifically to improve their investing strategies. Some companies even specialize in this approach (see https://www.twosigma.com/).

Reinforcement Learning opens the door to a whole new approach to predicting stock prices... in that we don't have to predict prices! Instead, we model trading stocks like a game and train an agent to maximize the reward function we care about: total money gained!

In this challenge, it's your job to train an agent to make a trading decision (buy, sell, hold) to execute at the opening of the following day. You're given an array of values (e.g. open, close, low, high, SMA_10, RSI_14, etc.), your current amount of cash, and the number of shares you currently hold.

In other words, for every row/day in the historical dataset, you'll be given the tuple:

> $([v1, v2, v3, ...], cash, n\_shares)$.

Return a tuple containing one of 

> $(Action.BUY, Action.SELL, Action.HOLD)$ 

and a value $frac$ where $frac$ represents the fraction of cash to spend or shares to sell in a trade 

> $0 \leq frac \leq 1$ 

Thus, your agent's step function should return something like...

> $(Action.BUY, 0.5)$

...which can be interpreted as the decision to invest half (0.5) your current amount of cash into the market. Again, you'll return an $(Action, frac)$ tuple for every row/day in the dataset. 
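
A minimal sketch of that interface, assuming a hypothetical `Action` enum and a hypothetical `step` function (the challenge's real class and function names are not shown in this notebook). The trading rule it uses, buying with half the cash when the close is above the open, is only a placeholder:

```python
from enum import Enum

class Action(Enum):  # hypothetical stand-in for the challenge's Action type
    BUY = 0
    SELL = 1
    HOLD = 2

def step(values, cash, n_shares):
    """Return an (Action, frac) tuple for one row/day.

    values is assumed to start with [open, close, ...]; that ordering is an
    assumption made for this sketch only.
    """
    open_price, close_price = values[0], values[1]
    if close_price > open_price and cash > 0:
        return Action.BUY, 0.5    # spend half of the available cash
    if close_price < open_price and n_shares > 0:
        return Action.SELL, 1.0   # sell every share held
    return Action.HOLD, 0.0       # frac is irrelevant when holding

print(step([41.4, 43.8], cash=1000, n_shares=0))
```

Whatever rule replaces the placeholder, the return value keeps the same shape: one action plus a fraction between 0 and 1.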

This notebook does 5 things:
1. ***Creates*** a custom gym environment to make RL agent training easy
2. ***Downloads*** sample data (not the stock actually used for the challenge) and cleans it for use
3. ***Validates*** and tests the custom gym environment
4. ***Trains*** a basic agent to play the trading game
5. ***Tests*** the agent to see how much money it makes!

We recommend downloading this notebook and running it locally on the training dataset so you can train a model for your real submission. 

One last note: this custom gym environment only accepts a basic BUY, SELL, or HOLD action, not a tuple containing both an action and a fraction. You'll have to modify your final implementation to make use of the fraction feature (or don't, and simply set fraction = 1).
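
One way the fraction could be applied is sketched below. `apply_trade` is a helper introduced only for this illustration and is not part of the environment in this notebook; it assumes trades execute at a single known price and that fractional shares are allowed, as they are in the environment.

```python
BUY, SELL, HOLD = 0, 1, 2  # same integer codes the environment uses

def apply_trade(action, frac, cash, n_shares, price):
    """Return the (cash, n_shares) that result from trading at `price`."""
    if not 0 <= frac <= 1:
        raise ValueError("frac needs to be between 0 and 1")
    if action == BUY:
        spend = frac * cash                      # portion of cash to invest
        return cash - spend, n_shares + spend / price
    if action == SELL:
        sold = frac * n_shares                   # portion of shares to sell
        return cash + sold * price, n_shares - sold
    return cash, n_shares                        # HOLD leaves the account unchanged

print(apply_trade(BUY, 0.5, cash=1000, n_shares=0, price=50))
```

With `frac = 1` this reduces to the all-in buy and sell-everything behavior of the environment.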


Good luck!

## Custom Gym Environment

In [None]:
import numpy as np
import gym
from gym import spaces
import pandas as pd

In [None]:
class DeepStockTraderEnv(gym.Env):
  """
  Custom Environment that follows gym interface
  This environment enables agents to make a decision at every timestep in
  a historical stock environment.

  The reward function is defined by how much money the bot made in a particular 
  timestep. (This is 0 in cases where no shares are held)
  """

  metadata={ 'render.modes': ['console'] }

  BUY = 0
  SELL = 1
  HOLD = 2

  def __init__(self, pd_data):
    super(DeepStockTraderEnv, self).__init__()

    self.data = pd_data.values
    self.columns_map = {c.lower(): i for i, c in enumerate(pd_data.columns)}

    self.row_size = len(self.columns_map)

    min_val = np.min(self.data)
    low = np.array([min_val for i in range(self.row_size)])

    max_val = np.max(self.data)
    high = np.array([max_val for i in range(self.row_size)])

    self.observation_space = spaces.Box(low=low,
                                        high=high,
                                        shape=(self.row_size,),
                                        dtype=np.float64)

    self.action_space = spaces.Discrete(3)

    self.n_shares = 0 # num of shares currently held
    self.cash = 1000  # starting cash
    self.timestep = 0 # cur index of row/timestep in dataset
    self.n_buys = 0   # num of buys
    self.n_sells = 0  # num of sells
    self.n_holds = 0  # num of holds
    self.account_vals = [] # list tracking the account performance over time

  def reset(self):
    self.n_shares = 0 
    self.cash = 1000
    self.timestep = 1 # + 1 since we return the first observation
    self.n_buys = 0
    self.n_sells = 0
    self.n_holds = 0
    self.account_vals = []

    return np.copy(self.data[0])

  def total(self, timestep=-1):
    return self.cash + self.n_shares * self.data[timestep, self.columns_map["open"]]

  def step(self, action):
    # if frac > 1 or frac < 0:
    #     raise ValueError("frac needs to be between 0 and 1")

    # ********************** EXECUTE ACTION **********************
    open_j = self.columns_map["open"]
    close_j = self.columns_map["close"]
    if action == self.BUY:
        # spend all available cash at this timestep's opening price
        self.n_shares += self.cash / self.data[self.timestep, open_j]
        self.cash = 0
        self.n_buys += 1
    elif action == self.SELL:
        # sell every share held at this timestep's opening price
        self.cash += self.n_shares * self.data[self.timestep, open_j]
        self.n_shares = 0
        self.n_sells += 1
    elif action == self.HOLD:
        self.n_holds += 1
    else:
        raise ValueError(f"Illegal Action value: {action}")

    self.account_vals.append(self.total(self.timestep))
    # ************************************************************

    reward = self.total(self.timestep+1) - self.total(self.timestep)
    done = self.timestep+1 == len(self.data)-1
    info = {
        "n_buys": self.n_buys,
        "n_sells": self.n_sells,
        "n_holds": self.n_holds,
        "cash": self.cash,
        "n_shares": self.n_shares
    }

    self.timestep += 1

    return np.copy(self.data[self.timestep]), reward, done, info

  def render(self, mode='console'):
    if mode != 'console':
        raise NotImplementedError()
    
    print(f"------------Step {self.timestep}------------")
    print(f'total:   \t{self.total(self.timestep)}')
    print(f'cash:    \t{self.cash}')
    print(f'n_shares:\t{self.n_shares}')
    print(f'n_buys:  \t{self.n_buys}')
    print(f'n_sells:\t{self.n_sells}')
    print(f'n_holds:\t{self.n_holds}')

## Data Collection and Cleaning

In [None]:
%tensorflow_version 1.x
!pip install stable-baselines[mpi]==2.10.0

TensorFlow 1.x selected.
Collecting stable-baselines[mpi]==2.10.0
  Downloading https://files.pythonhosted.org/packages/e5/fe/db8159d4d79109c6c8942abe77c7ba6b6e008c32ae55870a35e73fa10db3/stable_baselines-2.10.0-py3-none-any.whl (248kB)
Installing collected packages: stable-baselines
  Found existing installation: stable-baselines 2.2.1
    Uninstalling stable-baselines-2.2.1:
      Successfully uninstalled stable-baselines-2.2.1
Successfully installed stable-baselines-2.10.0


In [None]:
from stable_baselines.common.env_checker import check_env

The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.



In [None]:
!pip install yfinance;
!pip install pandas-ta;

Collecting yfinance
  Downloading https://files.pythonhosted.org/packages/7a/e8/b9d7104d3a4bf39924799067592d9e59119fcfc900a425a12e80a3123ec8/yfinance-0.1.55.tar.gz
Collecting lxml>=4.5.1
  Downloading https://files.pythonhosted.org/packages/79/37/d420b7fdc9a550bd29b8cfeacff3b38502d9600b09d7dfae9a69e623b891/lxml-4.5.2-cp36-cp36m-manylinux1_x86_64.whl (5.5MB)
Building wheels for collected packages: yfinance
  Building wheel for yfinance (setup.py) ... done
  Created wheel for yfinance: filename=yfinance-0.1.55-py2.py3-none-any.whl size=22618 sha256=44345cb4c69b2375dc28316cadf0fa588607ca96226886f1e6e3043738246734
  Stored in directory: /root/.cache/pip/wheels/04/98/cc/2702a4242d60bdc14f48b4557c427ded1fe92aedf257d4565c
Successfully built yfinance
Installing collected packages: lxml, yfinance
  Found existing installation: lxml 4.2.6
    Uninstalling lxml-4.2.6:
      Successfully uninstalled lxml-4.2.6
Successfully

In [None]:
import pandas_ta as pdt
import yfinance as yf
from datetime import datetime, timedelta

# GET STOCK DATA: daily prices for roughly the last 2000 calendar days
stonk = yf.Ticker('CANF')
df = stonk.history(start=datetime.now() - timedelta(days=2000), end=datetime.now())
df.ta.strategy("all")  # append the pandas-ta indicators as new columns
print(len(df))

# Clean data: drop every column with more than 0.1% missing values,
# then drop any remaining rows that still contain NaNs
percent_missing = df.isnull().sum() * 100 / len(df)
df = df.drop(columns=percent_missing[percent_missing > 0.1].index)
df = df.dropna()
df

1379


date,open,high,low,close,volume,Dividends,Stock Splits,AD,ABER_ATR_5_15,OBV,OBV_min_2,OBV_max_2,AOBV_LR_2,AOBV_SR_2,ADX_14,DMP_14,DMN_14,AMAT_LR_2,AMAT_SR_2,BOP,ATR_14,CDL_DOJI_10_0.1,CDL_INSIDE,CMO_14,LDECAY_5,DEC_1,HL2,HLC3,HA_open,HA_high,HA_low,HA_close,INC_1,LOGRET_1,MIDPOINT_2,MIDPRICE_2,NATR_14,NVI_1,OHLC4,PDIST,PCTRET_1,PSARaf_0.02_0.2,PSARr_0.02_0.2,PVOL,PVI_1,PVT,RMA_10,RSI_14,SLOPE_1,SQZ_ON,SQZ_OFF,SQZ_NO,THERMO_20_2_0.5,THERMOl_20_2_0.5,THERMOs_20_2_0.5,TRUERANGE_1,TTM_TRND_6,SUPERT_7_3.0,SUPERTd_7_3.0,VWAP,WCP
2015-04-27,41.549999,53.549999,39.150002,49.349998,627800,0,0.0,2.636499e+05,11.296552,630900.0,3100.0,630900.0,0,0,100.000000,184.151103,0.000000,0,0,5.416667e-01,11.322223,0,0,100.000000,49.349998,0,46.350000,47.350000,32.531250,53.549999,32.531250,45.900000,1,0.416160,40.949999,42.674999,22.942702,1000.000000,45.900000,29.999996,0.516129,0.02,False,3.098193e+07,1051.612902,3.240258e+07,41.392104,100.000000,16.799999,0,0,1,20.849998,0,0,21.000000,-1,11.180767,1,47.276296,47.849999
2015-04-28,48.299999,49.049999,39.900002,39.900002,88000,0,0.0,1.756499e+05,10.638113,542900.0,542900.0,630900.0,0,0,100.000000,94.249732,0.000000,0,0,-9.180328e-01,10.651370,0,-1,92.274680,49.149998,1,44.475000,42.950001,39.215625,49.049999,39.215625,44.287500,0,-0.212561,44.625000,46.350000,26.695163,980.851069,44.287500,10.949997,-0.191489,0.02,False,3.511200e+06,1051.612902,3.071747e+07,40.841513,62.275456,-9.449997,0,0,1,4.500000,0,0,9.449997,-1,11.936814,1,46.746717,42.187501
2015-04-29,41.400002,47.250000,40.650002,43.799999,57600,0,0.0,1.730317e+05,9.729162,600500.0,542900.0,600500.0,0,0,100.000000,66.190677,0.000000,0,0,3.636361e-01,9.732145,0,1,92.404411,43.799999,0,43.950001,43.900000,41.751563,47.250000,40.650002,43.275001,1,0.093257,41.850000,44.475000,22.219510,990.625499,43.275001,12.299999,0.097744,0.04,False,2.522880e+06,1051.612902,3.128048e+07,41.701788,67.692311,3.899998,0,0,1,1.799999,0,0,7.349998,-1,14.667425,1,46.535550,43.875000
2015-04-30,44.849998,45.299999,40.200001,40.950001,22200,0,0.0,1.573611e+05,8.671386,578300.0,578300.0,600500.0,0,0,98.538271,53.651499,1.446249,0,0,-7.647057e-01,8.663584,0,0,89.907444,43.599999,1,42.750000,42.150000,42.513282,45.299999,40.200001,42.825000,0,-0.067282,42.375000,43.725000,21.156493,984.118653,42.825000,7.349998,-0.065068,0.04,False,9.090900e+05,1051.612902,3.113603e+07,41.518206,60.819548,-2.849998,0,0,1,1.950001,0,0,5.099998,-1,17.184861,1,46.413653,41.850000
2015-05-01,41.250000,42.150002,38.250000,39.000000,20300,0,0.0,1.448688e+05,7.732980,558000.0,558000.0,578300.0,0,0,92.761835,46.345869,7.079450,0,0,-5.769229e-01,7.715664,0,0,88.117741,40.750001,1,40.200001,39.800001,42.669141,42.669141,38.250000,40.162500,0,-0.048790,39.975000,41.775000,19.783754,979.356747,40.162500,5.850002,-0.047619,0.04,False,7.917000e+05,1051.612902,3.103936e+07,40.980770,56.586259,-1.950001,0,0,1,3.149998,0,0,3.900002,-1,17.917317,1,46.249725,39.600000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2020-10-07,1.830000,1.840000,1.760000,1.790000,319300,0,0.0,-5.695098e+07,0.126782,73539500.0,73539500.0,73858800.0,1,0,11.725747,27.511571,22.117838,1,0,-5.000007e-01,0.123935,0,0,-9.724037,1.790000,1,1.800000,1.796667,1.811862,1.840000,1.760000,1.805000,0,-0.027550,1.815000,1.855000,6.923722,590.262134,1.805000,0.130000,-0.027174,0.04,False,5.715470e+05,1350.870775,2.000097e+09,1.814951,45.328925,-0.050000,1,0,0,0.110000,1,1,0.080000,1,2.001567,-1,6.082077,1.795000
2020-10-08,1.770000,1.790000,1.760000,1.770000,97200,0,0.0,-5.698338e+07,0.120330,73442300.0,73442300.0,73539500.0,1,0,11.664481,27.008664,21.713527,1,0,7.401494e-15,0.117225,100,0,-11.946096,1.770000,1,1.775000,1.773333,1.808431,1.808431,1.760000,1.772500,0,-0.011236,1.780000,1.800000,6.622881,589.144816,1.772500,0.080000,-0.011173,0.04,False,1.720440e+05,1350.870775,1.999989e+09,1.810456,44.103749,-0.020000,1,0,0,0.050000,1,1,0.030000,-1,2.001567,-1,6.080719,1.772500
2020-10-09,1.780000,1.830000,1.760000,1.800000,143100,0,0.0,-5.696294e+07,0.116974,73585400.0,73442300.0,73585400.0,1,0,11.933043,28.332064,20.759939,1,0,2.857138e-01,0.113852,0,0,-7.686209,1.800000,0,1.795000,1.796667,1.790465,1.830000,1.760000,1.792500,1,0.016807,1.785000,1.795000,6.325099,589.144816,1.792500,0.130000,0.016949,0.04,False,2.575800e+05,1352.565689,2.000231e+09,1.809410,46.442166,0.030000,1,0,0,0.040000,1,1,0.070000,1,2.001567,-1,6.078731,1.797500
2020-10-12,1.850000,1.850000,1.770000,1.810000,134400,0,0.0,-5.696294e+07,0.114509,73719800.0,73585400.0,73719800.0,1,0,12.344253,28.161195,19.695376,1,0,-5.000007e-01,0.111434,0,0,-6.242337,1.810000,0,1.810000,1.810000,1.791483,1.850000,1.770000,1.820000,1,0.005540,1.805000,1.805000,6.156564,589.700371,1.820000,0.170000,0.005556,0.04,False,2.432640e+05,1352.565689,2.000306e+09,1.809469,47.234580,0.010000,1,0,0,0.020000,1,0,0.080000,1,2.001567,-1,6.076871,1.810000


## Env Validation and Testing

In [None]:
env = DeepStockTraderEnv(df)
# If the environment doesn't follow the gym interface, an error will be thrown
check_env(env, warn=True)

In [None]:
import random
BUY = 0
SELL = 1
HOLD = 2

obs = env.reset()
env.render()

print(env.observation_space)
print(env.action_space)
print(env.action_space.sample())

# Random agent: take a uniformly random action (BUY, SELL, or HOLD) at each step
n_steps = 20
for step in range(n_steps):
  print("Step {}".format(step + 1))
  obs, reward, done, info = env.step(random.randint(0, 2))
  # print('obs=', obs, 'reward=', reward, 'done=', done)
  env.render()
  if done:
    print("Goal reached!", "reward=", reward)
    break

env.reset();

------------Step 1------------
total:   	1000.0
cash:    	1000
n_shares:	0
n_buys:  	0
n_sells:	0
n_holds:	0
Box(61,)
Discrete(3)
0
Step 1
------------Step 2------------
total:   	857.1429022738519
cash:    	0
n_shares:	20.703934074448203
n_buys:  	1
n_sells:	0
n_holds:	0
Step 2
------------Step 3------------
total:   	928.5714116473056
cash:    	0
n_shares:	20.703934074448203
n_buys:  	1
n_sells:	0
n_holds:	1
Step 3
------------Step 4------------
total:   	928.5714116473056
cash:    	928.5714116473056
n_shares:	0
n_buys:  	1
n_sells:	1
n_holds:	1
Step 4
------------Step 5------------
total:   	928.5714116473056
cash:    	928.5714116473056
n_shares:	0
n_buys:  	1
n_sells:	2
n_holds:	1
Step 5
------------Step 6------------
total:   	928.5714116473056
cash:    	928.5714116473056
n_shares:	0
n_buys:  	1
n_sells:	3
n_holds:	1
Step 6
------------Step 7------------
total:   	928.5714116473056
cash:    	928.5714116473056
n_shares:	0
n_buys:  	1
n_sells:	4
n_holds:	1
Step 7
------------Step 8-

## Sample Training Loop

In [None]:
from stable_baselines import DQN, PPO2, A2C, ACKTR
from stable_baselines.common.cmd_util import make_vec_env

# Instantiate the env
env = DeepStockTraderEnv(df)
# # wrap it
# env = make_vec_env(lambda: env, n_envs=1)

In [None]:
# Train the agent
model = ACKTR('MlpLstmPolicy', env, verbose=1).learn(10000)

Wrapping the env in a DummyVecEnv.
---------------------------------
| explained_variance | -0.00079 |
| fps                | 14       |
| nupdates           | 1        |
| policy_entropy     | 1.1      |
| policy_loss        | -91      |
| total_timesteps    | 20       |
| value_loss         | 1.48e+04 |
---------------------------------
---------------------------------
| explained_variance | 0.191    |
| fps                | 24       |
| nupdates           | 100      |
| policy_entropy     | 0.753    |
| policy_loss        | -0.0248  |
| total_timesteps    | 2000     |
| value_loss         | 0.000634 |
---------------------------------
----------------------------------
| explained_variance | -4.91e-05 |
| fps                | 23        |
| nupdates           | 200       |
| policy_entropy     | 0.747     |
| policy_loss        | -12.2     |
| total_timesteps    | 4000      |
| value_loss         | 341       |
----------------------------------
---------------------------------
| ex

In [None]:
# Test the trained agent
obs = env.reset()
state = None  # recurrent (LSTM) policy state, carried from one step to the next
while True:
  action, state = model.predict(obs, state=state, deterministic=True)
  obs, reward, done, info = env.step(action)

  env.render(mode='console')
  if done:
    print("Goal reached!", "reward=", reward)
    break

env.render(mode='console')

ValueError: ignored

In [None]:
# Annualized growth factor for turning 1000 into 1908.20 over 2000 days (2000/365 years)
big_gain = np.exp(np.log(1908.20/1000)/(2000/365))
big_gain

1.1251589010429977
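
The cell above computes an annualized growth factor, $(V_{end} / V_{start})^{365 / days}$, written with `exp` and `log`. A small sketch of the same calculation as a reusable function follows; `annualized_growth` is a name introduced here and is not part of the notebook.

```python
import math

def annualized_growth(final_value, start_value, n_days):
    """Growth factor per year implied by going from start_value to final_value in n_days."""
    years = n_days / 365
    return math.exp(math.log(final_value / start_value) / years)

print(annualized_growth(1908.20, 1000, 2000))
```

A factor of about 1.125 corresponds to roughly 12.5% growth per year.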