# Building a Trading Bot for Pfizer Stock Data Using an Autoencoder and Reinforcement Learning

This notebook builds on my previous project, "Trading Bots for Pfizer Stock". I use an autoencoder (AE) together with reinforcement learning to build a trading bot for Pfizer (PFE) stock and observe its performance. First I compute a set of common technical indicators from the price data, then I train an AE to compress them into a three-dimensional representation; the aim is to filter out noise while keeping the main trends in the indicators. This compressed representation, together with the low price and the volume, serves as the environment signals for the trading bot. For comparison, I also train a baseline bot whose environment signals are the uncompressed indicators, and I compare the performance of the two models.

In [50]:
#Importing the necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import gym
import gym_anytrading
from stable_baselines.common.vec_env import DummyVecEnv
from stable_baselines import A2C
from finta import TA
from gym_anytrading.envs import StocksEnv
from sklearn.preprocessing import StandardScaler
import random
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LeakyReLU
from tensorflow.keras.layers import BatchNormalization

In [51]:
#Reading in the data
PFE_data = pd.read_csv('C:/Users/chinm/Downloads/PFE_data.csv')

#Converting the date column to datetime and setting this to the index
PFE_data['Date'] = pd.to_datetime(PFE_data['Date'])
PFE_data = PFE_data.set_index('Date')

#Viewing the data
PFE_data.head()

Date,Open,High,Low,Close,Adj Close,Volume
2017-01-03,31.024668,31.309298,30.920303,31.309298,25.928759,23391844
2017-01-04,31.432638,31.641365,31.337761,31.58444,26.156618,22753963
2017-01-05,31.660341,31.963947,31.423149,31.888046,26.408049,21083584
2017-01-06,31.935484,31.973434,31.63188,31.764706,26.305901,18418228
2017-01-09,31.717268,31.944971,31.669828,31.755219,26.298042,21559886


In [52]:
#Creating a dataframe of important financial indicators
ind= pd.DataFrame() 
ind['SMA_10'] = TA.SMA(PFE_data,10) #10-day moving average
ind['SMA_50'] = TA.SMA(PFE_data,50) #50-day moving average
ind['SMA_200'] = TA.SMA(PFE_data,200) #200-day moving average
ind['RSI'] = TA.RSI(PFE_data) #Relative strength index
ind['MACD'] = TA.MACD(PFE_data)['MACD'] #Moving average convergence divergence
ind['OBV'] = TA.OBV(PFE_data) #On-balance volume
ind['BB'] = TA.BBWIDTH(PFE_data) #Bollinger bands width
ind['Stoch_K'] = TA.STOCH(PFE_data) #Stochastic oscillator K
ind['Stoch_D'] = TA.STOCHD(PFE_data) #Stochastic oscillator D
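
To make the moving-average columns above concrete, here is a minimal pure-Python sketch of a simple moving average. It is not finta's implementation; the function name `simple_moving_average` and the toy `closes` list are hypothetical and only for illustration. It also shows that a window of length `w` has no value before row `w-1`, which is why the 200-day average only becomes available at row 199.

```python
def simple_moving_average(values, window):
    """Rolling mean of `values`; positions without a full window are None."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the value that just left the window
            running -= values[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out

# Toy prices for illustration only, not the real PFE series
closes = [31.3, 31.6, 31.9, 31.8, 31.8, 32.0]
sma_3 = simple_moving_average(closes, 3)
print(sma_3)  # the first two entries are None, the rest are 3-day means
```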

In [53]:
#Processing the dataframe
ind = ind.iloc[199:] #Beginning the dataframe at day 200, because we are working with 200-day MA
ind = ind.fillna(0) 
ind_norm = StandardScaler().fit_transform(ind)
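
As a rough illustration of what `StandardScaler` does to each indicator column, the sketch below rescales a toy column to zero mean and unit variance using the population standard deviation, which is what scikit-learn uses. The helper `standardize` and the toy values are hypothetical.

```python
import statistics

def standardize(column):
    """Z-score a list of numbers: subtract the mean, divide by the population std."""
    mean = statistics.mean(column)
    std = statistics.pstdev(column)  # population std (ddof=0)
    return [(x - mean) / std for x in column]

# Toy RSI-like values for illustration only
rsi_toy = [30.0, 50.0, 70.0]
z = standardize(rsi_toy)
print(z)
```

After this step every indicator is on the same scale, so a large-valued column such as OBV does not dominate the reconstruction loss of the autoencoder.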

In [54]:
#Building and fitting the autoencoder model

#Setting a seed for reproducibility
np.random.seed(2021)
tf.random.set_random_seed(2021)

X = ind_norm.copy()
inp_dim = X.shape[1]

#encoder
visible = Input(shape=(inp_dim,))
a = Dense(6)(visible)
a = BatchNormalization()(a)
a = LeakyReLU()(a)
#bottleneck
bottleneck = Dense(3)(a)
#decoder
b = Dense(6)(bottleneck)
b = BatchNormalization()(b)
b = LeakyReLU()(b)
#output
output = Dense(inp_dim, activation='linear')(b)

#Fitting the model
model = Model(inputs=visible, outputs=output)
model.compile(optimizer='adam', loss='mse')
history = model.fit(X, X, epochs=400, batch_size=50, verbose=2)

Train on 808 samples
Epoch 1/400
808/808 - 2s - loss: 1.1367
Epoch 2/400
808/808 - 0s - loss: 1.0469
Epoch 3/400
808/808 - 0s - loss: 0.9816
Epoch 4/400
808/808 - 0s - loss: 0.9262
Epoch 5/400
808/808 - 0s - loss: 0.8841
Epoch 6/400
808/808 - 0s - loss: 0.8367
Epoch 7/400
808/808 - 0s - loss: 0.7895
Epoch 8/400
808/808 - 0s - loss: 0.7471
Epoch 9/400
808/808 - 0s - loss: 0.7090
Epoch 10/400
808/808 - 0s - loss: 0.6705
Epoch 11/400
808/808 - 0s - loss: 0.6333
Epoch 12/400
808/808 - 0s - loss: 0.6060
Epoch 13/400
808/808 - 0s - loss: 0.5738
Epoch 14/400
808/808 - 0s - loss: 0.5465
Epoch 15/400
808/808 - 0s - loss: 0.5181
Epoch 16/400
808/808 - 0s - loss: 0.4976
Epoch 17/400
808/808 - 0s - loss: 0.4726
Epoch 18/400
808/808 - 0s - loss: 0.4562
Epoch 19/400
808/808 - 0s - loss: 0.4288
Epoch 20/400
808/808 - 0s - loss: 0.4009
Epoch 21/400
808/808 - 0s - loss: 0.3869
Epoch 22/400
808/808 - 0s - loss: 0.3701
Epoch 23/400
808/808 - 0s - loss: 0.3504
Epoch 24/400
808/808 - 0s - loss: 0.3366
...

808/808 - 0s - loss: 0.1580
Epoch 199/400
808/808 - 0s - loss: 0.1666
Epoch 200/400
808/808 - 0s - loss: 0.1702
Epoch 201/400
808/808 - 0s - loss: 0.1650
Epoch 202/400
808/808 - 0s - loss: 0.1655
Epoch 203/400
808/808 - 0s - loss: 0.1664
Epoch 204/400
808/808 - 0s - loss: 0.1643
Epoch 205/400
808/808 - 0s - loss: 0.1568
Epoch 206/400
808/808 - 0s - loss: 0.1677
Epoch 207/400
808/808 - 0s - loss: 0.1584
Epoch 208/400
808/808 - 0s - loss: 0.1626
Epoch 209/400
808/808 - 0s - loss: 0.1620
Epoch 210/400
808/808 - 0s - loss: 0.1632
Epoch 211/400
808/808 - 0s - loss: 0.1596
Epoch 212/400
808/808 - 0s - loss: 0.1613
Epoch 213/400
808/808 - 0s - loss: 0.1730
Epoch 214/400
808/808 - 0s - loss: 0.1579
Epoch 215/400
808/808 - 0s - loss: 0.1573
Epoch 216/400
808/808 - 0s - loss: 0.1619
Epoch 217/400
808/808 - 0s - loss: 0.1725
Epoch 218/400
808/808 - 0s - loss: 0.1617
Epoch 219/400
808/808 - 0s - loss: 0.1639
Epoch 220/400
808/808 - 0s - loss: 0.1589
Epoch 221/400
808/808 - 0s - loss: 0.1648
...

Epoch 394/400
808/808 - 0s - loss: 0.1483
Epoch 395/400
808/808 - 0s - loss: 0.1436
Epoch 396/400
808/808 - 0s - loss: 0.1392
Epoch 397/400
808/808 - 0s - loss: 0.1358
Epoch 398/400
808/808 - 0s - loss: 0.1486
Epoch 399/400
808/808 - 0s - loss: 0.1442
Epoch 400/400
808/808 - 0s - loss: 0.1348
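
One way to read these loss values: because the inputs are standardized, a "reconstruction" that always outputs zero (the mean of every column) has an MSE of exactly 1, which is roughly where training starts. The final training loss of 0.1348 therefore corresponds to about 13% of the standardized variance left unreconstructed, on the training data. The sketch below checks the MSE-of-1 claim on a toy matrix; the helper names and the toy values are hypothetical.

```python
import statistics

def standardize_columns(rows):
    """Z-score each column of a row-major matrix (population std)."""
    scaled_cols = []
    for col in zip(*rows):
        mean = statistics.mean(col)
        std = statistics.pstdev(col)
        scaled_cols.append([(x - mean) / std for x in col])
    return [list(r) for r in zip(*scaled_cols)]

def mse(target, prediction):
    """Mean squared error averaged over all samples and features."""
    total = sum((t - p) ** 2
                for t_row, p_row in zip(target, prediction)
                for t, p in zip(t_row, p_row))
    count = sum(len(row) for row in target)
    return total / count

# Toy data for illustration only
toy = [[1.0, 10.0], [2.0, 30.0], [4.0, 20.0], [5.0, 40.0]]
X_toy = standardize_columns(toy)
zeros = [[0.0] * len(row) for row in X_toy]

baseline_mse = mse(X_toy, zeros)   # predicting the column mean everywhere
final_loss = 0.1348                # last epoch in the training log above
unexplained_fraction = final_loss / baseline_mse
print(baseline_mse, unexplained_fraction)
```

This is only an approximation for the real model, since the logged loss is averaged over batches while the weights are still changing.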


In [55]:
#Extracting the compressed representation of the indicators
encoder = Model(inputs=visible, outputs=bottleneck)
bottleneck_rep = encoder.predict(X)
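
For a sense of scale, the sketch below counts the parameters of the architecture above by hand, assuming the nine indicator columns as input. A `Dense` layer has `n_in * n_out` weights plus `n_out` biases, `BatchNormalization` keeps four values per unit (two of them non-trainable moving statistics), and `LeakyReLU` has none. The helper names are hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def batchnorm_params(units):
    """gamma, beta, moving mean and moving variance for each unit."""
    return 4 * units

inp_dim = 9  # nine indicator columns

layers = [
    ("Dense 9->6", dense_params(inp_dim, 6)),
    ("BatchNormalization(6)", batchnorm_params(6)),
    ("Dense 6->3 (bottleneck)", dense_params(6, 3)),
    ("Dense 3->6", dense_params(3, 6)),
    ("BatchNormalization(6)", batchnorm_params(6)),
    ("Dense 6->9 (output)", dense_params(6, inp_dim)),
]

total_params = sum(n for _, n in layers)
encoder_params = sum(n for _, n in layers[:3])  # the part reused by `encoder`
for name, n in layers:
    print(f"{name:26s}{n:4d}")
print("total:", total_params, "encoder only:", encoder_params)
```

The model is small relative to the 808 training rows, and the encoder compresses each row from nine values to three.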

In [56]:
#Creating a new array using the bottleneck representation to serve as environment signals
b = pd.DataFrame(bottleneck_rep)
d = PFE_data.copy().iloc[199:]
d = d[['Low','Volume']]
d['AE Col 1'] = b[0].values
d['AE Col 2'] = b[1].values
d['AE Col 3'] = b[2].values
env_sig = StandardScaler().fit_transform(d)
env_df = pd.DataFrame(env_sig)
env_df.columns = ['Low','Volume','AE Col 1','AE Col 2','AE Col 3']

In [97]:
#Creating the AE-powered reinforcement learning bot
wind_size = 20

#Defining a function that will yield the signals of the environment
def env_signals(env):
    start_index = env.frame_bound[0]-env.window_size
    end_index = env.frame_bound[1]
    prices = env.df.iloc[start_index:end_index]['Low'].to_numpy()
    signal_features = env.df.iloc[start_index:end_index].to_numpy()
    return prices, signal_features

#Creating an environment class
class environment(StocksEnv):
    _process_data = env_signals

#Building the environment
env = environment(df=env_df,window_size=wind_size,frame_bound=(wind_size,100))
make_env = lambda: env
env = DummyVecEnv([make_env])

#Training the model
model_20 = A2C('MlpLstmPolicy', env, verbose=1) 
model_20.learn(total_timesteps=40000)

---------------------------------
| explained_variance | -95.8    |
| fps                | 10       |
| nupdates           | 1        |
| policy_entropy     | 0.692    |
| total_timesteps    | 5        |
| value_loss         | 0.00661  |
---------------------------------
---------------------------------
| explained_variance | -0.47    |
| fps                | 166      |
| nupdates           | 100      |
| policy_entropy     | 0.692    |
| total_timesteps    | 500      |
| value_loss         | 0.046    |
---------------------------------
---------------------------------
| explained_variance | -122     |
| fps                | 176      |
| nupdates           | 200      |
| policy_entropy     | 0.693    |
| total_timesteps    | 1000     |
| value_loss         | 0.0236   |
---------------------------------
---------------------------------
| explained_variance | 0.503    |
| fps                | 184      |
| nupdates           | 300      |
| policy_entropy     | 0.692    |
...

---------------------------------
| explained_variance | -2.76    |
| fps                | 196      |
| nupdates           | 3100     |
| policy_entropy     | 0.519    |
| total_timesteps    | 15500    |
| value_loss         | 0.00143  |
---------------------------------
---------------------------------
| explained_variance | -0.869   |
| fps                | 196      |
| nupdates           | 3200     |
| policy_entropy     | 0.564    |
| total_timesteps    | 16000    |
| value_loss         | 0.0126   |
---------------------------------
---------------------------------
| explained_variance | -33.1    |
| fps                | 196      |
| nupdates           | 3300     |
| policy_entropy     | 0.0733   |
| total_timesteps    | 16500    |
| value_loss         | 0.00134  |
---------------------------------
---------------------------------
| explained_variance | -2.41    |
| fps                | 196      |
| nupdates           | 3400     |
| policy_entropy     | 0.583    |
...

---------------------------------
| explained_variance | -2.21    |
| fps                | 197      |
| nupdates           | 6200     |
| policy_entropy     | 0.41     |
| total_timesteps    | 31000    |
| value_loss         | 0.00036  |
---------------------------------
---------------------------------
| explained_variance | -2.38    |
| fps                | 197      |
| nupdates           | 6300     |
| policy_entropy     | 0.00868  |
| total_timesteps    | 31500    |
| value_loss         | 0.000435 |
---------------------------------
---------------------------------
| explained_variance | -27      |
| fps                | 197      |
| nupdates           | 6400     |
| policy_entropy     | 0.449    |
| total_timesteps    | 32000    |
| value_loss         | 0.00619  |
---------------------------------
---------------------------------
| explained_variance | -2.47    |
| fps                | 197      |
| nupdates           | 6500     |
| policy_entropy     | 0.216    |
...

<stable_baselines.a2c.a2c.A2C at 0x1a3a81216c8>
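
The `policy_entropy` column in the log above is a quick way to see how decisive the policy has become. Assuming the two-action (Sell/Buy) space of gym_anytrading's `StocksEnv`, a policy that picks both actions with equal probability has entropy ln 2 ≈ 0.693, which matches the first rows of the log; values near zero mean the policy almost always picks the same action. The helper `entropy` below is a hypothetical illustration, not part of stable_baselines.

```python
import math

def entropy(probs):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

uniform = entropy([0.5, 0.5])        # undecided between Sell and Buy
confident = entropy([0.999, 0.001])  # almost always the same action

print(round(uniform, 3), round(confident, 4))
```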

In [58]:
#Testing the AE_powered RL model and viewing its profit
env = environment(df=env_df, window_size=wind_size, frame_bound=(100,150))
env.seed(2021)
obs = env.reset()
while True: 
    obs = obs[np.newaxis, ...]
    action, _states = model_20.predict(obs)
    obs, rewards, done, info = env.step(action)
    if done:
        print("The total profit after 50 days is ",info['total_profit'])
        break

The total profit after 50 days is  2.018200548603831
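
The `frame_bound` and `window_size` arguments used in the training and test cells above determine which rows of the dataframe `env_signals` hands to the environment. The sketch below repeats that index arithmetic on its own; `signal_rows` is a hypothetical helper that mirrors the slicing in `env_signals`.

```python
def signal_rows(frame_bound, window_size):
    """Row positions selected by env_signals for a given environment."""
    start_index = frame_bound[0] - window_size
    end_index = frame_bound[1]
    return range(start_index, end_index)

wind_size = 20
train_rows = signal_rows((wind_size, 100), wind_size)
test_rows = signal_rows((100, 150), wind_size)

# Rows that appear in both slices
lookback_overlap = sorted(set(train_rows) & set(test_rows))

print("train:", train_rows[0], "to", train_rows[-1])
print("test: ", test_rows[0], "to", test_rows[-1])
print("shared rows:", lookback_overlap[0], "to", lookback_overlap[-1])
```

The slice starts `window_size` rows before `frame_bound[0]` so that the first observation has a full window of history. As a result, the test environment shares its first 20 rows with the end of the training slice.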


In [83]:
#Creating a dataframe for the baseline model
merged = pd.merge(PFE_data.iloc[199:],ind,on='Date',how='left')
columns = list(merged.columns[6:])
columns.insert(0,'Low')
columns.insert(1,'Volume')
merged = merged[columns]
merged = merged.fillna(0)
merged_norm = StandardScaler().fit_transform(merged)
baseline_env_df = pd.DataFrame(merged_norm)
baseline_env_df.columns = merged.columns
baseline_env_df.index = merged.index
baseline_env_df

Date,Low,Volume,SMA_10,SMA_50,SMA_200,RSI,MACD,OBV,BB,Stoch_K,Stoch_D
2017-10-17,-0.724786,-0.799317,-0.711039,-1.251479,-1.664942,0.804564,0.632641,-1.213382,-0.960629,0.542766,0.525008
2017-10-18,-0.712523,-0.770788,-0.715115,-1.233854,-1.659566,0.099926,0.544598,-1.279149,-0.957472,-0.610238,-0.022266
2017-10-19,-0.697192,-0.386057,-0.707590,-1.214570,-1.653963,0.647575,0.523659,-1.188881,-0.947571,0.420904,0.124050
2017-10-20,-0.638937,-0.549879,-0.695989,-1.193834,-1.648626,0.856601,0.524656,-1.109045,-0.922519,0.818839,0.223071
2017-10-23,-0.556151,-0.717685,-0.687836,-1.172062,-1.643080,0.818092,0.515976,-1.178194,-0.941767,0.263194,0.536350
...,...,...,...,...,...,...,...,...,...,...,...
2020-12-24,0.329252,-0.798410,0.547783,0.248972,-0.487492,-0.629651,-0.046667,-1.106263,1.500409,-1.602254,-1.797232
2020-12-28,0.167669,0.030901,0.405673,0.264581,-0.475909,-0.830183,-0.238963,-1.223085,1.544971,-1.847759,-1.781578
2020-12-29,0.203217,-0.230160,0.334288,0.272189,-0.459060,-0.691443,-0.355151,-1.122888,1.638685,-1.722279,-1.857894
2020-12-30,0.174132,-0.112072,0.269182,0.278575,-0.446593,-0.838422,-0.486795,-1.230606,1.772777,-1.879715,-1.957412


In [98]:
#Creating the baseline-model trading bot

#Defining a function that will yield the signals of the environment
def env_signals(env):
    start_index = env.frame_bound[0]-env.window_size
    end_index = env.frame_bound[1]
    prices = env.df.iloc[start_index:end_index]['Low'].to_numpy()
    signal_features = env.df.iloc[start_index:end_index].to_numpy()
    return prices, signal_features

#Creating an environment class
class environment(StocksEnv):
    _process_data = env_signals

#Building the environment
env = environment(df=baseline_env_df,window_size=wind_size,frame_bound=(wind_size,100))
make_env = lambda: env
env = DummyVecEnv([make_env])

#Training the model
model_20 = A2C('MlpLstmPolicy', env, verbose=1) 
model_20.learn(total_timesteps=40000)

---------------------------------
| explained_variance | -1.46    |
| fps                | 10       |
| nupdates           | 1        |
| policy_entropy     | 0.693    |
| total_timesteps    | 5        |
| value_loss         | 0.000513 |
---------------------------------
---------------------------------
| explained_variance | -2.64    |
| fps                | 174      |
| nupdates           | 100      |
| policy_entropy     | 0.692    |
| total_timesteps    | 500      |
| value_loss         | 0.00618  |
---------------------------------
---------------------------------
| explained_variance | -9.03    |
| fps                | 188      |
| nupdates           | 200      |
| policy_entropy     | 0.693    |
| total_timesteps    | 1000     |
| value_loss         | 0.000757 |
---------------------------------
---------------------------------
| explained_variance | -0.623   |
| fps                | 192      |
| nupdates           | 300      |
| policy_entropy     | 0.692    |
...

---------------------------------
| explained_variance | -4.61    |
| fps                | 201      |
| nupdates           | 3100     |
| policy_entropy     | 0.659    |
| total_timesteps    | 15500    |
| value_loss         | 0.00445  |
---------------------------------
---------------------------------
| explained_variance | 0.175    |
| fps                | 201      |
| nupdates           | 3200     |
| policy_entropy     | 0.609    |
| total_timesteps    | 16000    |
| value_loss         | 0.000607 |
---------------------------------
---------------------------------
| explained_variance | 0.503    |
| fps                | 200      |
| nupdates           | 3300     |
| policy_entropy     | 0.677    |
| total_timesteps    | 16500    |
| value_loss         | 0.0451   |
---------------------------------
---------------------------------
| explained_variance | -1.9     |
| fps                | 200      |
| nupdates           | 3400     |
| policy_entropy     | 0.562    |
...

---------------------------------
| explained_variance | -0.0742  |
| fps                | 199      |
| nupdates           | 6200     |
| policy_entropy     | 0.303    |
| total_timesteps    | 31000    |
| value_loss         | 0.00229  |
---------------------------------
---------------------------------
| explained_variance | -27.9    |
| fps                | 199      |
| nupdates           | 6300     |
| policy_entropy     | 0.0777   |
| total_timesteps    | 31500    |
| value_loss         | 0.000898 |
---------------------------------
---------------------------------
| explained_variance | -23.8    |
| fps                | 198      |
| nupdates           | 6400     |
| policy_entropy     | 0.577    |
| total_timesteps    | 32000    |
| value_loss         | 0.00358  |
---------------------------------
---------------------------------
| explained_variance | -0.4     |
| fps                | 198      |
| nupdates           | 6500     |
| policy_entropy     | 0.377    |
...

<stable_baselines.a2c.a2c.A2C at 0x1a3a97c3088>

In [96]:
#Testing the baseline RL model and viewing its profit
env = environment(df=baseline_env_df, window_size=wind_size, frame_bound=(100,150))
env.seed(2021)
obs = env.reset()
while True: 
    obs = obs[np.newaxis, ...]
    action, _states = model_20.predict(obs)
    obs, rewards, done, info = env.step(action)
    if done:
        print("The total profit after 50 days is ",info['total_profit'])
        break

The total profit after 50 days is  0.9790724958748944
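
To compare the two printed results on the same footing, the sketch below converts each `total_profit` value into a percentage change. It treats `total_profit` as a multiplier on a starting balance of 1.0, which is my understanding of how gym_anytrading reports it; `percent_return` is a hypothetical helper.

```python
def percent_return(total_profit):
    """Percentage gain or loss implied by a profit multiplier that starts at 1.0."""
    return (total_profit - 1.0) * 100.0

# Values printed by the two test cells above
ae_return = percent_return(2.018200548603831)
baseline_return = percent_return(0.9790724958748944)

print(f"AE-powered bot: {ae_return:+.1f}%")
print(f"Baseline bot:   {baseline_return:+.1f}%")
```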


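Over the 50-day test window, the baseline model loses about 2% (total profit 0.979), while the AE-powered trading bot roughly doubles the initial investment (total profit 2.018). Both environments read their prices from the standardized Low column, so these figures are ratios of scaled prices and are best read as a relative comparison between the two bots, not as dollar returns. On this test, the autoencoder appears effective at reducing noise while extracting the important trends from the dataframe of financial indicators. A single test window is limited evidence, however. Refining the autoencoder by adopting a different architecture and tuning its hyperparameters may increase performance, and this will be the focus of a future project of mine.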