### Machine Learning in the Stock Market

The objective of this project is to use a machine learning model to generate price-prediction insights. We will use a Long Short-Term Memory (LSTM) neural network to predict potential increases or decreases in price. We chose an LSTM for its ability to incorporate both current and past data in the model.

Credit: Blankly Finance is a package that allows you to build trading strategies and backtest them. I used the Blankly library, docs, and tutorials to build this project out. 

In [1]:
import blankly
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.nn import LSTM
from torch.autograd import Variable

#### Preparing the Data

We will use historical price data from the last year and split it into smaller episodes with the sliding window technique. Each episode pairs a short run of inputs with the values that immediately follow it, which the model learns to predict.

In [2]:
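def gen_episode(data, length, output):
    """Slide a window of `length` points over `data`.

    The first `length - output` points of each window become the inputs (x)
    and the last `output` points become the targets (y).
    """
    x = []
    y = []
    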
    for i in range(len(data) - length):
        x_curr = data[i : (i + length - output)]
        y_curr = data[i + length - output : i + length]
        x.append(x_curr)
        y.append(y_curr)
        
    return np.array(x), np.array(y)
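
To make the windowing concrete, here is a small sketch that runs the same logic on toy data. The list `toy` and the printed shapes are illustrative only; the function body repeats the cell above so the snippet runs on its own.

```python
import numpy as np

def gen_episode(data, length, output):
    """Same windowing logic as the notebook cell, repeated so this runs standalone."""
    x, y = [], []
    for i in range(len(data) - length):
        x.append(data[i : i + length - output])           # first 5 points: inputs
        y.append(data[i + length - output : i + length])  # last 3 points: targets
    return np.array(x), np.array(y)

toy = list(range(10))  # stand-in for a price series: 0, 1, ..., 9
x, y = gen_episode(toy, length=8, output=3)

print(x.shape, y.shape)  # (2, 5) (2, 3)
print(x[0], y[0])        # [0 1 2 3 4] [5 6 7]
print(x[1], y[1])        # [1 2 3 4 5] [6 7 8]
```

With 10 points and a window of 8, the loop runs for `i = 0` and `i = 1`, so the final possible window (inputs `2..6`, targets `7..9`) is not generated.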

#### Creating our Model

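Now we will initialize our LSTM model. Its inputs are the ratio of each closing price to the previous one plus two technical indicators, RSI and MACD (both the MACD line and its signal line).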
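
For readers unfamiliar with the two indicators, here is a plain-NumPy sketch of how RSI and MACD are commonly computed. It is not Blankly's implementation: `ema`, `macd_sketch`, and `rsi_sketch` are hypothetical helper names, the 12/26/9 and 14-period settings are the conventional defaults and are assumed here, and seeding choices differ between libraries, so the values will not match `blankly.indicators` exactly.

```python
import numpy as np

def ema(values, period):
    """Exponential moving average, seeded with the first value (an assumption)."""
    values = np.asarray(values, dtype=float)
    alpha = 2.0 / (period + 1)
    out = np.empty_like(values)
    out[0] = values[0]
    for i in range(1, len(values)):
        out[i] = alpha * values[i] + (1 - alpha) * out[i - 1]
    return out

def macd_sketch(prices, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line), one value per input price."""
    macd_line = ema(prices, fast) - ema(prices, slow)
    signal_line = ema(macd_line, signal)
    return macd_line, signal_line

def rsi_sketch(prices, period=14):
    """Wilder-smoothed RSI; returns len(prices) - period values."""
    deltas = np.diff(np.asarray(prices, dtype=float))
    gains = np.clip(deltas, 0, None)
    losses = np.clip(-deltas, 0, None)
    avg_gain = gains[:period].mean()
    avg_loss = losses[:period].mean()
    out = []
    for i in range(period, len(deltas) + 1):
        if i > period:
            # Wilder smoothing: blend the newest change into the running averages.
            avg_gain = (avg_gain * (period - 1) + gains[i - 1]) / period
            avg_loss = (avg_loss * (period - 1) + losses[i - 1]) / period
        if avg_loss == 0:
            out.append(100.0)  # no losses in the lookback: RSI saturates at 100
        else:
            rs = avg_gain / avg_loss
            out.append(100.0 - 100.0 / (1.0 + rs))
    return np.array(out)

rising = np.arange(1.0, 41.0)  # 40 steadily rising toy prices
macd_line, signal_line = macd_sketch(rising)
rsi = rsi_sketch(rising)
print(len(macd_line), len(rsi))  # 40 26
```

The RSI series here is 14 values shorter than the price series. Indicator outputs that are shorter than their input are presumably why the notebook code offsets each series (`range(25, ...)`, `rsi[11:]`) before windowing, so that all four features line up on the same days.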

In [3]:
def init_model(symbol, state: blankly.StrategyState):
    
    'Setting up interface with Blankly tools '
    interface = state.interface
    resolution = state.resolution
    variables = state.variables
    
    variables['history'] = interface.history(symbol, 300, resolution, return_as='list')['close']
    
    'Setting up indicators '
    rsi = blankly.indicators.rsi(state.variables['history'])
    macd = blankly.indicators.macd(state.variables['history'])
    
    'Each window has 8 points: 5 input steps followed by 3 target steps'
    seq_length = 8
    output_length = 3
    
    'Normalizing and gathering the data'
    
    'We start at index 25 (the 26th point): MACD uses a 26-period moving average, so it needs at least 26 data points before producing a value'
    x = [variables['history'][i] / variables['history'][i-1] for i in range(25, len(variables['history']))] 
    x , y = gen_episode(x, seq_length, output_length)
    y = Variable(torch.Tensor(np.array(y))).unsqueeze(0)
    
    rsi_x = rsi[11:]
    rsi_x,_ = gen_episode(rsi_x, seq_length, output_length)
    
    macd_values,_ = gen_episode(macd[0], seq_length, output_length)
    macd_signals,_ = gen_episode(macd[1], seq_length, output_length)
    
    'Put all the features into one data structure'
    x_all = np.zeros((len(x), seq_length-output_length, 4))
    
    for i in range(len(x)):
        for j in range(seq_length - output_length):
            x_all[i][j][0] = x[i][j]
            x_all[i][j][1] = rsi_x[i][j]
            x_all[i][j][2] = macd_values[i][j]
            x_all[i][j][3] = macd_signals[i][j]
            
    x_final = Variable(torch.Tensor(x_all))
    
    
    'Training our LSTM model'
    num_epochs = 10000
    learning_rate = 0.0003
    
    state.lstm = LSTM(4, 20, batch_first = True)
    state.linear = nn.Linear(20, 3)
    criterion = torch.nn.MSELoss()
    
    'Optimizer for LSTM and Linear Layer'
    optimizer = torch.optim.Adam([
        {'params': state.lstm.parameters()},
        {'params': state.linear.parameters()},],
        lr=learning_rate)
    
    'Run the model for each epoch'
    for epoch in range(num_epochs):
        outputs, (h_n, c_n) = state.lstm(x_final)
        out = state.linear(h_n)
        'Sigmoid plus 0.5 maps the linear output into (0.5, 1.5), the scale of a price ratio: above 1 predicts an increase, below 1 a decrease'
        out = torch.sigmoid(out) + 0.5
        optimizer.zero_grad()
        
        'Calculate our loss and backpropagate'
        loss = criterion(out, y)
        loss.backward()
        optimizer.step()
        
        'Display loss function every 500 epochs to ensure our model is effectively training'
        if epoch % 500 == 0:
            print("Epoch: %d -> Loss: %1.5f" % (epoch, loss.item()))
            
    'Running [sum, count] slots used to average the one-, two-, and three-day-out predictions made for the same day'
    state.three_readings = [[0,0], [0,0], [0,0]]
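
The tensor shapes in the training loop are easy to lose track of, so here is a minimal sketch of the same layer stack on random data. The sizes `n_windows`, `n_steps`, and `n_features` and the random input are assumptions for illustration; only the layer dimensions (4 inputs, 20 hidden units, 3 outputs) come from the code above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical batch: 16 windows, 5 time steps each, 4 features per step.
n_windows, n_steps, n_features = 16, 5, 4
x = torch.randn(n_windows, n_steps, n_features)

lstm = nn.LSTM(input_size=4, hidden_size=20, batch_first=True)
linear = nn.Linear(20, 3)

with torch.no_grad():
    outputs, (h_n, c_n) = lstm(x)
    # h_n is the hidden state after the last time step of every window.
    pred = torch.sigmoid(linear(h_n)) + 0.5

print(outputs.shape)  # torch.Size([16, 5, 20])
print(h_n.shape)      # torch.Size([1, 16, 20])
print(pred.shape)     # torch.Size([1, 16, 3])
```

`h_n` keeps a leading layer dimension even with `batch_first=True`, which is why the targets `y` are given an extra leading axis with `unsqueeze(0)` before the loss is computed. Because the sigmoid output lies between 0 and 1, adding 0.5 keeps every prediction between 0.5 and 1.5, the same scale as the price ratios used as targets.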

#### Using our Model to create a Strategy

We will take the outputs of our LSTM model, build a buy/sell test strategy from them, and backtest it.

In [4]:
def price_event(price, symbol, state: blankly.StrategyState):
    
    'Dynamically add current price to the history'
    state.variables['history'].append(price)
    
    'Extract the data from the last 5 days and feed it to our model'
    into = [state.variables['history'][i]/state.variables['history'][i-1] for i in range(-5, 0)]
    rsi = blankly.indicators.rsi(state.variables['history'])
    rsi_ = np.array(rsi[-5:])
    macd = blankly.indicators.macd(state.variables['history'])
    macd_values = np.array(macd[0][-5:])
    macd_signals = np.array(macd[1][-5:])
    
    pred = np.zeros((1, len(into), 4))
    
    for i in range(len(into)):
        pred[0][i][0] = into[i]
        pred[0][i][1] = rsi_[i]
        pred[0][i][2] = macd_values[i]
        pred[0][i][3] = macd_signals[i]
    
    pred = torch.Tensor(pred)
    
    'Run the model'
    out, (h, c) = state.lstm(pred)
    out = state.linear(h)
    out = torch.sigmoid(out) + 0.5
    
    'Add the one-, two-, and three-day-out predictions to their slots so each day is averaged over several forecasts instead of relying on a single one'
    state.three_readings[0][0] += out[0][0][0]
    state.three_readings[0][1] += 1
    state.three_readings[1][0] += out[0][0][1]
    state.three_readings[1][1] += 1
    state.three_readings[2][0] += out[0][0][2]
    state.three_readings[2][1] += 1
    
    avgprice = state.three_readings[0][0] / state.three_readings[0][1]
    
    'Create our buy/sell logic'
    'If the averaged prediction is above 1 (we think the price will go up), we buy in proportion to how far it is above 1. The same rule applies to selling when it is below 1'
    
    value = blankly.trunc(state.interface.account[state.base_asset].available, 2)
    
    if avgprice > 1:
        buy = blankly.trunc(state.interface.cash * 2 * (avgprice.item() - 1) / price, 2)
        if buy > 0:
            state.interface.market_order(symbol, side='buy', size=buy)
    elif value > 0:
        sell = blankly.trunc(value * 2 * (1 - avgprice.item()), 2)
        if sell > 0:
            state.interface.market_order(symbol, side='sell', size=sell)
    
    state.three_readings = [state.three_readings[1], state.three_readings[2], [0,0]]
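
The averaging and order-sizing rules can be traced in plain Python without Blankly or a trained model. In this sketch, `trunc2`, `update_and_average`, and `order_size` are hypothetical helpers, `trunc2` is only a stand-in for `blankly.trunc(value, 2)`, and all forecasts and prices are made-up numbers.

```python
import math

def trunc2(value):
    """Truncate to two decimal places (stand-in for blankly.trunc(value, 2))."""
    return math.trunc(value * 100) / 100

def update_and_average(readings, preds):
    """Add today's forecasts to their slots, average slot 0, then shift the slots.

    readings: three [sum, count] slots for one, two, and three steps ahead.
    preds: today's forecasts for one, two, and three steps ahead.
    """
    for slot, p in zip(readings, preds):
        slot[0] += p
        slot[1] += 1
    avg = readings[0][0] / readings[0][1]
    # Tomorrow, the two-steps-ahead slot becomes the one-step-ahead slot.
    return avg, [readings[1], readings[2], [0.0, 0]]

def order_size(avg, cash, holdings, price):
    """Size an order in proportion to how far the averaged forecast is from 1."""
    if avg > 1:
        return "buy", trunc2(cash * 2 * (avg - 1) / price)
    if holdings > 0:
        return "sell", trunc2(holdings * 2 * (1 - avg))
    return "hold", 0.0

readings = [[0.0, 0], [0.0, 0], [0.0, 0]]
daily_preds = [(1.02, 1.01, 0.99), (1.00, 1.03, 1.02), (0.98, 1.00, 1.00)]
averages = []
for preds in daily_preds:
    avg, readings = update_and_average(readings, preds)
    averages.append(avg)

print(averages)
print(order_size(1.01, cash=100000, holdings=0, price=150))
print(order_size(0.98, cash=0, holdings=10, price=150))
```

By the third day, the average combines three forecasts for the same target day: the three-steps-ahead value from day one (0.99), the two-steps-ahead value from day two (1.03), and the one-step-ahead value from day three (0.98). The factor of 2 in the sizing means a forecast of 1.01 commits 2% of available cash, and a forecast of 0.98 sells 4% of the current holding.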

#### Backtest

Connect to the Alpaca API and backtest our LSTM model.

In [5]:
exchange = blankly.Alpaca()
strategy = blankly.Strategy(exchange)
strategy.add_price_event(price_event, symbol='AAPL', resolution='1d', init=init_model)
results = strategy.backtest(to='1y', initial_value={'USD': 100000})
print(results)

INFO: "binance_futures" not specified in preferences, defaulting to: "{'cash': 'USDT', 'margin_type': 'USDT-M'}"
INFO: "okx" not specified in preferences, defaulting to: "{'cash': 'USDT'}"
INFO: No portfolio name to load specified, defaulting to the first in the file: (Alpaca). This is fine if there is only one portfolio in use.


No cached data found for NVDA from: 1620785349.7710078 to 1652234949.7710078 at a resolution of 86400 seconds.

Backtesting...
Epoch: 0 -> Loss: 0.00671
Epoch: 500 -> Loss: 0.00127
Epoch: 1000 -> Loss: 0.00102
Epoch: 1500 -> Loss: 0.00079
Epoch: 2000 -> Loss: 0.00064
Epoch: 2500 -> Loss: 0.00049
Epoch: 3000 -> Loss: 0.00037
Epoch: 3500 -> Loss: 0.00027
Epoch: 4000 -> Loss: 0.00021
Epoch: 4500 -> Loss: 0.00016
Epoch: 5000 -> Loss: 0.00012
Epoch: 5500 -> Loss: 0.00010
Epoch: 6000 -> Loss: 0.00009
Epoch: 6500 -> Loss: 0.00007
Epoch: 7000 -> Loss: 0.00006
Epoch: 7500 -> Loss: 0.00005
Epoch: 8000 -> Loss: 0.00004
Epoch: 8500 -> Loss: 0.00004
Epoch: 9000 -> Loss: 0.00003
Epoch: 9500 -> Loss: 0.00003
Progress: [##########] 100% Done...

Historical Dataframes: 
Account History: 
       NVDA          USD               time  Account Value (USD)
0       0.0     200000.0  1620785349.771008        200000.000000
1       0.0     200000.0  1620785349.771008        200000.000000
2     82.29   155044.973  1620871749.771008        200000.000000
3    142.26  120878.8646  1620958149.771008        201927.231800
4    239.41   65831.7316  1621044549.771008   

INFO: View your backtest here: https://app.blankly.finance/Bba8YOlrozc44bOgbunka1bMIn62/g9QbfPpm0O26YTc7XSOa/7451ee2e-6aa2-4b44-8b32-a28fec9a3cb1/backtest
