In this notebook, we'll use the Blankly package to gather data and backtest a machine learning model. We'll use PyTorch for the model itself, which will be an LSTM (Long Short-Term Memory) recurrent neural network. LSTMs work well on sequential data because their architecture lets them uncover relationships in the data while taking into account the order in which the data arrives. Once we have the model, we'll use its predictions to decide how much Ethereum to buy or sell on any given day.

In [None]:
!pip install blankly #install blankly

We'll initialize the basics of our Blankly environment with the command '*blankly init*'. Once it finishes, we have template .json files that we'll need for configuring backtests. Most importantly, we'll need to enter our API keys in keys.json.

In [None]:
!blankly init

Initializing...

██████╗ ██╗      █████╗ ███╗   ██╗██╗  ██╗██╗  ██╗   ██╗    ███████╗██╗███╗   ██╗ █████╗ ███╗   ██╗ ██████╗███████╗
██╔══██╗██║     ██╔══██╗████╗  ██║██║ ██╔╝██║  ╚██╗ ██╔╝    ██╔════╝██║████╗  ██║██╔══██╗████╗  ██║██╔════╝██╔════╝
██████╔╝██║     ███████║██╔██╗ ██║█████╔╝ ██║   ╚████╔╝     █████╗  ██║██╔██╗ ██║███████║██╔██╗ ██║██║     █████╗  
██╔══██╗██║     ██╔══██║██║╚██╗██║██╔═██╗ ██║    ╚██╔╝      ██╔══╝  ██║██║╚██╗██║██╔══██║██║╚██╗██║██║     ██╔══╝  
██████╔╝███████╗██║  ██║██║ ╚████║██║  ██╗███████╗██║       ██║     ██║██║ ╚████║██║  ██║██║ ╚████║╚██████╗███████╗
╚═════╝ ╚══════╝╚═╝  ╚═╝╚═╝  ╚═══╝╚═╝  ╚═╝╚══════╝╚═╝       ╚═╝     ╚═╝╚═╝  ╚═══╝╚═╝  ╚═╝╚═╝  ╚═══╝ ╚═════╝╚══════╝

Downloading keys template...
Downloading settings defaults...
Downloading backtest defaults...
Downloading RSI bot example...
Writing deployment defaults...
Detecting python version...
[96m[1mFound python version: 3.7[0m
Writing requirements.txt defaults...
[92m[4mSuccess![0m


In [None]:
import blankly
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import LSTM
import torch.optim as optim
from torch.autograd import Variable
from torch.utils.data import random_split

PyTorch provides every building block we need here: the LSTM layer, a linear layer, the loss function, and the optimizer.

In [None]:
def episode_gen(data, seq_length,output_size):
    x = []
    y = []
    #Loop through data, adding input data to x array and output data to y array
    for i in range(len(data)-seq_length):
        _x = data[i:(i+seq_length - output_size)]
        _y = data[i+seq_length - output_size:i + seq_length]
        x.append(_x)
        y.append(_y)

    return np.array(x),np.array(y)

We use this function to generate "episodes" for our model to learn from. We have Ethereum prices from the past few years, but the prices that most affect tomorrow's price are those from today and the last few days. As such, we take "episodes" of length *seq_length* and split each one into two parts: the first *seq_length - output_size* values, which the model sees as input, and the final *output_size* values, which it learns to predict.
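To make the split concrete, here is a small sketch that applies the same windowing to a toy list instead of real prices. The helper name `split_episodes` and the toy values are made up for illustration; the loop bound mirrors `episode_gen` above, but plain lists are used in place of NumPy arrays.

```python
def split_episodes(data, seq_length, output_size):
    """Slide a window of seq_length over data.

    The first seq_length - output_size values of each window are the input,
    and the last output_size values are the target.
    """
    inputs, targets = [], []
    # Same loop bound as episode_gen: len(data) - seq_length windows
    for i in range(len(data) - seq_length):
        window = data[i:i + seq_length]
        inputs.append(window[:seq_length - output_size])
        targets.append(window[seq_length - output_size:])
    return inputs, targets


toy = list(range(1, 13))  # 12 stand-in values, not real prices
xs, ys = split_episodes(toy, seq_length=8, output_size=3)

print(len(xs))       # number of episodes
print(xs[0], ys[0])  # first episode: 5 inputs, then 3 targets
```

With `seq_length = 8` and `output_size = 3`, each episode pairs five consecutive values with the three that follow them, which is the setup the notebook uses for training.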

In [None]:
def init_NN(symbol, state: blankly.StrategyState):
    interface = state.interface
    resolution = state.resolution
    variables = state.variables

    #Get price data
    variables['history'] = interface.history(symbol, 300, resolution, return_as='list')['close']
    '''We use Blankly's built-in indicator functions to calculate
    indicators we can use along with price data to predict future prices
    '''
    rsi = blankly.indicators.rsi(state.variables['history'])
    macd = blankly.indicators.macd(state.variables['history'])

    '''We'll break the historical Ethereum data into 8 day episodes
    and attempt to predict the final three days using the first 5.
    '''
    seq_length = 8
    output_size = 3

    '''Feature engineering -- here, we calculate each day's price as a ratio of the previous day's price.
    This is useful because it means we have fewer scaling issues with the model,
    as the data will all already be in roughly the same range. We ignore the first 25 elements
    because we want every observation to have corresponding RSI + MACD data, and MACD requires
    26 periods.
    '''
    x = [variables['history'][i] / variables['history'][i-1] for i in range(25,len(variables['history']))]
    x, y = episode_gen(x, seq_length, output_size)
    y = Variable(torch.Tensor(np.array(y))).unsqueeze(0)
    #RSI data gathering
    x_rsi = rsi[11:]
    x_rsi,_ = episode_gen(x_rsi,seq_length, output_size)

    #MACD data gathering
    macd_vals,_ = episode_gen(macd[0], seq_length, output_size)
    macd_signals,_ = episode_gen(macd[1],seq_length, output_size)

    '''In this section, we put all the features we just extracted into one NumPy array
    which we then convert to a PyTorch tensor that we can run our model on.
    '''
    x_agg = np.zeros((len(macd_signals),seq_length-output_size, 4))
    for i in range(len(macd_signals)):
      for j in range(seq_length - output_size):
        x_agg[i][j][0] = x[i][j]
        x_agg[i][j][1] = x_rsi[i][j]
        x_agg[i][j][2] = macd_vals[i][j]
        x_agg[i][j][3] = macd_signals[i][j]
    x_tot = Variable(torch.Tensor(x_agg))
    '''Here's our training loop! We have a fairly small dataset, so we can run through 10,000 epochs pretty quickly.
    Here's also the first place we see our model architecture. We use an LSTM that takes in 4 features and has
    a hidden state with dimension 20. We feed the final hidden state into a linear layer followed by a sigmoid
    activation shifted by 0.5, which outputs 3 numbers in the range (0.5, 1.5): the predicted day-over-day price
    ratios for the next three days. We use mean-squared-error loss and an Adam optimizer.
    '''
    num_epochs = 10000
    learning_rate = 0.0003


    state.lstm = LSTM(4,20, batch_first = True)
    state.lin = nn.Linear(20,3)
    criterion = torch.nn.MSELoss()    # mean-squared error for regression
    #Optimizer: make sure to include parameters of both linear layer and LSTM
    optimizer = torch.optim.Adam([
                {'params': state.lstm.parameters()},
                {'params': state.lin.parameters()}
            ], lr=learning_rate)

    # Train the model
    for epoch in range(num_epochs):
        #run model
        outputs, (h_n, c_n) = state.lstm(x_tot)
        out = state.lin(h_n)
        out = 0.5 + torch.sigmoid(out)  # shift sigmoid output into (0.5, 1.5); F.sigmoid is deprecated
        optimizer.zero_grad()
        
        loss = criterion(out, y) #calculate loss function
        
        loss.backward() #backprop
        
        optimizer.step() #gradient descent

        #Output loss functions every 500 epochs so we can make sure the model is training
        if epoch % 500 == 0:
          print("Epoch: %d, loss: %1.5f" % (epoch, loss.item()))
    #state.lstm.load_state_dict(torch.load('lstm_pm.pth'))
    #state.lin.load_state_dict(torch.load('lin_pm.pth'))      
    '''We use this in the trading algorithm for more stability.
    Essentially, instead of relying on a single output of the model 
    to tell us whether to buy or sell, we average the readings from three different calculations
    (3 days before, 2 days before, day before)
    '''    
    state.lastthree = [[0,0],[0,0],[0,0]] 

This section is where we do the bulk of the work with our data and model. We pull the data using Blankly and calculate indicator values using Blankly's built-in functions. Then, we aggregate the data, instantiate our model (an LSTM that feeds into a linear layer with sigmoid activation), and train it. We also introduce one aspect of the trading algorithm -- the three-day average used to decide whether to buy or sell (and in what amount).
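Two details of this cell are easy to miss: the price feature is a day-over-day ratio rather than a raw price, and the model's output is a sigmoid shifted by 0.5, so every prediction is a ratio between 0.5 and 1.5. The sketch below shows both in plain Python. The helper names `price_ratios` and `bounded_ratio` and the sample prices are illustrative assumptions, not part of the notebook or of Blankly.

```python
import math


def price_ratios(prices):
    """Each price divided by the previous one (1.0 means no change)."""
    return [prices[i] / prices[i - 1] for i in range(1, len(prices))]


def bounded_ratio(raw):
    """0.5 + sigmoid(raw): maps any real number into the open interval (0.5, 1.5)."""
    return 0.5 + 1.0 / (1.0 + math.exp(-raw))


prices = [100.0, 110.0, 99.0]  # made-up prices
ratios = price_ratios(prices)  # a 10% rise, then a 10% fall

print(ratios)
print(bounded_ratio(0.0))      # a raw output of 0 means "no change"
print(bounded_ratio(-10.0), bounded_ratio(10.0))
```

One consequence of this output head is that the model cannot predict a single-day move larger than 50% in either direction, which matches the scale of the ratio targets it is trained on.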

In [None]:
def price_lstm(price,symbol,state: blankly.StrategyState):
    state.variables['history'].append(price) #Add latest price to current list of data

    '''Here, we pull the data from the last few days, prepare it,
    and run the necessary indicator functions to feed into our model
    '''
    into = [state.variables['history'][i]/state.variables['history'][i-1] for i in range(-5,0)]
    rsi = blankly.indicators.rsi(state.variables['history'])
    rsi_in = np.array(rsi[-5:])
    macd = blankly.indicators.macd(state.variables['history'])
    macd_vals = np.array(macd[0][-5:])
    macd_signals = np.array(macd[1][-5:])

    '''We put the data into the torch Tensor that we'll run the model on
    '''
    pred_in = np.zeros((1,len(into),4))
    for i in range(len(into)):
      pred_in[0][i][0] = into[i]
      pred_in[0][i][1] = rsi_in[i]
      pred_in[0][i][2] = macd_vals[i]
      pred_in[0][i][3] = macd_signals[i]
    pred_in = torch.Tensor(pred_in)
  
    print(price) #Print the price so we can see what's going on

    '''Run the data through the trained model. 
    The field out stores the prediction values we want
    '''
    out,(h,c) = state.lstm(pred_in)
    out = state.lin(h)
    out = 0.5 + torch.sigmoid(out)  # same shifted sigmoid as in training; F.sigmoid is deprecated
    
    '''This definitely could be shortened with a loop,
    but basically, we add the percentage increase to the other values in the
    3-day-average array. We also increment a counter showing how many values have been
    added before averaging. This handles the edge case of the first few values (where
    we wouldn't divide by 3)
    '''
    state.lastthree[0][0]+=out[0][0][0]
    state.lastthree[0][1]+=1
    state.lastthree[1][0]+=out[0][0][1]
    state.lastthree[1][1]+=1
    state.lastthree[2][0]+=out[0][0][2]
    state.lastthree[2][1]+=1

    '''The avg price increase is calculated by dividing the sum of next day predictions
    by the number of predictions for the next day.
    '''
    priceavg = state.lastthree[0][0]/state.lastthree[0][1]

    curr_value = blankly.trunc(state.interface.account[state.base_asset].available, 2) #Amount of Ethereum available
    if priceavg > 1:
        # If we think the price will increase, we buy
        buy = blankly.trunc(state.interface.cash * 2 * (priceavg.item() - 1) / price, 2)  # Buy an amount proportional to priceavg - 1
        if buy > 0:
            state.interface.market_order(symbol, side='buy', size=buy)
    elif curr_value > 0:
        # If we think the price will decrease, we sell
        cv = blankly.trunc(curr_value * 2 * (1 - priceavg.item()), 2)  # Sell an amount proportional to 1 - priceavg
        if cv > 0:
            state.interface.market_order(symbol, side='sell', size=cv)

    print("prediction for price --",priceavg) #Print so we can see what's happening
    state.lastthree = [state.lastthree[1], state.lastthree[2], [0,0]] #Shift the values in our 3-day-average array

This section is where we implement the logic of the trading algorithm. After we train the model, we still have to decide how to interpret its predictions. We use the average-of-3 method discussed above: each day's forecast is the average of the predictions made for that day over the previous three days, and that average decides whether we buy or sell. We also vary the amount bought or sold based on the predicted value -- a higher value means a more confident buy and a lower value a more confident sell, so we would buy more if the model output 1.05 than if it output 1.01. Similarly, if the model output 0.95, we would sell more than if it output 0.99.
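The bookkeeping behind the three-day average and the order sizing can be sketched without Blankly or PyTorch. The function names `update_and_average` and `order_size` and all of the numbers are hypothetical. The sketch also skips the truncation to two decimals that the notebook applies with `blankly.trunc`.

```python
def update_and_average(slots, preds):
    """Add today's three predictions to the running [sum, count] slots.

    slots[0] collects every prediction made for tomorrow, slots[1] for the
    day after, and slots[2] for three days out. Returns tomorrow's averaged
    ratio and the slots shifted forward by one day.
    """
    for slot, pred in zip(slots, preds):
        slot[0] += pred
        slot[1] += 1
    avg = slots[0][0] / slots[0][1]
    shifted = [slots[1], slots[2], [0.0, 0]]
    return avg, shifted


def order_size(avg_ratio, cash, holdings, price, scale=2.0):
    """Size the order in proportion to how far the averaged ratio is from 1."""
    if avg_ratio > 1:
        return "buy", cash * scale * (avg_ratio - 1) / price
    if holdings > 0:
        return "sell", holdings * scale * (1 - avg_ratio)
    return None, 0.0


slots = [[0.0, 0], [0.0, 0], [0.0, 0]]
avg1, slots = update_and_average(slots, [1.02, 1.01, 0.99])  # day 1: one prediction so far
avg2, slots = update_and_average(slots, [1.04, 1.00, 0.98])  # day 2: average of two predictions

print(avg1, avg2)
print(order_size(1.05, cash=10000, holdings=0.0, price=2000))
print(order_size(0.95, cash=0.0, holdings=2.0, price=2000))
```

Keeping a count next to each sum is what handles the first few days, when fewer than three predictions exist for the next day.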

In [None]:
exchange = blankly.FTX() #Connect to FTX API
strategy = blankly.Strategy(exchange) #Initialize a Blankly strategy
strategy.add_price_event(price_lstm, symbol='ETH-USD', resolution='1d', init=init_NN) #Add our price event and initialization
results = strategy.backtest(to='1y', initial_values={'USD': 10000}) #Backtest one year starting with $10,000
print(results)

INFO: No portfolio name to load specified, defaulting to the first in the file: (portfolio). This is fine if there is only one portfolio in use.



Initializing...
dims:x (266, 5) y (266, 3)
dims:x_rsi (277, 5)
dims:macd_vals (266, 5)
xtot torch.Size([266, 5, 4])
Epoch: 0, loss: 0.00655




Epoch: 500, loss: 0.00223
Epoch: 1000, loss: 0.00198
Epoch: 1500, loss: 0.00175
Epoch: 2000, loss: 0.00157
Epoch: 2500, loss: 0.00143
Epoch: 3000, loss: 0.00130
Epoch: 3500, loss: 0.00118
Epoch: 4000, loss: 0.00108
Epoch: 4500, loss: 0.00098
Epoch: 5000, loss: 0.00088
Epoch: 5500, loss: 0.00077
Epoch: 6000, loss: 0.00069
Epoch: 6500, loss: 0.00062
Epoch: 7000, loss: 0.00055
Epoch: 7500, loss: 0.00050
Epoch: 8000, loss: 0.00047
Epoch: 8500, loss: 0.00043
Epoch: 9000, loss: 0.00041
Epoch: 9500, loss: 0.00038

Backtesting...
Progress: [----------] 0.0% 1833.2
prediction for price -- tensor(0.9856, grad_fn=<DivBackward0>)
1869.1
prediction for price -- tensor(0.8249, grad_fn=<DivBackward0>)
1797.2
prediction for price -- tensor(1.2415, grad_fn=<DivBackward0>)
1827.7
prediction for price -- tensor(1.0498, grad_fn=<DivBackward0>)
1765.6
prediction for price -- tensor(1.1630, grad_fn=<DivBackward0>)
1923.3
prediction for price -- tensor(1.2869, grad_fn=<DivBackward0>)
1848.8
prediction for pr

Here, we test our model! We connect to FTX's API through Blankly and start with $10,000. After creating a Blankly strategy and adding our price event, we run the backtest and look at the results. We find a CAGR of 98%, along with a Sharpe ratio of 1.5 and a Sortino ratio of 2.17, all of which are considered strong. There are many ways we could improve this strategy: deepen the machine learning model, pull more or better input data, or choose the buy and sell amounts more carefully. Still, this is a strong start, and Blankly makes it easy to edit and test this model.
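For readers unfamiliar with these metrics, the sketch below shows one common way to compute them from a series of per-period returns. It is not Blankly's implementation. The annualization factor of 365 (crypto trades every day), the zero risk-free rate, and the sample returns are all assumptions made for illustration.

```python
import math
import statistics


def cagr(start_value, end_value, years):
    """Compound annual growth rate."""
    return (end_value / start_value) ** (1 / years) - 1


def sharpe(returns, periods_per_year=365, risk_free=0.0):
    """Mean excess return over its standard deviation, annualized."""
    excess = [r - risk_free / periods_per_year for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)


def sortino(returns, periods_per_year=365, risk_free=0.0):
    """Like Sharpe, but only downside moves count as risk.

    Raises ZeroDivisionError if no return falls below the risk-free rate.
    """
    excess = [r - risk_free / periods_per_year for r in returns]
    downside = math.sqrt(sum(min(e, 0.0) ** 2 for e in excess) / len(excess))
    return statistics.mean(excess) / downside * math.sqrt(periods_per_year)


sample_returns = [0.01, -0.005, 0.02, -0.01, 0.015]  # made-up daily returns

print(cagr(10000, 19800, 1))  # $10,000 growing to $19,800 in one year
print(sharpe(sample_returns))
print(sortino(sample_returns))
```

If the backtest spans exactly one year, a 98% CAGR on $10,000 corresponds to an ending value of about $19,800. The Sortino ratio usually comes out higher than the Sharpe ratio for the same returns because upside volatility is not penalized.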