# Experiments with Machine Learning

**APPROACH:** Predict the direction of the price using machine learning models, then decide whether to go long or short.

First, import the necessary libraries:

In [1]:
import pandas as pd 
import yfinance as yf
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import f1_score, accuracy_score
from backtesting import Strategy
from backtesting.lib import crossover
from backtesting import Backtest

## Prepare data
Train on three years of data (2017-2019) and test on 2020 plus the part of 2021 available when the data was fetched (through March 31, 2021).

In [2]:
aapl = yf.Ticker('AAPL')
orig_data = aapl.history(start='2016-12-30') # to calculate pct_change over 2 days
orig_data.shape

(1070, 7)

In [3]:
orig_data.index

DatetimeIndex(['2016-12-29', '2016-12-30', '2017-01-03', '2017-01-04',
               '2017-01-05', '2017-01-06', '2017-01-09', '2017-01-10',
               '2017-01-11', '2017-01-12',
               ...
               '2021-03-18', '2021-03-19', '2021-03-22', '2021-03-23',
               '2021-03-24', '2021-03-25', '2021-03-26', '2021-03-29',
               '2021-03-30', '2021-03-31'],
              dtype='datetime64[ns]', name='Date', length=1070, freq=None)

As can be seen above, the data fetched from Yahoo Finance is a DataFrame, indexed and sorted by date, which is very convenient. The next step is to split the data into train and test sets:

In [4]:
drop_columns = ['Dividends', 'Stock Splits']
split_date = np.datetime64('2020-01-01')
split_date

numpy.datetime64('2020-01-01')

In [5]:
data = orig_data.copy()
change = data['Close'].pct_change(2)  # 2-day percentage change of the close
#change[change.between(-.004, .004)] = 0  # ignore too small changes
data['Close'] = np.sign(change)  # label: 1 = up, -1 = down (0 = unchanged)
data = data.iloc[2:]  # the first two rows have no 2-day change
data

| Date | Open | High | Low | Close | Volume | Dividends | Stock Splits |
|---|---|---|---|---|---|---|---|
| 2017-01-03 | 27.377192 | 27.502493 | 27.131317 | -1.0 | 115127600 | 0.0 | 0.0 |
| 2017-01-04 | 27.389012 | 27.545049 | 27.365371 | 1.0 | 84472400 | 0.0 | 0.0 |
| 2017-01-05 | 27.405562 | 27.627796 | 27.379556 | 1.0 | 88774400 | 0.0 | 0.0 |
| 2017-01-06 | 27.608877 | 27.935134 | 27.535588 | 1.0 | 127007600 | 0.0 | 0.0 |
| 2017-01-09 | 27.885486 | 28.235385 | 27.883123 | 1.0 | 134247600 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2021-03-25 | 119.540001 | 121.660004 | 119.000000 | -1.0 | 98844700 | 0.0 | 0.0 |
| 2021-03-26 | 120.349998 | 121.480003 | 118.919998 | 1.0 | 93958900 | 0.0 | 0.0 |
| 2021-03-29 | 121.650002 | 122.580002 | 120.730003 | 1.0 | 80819200 | 0.0 | 0.0 |
| 2021-03-30 | 120.110001 | 120.400002 | 118.860001 | -1.0 | 85523800 | 0.0 | 0.0 |
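
To make the labelling step concrete, here is a minimal sketch on a made-up price series (the `prices` values are hypothetical, not AAPL data). `pct_change(2)` compares each close with the close two rows earlier, and the sign of that change gives the label: 1 for up, -1 for down, 0 for unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, for illustration only
prices = pd.Series([100.0, 102.0, 101.0, 99.0, 101.0])

# Relative change versus the close two rows earlier; the first two rows are NaN
change = prices.pct_change(2)

# Direction label: 1 = up, -1 = down, 0 = unchanged (NaN rows dropped)
labels = np.sign(change).iloc[2:]
print(labels.tolist())
```

Note that each label describes a move that ends on the same day as the row's Open, High, Low and Volume values, so the label is about a change that has already happened by that day's close.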


In [6]:
data = data.drop(drop_columns, axis=1)
train = data[ data.index <  split_date]
y_train = train['Close']
X_train = train.drop('Close', axis=1)

test = data[data.index >= split_date]
y_test = test['Close']
X_test = test.drop('Close', axis=1) 
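
The date-based split above can be sanity-checked with a small sketch. The frame below is synthetic (a hypothetical `frame` of business days around the split date), so it only demonstrates that the two boolean masks partition the rows chronologically.

```python
import numpy as np
import pandas as pd

# Hypothetical frame indexed by business days around the split date
idx = pd.bdate_range('2019-12-27', '2020-01-06')
frame = pd.DataFrame({'Close': np.arange(len(idx), dtype=float)}, index=idx)

split_date = np.datetime64('2020-01-01')
train_part = frame[frame.index < split_date]
test_part = frame[frame.index >= split_date]

# Every row lands in exactly one set, and all training rows precede the test rows
assert len(train_part) + len(test_part) == len(frame)
assert train_part.index.max() < test_part.index.min()
print(len(train_part), len(test_part))
```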

## Build models and test their performance

First, I will create a Random Forest Classifier to predict whether the price will go up or down. My strategy will then decide to go long or short accordingly. As a first experiment, I decided to create a classifier with default hyperparameter values.

In [7]:
rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)
y_pred = rfc.predict(X_test)
f1_score(y_test, y_pred)

0.7061310782241015

In [8]:
accuracy_score(y_test, y_pred)

0.5573248407643312
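
The gap between the two scores is worth a closer look. As a hedged illustration on made-up labels (not the AAPL test set), a baseline that always predicts "up" on a sample with 55% up days reaches a high F1 score while its accuracy stays close to chance. The F1 score is computed by hand here for the positive class (label 1), which is what `f1_score` reports by default for binary labels.

```python
import numpy as np

# Hypothetical labels: 55 "up" days (1) and 45 "down" days (-1)
y_true = np.array([1] * 55 + [-1] * 45)

# Naive baseline that always predicts "up"
y_naive = np.ones_like(y_true)

accuracy = np.mean(y_true == y_naive)

# Precision, recall and F1 for the positive class (label 1)
true_pos = np.sum((y_naive == 1) & (y_true == 1))
precision = true_pos / np.sum(y_naive == 1)
recall = true_pos / np.sum(y_true == 1)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f}, f1={f1:.3f}")
```

A high F1 score on its own therefore does not show that the classifier has learned anything useful. Whether that is what happens here would need to be checked against the actual predictions, for example by counting how often `y_pred` equals 1.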

In [18]:
class RandomForestStrategy(Strategy):
    price_delta = .004

    def init(self):
        self.clf = rfc

    def next(self):
        row = self.data.df.iloc[-1:]
        X = row[['Open', 'High', 'Low', 'Volume']]
        pred = self.clf.predict(X)[0]
        
        # set take-profit and stop-loss prices
        close = self.data.Close
        upper, lower = close[-1] * (1 + np.r_[1, -1]*self.price_delta)

        # if the prediction is up and we are not already long, buy with 20% of available equity;
        # if the prediction is down and we are not already short, sell with 20% of available equity
        if pred == 1 and not self.position.is_long:
            self.buy(size=.2, tp = upper, sl = lower)
        elif pred == -1 and not self.position.is_short:
            self.sell(size=.2, tp = lower, sl = upper)

        # if a position has been held for more than 2 days, tighten its stop-loss
        current_time = self.data.index[-1]
        # latest bar's high and low (scalars, so max/min compare floats, not whole arrays)
        high, low = self.data.High[-1], self.data.Low[-1]
        for trade in self.trades:
            if current_time - trade.entry_time > pd.Timedelta('2 days'):
                if trade.is_long:
                    trade.sl = max(trade.sl, low)
                else:
                    trade.sl = min(trade.sl, high)
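
The take-profit and stop-loss arithmetic in `next` can be seen in isolation in the short sketch below. The closing price of 100.0 is hypothetical; `price_delta` matches the class attribute.

```python
import numpy as np

price_delta = .004   # same value as RandomForestStrategy.price_delta
last_close = 100.0   # hypothetical closing price

# np.r_[1, -1] is the array [1, -1], so this yields
# close * (1 + delta) and close * (1 - delta)
upper, lower = last_close * (1 + np.r_[1, -1] * price_delta)
print(upper, lower)
```

A long order uses `upper` as take-profit and `lower` as stop-loss; a short order swaps them. With a band of only 0.4% on either side of the close, daily bars will often touch one of the two levels almost immediately.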

In [19]:
test_data = orig_data[orig_data.index > split_date]
bt = Backtest(test_data, RandomForestStrategy, commission=.0002, margin=.05)
bt.run()

Start                     2020-01-02 00:00:00
End                       2021-03-31 00:00:00
Duration                    454 days 00:00:00
Exposure Time [%]                   99.363057
Equity Final [$]                    136.71555
Equity Peak [$]                       10000.0
Return [%]                         -98.632845
Buy & Hold Return [%]               65.073598
Return (Ann.) [%]                  -96.809162
Volatility (Ann.) [%]                1.518605
Sharpe Ratio                              0.0
Sortino Ratio                             0.0
Calmar Ratio                              0.0
Max. Drawdown [%]                  -98.632845
Avg. Drawdown [%]                  -98.632845
Max. Drawdown Duration      453 days 00:00:00
Avg. Drawdown Duration      453 days 00:00:00
# Trades                                  313
Win Rate [%]                         6.070288
Best Trade [%]                       0.439954
Worst Trade [%]                     -6.767073
Avg. Trade [%]                    
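
The `margin=.05` and `size=.2` settings interact in a way that is easy to overlook. The arithmetic below is a rough sketch, assuming that backtesting.py treats `margin=.05` as 1/0.05 = 20:1 leverage and sizes a fractional order against the leveraged available funds; both assumptions should be checked against the library documentation.

```python
margin = 0.05        # Backtest(margin=.05)
size = 0.2           # self.buy(size=.2) / self.sell(size=.2)
price_delta = 0.004  # stop-loss distance as a fraction of the entry price

leverage = 1 / margin                            # assumed: 20:1
notional_fraction = size * leverage              # position value relative to equity
loss_per_stop = notional_fraction * price_delta  # equity lost if the stop fills exactly

print(leverage, notional_fraction, loss_per_stop)
```

Under these assumptions each stopped-out trade costs roughly 1.6% of equity before commission, and an overnight price gap beyond the stop level can make the loss larger.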

In this first attempt, the strategy lost almost all of the starting capital. This is not surprising: the classifier uses only default hyperparameter values and reaches an accuracy of only about 56%, barely better than chance. It will need a lot of fine-tuning.
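
One possible starting point for that tuning is `RandomizedSearchCV`, which is imported at the top of the notebook but not yet used. The sketch below runs on synthetic data standing in for `X_train` and `y_train`; the search space is hypothetical, and `TimeSeriesSplit` is an addition not used elsewhere in this notebook, chosen so that each fold validates on rows that come after its training rows.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit

rng = np.random.default_rng(0)

# Synthetic stand-ins for X_train / y_train (random features, random up/down labels)
X_demo = rng.normal(size=(200, 4))
y_demo = rng.choice([-1, 1], size=200)

# Hypothetical search space; the ranges are illustrative, not tuned recommendations
param_distributions = {
    'n_estimators': [25, 50, 100],
    'max_depth': [3, 5, 10, None],
    'min_samples_leaf': [1, 5, 10],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,                        # number of sampled parameter combinations
    cv=TimeSeriesSplit(n_splits=3),  # chronological folds, no shuffling
    scoring='accuracy',
    random_state=0,
)
search.fit(X_demo, y_demo)
print(search.best_params_)
```

On the real data, `search.best_estimator_` could replace `rfc` in `RandomForestStrategy.init`. Since the labels here are random, the scores from this sketch carry no meaning beyond showing that the search runs.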