# Experiments with Machine Learning

**APPROACH:** Predict the price using Machine Learning models, then decide to go long or short.

First, import the necessary libraries:

In [12]:
import pandas as pd 
import yfinance as yf
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import f1_score, accuracy_score
from backtesting import Backtest

from data_prepration import get_OHLC_df, label_OHLC_df, split_train_test
from strategies.BinaryClassificationStrategy import BinaryClassificationStrategy

## Prepare data
Train on three years of data (2017 to 2019) and test on data from 2020 through the first quarter of 2021, which is where the downloaded data ends.

In [2]:
aapl = yf.Ticker('AAPL')
orig_data = aapl.history(start='2016-12-30') # to calculate pct_change over 2 days for 01.01.2017
orig_data.shape

(1071, 7)

In [3]:
orig_data.index

DatetimeIndex(['2016-12-29', '2016-12-30', '2017-01-03', '2017-01-04',
               '2017-01-05', '2017-01-06', '2017-01-09', '2017-01-10',
               '2017-01-11', '2017-01-12',
               ...
               '2021-03-19', '2021-03-22', '2021-03-23', '2021-03-24',
               '2021-03-25', '2021-03-26', '2021-03-29', '2021-03-30',
               '2021-03-31', '2021-04-01'],
              dtype='datetime64[ns]', name='Date', length=1071, freq=None)

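As the output above shows, the data fetched from Yahoo Finance is a pandas DataFrame, indexed and sorted by date, which is very convenient. The next step is to label the data and then split it into training and test sets. In the labeled output below, the `Close` column holds the class label (1.0 or -1.0, presumably up or down) instead of the closing price: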

In [4]:
data = get_OHLC_df(orig_data)
data = label_OHLC_df(data, 2)
data

| Date | Open | High | Low | Close | Volume |
|---|---|---|---|---|---|
| 2017-01-03 | 27.377192 | 27.502493 | 27.131317 | -1.0 | 115127600 |
| 2017-01-04 | 27.389012 | 27.545049 | 27.365371 | 1.0 | 84472400 |
| 2017-01-05 | 27.405562 | 27.627796 | 27.379556 | 1.0 | 88774400 |
| 2017-01-06 | 27.608877 | 27.935134 | 27.535588 | 1.0 | 127007600 |
| 2017-01-09 | 27.885486 | 28.235385 | 27.883123 | 1.0 | 134247600 |
| ... | ... | ... | ... | ... | ... |
| 2021-03-26 | 120.349998 | 121.480003 | 118.919998 | 1.0 | 93958900 |
| 2021-03-29 | 121.650002 | 122.580002 | 120.730003 | 1.0 | 80819200 |
| 2021-03-30 | 120.110001 | 120.400002 | 118.860001 | -1.0 | 85671900 |
| 2021-03-31 | 121.650002 | 123.519997 | 121.150002 | 1.0 | 118323800 |
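
The implementation of `label_OHLC_df` is not shown in this notebook. As a rough illustration of how such a label could be produced, here is a minimal sketch that assumes the label is the sign of the closing price's percentage change over the previous `periods` rows, as the comment on the download cell suggests. The function name `label_by_pct_change` and the toy prices are hypothetical, and the real helper may differ (for example in how it aligns or drops rows).

```python
import numpy as np
import pandas as pd


def label_by_pct_change(df: pd.DataFrame, periods: int) -> pd.DataFrame:
    """Hypothetical stand-in for label_OHLC_df: replace Close with +1.0 / -1.0."""
    out = df.copy()
    change = out["Close"].pct_change(periods)
    # +1.0 when the close rose over `periods` rows, -1.0 otherwise (flat counts as down)
    out["Close"] = np.where(change > 0, 1.0, -1.0)
    # The first `periods` rows have no earlier close to compare against, so drop them
    return out.iloc[periods:]


# Toy prices, not real AAPL data
prices = pd.DataFrame(
    {
        "Open": [10.0, 10.5, 10.2, 10.8, 10.1],
        "Close": [10.4, 10.3, 10.9, 10.2, 10.6],
    },
    index=pd.date_range("2017-01-02", periods=5, freq="D"),
)
labeled = label_by_pct_change(prices, 2)
print(labeled)
```

Under this assumption, the download has to start a couple of trading days before the first day that should receive a label, which would explain the early start date above.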


In [5]:
split_date = np.datetime64('2020-01-01')
split_date

numpy.datetime64('2020-01-01')

In [6]:
X_train, X_test, y_train, y_test = split_train_test(data, split_date)
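
`split_train_test` is a project helper whose code is not shown. A minimal sketch of a chronological split, assuming rows before `split_date` go to training and the rest to testing, with the features being the four columns the strategy later passes to the model. `split_by_date` and the toy frame are hypothetical.

```python
import numpy as np
import pandas as pd

# Assumed feature columns: the same ones the strategy feeds to the classifier
FEATURES = ["Open", "High", "Low", "Volume"]


def split_by_date(df: pd.DataFrame, split_date: np.datetime64, target: str = "Close"):
    """Hypothetical stand-in for split_train_test: split on a date, no shuffling."""
    train = df[df.index < split_date]
    test = df[df.index >= split_date]
    return train[FEATURES], test[FEATURES], train[target], test[target]


# Toy frame spanning the split date, not real AAPL data
demo = pd.DataFrame(
    {
        "Open": [1.0, 2.0, 3.0, 4.0, 5.0],
        "High": [1.5, 2.5, 3.5, 4.5, 5.5],
        "Low": [0.5, 1.5, 2.5, 3.5, 4.5],
        "Close": [1.0, -1.0, 1.0, 1.0, -1.0],
        "Volume": [100, 200, 300, 400, 500],
    },
    index=pd.date_range("2019-12-30", periods=5, freq="D"),
)
X_tr, X_te, y_tr, y_te = split_by_date(demo, np.datetime64("2020-01-01"))
print(len(X_tr), len(X_te))
```

Splitting on a date rather than shuffling keeps every test row strictly later than every training row, so the model is never evaluated on days that precede its training data.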

## Build models and test their performance

First, I will create a Random Forest Classifier to predict whether the price will go up or down. My strategy will then decide to go long or short accordingly. As a first experiment, I decided to create a classifier with default hyperparameter values.

In [7]:
rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)
y_pred = rfc.predict(X_test)
f1_score(y_test, y_pred)

0.7095435684647302

In [8]:
accuracy_score(y_test, y_pred)

0.5555555555555556
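
An F1 score of about 0.71 next to an accuracy of about 0.56 is worth comparing against a trivial baseline. The sketch below uses synthetic labels (not the notebook's test set) with an assumed 55% share of up days, and scores a "model" that always predicts up.

```python
from sklearn.metrics import accuracy_score, f1_score

# Synthetic labels, NOT the notebook's test set: 55 up days (+1) and 45 down days (-1)
y_true = [1] * 55 + [-1] * 45
# A baseline that ignores its inputs and always predicts "up"
y_always_up = [1] * len(y_true)

baseline_accuracy = accuracy_score(y_true, y_always_up)
baseline_f1 = f1_score(y_true, y_always_up)  # pos_label defaults to 1
print(f"accuracy={baseline_accuracy:.3f}  f1={baseline_f1:.3f}")
```

On these synthetic labels the baseline has precision 0.55 and recall 1.0, so its F1 is 2 · 0.55 / 1.55 ≈ 0.71. Scores in this range therefore do not by themselves show that a classifier has learned anything. Checking how often `y_pred` equals 1 would show whether that is the case here.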

In [9]:
class RandomForestStrategy(BinaryClassificationStrategy):
    price_delta = .004

    def init(self):
        self.clf = rfc

    def next(self):
        row = self.data.df.iloc[-1:]
        X = row[['Open', 'High', 'Low', 'Volume']]
        pred = self.clf.predict(X)[0]

        self.decide_trade(pred)

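        # If a position has been held for more than 2 days, tighten its stop-loss
        current_time = self.data.index[-1]
        # self.data.High / self.data.Low are arrays; compare against the latest bar only
        high, low = self.data.High[-1], self.data.Low[-1]
        for trade in self.trades:
            if current_time - trade.entry_time > pd.Timedelta('2 days'):
                if trade.is_long:
                    # long: raise the stop to the latest low, never lower it
                    trade.sl = max(trade.sl, low)
                else:
                    # short: lower the stop to the latest high, never raise it
                    trade.sl = min(trade.sl, high)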
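
The stop-tightening rule in `next` can be checked in isolation, outside the backtest. This is a minimal sketch, assuming the rule is meant to compare the existing stop with the latest bar's low (for longs) or high (for shorts). `tighten_stop` is a hypothetical helper, not part of backtesting.py, and the `None` handling is an added assumption for trades that have no stop yet.

```python
from typing import Optional


def tighten_stop(is_long: bool, current_sl: Optional[float],
                 bar_low: float, bar_high: float) -> float:
    """Return a stop-loss that can only tighten, never loosen."""
    if is_long:
        # Long: raise the stop to the latest low, but never lower it
        return bar_low if current_sl is None else max(current_sl, bar_low)
    # Short: lower the stop to the latest high, but never raise it
    return bar_high if current_sl is None else min(current_sl, bar_high)


# Long trade: a stop at 100 moves up to the bar low of 102
print(tighten_stop(True, 100.0, bar_low=102.0, bar_high=105.0))
# Long trade: a stop already above the bar low stays where it is
print(tighten_stop(True, 103.0, bar_low=102.0, bar_high=105.0))
# Short trade: a stop at 110 moves down to the bar high of 105
print(tighten_stop(False, 110.0, bar_low=102.0, bar_high=105.0))
```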

In [10]:
test_data = orig_data[orig_data.index > split_date]
bt = Backtest(test_data, RandomForestStrategy, commission=.0002, margin=.05)
bt.run()

NameError: name 'upper' is not defined

In the first attempt, this model lost almost all of our money. That is not surprising: the classifier uses only default hyperparameters and reaches just about 56% accuracy on the test set, so it will need a lot of fine-tuning.
The current strategy is also very sensitive to price changes, because even the slightest move is classified as either up or down. If we are long and the price dips briefly while the upward trend remains intact, the bot would sell all its shares because of that dip.
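
The fine-tuning mentioned above could start from the `RandomizedSearchCV` class imported at the top of the notebook. The following is only a sketch under stated assumptions: the data is synthetic (a stand-in for `X_train` and `y_train`), the search space is illustrative rather than recommended, and `TimeSeriesSplit` is an added choice so that each validation fold comes after its training rows.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit

rng = np.random.default_rng(0)
# Synthetic stand-in for X_train / y_train: 4 features, labels in {-1, 1}
X_demo = rng.normal(size=(200, 4))
y_demo = np.where(X_demo[:, 0] + rng.normal(scale=0.5, size=200) > 0, 1, -1)

# Illustrative search space, not tuned values
param_distributions = {
    "n_estimators": [25, 50, 100],
    "max_depth": [2, 4, 8, None],
    "min_samples_leaf": [1, 5, 20],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,  # number of random parameter combinations to try
    scoring="accuracy",
    cv=TimeSeriesSplit(n_splits=3),  # each fold validates on rows after its training rows
    random_state=0,
)
search.fit(X_demo, y_demo)
print(search.best_params_, round(search.best_score_, 3))
```

With the real data, `search.best_estimator_` could replace `rfc` in the strategy. A higher cross-validated accuracy does not guarantee a profitable backtest, so the backtest would still need to be rerun.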