Experiment 0 - Simple MLP

The purpose of this experiment is to ensure everything works as intended. This notebook can be modified for future experiments to use different model architectures.

As this is an MLP, the model will look at OHLC data patterns and make a judgement on them alone. I therefore think the data can, and should, be shuffled so that the model doesn't become biased towards specific market conditions and only looks at the overarching repeating patterns, if there are any.

This will be a regression model predicting 2 values: the next two turning points. At a high level, this will tell us which direction a trade should be made in and the stop-loss required.
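
A minimal sketch of how the two regression targets could be built, assuming each candle has already been flagged as a local minimum (-1), a local maximum (1) or neither (0). The function name `next_two_turning_points` and the sample prices are made up for illustration; minima are priced at the low and maxima at the high, as described in the TODO below.

```python
import numpy as np

def next_two_turning_points(flags, highs, lows):
    """For each candle i, return the prices of the next two turning points after i.

    flags[i] is -1 for a local minimum, 1 for a local maximum, 0 otherwise.
    Minima are priced at the low, maxima at the high. Candles with fewer
    than two future turning points get NaN in the missing slots.
    """
    n = len(flags)
    labels = np.full((n, 2), np.nan)
    nearest, second = np.nan, np.nan  # turning points strictly after candle i
    # Walk backwards so the "next" turning points are always already known.
    for i in range(n - 1, -1, -1):
        labels[i] = (nearest, second)
        if flags[i] != 0:
            price = lows[i] if flags[i] == -1 else highs[i]
            nearest, second = price, nearest
    return labels

# Made-up example: a minimum at candle 1, a maximum at 3, a minimum at 5.
flags = np.array([0, -1, 0, 1, 0, -1])
highs = np.array([1.30, 1.29, 1.31, 1.33, 1.32, 1.30])
lows = np.array([1.29, 1.27, 1.30, 1.32, 1.31, 1.28])
labels = next_two_turning_points(flags, highs, lows)
```

Rows that still contain NaN (the tail of the series) would need to be dropped before training.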


TODO:
- Structure data
    - Data will be gathered as windows of x candles. The labels will be the next two turning points, using the low value for local minima and the high value for local maxima.
- Normalise
    - Normalising the data will be difficult. As prices vary a lot, I guess what we actually need is the % change between each value.
    - OK: get the % change between each open, then the % change of the high, low and close in relation to the open. Volume can stay as is.
- Shuffle & get train/test split 
- Define model
- Train
- Evaluate

- Compare how the model fares when the data is just normalised (without converting to percentage changes first)
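
To get a feel for what that comparison might show, here is a small sketch that puts the same candle pattern at two different price levels. The helpers `to_ratios` and `min_max` and all the prices are made up for illustration; `min_max` is a hand-rolled stand-in for scaling to [-1, 1], not the scikit-learn scaler itself.

```python
import numpy as np

def to_ratios(ohlc):
    """Open relative to the previous open; high, low, close relative to own open."""
    out = np.empty((len(ohlc) - 1, 4))
    out[:, 0] = ohlc[1:, 0] / ohlc[:-1, 0]
    out[:, 1:] = ohlc[1:, 1:] / ohlc[1:, :1]
    return out

def min_max(values):
    """Scale each column to [-1, 1] using that column's own min and max."""
    lo, hi = values.min(axis=0), values.max(axis=0)
    return 2 * (values - lo) / (hi - lo) - 1

# The same three-candle pattern (open, high, low, close) at two price levels.
pattern = np.array([[1.000, 1.002, 0.999, 1.001],
                    [1.001, 1.003, 1.000, 1.002],
                    [1.002, 1.004, 1.001, 1.003]])
low_regime = 1.20 * pattern
high_regime = 1.60 * pattern

# Ratios cancel the price level, so both regimes give the same features.
ratios_low = to_ratios(low_regime)
ratios_high = to_ratios(high_regime)

# Scaling raw prices keeps the level: one regime lands near -1, the other near 1.
scaled_raw = min_max(np.vstack([low_regime, high_regime]))
```

With raw prices the model mostly sees which regime a window came from, whereas the ratio features are identical for both regimes.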

In [1]:
import tensorflow as tf
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle
from pandas import read_csv

srcFile = "./Data/GBPUSD_M1.csv"

rawData = read_csv(srcFile, usecols=(1, 2, 3, 4, 5), engine='c', sep="\t",
    dtype={
        "Open": np.float32,
        "High": np.float32,
        "Low": np.float32,
        "Close": np.float32,
        "Volume": np.uint32 #16 bits is definitely enough for 1-min charts, and probably for 15-min charts, but not for anything longer; the other columns are 32-bit dtypes anyway
    })
rawData = rawData.to_numpy()

In [2]:
rawData[1, :]

array([1.35341001, 1.35344005, 1.35339999, 1.35344005, 2.        ])

In [9]:
#Structuring the data

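def isTurningPoint(window):
    #window is 5 rows of [Open, High, Low, Close, Volume]; the centre row (index 2) is the candidate
    #Local minimum: lows fall into the centre candle and rise out of it
    if window[0, 2] >= window[1, 2] >=  window[2, 2]  <= window[3, 2] <= window[4, 2]:
        return -1
    #Local maximum: highs rise into the centre candle and fall out of it
    elif window[0, 1] <= window[1, 1] <=  window[2, 1]  >= window[3, 1] >= window[4, 1]:
        return 1
    else:
        return 0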

dataUsedForInferenceWindowSize = 5
tpWindowSize = 5
tpHalfWindow = tpWindowSize // 2
rows, columns = rawData.shape
#The first row has no previous open, and the last tpHalfWindow rows cannot be the centre of a full turning point window
numOfSamples = rows - dataUsedForInferenceWindowSize - tpHalfWindow - 1
#Each sample is (window, columns), matching the shape of the slices assigned below
x = np.empty((numOfSamples, dataUsedForInferenceWindowSize, columns))
y = np.empty(numOfSamples)
#This np array stores the open relative to the previous open, the high, low and close relative to the open of the same candle, and the volume as is
#The price columns are ratios (1.0 means no change); subtract 1 to get a fractional change
percentageChangeData = np.empty((rows-1, columns))
for idx, row in enumerate(rawData):
    if idx < 1: continue
    percentageChangeSincePreviousOpen = row[0] / rawData[idx-1, 0]
    percentageChangeData[idx-1] = [percentageChangeSincePreviousOpen, row[1]/row[0], row[2]/row[0], row[3]/row[0], row[4]]

scaler = MinMaxScaler((-1, 1))
normalisedData = scaler.fit_transform(percentageChangeData)

#Sample i holds candles i+1..i+window and is labelled by the candle that follows them
#Stopping tpHalfWindow rows early keeps every turning point window at its full size
for idx in range(dataUsedForInferenceWindowSize+1, rows-tpHalfWindow):
    sampleIdx = idx - dataUsedForInferenceWindowSize - 1
    #normalisedData[j] describes rawData[j+1], so this slice covers the candles ending at idx-1
    x[sampleIdx] = normalisedData[sampleIdx: idx-1, :]
    y[sampleIdx] = isTurningPoint(rawData[idx - tpHalfWindow: idx + tpHalfWindow+1, :])
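
A quick sanity check of `isTurningPoint` on hand-made windows. The function is repeated so the snippet runs on its own; `makeWindow` is a hypothetical helper and all prices are made up for illustration. Because the comparisons are non-strict, a completely flat window is reported as a local minimum, which may matter for quiet 1-minute candles.

```python
import numpy as np

def isTurningPoint(window):
    # Column 1 is High, column 2 is Low; row 2 is the centre candle.
    if window[0, 2] >= window[1, 2] >= window[2, 2] <= window[3, 2] <= window[4, 2]:
        return -1
    elif window[0, 1] <= window[1, 1] <= window[2, 1] >= window[3, 1] >= window[4, 1]:
        return 1
    else:
        return 0

def makeWindow(highs, lows):
    # Only the High and Low columns matter to isTurningPoint; the rest stay 0.
    window = np.zeros((5, 5))
    window[:, 1] = highs
    window[:, 2] = lows
    return window

valley = makeWindow(highs=[1.31, 1.30, 1.29, 1.30, 1.31], lows=[1.30, 1.29, 1.28, 1.29, 1.30])
peak = makeWindow(highs=[1.30, 1.31, 1.32, 1.31, 1.30], lows=[1.29, 1.30, 1.31, 1.30, 1.29])
trend = makeWindow(highs=[1.29, 1.30, 1.31, 1.32, 1.33], lows=[1.28, 1.29, 1.30, 1.31, 1.32])
flat = makeWindow(highs=[1.30] * 5, lows=[1.30] * 5)

results = [isTurningPoint(w) for w in (valley, peak, trend, flat)]
print(results)  # [-1, 1, 0, -1]
```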


In [13]:
#Splitting the data into training and testing sets

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)
x_val, x_test, y_val, y_test = train_test_split(x_test, y_test, test_size=0.5) 
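
The two calls above give roughly a 70/15/15 train/validation/test split, and `train_test_split` shuffles by default. Below is a NumPy-only sketch of the same proportions with a fixed seed so the split is repeatable; `split_indices` is a hypothetical helper, not part of scikit-learn. Passing `random_state` to `train_test_split` should achieve the same repeatability.

```python
import numpy as np

def split_indices(num_samples, seed=0):
    """Shuffle sample indices and cut them into 70% / 15% / 15% parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_samples)
    train_end = num_samples * 7 // 10
    # The remaining 30% is halved into validation and test.
    val_end = train_end + (num_samples - train_end) // 2
    return order[:train_end], order[train_end:val_end], order[val_end:]

train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 700 150 150
```

The index arrays can then be used as `x[train_idx]`, `y[train_idx]` and so on.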

In [16]:
#Define Model

-1.0