In [9]:
import pandas as pd, numpy as np, matplotlib.pyplot as plt, seaborn as sns, warnings
warnings.filterwarnings('ignore')
AI_COLORS = {'primary':'#8FBC8F','gold':'#DAA520','crypto':'#FF6347','accent':'#6B8E23','highlight':'#ADFF2F','dark':'#2F4F2F','neutral':'#F0FFF0'}
kiwi_palette = [AI_COLORS['primary'], AI_COLORS['gold'], AI_COLORS['crypto'], AI_COLORS['accent'], AI_COLORS['highlight']]
plt.style.use('default')
sns.set_style("whitegrid")
sns.set_palette(kiwi_palette)
plt.rcParams.update({'figure.facecolor':AI_COLORS['neutral'],'figure.figsize':[12,8],'axes.facecolor':'white','axes.edgecolor':AI_COLORS['dark'],'axes.grid':True,'axes.titlecolor':AI_COLORS['dark'],'axes.titleweight':'bold','axes.titlesize':16,'axes.labelsize':12,'grid.color':AI_COLORS['primary'],'grid.alpha':0.3,'font.size':11,'xtick.color':AI_COLORS['dark'],'ytick.color':AI_COLORS['dark'],'legend.fontsize':10,'legend.frameon':True,'legend.facecolor':'white','legend.edgecolor':AI_COLORS['primary'],'lines.linewidth':2,'savefig.dpi':300,'savefig.bbox':'tight'})
sns.set_context("notebook", font_scale=1.1)

# The Problem

Computation and automation have taken over the trading world. As a retail trader, your best bet is to bet on what you think the automated traders will do, not on what YOU actually think will happen. This puts you at a massive disadvantage and completely changes the way you have to approach the markets. That's why it would be a competitive advantage to see what your own automation and computation say, to get an idea of what the big players' automation will do.

The problem is how precisely we can classify single 15-minute candlesticks of the S&P 500. Ultimately, we aim to predict whether the next 15-minute candlestick will be a reversal, and to what degree. This would identify good entries for shorts and longs.
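
The text above does not pin down what counts as a "reversal", so the sketch below is only one possible reading, stated as an assumption: a candle is a reversal when its direction (sign of `Close - Open`) is opposite to the previous candle's, and its "degree" is the absolute body size. The helper `label_reversals` and the columns `direction`, `is_reversal`, and `reversal_size` are hypothetical names, and the prices are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hand-made prices for illustration only; not taken from the dataset.
candles = pd.DataFrame(
    {
        "Open": [100.0, 101.0, 102.0, 101.5],
        "Close": [101.0, 102.0, 101.5, 101.5],
    }
)


def label_reversals(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: flag candles whose direction flips versus the prior candle."""
    out = df.copy()
    body = out["Close"] - out["Open"]
    # +1 for an up candle, -1 for a down candle, 0 for a flat one.
    out["direction"] = np.sign(body).astype(int)
    # A negative product means the two consecutive candles point in opposite directions.
    # The first row has no predecessor (NaN), which compares as False.
    out["is_reversal"] = (out["direction"] * out["direction"].shift(1)) < 0
    # Assumed "degree" of the reversal: absolute body size, zero when not a reversal.
    out["reversal_size"] = body.abs().where(out["is_reversal"], 0.0)
    return out


labeled = label_reversals(candles)
print(labeled)
```

Other definitions (for example, ones based on highs and lows or on several candles) would be just as defensible; this one is only meant to show the shape of the classification target.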

<hr>

# The Data

The quality of the data is absolutely crucial when predicting the market. That's why, unlike last time, when I built my own dataset with a lot of preprocessing, I opted to pay for a high-quality, financially sound source: Backtestmarket. This data is a snapshot of the S&P 500 every 15 minutes, giving enough data to reconstruct in memory the candlestick you would see on a chart.

S&P 500 15m: https://www.backtestmarket.com/en/sp-500-15m

Each row has open, high, low, close, and volume, with prices in increments of 25 cents.

## Data explanation
- Open: The price at which the S&P 500 opened for the 15-minute interval
- Close: The price at which the S&P 500 closed for the 15-minute interval
- High: The highest price during the 15-minute interval
- Low: The lowest price during the 15-minute interval
- Volume: The total volume of trades during the 15-minute interval
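
To show how these five fields are enough to rebuild a candlestick in memory, here is a minimal sketch that derives the usual visual parts of a candle from the OHLC columns. The helper name `candle_anatomy` and the output columns `body`, `upper_wick`, `lower_wick`, and `range` are assumptions introduced for illustration, and the single row of prices is illustrative.

```python
import pandas as pd


def candle_anatomy(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: derive the drawn parts of a candle from OHLC columns."""
    out = pd.DataFrame(index=df.index)
    # Signed body: positive for an up candle, negative for a down candle.
    out["body"] = df["Close"] - df["Open"]
    # Wicks extend from the top/bottom of the body to the high/low.
    out["upper_wick"] = df["High"] - df[["Open", "Close"]].max(axis=1)
    out["lower_wick"] = df[["Open", "Close"]].min(axis=1) - df["Low"]
    # Full high-to-low extent of the candle.
    out["range"] = df["High"] - df["Low"]
    return out


# One illustrative candle on a 0.25 price grid.
example = pd.DataFrame(
    {"Open": [1430.0], "High": [1431.0], "Low": [1429.75], "Close": [1431.0], "Volume": [592]}
)
anatomy = candle_anatomy(example)
print(anatomy)
```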

<hr>

# Preprocessing

First, I want to import the data, of course. While I'm at it, I'll clean up its formatting into a dataframe I like. I will also combine the date and time into a single, real datetime column and use it as the index.

In [10]:
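# The file has no header row and uses ';' as its delimiter, so this default read
# yields a single column whose name is the first data row (see the output below).
raw_df = pd.read_csv('es-15m.csv')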
raw_df.head()

,01/04/2007;17:15:00;1430;1431;1429.75;1431;592
0,01/04/2007;17:30:00;1431;1431.5;1430.75;1430.7...
1,01/04/2007;17:45:00;1430.5;1431;1430.25;1430.5...
2,01/04/2007;18:00:00;1430.5;1430.75;1430.25;143...
3,01/04/2007;18:15:00;1431;1432;1431;1431.25;561
4,01/04/2007;18:30:00;1431;1431.25;1431;1431.25;80
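
In the `head()` output above, the 17:15:00 row shows up as the column label rather than as data, so a default `read_csv` call drops the first candle. A minimal alternative sketch, assuming the file is plain semicolon-separated text with no header row and day-first dates (as the notebook's own format string implies), is to pass `sep`, `header=None`, and explicit `names`. The in-memory `sample` stands in for `es-15m.csv` and reuses two complete rows from the output above; the variable names `sample`, `columns`, and `parsed` are illustrative.

```python
import io

import pandas as pd

# Stand-in for es-15m.csv: two complete rows copied from the head() output.
sample = io.StringIO(
    "01/04/2007;17:15:00;1430;1431;1429.75;1431;592\n"
    "01/04/2007;18:15:00;1431;1432;1431;1431.25;561\n"
)

columns = ["Date", "Time", "Open", "High", "Low", "Close", "Volume"]

# header=None keeps the first line as data; names supplies the column labels.
parsed = pd.read_csv(sample, sep=";", header=None, names=columns)

# Combine date and time into one datetime index (day-first format is an assumption).
parsed["DateTime"] = pd.to_datetime(
    parsed["Date"] + " " + parsed["Time"], format="%d/%m/%Y %H:%M:%S"
)
parsed = parsed.drop(columns=["Date", "Time"]).set_index("DateTime").sort_index()
print(parsed)
```

With the real file, replacing `sample` with the path `'es-15m.csv'` should give numeric columns directly, without the string split step.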


In [11]:
data_split = raw_df.iloc[:, 0].str.split(';', expand=True)
spx_df = pd.DataFrame({
    'Date': data_split[0],
    'Time': data_split[1],
    'Open': pd.to_numeric(data_split[2]),
    'High': pd.to_numeric(data_split[3]),
    'Low': pd.to_numeric(data_split[4]),
    'Close': pd.to_numeric(data_split[5]),
    'Volume': pd.to_numeric(data_split[6])
})

# Create datetime column for index
spx_df['DateTime'] = pd.to_datetime(spx_df['Date'] + ' ' + spx_df['Time'], format='%d/%m/%Y %H:%M:%S')
spx_df = spx_df.set_index('DateTime').sort_index()

# Drop original date/time columns
spx_df = spx_df.drop(['Date', 'Time'], axis=1)
spx_df.head()

DateTime,Open,High,Low,Close,Volume
2007-04-01 17:30:00,1431.0,1431.5,1430.75,1430.75,461
2007-04-01 17:45:00,1430.5,1431.0,1430.25,1430.5,264
2007-04-01 18:00:00,1430.5,1430.75,1430.25,1430.75,91
2007-04-01 18:15:00,1431.0,1432.0,1431.0,1431.25,561
2007-04-01 18:30:00,1431.0,1431.25,1431.0,1431.25,80
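
Since data quality is the whole reason for buying this dataset, a few cheap integrity checks on the cleaned frame can be worthwhile. The sketch below is an assumption about what is worth checking, not part of the original workflow: `check_candles` and the keys of its result are hypothetical names, and the `tick=0.25` default comes from the 25-cent increments described earlier. The three sample rows are copied from the output above.

```python
import pandas as pd


def check_candles(df: pd.DataFrame, tick: float = 0.25) -> dict:
    """Hypothetical helper: basic sanity checks on an OHLC frame with a datetime index."""
    prices = df[["Open", "High", "Low", "Close"]]
    return {
        # Timestamps should be ordered and not repeated.
        "index_sorted": bool(df.index.is_monotonic_increasing),
        "index_unique": bool(df.index.is_unique),
        # High/Low must bound the other prices in the same candle.
        "high_is_max": bool((df["High"] >= prices.max(axis=1)).all()),
        "low_is_min": bool((df["Low"] <= prices.min(axis=1)).all()),
        # Every price should sit on the tick grid (multiples of 0.25 are exact in floats).
        "on_tick_grid": bool(((prices / tick) % 1 == 0).all().all()),
    }


# Three rows copied from the spx_df.head() output.
sample_df = pd.DataFrame(
    {
        "Open": [1431.0, 1430.5, 1430.5],
        "High": [1431.5, 1431.0, 1430.75],
        "Low": [1430.75, 1430.25, 1430.25],
        "Close": [1430.75, 1430.5, 1430.75],
        "Volume": [461, 264, 91],
    },
    index=pd.to_datetime(
        ["2007-04-01 17:30:00", "2007-04-01 17:45:00", "2007-04-01 18:00:00"]
    ),
)
report = check_candles(sample_df)
print(report)
```

On the full dataset the same call would be `check_candles(spx_df)`; any `False` entry points at rows worth inspecting before modeling.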
