# S&P500 Daily Financial Time Series Prediction
___

## Preparing Dataset
___

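1. Read the dataset
2. Remove unwanted columns
3. Calculate the daily percentage change of each column
4. Create sliding windows of 300 days
5. Define the response variables (mean High and mean Low over the following 6 days)
6. Split the dataset into training and test sets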

In [1]:
# Import libraries
import pandas as pd
import numpy as np
from datetime import timedelta
from math import floor

In [2]:
# Read the dataset
df = pd.read_csv('../Data/Raw/S&P500D.csv', parse_dates=['Date'], index_col='Date')

In [3]:
# Join each column with its day-over-day percentage change (suffix '_chg').
# The first row is dropped because its percentage change is undefined.
df_ready = pd.merge(
    df,
    df.pct_change().dropna(), 
    left_index=True, 
    right_index=True, 
    suffixes=['', '_chg']
)
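
To see what this merge produces, here is a minimal sketch on a made-up three-row frame. The `toy` frame and its values are illustrative assumptions, not rows from the S&P 500 file; the tuple passed to `suffixes` is equivalent to the list used above.

```python
import pandas as pd

# Hypothetical toy prices, not taken from the real dataset
toy = pd.DataFrame(
    {'High': [100.0, 110.0, 99.0], 'Low': [90.0, 99.0, 90.0]},
    index=pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03']),
)

# pct_change() gives (today - yesterday) / yesterday; the first row is NaN and is dropped
changes = toy.pct_change().dropna()

# Inner join on the date index: only dates present in both frames survive
merged = pd.merge(toy, changes, left_index=True, right_index=True, suffixes=('', '_chg'))

print(merged.columns.tolist())  # ['High', 'Low', 'High_chg', 'Low_chg']
print(len(toy), len(merged))    # 3 2
```

Each original column gains a `_chg` companion, which is why the feature count doubles, and the merged frame is one row shorter than the input.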

In [13]:
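WINDOW = 300   # number of consecutive days fed to the model
HORIZON = 6    # number of following days averaged into the target

x = []
y = []

for d in range(len(df_ready) - (WINDOW + HORIZON)):
    # Input: WINDOW consecutive rows starting at day d
    x.append(df_ready.iloc[d: d + WINDOW].values)
    # Target: mean High and mean Low over the next HORIZON days
    df_temp = df_ready.iloc[d + WINDOW: d + WINDOW + HORIZON]
    y.append(df_temp[['High', 'Low']].mean().values)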
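
The windowing logic can be checked on a small array where the expected values are easy to work out by hand. This is a sketch under stated assumptions: `make_windows` is a hypothetical helper, and the 20-by-3 array stands in for the real data. Unlike a loop that stops at `len - window - horizon`, it also keeps the final complete window, so it yields one extra sample.

```python
import numpy as np

def make_windows(values, window, horizon, target_cols):
    """Slice a (time, features) array into inputs and mean-of-next-days targets."""
    n_samples = len(values) - window - horizon + 1  # includes the last complete window
    x = np.stack([values[d: d + window] for d in range(n_samples)])
    y = np.stack([
        values[d + window: d + window + horizon][:, target_cols].mean(axis=0)
        for d in range(n_samples)
    ])
    return x, y

# Hypothetical data: 20 days, 3 features; row i holds [3i, 3i+1, 3i+2]
values = np.arange(60, dtype=float).reshape(20, 3)
x_demo, y_demo = make_windows(values, window=5, horizon=2, target_cols=[0, 1])

print(x_demo.shape, y_demo.shape)  # (14, 5, 3) (14, 2)
print(y_demo[0])                   # [16.5 17.5], the mean of rows 5 and 6
```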

In [18]:
SPLIT_RATIO = 0.9
split_idx = floor(len(x) * SPLIT_RATIO)  # chronological split: the earliest 90% trains

x_train = np.array(x[:split_idx])
y_train = np.array(y[:split_idx])

x_test = np.array(x[split_idx:])
y_test = np.array(y[split_idx:])

In [19]:
print('Shape of the X Training Matrix : {}'.format(x_train.shape))
print('Shape of the Y Training Matrix : {}'.format(y_train.shape))
print('Shape of the X Testing Matrix : {}'.format(x_test.shape))
print('Shape of the Y Testing Matrix : {}'.format(y_test.shape))

Shape of the X Training Matrix : (3912, 300, 12)
Shape of the Y Training Matrix : (3912, 2)
Shape of the X Testing Matrix : (435, 300, 12)
Shape of the Y Testing Matrix : (435, 2)
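
The sample counts above follow directly from the split ratio: there are 3912 + 435 = 4347 windows in total, and `floor(4347 * 0.9)` is 3912. A minimal sketch reproduces the arithmetic; `chronological_split` is a hypothetical helper and the stand-in arrays use shrunken feature dimensions.

```python
from math import floor

import numpy as np

def chronological_split(x, y, ratio):
    """Split without shuffling: the first `ratio` share trains, the rest tests."""
    split_idx = floor(len(x) * ratio)
    return x[:split_idx], y[:split_idx], x[split_idx:], y[split_idx:]

# Stand-in arrays with the same sample count as above (3912 + 435 = 4347);
# the feature dimensions are shrunk to keep the example light
n_samples = 3912 + 435
x_demo = np.zeros((n_samples, 3, 2))
y_demo = np.zeros((n_samples, 2))

x_tr, y_tr, x_te, y_te = chronological_split(x_demo, y_demo, 0.9)
print(len(x_tr), len(x_te))  # 3912 435
```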


## Modeling for prediction
___

In [20]:
from keras.models import Sequential
from keras.layers import Conv1D, MaxPool1D, Dropout, Flatten, Dense

In [21]:
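model = Sequential()

# One wide convolution: 10 filters, a kernel spanning 100 days, a stride of 40 days
model.add(Conv1D(10, 100, strides=40, activation='relu', padding='same', input_shape=(300, 12)))
model.add(MaxPool1D(5))
model.add(Dropout(0.2))

# Small dense head
model.add(Flatten())
model.add(Dense(10, activation='relu'))

# Two linear outputs: mean High and mean Low over the target days
model.add(Dense(2, activation='linear'))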
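
It helps to trace how the time dimension shrinks through this stack. The sketch below works through the shape and parameter arithmetic in plain Python, assuming the Keras defaults for `MaxPool1D` (stride equal to the pool size, `'valid'` padding); the helper names are hypothetical.

```python
from math import ceil, floor

def conv1d_same_len(length, stride):
    # With padding='same', the output length depends only on the stride
    return ceil(length / stride)

def pool1d_valid_len(length, pool_size):
    # MaxPool1D defaults: stride equal to pool_size, padding='valid'
    return floor((length - pool_size) / pool_size) + 1

steps, features = 300, 12
filters, kernel = 10, 100

after_conv = conv1d_same_len(steps, 40)        # 8 time steps
after_pool = pool1d_valid_len(after_conv, 5)   # 1 time step
flat = after_pool * filters                    # 10 values enter the dense head

params = (
    kernel * features * filters + filters      # Conv1D: 12010
    + flat * 10 + 10                           # Dense(10): 110
    + 10 * 2 + 2                               # Dense(2): 22
)
print(after_conv, after_pool, flat, params)    # 8 1 10 12142
```

The 300-day window is therefore compressed to a single pooled time step, and the network has 12,142 trainable parameters under these assumptions.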

In [25]:
model.compile(loss='mean_squared_error', optimizer='sgd')

In [27]:
model.fit(x_train, y_train, batch_size=100, epochs=100, shuffle=True, validation_data=(x_test, y_test))

Train on 3912 samples, validate on 435 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
 500/3912 [==>...........................] - ETA: 0s - loss: nan

KeyboardInterrupt: 
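
The run above was interrupted after the loss became `nan`, and the notebook does not diagnose the cause. One common cause with plain SGD and a mean-squared-error loss is feeding unscaled inputs, such as raw prices and volumes, which can make the gradients overflow. That is an assumption about this run, not something the notebook confirms. The sketch below shows one way to check for non-finite values and standardize the features; `standardize` and the stand-in arrays are hypothetical.

```python
import numpy as np

def standardize(train, test):
    """Scale each feature with statistics computed on the training windows only."""
    mean = train.mean(axis=(0, 1), keepdims=True)
    std = train.std(axis=(0, 1), keepdims=True)
    std = np.where(std == 0, 1.0, std)  # guard against constant features
    return (train - mean) / std, (test - mean) / std

# Hypothetical stand-in data: one price-like and one volume-like feature
rng = np.random.default_rng(0)
scale = np.array([1500.0, 3e9])
train_demo = rng.random((50, 30, 2)) * scale
test_demo = rng.random((10, 30, 2)) * scale

# Non-finite inputs (for example from a division by zero in a percentage change)
# would also turn the loss into nan, so check for them first
assert np.isfinite(train_demo).all() and np.isfinite(test_demo).all()

train_scaled, test_scaled = standardize(train_demo, test_demo)
print(train_scaled.mean(axis=(0, 1)).round(6), train_scaled.std(axis=(0, 1)).round(6))
```

Computing the statistics on the training windows alone keeps information from the test period out of the scaling. The same idea would apply to the targets, which are also raw price levels.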