# ANN Stock Trend Prediction With a Large Number of Predictors

## *This project was initiated entirely by Max Hong Ka Ho and should be used for academic purposes only. No result in this notebook constitutes investment advice.*

### This project uses a large data set contributed by **Ehsan Hoseinzade, a Ph.D. student of computer science at Simon Fraser University**, who used a CNN to predict future prices. I borrow his data set for my own experiment because it offers a wide range of features.

### My Approach: I use a neural network and, with the aid of the large data set, treat this as a binary classification problem where 1 indicates a rise in tomorrow's price and 0 indicates a fall or no change.

In [65]:
#Import library
import numpy as np
import pandas as pd
import tensorflow as tf

In [66]:
#read the data
data = pd.read_csv('Processed_S&P.csv')
data = data.set_index(data['Date']).iloc[:,1:]
print(data.head())

                  Close    Volume       mom  ...  wheat-F   XAG   XAU
Date                                         ...                     
2009-12-31  1115.099976       NaN       NaN  ...    -0.48  0.30  0.39
2010-01-04  1132.989990  0.921723  0.016043  ...     3.12  3.91  2.10
2010-01-05  1136.520020 -0.375903  0.003116  ...    -0.90  1.42 -0.12
2010-01-06  1137.140015  0.996234  0.000546  ...     2.62  2.25  1.77
2010-01-07  1141.689941  0.059932  0.004001  ...    -1.85  0.22 -0.58

[5 rows x 83 columns]


In [67]:
# Fill missing values with 0, then check that no NaNs remain
data = data.fillna(0)
data.isna().sum()

Close             0
Volume            0
mom               0
mom1              0
mom2              0
                 ..
Dollar index-F    0
Dollar index      0
wheat-F           0
XAG               0
XAU               0
Length: 83, dtype: int64

In [68]:
#Create Return column
data['Return'] = (data['Close'] - data['Close'].shift(1)) / data['Close'].shift(1)
data['Return'].head()

Date
2009-12-31         NaN
2010-01-04    0.016043
2010-01-05    0.003116
2010-01-06    0.000546
2010-01-07    0.004001
Name: Return, dtype: float64
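
The `Return` column is the simple one-day percentage change in the close. As a minimal sketch on a made-up price series (the `prices` values are illustrative and not taken from the data set), pandas' built-in `pct_change` should give the same numbers as the manual shift formula:

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, for illustration only
prices = pd.Series([100.0, 102.0, 101.0, 101.0])

# Manual formula used in the cell above: (P_t - P_{t-1}) / P_{t-1}
manual = (prices - prices.shift(1)) / prices.shift(1)

# Built-in equivalent
builtin = prices.pct_change()

# The first row has no previous price, so it is NaN in both versions
print(manual.tolist())
print(np.allclose(manual.iloc[1:], builtin.iloc[1:]))
```

Either form leaves a NaN on the first row, which is why `Return` starts with NaN.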

In [70]:
#Create Binary Column to denote the ups/downs of the S&P500
# 1 if the day's return is positive, else 0; shift(-1) then stores tomorrow's direction on today's row
data['Trend'] = (data['Return'] > 0).astype(int)
data['Trend'] = data['Trend'].shift(-1)
data['Trend'].head(20)

Date
2009-12-31    1.0
2010-01-04    1.0
2010-01-05    1.0
2010-01-06    1.0
2010-01-07    1.0
2010-01-08    1.0
2010-01-11    0.0
2010-01-12    1.0
2010-01-13    1.0
2010-01-14    0.0
2010-01-15    1.0
2010-01-19    0.0
2010-01-20    0.0
2010-01-21    0.0
2010-01-22    1.0
2010-01-25    0.0
2010-01-26    1.0
2010-01-27    0.0
2010-01-28    0.0
2010-01-29    1.0
Name: Trend, dtype: float64

In [71]:
data.head()

Date,Close,Volume,mom,mom1,mom2,mom3,ROC_5,ROC_10,ROC_15,ROC_20,EMA_10,EMA_20,EMA_50,EMA_200,DTB4WK,DTB3,DTB6,DGS5,DGS10,Oil,Gold,DAAA,DBAA,GBP,JPY,CAD,CNY,AAPL,AMZN,GE,JNJ,JPM,MSFT,WFC,XOM,FCHI,FTSE,GDAXI,DJI,HSI,...,TE2,TE3,TE5,TE6,DE1,DE2,DE4,DE5,DE6,CTB3M,CTB6M,CTB1Y,Name,AUD,Brent,CAC-F,copper-F,WIT-oil,DAX-F,DJI-F,EUR,FTSE-F,gold-F,HSI-F,KOSPI-F,NASDAQ-F,GAS-F,Nikkei-F,NZD,silver-F,RUSSELL-F,S&P-F,CHF,Dollar index-F,Dollar index,wheat-F,XAG,XAU,Return,Trend
2009-12-31,1115.099976,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.06,0.2,2.69,3.85,0.0,0.0,5.33,6.39,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,3.79,3.65,0.02,0.16,1.06,2.54,6.19,6.33,6.35,0.0,0.0,0.0,S&P,0.35,-0.13,0.15,0.09,0.1,0.48,-1.19,-0.12,0.27,0.34,1.68,-0.07,-0.96,-2.4,0.67,0.03,0.26,-1.08,-1.0,-0.11,-0.08,-0.06,-0.48,0.3,0.39,,1.0
2010-01-04,1132.98999,0.921723,0.016043,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.08,0.18,2.65,3.85,0.02683,0.0,5.35,6.39,-0.004222,-0.004467,-0.010644,-0.001991,0.015565,-0.004609,0.02115,0.004192,0.028318,0.01542,0.012227,0.014078,0.019724,0.0,0.0,0.014951,0.0,...,3.77,3.67,0.03,0.13,1.04,2.54,6.21,6.31,6.34,-0.1,-0.04386,-0.01487,S&P,1.73,2.81,1.99,1.36,2.71,0.96,1.28,0.61,1.74,2.05,-0.52,0.54,1.51,5.6,0.31,1.52,3.26,1.61,1.62,-0.57,-0.59,-0.42,3.12,3.91,2.1,0.016043,1.0
2010-01-05,1136.52002,-0.375903,0.003116,0.016043,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.07,0.17,2.56,3.77,0.002699,0.00156,5.24,6.3,-0.007628,-0.009838,-0.001441,1.5e-05,0.001729,0.0059,0.005178,-0.011596,0.01937,0.000323,0.027452,0.003904,-0.000264,0.004036,-0.002718,-0.001128,0.020909,...,3.7,3.6,0.04,0.14,1.06,2.53,6.13,6.23,6.27,-0.055556,-0.073394,-0.033962,S&P,-0.08,0.59,-0.11,0.24,0.32,-0.14,-0.04,-0.31,0.38,0.04,2.03,-0.18,-0.08,-4.2,0.47,-0.07,1.96,-0.2,0.31,0.43,0.03,0.12,-0.9,1.42,-0.12,0.003116,1.0
2010-01-06,1137.140015,0.996234,0.000546,0.003116,0.016043,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.06,0.15,2.6,3.85,0.016883,0.006009,5.3,6.34,0.002067,0.008418,-0.007311,0.000191,-0.015906,-0.018116,-0.005151,0.008134,0.005494,-0.006137,0.001425,0.008643,0.001186,0.001358,0.00041,0.000157,0.006153,...,3.79,3.7,0.03,0.12,1.04,2.49,6.19,6.28,6.31,-0.117647,0.0,0.015625,S&P,0.91,1.61,0.15,2.41,1.72,-0.01,0.01,0.31,0.16,1.59,0.79,0.78,-0.36,6.6,0.19,0.56,2.15,-0.02,0.07,-0.56,-0.24,-0.17,2.62,2.25,1.77,0.000546,1.0
2010-01-07,1141.689941,0.059932,0.004001,0.000546,0.003116,0.016043,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.05,0.16,2.62,3.85,-0.006256,0.000221,5.31,6.33,-0.005609,0.011196,0.002035,-7.3e-05,-0.001849,-0.017013,0.05178,-0.007137,0.019809,-0.0104,0.036286,-0.003142,0.001775,-0.000597,-0.002481,0.003138,-0.006567,...,3.8,3.69,0.03,0.14,1.02,2.48,6.17,6.28,6.31,0.066667,0.019802,0.007692,S&P,-0.41,-0.46,0.15,-1.9,-0.63,-0.12,0.28,-0.66,0.06,-0.25,-0.6,-1.27,-0.05,-3.38,-0.09,-0.72,0.94,0.5,0.4,0.58,0.58,0.54,-1.85,0.22,-0.58,0.004001,1.0


In [72]:
# Create Independent and Dependent variables for our model
X = data.iloc[:, 14:-2].values[:-1]
y = data.iloc[:, -1].values[:-1]
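
The trailing `[:-1]` drops the last row: it has no "tomorrow", so its `Trend` label is NaN after the shift. A small sketch on a hypothetical toy frame (the `toy` data and the `feature` column are made up for illustration) shows the alignment:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, for illustration only
toy = pd.DataFrame({"feature": [0.1, 0.2, 0.3, 0.4],
                    "Return": [np.nan, 0.01, -0.02, 0.03]})

# Next-day direction: 1 for a rise, 0 otherwise
toy["Trend"] = (toy["Return"] > 0).astype(int).shift(-1)

# The last row has no label, so features and labels both drop it
X_toy = toy[["feature"]].values[:-1]
y_toy = toy["Trend"].values[:-1]

print(toy["Trend"].isna().tolist())
print(X_toy.shape, y_toy)
```

Each remaining row pairs today's features with the direction of the following day's return.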

In [73]:
print(X)

[[0.04 0.06 0.2 ... -0.48 0.3 0.39]
 [0.05 0.08 0.18 ... 3.12 3.91 2.1]
 [0.03 0.07 0.17 ... -0.9 1.42 -0.12]
 ...
 [1.03 1.21 1.34 ... 0.7 -0.71 -0.8]
 [1.04 1.22 1.35 ... -1.85 0.83 0.16]
 [1.04 1.24 1.37 ... 1.0 0.01 0.24]]


In [74]:
print(y)

[1. 1. 1. ... 1. 0. 0.]


In [75]:
# train test split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
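
Note that `train_test_split` shuffles rows by default, so test days end up interleaved with training days. For time-ordered data a chronological hold-out is a common alternative. The sketch below is only an illustration of that idea, not the split used for the results in this notebook; `chronological_split` and the demo arrays are hypothetical names, and passing `shuffle=False` to `train_test_split` should have a similar effect.

```python
import numpy as np

def chronological_split(X, y, test_size=0.2):
    """Hold out the most recent rows as the test set, without shuffling."""
    n_test = int(round(len(X) * test_size))
    split = len(X) - n_test
    return X[:split], X[split:], y[:split], y[split:]

# Hypothetical stand-in data: 10 time-ordered rows, 2 features
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10) % 2

X_tr, X_te, y_tr, y_te = chronological_split(X_demo, y_demo)
print(X_tr.shape, X_te.shape)
```

With this kind of split every test row comes after every training row, so the model is always evaluated on dates it has not seen.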

In [85]:
# Replace any string entries (e.g. the 'Name' column) with 0 so the arrays can be scaled
for arr in (X_train, X_test):
  for i in range(len(arr)):
    for j in range(len(arr[0])):
      if isinstance(arr[i][j], str):
        arr[i][j] = 0

In [86]:
#feature scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
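
The scaler is fitted on the training set only, and the same statistics are then reused for the test set. A rough NumPy sketch of what `fit_transform` and `transform` compute (the `train` and `test` arrays are made-up values for illustration):

```python
import numpy as np

# Hypothetical feature matrices, for illustration only
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test = np.array([[2.0, 40.0]])

# "fit": learn the per-column mean and population standard deviation from train
mean = train.mean(axis=0)
std = train.std(axis=0)

# "transform": apply the training statistics to both sets
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
print(test_scaled)
```

The scaled training columns have mean 0 and standard deviation 1; the test columns generally do not, because they are scaled with the training statistics.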

In [149]:
#Build the ANN
ann = tf.keras.models.Sequential()
ann.add(tf.keras.layers.Dense(units = 5, activation = 'relu'))
ann.add(tf.keras.layers.Dense(units = 5, activation = 'relu'))
ann.add(tf.keras.layers.Dense(units = 1, activation = 'sigmoid')) 
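
To get a feel for how small this network is, the sketch below counts its trainable parameters by hand. It assumes 69 input features, which is what the slice `14:-2` would yield if the frame has 85 columns (the 83 shown earlier plus `Return` and `Trend`); `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

n_features = 69           # assumption: 85 columns sliced with 14:-2
layer_units = [5, 5, 1]   # units of the three Dense layers

total = 0
n_in = n_features
for units in layer_units:
    total += dense_params(n_in, units)
    n_in = units

print(total)
```

Under that assumption almost all of the parameters sit in the first layer, which maps the inputs down to 5 hidden units.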

In [150]:
ann.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
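
The `binary_crossentropy` loss penalises confident wrong probabilities much more than hesitant ones. A minimal NumPy sketch of the formula follows; the clipping constant `eps` and the example arrays are assumptions for illustration, and Keras' own implementation may differ in detail.

```python
import numpy as np

def binary_crossentropy(y_true, p_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped away from 0 and 1."""
    p = np.clip(p_pred, eps, 1 - eps)
    losses = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    return float(np.mean(losses))

# Hypothetical labels and predicted probabilities of a rise
y_true = np.array([1.0, 0.0, 1.0])
p_pred = np.array([0.9, 0.1, 0.5])

print(binary_crossentropy(y_true, p_pred))
```

A model that always outputs 0.5 scores log(2), about 0.693, so a training loss near that value means the network has learned little beyond a coin flip.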

In [151]:
ann.fit(X_train, y_train, batch_size = 32, epochs = 100)

Epoch 1/100
...

<tensorflow.python.keras.callbacks.History at 0x7f9df01be710>

In [152]:
# make predictions on test set
y_pred = ann.predict(X_test)
y_pred = (y_pred > 0.5)
np.concatenate((y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)), 1)

array([[0., 1.],
       [1., 1.],
       [1., 0.],
       [1., 1.],
       [1., 1.],
       [0., 0.],
       [1., 0.],
       [0., 1.],
       [1., 0.],
       [0., 0.],
       [0., 1.],
       [1., 1.],
       [1., 0.],
       [0., 1.],
       [1., 1.],
       [1., 1.],
       [1., 0.],
       [1., 0.],
       [1., 0.],
       [1., 1.],
       [0., 0.],
       [1., 1.],
       [1., 0.],
       [1., 0.],
       [1., 1.],
       [0., 1.],
       [1., 0.],
       [1., 1.],
       [1., 1.],
       [1., 0.],
       [1., 1.],
       [1., 1.],
       [1., 0.],
       [1., 1.],
       [0., 0.],
       [1., 1.],
       [0., 1.],
       [1., 1.],
       [1., 0.],
       [0., 0.],
       [1., 1.],
       [0., 0.],
       [1., 0.],
       [1., 0.],
       [1., 1.],
       [1., 0.],
       [0., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [0., 1.],
       [1., 1.],
       [1., 1.],
       [1., 0.],
       [1., 0.],
       [1., 0.],
       [1., 1.],
       [1., 0.

In [153]:
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

[[ 58 131]
 [ 54 154]]


0.5340050377833753
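
Accuracy alone hides how the errors are distributed. The sketch below recomputes a few figures from the confusion matrix printed above, using scikit-learn's convention that rows are actual classes and columns are predicted classes. The `baseline` name is my own label for the accuracy of always predicting the majority class.

```python
import numpy as np

# Confusion matrix reported above: rows = actual (0, 1), columns = predicted (0, 1)
cm = np.array([[58, 131],
               [54, 154]])

tn, fp, fn, tp = cm.ravel()
total = cm.sum()

accuracy = (tn + tp) / total
baseline = max(tn + fp, fn + tp) / total   # always predict the majority class
precision = tp / (tp + fp)                 # of predicted rises, share that rose
recall = tp / (tp + fn)                    # of actual rises, share predicted

print(f"accuracy={accuracy:.3f} baseline={baseline:.3f} "
      f"precision={precision:.3f} recall={recall:.3f}")
```

The model predicts a rise on 285 of the 397 test days, so its accuracy sits close to that of always predicting a rise.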

# Conclusion

The experiment above performs poorly on the test sample: the best accuracy over many runs was only 53.4%, barely above the 52.4% that always predicting a rise would score on the same test set (208 of its 397 days rose). Still, roughly 53% is a reasonable score for financial prediction, and the model could be tuned to improve its accuracy further.