# Predicting S&P 500 Price Movement 

### Based on major global stock market indices using a Neural Network

### Description
- The finance industry generates a large volume of information, and its complexity makes it a suitable domain for neural networks.
- Instead of predicting a stock price, we predict the direction of the return, positive or negative. This is treated as a two-class classification problem.

### Dataset
- 6 major stock market indices from different time zones
![](../img/Picture1.png)

- US S&P500
- Australia S&P/ASX 200
- Japan Nikkei 225
- Hong Kong Hang Seng
- Frankfurt DAX
- London FTSE 100

### Motives
If the financial market were perfect, all relevant information would be available to everyone immediately and the price of any product or service would reflect it immediately. In other words, price is the product of all relevant information.
In reality, a number of limitations prevent the market from being perfect. However, thanks to advanced technology and relatively low transaction costs, the stock market resembles the ideal perfect market.
Each major stock market index should reflect all relevant information available. Can we use stock market indices to predict another stock market index?

### Limitations
Given such a high degree of globalization in business, country-specific biases of stock market indices should be neutralized to some extent.
Stock market prices fluctuate continuously while the market is open. I will focus on the closing price of each index.
Predicting a specific price level would be very challenging, but predicting a positive or negative movement should be simpler.

### Data Collection
Source : finance.yahoo.com

Duration : 2012/05/05 – 2017/05/05

- S&P500  : https://finance.yahoo.com/quote/%5EGSPC/history/
- S&P/ASX 200 : https://au.finance.yahoo.com/quote/%5EAXJO/history?period1=1336190400&period2=1493956800&interval=1d&filter=history&frequency=1d
- HangSeng Index : https://finance.yahoo.com/quote/%5EHSI/history?period1=1336190400&period2=1493956800&interval=1d&filter=history&frequency=1d
- Nikkei 225 : https://finance.yahoo.com/quote/%5EN225/history?period1=1462482232&period2=1494018232&interval=1d&filter=history&frequency=1d
- DAX : https://finance.yahoo.com/quote/%5EGDAXI/history?period1=1336190400&period2=1493956800&interval=1d&filter=history&frequency=1d
- FTSE 100 : https://www.investing.com/indices/uk-100-historical-data


## Importing Packages

In [1]:
import pandas as pd
import os
import datetime
from datetime import date, datetime, timedelta, time
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf



## Wrangling Data

In [2]:
SNPraw=pd.read_csv("../data/S&P500.csv")
ASXraw=pd.read_csv("../data/ASX200.csv")
NIKraw=pd.read_csv("../data/Nikkei225.csv")
HSraw = pd.read_csv("../data/HangSeng.csv")
DAXraw = pd.read_csv("../data/DAX.csv")
FTSEraw=pd.read_csv("../data/FTSE100.csv")


Let's have a quick look at the data.

In [3]:
SNPraw.head()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
0,2012-05-07,1368.790039,1373.910034,1363.939941,1369.579956,1369.579956,3559390000.0
1,2012-05-08,1369.160034,1369.160034,1347.75,1363.719971,1363.719971,4261670000.0
2,2012-05-09,1363.199951,1363.72998,1343.130005,1354.579956,1354.579956,4288540000.0
3,2012-05-10,1354.579956,1365.880005,1354.579956,1357.98999,1357.98999,3727990000.0
4,2012-05-11,1358.109985,1365.660034,1348.890015,1353.390015,1353.390015,3869070000.0


In [4]:
ASXraw.head()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
0,2012-05-07,4375.399902,4375.399902,4299.0,4301.299805,4301.299805,0.0
1,2012-05-08,4310.299805,4327.200195,4297.399902,4314.399902,4314.399902,0.0
2,2012-05-09,4310.799805,4312.200195,4257.899902,4275.100098,4275.100098,0.0
3,2012-05-10,4269.899902,4296.100098,4269.899902,4295.600098,4295.600098,0.0
4,2012-05-11,4296.0,4302.200195,4274.399902,4285.100098,4285.100098,0.0


In [5]:
NIKraw.head()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
0,2012-05-07,9198.169922,9206.450195,9109.009766,9119.139648,9119.139648,133000
1,2012-05-08,9190.5,9207.55957,9159.469727,9181.650391,9181.650391,112400
2,2012-05-09,9112.719727,9115.94043,9021.200195,9045.05957,9045.05957,131600
3,2012-05-10,9013.259766,9075.629883,8985.900391,9009.650391,9009.650391,141600
4,2012-05-11,9019.400391,9050.610352,8944.629883,8953.30957,8953.30957,143400


In [6]:
HSraw.head()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
0,2012-05-07,20658.410156,20674.449219,20477.859375,20536.650391,20536.650391,2201042000.0
1,2012-05-08,20647.380859,20647.380859,20399.380859,20484.75,20484.75,1480445000.0
2,2012-05-09,20361.550781,20371.660156,20257.609375,20330.640625,20330.640625,1797879000.0
3,2012-05-10,20313.970703,20375.039063,20091.679688,20227.279297,20227.279297,1720960000.0
4,2012-05-11,20083.269531,20083.269531,19901.410156,19964.630859,19964.630859,1624695000.0


In [7]:
DAXraw.head()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
0,2012-05-07,6515.939941,6578.950195,6410.029785,6569.47998,6569.47998,26200000.0
1,2012-05-08,6548.740234,6576.720215,6415.029785,6444.740234,6444.740234,28639600.0
2,2012-05-09,6483.27002,6506.209961,6375.790039,6475.310059,6475.310059,37212700.0
3,2012-05-10,6531.149902,6549.77002,6440.22998,6518.0,6518.0,37107900.0
4,2012-05-11,6465.189941,6589.009766,6454.209961,6579.930176,6579.930176,27864600.0


In [8]:
FTSEraw.head()

Unnamed: 0,Date,Price,Open,High,Low,Volume,Change(%)
0,5-May-17,7297.43,7248.1,7297.43,7222.81,-,0.68%
1,4-May-17,7248.1,7234.53,7280.7,7226.07,947.13M,0.19%
2,3-May-17,7234.53,7250.05,7250.05,7218.56,790.04M,-0.21%
3,2-May-17,7250.05,7203.94,7254.32,7203.94,939.01M,0.64%
4,28-Apr-17,7203.94,7237.17,7243.31,7197.28,1.18B,-0.46%


A quick glance at the data already suggests a couple of data cleaning challenges.

## 1. Data type of Date
### -string to datetime

In [9]:
All=[SNPraw,ASXraw,NIKraw,HSraw,DAXraw,FTSEraw]

In [10]:
for i in All:
    print(i.Date.describe(),'\n')

count           1257
unique          1257
top       2016-10-18
freq               1
Name: Date, dtype: object 

count           1267
unique          1267
top       2016-10-18
freq               1
Name: Date, dtype: object 

count           1226
unique          1226
top       2016-10-18
freq               1
Name: Date, dtype: object 

count           1233
unique          1233
top       2016-10-18
freq               1
Name: Date, dtype: object 

count           1265
unique          1265
top       2016-10-18
freq               1
Name: Date, dtype: object 

count         1263
unique        1263
top       9-Oct-12
freq             1
Name: Date, dtype: object 



None of the Date columns holds a true datetime type; they are plain strings. Left as they are, the dataframes cannot be merged reliably on date, especially because the FTSE file uses a different date format. I will convert the Date columns with datetime.strptime.

In [11]:
SNPraw.iloc[:,0]=[datetime.strptime(i,'%Y-%m-%d') for i in SNPraw.iloc[:,0]]
ASXraw.iloc[:,0]=[datetime.strptime(i,'%Y-%m-%d') for i in ASXraw.iloc[:,0]]
NIKraw.iloc[:,0]=[datetime.strptime(i,'%Y-%m-%d') for i in NIKraw.iloc[:,0]]
HSraw.iloc[:,0]=[datetime.strptime(i,'%Y-%m-%d') for i in HSraw.iloc[:,0]]
DAXraw.iloc[:,0]=[datetime.strptime(i,'%Y-%m-%d') for i in DAXraw.iloc[:,0]]
FTSEraw.iloc[:,0] = [datetime.strptime(i,'%d-%b-%y') for i in FTSEraw.iloc[:,0]]
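
The same conversion can also be done on a whole column at once with `pd.to_datetime`. The following is a minimal sketch rather than the notebook's own method; `snp_demo` and `ftse_demo` are hypothetical two-row stand-ins for `SNPraw` and `FTSEraw`, and the format strings are the ones used above.

```python
import pandas as pd

# Hypothetical miniature frames standing in for SNPraw and FTSEraw
snp_demo = pd.DataFrame({"Date": ["2012-05-07", "2012-05-08"]})
ftse_demo = pd.DataFrame({"Date": ["5-May-17", "28-Apr-17"]})

# pd.to_datetime parses the whole column in one call
snp_demo["Date"] = pd.to_datetime(snp_demo["Date"], format="%Y-%m-%d")
ftse_demo["Date"] = pd.to_datetime(ftse_demo["Date"], format="%d-%b-%y")

print(snp_demo["Date"].dtype, ftse_demo["Date"].dtype)
```

Both columns end up with a datetime dtype, so they can be compared or merged regardless of the original string format.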

Let's look at the data again.

In [12]:
for i in All:
    print(i.Date.describe(),'\n')
    print(i.Date.dtypes,'\n','\n')


count                    1257
unique                   1257
top       2016-09-08 00:00:00
freq                        1
first     2012-05-07 00:00:00
last      2017-05-04 00:00:00
Name: Date, dtype: object 

datetime64[ns] 
 

count                    1267
unique                   1267
top       2016-09-08 00:00:00
freq                        1
first     2012-05-07 00:00:00
last      2017-05-05 00:00:00
Name: Date, dtype: object 

datetime64[ns] 
 

count                    1226
unique                   1226
top       2016-09-08 00:00:00
freq                        1
first     2012-05-07 00:00:00
last      2017-05-02 00:00:00
Name: Date, dtype: object 

datetime64[ns] 
 

count                    1233
unique                   1233
top       2016-09-08 00:00:00
freq                        1
first     2012-05-07 00:00:00
last      2017-05-05 00:00:00
Name: Date, dtype: object 

datetime64[ns] 
 

count                    1265
unique                   1265
top       2016-09-08 00:00:00
fr

## 2. Price data type
### - string to float

The price data is not stored as an integer or float type that can be used directly. I will convert it from string to float.

Before diving into 6 different dataframes, I want to focus on the FTSEraw dataframe.

In [13]:
FTSEraw.head()

Unnamed: 0,Date,Price,Open,High,Low,Volume,Change(%)
0,2017-05-05,7297.43,7248.1,7297.43,7222.81,-,0.68%
1,2017-05-04,7248.1,7234.53,7280.7,7226.07,947.13M,0.19%
2,2017-05-03,7234.53,7250.05,7250.05,7218.56,790.04M,-0.21%
3,2017-05-02,7250.05,7203.94,7254.32,7203.94,939.01M,0.64%
4,2017-04-28,7203.94,7237.17,7243.31,7197.28,1.18B,-0.46%


The Volume column holds strings that contain M (millions) or B (billions). I need to convert them to actual numeric values.

In [17]:
#A custom function that changes string with M or B to decimal digits

from decimal import Decimal
d = {
        'M': 6,
        'B': 9
}
def text_to_num(text):
        if text[-1] in d:
            num, magnitude = text[:-1], text[-1]
            return Decimal(num) * 10 ** d[magnitude]
        else:
            return text

*https://stackoverflow.com/questions/11896560/how-can-i-consistently-convert-strings-like-3-71b-and-4m-to-numbers-in-pytho/11896814*

In [18]:
for i in range(0,FTSEraw.shape[0]):
    if FTSEraw.iloc[i,5] !='-':
        FTSEraw.iloc[i,5] = float(text_to_num(FTSEraw.iloc[i,5]))

In [27]:
FTSEraw.head()

Unnamed: 0,Date,Price,Open,High,Low,Volume,Change(%)
0,2017-05-05,7297.43,7248.1,7297.43,7222.81,-,0.68%
1,2017-05-04,7248.1,7234.53,7280.7,7226.07,9.4713e+08,0.19%
2,2017-05-03,7234.53,7250.05,7250.05,7218.56,7.9004e+08,-0.21%
3,2017-05-02,7250.05,7203.94,7254.32,7203.94,9.3901e+08,0.64%
4,2017-04-28,7203.94,7237.17,7243.31,7197.28,1.18e+09,-0.46%


The other price columns contain a thousands separator. Does it matter when we convert them to numeric data?

In [31]:
pd.to_numeric(FTSEraw.Price)

ValueError: Unable to parse string "7,297.43" at position 0

Initially I replaced ',' with '' so that to_numeric could parse the values.

In [32]:
FTSEraw.iloc[:,1].str.replace(',','').head()

0    7297.43
1    7248.10
2    7234.53
3    7250.05
4    7203.94
Name: Price, dtype: object

However, there is a smarter way to do the same.

*https://stackoverflow.com/questions/1779288/how-to-convert-a-string-to-a-number-if-it-has-commas-in-it-as-thousands-separato*

In [39]:
import locale
locale.setlocale( locale.LC_ALL, 'en_US.UTF-8' ) 
type(locale.atof("7,297.43"))

float
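
Another option, if the file is read with pandas anyway, is to let `pd.read_csv` strip the separator through its `thousands` argument. This is a sketch under assumptions: `csv_text` is a hypothetical two-row extract shaped like the FTSE file, not the real data.

```python
import io
import pandas as pd

# Hypothetical extract; the values are quoted because they contain commas
csv_text = (
    'Date,Price,Open\n'
    '5-May-17,"7,297.43","7,248.10"\n'
    '4-May-17,"7,248.10","7,234.53"\n'
)

# thousands="," tells the parser to drop the separator before converting to float
ftse_csv_demo = pd.read_csv(io.StringIO(csv_text), thousands=",")
print(ftse_csv_demo.dtypes)
```

With this approach the price columns arrive as floats, so no later conversion step is needed for them.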

So let's convert all data to float.

In [44]:
FTSEraw.iloc[:,1:5] = FTSEraw.iloc[:,1:5].applymap(locale.atof)
FTSEraw.head()

Unnamed: 0,Date,Price,Open,High,Low,Volume,Change(%)
0,2017-05-05,7297.43,7248.1,7297.43,7222.81,-,0.68%
1,2017-05-04,7248.1,7234.53,7280.7,7226.07,9.4713e+08,0.19%
2,2017-05-03,7234.53,7250.05,7250.05,7218.56,7.9004e+08,-0.21%
3,2017-05-02,7250.05,7203.94,7254.32,7203.94,9.3901e+08,0.64%
4,2017-04-28,7203.94,7237.17,7243.31,7197.28,1.18e+09,-0.46%


In [49]:
FTSEraw.describe()

Unnamed: 0,Price,Open,High,Low
count,1263.0,1263.0,1263.0,1263.0
mean,6493.478409,6492.175439,6528.278353,6455.791314
std,446.704634,446.750822,441.99023,450.780939
min,5260.19,5260.19,5324.36,5229.76
25%,6195.005,6193.42,6241.09,6146.24
50%,6568.35,6568.33,6598.39,6535.18
75%,6805.2,6804.405,6830.895,6771.225
max,7429.81,7429.81,7447.0,7402.64


The other dataframes should also be converted to numeric data types so that they can be investigated.

In [54]:
for i in All[:-1]:
    i.iloc[:,1:]=i.iloc[:,1:].applymap(pd.to_numeric)
    print(i.describe())

              Open         High          Low        Close    Adj Close  \
count  1256.000000  1256.000000  1256.000000  1256.000000  1256.000000   
mean   1888.432858  1896.895424  1879.482509  1889.030787  1889.030787   
std     281.692212   281.836013   281.240488   281.569206   281.569206   
min    1277.819946  1282.550049  1266.739990  1278.040039  1278.040039   
25%    1673.679993  1680.044952  1662.222534  1674.192505  1674.192505   
50%    1963.600037  1974.580017  1953.875000  1964.750000  1964.750000   
75%    2092.167419  2100.924927  2083.075012  2091.882446  2091.882446   
max    2394.750000  2400.979980  2386.780029  2395.959961  2395.959961   

             Volume  
count  1.256000e+03  
mean   3.557088e+09  
std    6.814312e+08  
min    5.362000e+08  
25%    3.154670e+09  
50%    3.483950e+09  
75%    3.855888e+09  
max    7.597450e+09  
              Open         High          Low        Close    Adj Close  \
count  1265.000000  1265.000000  1265.000000  1265.000000  12

## 3. Merging dataframes into one

In [55]:
#Creating a range of dates from 2012-05-08 up to, but not including, 2017-05-05
def perdelta(start, end, delta):
    curr = start
    while curr < end:
        yield curr
        curr += delta
datelist=[]
for result in perdelta(date(2012, 5, 8), date(2017, 5, 5), timedelta(days=1)):
    datelist.append(result)
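
For comparison, pandas can build the same calendar with `pd.date_range`. The sketch below reproduces the generator for reference and checks that both produce the same days; note that `pd.date_range` includes both endpoints by default and yields `Timestamp` objects rather than `datetime.date` objects. The names `generated` and `calendar` are illustrative only.

```python
from datetime import date, timedelta
import pandas as pd

def perdelta(start, end, delta):
    # Same generator as above: yields start, start+delta, ... strictly before end
    curr = start
    while curr < end:
        yield curr
        curr += delta

generated = list(perdelta(date(2012, 5, 8), date(2017, 5, 5), timedelta(days=1)))

# Both endpoints are included, so the range stops one day before 2017-05-05
calendar = pd.date_range("2012-05-08", "2017-05-04", freq="D")

print(len(generated), len(calendar))
```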

Clean the data so that every dataframe shares the same set of values: date, price, open price, high price and low price. I will try to incorporate these.
It is also important to fill in data for missing dates. I will use forward filling: if the May 3rd value is empty, the May 2nd value fills the blank, and if the May 2nd value is also NA, the May 1st value is used.
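
As a small illustration of forward filling, the sketch below uses a hypothetical three-day price series with May 3rd missing. It is not the notebook's merging code, only a demonstration of the idea with `reindex` and `ffill`.

```python
import pandas as pd

# Hypothetical closing prices with May 3rd missing (e.g. a market holiday)
closes = pd.Series(
    [100.0, 101.0, 103.0],
    index=pd.to_datetime(["2017-05-01", "2017-05-02", "2017-05-04"]),
    name="Close",
)

# Reindexing onto a full daily calendar introduces NaN for the missing day
calendar = pd.date_range("2017-05-01", "2017-05-04", freq="D")
daily = closes.reindex(calendar)

# Forward fill: each gap takes the most recent earlier value
filled = daily.ffill()
print(filled)
```

Here May 3rd receives the May 2nd value of 101.0, which is the behaviour described above.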

In [None]:
#Deleting unnecessary columns and changing column names 
del SNPraw['Adj Close']
del ASXraw['Adj Close']
del NIKraw['Adj Close']
del HSraw['Adj Close']
del DAXraw['Adj Close']
del FTSEraw['Change(%)']
#Changing a single column name of FTSEraw
FTSEraw.rename(columns={'Price':'Close'}, inplace=True)
FTSEraw.rename(columns={FTSEraw.columns[0]:'Date'}, inplace=True)

In [None]:
close_data = pd.DataFrame({"Date":datelist})
HL_data = pd.DataFrame({"Date":datelist})
Vol_data =pd.DataFrame({"Date":datelist})

close_data["SNPclose"] = ""
close_data["ASXclose"]= ""
close_data["NIKclose"]= ""
close_data["HSclose"]= ""
close_data["DAXclose"]= ""
close_data["FTSEclose"]= ""

HL_data["SNPHigh"] = ""
HL_data["SNPLow"] = ""
HL_data["ASXHigh"]= ""
HL_data["ASXLow"]= ""
HL_data["NIKHigh"]= ""
HL_data["NIKLow"]= ""
HL_data["HSHigh"]= ""
HL_data["HSLow"]= ""
HL_data["DAXHigh"]= ""
HL_data["DAXLow"]= ""
HL_data["FTSEHigh"]= ""
HL_data["FTSELow"]= ""

Vol_data["SNPVol"] = ""
Vol_data["ASXVol"]= ""
Vol_data["NIKVol"]= ""
Vol_data["HSVol"]= ""
Vol_data["DAXVol"]= ""
Vol_data["FTSEVol"]= ""

In [None]:
counter=0
for i in close_data.iloc[:,0]:
    tmpidx=SNPraw.Date[SNPraw.Date==i].index.tolist()
    if len(tmpidx)!=0:
        close_data.iloc[counter,1]=SNPraw.iloc[tmpidx[0],4]
        HL_data.iloc[counter,1]=SNPraw.iloc[tmpidx[0],2]
        HL_data.iloc[counter,2]=SNPraw.iloc[tmpidx[0],3]
        Vol_data.iloc[counter,1]=SNPraw.iloc[tmpidx[0],5]
    counter+=1

In [None]:
counter=0
for i in close_data.iloc[:,0]:
    tmpidx=ASXraw.Date[ASXraw.Date==i].index.tolist()
    if len(tmpidx)!=0:
        close_data.iloc[counter,2]=ASXraw.iloc[tmpidx[0],4]
        HL_data.iloc[counter,3]=ASXraw.iloc[tmpidx[0],2]
        HL_data.iloc[counter,4]=ASXraw.iloc[tmpidx[0],3]
        Vol_data.iloc[counter,2]=ASXraw.iloc[tmpidx[0],5]
    counter+=1

In [None]:
counter=0
for i in close_data.iloc[:,0]:
    tmpidx=NIKraw.Date[NIKraw.Date==i].index.tolist()
    if len(tmpidx)!=0:
        close_data.iloc[counter,3]=NIKraw.iloc[tmpidx[0],4]
        HL_data.iloc[counter,5]=NIKraw.iloc[tmpidx[0],2]
        HL_data.iloc[counter,6]=NIKraw.iloc[tmpidx[0],3]
        Vol_data.iloc[counter,3]=NIKraw.iloc[tmpidx[0],5]
    counter+=1

In [None]:
counter=0
for i in close_data.iloc[:,0]:
    tmpidx=HSraw.Date[HSraw.Date==i].index.tolist()
    if len(tmpidx)!=0:
        close_data.iloc[counter,4]=HSraw.iloc[tmpidx[0],4]
        HL_data.iloc[counter,7]=HSraw.iloc[tmpidx[0],2]
        HL_data.iloc[counter,8]=HSraw.iloc[tmpidx[0],3]
        Vol_data.iloc[counter,4]=HSraw.iloc[tmpidx[0],5]
    counter+=1

In [None]:
counter=0
for i in close_data.iloc[:,0]:
    tmpidx=DAXraw.Date[DAXraw.Date==i].index.tolist()
    if len(tmpidx)!=0:
        close_data.iloc[counter,5]=DAXraw.iloc[tmpidx[0],4]
        HL_data.iloc[counter,9]=DAXraw.iloc[tmpidx[0],2]
        HL_data.iloc[counter,10]=DAXraw.iloc[tmpidx[0],3]
        Vol_data.iloc[counter,5]=DAXraw.iloc[tmpidx[0],5]
    counter+=1
    

In [None]:
counter=0
for i in close_data.iloc[:,0]:
    tmpidx=FTSEraw.Date[FTSEraw.Date==i].index.tolist()
    if len(tmpidx)!=0:
        close_data.iloc[counter,6]=FTSEraw.iloc[tmpidx[0],1]
        HL_data.iloc[counter,11]=FTSEraw.iloc[tmpidx[0],3]
        HL_data.iloc[counter,12]=FTSEraw.iloc[tmpidx[0],4]
        Vol_data.iloc[counter,6]=FTSEraw.iloc[tmpidx[0],5]
    counter+=1

In [None]:
close_data=close_data.replace('',np.nan, regex=True)
close_data=close_data.ffill()

HL_data=HL_data.replace('',np.nan, regex=True)
HL_data=HL_data.ffill()

Vol_data=Vol_data.replace('',np.nan, regex=True)
Vol_data=Vol_data.ffill()

In [None]:
#Now all data is cleaned and converted to float types.

In [None]:
plt.plot(close_data.iloc[:,0],close_data.iloc[:,1:7])
plt.legend(close_data.columns[1:7])
plt.show()

In [None]:
plt.plot(Vol_data.iloc[:,0],Vol_data.iloc[:,1:])
plt.legend(Vol_data.columns[1:])
plt.show()

In [None]:
plt.plot(HL_data.iloc[:,0],HL_data.iloc[:,1:])
plt.legend(HL_data.columns[1:])
plt.show()

In [None]:
close_return=pd.DataFrame({"SNP" : close_data['SNPclose']/close_data['SNPclose'].shift(),
                           "ASX" :close_data['ASXclose']/close_data['ASXclose'].shift() ,
                           "NIK" :close_data['NIKclose']/close_data['NIKclose'].shift(),  
                           "HS": close_data['HSclose']/close_data['HSclose'].shift(), 
                           "DAX": close_data['DAXclose']/close_data['DAXclose'].shift(), 
                           "FTSE" :close_data['FTSEclose']/close_data['FTSEclose'].shift()})

In [None]:
close_return =close_return - 1
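
The two cells above compute the simple daily return, $r_t = P_t / P_{t-1} - 1$. As a sketch on hypothetical prices (not the real index data), the same quantity is available through pandas' built-in `pct_change`:

```python
import pandas as pd

prices = pd.Series([100.0, 102.0, 99.96])  # hypothetical closing prices

manual = prices / prices.shift() - 1   # the two-step calculation used above
builtin = prices.pct_change()          # pandas' built-in equivalent

print(manual.tolist())
print(builtin.tolist())
```

The first entry is NaN in both cases because there is no previous day to compare against, which is why the first row is dropped later.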

In [None]:
# Relative distance of each day's high from the high/low midpoint, per index
def hl_spread(prefix):
    mid = HL_data[[prefix + "High", prefix + "Low"]].mean(axis=1)
    return (HL_data[prefix + "High"] - mid) / mid

HL_volatility = pd.DataFrame({name: hl_spread(name)
                              for name in ["SNP", "ASX", "NIK", "HS", "DAX", "FTSE"]})

In [None]:
# Select the S&P 500 column by name so the result does not depend on column order
close_return.corr()['SNP']

The correlations indicate that the S&P 500 return is strongly related to the returns of the other indices.

The TensorFlow model here solves a classification problem: positive return or negative return. To do this, the target should be in a form called one-hot encoding, in which each class is an entry in an array. If the SNP return is positive, the array should be [1,0]; if it is negative, it should be [0,1]. The model's prediction will rarely look exactly like these examples; it will look more like a pair of probabilities. Applying argmax to it returns the index of the larger entry, which tells us which class has been 'predicted': positive if 0, negative if 1.
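
A minimal NumPy sketch of this encoding and of reading a prediction back with argmax follows. The `returns` and `probs` arrays are hypothetical values chosen for illustration, not model output.

```python
import numpy as np

returns = np.array([0.004, -0.012, 0.0, -0.001])  # hypothetical daily returns

# Column 0 = positive (>= 0), column 1 = negative, matching SNPpositive / SNPnegative
one_hot = np.column_stack([returns >= 0, returns < 0]).astype(int)

# Hypothetical softmax outputs: one probability pair per day
probs = np.array([[0.7, 0.3], [0.4, 0.6], [0.55, 0.45], [0.2, 0.8]])
predicted_class = probs.argmax(axis=1)   # 0 -> positive, 1 -> negative

# Accuracy: share of days where the predicted class matches the one-hot label
accuracy = (predicted_class == one_hot.argmax(axis=1)).mean()
print(one_hot.tolist(), predicted_class.tolist(), accuracy)
```

A return of exactly zero falls into the positive class here, mirroring the `>=0` comparison used to build the labels.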

In [None]:
Final_data = pd.DataFrame({'SNPpositive' : (close_return.SNP>=0)*1,
                           'SNPnegative' : (close_return.SNP<0)*1,
                           'ASXreturn' : close_return.ASX,
                           'NIKreturn' : close_return.NIK,
                           'HSreturn' : close_return.HS,
                           'DAXreturn' : close_return.DAX,
                           'FTSEreturn' : close_return.FTSE,
                           'SNPvol' : HL_volatility.SNP,
                           'ASXvol' : HL_volatility.ASX,
                           'NIKvol' : HL_volatility.NIK,
                           'HSvol' : HL_volatility.HS,
                           'DAXvol' : HL_volatility.DAX,
                           'FTSEvol' : HL_volatility.FTSE})

In [None]:
#Dropping the first row since it has NaN values
Final_data = Final_data.drop(Final_data.index[0])

In [None]:
#Normalizing the data column by column (min-max scaling to the [0, 1] range)
Final_data = (Final_data - Final_data.min())/(Final_data.max() - Final_data.min())

In [None]:
#Separating the data into training and test datasets. 80% will be training and 20% will be test
traininglegth = int(Final_data.shape[0] * 0.8)
training_data = Final_data.iloc[:traininglegth,:]
test_data = Final_data.iloc[traininglegth: , :]
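
One variation, which is not what the notebook does, is to split first and then scale both parts with the minimum and maximum of the training rows only, so that no statistic from the test period enters the training features. The sketch below uses a hypothetical `features` table; all names in it are illustrative.

```python
import pandas as pd

# Hypothetical feature table standing in for the return/volatility columns
features = pd.DataFrame({"a": [1.0, 3.0, 5.0, 7.0, 9.0],
                         "b": [10.0, 20.0, 30.0, 40.0, 60.0]})

split = int(features.shape[0] * 0.8)   # same 80/20 chronological split
train, test = features.iloc[:split], features.iloc[split:]

# Column-wise min and max taken from the training rows only
lo, hi = train.min(), train.max()
train_scaled = (train - lo) / (hi - lo)
test_scaled = (test - lo) / (hi - lo)   # may fall outside [0, 1]

print(test_scaled)
```

The trade-off is that test values can land outside the [0, 1] range whenever the test period contains new extremes.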

In [None]:
training_y = training_data[['SNPpositive','SNPnegative']]
training_x = training_data.drop(['SNPpositive','SNPnegative'],axis=1)

In [None]:
test_y = test_data[['SNPpositive','SNPnegative']]
test_x = test_data.drop(['SNPpositive','SNPnegative'],axis=1)

In [None]:
num_predictor = training_x.shape[1]
num_class = training_y.shape[1]

In [None]:
graph = tf.Graph()
with graph.as_default():
    with tf.name_scope("Input"):
        x = tf.placeholder(shape=[None,num_predictor], dtype=tf.float32, name='x')
        y = tf.placeholder(shape=[None,num_class], dtype=tf.float32, name='y')
    
    with tf.name_scope("Simple_Model"):
        with tf.name_scope("Variables"):
            w = tf.Variable(tf.truncated_normal([num_predictor, num_class], stddev=0.1), name="W")
            b = tf.Variable(tf.truncated_normal([num_class], stddev=0.1), name="b")

        # Keep the raw logits: the cross-entropy op below applies softmax itself
        logits = tf.matmul(x, w) + b
        model = tf.nn.softmax(logits)
        
        with tf.name_scope("Result"):
            cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y), name="cost")
            correct_prediction = tf.equal(tf.argmax(model,1),tf.argmax(y,1))
            accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        
        with tf.name_scope("Train"):
            training = tf.train.AdamOptimizer(0.1).minimize(cost)
    
    with tf.name_scope("Two_Layer"):
        node_1 = 5
        node_2 = 3
        with tf.name_scope("Variables2"):
            w1 = tf.Variable(tf.truncated_normal([num_predictor, node_1], stddev=0.1), name="W1")
            b1 = tf.Variable(tf.truncated_normal([node_1], stddev=0.1), name="b1")
            w2 = tf.Variable(tf.truncated_normal([node_1, node_2], stddev=0.1), name ="W2")
            b2 = tf.Variable(tf.truncated_normal([node_2], stddev=0.1), name="b2")
            w3 = tf.Variable(tf.truncated_normal([node_2, num_class], stddev=0.1), name="W3")
            b3 = tf.Variable(tf.truncated_normal([num_class], stddev=0.1), name="b3")
        
        with tf.name_scope("Layer1"):
            hidden_layer_1 = tf.nn.relu(tf.matmul(x, w1) + b1)

        with tf.name_scope("Layer2"):
            hidden_layer_2 = tf.nn.relu(tf.matmul(hidden_layer_1, w2) + b2)
        
        # Keep the raw logits: the cross-entropy op below applies softmax itself
        logits2 = tf.matmul(hidden_layer_2, w3) + b3
        model2 = tf.nn.softmax(logits2)
        
        with tf.name_scope("Result2"):
            cost2 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits2, labels=y), name="cost2")
            correct_prediction2 = tf.equal(tf.argmax(model2,1),tf.argmax(y,1))
            accuracy2 = tf.reduce_mean(tf.cast(correct_prediction2, tf.float32))  
            
        with tf.name_scope("Train2"):
            training2 = tf.train.AdamOptimizer(0.005).minimize(cost2)
            
    with tf.name_scope("Summary"):
        tf.summary.scalar("Accuracy_1", accuracy)
        tf.summary.scalar("Accuracy_2", accuracy2)
        
    with tf.name_scope("global_ops"):
        init = tf.global_variables_initializer()
        # Collect all summary Ops in graph
        summ = tf.summary.merge_all()

In [None]:
sess = tf.Session(graph=graph)
writer = tf.summary.FileWriter('./FinalGraph', graph=graph)
sess.run(init)

In [None]:
starting_time = datetime.now()
for i in range(1, 100000):
    _,__, _accuracy, _accuracy2, _cost, _cost2, _summ = sess.run([training, training2, accuracy, accuracy2, cost, cost2, summ],
                                                                feed_dict={x: training_x.values,
                                                                          y: training_y.values.reshape(len(training_y.values),2)})
    if i%100 == 0:
        writer.add_summary(_summ, i)
    if i%1000 == 0:
        print(_accuracy)
        print(_accuracy2)
        print("Processing... %sth loop" %i)

Processing_time = datetime.now() - starting_time

In [None]:
print("Final Accuracy of Simple model %s" %_accuracy)

In [None]:
print("Final Accuracy of Two Layer model %s" %_accuracy2)

In [None]:
_Predict1,_Predict2 = sess.run([accuracy,accuracy2], feed_dict={x: test_x.values,
                                                                y: test_y.values.reshape(len(test_y.values),2)})

In [None]:
print("Using Simple Model, test accuracy is %s" %_Predict1)

In [None]:
print("Using Two Hidden Layers, test accuracy is %s" %_Predict2)

In [None]:
print("Total time %s" %Processing_time)

In [None]:
Final_data.to_csv("./finaldata.csv")