<a href="https://colab.research.google.com/github/adugnaNecho/ArduinoTensorFlowLiteTutorials/blob/master/FMG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

* The objective of the project is to predict the direction (up or down) of movement of a stock index (S&P 500) over a single trading day, given returns from the previous 10-year period.
* Data is provided in a CSV file and is partially pre-processed. It covers a period of about 10 years, 2005-2015, but the dates are not given.
* The stocks and indices are in the form of daily log returns. The three currencies are in the form of daily changes.
* Find out what the ticker symbols represent.
* The data are from successive trading days starting from the earliest date.
* You should use both ANNs and SVMs and compare.
* DO NOT EXPECT GOOD OR EVEN REASONABLY GOOD RESULTS - this is a very difficult problem to do well.
* Evaluate the success rates using whatever metrics you can and comment on them.
* Use numpy and pandas for the data. (pd.read_csv). You might find the numpy method 'np.where(condition,-1,1)' useful.
* The data file has been uploaded to the website (SPPrediction_Project.csv).
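
As a minimal sketch of the `np.where` hint above, the snippet below turns a short series of log returns into ±1 direction labels. The values in `returns` are made up for illustration and are not taken from `SPPrediction_Project.csv`.

```python
import numpy as np

# Hypothetical daily log returns (illustrative values only)
returns = np.array([0.013, -0.002, 0.004, -0.009, 0.0])

# Label a day -1 when the return is negative and +1 otherwise
direction = np.where(returns < 0, -1, 1)
print(direction)
```

A zero return is labelled +1 here; which side the zero case falls on is a design choice that should be stated when reporting results.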

In [None]:
import warnings
warnings.filterwarnings('ignore')

import numpy as np
from pandas import DataFrame, read_csv 

## Importing data (SPPrediction_Project.csv)

In [None]:
d = read_csv('SPPrediction_Project.csv', index_col=0)    # reading the CSV file; the first column becomes the index
d.head()

Unnamed: 0,GSPC,XOM,GE,MSFT,PG,JNJ,DJI,IXIC,HIS,FCHI,FTSE,GDAXI,JPY,GBP,CAD
1,0.013491,-0.008643,0.020911,0.038266,0.017212,0.003772,0.013725,0.019385,0.0,0.012764,0.013314,0.000295,-0.87,0.000474,-0.012
2,0.021899,0.039154,0.019289,0.031909,0.011537,0.011976,0.024797,0.030443,0.011656,0.022139,0.031488,0.035472,1.84,0.00011,-0.0044
3,-0.002053,-0.001671,-0.016858,-0.004971,-0.019074,-0.00972,-0.000756,0.002251,0.037655,-0.015049,-0.007771,-0.022368,-2.07,0.001392,0.0023
4,0.004212,-0.00587,-0.005479,-0.004161,-0.018744,0.000751,0.011007,0.007845,0.03639,0.00348,0.007511,0.012996,-0.05,0.002247,-0.0003
5,-0.00883,-0.042084,-0.026602,-0.015973,-0.021963,-0.03748,-0.002433,0.016994,-0.010717,-0.010304,-0.01017,-0.022928,-2.28,0.005447,-0.0065


## Replacing missing data with their respective means

In [None]:
# Treat exact zeros as missing values and replace them with the column mean
# (a single vectorised call covers every column, so no loop is needed)
g = np.where(d.values == 0, d.mean(axis=0).values, d.values)
q = list(range(1, len(d) + 1))    # 1-based row labels

# Displaying the new dataset
mdata = DataFrame(g, q, columns=['GSPC', 'XOM', 'GE', 'MSFT', 'PG', 'JNJ', 'DJI', 'IXIC', 'HIS', 'FCHI', 'FTSE', 'GDAXI', ' JPY', 'GBP', 'CAD'])
mdata.tail()

Unnamed: 0,GSPC,XOM,GE,MSFT,PG,JNJ,DJI,IXIC,HIS,FCHI,FTSE,GDAXI,JPY,GBP,CAD
2341,-0.002819,-0.008934,0.00691,-0.008618,-0.001113,0.006666,-0.002954,-0.007345,-0.000644,-0.008372,-0.005751,-0.008137,0.06,-0.001137,-0.0021
2342,0.005826,0.007793,0.014394,0.022112,0.009972,0.008615,0.000373,0.01383,-0.011952,0.01388,0.005557,0.012434,0.07,-2.1e-05,-0.0031
2343,-0.029805,-0.013678,-0.053571,-0.021391,-0.019724,-0.015748,-0.030813,-0.033836,-0.000644,-0.02457,-0.026449,-0.024202,-1.17,-0.004548,0.0027
2344,-0.003724,0.001619,-0.010229,-0.004334,-0.02868,-0.010796,-0.009378,-0.002476,-0.000644,-0.006544,0.002136,-0.005852,-0.51,-0.001364,-0.0018
2345,0.001266,0.018091,0.016242,-0.004353,0.005276,0.01204,0.000308,-0.009829,-0.000644,0.008499,0.017193,-0.000559,-0.18,-2.1e-05,0.0073
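
The cell above computes each column mean with the zeros still included, which pulls the fill value towards zero. One possible alternative, sketched here on a made-up three-row frame rather than the real data, is to mark zeros as missing first and then fill with the mean of the observed values. The frame `toy` and its numbers are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the data set; zeros stand in for missing values
toy = pd.DataFrame({"GSPC": [0.01, 0.0, 0.03], "HIS": [0.0, 0.02, 0.04]})

# Mark zeros as missing, then fill each column with the mean of its observed values
masked = toy.replace(0, np.nan)
filled = masked.fillna(masked.mean())
print(filled)
```

Note that a return of exactly zero can also be a genuine observation, so treating every zero as missing is itself an assumption.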


## Ticker Symbols Representation

In [None]:
L = ['GSPC', 'XOM', 'GE', 'MSFT', 'PG', 'JNJ', 'DJI', 'IXIC', 'HIS', 'FCHI', 'FTSE', 'GDAXI', ' JPY', 'GBP', 'CAD']
M = ['S&P 500 Index', 'Exxon Mobil Corporation', 'General Electric Company', 'Microsoft Corporation',
     'Procter & Gamble', 'Johnson & Johnson', 'Dow Jones Industrial Average', 'NASDAQ Composite',
     'Hang Seng Index (assumed; usually written HSI)', 'CAC 40 Index', 'FTSE 100 Index', 'DAX Index',
     'Japanese Yen', 'British Pound Sterling', 'Canadian Dollar']
h = list(zip(L, M))
j = list(range(1, 16))
frame = DataFrame(h, j, columns=['Ticker symbols', 'Ticker symbols meaning'])
frame

Unnamed: 0,Ticker symbols,Ticker symbols meaning
1,GSPC,S&P 500 Index
2,XOM,Exxon Mobil Corporation
3,GE,General Electric Company
4,MSFT,Microsoft Corporation
5,PG,Procter & Gamble
6,JNJ,Johnson & Johnson
7,DJI,Dow Jones Industrial Average
8,IXIC,NASDAQ Composite
9,HIS,Hang Seng Index (assumed; usually written HSI)
10,FCHI,CAC 40 Index


# Machine learning for the stocks and indices

## Assigning Training datasets and Target datasets

In [None]:
# Assigning X to be the feature data (copied so that mdata is not modified in place)
X = mdata.loc[:, ['XOM', 'GE', 'MSFT', 'PG', 'JNJ', 'DJI', 'IXIC', 'HIS', 'FCHI', 'FTSE', 'GDAXI']].values.copy()

# Assigning y to be the target data
y = mdata.loc[:, 'GSPC'].values.copy()

X[1:, :] = X[1:, :] - X[0:-1, :]   # day-over-day change in each return series (features)
X[0, :] = 0.000001                 # the first row has no previous day; use a small placeholder
y[1:] = y[1:] - y[0:-1]            # day-over-day change in the GSPC return (target)

y[0] = .000001                     # the first row has no previous day; use a small placeholder
y = np.where(y >= 0.0, 1, -1)      # label non-negative changes as 1 and negative changes as -1

X_del_1 = X[0:-1, :]               # features without the last row (days 1 to N-1)
y_del_1 = y[1:]                    # labels without the first row (days 2 to N)

In [None]:
# Converting X and y into lists
X = X.tolist()
y = y.tolist()

X_del_1 = X_del_1.tolist()
y_del_1 = y_del_1.tolist()
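
The `X_del_1` / `y_del_1` pair above lines up each day's features with the following day's label. The same alignment can be sketched with `shift` in pandas. The frame below and its values are hypothetical, and the label here is the sign of the return itself, which is one possible reading of "direction".

```python
import numpy as np
import pandas as pd

# Hypothetical returns for one feature and the target index
frame = pd.DataFrame({"XOM": [0.01, -0.02, 0.03, 0.00],
                      "GSPC": [0.02, -0.01, 0.01, -0.03]})

# Features: yesterday's values; the first row has no predecessor and is dropped
features = frame[["XOM"]].shift(1).iloc[1:]

# Target: direction of today's index return, for the same remaining rows
target = np.where(frame["GSPC"].iloc[1:] >= 0, 1, -1)

print(features["XOM"].tolist(), target.tolist())
```

With this layout no information from day t is available to the model when it predicts day t.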

## Artificial Neural Network

In [None]:
#allocating data set to training and testing
from sklearn.model_selection import train_test_split


#splitting dataset; 70% for training and 30% for testing
X_train, X_test, y_train, y_test = train_test_split(
         X, y, test_size=0.3, random_state=0)
X_train_del_1, X_test_del_1, y_train_del_1, y_test_del_1 = train_test_split(
         X_del_1, y_del_1, test_size=0.3, random_state=0)
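
Because the rows are successive trading days, a shuffled split mixes later days into the training set. A minimal sketch of a chronological alternative, using `shuffle=False` on made-up arrays (`X_demo` and `y_demo` are hypothetical), could look like this:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical time-ordered samples: 10 days, 2 features
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.array([1, -1, 1, 1, -1, -1, 1, -1, 1, 1])

# shuffle=False keeps the first 70% of days for training and the last 30% for testing
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, shuffle=False)

print(len(X_tr), len(X_te), y_te.tolist())
```

Whether a chronological split is appropriate depends on the question being asked; it is shown here only as an option to compare against the shuffled split above.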

In [None]:
# Standardising the dataset
from sklearn.preprocessing import StandardScaler

# One scaler per training set, so each data set is scaled with its own statistics
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

sc_del_1 = StandardScaler()
sc_del_1.fit(X_train_del_1)
X_train_std_del_1 = sc_del_1.transform(X_train_del_1)
X_test_std_del_1 = sc_del_1.transform(X_test_del_1)
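
One way to avoid passing unscaled data to a model trained on scaled data is to bundle the scaler and the classifier in a scikit-learn pipeline. The sketch below uses randomly generated data and a linear `SVC` purely as a stand-in; the arrays `X_demo` and `y_demo` are assumptions and any classifier could take the place of `SVC`.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical data: 60 samples, 3 features, labels given by the sign of the first feature
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 3))
y_demo = np.where(X_demo[:, 0] > 0, 1, -1)

# The pipeline fits the scaler on the training rows only and reuses it at prediction time
model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
model.fit(X_demo[:40], y_demo[:40])
score = model.score(X_demo[40:], y_demo[40:])
print(score)
```

Calling `model.predict` then always applies the same scaling that was learned during `fit`.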

### Classifier

In [None]:
###----MLPClassifier----###
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Setting the hidden layer sizes, learning rate and maximum number of iterations
mlp = MLPClassifier(hidden_layer_sizes=(1000,1000,100,), max_iter=10, alpha=1e-5,
                    solver='sgd', verbose=10, tol=1e-6, random_state=14,
                    learning_rate_init=.1)

mlp.fit(X_train_std, y_train)                                       # fitting the standardised training set
y_pred = mlp.predict(X_test_std)                                    # predicting the standardised test set
print('Accuracy: %.2f' % accuracy_score(y_test, y_pred))            # displaying the accuracy score
print("Training set score: %f" % mlp.score(X_train_std, y_train))   # displaying the training score
print("Test set score: %f" % mlp.score(X_test_std, y_test))         # displaying the testing score
mlp.predict(X_test_std)[1:100]                                      # predictions 2 to 100 for the test set

Iteration 1, loss = 0.50929235
Iteration 2, loss = 0.18583861
Iteration 3, loss = 0.12826544
Iteration 4, loss = 0.11054124
Iteration 5, loss = 0.11923427
Iteration 6, loss = 0.09476021
Iteration 7, loss = 0.09028181
Iteration 8, loss = 0.08512183
Iteration 9, loss = 0.08040144
Iteration 10, loss = 0.08040714

In [None]:
# Predicting on the standardised test dataset
y_pred_mlp = mlp.predict(X_test_std)

In [None]:
#checking for the accuracy of the classifier
from sklearn.metrics import accuracy_score,confusion_matrix,classification_report

print('Misclassified samples: %d' % (y_test != y_pred_mlp).sum())  #displaying the sum of misclassified samples
print('Accuracy: %.2f' % accuracy_score(y_test, y_pred_mlp))       #displaying the accuracy score
print(confusion_matrix(y_test,y_pred_mlp))                         #displaying the confusion matrix
print(classification_report(y_test,y_pred_mlp))                    #displaying the classification report

## Support Vector Machine

In [None]:
###----SVM----###
from sklearn.svm import SVC                               #importing Support Vector Machine
clf = SVC(kernel='linear', C=1.0, random_state=0)   
clf.fit(X_train, y_train)                                 #fitting the training dataset 
y_pred_SVM=clf.predict(X_test)                            #predicting for the Support Vector Machine


from sklearn.metrics import accuracy_score,confusion_matrix,classification_report

print('Misclassified samples: %d' % (y_test != y_pred_SVM).sum())  #displaying the sum of misclassified samples
print('Accuracy: %.2f' % accuracy_score(y_test, y_pred_SVM))       #displaying the accuracy score
print(confusion_matrix(y_test,y_pred_SVM))                         #displaying the confusion matrix
print(classification_report(y_test,y_pred_SVM))                    #displaying the classification report

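From the analysis of the stock and index data using the ANN and the SVM:
* The accuracy of the ANN is 94% and that of the SVM is 91%.
* Both models were fitted on `X` and `y`, which pair features and labels from the same trading day. The lagged sets `X_del_1` and `y_del_1` were not used for fitting, so these figures describe same-day association rather than next-day prediction.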

Precision = $\displaystyle{\frac{t_{p}}{t_{p}+f_{p}}}$


Recall = $\displaystyle{\frac{t_{p}}{t_{p}+f_{n}}}$

Where $t_{p}$ is the true positive, $f_{p}$ is the false positive and $f_{n}$ is the false negative.

$ \\ $

$\displaystyle{\text{f1-score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}}$

$ \\ $

$\textbf{Interpretation of metrics for ANN:}$

Precision is the fraction of days predicted as a given class that truly belong to that class. The precision for upward movement of the index is 0.93 and for downward movement is 0.95.

Recall is the fraction of days truly in a given class that the model assigns to that class. The recall for upward movement of the index is 0.95 and for downward movement is 0.94.


$\textbf{Interpretation of metrics for SVM:}$

Using the definitions of precision and recall given above, the precision for upward movement of the index is 0.94 and for downward movement is 0.89.

The recall for upward movement of the index is 0.87 and for downward movement is 0.95.

# Machine learning for Currencies (JPY, GBP, CAD)

## Assigning Training datasets and Target datasets

In [None]:
# Assigning M to be the feature data (copied so that mdata is not modified in place)
M = mdata.loc[:, [' JPY', 'CAD']].values.copy()

# Assigning n to be the target data
n = mdata.loc[:, 'GBP'].values.copy()

M[1:, :] = M[1:, :] - M[0:-1, :]   # day-over-day change in each currency series (features)
M[0, :] = 0.000001                 # the first row has no previous day; use a small placeholder
n[1:] = n[1:] - n[0:-1]            # day-over-day change in the GBP series (target)

n[0] = .000001                     # the first row has no previous day; use a small placeholder
n = np.where(n >= 0.0, 1, -1)      # label non-negative changes as 1 and negative changes as -1

M_del_1 = M[0:-1, :]               # features without the last row (days 1 to N-1)
n_del_1 = n[1:]                    # labels without the first row (days 2 to N)

In [None]:
# Converting M and n into lists
M = M.tolist()
n = n.tolist()

M_del_1 = M_del_1.tolist()
n_del_1 = n_del_1.tolist()

## Artificial Neural Network

In [None]:
#allocating data set to training and testing
from sklearn.model_selection import train_test_split


#splitting dataset; 70% for training and 30% for testing
M_train, M_test, n_train, n_test = train_test_split(
         M, n, test_size=0.3, random_state=0)
M_train_del_1, M_test_del_1, n_train_del_1, n_test_del_1 = train_test_split(
         M_del_1, n_del_1, test_size=0.3, random_state=0)

In [None]:
# Standardising the dataset
from sklearn.preprocessing import StandardScaler

# One scaler per training set, so each data set is scaled with its own statistics
sc = StandardScaler()
sc.fit(M_train)
M_train_std = sc.transform(M_train)
M_test_std = sc.transform(M_test)

sc_del_1 = StandardScaler()
sc_del_1.fit(M_train_del_1)
M_train_std_del_1 = sc_del_1.transform(M_train_del_1)
M_test_std_del_1 = sc_del_1.transform(M_test_del_1)

### Classifier

In [None]:
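###----MLPClassifier----###
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Same network settings as for the stocks and indices
mlp = MLPClassifier(hidden_layer_sizes=(1000,1000,100,), max_iter=10, alpha=1e-5,
                    solver='sgd', verbose=10, tol=1e-6, random_state=14,
                    learning_rate_init=.1)

mlp.fit(M_train_std, n_train)                                       # fitting the standardised training set
n_pred = mlp.predict(M_test)                                        # predictions on the unstandardised test set
print('Accuracy: %.2f' % accuracy_score(n_test, n_pred))            # accuracy of the unstandardised predictions
print("Training set score: %f" % mlp.score(M_train_std, n_train))   # accuracy on the standardised training set
print("Test set score: %f" % mlp.score(M_test_std, n_test))         # accuracy on the standardised test set
mlp.predict(M_test)[1:100]                                          # predictions 2 to 100 for the unstandardised test set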

Iteration 1, loss = 0.67885779
Iteration 2, loss = 0.64865639
Iteration 3, loss = 0.64898208
Iteration 4, loss = 0.64771497
Iteration 5, loss = 0.64638967
Iteration 6, loss = 0.64407697
Iteration 7, loss = 0.64316757
Iteration 8, loss = 0.64283732
Iteration 9, loss = 0.64292349
Iteration 10, loss = 0.64117459
Accuracy: 0.63
Training set score: 0.630104
Test set score: 0.636364


array([-1, -1,  1, -1, -1,  1, -1, -1, -1,  1,  1, -1, -1,  1,  1,  1, -1,
       -1, -1, -1, -1,  1,  1, -1,  1,  1,  1,  1, -1,  1, -1,  1, -1,  1,
        1,  1, -1,  1,  1, -1, -1, -1, -1,  1,  1, -1,  1, -1,  1,  1, -1,
       -1, -1, -1, -1,  1, -1, -1, -1, -1, -1, -1, -1,  1,  1,  1, -1,  1,
        1,  1, -1, -1,  1,  1, -1, -1,  1, -1,  1, -1,  1, -1,  1, -1, -1,
       -1, -1,  1, -1,  1, -1, -1, -1, -1, -1, -1, -1, -1,  1])

In [None]:
n_pred_mlp = mlp.predict(M_test_std)

In [None]:
from sklearn.metrics import accuracy_score,confusion_matrix,classification_report

print('Misclassified samples: %d' % (n_test != n_pred_mlp).sum())
print('Accuracy: %.2f' % accuracy_score(n_test, n_pred_mlp))
print(confusion_matrix(n_test,n_pred_mlp))
print(classification_report(n_test,n_pred_mlp))

Misclassified samples: 256
Accuracy: 0.64
[[235 143]
 [113 213]]
              precision    recall  f1-score   support

          -1       0.68      0.62      0.65       378
           1       0.60      0.65      0.62       326

    accuracy                           0.64       704
   macro avg       0.64      0.64      0.64       704
weighted avg       0.64      0.64      0.64       704
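
As a check on the report above, the per-class figures can be recomputed by hand from the printed confusion matrix. This is a minimal sketch that assumes scikit-learn's convention of true classes in rows and predicted classes in columns, ordered as (-1, 1).

```python
import numpy as np

# Confusion matrix printed above: rows are true classes (-1, 1), columns are predictions
cm = np.array([[235, 143],
               [113, 213]])

precision = np.diag(cm) / cm.sum(axis=0)   # correct predictions / all predictions of each class
recall = np.diag(cm) / cm.sum(axis=1)      # correct predictions / all true members of each class
f1 = 2 * precision * recall / (precision + recall)
accuracy = np.trace(cm) / cm.sum()

print(np.round(precision, 2), np.round(recall, 2), np.round(f1, 2), round(accuracy, 2))
```

The rounded values agree with the classification report: precision 0.68 and 0.60, recall 0.62 and 0.65, f1-score 0.65 and 0.62, and accuracy 0.64.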



## Support Vector Machine

In [None]:
###----SVM----###
from sklearn.svm import SVC                               #importing Support Vector Machine
clf = SVC(kernel='linear', C=1.0, random_state=0)
clf.fit(M_train, n_train)                                 #fitting the training dataset 
n_pred_SVM=clf.predict(M_test)                            #predicting for the Support Vector Machine

from sklearn.metrics import accuracy_score,confusion_matrix,classification_report

print('Misclassified samples: %d' % (n_test != n_pred_SVM).sum())   #displaying the sum of misclassified samples
print('Accuracy: %.2f' % accuracy_score(n_test, n_pred_SVM))        #displaying the accuracy score
print(confusion_matrix(n_test,n_pred_SVM))                          #displaying the confusion matrix
print(classification_report(n_test,n_pred_SVM))                     #displaying the classification report

Misclassified samples: 259
Accuracy: 0.63
[[227 151]
 [108 218]]
              precision    recall  f1-score   support

          -1       0.68      0.60      0.64       378
           1       0.59      0.67      0.63       326

    accuracy                           0.63       704
   macro avg       0.63      0.63      0.63       704
weighted avg       0.64      0.63      0.63       704



From the analysis of the currency data using the ANN and the SVM:
* The accuracy of the ANN is 64% and that of the SVM is 63%.

$\textbf{Interpretation of metrics for ANN:}$

Precision is the fraction of days predicted as a given class that truly belong to that class. The precision for upward movement of GBP is 0.60 and for downward movement is 0.68.

Recall is the fraction of days truly in a given class that the model assigns to that class. The recall for upward movement of GBP is 0.65 and for downward movement is 0.62.


$\textbf{Interpretation of metrics for SVM:}$

For the SVM, the precision for upward movement of GBP is 0.59 and for downward movement is 0.68.

The recall for upward movement of GBP is 0.67 and for downward movement is 0.60.
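
To put these accuracies in context, it can help to compare them with a rule that always predicts the more frequent class. The sketch below uses only the support and misclassification counts printed in the outputs above; the variable names are illustrative.

```python
# Class counts in the currency test set, taken from the "support" column above
n_down, n_up = 378, 326
n_total = n_down + n_up

# Accuracy of a rule that always predicts the more frequent class
baseline = max(n_down, n_up) / n_total

# Accuracies implied by the misclassification counts reported above
ann_accuracy = (n_total - 256) / n_total
svm_accuracy = (n_total - 259) / n_total

print(f"baseline {baseline:.3f}, ANN {ann_accuracy:.3f}, SVM {svm_accuracy:.3f}")
```

This prints `baseline 0.537, ANN 0.636, SVM 0.632`, so both models sit roughly ten percentage points above the majority-class rule on this test set.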