# All Models

- This notebook contains all of the models we have developed. Each model has its own function.


## Discussion and Findings

Varying the number of selected input dimensions (`inputDims`), the neural network appears to perform better with fewer inputs; with more inputs, its accuracy drops. By contrast, with few inputs the linear models always predict 1, whereas with more inputs the logistic regression model predicts both 1s and 0s.
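A model that always predicts the same class can still post a reasonable-looking accuracy, so it helps to check each model's predictions against a majority-class baseline. The sketch below is one way to do that; `baseline_report` is a hypothetical helper, not part of the notebook or of `cmatrix`, and it assumes binary labels stored as 0/1. The majority class is taken from the training labels so that no test information is used to build the baseline.

```python
import numpy as np

def baseline_report(y_train, y_test, y_pred):
    """Compare test accuracy with always predicting the training majority class (hypothetical helper)."""
    y_train = np.asarray(y_train).ravel()
    y_test = np.asarray(y_test).ravel()
    y_pred = np.asarray(y_pred).ravel()

    # Majority class comes from the training labels only
    labels, counts = np.unique(y_train, return_counts=True)
    majority = labels[np.argmax(counts)]

    # How often each class was predicted; a single key means the model is degenerate
    pred_labels, pred_counts = np.unique(y_pred, return_counts=True)
    return {
        "accuracy": float(np.mean(y_pred == y_test)),
        "majority_baseline": float(np.mean(y_test == majority)),
        "predicted_counts": dict(zip(pred_labels.tolist(), pred_counts.tolist())),
        "single_class": len(pred_labels) == 1,
    }

# Toy example: a model that always predicts 1
print(baseline_report([1, 1, 0, 1], [0, 1, 1, 0], [1, 1, 1, 1]))
```

Inside one of the model functions this would be called as `baseline_report(train_class_var, test_class_var, predictions)`; a `single_class` value of `True` flags the "always predicts 1" behaviour described above.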

### Algorithm Hyper-Parameters

Here we will list the algorithm hyper-parameters that yield the best results for each algorithm.
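Picking hyper-parameters by their score on the final test rows would tune the models to those rows. Because the data is a time series, one option is walk-forward validation on the training portion: every validation fold lies strictly after the rows used for fitting. The sketch below is a minimal pure-Python version under that assumption; `walk_forward_splits`, `grid_search`, and the toy `threshold_model` are hypothetical names. scikit-learn provides the same fold layout through `sklearn.model_selection.TimeSeriesSplit`, which can be passed as the `cv` argument of `GridSearchCV`.

```python
from itertools import product

def walk_forward_splits(n_samples, n_splits):
    """Yield (train_indices, val_indices); each validation fold comes after its training rows."""
    fold_size = n_samples // (n_splits + 1)
    for i in range(n_splits):
        train_end = n_samples - (n_splits - i) * fold_size
        yield list(range(train_end)), list(range(train_end, train_end + fold_size))

def grid_search(fit_predict, param_grid, X, y, n_splits=3):
    """Return the parameter setting with the best mean walk-forward accuracy."""
    best_params, best_score = None, float("-inf")
    for values in product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), values))
        scores = []
        for train_idx, val_idx in walk_forward_splits(len(y), n_splits):
            preds = fit_predict(params,
                                [X[i] for i in train_idx], [y[i] for i in train_idx],
                                [X[i] for i in val_idx])
            correct = sum(int(p == y[i]) for p, i in zip(preds, val_idx))
            scores.append(correct / len(val_idx))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

def threshold_model(params, X_train, y_train, X_val):
    # Toy "model" that ignores the training rows and thresholds the single input
    return [int(x > params["t"]) for x in X_val]

X = list(range(12))
y = [int(x > 5) for x in X]
print(grid_search(threshold_model, {"t": [2, 5, 8]}, X, y))  # ({'t': 5}, 1.0)
```

In the notebook, `fit_predict` would wrap one of the model constructors (for example, building `RandomForestClassifier(n_estimators=params["n_estimators"])`, fitting it, and returning its predictions).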

#### Sequential NN


#### Logistic Regression

#### Perceptron 

#### Random Forest 

In [8]:
```python
# Required imports and dependencies
import warnings

import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense, Dropout
from sklearn import preprocessing
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression, Perceptron

# Our own confusion matrix module that we developed
import cmatrix as cm

warnings.filterwarnings("ignore")

# Load the data into a data frame
xlsPath = r'C:\Users\New\Desktop\UniWork\ADA\ada2\final dataset.xlsx'
data = pd.read_excel(xlsPath)

# Split into the input attributes and the class variable
X = data.drop(['date', 'symbol', 'price increase tomorrow?'], axis=1)
class_var = data['price increase tomorrow?']

### Comment/uncomment the block below to denoise each column with a Haar wavelet transform
#import pywt
#for column in X:
#    coeff = pywt.wavedec(X[column], "haar", level=10)
#    # Noise estimate: median absolute deviation of the finest detail coefficients
#    sigma = np.median(np.abs(coeff[-1])) / 0.6745
#    threshold = sigma * np.sqrt(2 * np.log(len(X[column])))
#    coeff[1:] = (pywt.threshold(i, value=threshold, mode="soft") for i in coeff[1:])
#    # waverec can return one extra sample for odd-length input, so trim it
#    X[column] = pywt.waverec(coeff, "haar")[:len(X)]

# Min-max normalize every column to [0, 1]
# Note: the scaler and SelectKBest below are fit on all rows, including the test rows
min_max_scaler = preprocessing.MinMaxScaler()
for column in X:
    X[column] = min_max_scaler.fit_transform(X[column].values.reshape(-1, 1))

# Keep the inputDims attributes with the highest chi-squared scores
inputDims = 4
attributes = SelectKBest(chi2, k=inputDims).fit_transform(X, class_var)

# Chronological train/test split: first 97% of rows for training, last 3% for testing
split_number = round(len(data) * 0.97)
train_attributes, train_class_var = attributes[:split_number], class_var[:split_number]
test_attributes, test_class_var = attributes[split_number:], class_var[split_number:]
```

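The preprocessing cell fits `MinMaxScaler` and `SelectKBest` on every row, so the test rows influence the scaling and the feature choice. A leakage-free variant splits chronologically first and then derives the scaling statistics from the training rows only. The sketch below shows the idea in NumPy; `chronological_split` and `min_max_fit_transform` are hypothetical helpers. With scikit-learn the equivalent is to call `fit` on the training rows and only `transform` on the test rows (for example `SelectKBest(chi2, k=inputDims).fit(train_X, train_y)` followed by `.transform(test_X)`).

```python
import numpy as np

def chronological_split(X, y, train_fraction=0.97):
    """Split without shuffling: earliest rows for training, latest rows for testing."""
    split = int(round(len(X) * train_fraction))
    return X[:split], y[:split], X[split:], y[split:]

def min_max_fit_transform(train, test):
    """Scale both sets using the per-column minimum and range of the training rows only."""
    train = np.asarray(train, dtype=float)
    test = np.asarray(test, dtype=float)
    col_min = train.min(axis=0)
    col_range = train.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant columns: avoid dividing by zero
    return (train - col_min) / col_range, (test - col_min) / col_range

X = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0], [20.0, 40.0]])
y = np.array([0, 1, 0, 1])
X_train, y_train, X_test, y_test = chronological_split(X, y, train_fraction=0.75)
train_scaled, test_scaled = min_max_fit_transform(X_train, X_test)
print(test_scaled)
```

Test values can fall outside [0, 1] (here the last row scales to 2.0 and 1.5), which is expected: the scaler only knows the range seen during training.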

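`SelectKBest(chi2, ...)` ranks each feature by a chi-squared statistic: it compares the per-class sum of the feature with the sum expected if the feature were independent of the class, and keeps the `k` highest-scoring features. scikit-learn's `chi2` requires non-negative inputs, which is one reason the min-max scaling step comes first. The NumPy sketch below reproduces that scoring for illustration; `chi2_scores` and `select_k_best` are hypothetical names, and a feature that sums to zero would produce a division by zero here.

```python
import numpy as np

def chi2_scores(X, y):
    """Chi-squared score per feature, computed from per-class feature sums."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)  # one-hot, n_samples x n_classes
    observed = Y.T @ X                                   # per-class sum of each feature
    class_prob = Y.mean(axis=0)                          # fraction of rows in each class
    expected = np.outer(class_prob, X.sum(axis=0))       # sums expected under independence
    return ((observed - expected) ** 2 / expected).sum(axis=0)

def select_k_best(X, y, k):
    """Column indices of the k highest-scoring features, in their original order."""
    keep = np.argsort(chi2_scores(X, y))[::-1][:k]
    return np.sort(keep)

# Feature 0 separates the classes; feature 1 is constant and uninformative
X = [[1, 1], [1, 1], [0, 1], [0, 1]]
y = [1, 1, 0, 0]
print(chi2_scores(X, y), select_k_best(X, y, k=1))
```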

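The commented-out wavelet block follows the universal ("VisuShrink") threshold of Donoho and Johnstone: estimate the noise level from the finest detail coefficients as $\hat\sigma = \operatorname{median}(|d|)/0.6745$, set $\lambda = \hat\sigma\sqrt{2\ln n}$, and soft-threshold every detail coefficient. The absolute value matters: the median of signed, roughly zero-mean coefficients is typically close to zero, which would shrink the threshold toward zero and remove almost no noise. The NumPy sketch below shows just the two formulas; the function names are hypothetical, and in the notebook the soft thresholding is done by `pywt.threshold(..., mode="soft")`.

```python
import numpy as np

def universal_threshold(finest_detail, n):
    """sigma * sqrt(2 ln n), with sigma estimated from the MAD of the finest detail coefficients."""
    sigma = np.median(np.abs(finest_detail)) / 0.6745
    return sigma * np.sqrt(2 * np.log(n))

def soft_threshold(coeffs, threshold):
    """Shrink coefficients toward zero by `threshold`; smaller ones become exactly 0."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - threshold, 0.0)

print(universal_threshold(np.array([-0.6745, 0.6745, 0.6745]), n=np.e ** 2))
print(soft_threshold(np.array([-3.0, 1.0, 2.5]), 2.0))
```

In the first call $\hat\sigma = 1$ and $\ln n = 2$, so the threshold is 2; the second call then maps $-3, 1, 2.5$ to $-1, 0, 0.5$.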
In [2]:
```python
def sequentialNN(train_attributes, train_class_var, test_attributes, test_class_var):

    model = Sequential()
    # The input dimension is the number of columns
    # in the feature matrix fed into the model
    model.add(Dense(256, input_dim=train_attributes.shape[1], activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(32, activation='relu'))
    # A single sigmoid unit outputs the probability of class 1
    model.add(Dense(1, activation='sigmoid'))

    model.compile(loss='binary_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])

    model.summary()

    model.fit(train_attributes, train_class_var,
              epochs=100, verbose=0, batch_size=128)

    # predict() returns probabilities as a 2D array of shape (n_samples, 1);
    # thresholding at 0.5 gives the binary class we're looking for.
    # (Sequential.predict_classes did this, but it was removed in newer Keras releases.)
    pred = (model.predict(test_attributes) > 0.5).astype('int32')

    # flatten() squashes the (n_samples, 1) array to 1D for evaluation
    predictions = pred.flatten()

    # cm is our own confusion matrix module, imported in the first cell
    cmat = cm.cmatrix(test_class_var, predictions)
    return cmat
```

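Each `Dense` layer has one weight per input-output pair plus one bias per output unit, i.e. `inputs * units + units` parameters, and `Dropout` layers have none. The short sketch below applies that rule to the architecture above; `dense_param_counts` is a hypothetical helper. With `inputDims = 4` it reproduces the per-layer counts and the 38,337 total printed by `model.summary()` in the output below.

```python
def dense_param_counts(input_dim, layer_sizes):
    """Parameters per Dense layer: weights (fan_in * units) plus biases (units)."""
    counts = []
    fan_in = input_dim
    for units in layer_sizes:
        counts.append(fan_in * units + units)
        fan_in = units  # the next layer's input is this layer's output
    return counts

counts = dense_param_counts(4, [256, 128, 32, 1])
print(counts, sum(counts))  # [1280, 32896, 4128, 33] 38337
```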
In [None]:
```python
def logRegression(train_attributes, train_class_var, test_attributes, test_class_var):

    # Logistic regression with scikit-learn's default settings
    logreg = LogisticRegression()
    logreg.fit(train_attributes, train_class_var)
    predictions = logreg.predict(test_attributes)

    cmat = cm.cmatrix(test_class_var, predictions)
    return cmat
```

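For a binary problem, logistic regression predicts class 1 exactly when $z = w \cdot x + b > 0$, which is the same as the sigmoid probability exceeding 0.5. If the fitted weights are small relative to the intercept and the inputs are scaled to [0, 1], $z$ keeps the intercept's sign for every row, so the model predicts a single class; this is one possible explanation for the "always predicts 1" behaviour noted in the findings, not a confirmed diagnosis. The sketch below uses hypothetical coefficients to show the effect; on the real model the fitted values can be inspected through `logreg.coef_` and `logreg.intercept_`.

```python
import numpy as np

def logistic_predict(X, weights, intercept):
    """Return (labels, probabilities) for a binary logistic regression with given coefficients."""
    z = np.asarray(X, dtype=float) @ np.asarray(weights, dtype=float) + intercept
    prob = 1.0 / (1.0 + np.exp(-z))
    # z > 0 is the same decision as prob > 0.5
    return (z > 0).astype(int), prob

# Hypothetical coefficients: weak weights on [0, 1]-scaled inputs and a positive intercept
X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]])
labels, prob = logistic_predict(X, weights=[0.05, -0.03], intercept=0.2)
print(labels)  # [1 1 1 1]
```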
In [None]:
```python
def perceptron(train_attributes, train_class_var, test_attributes, test_class_var):

    # Local name differs from the function name to avoid shadowing it
    model = Perceptron()
    model.fit(train_attributes, train_class_var)
    predictions = model.predict(test_attributes)

    cmat = cm.cmatrix(test_class_var, predictions)
    return cmat
```

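The perceptron learns a linear decision boundary with a simple rule: predict 1 when $w \cdot x + b > 0$, and after each mistake add $(\text{target} - \text{prediction}) \cdot x$ to $w$ (and the same signed amount to $b$). The NumPy sketch below implements that classic rule for 0/1 labels; `train_perceptron` and `perceptron_predict` are hypothetical names, and scikit-learn's `Perceptron` adds options such as shuffling between epochs, so its results will not match this sketch exactly.

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    """Classic perceptron rule for 0/1 labels; stops early after an error-free epoch."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            update = lr * (target - pred)  # 0 when correct, +lr or -lr when wrong
            if update != 0:
                w += update * xi
                b += update
                errors += 1
        if errors == 0:
            break
    return w, b

def perceptron_predict(X, w, b):
    return (np.asarray(X, dtype=float) @ w + b > 0).astype(int)

# Logical AND is linearly separable, so the rule converges
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
print(perceptron_predict(X, w, b))  # [0 0 0 1]
```

Because the update only moves the boundary after a mistake, the rule is guaranteed to converge only when the classes are linearly separable; on noisy price data it will keep adjusting until `epochs` runs out.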
In [None]:
```python
def randomForest(train_attributes, train_class_var, test_attributes, test_class_var):

    # Random forest with 20 trees
    random_forest = RandomForestClassifier(n_estimators=20)
    random_forest.fit(train_attributes, train_class_var)
    predictions = random_forest.predict(test_attributes)

    cmat = cm.cmatrix(test_class_var, predictions)
    return cmat
```

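A random forest combines many trees, each trained on a bootstrap sample of the rows (sampled with replacement) and restricted to a random subset of features, and predicts by majority vote. The NumPy sketch below illustrates that bagging-and-voting idea with a deliberate simplification: each "tree" is a depth-1 stump using one randomly chosen feature, whereas `RandomForestClassifier` grows full trees and samples features at every split. All names here are hypothetical.

```python
import numpy as np

def fit_stump(X, y):
    """Depth-1 tree: the (feature, threshold, polarity) with the best training accuracy."""
    best, best_acc = (0, -np.inf, 1), -1.0
    for f in range(X.shape[1]):
        values = np.unique(X[:, f])
        # Candidate thresholds: one below the minimum, then midpoints between neighbours
        candidates = np.concatenate(([values[0] - 1.0], (values[:-1] + values[1:]) / 2))
        for t in candidates:
            for polarity in (1, 0):
                pred = np.where(X[:, f] > t, polarity, 1 - polarity)
                acc = np.mean(pred == y)
                if acc > best_acc:
                    best, best_acc = (f, t, polarity), acc
    return best

def stump_predict(stump, X):
    f, t, polarity = stump
    return np.where(X[:, f] > t, polarity, 1 - polarity)

def fit_bagged_stumps(X, y, n_estimators=20, seed=0):
    """Each stump sees a bootstrap sample of rows and one randomly chosen feature."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    stumps = []
    for _ in range(n_estimators):
        rows = rng.integers(0, n_samples, size=n_samples)  # rows drawn with replacement
        feature = int(rng.integers(0, n_features))
        _, t, polarity = fit_stump(X[rows][:, [feature]], y[rows])
        stumps.append((feature, t, polarity))
    return stumps

def predict_bagged(stumps, X):
    """Majority vote across stumps (ties go to class 0)."""
    votes = np.stack([stump_predict(s, X) for s in stumps])
    return (votes.mean(axis=0) > 0.5).astype(int)

X = np.array([[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.25, 0.15],
              [0.8, 0.9], [0.9, 0.8], [0.85, 0.75], [0.75, 0.85]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
stumps = fit_bagged_stumps(X, y)
print(predict_bagged(stumps, X))
```

The voting step is what makes the ensemble more stable than any single tree: an individual stump fit on an unlucky bootstrap sample (for example, one containing only one class) is outvoted by the rest.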
In [12]:
```python
sequentialNN(train_attributes, train_class_var, test_attributes, test_class_var)
```

```text
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_113 (Dense)            (None, 256)               1280      
_________________________________________________________________
dropout_57 (Dropout)         (None, 256)               0         
_________________________________________________________________
dense_114 (Dense)            (None, 128)               32896     
_________________________________________________________________
dropout_58 (Dropout)         (None, 128)               0         
_________________________________________________________________
dense_115 (Dense)            (None, 32)                4128      
_________________________________________________________________
dense_116 (Dense)            (None, 1)                 33        
Total params: 38,337
Trainable params: 38,337
Non-trainable params: 0
_________________________________________________________________
```
accura

| Actual \ Predicted | 0 | 1 |
|---|---|---|
| 0 | 23 | 0 |
| 1 | 23 | 1 |

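Reading the confusion matrix above with class 1 as the positive class gives TN = 23, FP = 0, FN = 23, TP = 1 over 47 test rows. The short computation below turns those counts into accuracy, precision, and recall, and also computes what always predicting 1 would score on the same rows; `metrics_from_confusion` is a hypothetical helper. For this run it gives accuracy 24/47 ≈ 0.511, precision 1.0, and recall 1/24 ≈ 0.042, and the always-predict-1 accuracy is also 24/47, so on this test set the network's accuracy matches that trivial baseline.

```python
def metrics_from_confusion(tn, fp, fn, tp):
    """Accuracy, precision and recall for class 1, plus the always-predict-1 accuracy."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
        "recall": tp / (tp + fn) if tp + fn else float("nan"),
        # Predicting 1 for every row is correct exactly on the actual-1 rows
        "always_one_accuracy": (tp + fn) / total,
    }

# Counts read off the Sequential NN confusion matrix
m = metrics_from_confusion(tn=23, fp=0, fn=23, tp=1)
print(m)
```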

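The `cmatrix` module is not shown in this notebook. The table layout above (an "Actual" row index and "Predicted" columns) resembles what `pandas.crosstab` produces, so the sketch below is a hypothetical stand-in with that layout, not the project's actual implementation. One pitfall it guards against: `test_class_var` is a pandas Series whose index starts at `split_number`, while the predictions are a plain array indexed from 0, and `crosstab` aligns Series on their index; converting both inputs with `np.asarray` avoids that misalignment.

```python
import numpy as np
import pandas as pd

def cmatrix(actual, predicted):
    """Hypothetical stand-in for the project's cmatrix module: counts of actual vs predicted."""
    # np.asarray drops any pandas index so rows are paired by position, not by label
    return pd.crosstab(np.asarray(actual).ravel(), np.asarray(predicted).ravel(),
                       rownames=["Actual"], colnames=["Predicted"])

# Series index starts at 10, mimicking test_class_var after the train/test split
print(cmatrix(pd.Series([0, 1, 1], index=[10, 11, 12]), np.array([0, 0, 1])))
```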
In [None]:
```python
logRegression(train_attributes, train_class_var, test_attributes, test_class_var)
```

In [None]:
```python
perceptron(train_attributes, train_class_var, test_attributes, test_class_var)
```

In [None]:
```python
randomForest(train_attributes, train_class_var, test_attributes, test_class_var)
```