# Neural CI
----
In this notebook, we'll read in the data produced by the data formatting notebook (dataFormmatter.ipynb) and pass it into [my neural net library](https://github.com/RobGeada/nn). Specifically, we'll design functions that make it easy to vary the network architecture and dataset sizes, which simplifies neural net tuning.

## Imports and Initializations

In [13]:
import numpy as np
import os,sys
import time
import pickle
import matplotlib.pyplot as plt
import nn

cwd = os.getcwd()

## Data Helpers
We want to be able to specify dataset sizes here, so we can play around with how much data we train and test on; that calls for a function rather than hardcoded values. We also need to randomly shuffle the data (keeping each input vector paired with its label), as neural net training expects. Finally, we want an easy way to save our predictions to file, ideally in a form that allows comparison against the true `ci_status` values.

In [27]:
def unison_shuffled_copies(a, b):
    assert len(a) == len(b)
    p = np.random.permutation(len(a))
    return a[p], b[p]

def loadData(trainUB,testUB):
    #load training (t*) and validation (v*) vectors and labels
    tX,tY = np.load("tvectors.npy"),np.load("tstatus.npy")
    vX,vY = np.load("vvectors.npy"),np.load("vstatus.npy")

    #randomly shuffle data, keeping vectors and labels aligned
    vX2,vY2 = unison_shuffled_copies(vX,vY)
    tX2,tY2 = unison_shuffled_copies(tX,tY)

    #keep only the first trainUB/testUB shuffled samples
    trainX,trainY = tX2[:trainUB],tY2[:trainUB]
    testX,testY  = vX2[:testUB],vY2[:testUB]

    return trainX,trainY,testX,testY

#save predictions to file, flagging any that disagree with the true label
def savePredictions(predictions,testY,filename):
    print("Saving predictions...")
    lines = []
    for prediction,actual in zip(predictions,testY):
        line = "P: {},A: {}".format(round(prediction,3),actual)
        #a prediction is wrong if it rounds to a different class than the label
        if round(prediction,0)!=actual:
            line += " INCORRECT"
        lines.append(line)
    #the with-block closes the file even if the write fails
    with open("{}/{}_Predictions.csv".format(cwd,filename),"w") as f:
        f.write("\n".join(lines))
    print("Done!")
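
As a quick sanity check on the shuffling helper, here is a minimal sketch on made-up toy data (the arrays `X` and `y` below are illustrative, not the real dataset). It shows why a single permutation is applied to both arrays: each vector stays next to its own label after shuffling.

```python
import numpy as np

def unison_shuffled_copies(a, b):
    # same helper as above: one permutation indexes both arrays
    assert len(a) == len(b)
    p = np.random.permutation(len(a))
    return a[p], b[p]

# toy data (made up): row i of X is [i, i], and its label is i
X = np.arange(10).repeat(2).reshape(10, 2)
y = np.arange(10)

X2, y2 = unison_shuffled_copies(X, y)

# every shuffled row should still sit next to its original label
pairs_intact = bool(np.all(X2[:, 0] == y2))
print("pairs intact: {}".format(pairs_intact))
```

Shuffling the two arrays with separate permutations would break this pairing and scramble the labels.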

## Setup NN Test Function
The same goes here: we want to be able to play with network parameters, so let's write a function rather than hardcode anything. You'll notice that I'm passing test data into the `net.train()` function; don't be alarmed, `net.train()` only uses the testing data to report a per-epoch glimpse of the test error, so we can nip over-fitting in the bud.

In [17]:
def nnTest(parameters):
    #unpack the parameters tuple
    trainUB,testUB,hiddenSize,epochs,learningRate = parameters

    #load training,testing data from the specified sets
    trainX,trainY,testX,testY = loadData(trainUB,testUB)

    #create network
    net = nn.Network(inDim=35,biases=1,hiddenDims=[hiddenSize,],outDim=1,learningRate=learningRate)

    #train network
    tStart = time.time()
    Y = net.train(trainX,trainY,testX,testY,epochs=epochs)

    #display training stats
    print("\n===RESULTS===")
    print("Train time:   {} s".format(time.time()-tStart))

    #make predictions
    tStart = time.time()
    predictionsX = net.predict(testX)
    print("Predict time: {} s".format(time.time()-tStart))
    
    #test predictions and display accuracy stats
    net.error(testX,testY,verbose=True)
    return predictionsX,testY
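
To make the "nip over-fitting in the bud" idea concrete, here is a rough sketch of how the per-epoch holdout errors could be used to pick a stopping point. It assumes you have collected those errors into a list yourself; `bestEpoch` and its `patience` parameter are hypothetical helpers, not part of the `nn` library, and the error values are made up.

```python
def bestEpoch(holdoutErrors, patience=3):
    """Return (epoch, error) for the lowest holdout error seen, stopping the
    scan once `patience` epochs have passed without an improvement."""
    bestI, bestErr = 0, float("inf")
    for i, err in enumerate(holdoutErrors):
        if err < bestErr:
            # new best holdout error: remember where it happened
            bestI, bestErr = i, err
        elif i - bestI >= patience:
            # no improvement for `patience` epochs: stop looking
            break
    return bestI, bestErr

# made-up holdout errors: improving at first, then creeping back up
errors = [0.16, 0.14, 0.12, 0.11, 0.115, 0.12, 0.13, 0.10]
stopEpoch, stopError = bestEpoch(errors)
print("best epoch: {}, holdout error: {}".format(stopEpoch, stopError))
```

A holdout error that rises while the train error keeps falling is the usual sign of over-fitting, which is why the scan gives up after a few epochs without improvement.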

## Run It!
Here we define the network parameters we want to test. The variables defined below correspond to network parameters as follows:


| Variable        | Parameter           |
| ------------- |:-------------:|
| trainUB     | Size of training dataset|
| testUB      | Size of testing dataset|
| hiddenSize  | Number of nodes in hidden layer|
| epochs  | Number of training epochs (passes over the training set)|
| learningRate  | Eta value for neural net backpropagation|

The values below are just the ones I've found to perform best on my particular slice of the dataset, so tune away!

In [23]:
#define net parameters
trainUB,testUB,hiddenSize,epochs,learningRate = 75000,10000,35,100,0.15
netParams = (trainUB,testUB,hiddenSize,epochs,learningRate)

#test said parameters
predictions,testY = nnTest(parameters=netParams)

Training network...
===EPOCH 0===
Train error: 0.14972
Holdout error: 0.1578
===EPOCH 1===
Train error: 0.137306666667
Holdout error: 0.143
===EPOCH 2===
Train error: 0.125786666667
Holdout error: 0.1304
===EPOCH 3===
Train error: 0.12276
Holdout error: 0.1274
===EPOCH 4===
Train error: 0.121386666667
Holdout error: 0.127
===EPOCH 5===
Train error: 0.120253333333
Holdout error: 0.126
===EPOCH 6===
Train error: 0.119586666667
Holdout error: 0.1233
===EPOCH 7===
Train error: 0.118466666667
Holdout error: 0.1221
===EPOCH 8===
Train error: 0.1168
Holdout error: 0.1199
===EPOCH 9===
Train error: 0.112746666667
Holdout error: 0.1167
===EPOCH 10===
Train error: 0.107906666667
Holdout error: 0.1117
===EPOCH 11===
Train error: 0.1028
Holdout error: 0.1071
===EPOCH 12===
Train error: 0.09876
Holdout error: 0.1022
===EPOCH 13===
Train error: 0.09568
Holdout error: 0.0995
===EPOCH 14===
Train error: 0.0931333333333
Holdout error: 0.0978
===EPOCH 15===
Train error: 0.0909733333333
Holdout error: 0.
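
If you'd rather tune systematically than by hand, one option is to build a grid of parameter tuples and feed each one to `nnTest`. The sketch below only builds the grid; the candidate hidden sizes and learning rates are arbitrary values chosen for illustration, not recommendations.

```python
import itertools

# fixed settings, taken from the run above
trainUB, testUB, epochs = 75000, 10000, 100

# candidate values to sweep; these particular choices are arbitrary
hiddenSizes = [20, 35, 50]
learningRates = [0.05, 0.15, 0.3]

# one tuple per combination, in the order nnTest unpacks them
paramGrid = [(trainUB, testUB, h, epochs, lr)
             for h, lr in itertools.product(hiddenSizes, learningRates)]

print("{} parameter sets to try".format(len(paramGrid)))

# in this notebook, each set could then be run with:
#     predictions, testY = nnTest(parameters=params)
```

Keep in mind that each call retrains the network from scratch, so the sweep time grows with the size of the grid.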

## Save Predictions

In [28]:
savePredictions(predictions,testY,"CI")

Saving predictions...
Done!
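
For a one-number summary instead of scanning the CSV, the same rounding rule can be applied with NumPy. This is a sketch on made-up values; `countIncorrect` is a hypothetical helper, and it assumes predictions are floats in [0, 1] and labels are 0 or 1.

```python
import numpy as np

def countIncorrect(predictions, testY):
    # hypothetical helper: applies the rounding rule savePredictions uses
    # (a prediction of exactly 0.5 may round differently under np.round
    # than under Python 2's built-in round)
    predictions = np.asarray(predictions, dtype=float).ravel()
    testY = np.asarray(testY, dtype=float).ravel()
    wrong = np.round(predictions) != testY
    # number of misclassified samples, and that number as a fraction
    return int(wrong.sum()), float(wrong.mean())

# made-up predictions and labels
toyPredictions = [0.91, 0.12, 0.48, 0.77]
toyLabels = [1, 0, 1, 1]

numWrong, errorRate = countIncorrect(toyPredictions, toyLabels)
print("{} incorrect, error rate {}".format(numWrong, errorRate))
```

On the real run, you would pass in the `predictions` and `testY` returned by `nnTest`.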


In [29]:
trainX,trainY,testX,testY = loadData(500,200)
trainX.shape, trainY.shape

((500, 35), (500,))