# Modeling: Neural Nets

_By [Michael Rosenberg](mailto:rosenberg.michael.m@gmail.com)._

_**Description**: Contains my methods for modeling the data-generating process via neural nets._

_Last Updated: 9/11/2017 11:01 PM._

In [1]:
#imports
import pandas as pd
import pickle as pkl
import numpy as np
import scipy as sp
import keras as kr
import matplotlib.pyplot as plt
import seaborn as sns

#helpers
sigLev = 3
alphaLev = 3
percentLev = 100
%matplotlib inline
sns.set_style("whitegrid")
pd.set_option("display.precision",sigLev)

Using Theano backend.


In [2]:
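# pickle files must be opened in binary mode; the with-block closes the file
with open("../data/processed/processedData.pkl","rb") as dataFile:
    dataDict = pkl.load(dataFile)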
trainFeatureMat = dataDict["train"]["featureMat"]
trainTargetFrame = dataDict["train"]["target"]
testFeatureMat = dataDict["test"]["featureMat"]
testTargetFrame = dataDict["test"]["target"]

# Recap

As part of our [initial modeling](initialModeling.ipynb), we found that a network with 4 layers, trained for 3 epochs with relu activations, generally worked the best. Let's try to replicate that performance here.

In [3]:
initNet = kr.models.Sequential()
initNet.add(kr.layers.Dense(100,input_dim = trainFeatureMat.shape[1],
                activation = "relu"))
initNet.add(kr.layers.Dense(50,activation = "relu"))
initNet.add(kr.layers.Dense(25,activation = "sigmoid"))
initNet.add(kr.layers.Dense(1,activation = "linear"))
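
To get a feel for the size of the network defined above, the parameter count of a stack of fully connected layers can be worked out by hand. The sketch below is a standalone helper (`denseParamCount` is a hypothetical name, not part of Keras). It assumes the training matrix has 1101 columns, which is the width reported for `testFeatureMat` later in this notebook.

```python
def denseParamCount(inputDim, layerSizes):
    """Count the weights and biases in a stack of fully connected layers."""
    total = 0
    previousSize = inputDim
    for size in layerSizes:
        # each unit has one weight per incoming value plus one bias
        total += previousSize * size + size
        previousSize = size
    return total

# assumption: 1101 input features, matching testFeatureMat.shape[1]
firstLayerCount = denseParamCount(1101, [100])
paramCount = denseParamCount(1101, [100, 50, 25, 1])
print(firstLayerCount, paramCount)
```

Under that assumption, nearly all of the parameters sit in the first layer, since it is the only one connected to the wide input.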

In [4]:
# accuracy is a classification metric; track mean absolute error for regression
initNet.compile(loss = "mean_squared_error",optimizer = "adam",
              metrics = ["mean_absolute_error"])

In [5]:
initNet.fit(trainFeatureMat.toarray(),
            np.array(trainTargetFrame["logTripDuration"]),
            epochs = 3)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x11090dc50>

In [6]:
testFeatureMat.shape

(625134, 1101)
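
The shape above also gives a rough idea of what calling `.toarray()` on this matrix costs in memory. The following is a back-of-the-envelope sketch, assuming the dense array stores 8-byte (`float64`) values. The actual dtype of the feature matrix is not shown in this notebook, and `denseMemoryGiB` is a hypothetical helper.

```python
def denseMemoryGiB(numRows, numCols, bytesPerValue = 8):
    """Estimate the size of a dense array in GiB."""
    return numRows * numCols * bytesPerValue / float(2 ** 30)

# shape reported for testFeatureMat above; float64 storage is an assumption
testBytes = 625134 * 1101 * 8
testGiB = denseMemoryGiB(625134, 1101)
print(testBytes, round(testGiB, 2))
```

If that footprint is too large for the machine at hand, one option is to convert and predict on slices of rows rather than on the whole matrix at once, provided the sparse format in use supports row slicing.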

In [7]:
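# predict returns an (n, 1) array, so flatten it before storing it as a column
testTargetFrame["logTripDuration"] = initNet.predict(testFeatureMat.toarray()).ravel()
# undo the log transform to get durations back on the original scale
testTargetFrame["trip_duration"] = np.exp(testTargetFrame["logTripDuration"])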
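
Because the network is fit with mean squared error on `logTripDuration`, its loss is a squared log error when viewed on the original duration scale. The sketch below illustrates that relationship on made-up numbers. It assumes the target was built with the natural log (`np.log`), which the `np.exp` back-transform suggests but which this notebook does not show. `rmsLogError` is a hypothetical helper name.

```python
import numpy as np

def rmsLogError(trueDurations, predictedLogDurations):
    """Root mean squared error measured in log space."""
    trueLog = np.log(np.asarray(trueDurations, dtype = float))
    # model predictions arrive with shape (n, 1), so flatten before comparing
    predictedLog = np.asarray(predictedLogDurations, dtype = float).ravel()
    return float(np.sqrt(np.mean((trueLog - predictedLog) ** 2)))

# made-up durations in seconds, purely for illustration
toyDurations = np.array([300.0, 600.0, 1200.0])
toyPredictedLog = np.log(np.array([[330.0], [600.0], [1000.0]]))
toyError = rmsLogError(toyDurations, toyPredictedLog)
# exponentiating the log predictions recovers durations in seconds
toyBackTransformed = np.exp(toyPredictedLog).ravel()
print(toyError, toyBackTransformed)
```

A check like this needs known durations, so in practice it would be run on a held-out slice of the training data rather than on the test set.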

In [8]:
exportFrame = testTargetFrame[["id","trip_duration"]]
exportFrame.to_csv("../data/processed/predictions/initNNPredictions.csv",
                   index = False)

That helped us to some degree! Let's see how well we perform when we train for another epoch and swap the second hidden layer's activation from relu to elu.
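
For reference, elu differs from relu only on negative inputs: instead of clipping them to zero, it decays smoothly towards `-alpha`. The sketch below writes both out in NumPy for comparison. It is an illustration rather than the Keras implementation, and it assumes `alpha = 1.0`, which is the default for the Keras `elu` activation.

```python
import numpy as np

def relu(x):
    """max(x, 0), applied elementwise."""
    return np.maximum(np.asarray(x, dtype = float), 0.0)

def elu(x, alpha = 1.0):
    """x for positive inputs, alpha * (exp(x) - 1) otherwise."""
    x = np.asarray(x, dtype = float)
    # np.minimum keeps exp from overflowing on large positive inputs
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

sampleInputs = np.array([-2.0, -0.5, 0.0, 1.5])
reluValues = relu(sampleInputs)
eluValues = elu(sampleInputs)
print(reluValues)
print(eluValues)
```

The nonzero output for negative inputs means elu units still pass a gradient there, which is one common motivation for trying it in place of relu.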

In [42]:
initNet = kr.models.Sequential()
initNet.add(kr.layers.Dense(100,input_dim = trainFeatureMat.shape[1],
                           activation = "relu"))
initNet.add(kr.layers.Dense(50,activation = "elu"))
initNet.add(kr.layers.Dense(25,activation = "sigmoid"))
initNet.add(kr.layers.Dense(1,activation = "linear"))

In [43]:
# accuracy is a classification metric; track mean absolute error for regression
initNet.compile(loss = "mean_squared_error",optimizer = "adam",
              metrics = ["mean_absolute_error"])

In [44]:
initNet.fit(trainFeatureMat.toarray(),
            np.array(trainTargetFrame["logTripDuration"]),
            epochs = 4)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x123d5a210>

In [45]:
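# predict returns an (n, 1) array, so flatten it before storing it as a column
testTargetFrame["logTripDuration"] = initNet.predict(testFeatureMat.toarray()).ravel()
# undo the log transform to get durations back on the original scale
testTargetFrame["trip_duration"] = np.exp(testTargetFrame["logTripDuration"])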

In [46]:
exportFrame = testTargetFrame[["id","trip_duration"]]
exportFrame.to_csv("../data/processed/predictions/nextNNPredictions.csv",
                   index = False)