# Week 18 Homework

### 1.	What is a neural network? What are the general steps required to build a neural network?

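A neural network is a machine learning model, most often trained with supervised learning, that passes its inputs through one or more hidden layers so it can learn interactions between features.  It is loosely inspired by how biological neurons connect and learn.  It can also be described as a graph model with nodes (neurons) and weighted edges (connections).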

To build a neural network, you must create the following layers:

* An input layer
* One or more hidden layers
* An output layer, which may have one or more nodes depending on whether the problem is regression or classification
    
For classification problems, each output node represents one category.  For regression problems, a single output node returns the predicted value.

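Each hidden layer needs a number of nodes and an activation function (like ReLU); the activation of the output layer depends on the task.  The weights are fitted by an optimizer such as Adam or Stochastic Gradient Descent, which minimizes a loss function.  Training repeats forward propagation (computing predictions and the loss) and backward propagation (computing the gradients used to update the weights).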
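To make the forward and backward propagation steps concrete, here is a minimal pure-Python sketch of a network with one hidden ReLU layer trained by plain stochastic gradient descent. It is only an illustration: the function name `train_tiny_network`, the toy data, and the learning rate are my own assumptions, and it is not how Keras implements training internally.

```python
import random


def relu(z):
    return z if z > 0.0 else 0.0


def train_tiny_network(data, hidden=4, lr=0.02, epochs=300, seed=42):
    """Fit a 1-input, 1-output network with one hidden ReLU layer.

    Returns the mean squared error before training and after each epoch.
    """
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]  # input -> hidden weights
    b1 = [0.0] * hidden                               # hidden biases
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden -> output weights
    b2 = 0.0                                          # output bias

    def forward(x):
        z = [w1[j] * x + b1[j] for j in range(hidden)]  # pre-activations
        h = [relu(v) for v in z]                        # hidden activations
        y_hat = sum(w2[j] * h[j] for j in range(hidden)) + b2
        return z, h, y_hat

    def mse():
        return sum((forward(x)[2] - y) ** 2 for x, y in data) / len(data)

    losses = [mse()]
    for _ in range(epochs):
        for x, y in data:
            # forward propagation: compute the prediction
            z, h, y_hat = forward(x)
            # backward propagation: gradient of (y_hat - y)^2 w.r.t. y_hat
            grad_out = 2.0 * (y_hat - y)
            for j in range(hidden):
                # ReLU passes the gradient only where its input was positive
                grad_z = grad_out * w2[j] if z[j] > 0.0 else 0.0
                w2[j] -= lr * grad_out * h[j]
                w1[j] -= lr * grad_z * x
                b1[j] -= lr * grad_z
            b2 -= lr * grad_out
        losses.append(mse())
    return losses


# Toy regression data (hypothetical): y = 2x + 1 on five points in [0, 1]
toy_data = [(i / 4, 2 * (i / 4) + 1) for i in range(5)]
losses = train_tiny_network(toy_data)
print(losses[0], losses[-1])  # loss before training vs. after the last epoch
```

Adam follows the same forward and backward structure; it differs only in how the gradients are turned into weight updates.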

### 2.	Generally, how do you check the performance of a neural network? Why? 

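Hold back some of the data and measure accuracy/performance on it, for example with a validation split.  Performance has to be checked on data the model did not train on, because a network can memorize its training data.  K-fold cross-validation is often too computationally expensive for neural networks, since it means training the model k times.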
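A small sketch of the difference in cost between the two approaches, using only index lists. The helper names `holdout_split` and `kfold_splits` are hypothetical and written for this illustration; they are not scikit-learn or Keras functions.

```python
import random


def holdout_split(n_rows, validation_fraction=0.3, seed=42):
    """Return (train_idx, val_idx) for one shuffled hold-out split."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_val = int(round(n_rows * validation_fraction))
    return indices[n_val:], indices[:n_val]


def kfold_splits(n_rows, k=5, seed=42):
    """Yield k (train_idx, val_idx) pairs; every row is validated exactly once."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


train_idx, val_idx = holdout_split(10)
print(len(train_idx), len(val_idx))  # 7 3

splits = list(kfold_splits(10, k=5))
print(len(splits))  # 5 model fits instead of 1
```

Note that the `validation_split` argument in Keras takes the last fraction of the rows rather than a shuffled sample, so the shuffling here is my own choice.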

### 3.	Create a neural network using keras to predict the outcome

In [4]:
import numpy as np
import pandas as pd
from statistics import mean

from keras.layers import Dense
from keras.models import Sequential
from keras import metrics
from keras.callbacks import EarlyStopping
from keras.callbacks import History 

from numpy.random import seed
seed(42)
import tensorflow
tensorflow.random.set_seed(42)

In [5]:
import os
os.environ['PYTHONHASHSEED']=str(42)

In [6]:
abalone_df = pd.read_csv('clean_abalone_data.csv', index_col=0)
abalone_df.head()

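| | sex | length | diameter | height | whole_weight | shucked_weight | viscera_weight | shell_weight | rings | sex_cat |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | M | 0.455 | 0.365 | 0.095 | 0.514 | 0.2245 | 0.101 | 0.15 | 15 | 2 |
| 1 | M | 0.35 | 0.265 | 0.09 | 0.2255 | 0.0995 | 0.0485 | 0.07 | 7 | 2 |
| 2 | F | 0.53 | 0.42 | 0.135 | 0.677 | 0.2565 | 0.1415 | 0.21 | 9 | 0 |
| 3 | M | 0.44 | 0.365 | 0.125 | 0.516 | 0.2155 | 0.114 | 0.155 | 10 | 2 |
| 4 | I | 0.33 | 0.255 | 0.08 | 0.205 | 0.0895 | 0.0395 | 0.055 | 7 | 1 |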


In [7]:
# one hot encode sex feature
abalone_df = pd.get_dummies(abalone_df, prefix_sep="__",
                              columns=['sex'])
abalone_df.head()

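| | length | diameter | height | whole_weight | shucked_weight | viscera_weight | shell_weight | rings | sex_cat | sex__F | sex__I | sex__M |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0.455 | 0.365 | 0.095 | 0.514 | 0.2245 | 0.101 | 0.15 | 15 | 2 | 0 | 0 | 1 |
| 1 | 0.35 | 0.265 | 0.09 | 0.2255 | 0.0995 | 0.0485 | 0.07 | 7 | 2 | 0 | 0 | 1 |
| 2 | 0.53 | 0.42 | 0.135 | 0.677 | 0.2565 | 0.1415 | 0.21 | 9 | 0 | 1 | 0 | 0 |
| 3 | 0.44 | 0.365 | 0.125 | 0.516 | 0.2155 | 0.114 | 0.155 | 10 | 2 | 0 | 0 | 1 |
| 4 | 0.33 | 0.255 | 0.08 | 0.205 | 0.0895 | 0.0395 | 0.055 | 7 | 1 | 0 | 1 | 0 |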


In [8]:
# set features and target 
X = abalone_df.drop(['rings', 'sex_cat'], axis=1)
y = abalone_df['rings']

In [9]:
# scale data - important for neural networks, not for trees
# did not scale data for last week's tree models
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X = sc.fit_transform(X)
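
For reference, this is roughly what standardization does to each column: subtract the mean and divide by the population standard deviation. The sketch below uses only the five `length` values from `head()` above, and the helper names `fit_standardizer` and `standardize` are my own, not scikit-learn functions.

```python
from statistics import fmean, pstdev


def fit_standardizer(column):
    """Return (mean, std) of a column, using the population standard deviation."""
    return fmean(column), pstdev(column)


def standardize(column, mean, std):
    """Apply z = (x - mean) / std to every value in the column."""
    return [(x - mean) / std for x in column]


# 'length' values from the first five rows shown earlier
lengths = [0.455, 0.35, 0.53, 0.44, 0.33]
mu, sigma = fit_standardizer(lengths)
scaled = standardize(lengths, mu, sigma)
print(scaled)  # values now have mean 0 and standard deviation 1
```

A stricter setup would compute the mean and standard deviation on the training rows only and reuse them for the validation rows, so no information from the held-back data leaks into training.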

In [10]:
# number of features
n_cols = X.shape[1]
n_cols

10

### Defining helper functions to run the model

After running this a lot of times, I realized the best way to approach this is to take the mean of 10 model runs, because the results vary from run to run.   I built helper functions to do this.

In [11]:
def run_model(X, y, layers, nodes, n_cols):
    # function to run a particular model once
    # X - features
    # y - target
    # layers - int number of hidden layers
    # nodes - int number of nodes per layer
    # n_cols for input shape
    # returns MSE of validation data
    
    model = Sequential()
    model.add(Dense(nodes, activation='relu', input_shape=(n_cols,)))
    if layers > 1:
        for i in range(layers-1):
            model.add(Dense(nodes, activation='relu'))
    model.add(Dense(1))
    model.compile(optimizer='adam',
                 loss='mean_squared_error')
    early_stopping_monitor = EarlyStopping(patience=3, monitor='val_loss')
    history = History()
    model.fit(X, y, validation_split=.3, epochs=50, callbacks=[early_stopping_monitor, history], verbose=1)
    # changed verbose=1 from 0 to see keras in action again
    
    #return last validation loss
    return history.history['val_loss'][-1]
    

In [12]:
def average_model_rmse(X, y, layers, nodes, n_cols):
    # runs the model 10 times
    # prints the MSE list as a visual check
    # returns the square root of the mean validation MSE (an RMSE)
    mse_list = []
    for i in range(10):
        mse = run_model(X, y, layers, nodes, n_cols)
        mse_list.append(mse)
    print(mse_list)
    
    mean_mse = mean(mse_list)
    rmse = mean_mse**.5
    return rmse
    

In [13]:
# testing
average_model_rmse(X, y, 2, 100, n_cols)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
[3.272320508956909, 3.232393980026245, 3.4187707901000977, 3.2519590854644775, 3.421375036239624, 3.4947404861450195, 3.2616567611694336, 3.230750799179077, 3.3944783210754395, 3.2263786792755127]


1.8222190990007716
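
As a sanity check, the returned value can be reproduced by hand from the printed MSE list. The sketch also shows the alternative of averaging the per-run RMSE values, which gives a slightly smaller number; the variable names are my own.

```python
from statistics import mean

# validation MSE of the 10 runs, copied from the printed list above
mse_runs = [3.272320508956909, 3.232393980026245, 3.4187707901000977,
            3.2519590854644775, 3.421375036239624, 3.4947404861450195,
            3.2616567611694336, 3.230750799179077, 3.3944783210754395,
            3.2263786792755127]

rmse_of_mean = mean(mse_runs) ** 0.5            # what average_model_rmse returns
mean_of_rmse = mean(m ** 0.5 for m in mse_runs)  # alternative: average the RMSEs

print(round(rmse_of_mean, 4))  # 1.8222
print(mean_of_rmse <= rmse_of_mean)  # True, the square root is concave
```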

#### Model 1

Layers = 1, Nodes = 100

In [144]:
average_model_rmse(X, y, 1, 100, n_cols)

[3.284062623977661, 3.273653507232666, 3.3759284019470215, 3.2524707317352295, 3.434234142303467, 3.2776827812194824, 3.314272165298462, 3.4065635204315186, 3.457488775253296, 3.2557079792022705]


1.825707113109906

#### Model 2 

Layers = 2, Nodes = 100

In [145]:
average_model_rmse(X, y, 2, 100, n_cols)

[3.284374237060547, 3.2499232292175293, 3.2461225986480713, 3.268390655517578, 3.2319915294647217, 3.248373031616211, 3.429248094558716, 3.21582293510437, 3.2683210372924805, 3.2360286712646484]


1.8077222137193776

#### Model 3 

Layers = 2, Nodes = 200

In [146]:
average_model_rmse(X, y, 2, 200, n_cols)

[3.4066529273986816, 3.362492322921753, 3.3257672786712646, 3.3973565101623535, 3.360968828201294, 3.339843511581421, 3.4071311950683594, 3.3765745162963867, 3.3637523651123047, 3.3493335247039795]


1.835480127381329

#### Model 4

Layers = 3, Nodes = 100

In [147]:
average_model_rmse(X, y, 3, 100, n_cols)

[3.297391891479492, 3.3103384971618652, 3.2883212566375732, 3.345993995666504, 3.2697465419769287, 3.3393874168395996, 3.3076107501983643, 3.2888870239257812, 3.3017842769622803, 3.317643642425537]


1.8184362868485089

#### Model 5

Layers = 4, Nodes = 100

In [148]:
average_model_rmse(X, y, 4, 100, n_cols)

[3.4114432334899902, 3.3468828201293945, 3.3339128494262695, 3.392328977584839, 3.386967420578003, 3.39839506149292, 3.338188409805298, 3.3435137271881104, 3.3187456130981445, 3.3416478633880615]


1.8333582840290936

#### Model 6

Layers = 3, Nodes = 50

In [150]:
average_model_rmse(X, y, 3, 50, n_cols)

[3.461167335510254, 3.3578155040740967, 3.427490472793579, 3.4539945125579834, 3.2534005641937256, 3.3252944946289062, 3.2695159912109375, 3.379878520965576, 3.4582760334014893, 3.3128271102905273]


1.8357467292528966

#### Model 7

Layers = 3, Nodes = 125

In [152]:
average_model_rmse(X, y, 3, 125, n_cols)

[3.3090052604675293, 3.266998529434204, 3.3159334659576416, 3.296473503112793, 3.313354730606079, 3.3292856216430664, 3.368403196334839, 3.394998550415039, 3.3207478523254395, 3.307285785675049]


1.822703664778553

#### Model 8

Layers = 2, Nodes = 75

In [158]:
average_model_rmse(X, y, 2, 75, n_cols)

[3.2502245903015137, 3.23888897895813, 3.257948160171509, 3.230085611343384, 3.2475154399871826, 3.243337869644165, 3.2287466526031494, 3.2568018436431885, 3.212043285369873, 3.2422375679016113]


1.8002174868588436

### Model Performance

| Model Number | Hidden Layers | Nodes per Layer | RMSE (rings) |
| --- | --- | --- | --- |
| 1 | 1 | 100 | 1.8257 |
| 2 | 2 | 100 | 1.8077 |
| 3 | 2 | 200 | 1.8355 |
| 4 | 3 | 100 | 1.8184 |
| 5 | 4 | 100 | 1.8334 |
| 6 | 3 | 50 | 1.8357 |
| 7 | 3 | 125 | 1.8227 |
| 8 | 2 | 75 | 1.8002 |

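The best performing model had 2 hidden layers and 75 nodes per layer, and the 2 layer, 100 node model was very close behind.  The best validation RMSE was about 1.80 rings.

So based on the features of an abalone, the number of rings (related to age) can be predicted with a typical error of about 1.8 rings.  RMSE is an average-style error, not a hard bound, so individual predictions can miss by more than that.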
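
For reference, the RMSE values reported here follow the standard definition, where $y_i$ is the true ring count, $\hat{y}_i$ is the predicted ring count, and $n$ is the number of validation rows (the symbols are my own notation, not from the assignment):

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
$$

Because the errors are squared before averaging, large misses count more than small ones, so the RMSE describes a typical error size in rings rather than a maximum error.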

### 4.	Write another algorithm to predict the same result as the previous question using either KNN or logistic regression.

#### Logistic Regression

I wouldn't have chosen this model, but decided to run a simple, untuned version.  Logistic regression is a classifier, so it treats each ring count as a separate class, while my models last week and this week treated this problem as regression with a numeric output.  Because of that, I can't really compare the output and performance directly.  I tried using RMSE, but that's not usually applied to a Logistic Regression; accuracy is the usual metric.
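
A small sketch of why accuracy and RMSE tell different stories for ring counts: two sets of predictions can have the same accuracy while one is far worse in ring units. The ring values come from the first five rows of the data; both prediction lists are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


rings = [15, 7, 9, 10, 7]        # true ring counts (first five rows)
near_misses = [14, 8, 9, 11, 7]  # hypothetical: wrong by one ring when wrong
far_misses = [5, 17, 9, 20, 7]   # hypothetical: wrong by ten rings when wrong

print(accuracy(rings, near_misses), accuracy(rings, far_misses))  # 0.4 0.4
print(rmse(rings, near_misses) < rmse(rings, far_misses))         # True
```

A classifier is only rewarded for exact matches, so it has no reason to prefer a near miss over a far miss.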

In [186]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Split into training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state=42, stratify=y)

# The data is already scaled

In [167]:
#Simplest logistic regression approach
modelLR = LogisticRegression(max_iter = 10000, random_state=42)
modelLR.fit(X_train, y_train)

y_pred= modelLR.predict(X_test)

In [168]:
from sklearn.metrics import mean_squared_error as MSE

In [169]:
# this is probably not a valid test

RMSE = MSE(y_test, y_pred)**.5
print(RMSE)

2.1297918678441348


In [172]:
from sklearn.metrics import accuracy_score

accuracy_score(y_test, y_pred)

0.27386934673366836

In [176]:
modelLR.score(X_test, y_test)

0.27386934673366836

#### Linear Regression

RMSE = 1.9542

Running this model makes more sense, since it treats the problem as regression like all my other abalone models.

The RMSE for a simple Linear Regression is worse than the Keras model, but better than some of my Decision Trees from last week.

In [175]:
from sklearn.linear_model import LinearRegression


reg = LinearRegression()

reg.fit(X_train, y_train)
reg.score(X_test, y_test)  # R^2 (coefficient of determination) on the test set

0.4900459741284562
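
The score above is the coefficient of determination (R squared), not an error in rings. A minimal sketch of how it is computed, with made-up predictions; the function name `r_squared` is my own.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


rings = [15, 7, 9, 10, 7]  # true ring counts (first five rows of the data)
mean_rings = sum(rings) / len(rings)

print(r_squared(rings, rings))                          # 1.0 for perfect predictions
print(r_squared(rings, [mean_rings] * len(rings)))      # 0.0 for always predicting the mean
```

So a score of about 0.49 means the linear model explains roughly half of the variance in ring counts on the test set.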

In [178]:
y_pred = reg.predict(X_test)
RMSE = MSE(y_test, y_pred)**.5
print(RMSE)

1.9542222317684228


### 5.	Create a neural network using pytorch to predict the same result as question 3.

In [220]:
import torch
import torch.nn as nn
import torch.nn.functional as F #this has activation functions

In [298]:
# Split into training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state=42, stratify=y)

In [299]:
# Creating tensors
X_train = torch.FloatTensor(X_train)
X_test = torch.FloatTensor(X_test)

y_train = torch.FloatTensor(y_train.to_numpy())
y_test = torch.FloatTensor(y_test.to_numpy())

print(X_train)

tensor([[-1.6290, -1.6428, -1.7304,  ..., -0.6701,  1.4290, -0.7523],
        [-1.1486, -1.1151, -1.3189,  ..., -0.6701, -0.6998,  1.3293],
        [-1.5854, -1.6428, -1.7304,  ..., -0.6701,  1.4290, -0.7523],
        ...,
        [-1.9785, -1.8538, -1.5933,  ..., -0.6701,  1.4290, -0.7523],
        [ 1.1664,  1.1011,  0.3273,  ...,  1.4923, -0.6998, -0.7523],
        [ 0.1181, -0.0070,  0.0530,  ..., -0.6701, -0.6998,  1.3293]])


In [300]:
class ANN_Model(nn.Module):
    def __init__(self, input_features=10, hidden1=100, hidden2=100, out_features =1):
        super().__init__()
        self.layer_1_connection = nn.Linear(input_features, hidden1)
        self.layer_2_connection = nn.Linear(hidden1, hidden2)
        self.out = nn.Linear(hidden2, out_features)
    
    def forward(self, x):
        #apply activation functions
        x = F.relu(self.layer_1_connection(x))
        x = F.relu(self.layer_2_connection(x))
        x = self.out(x)
        return x

In [301]:
torch.manual_seed(42)

#instantiate the model
model = ANN_Model()

In [302]:
# loss function
loss_function = nn.MSELoss()

#optimizer
optimizer = torch.optim.Rprop(model.parameters(), lr = 0.01)

In [303]:
#run model through multiple epochs/iterations
final_loss = []
n_epochs = 500
for epoch in range(n_epochs):
    y_pred = model(X_train)  # calling the module runs forward()
    loss = loss_function(y_pred[:,0], y_train)
    final_loss.append(loss.item())  # store a plain float, not the graph-attached tensor
    
    if epoch % 10 == 1:
        print(f'Epoch number: {epoch} with loss: {loss.item()}')
    
    optimizer.zero_grad() #zero the gradient before running backwards propagation
    loss.backward() #for backward propagation 
    optimizer.step() #performs one optimization step each epoch

Epoch number: 1 with loss: 86.2098388671875
Epoch number: 11 with loss: 7.514674663543701
Epoch number: 21 with loss: 3.8004727363586426
Epoch number: 31 with loss: 3.47163724899292
Epoch number: 41 with loss: 3.2960548400878906
Epoch number: 51 with loss: 3.1216843128204346
Epoch number: 61 with loss: 3.0475783348083496
Epoch number: 71 with loss: 2.992058753967285
Epoch number: 81 with loss: 2.945040464401245
Epoch number: 91 with loss: 2.9107394218444824
Epoch number: 101 with loss: 2.883209705352783
Epoch number: 111 with loss: 2.854099988937378
Epoch number: 121 with loss: 2.83445405960083
Epoch number: 131 with loss: 2.8181657791137695
Epoch number: 141 with loss: 2.7982242107391357
Epoch number: 151 with loss: 2.7869207859039307
Epoch number: 161 with loss: 2.7774386405944824
Epoch number: 171 with loss: 2.766237497329712
Epoch number: 181 with loss: 2.754946708679199
Epoch number: 191 with loss: 2.743060350418091
Epoch number: 201 with loss: 2.7311606407165527
Epoch number: 211

In [248]:
2.4524**.5

1.566014048468276

In [304]:
#predictions
y_pred = []

with torch.no_grad():
    for i, data in enumerate(X_test):
        prediction = model(data)
        y_pred.append(prediction.item())

In [307]:
y_pred[1:20]  #testing output

[10.169902801513672,
 12.285325050354004,
 6.628574848175049,
 13.747121810913086,
 7.7859110832214355,
 5.466454982757568,
 6.459367275238037,
 6.684447765350342,
 10.25206184387207,
 6.199649810791016,
 7.299482822418213,
 6.215028762817383,
 8.770575523376465,
 11.107390403747559,
 9.816902160644531,
 11.087135314941406,
 9.090593338012695,
 6.663367748260498,
 10.299430847167969]

In [306]:
rmse = MSE(y_test, y_pred)**.5
print('RMSE ', round(rmse, 4))

RMSE  1.9299


I think there are probably parameters that can be tuned to get a PyTorch answer more similar to Keras.  Also note that the PyTorch model is overfitted: a training loss of 2.4524 gives a training RMSE of about 1.57, while the test RMSE is 1.93.  I found this to be less of a problem with the Keras model.
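
One likely reason for the difference is that the Keras runs used early stopping on a validation loss, while the PyTorch loop always trains for the full 500 epochs. Below is a simplified sketch of the stopping rule in plain Python. It is my own approximation of the idea behind `EarlyStopping(patience=3)`, not the library's implementation, and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops at the first epoch where the validation loss has gone
    `patience` epochs without improving on its best value so far.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    # never triggered: training ran through every epoch
    return len(val_losses) - 1, best_epoch


# hypothetical validation losses, one per epoch
val_losses = [5.0, 4.0, 3.5, 3.6, 3.7, 3.55, 3.4]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(stop_epoch, best_epoch)  # 5 2
```

In this example the loss at the stopping epoch (3.55) is slightly worse than the best one (3.5), which is also why taking the last validation loss of a run can be a little pessimistic.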

### 6.	Compare the performance of the neural networks to the other model you created. Which performed better? Why do you think that is?

The **Keras neural network outperformed every model** for the Abalone data set so far.  In part, I was better able to tune the Keras model, so it's not surprising that it outperformed the PyTorch model.  If I had more time, I would probably write functions and tune the PyTorch model like I did the Keras model.   Having a simple function for Keras calls made it much simpler to experiment with different layers and nodes.

In general, **neural networks** have the opportunity to outperform other models because they can **better account for how different features interact with each other.**
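
A tiny hand-built example of a feature interaction that a linear model cannot represent: the target is 1 when exactly one of two inputs is 1 (XOR). The weights below are chosen by hand for illustration, not learned, and the function name `interaction_net` is my own.

```python
def relu(z):
    return max(0.0, z)


def interaction_net(x1, x2):
    """Two hidden ReLU nodes and one linear output node, with hand-picked weights."""
    h1 = relu(x1 + x2)        # active when at least one input is on
    h2 = relu(x1 + x2 - 1.0)  # active only when both inputs are on
    return h1 - 2.0 * h2


for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, interaction_net(x1, x2))
# 0 0 0.0
# 0 1 1.0
# 1 0 1.0
# 1 1 0.0
```

Any linear model `w1*x1 + w2*x2 + b` satisfies `f(0,0) + f(1,1) = f(0,1) + f(1,0)`, which would require 0 = 2 here, so the hidden layer is what makes this pattern possible.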

***Best Keras Test set RMSE: 1.8002***

***Random Forest Test set RMSE: 1.886 (last week)***

***Linear Regression RMSE: 1.9542***

What was interesting was how well the 'simple' Linear Regression model worked on the scaled data.  I feel this could be the worst case scenario, where **all other models should improve on the Linear Regression score.**

It might have been interesting to treat the Abalone data as a classification problem, but I think the target would need to be binned into 2-4 bins for a meaningful model.
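
A sketch of what that binning could look like. The cut points and labels are hypothetical choices for illustration only; I did not fit or evaluate a model on them.

```python
from bisect import bisect_right

BIN_EDGES = [8, 11]                      # hypothetical cut points, in rings
BIN_LABELS = ["young", "adult", "old"]   # hypothetical class names


def ring_bin(rings):
    """Map a ring count to a class: <8 young, 8 to 10 adult, 11 or more old."""
    return BIN_LABELS[bisect_right(BIN_EDGES, rings)]


sample_rings = [15, 7, 9, 10, 7]  # ring counts from the first five rows
binned = [ring_bin(r) for r in sample_rings]
print(binned)  # ['old', 'young', 'adult', 'adult', 'young']
```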

I chose **not to treat the Abalone data as a classification problem** because I thought it would be more meaningful to have a model predict an age (with rings) with a margin of error.   So if you're the person in the lab wanting **to know the approximate age** of the abalone specimen, **all you need is an easier to gather feature set.**   The information about the data mentioned that counting rings is tedious and error prone.


Some things for fun: I'm running my best Keras model to check model MSE and test MSE.   I changed my function to verbose = 1.

In [14]:
run_model(X, y, 2, 75, n_cols)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50


3.249844551086426

In [15]:
modelRMSE = 3.369**.5
testRMSE = 3.2498**.5

print('Model RMSE: ', modelRMSE)
print('Test RMSE: ', testRMSE)

Model RMSE:  1.8354835875049387
Test RMSE:  1.802720166858961


Model and test RMSE for the Keras model are pretty close, so I don't feel it's over- or underfitted, unlike the PyTorch model.