## 1. What is a neural network? What are the general steps required to build a neural network?

A neural network is a type of machine learning that is designed to mimic the brain's neural network. It can utilize hidden layers with hundreds of nodes and also use backpropagation to improve the network.

- Specify architecture: how many hidden layers, input shape, how many nodes, what sort of activation
- Compile: specifiy optimizer, learning rate(if desired), loss function
- Fit: this is where the backpropagation comes in to adjust the weights of the nodes. You can also define the split of the data here instead of splitting it in advance, and specify the number of epochs to use when fitting.
- Predict: actually make predictions

## 2. Generally, how do you check the performance of a neural network? Why?

Using the loss function, it calculates how the network is performing during each instance/epoch and you can use it to determine when to stop running epochs because the model is doing as well as it can. 

## 3. Create a neural network using keras to predict the outcome

In [6]:
import numpy as np
import pandas as pd
from keras.layers import Dense
from keras.models import Sequential
from keras.callbacks import EarlyStopping

In [3]:
import csv

with open("../abalone.data") as infile, open("abalone.csv", "w") as outfile:
    csv_writer = csv.writer(outfile, delimiter=',')
    #create a row of titles
    csv_writer.writerow(['Sex','Length','Diameter','Height','Whole weight','Shucked weight',
                         'Viscera weight','Shell weight','Rings'])
    for line in infile:
        #check to see how the data is divided so you know what to split the line on
        row = [field.strip() for field in line.split(',')]
        csv_writer.writerow(row)

In [4]:
abalone_df = pd.read_csv('abalone.csv')
abalone_df.head()

Unnamed: 0,Sex,Length,Diameter,Height,Whole weight,Shucked weight,Viscera weight,Shell weight,Rings
0,M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15
1,M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7
2,F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9
3,M,0.44,0.365,0.125,0.516,0.2155,0.114,0.155,10
4,I,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7


In [5]:
# repeated preprocessing from last week
#nominal values
from sklearn.preprocessing import LabelEncoder

sex_labels = LabelEncoder()
abalone_df['Sex'] = sex_labels.fit_transform(abalone_df['Sex'].values)
abalone_df.head()

Unnamed: 0,Sex,Length,Diameter,Height,Whole weight,Shucked weight,Viscera weight,Shell weight,Rings
0,2,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15
1,2,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7
2,0,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9
3,2,0.44,0.365,0.125,0.516,0.2155,0.114,0.155,10
4,1,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7


In [91]:
abalone_df['age']=abalone_df.apply(lambda x: x['Rings']+1.5,axis=1)
abalone_df.head()

Unnamed: 0,Sex,Length,Diameter,Height,Whole weight,Shucked weight,Viscera weight,Shell weight,Rings,age
0,2,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15,16.5
1,2,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7,8.5
2,0,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9,10.5
3,2,0.44,0.365,0.125,0.516,0.2155,0.114,0.155,10,11.5
4,1,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7,8.5


In [178]:
# define data 
to_drop = abalone_df[['Rings', 'age']]
predictors = np.array(abalone_df.drop(to_drop, axis =1))
target = np.array(abalone_df['age'])
print(predictors.shape, target.shape)



(4177, 8) (4177,)


In [93]:
# create model

n_cols= predictors.shape[1]
early_stopping_monitor = EarlyStopping(patience=2)

#instantiate model
model = Sequential()
model.add(Dense(100, activation='relu', input_shape= (n_cols,)))
model.add(Dense(1))

model.compile(optimizer='adam', loss='mean_squared_error')

model.fit(predictors, target, validation_split=0.3, epochs=20, 
          callbacks= [early_stopping_monitor])



Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x7fc56a7c2340>

In [94]:
# Train a second model to compare
model_1 = Sequential()
model_1.add(Dense(100, activation='relu', input_shape= (n_cols,)))
model_1.add(Dense(100, activation='relu'))
model_1.add(Dense(1))

model_1.compile(optimizer='adam', loss='mean_squared_error')

model_1.fit(predictors, target, validation_split=0.3, epochs=20, 
          callbacks= [early_stopping_monitor])

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20


<tensorflow.python.keras.callbacks.History at 0x7fc56b9661c0>

In [95]:
# Train a third model to compare
model_2 = Sequential()
model_2.add(Dense(150, activation='relu', input_shape= (n_cols,)))
model_2.add(Dense(150, activation='relu'))
model_2.add(Dense(1))

model_2.compile(optimizer='adam', loss='mean_squared_error')

model_2.fit(predictors, target, validation_split=0.3, epochs=20, 
          callbacks= [early_stopping_monitor])

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20


<tensorflow.python.keras.callbacks.History at 0x7fc56674e1f0>

### Best performing keras model

In [185]:
# fourth model 
# create model

n_cols= predictors.shape[1]
early_stopping_monitor = EarlyStopping(patience=2)


model_3 = Sequential()
model_3.add(Dense(150, activation='relu', input_shape= (n_cols,)))
model_3.add(Dense(150, activation='relu'))
model_3.add(Dense(150, activation='relu'))
model_3.add(Dense(1))

model_3.compile(optimizer='adam', loss='mean_squared_error')

model_3.fit(predictors, target, validation_split=0.3, epochs=20, 
          callbacks= [early_stopping_monitor])

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x7fc568d95220>

In [186]:
model_3.summary()

Model: "sequential_21"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_59 (Dense)             (None, 150)               1350      
_________________________________________________________________
dense_60 (Dense)             (None, 150)               22650     
_________________________________________________________________
dense_61 (Dense)             (None, 150)               22650     
_________________________________________________________________
dense_62 (Dense)             (None, 1)                 151       
Total params: 46,801
Trainable params: 46,801
Non-trainable params: 0
_________________________________________________________________


## 4. Write another algorithm using KNN

In [97]:
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.array(abalone_df.drop(to_drop, axis =1))
y = np.array(abalone_df['age']).reshape(-1)
print(X.shape, y.shape)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state=42)

#Standardize  The model performed better without standardization
#sc= StandardScaler()
#X_train=sc.fit_transform(X_train)
#X_test=sc.fit_transform(X_test)

print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)



(4177, 8) (4177,)
(2923, 8) (2923,)
(1254, 8) (1254,)


In [98]:
#instantiate knn

knn= KNeighborsRegressor(n_neighbors=20)
knn.fit(X_train, y_train)
knn.score(X_test, y_test)

0.5469333028936754

In [99]:
from sklearn.metrics import mean_squared_error
y_pred= knn.predict(X_test)
mean_squared_error(y_pred, y_test)

4.600691786283892

## 5. Create a neural network using pytorch to predict the same result as question 3. 

In [166]:
import pandas as pd
import torch

from sklearn.model_selection import train_test_split


X = abalone_df.drop(to_drop, axis=1).values
y = abalone_df['age'].values

# Split into training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state=42)

#Standardize
#sc= StandardScaler()
#X_train=sc.fit_transform(X_train)
#X_test=sc.fit_transform(X_test)

In [167]:
import torch.nn as nn
import torch.nn.functional as F #this has activation functions
from torch.autograd import Variable

# Creating tensors
X_train = torch.FloatTensor(X_train)
X_test = torch.FloatTensor(X_test)

#I think these need to be float tensors rather than long tensors because it's regression
y_train = torch.FloatTensor(y_train)
y_test = torch.FloatTensor(y_test)

print(X_train)

tensor([[0.0000, 0.5250, 0.4300,  ..., 0.4325, 0.1800, 0.1815],
        [1.0000, 0.4300, 0.3250,  ..., 0.1575, 0.0825, 0.1050],
        [2.0000, 0.4550, 0.3500,  ..., 0.1625, 0.0970, 0.1450],
        ...,
        [2.0000, 0.5100, 0.3950,  ..., 0.2440, 0.1335, 0.1880],
        [2.0000, 0.5750, 0.4650,  ..., 0.5160, 0.2185, 0.2350],
        [0.0000, 0.5950, 0.4750,  ..., 0.5470, 0.2310, 0.2710]])


In [168]:
class ANN_Model(nn.Module):
    def __init__(self, input_features=8, hidden1=100, out_features =1):
        super().__init__()
        self.layer_1_connection = nn.Linear(input_features, hidden1)
        #self.layer_2_connection = nn.Linear(hidden1, hidden2)
        self.out = nn.Linear(hidden1, out_features)
    
    def forward(self, x):
        #apply activation functions
        x = F.relu(self.layer_1_connection(x))
        #x = F.relu(self.layer_2_connection(x))
        x = self.out(x)
        return x

In [169]:
torch.manual_seed(42)

#instantiate the model
model = ANN_Model()

In [176]:
# loss function
loss_function = nn.MSELoss()

#optimizer
optimizer = torch.optim.Adam(model.parameters(), lr = 0.000001)

In [177]:
#run model through multiple epochs/iterations
final_loss = []
n_epochs = 500
for epoch in range(n_epochs):
    y_pred = model.forward(X_train)
    loss = loss_function(y_pred, y_train)
    final_loss.append(loss)
    
    if epoch % 10 == 1:
        print(f'Epoch number: {epoch} with loss: {loss.item()}')
    
    optimizer.zero_grad() #zero the gradient before running backwards propagation
    loss.backward() #for backward propagation 
    optimizer.step() #performs one optimization step each epoch

Epoch number: 1 with loss: 10.486957550048828
Epoch number: 11 with loss: 10.486957550048828
Epoch number: 21 with loss: 10.486957550048828
Epoch number: 31 with loss: 10.486957550048828
Epoch number: 41 with loss: 10.486957550048828
Epoch number: 51 with loss: 10.486957550048828
Epoch number: 61 with loss: 10.486956596374512
Epoch number: 71 with loss: 10.486956596374512
Epoch number: 81 with loss: 10.486955642700195
Epoch number: 91 with loss: 10.486955642700195
Epoch number: 101 with loss: 10.486955642700195
Epoch number: 111 with loss: 10.486955642700195
Epoch number: 121 with loss: 10.486953735351562
Epoch number: 131 with loss: 10.486953735351562
Epoch number: 141 with loss: 10.486953735351562
Epoch number: 151 with loss: 10.486952781677246
Epoch number: 161 with loss: 10.486952781677246
Epoch number: 171 with loss: 10.486952781677246
Epoch number: 181 with loss: 10.486952781677246
Epoch number: 191 with loss: 10.486952781677246
Epoch number: 201 with loss: 10.486952781677246
Epo

## 6.	Compare the performance of the neural networks to the other model you created. Which performed better? Why do you think that is?

The model that seemed to perform the best was the Keras model which got down to a MSE of 3.95, which still isn't great but is better than what I was getting with my best performing decision tree regression.

I think the Keras model performed the best in part because it is designed to be easier to use for beginner developers and for people with less of a math background. It has been easier for me to understand what I'm doing in terms of manipulating Keras than PyTorch, although despite the poor performance the Pytorch model was performing even worse initially.

I think that Keras also performed better than KNN because of the ability to correct using backpropagation.