### 1.	What is a neural network? What are the general steps required to build a neural network? 

Neural networks are collections of connected nodes (which represent neurons) that aim to mimic the workings of neurons in the brain. The nodes are connected by edges (loosely analogous to dendrites), which represent the directional transmission of information between nodes. The inputs to a node are weighted, summed together with a bias, and passed through a nonlinear transformation (the activation function) to create the output, which is sent forward to the nodes of the next layer. With some activations, such as ReLU, the output is zero unless the weighted input exceeds a threshold.
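As a minimal sketch of what a single node computes, the snippet below forms the weighted sum of its inputs, adds a bias, and applies an activation. The function names and the numeric values are hypothetical and chosen only for illustration.

```python
import math

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

def relu(z):
    # Passes positive values through unchanged and maps everything else to zero
    return max(0.0, z)

def sigmoid(z):
    # Squashes any real number into the interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical inputs and parameters; the weighted sum here is negative
inputs = [0.5, -1.0, 2.0]
weights = [0.4, 0.3, -0.2]
bias = 0.1

relu_out = neuron(inputs, weights, bias, relu)
sigmoid_out = neuron(inputs, weights, bias, sigmoid)
print(relu_out, sigmoid_out)
```

Because the weighted sum is negative in this example, ReLU outputs zero while the sigmoid still passes on a value below 0.5.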

The nodes are organized into layers. Building a network involves the following steps:

- initialize the model
- add the input layer, the hidden layers, and the output layer. We need to specify how many hidden layers we want and how many nodes each layer has: the number of nodes in the hidden layers is subject to optimization, while the output layer has as many nodes as there are classes in a classification problem and a single node for regression
- choose an activation function for the hidden layers (ReLU, sigmoid, tanh); the choice depends on the type of network used
- choose the activation function for the output layer that suits the type of problem (e.g. sigmoid for binary classification, softmax for multiclass classification, linear activation for regression)
- initialize the weights and the biases, since the inputs to these functions are weighted sums. Both are learned during training. As far as I know, Keras does this for us, starting from random weights
- define (or choose a predefined) loss function, and an optimizer that minimizes it while the model is fitted to the training data
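The output-layer choices in the list above can be sketched in plain Python. This is only an illustration of what each activation does to the raw scores of the last layer. The `logits` values are hypothetical, and real frameworks provide their own implementations.

```python
import math

def sigmoid(z):
    """Binary classification: maps one score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    """Multiclass classification: maps one score per class to probabilities that sum to 1."""
    m = max(zs)  # subtracting the maximum keeps exp() from overflowing
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def linear(z):
    """Regression: the score is returned unchanged."""
    return z

# Hypothetical raw scores from a three-class output layer
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
print(probs, sigmoid(0.0), linear(3.5))
```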

### 2.	Generally, how do you check the performance of a neural network? Why? 

The performance metrics (besides speed) would depend on the problem to be solved:

- for a regression problem, the mean squared error (MSE) or its square root (RMSE) is a good metric
- for a classification problem, accuracy, recall (if catching the true positives matters most) or precision (if false positives are the costlier error) are common choices, and ROC-AUC may be the right metric if we want to check how robust the model is across decision thresholds
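The metrics named above can be computed by hand on toy data, as a sketch of their definitions. The labels and predictions below are made up for illustration, and in practice `sklearn.metrics` offers equivalent functions.

```python
import math

def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and recall for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Made-up regression targets and predictions
reg_mse = mse([10, 8, 12, 9], [11, 8, 10, 9])
reg_rmse = math.sqrt(reg_mse)

# Made-up binary labels and predictions
cls = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(reg_mse, reg_rmse, cls)
```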

This means the performance metric depends on the problem rather than on the method used, which lets us directly compare different methods such as KNN, decision trees, or neural networks. If the chosen metric comes out close across models, then speed, simplicity of implementation, and scalability should be considered.

### 3.	Create a neural network using keras to predict the outcome of either of the arrhythmia or abalone datasets

In [1]:
import keras
import numpy as np
import pandas as pd

from keras.layers import Dense
from keras.models import Sequential

from numpy.random import seed
seed(42)

In [2]:
with open('abalone.names') as f:
    print(f.read())

1. Title of Database: Abalone data

2. Sources:

   (a) Original owners of database:
	Marine Resources Division
	Marine Research Laboratories - Taroona
	Department of Primary Industry and Fisheries, Tasmania
	GPO Box 619F, Hobart, Tasmania 7001, Australia
	(contact: Warwick Nash +61 02 277277, wnash@dpi.tas.gov.au)

   (b) Donor of database:
	Sam Waugh (Sam.Waugh@cs.utas.edu.au)
	Department of Computer Science, University of Tasmania
	GPO Box 252C, Hobart, Tasmania 7001, Australia

   (c) Date received: December 1995


3. Past Usage:

   Sam Waugh (1995) "Extending and benchmarking Cascade-Correlation", PhD
   thesis, Computer Science Department, University of Tasmania.

   -- Test set performance (final 1044 examples, first 3133 used for training):
	24.86% Cascade-Correlation (no hidden nodes)
	26.25% Cascade-Correlation (5 hidden nodes)
	21.5%  C4.5
	 0.0%  Linear Discriminate Analysis
	 3.57% k=5 Nearest Neighbour
      (Problem encoded as a classification task)

   -- Data set samp

In [3]:
abalones = pd.read_csv('abalone.data', names = ['Sex', 'Length', 'Diameter', 'Height', 'Whole weight', 'Shucked weight', 'Viscera weight', 'Shell weight', 'Rings'])
abalones.head(2)

|   | Sex | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shell weight | Rings |
|---|---|---|---|---|---|---|---|---|---|
| 0 | M | 0.455 | 0.365 | 0.095 | 0.514 | 0.2245 | 0.101 | 0.15 | 15 |
| 1 | M | 0.35 | 0.265 | 0.09 | 0.2255 | 0.0995 | 0.0485 | 0.07 | 7 |


In [4]:
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

columns_to_encode = ['Sex']
columns_to_scale = ['Length', 'Diameter', 'Height', 'Whole weight', 'Shucked weight', 'Viscera weight', 'Shell weight']

# Standardize the numeric measurements and one-hot encode the categorical 'Sex' column
numeric_transformer = StandardScaler()
categorical_transformer = OneHotEncoder(handle_unknown="ignore")

preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer, columns_to_scale),
        ("cat", categorical_transformer, columns_to_encode),
    ]
)

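predictors_df = abalones.drop('Rings', axis = 1)
# Note: the preprocessor is fitted on all rows before the split, so the scaling
# statistics also include the rows that later end up in the test set.
predictors = preprocessor.fit_transform(predictors_df)
target = abalones['Rings']

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.2, random_state=42)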

n_cols = X_train.shape[1]

model = Sequential()
model.add(Dense(500, activation = 'relu', input_shape = (n_cols, )))
model.add(Dense(500, activation = 'relu'))
model.add(Dense(500, activation = 'relu'))
model.add(Dense(1))
model.compile(optimizer = 'adam', loss = 'mean_squared_error')

In [9]:
# With no `epochs` argument, fit() runs a single epoch (one pass over the training data)
model.fit(X_train, y_train)



<keras.callbacks.History at 0x1b9debcc2e0>

In [10]:
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((3341, 10), (836, 10), (3341,), (836,))

Here I experimented with the model manually: adding one more layer or more neurons per layer did not improve the performance any further.

In [11]:
predictions = model.predict(X_test)

In [12]:
from sklearn.metrics import mean_squared_error as MSE
# scikit-learn's signature is (y_true, y_pred); MSE itself is symmetric in its arguments
mse = MSE(y_test, predictions)

print(f'The mean squared error of the prediction is: {mse}')

The mean squared error of the prediction is: 5.67324539510749


### 4.	Write another algorithm to predict the same result as the previous question using either KNN or logistic regression.

In [13]:
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import KFold

neigh = KNeighborsRegressor()

kfold = KFold(n_splits=3, shuffle=True, random_state=42)

# Define candidate hyperparameters
parameters = [{'n_neighbors': [2,3,4,5,6], 'weights': ['uniform','distance']}]

# Search for best hyperparameters
grid = GridSearchCV(estimator=neigh, param_grid=parameters, cv=kfold, scoring='r2')
grid.fit(X_train, y_train)

GridSearchCV(cv=KFold(n_splits=3, random_state=42, shuffle=True),
             estimator=KNeighborsRegressor(),
             param_grid=[{'n_neighbors': [2, 3, 4, 5, 6],
                          'weights': ['uniform', 'distance']}],
             scoring='r2')

In [14]:
# Get the results
print(grid.best_score_)
print(grid.best_estimator_)
print(grid.best_params_)

0.4900224988507879
KNeighborsRegressor(n_neighbors=6, weights='distance')
{'n_neighbors': 6, 'weights': 'distance'}
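To make the `weights` option in the grid concrete, here is a minimal sketch of k-nearest-neighbour regression with uniform and inverse-distance weighting. It illustrates the idea only and is not scikit-learn's implementation. The function `knn_predict` and the toy data are hypothetical.

```python
import math

def knn_predict(train_X, train_y, query, k=3, weights="uniform"):
    """Predict a value for `query` from its k nearest training points."""
    # Euclidean distance from the query to every training point
    dists = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    nearest = sorted(dists, key=lambda pair: pair[0])[:k]

    if weights == "uniform":
        # Every neighbour contributes equally
        return sum(y for _, y in nearest) / k

    # "distance": closer neighbours count more (weight = 1 / distance).
    # Exact matches would give an infinite weight, so they decide the prediction alone.
    exact = [y for d, y in nearest if d == 0.0]
    if exact:
        return sum(exact) / len(exact)
    w = [1.0 / d for d, _ in nearest]
    return sum(wi * y for wi, (_, y) in zip(w, nearest)) / sum(w)

# Toy one-dimensional data, made up for illustration
train_X = [[0.0], [1.0], [2.0], [10.0]]
train_y = [5, 7, 9, 20]

uniform_pred = knn_predict(train_X, train_y, [0.5], k=3, weights="uniform")
distance_pred = knn_predict(train_X, train_y, [0.5], k=3, weights="distance")
print(uniform_pred, distance_pred)
```

With distance weighting, the third neighbour at distance 1.5 pulls the prediction less than the two neighbours at distance 0.5, so the result is lower than the plain average.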


In [15]:
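# Refit the best configuration found by the grid search on the whole training set
neigh_best = KNeighborsRegressor(n_neighbors = 6, weights = 'distance')

neigh_best.fit(X_train, y_train)

y_pred = neigh_best.predict(X_test)

# scikit-learn's signature is (y_true, y_pred)
mse = MSE(y_test, y_pred)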

print(f'The mean squared error of the prediction is: {mse}')

The mean squared error of the prediction is: 5.227363651926104


### 5.	Create a neural network using pytorch to predict the same result as question 3. 

In [29]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

class REG_Model(nn.Module):
    """Fully connected regressor with two hidden layers and a single output node."""
    def __init__(self, input_features = 10, hidden1 = 500, hidden2 = 500, out_features = 1):
        super().__init__()
        self.fc1 = nn.Linear(input_features, hidden1)
        self.fc2 = nn.Linear(hidden1, hidden2)
        self.fc3 = nn.Linear(hidden2, out_features)
        
    def forward(self, x):
        # apply activation functions
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

In [30]:
reg = REG_Model()
criterion = nn.MSELoss()
optimizer = optim.Adam(reg.parameters(), lr = 0.003)

#criterions = [nn.L1Loss(), nn.MSELoss()]

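final_loss = []

X_torch_train = torch.FloatTensor(X_train)
X_torch_test = torch.FloatTensor(X_test)

# Convert the pandas Series to a NumPy array first; the resulting tensor has shape (3341,)
y_torch_train = torch.tensor(y_train.to_numpy(), dtype=torch.float32)

for epoch in range(200): # full-batch training: each epoch is one pass over the whole training set
    # forward + backward + optimize
    # NOTE: `outputs` has shape (3341, 1) while `y_torch_train` has shape (3341,), so
    # MSELoss broadcasts one against the other instead of comparing them element-wise;
    # `reg(X_torch_train).squeeze(1)` would give matching shapes.
    outputs = reg(X_torch_train)
    loss = criterion(outputs, y_torch_train)
    final_loss.append(loss.item()) # store the plain float rather than the tensor and its graph

    if epoch % 40 == 1:
        print(f'Epoch number: {epoch} with loss: {loss.item()}')

    optimizer.zero_grad() # zero the gradients before running backpropagation
    loss.backward()
    optimizer.step()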

  return F.mse_loss(input, target, reduction=self.reduction)


Epoch number: 1 with loss: 78.01948547363281
Epoch number: 41 with loss: 10.637167930603027
Epoch number: 81 with loss: 10.329761505126953
Epoch number: 121 with loss: 10.295011520385742
Epoch number: 161 with loss: 10.290472030639648


In [20]:
X_torch_train.shape, y_torch_train.shape, X_torch_test.shape

(torch.Size([3341, 10]), torch.Size([3341]), torch.Size([836, 10]))

In [31]:
# predictions
reg.eval()
with torch.no_grad(): # no gradients are needed for inference, which decreases memory consumption
    # One batched forward pass; squeeze (836, 1) to (836,) so it lines up with y_test
    y_pred = reg(X_torch_test).squeeze(1).numpy()

In [32]:
mse = MSE(y_test, y_pred)

print(f'The mean squared error of the prediction is: {mse}')

The mean squared error of the prediction is: 10.83816372437589




### 6.	Compare the performance of the neural networks to the other model you created. Which performed better? Why do you think that is?

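The Keras network and the KNeighborsRegressor (KNR) performed about the same, with test MSEs of 5.67 and 5.23 respectively. KNR was tuned with a grid search, while the Keras model was tuned only by manually increasing the neuron count until the loss stopped improving, and it was fitted for a single epoch. Which of the two came out ahead changed between runs with different training sets.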
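To put the recorded errors on the scale of the target, the short sketch below converts the three test MSEs reported in the sections above into root mean squared errors, measured in rings. The dictionary keys are labels chosen here for readability.

```python
import math

# Test MSEs copied from the outputs recorded in questions 3, 4 and 5
recorded_mse = {
    "Keras network": 5.67324539510749,
    "KNeighborsRegressor": 5.227363651926104,
    "PyTorch network": 10.83816372437589,
}

# RMSE is the square root of MSE and has the same unit as the target (rings)
rmse = {name: math.sqrt(value) for name, value in recorded_mse.items()}

for name, value in rmse.items():
    print(f"{name}: RMSE = {value:.2f} rings")
```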

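The PyTorch network performed worse (test MSE 10.84) and stopped improving after about epoch 80, even though I meant it to mirror the Keras setup. The comparison is not like-for-like, however: `REG_Model` has two hidden layers rather than three, and it was trained with 200 full-batch steps rather than one epoch of mini-batches. More importantly, the model output has shape (3341, 1) while the target has shape (3341,), and the `F.mse_loss` line left in the training output appears to be the tail of a warning about exactly this kind of mismatch. A loss that broadcasts these two shapes against each other is minimized by predicting one constant value for every sample, which would explain the plateau near 10.3, so the gap more likely reflects this bug than a real difference between the two frameworks.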
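One possible contributor to the PyTorch plateau, offered as a hypothesis rather than a confirmed diagnosis, is the difference between the model output shape (N, 1) and the target shape (N,) visible in the cells above. The sketch below uses NumPy, whose broadcasting rules work the same way as PyTorch's for this case, on made-up data to show what a mean squared error computed across such shapes actually measures.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up integer targets with shape (50,), standing in for ring counts
target = rng.integers(1, 30, size=50).astype(float)

# "Perfect" predictions stored as a column, shape (50, 1), like the output of a final Linear layer
pred_col = target.reshape(-1, 1).copy()

# (50, 1) minus (50,) broadcasts to (50, 50): every prediction is compared with every target
diff_mismatched = pred_col - target
loss_mismatched = np.mean(diff_mismatched ** 2)

# With matching shapes the comparison is element-wise, as intended
loss_matched = np.mean((pred_col.squeeze(1) - target) ** 2)

# Under the broadcast loss, predicting the mean for every sample scores better than perfect predictions
const_col = np.full((50, 1), target.mean())
loss_constant = np.mean((const_col - target) ** 2)

print(diff_mismatched.shape, loss_matched, loss_mismatched, loss_constant)
```

In this toy setting the broadcast loss of the constant prediction equals the variance of the target, while the perfect predictions score twice that. In the training loop above, the analogous change would be to compare `reg(X_torch_train).squeeze(1)` with the target so that both have shape (N,).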