## HW 9: Neural Networks 
Noemi Turner <br>
CPSC 323 <br>
Professor Morehead <br>
11/3/2022 <br>
Description: This project loads the UCI Abalone dataset and fits a neural network to predict whether an abalone has more than 9 rings.



## Dataset

UCI Machine Learning Abalone Dataset: https://archive.ics.uci.edu/ml/datasets/Abalone

The goal is to predict the number of rings.

### Import Statements

In [None]:
# Mount data from drive
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [None]:
import csv
import numpy as np
from sklearn.model_selection import train_test_split

### Load in Data

In [None]:
header = ["Sex", "Length", "Diameter", "Height", "Whole weight", "Shucked weight", "Viscera weight", "Shell weight", "Rings"]
data = []

with open('/content/gdrive/MyDrive/Colab Notebooks/abalone.csv', encoding="utf8") as f:
    csv_reader = csv.reader(f) 
    for line in csv_reader:
        row = []
        for e, element in enumerate(line):
            if e == 0:
                row.append(element)
            elif e == 8:
                row.append(int(element))
            else:
                row.append(float(element))
        data.append(row)

for i in range(5):
    print(data[i])  

['M', 0.455, 0.365, 0.095, 0.514, 0.2245, 0.101, 0.15, 15]
['M', 0.35, 0.265, 0.09, 0.2255, 0.0995, 0.0485, 0.07, 7]
['F', 0.53, 0.42, 0.135, 0.677, 0.2565, 0.1415, 0.21, 9]
['M', 0.44, 0.365, 0.125, 0.516, 0.2155, 0.114, 0.155, 10]
['I', 0.33, 0.255, 0.08, 0.205, 0.0895, 0.0395, 0.055, 7]


In [None]:
# get X values
X_header = ["Sex", "Length", "Diameter", "Height"]
X = []
for row in data:
    x = []
    for i in range(4):
        x.append(row[i])
    X.append(x)

# get y values (parallel to X)
y_header = ["Rings"]
y = []
for row in data:
    y.append(row[8])
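The feature rows built above keep `Sex` as a string (`'M'`, `'F'`, or `'I'`), while a neural network needs purely numeric input. One common option is one-hot encoding. The sketch below is only an illustration: `SEX_CATEGORIES`, `encode_row`, and `sample_X` are hypothetical names that are not part of the original notebook, and the sample rows are taken from the first lines printed earlier.

```python
# Hypothetical helper (not in the original notebook): one-hot encode "Sex"
# so that every feature in a row is a float.
SEX_CATEGORIES = ["M", "F", "I"]  # the three codes that appear in the data


def encode_row(row):
    """Turn ['M', 0.455, 0.365, 0.095] into a flat list of floats."""
    sex, *measurements = row
    # An unknown code would produce all zeros in the one-hot part.
    one_hot = [1.0 if sex == category else 0.0 for category in SEX_CATEGORIES]
    return one_hot + [float(value) for value in measurements]


# Sample rows in the same layout as X (Sex, Length, Diameter, Height)
sample_X = [
    ["M", 0.455, 0.365, 0.095],
    ["F", 0.53, 0.42, 0.135],
    ["I", 0.33, 0.255, 0.08],
]

X_encoded = [encode_row(row) for row in sample_X]
print(X_encoded[0])  # [1.0, 0.0, 0.0, 0.455, 0.365, 0.095]
```

With this encoding each row has six numeric features instead of four mixed ones, so the result can be converted to a tensor or array directly.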

## Discretize the data

In [None]:
print("max:", max(y), "min:", min(y), "median:", np.median(y))

max: 29 min: 1 median: 9.0


Since predicting a binary label is simpler than predicting the exact ring count, I will discretize the target at the median (9 rings).

Shells with 9 rings or fewer will be represented with 0s.

Shells with more than 9 rings will be represented with 1s.

In [None]:
def discretize(count):
    if count > 9:
        return 1  # higher number of rings -> older abalone
    else:
        return 0  # lower number of rings -> younger abalone

y = [discretize(count) for count in y]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, shuffle=True)
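Because the threshold sits at the median and ties go to class 0, the two classes are not guaranteed to be the same size, so it can be worth counting the labels before training. The sketch below applies the same rule to the ring counts of the five rows printed earlier; `sample_rings` and the `threshold` parameter are illustrative additions, not part of the original cell.

```python
from collections import Counter


def discretize(count, threshold=9):
    """Return 1 for more than `threshold` rings, otherwise 0."""
    return 1 if count > threshold else 0


# Ring counts of the first five rows shown in the data preview
sample_rings = [15, 7, 9, 10, 7]

labels = [discretize(count) for count in sample_rings]
counts = Counter(labels)

print(labels)                # [1, 0, 0, 1, 0]
print(counts[0], counts[1])  # 3 2
```

Running the same count on the full `y` list would show how balanced the binary target actually is.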

## Create Neural Network Using PyTorch

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 5*5 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square, you can specify with a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = torch.flatten(x, 1) # flatten all dimensions except the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()
print(net)


Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
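The `Net` printed above is a convolutional network whose layers expect single-channel 32x32 image input and produce 10 outputs, whereas each abalone row is a short vector of tabular features with a binary label. A small fully connected network is one possible alternative. The sketch below is an assumption rather than the notebook's model: `TabularNet`, the hidden size of 16, and the input width of six (for example, a three-way one-hot encoding of `Sex` plus the three measurements) are all hypothetical choices.

```python
import torch
import torch.nn as nn


class TabularNet(nn.Module):
    """Hypothetical small MLP for tabular input (not the notebook's model)."""

    def __init__(self, n_features=6, hidden=16):
        super().__init__()
        self.fc1 = nn.Linear(n_features, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, 1)  # a single logit for the binary label

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.out(x)  # raw logits; apply a sigmoid to get probabilities


torch.manual_seed(0)
tabular_net = TabularNet()

# A random batch of 8 rows with 6 features stands in for real data here.
batch = torch.rand(8, 6)
logits = tabular_net(batch)
probabilities = torch.sigmoid(logits)

print(logits.shape)  # torch.Size([8, 1])
```

Returning raw logits keeps the model compatible with `nn.BCEWithLogitsLoss`, which applies the sigmoid internally.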


## Split data into train/test

## Fit a Neural Network to the data
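A minimal sketch of how fitting could look in PyTorch is shown here. It is not the notebook's training code: the data is synthetic (`X_demo`, `y_demo`), and the architecture, learning rate, and epoch count are illustrative assumptions chosen only so the loop runs on its own.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-in data: 200 rows, 6 numeric features, binary labels.
X_demo = torch.rand(200, 6)
y_demo = (X_demo.sum(dim=1, keepdim=True) > 3.0).float()

# Assumed architecture: one hidden layer, one output logit.
model = nn.Sequential(nn.Linear(6, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.BCEWithLogitsLoss()  # binary cross-entropy on raw logits
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

losses = []
for epoch in range(200):
    optimizer.zero_grad()                   # clear gradients from the last step
    loss = loss_fn(model(X_demo), y_demo)   # forward pass and loss
    loss.backward()                         # backpropagate
    optimizer.step()                        # update the weights
    losses.append(loss.item())

# Turn logits into hard 0/1 predictions without tracking gradients.
with torch.no_grad():
    y_pred_demo = (torch.sigmoid(model(X_demo)) > 0.5).int().flatten().tolist()

print(f"first loss: {losses[0]:.3f}, last loss: {losses[-1]:.3f}")
```

For the real data, the encoded training rows would replace `X_demo` and `y_train` would replace `y_demo`, with predictions made on the held-out test rows instead of the training rows.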

## Make predictions using the test data

## Analysis and Performance

 
Measure of performance (how fast, how many steps, etc.) for training



In [None]:
# import time
# startTime = time.time()

# # fit the model with the training data only (no prediction inside the timer)
# model.fit(X_train, y_train)

# executionTime = (time.time() - startTime)
# print('Training time in seconds: ' + str(executionTime))
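The timing pattern used here can be wrapped in a small reusable helper. The sketch below uses `time.perf_counter`, which is intended for measuring short durations; `timed` and `fake_training` are hypothetical names, and `fake_training` is only a stand-in workload, not a real training step.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_training(steps):
    """Stand-in workload; a real run would call the training loop instead."""
    total = 0
    for step in range(steps):
        total += step * step
    return total


result, seconds = timed(fake_training, 100_000)
print(f"Execution time in seconds: {seconds:.4f}")
```

The same helper can time inference by passing the prediction function instead, which keeps training and inference measurements separate.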

Measure of performance for inference (running on the test data)

In [None]:
# import time
# startTime = time.time()

# # Predict only: the model is already fit, so it is not refit here
# y_pred = model.predict(X_test)

# executionTime = (time.time() - startTime)
# print('Inference time in seconds: ' + str(executionTime))

Measure accuracy, precision, recall, and F1 score on the test set

Confusion Matrix

In [None]:
# import matplotlib.pyplot as plt
# from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# cm = confusion_matrix(y_test, y_pred)

# disp = ConfusionMatrixDisplay(confusion_matrix=cm,display_labels=[0,1])
# disp.plot()
# plt.show()

Accuracy

In [None]:
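# Requires y_pred (the model's predictions for X_test) to be defined
from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()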
accuracy = (tp + tn) / (tp + tn + fp + fn) 
print("Accuracy:", accuracy) 

Precision

In [None]:
precision = tp / (tp + fp)
print("Precision:", precision)

Recall

In [None]:
recall = tp / (tp + fn)
print("Recall:", recall)

F1 Score

In [None]:
f1_score = 2 * (precision * recall) / (precision + recall)
print("F1 Score:", f1_score)
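The four formulas above divide by counts that can be zero, for example when the model never predicts class 1. The sketch below recomputes the same metrics from plain Python lists with a guard against division by zero. `binary_metrics`, `safe_div`, and the demo labels are hypothetical and are not results from the abalone model.

```python
def binary_metrics(y_true, y_pred):
    """Compute confusion counts and the four metrics for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def safe_div(numerator, denominator):
        # Return 0.0 instead of raising when a denominator is zero.
        return numerator / denominator if denominator else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Made-up labels for illustration only
demo_true = [1, 0, 0, 1, 0, 1, 1, 0]
demo_pred = [1, 0, 1, 1, 0, 0, 1, 0]

metrics = binary_metrics(demo_true, demo_pred)
print(metrics["tp"], metrics["tn"], metrics["fp"], metrics["fn"])  # 3 3 1 1
print(metrics["accuracy"])  # 0.75
```

Returning 0.0 for an undefined ratio is one convention among several; raising an error or returning `None` would be equally defensible choices.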