# Artificial Intelligence
# 464/664
# Assignment #7

## General Directions for this Assignment

00. We're using a Jupyter Notebook environment (tutorial available here: https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html),
01. Output format should be exactly as requested (it is your responsibility to make sure the notebook looks as expected on Gradescope),
02. Check the submission deadline on Gradescope,
03. Rename the file to Last_First_assignment_7, 
04. Submit your notebook (as .ipynb, not PDF) using Gradescope, and
05. Do not submit any other files.

## Before You Submit...

1. Re-read the general instructions provided above, and
2. Hit "Kernel"->"Restart & Run All".

## Neural Networks

For this assignment we will implement a Neural Network. The dataset is the same dataset from Assignment #6. The goal is to classify a mushroom as either edible ('e') or poisonous ('p'). You are free to use libraries such as PyTorch, TensorFlow, or scikit-learn, or to code your own.


Your output should look kind of like the output of `cross_validate` from Assignment #6:

```
Fold: 0	Train Error: 15.38%	Validation Error: 0.00%
```

It doesn't have to be exactly the same.


Notice that "Test Error" has been replaced by "Validation Error." Split your dataset into train, test, and validation sets. 
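
A minimal sketch of a three-way split, assuming plain NumPy, hypothetical fractions (20% validation, 20% test), and a hypothetical helper name `three_way_split`. This notebook itself holds out a test set with `train_test_split` and then uses `StratifiedKFold` folds of the remaining data as its validation sets; the sketch shows the simpler single-validation-split alternative.

```python
import numpy as np

def three_way_split(n_samples, val_frac=0.2, test_frac=0.2, seed=1):
    """Shuffle row indices once and cut them into train/validation/test index arrays."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_frac))
    n_val = int(round(n_samples * val_frac))
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return train_idx, val_idx, test_idx

# Usage: 8124 rows, the size of the mushroom dataset
train_idx, val_idx, test_idx = three_way_split(8124)
print(len(train_idx), len(val_idx), len(test_idx))  # 4874 1625 1625
```

The index arrays can then select rows from `X` and `y`; because the three sets never overlap, the test set stays unseen until the final report.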


Start with a simple network. Train using the train set. Observe the model's performance using the validation set. 


Increase the complexity of your network. Train using the train set. Observe the model's performance using the validation set. 


Model complexity in Assignment #6 was the depth limit. You can think of it here as the architecture of the network (number of layers and units per layer). 
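
One rough way to put a number on architectural complexity is to count trainable parameters. The sketch below assumes fully connected layers, where each `nn.Linear(n_in, n_out)` holds `n_in * n_out` weights plus `n_out` biases, and applies that to the layer widths used by the `Simple`, `Intermediate`, and `Complex` classes later in this notebook. The helper name `count_parameters` is hypothetical.

```python
def count_parameters(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# Layer widths used by the Simple, Intermediate, and Complex classes in this notebook
architectures = {
    "Simple": [22, 11, 1],
    "Intermediate": [22, 11, 5, 1],
    "Complex": [22, 11, 5, 2, 1],
}
for name, sizes in architectures.items():
    print(f"{name:<12} {sizes}  parameters: {count_parameters(sizes)}")
```

This gives 265, 319, and 328 parameters. Adding depth barely changes the count here because each added layer is narrow; widening layers would grow it much faster.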


We're trying to find a model complexity that generalizes well. (Recall high bias vs high variance discussion in class.) 


Pick the network architecture that you deem best. Use the test set to report your winning model's performance. 
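
A sketch of choosing the architecture from validation scores only, so that the test set is used once, for the final report. It uses the per-fold validation accuracies printed by the cross-validation cells in this notebook; the selection rule (highest mean) and the name `pick_best` are assumptions, and other rules, such as preferring the simplest model within one standard error of the best, are also common.

```python
from statistics import mean, stdev

# Per-fold validation accuracies (%) as printed by the cross-validation cells in this notebook
cv_val_acc = {
    "Simple": [97.77, 95.31, 98.54, 99.31, 98.31],
    "Intermediate": [97.54, 97.62, 98.85, 98.31, 95.00],
    "Complex": [98.54, 94.54, 96.54, 94.54, 93.53],
}

def pick_best(scores):
    """Return the architecture name with the highest mean validation accuracy."""
    return max(scores, key=lambda name: mean(scores[name]))

for name, accs in cv_val_acc.items():
    print(f"{name:<12} mean {mean(accs):.2f}%  std {stdev(accs):.2f}")
print("Selected:", pick_best(cv_val_acc))
```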


There are no other directions for this assignment beyond what's here and in the "General Directions" section. You have a lot of freedom with this assignment.

In [1]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

random_state = 1

# Exploratory Analysis

In [2]:
col_names = ['target', 'cap-shape',
                   'cap-surface',
                   'cap-color',
                   'bruises?',
                   'odor',
                   'gill-attachment',
                   'gill-spacing',
                   'gill-size',
                   'gill-color',
                   'stalk-shape',
                   'stalk-root',
                   'stalk-surface-above-ring',
                   'stalk-surface-below-ring',
                   'stalk-color-above-ring',
                   'stalk-color-below-ring',
                   'veil-type',
                   'veil-color',
                   'ring-number',
                   'ring-type',
                   'spore-print-color',
                   'population',
                   'habitat']
df = pd.read_csv("agaricus-lepiota.data", names = col_names, index_col=False)

In [3]:
df.head()

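| | target | cap-shape | cap-surface | cap-color | bruises? | odor | gill-attachment | gill-spacing | gill-size | gill-color | ... | stalk-surface-below-ring | stalk-color-above-ring | stalk-color-below-ring | veil-type | veil-color | ring-number | ring-type | spore-print-color | population | habitat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | p | x | s | n | t | p | f | c | n | k | ... | s | w | w | p | w | o | p | k | s | u |
| 1 | e | x | s | y | t | a | f | c | b | k | ... | s | w | w | p | w | o | p | n | n | g |
| 2 | e | b | s | w | t | l | f | c | b | n | ... | s | w | w | p | w | o | p | n | n | m |
| 3 | p | x | y | w | t | p | f | c | n | n | ... | s | w | w | p | w | o | p | k | s | u |
| 4 | e | x | s | g | f | n | f | w | b | k | ... | s | w | w | p | w | o | e | n | a | g |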


# Preprocessing

In [4]:
# Selection
X = np.asarray(df[col_names[1:]])
y = np.asarray(df[col_names[0]])

In [5]:
# Encoding
X_encoder = OrdinalEncoder()
X = X_encoder.fit_transform(X)
y_encoder = LabelEncoder()
y = y_encoder.fit_transform(y).reshape(-1, 1)
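
`OrdinalEncoder` maps each category to an integer in alphabetical order, so the network sees features such as cap-color as numbers with an ordering the categories do not actually have. A common alternative for nominal features is one-hot encoding (scikit-learn provides it as `OneHotEncoder`). The NumPy sketch below is illustrative only and is not what this notebook runs; the helper name `one_hot_encode` is hypothetical, and using it would change the network's input width from 22 to the total number of categories.

```python
import numpy as np

def one_hot_encode(X_cat):
    """One-hot encode each column of a 2-D array of category labels.

    Returns the encoded float matrix and, per column, the sorted categories found.
    """
    blocks, categories = [], []
    for j in range(X_cat.shape[1]):
        cats, codes = np.unique(X_cat[:, j], return_inverse=True)
        blocks.append(np.eye(len(cats), dtype=np.float32)[codes])  # one row of eye() per code
        categories.append(cats)
    return np.hstack(blocks), categories

# Usage on cap-shape, cap-surface, cap-color from the first three rows of df.head()
sample = np.array([["x", "s", "n"],
                   ["x", "s", "y"],
                   ["b", "s", "w"]])
encoded, cats = one_hot_encode(sample)
print(encoded.shape)  # (3, 6)
```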

In [6]:
# Splitting Data
X_train, X_test, y_train, y_test = train_test_split(
    torch.tensor(X, dtype=torch.float32),
    torch.tensor(y, dtype=torch.float32),
    test_size=0.2,
    random_state=random_state,
)

# Model Training

## Training Hyperparameters 

In [7]:
numEpochs = 1000
lr = 0.1
loss_fn = nn.BCELoss()
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state = random_state)
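
`nn.BCELoss` computes the mean binary cross-entropy $-\frac{1}{n}\sum_i \left[y_i \log p_i + (1-y_i)\log(1-p_i)\right]$ between the sigmoid outputs $p_i$ and the 0/1 labels $y_i$. A small NumPy version as a sketch; the function name and the clipping constant are assumptions.

```python
import numpy as np

def binary_cross_entropy(p, y, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities p and 0/1 targets y.

    p is clipped to [eps, 1 - eps] so log() never sees 0 (an assumption of this sketch;
    PyTorch's BCELoss instead clamps the log terms at -100).
    """
    p = np.clip(np.asarray(p, dtype=np.float64), eps, 1 - eps)
    y = np.asarray(y, dtype=np.float64)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

# Confident correct predictions give a small loss, a confident mistake a large one
print(binary_cross_entropy([0.9, 0.1], [1, 0]))   # about 0.105
print(binary_cross_entropy([0.1], [1]))           # about 2.303
```

A model that outputs 0.5 for every row has loss $\ln 2 \approx 0.693$, close to the first logged loss of each training run in this notebook (0.67 to 0.72), as expected for an untrained network. PyTorch also offers `nn.BCEWithLogitsLoss`, which folds the sigmoid into the loss (so the final `nn.Sigmoid` would be dropped) and is documented as more numerically stable.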

## Simple Model - 1 Hidden Layer

In [8]:
class Simple(nn.Module):
    def __init__(self, in_channels, out_channels):               
        super().__init__()
        self.pipe = nn.Sequential(nn.Linear(in_channels, in_channels // 2),   # 22, 11
                                   nn.ReLU(),
                                   nn.Linear(in_channels // 2, out_channels),  # 11, 1
                                   nn.Sigmoid())
        
    def forward(self, x):
        return self.pipe(x)
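
What `Simple.forward` computes, written out in NumPy as a sketch. The weights are random placeholders (not PyTorch's default initialization), the input rows are hypothetical ordinal-encoded values, and `simple_forward` is a hypothetical name; the layer shapes follow `nn.Linear`, which stores its weight as `(out_features, in_features)` and computes `x @ W.T + b`.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def simple_forward(x, W1, b1, W2, b2):
    """Forward pass of a 22 -> 11 -> 1 network: Linear, ReLU, Linear, Sigmoid."""
    h = relu(x @ W1.T + b1)          # (batch, 11)
    return sigmoid(h @ W2.T + b2)    # (batch, 1), values in (0, 1)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.1, size=(11, 22)), np.zeros(11)
W2, b2 = rng.normal(scale=0.1, size=(1, 11)), np.zeros(1)
x = rng.integers(0, 10, size=(4, 22)).astype(float)   # 4 hypothetical encoded rows
probs = simple_forward(x, W1, b1, W2, b2)
print(probs.shape)      # (4, 1)
print(np.round(probs))  # predicted class, as torch.round does in the notebook
```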

In [9]:
# Cross Validation 
for i, (train_index, val_index) in enumerate(skf.split(X_train, y_train)):
    model = Simple(22, 1)
    optimizer = optim.SGD(model.parameters(), lr = lr)
    model.train()
    train_acc = 0.0
    
    for epoch in range(numEpochs):
        # Forward propagation
        y_pred = model(X_train[train_index])
        
        # Calculate loss and accuracy
        loss = loss_fn(y_pred, y_train[train_index])
        train_acc += (torch.round(y_pred) == y_train[train_index]).float().mean()

        # Zero out the gradients before adding them up
        model.zero_grad()

        # Backprop
        loss.backward()

        # Optimization step
        optimizer.step()
        
    model.eval()
    # eval() alone does not stop autograd from building a graph; no_grad() does
    with torch.no_grad():
        y_pred = model(X_train[val_index])
        val_acc = (torch.round(y_pred) == y_train[val_index]).float().mean()
    print(f"Fold {i}:    Average Train Accuracy: {train_acc / numEpochs * 100:.02f}%    Validation Accuracy: {val_acc * 100:.02f}%")

Fold 0:    Average Train Accuracy: 93.23%    Validation Accuracy: 97.77%
Fold 1:    Average Train Accuracy: 92.64%    Validation Accuracy: 95.31%
Fold 2:    Average Train Accuracy: 93.77%    Validation Accuracy: 98.54%
Fold 3:    Average Train Accuracy: 94.91%    Validation Accuracy: 99.31%
Fold 4:    Average Train Accuracy: 93.43%    Validation Accuracy: 98.31%
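
The "Average Train Accuracy" printed above adds up the training accuracy of all 1,000 epochs and divides by 1,000, so the early, untrained epochs pull it down; that is why it sits below the validation accuracy, which is measured once on the finished model. The simple model's full-training log further down gives a rough sense of the effect. This sketch averages only the 11 logged checkpoints, an approximation of the full 1,000-epoch average.

```python
# Train accuracies (%) logged every 99 epochs by the simple model's full training run
logged_train_acc = [51.81, 83.41, 89.63, 92.34, 91.28, 95.45,
                    94.63, 96.41, 97.48, 98.14, 98.48]

epoch_average = sum(logged_train_acc) / len(logged_train_acc)
final = logged_train_acc[-1]
print(f"Average over logged epochs: {epoch_average:.2f}%")
print(f"Final logged epoch:         {final:.2f}%")
```

Computing training accuracy once after the loop, in `eval()` mode and under `torch.no_grad()`, would give a figure directly comparable to the validation accuracy.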


In [10]:
# Training
simple_model = Simple(22, 1)
optimizer = optim.SGD(simple_model.parameters(), lr = lr)
simple_model.train()

for epoch in range(numEpochs):
    # Forward propagation
    y_pred = simple_model(X_train)

    # Calculate loss and accuracy
    loss = loss_fn(y_pred, y_train)
    if epoch % 99 == 0:
        acc = (torch.round(y_pred) == y_train).float().mean()
        print(f'Loss: {loss:.02f}    Train Accuracy: {acc * 100:.02f}%')

    # Zero out the gradients before adding them up
    simple_model.zero_grad()

    # Backprop
    loss.backward()

    # Optimization step
    optimizer.step()
    
simple_model.zero_grad()

Loss: 0.67    Train Accuracy: 51.81%
Loss: 0.37    Train Accuracy: 83.41%
Loss: 0.27    Train Accuracy: 89.63%
Loss: 0.22    Train Accuracy: 92.34%
Loss: 0.20    Train Accuracy: 91.28%
Loss: 0.15    Train Accuracy: 95.45%
Loss: 0.14    Train Accuracy: 94.63%
Loss: 0.11    Train Accuracy: 96.41%
Loss: 0.09    Train Accuracy: 97.48%
Loss: 0.07    Train Accuracy: 98.14%
Loss: 0.06    Train Accuracy: 98.48%
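
The training cell above passes all of `X_train` through the network at every step, so `optim.SGD` here performs full-batch gradient descent. As a sketch of what `loss.backward()` and `optimizer.step()` compute for the one-hidden-layer network, here is a NumPy version with hand-written gradients. The He-style normal initialization, the toy data, and the name `train_simple_net` are assumptions for illustration; results will not match the PyTorch run.

```python
import numpy as np

def train_simple_net(X, y, hidden=11, lr=0.1, epochs=1000, seed=1):
    """Full-batch gradient descent on Linear -> ReLU -> Linear -> Sigmoid with mean BCE.

    Returns the trained parameters and the loss recorded at every epoch.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=np.sqrt(2.0 / d), size=(hidden, d))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=np.sqrt(1.0 / hidden), size=(1, hidden))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass (same layer order as the Simple class)
        z1 = X @ W1.T + b1
        h = np.maximum(z1, 0.0)
        p = 1.0 / (1.0 + np.exp(-(h @ W2.T + b2)))
        p_clip = np.clip(p, 1e-7, 1 - 1e-7)
        losses.append(float(np.mean(-(y * np.log(p_clip) + (1 - y) * np.log(1 - p_clip)))))

        # Backward pass: for sigmoid + mean BCE, dLoss/dlogit simplifies to (p - y) / n
        dz2 = (p - y) / n                 # (n, 1)
        dW2 = dz2.T @ h                   # (1, hidden)
        db2 = dz2.sum(axis=0)
        dh = dz2 @ W2                     # (n, hidden)
        dz1 = dh * (z1 > 0)               # ReLU passes gradient only where z1 > 0
        dW1 = dz1.T @ X                   # (hidden, d)
        db1 = dz1.sum(axis=0)

        # Plain gradient-descent step, the update optim.SGD applies with no momentum
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses

# Hypothetical toy problem: label is 1 when the first feature exceeds the second
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 4))
y_toy = (X_toy[:, 0] > X_toy[:, 1]).astype(float).reshape(-1, 1)
params, losses = train_simple_net(X_toy, y_toy, hidden=8)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```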


## Intermediate Model - 2 Hidden Layers

In [11]:
class Intermediate(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.hdim1 = in_channels // 2              
        self.hdim2 = self.hdim1 // 2                            
        
        self.pipe = nn.Sequential(nn.Linear(in_channels, self.hdim1),    # 22, 11
                                   nn.ReLU(),
                                   nn.Linear(self.hdim1, self.hdim2),    # 11, 5
                                   nn.ReLU(),
                                   nn.Linear(self.hdim2, out_channels),  # 5, 1
                                   nn.Sigmoid())
        
    def forward(self, x):
        return self.pipe(x)

In [12]:
# Cross Validation
for i, (train_index, val_index) in enumerate(skf.split(X_train, y_train)):
    model = Intermediate(22, 1)
    optimizer = optim.SGD(model.parameters(), lr = lr)
    model.train()
    train_acc = 0.0
        
    for epoch in range(numEpochs):
        # Forward propagation
        y_pred = model(X_train[train_index])
        
        # Calculate loss and accuracy
        loss = loss_fn(y_pred, y_train[train_index])
        train_acc += (torch.round(y_pred) == y_train[train_index]).float().mean()

        # Zero out the gradients before adding them up
        model.zero_grad()

        # Backprop
        loss.backward()

        # Optimization step
        optimizer.step()
        
    model.eval()
    # eval() alone does not stop autograd from building a graph; no_grad() does
    with torch.no_grad():
        y_pred = model(X_train[val_index])
        val_acc = (torch.round(y_pred) == y_train[val_index]).float().mean()
    print(f"Fold {i}:    Average Train Accuracy: {train_acc / numEpochs * 100:.02f}%    Validation Accuracy: {val_acc * 100:.02f}%")

Fold 0:    Average Train Accuracy: 91.15%    Validation Accuracy: 97.54%
Fold 1:    Average Train Accuracy: 92.62%    Validation Accuracy: 97.62%
Fold 2:    Average Train Accuracy: 91.05%    Validation Accuracy: 98.85%
Fold 3:    Average Train Accuracy: 92.73%    Validation Accuracy: 98.31%
Fold 4:    Average Train Accuracy: 91.93%    Validation Accuracy: 95.00%
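
The cross-validation cells rely on `StratifiedKFold(n_splits=5, shuffle=True, random_state=1)`, which keeps each fold's class proportions close to those of `y_train`. A simplified re-implementation of that idea, assuming NumPy: shuffle each class's indices and deal them round-robin across the folds. This is not scikit-learn's exact algorithm, so fold assignments differ, and `stratified_kfold_indices` is a hypothetical name.

```python
import numpy as np

def stratified_kfold_indices(y, n_splits=5, seed=1):
    """Yield (train_idx, val_idx) pairs whose validation folds keep each class's share."""
    y = np.asarray(y).ravel()
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(y), dtype=int)
    offset = 0
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        # Continue the round-robin where the previous class stopped, keeping fold sizes balanced
        fold_of[idx] = (offset + np.arange(len(idx))) % n_splits
        offset += len(idx)
    for k in range(n_splits):
        yield np.flatnonzero(fold_of != k), np.flatnonzero(fold_of == k)

# Usage on hypothetical labels with a 52% / 48% class split
y_demo = np.array([0] * 52 + [1] * 48)
for k, (tr, va) in enumerate(stratified_kfold_indices(y_demo)):
    print(f"Fold {k}: {len(va)} validation rows, positive share {y_demo[va].mean():.2f}")
```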


In [13]:
# Training
intermediate_model = Intermediate(22, 1)
optimizer = optim.SGD(intermediate_model.parameters(), lr = lr)
intermediate_model.train()

for epoch in range(numEpochs):
    # Forward propagation
    y_pred = intermediate_model(X_train)

    # Calculate loss and accuracy
    loss = loss_fn(y_pred, y_train)
    if epoch % 99 == 0:
        acc = (torch.round(y_pred) == y_train).float().mean()
        print(f'Loss: {loss:.02f}    Train Accuracy: {acc * 100:.02f}%')

    # Zero out the gradients before adding them up
    intermediate_model.zero_grad()

    # Backprop
    loss.backward()

    # Optimization step
    optimizer.step()
    
intermediate_model.zero_grad()

Loss: 0.72    Train Accuracy: 52.13%
Loss: 0.48    Train Accuracy: 87.07%
Loss: 0.26    Train Accuracy: 90.86%
Loss: 0.29    Train Accuracy: 88.88%
Loss: 0.16    Train Accuracy: 94.43%
Loss: 0.10    Train Accuracy: 96.48%
Loss: 0.07    Train Accuracy: 97.77%
Loss: 0.05    Train Accuracy: 98.51%
Loss: 0.06    Train Accuracy: 98.17%
Loss: 0.07    Train Accuracy: 97.68%
Loss: 0.04    Train Accuracy: 99.09%


## Complex Model - 3 Hidden Layers

In [14]:
class Complex(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.hdim1 = in_channels // 2              
        self.hdim2 = self.hdim1 // 2               
        self.hdim3 = self.hdim2 // 2               
        
        self.pipe = nn.Sequential(nn.Linear(in_channels, self.hdim1),    # 22, 11
                                  nn.ReLU(), 
                                  nn.Linear(self.hdim1, self.hdim2),     # 11, 5
                                  nn.ReLU(),
                                  nn.Linear(self.hdim2, self.hdim3),     # 5, 2
                                  nn.ReLU(),
                                  nn.Linear(self.hdim3, out_channels),   # 2, 1
                                  nn.Sigmoid())
        
    def forward(self, x):
        return self.pipe(x)
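
The three classes differ only in how many times they halve the layer width (`in_channels // 2`, then `// 2` again). A hypothetical helper that generates those widths makes the pattern explicit and shows where it stops: a fourth hidden layer would have 1 unit and a fifth would have 0. Such a list could drive a single generic class that builds `nn.Linear`/`nn.ReLU` pairs in a loop, though that is not how this notebook is written.

```python
def halving_widths(in_features, n_hidden, out_features=1):
    """Layer widths where each hidden layer has floor(half) the units of the previous one."""
    widths = [in_features]
    for _ in range(n_hidden):
        nxt = widths[-1] // 2
        if nxt < 1:
            raise ValueError(f"{n_hidden} halvings of {in_features} leave a layer with 0 units")
        widths.append(nxt)
    widths.append(out_features)
    return widths

for n_hidden in (1, 2, 3):
    print(n_hidden, halving_widths(22, n_hidden))
```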

In [15]:
# Cross Validation
for i, (train_index, val_index) in enumerate(skf.split(X_train, y_train)):
    model = Complex(22, 1)
    optimizer = optim.SGD(model.parameters(), lr = lr)
    model.train()
    train_acc = 0.0
        
    for epoch in range(numEpochs):
        # Forward propagation
        y_pred = model(X_train[train_index])
        
        # Calculate loss and accuracy
        loss = loss_fn(y_pred, y_train[train_index])
        train_acc += (torch.round(y_pred) == y_train[train_index]).float().mean()

        # Zero out the gradients before adding them up
        model.zero_grad()

        # Backprop
        loss.backward()

        # Optimization step
        optimizer.step()
        
    model.eval()
    # eval() alone does not stop autograd from building a graph; no_grad() does
    with torch.no_grad():
        y_pred = model(X_train[val_index])
        val_acc = (torch.round(y_pred) == y_train[val_index]).float().mean()
    print(f"Fold {i}:    Average Train Accuracy: {train_acc / numEpochs * 100:.02f}%    Validation Accuracy: {val_acc * 100:.02f}%")

Fold 0:    Average Train Accuracy: 92.34%    Validation Accuracy: 98.54%
Fold 1:    Average Train Accuracy: 79.34%    Validation Accuracy: 94.54%
Fold 2:    Average Train Accuracy: 82.37%    Validation Accuracy: 96.54%
Fold 3:    Average Train Accuracy: 91.46%    Validation Accuracy: 94.54%
Fold 4:    Average Train Accuracy: 68.77%    Validation Accuracy: 93.53%


In [16]:
# Training
complex_model = Complex(22, 1)
optimizer = optim.SGD(complex_model.parameters(), lr = lr)
complex_model.train()

for epoch in range(numEpochs):
    # Forward propagation
    y_pred = complex_model(X_train)

    # Calculate loss and accuracy
    loss = loss_fn(y_pred, y_train)
    if epoch % 99 == 0:
        acc = (torch.round(y_pred) == y_train).float().mean()
        print(f'Loss: {loss:.02f}    Train Accuracy: {acc * 100:.02f}%')

    # Zero out the gradients before adding them up
    complex_model.zero_grad()

    # Backprop
    loss.backward()

    # Optimization step
    optimizer.step()
    
complex_model.zero_grad()

Loss: 0.71    Train Accuracy: 47.87%
Loss: 0.45    Train Accuracy: 84.18%
Loss: 0.36    Train Accuracy: 89.64%
Loss: 0.30    Train Accuracy: 90.11%
Loss: 0.29    Train Accuracy: 90.32%
Loss: 0.23    Train Accuracy: 93.12%
Loss: 0.11    Train Accuracy: 97.15%
Loss: 0.09    Train Accuracy: 97.66%
Loss: 0.07    Train Accuracy: 98.06%
Loss: 0.05    Train Accuracy: 98.74%
Loss: 0.26    Train Accuracy: 93.06%
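
The complex model's final logged loss jumps from 0.05 to 0.26, and the training cell keeps whatever weights exist after the last epoch. One common remedy is to monitor a validation loss during training, keep the best-scoring weights, and stop after several checks without improvement. A minimal sketch of that bookkeeping; the class name, the `patience` value, and `min_delta` are assumptions. With PyTorch, `state` would typically be `model.state_dict()` (deep-copied, since it references the live tensors), and the best copy would be restored with `model.load_state_dict(...)`.

```python
import copy

class EarlyStopping:
    """Remember the best loss seen so far and signal when to stop (hypothetical helper)."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # checks allowed without improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best_loss = float("inf")
        self.best_state = None
        self.best_step = -1
        self.bad_checks = 0

    def step(self, step, loss, state):
        """Record one check; return True when training should stop."""
        if loss < self.best_loss - self.min_delta:
            self.best_loss, self.best_step = loss, step
            self.best_state = copy.deepcopy(state)
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience

# Mechanics only: these are the complex model's logged *training* losses;
# in practice the monitored values should come from a validation set.
complex_logged_losses = [0.71, 0.45, 0.36, 0.30, 0.29, 0.23, 0.11, 0.09, 0.07, 0.05, 0.26]
stopper = EarlyStopping(patience=3)
for step, loss in enumerate(complex_logged_losses):
    if stopper.step(step, loss, {"step": step}):
        break
print(stopper.best_step, stopper.best_loss)  # 9 0.05
```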


# Model Testing

## Simple Model - 1 Hidden Layer

In [17]:
simple_model.eval()
with torch.no_grad():
    y_pred = simple_model(X_test)
    test_acc = (torch.round(y_pred) == y_test).float().mean()
print(f"Simple Model Test Accuracy: {test_acc * 100:.02f}%")

Simple Model Test Accuracy: 98.15%


## Intermediate Model - 2 Hidden Layers

In [18]:
intermediate_model.eval()
with torch.no_grad():
    y_pred = intermediate_model(X_test)
    test_acc = (torch.round(y_pred) == y_test).float().mean()
print(f"Intermediate Model Test Accuracy: {test_acc * 100:.02f}%")

Intermediate Model Test Accuracy: 96.43%


## Complex Model - 3 Hidden Layers

In [19]:
complex_model.eval()
with torch.no_grad():
    y_pred = complex_model(X_test)
    test_acc = (torch.round(y_pred) == y_test).float().mean()
print(f"Complex Model Test Accuracy: {test_acc * 100:.02f}%")

Complex Model Test Accuracy: 92.74%
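
The test accuracies differ by a few percentage points, so it can help to ask how much of each gap sampling noise could explain. A sketch using the Wilson score interval, assuming a test set of 1,625 rows (20% of the 8,124-row mushroom dataset, which `train_test_split(test_size=0.2)` rounds up). Each model was also trained only once, from one random initialization, so the ranking may shift with a different seed. The function name `wilson_interval` is hypothetical.

```python
from math import sqrt

def wilson_interval(correct, total, z=1.96):
    """Approximate 95% Wilson score interval for an accuracy of correct/total."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half

n_test = 1625  # assumed test-set size
for name, acc in [("Simple", 0.9815), ("Intermediate", 0.9643), ("Complex", 0.9274)]:
    lo, hi = wilson_interval(round(acc * n_test), n_test)
    print(f"{name:<12} {acc:.2%}  95% CI [{lo:.2%}, {hi:.2%}]")
```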


<div style="background: lemonchiffon; margin:20px; padding: 20px;">
    <strong>Comment</strong>
    <p>
        Briefly comment here on what the best architecture was and how it did on the test data. 
    </p>
</div>

<b> During cross-validation, the simple and intermediate models reached 90+% training and validation accuracy on every fold. The complex model's validation accuracy was similar, but its training accuracy was noticeably lower on some folds (as low as 68.77% on fold 4). Averaged over the five folds, validation accuracy was 97.85% for the simple model, 97.46% for the intermediate model, and 95.54% for the complex model. On the test set, the simple model again scored highest (98.15%), followed by the intermediate (96.43%) and complex (92.74%) models. The gap between the simple and intermediate models is under 2 percentage points, while the drop from the intermediate to the complex model (about 3.7 points) is roughly twice as large. Given these results, and because the simple model is also the cheapest to train and run, the simple model is the best architecture for this problem. </b>
