# COMP4660/8420 Lab 2.1 - Using PyTorch for Binary Classification

In this lab, you will build your own neural network to perform a basic classification task using PyTorch.
______

**Task:**
Build a neural network for a classification task. The dataset you are using is the Glass Identification data set located at http://archive.ics.uci.edu/ml/datasets/Glass+Identification

Have a look!

_**Q1. How many instances are in this data set?**_

_**Q2. How many attributes (features) are there?**_

_**Q3. What was this data set originally used for?**_

_**Q4. What is the output attribute? How many output values are there?**_

Download the data set “glass.data” and save it as a CSV file, i.e. “glass.csv”.

We will begin by simplifying the dataset so that it has only two classes. Make a copy of the original file and name it “glass_binary.csv”.

## Task:
**Implement a neural network that classifies the data set based on whether the type of glass is a Window glass or a Non-window glass.** The inputs for the neural network will be the refractive index and measurements for sodium, magnesium, aluminium, silicon, potassium, calcium, barium and iron.

Hint: Please refer to Task 2 in Lab 1 or Introduction to PyTorch Basics if you don’t know how to start.

In [1]:
# import libraries
import numpy as np
import pandas as pd
import torch
import torch.nn.functional as F
from torch.autograd import Variable

## Step 1: Load and setup training dataset:
**Q5. The second line below removes all the data in the column labelled “Id number”. Why do you think we should not use this data to build a model?**

**What does the 4th line below do and why?** Hint: it sets all the values in the final column to either 0 or 1.

In [2]:
# load all data
data = pd.read_csv('dataset/glass/glass.csv',  header=None)

# drop first column
data.drop(data.columns[0], axis=1, inplace=True)

# shuffle the rows
data = data.sample(frac=1).reset_index(drop=True)

data[data.shape[1]] = (data[data.shape[1]] < 5).astype(int)

# randomly split data into training set (80%) and testing set (20%)
msk = np.random.rand(len(data)) < 0.8
train_data = data[msk]
test_data = data[~msk]

n_features = train_data.shape[1] - 1

# split training data into input and target
# the first 9 columns are features, the last one is target
train_input = train_data.iloc[:, :n_features]
train_target = train_data.iloc[:, n_features]

# split testing data into input and target
# the first 9 columns are features, the last one is target
test_input = test_data.iloc[:, :n_features]
test_target = test_data.iloc[:, n_features]

# create Tensors to hold the training inputs and targets
# (Variable and Tensor were merged in PyTorch 0.4, so no Variable wrapper is needed)
X = torch.tensor(train_input.values, dtype=torch.float32)
Y = torch.tensor(train_target.values, dtype=torch.long)
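
The random 80/20 split above draws a new mask on every run, so the training and testing sets change each time. A minimal sketch of a repeatable version is shown below, assuming a seeded NumPy generator is acceptable for this lab. The toy frame and the names `toy`, `rng`, `mask`, `toy_train` and `toy_test` are illustrative only and are not part of the lab code.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the glass data: 10 rows, one feature column and one target column.
toy = pd.DataFrame({"feature": np.arange(10.0), "target": [0, 1] * 5})

# Seeding the generator makes the random mask, and hence the split, repeatable.
rng = np.random.default_rng(0)
mask = rng.random(len(toy)) < 0.8

# Rows where the mask is True go to training, the rest to testing.
toy_train = toy[mask]
toy_test = toy[~mask]

print(len(toy_train), len(toy_test))
```

Note that a mask built this way gives roughly, not exactly, 80% of the rows to the training set, because each row is assigned independently.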


## Step 2: Define a neural network

In [3]:
#TODO define the number of inputs, classes, training epochs, and learning rate
input_neurons = n_features
hidden_neurons = 5
output_neurons = 2
num_epochs = 500
learning_rate = 0.01

#TODO define a customised neural network structure
# class Net(torch.nn.Module):
#     def __init__(self, n_input, n_output):
#         super(Net, self).__init__()
#         self.out = torch.nn.Linear(n_input, n_output)
        
#     def forward(self, X):
#         y_pred = self.out(X)
#         return y_pred

class TwoLayerNet(torch.nn.Module):
    def __init__(self, n_input, n_hidden, n_output):
        super(TwoLayerNet, self).__init__()
        self.hidden = torch.nn.Linear(n_input, n_hidden)
        self.out = torch.nn.Linear(n_hidden, n_output)
    
    def forward(self, X):
        h_input = self.hidden(X)
        h_output = torch.sigmoid(h_input)
        y_pred = self.out(h_output)
        return y_pred
        
#TODO define a neural network using the customised structure 
# net = Net(input_neurons, output_neurons)
net = TwoLayerNet(input_neurons, hidden_neurons, output_neurons)

#TODO define loss function (https://pytorch.org/docs/stable/nn.html#loss-functions)
loss_func = torch.nn.CrossEntropyLoss()

#TODO define optimiser (https://pytorch.org/docs/stable/optim.html)
optimiser = torch.optim.SGD(net.parameters(), lr = learning_rate, momentum = 0.9)
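
Before training, it can help to check that the network output and the targets have the shapes that `torch.nn.CrossEntropyLoss` expects: raw scores of shape `(batch, classes)` and integer class indices of shape `(batch,)`. The sketch below does this on made-up data. The `torch.nn.Sequential` model and the names `toy_net`, `toy_X` and `toy_Y` are hypothetical stand-ins, not the lab's own network or data.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for the lab's network: 9 inputs, 5 hidden units, 2 outputs.
toy_net = torch.nn.Sequential(
    torch.nn.Linear(9, 5),
    torch.nn.Sigmoid(),      # as a layer, Sigmoid must be instantiated first
    torch.nn.Linear(5, 2),
)

toy_X = torch.randn(4, 9)            # a batch of 4 made-up samples
toy_Y = torch.tensor([0, 1, 1, 0])   # class indices, dtype int64

logits = toy_net(toy_X)              # shape (4, 2): one raw score per class
loss = torch.nn.CrossEntropyLoss()(logits, toy_Y)

print(logits.shape, loss.item())
```

`CrossEntropyLoss` applies log-softmax internally, which is why the network returns raw scores rather than probabilities.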


## Step 3: Train and test the neural network

In [4]:
train_input = train_data.iloc[:, :n_features]
train_target = train_data.iloc[:, n_features]
# store all losses for visualisation
all_losses = []

# train a neural network
for epoch in range(num_epochs):
    # Perform forward pass: compute predicted y by passing x to the model.
    Y_pred = net(X)

    # Compute loss
    loss = loss_func(Y_pred, Y)
    all_losses.append(loss.item())

    # print progress
    if (epoch + 1) % 50 == 0:
        # convert two-column predicted Y values to one column for comparison
        _, predicted = torch.max(Y_pred, 1)

        # calculate and print accuracy
        total = predicted.size(0)
        correct = predicted.data.numpy() == Y.data.numpy()

        print('Epoch [%d/%d] Loss: %.4f  Accuracy: %.2f %%'
              % (epoch + 1, num_epochs, loss.item(), 100 * sum(correct)/total))

    # Clear the gradients before running the backward pass.
    net.zero_grad()

    # Perform backward pass
    loss.backward()

    # Calling the step function on an Optimiser makes an update to its
    # parameters
    optimiser.step()

# Optional: plot the loss history stored in ``all_losses`` during training
# (comment out from the next line to ``plt.show()`` if you do not want the plot)

import matplotlib.pyplot as plt

plt.figure()
plt.plot(all_losses)
plt.show()



In [None]:
"""
Evaluating the Results

To see how well the network performs on different categories, we will
create a confusion matrix, indicating for each actual glass class (rows)
which class the network predicts (columns).

"""

confusion = torch.zeros(output_neurons, output_neurons)

Y_pred = net(X)

_, predicted = torch.max(Y_pred, 1)

for i in range(train_data.shape[0]):
    actual_class = Y.data[i]
    predicted_class = predicted.data[i]

    confusion[actual_class][predicted_class] += 1

print('')
print('Confusion matrix for training:')
print(confusion)
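
The loop above fills the confusion matrix one sample at a time. As a rough alternative, the same counts can be computed without a Python loop by encoding each (actual, predicted) pair as a single index and counting with `torch.bincount`. The helper name `confusion_matrix` and the toy tensors are assumptions made for illustration.

```python
import torch

def confusion_matrix(actual, predicted, n_classes):
    """Rows index the actual class, columns the predicted class."""
    # Encode each (actual, predicted) pair as one flat index in [0, n_classes**2).
    flat = actual * n_classes + predicted
    counts = torch.bincount(flat, minlength=n_classes * n_classes)
    return counts.reshape(n_classes, n_classes)

toy_actual = torch.tensor([0, 0, 1, 1, 1])
toy_predicted = torch.tensor([0, 1, 1, 1, 0])

toy_confusion = confusion_matrix(toy_actual, toy_predicted, 2)
print(toy_confusion)
# tensor([[1, 1],
#         [1, 2]])
```

The diagonal holds the correctly classified samples, so dividing its sum by the total count gives the accuracy.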

In [None]:
"""
Step 3: Test the neural network

Pass testing data to the built neural network and get its performance
"""

# create Tensors to hold the testing inputs and targets
# (Variable and Tensor were merged in PyTorch 0.4, so no Variable wrapper is needed)
X_test = torch.tensor(test_input.values, dtype=torch.float32)
Y_test = torch.tensor(test_target.values, dtype=torch.long)

# test the neural network using testing data
# It is actually performing a forward pass computation of predicted y
# by passing x to the model.
# Here, Y_pred_test contains two columns, where the index of the
# max column indicates the class of the instance
Y_pred_test = net(X_test)

# get prediction
# convert two-column predicted Y values to one column for comparison
_, predicted_test = torch.max(Y_pred_test, 1)

# calculate accuracy
total_test = predicted_test.size(0)
correct_test = sum(predicted_test.data.numpy() == Y_test.data.numpy())

print('Testing Accuracy: %.2f %%' % (100 * correct_test / total_test))

"""
Evaluating the Results

To see how well the network performs on different categories, we will
create a confusion matrix, indicating for each actual glass class (rows)
which class the network predicts (columns).

"""

confusion_test = torch.zeros(output_neurons, output_neurons)

for i in range(test_data.shape[0]):
    actual_class = Y_test.data[i]
    predicted_class = predicted_test.data[i]

    confusion_test[actual_class][predicted_class] += 1

print('')
print('Confusion matrix for testing:')
print(confusion_test)
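
When a network is only being evaluated, gradients are not needed. A common pattern, sketched below on made-up data, is to run the forward pass inside `torch.no_grad()` so that no computation graph is built. The single-layer `toy_net` and the tensors `toy_X_test` and `toy_Y_test` are hypothetical and stand in for the trained network and the real test set.

```python
import torch

torch.manual_seed(0)

# Hypothetical untrained model and random test data, for illustration only.
toy_net = torch.nn.Linear(9, 2)
toy_X_test = torch.randn(6, 9)
toy_Y_test = torch.tensor([0, 1, 0, 1, 1, 0])

toy_net.eval()                 # switch layers such as dropout to evaluation mode
with torch.no_grad():          # skip gradient tracking during the forward pass
    toy_logits = toy_net(toy_X_test)

# The index of the largest score in each row is the predicted class.
toy_predicted = toy_logits.argmax(dim=1)
toy_accuracy = (toy_predicted == toy_Y_test).float().mean().item()

print('Toy accuracy: %.2f %%' % (100 * toy_accuracy))
```

Because the model here is untrained and the data are random, the printed accuracy carries no meaning; only the pattern is of interest.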

**Advanced Steps:**
1. We encourage you to experiment with different ways of accomplishing the task.
2. Explore the normalisation and pre-processing techniques discussed in the lectures and investigate their impact on classification performance.
3. Investigate the performance of the neural network classification by changing the various characteristics of the neural network such as:
    * Number of neurons in each layer
    * Number of layers
    * Number of epochs
    * Learning rate
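
As one possible starting point for the normalisation step, the sketch below applies z-score standardisation, which may or may not be among the techniques covered in the lectures. The mean and standard deviation are computed on the training inputs only and then reused for the testing inputs, so no information from the test set leaks into pre-processing. The tensors are randomly generated stand-ins, and all names are illustrative.

```python
import torch

torch.manual_seed(0)

# Made-up inputs with 9 features, deliberately off-centre and widely spread.
toy_train = torch.randn(50, 9) * 5.0 + 3.0
toy_test = torch.randn(10, 9) * 5.0 + 3.0

# Per-feature statistics from the training inputs only.
mean = toy_train.mean(dim=0, keepdim=True)
std = toy_train.std(dim=0, keepdim=True)

# Apply the same shift and scale to both sets.
toy_train_norm = (toy_train - mean) / std
toy_test_norm = (toy_test - mean) / std

print(toy_train_norm.mean(dim=0))
print(toy_train_norm.std(dim=0))
```

After this step each training feature has mean close to 0 and standard deviation close to 1, while the testing features will only be approximately so.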

**Task 2: An advanced classification task in PyTorch**
Now we will work with a more complicated classification task, the original glass data set. Load the unmodified data set “glass.csv” into PyTorch and perform the same classification task as above.

**Q8. How many classes are you predicting now?**

**Q9. How will you represent these classes and how will you calculate the error of classification?**
