# Assignment 5: Classification in PyTorch 


For this exercise, we will switch to an implementation in PyTorch. 
The goal of this exercise is to get used to some concepts in PyTorch, such as relying on the `torch.tensor` data structure, implementing the network, the loss functions, the training loop and accuracy computation, which we will apply to binary and categorical classification.

Please make sure that all your variables are compatible with `torch`.
For example, you cannot mix `torch.tensor`s and `numpy.ndarray`s in any part of the code.
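
As a minimal sketch of what this means in practice, data that arrives as a `numpy.ndarray` can be converted once at the boundary and then handled purely as tensors. The array `features_np` and the tensor `weights` are hypothetical names used only for illustration.

```python
import numpy as np
import torch

# hypothetical NumPy array standing in for data loaded elsewhere
features_np = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)

# convert once, then stay in torch for all further computation
features = torch.from_numpy(features_np)
weights = torch.ones(2, 1)

# matrix product between two tensors of the same dtype (float32)
product = features @ weights
print(product.flatten())
```

`torch.from_numpy` shares memory with the original array; `torch.tensor(features_np)` would copy it instead.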

## Dataset

We will use two different datasets, the *spambase* dataset https://archive.ics.uci.edu/ml/datasets/spambase for binary classification and the *wine* dataset https://archive.ics.uci.edu/ml/datasets/wine for categorical classification. Both datasets are available on the UCI Machine Learning repository. 
The binary classification dataset contains features extracted from emails, which are classified as either spam or not. 
The categorical classification dataset contains some manually selected features for three different types of wines. 
For the former, the class is provided in the last column of the data file, whereas for the latter, the first column provides the class information.

Please run the code block below to download the data files.

In [1]:
import os
import torch
import pandas as pd
import csv

# download the two dataset files
dataset_files = {
  "spambase.data": "https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/",
  "wine.data": "https://archive.ics.uci.edu/ml/machine-learning-databases/wine/"
}
for name, url in dataset_files.items():
  if not os.path.exists(name):
    import urllib.request
    urllib.request.urlretrieve(url+name, name)
    print ("Downloaded datafile", name)

### Task 1: Dataset Loading

The first task deals with the loading of the datasets. 
When training networks in PyTorch, all data needs to be stored as datatype ``torch.tensor``. 
The data should be split between input sets $\mathbf X = [\vec x^{[1]}, \ldots, \vec x^{[N]}]^T \in \mathbb R^{N\times D}$ and targets.
There is **no need to add a bias neuron to the input**, and the transposition of the data matrix is different from what we have seen before.

For the targets, we have to be more careful as there are differences w.r.t. the applied loss function.
For binary classification, we need $\mathbf T = [[t^{[1]}, \ldots, t^{[N]}]]^T$ to be in dimension $\mathbb R^{N\times1}$ and of type ``torch.float``.
For categorical classification, we only need the class indexes $\vec t = [t^{[1]}, \ldots, t^{[N]}]$ to be in dimension $\mathbb N^N$ and of type ``torch.long``.
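
The two target layouts can be illustrated with a small sketch. It assumes the loss functions `torch.nn.BCEWithLogitsLoss` and `torch.nn.CrossEntropyLoss` (this task does not name them) and uses hypothetical toy logits and targets.

```python
import math
import torch

N, O = 4, 3  # hypothetical sizes: 4 samples, 3 classes

# binary: one logit per sample, float targets of shape (N, 1)
z_binary = torch.zeros(N, 1)
t_binary = torch.tensor([[0.], [1.], [1.], [0.]])
binary_loss = torch.nn.BCEWithLogitsLoss()(z_binary, t_binary)

# categorical: O logits per sample, long class indexes of shape (N,)
z_categorical = torch.zeros(N, O)
t_categorical = torch.tensor([0, 2, 1, 2])
categorical_loss = torch.nn.CrossEntropyLoss()(z_categorical, t_categorical)

# with all-zero logits, both losses equal the log of the number of outcomes
print(binary_loss.item(), math.log(2))
print(categorical_loss.item(), math.log(3))
```

Passing a flat target of shape `(N,)` to the binary loss, or float targets of shape `(N, 1)` to the categorical loss, raises an error or computes something unintended, which is why the shape and the dtype matter.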

Implement a function that returns both the input and the target data for a given dataset.

Note:

1. You can use `csv.reader()` to read the dataset, or rely on other methods such as `pandas`.
2. For the wine dataset, subtract `1` from the target to get the target values in range $\{0, 1, 2\}$.
3. Be aware both datasets are sorted w.r.t. their labels.

In [2]:
def dataset(dataset_file="wine.data"):
  # read dataset
  data = pd.read_csv(dataset_file, header=None).values.tolist()

  print (f"Loaded dataset with {len(data)} samples")
  
  # convert to torch.tensor
  data = torch.tensor(data)

  if dataset_file == "wine.data":
    # target is in the first column and needs to be converted to long
    X = data[:, 1:]
    T = data[:, :1].flatten() - 1
    T = T.long()
  else:
    # target is in the last column and needs to be of type float
    X = data[:, :-1]
    T = data[:, -1:].float()
  return X, T

In [3]:
X, T = dataset("wine.data")


Loaded dataset with 178 samples


### Test 1: Dataset Check

Test 1 checks the correctness of the data and target dimensions.

1. For the wine dataset, we make sure that the dataset is in the correct dimensions, i.e., $\mathbf X\in \mathbb R^{N\times D}$ and $\mathbf T \in \mathbb N^N$, and that all class labels are in the correct range $[0, O-1]$ where $O$ is the number of classes.

2. For the spambase data, we make sure that all dimensions are correct and that class labels are in range $\{0, 1\}$.

In [4]:
X, T = dataset("wine.data")

assert X.shape[1] == 13, X.shape[1]
assert torch.all(T >= 0) and torch.all(T <= 2)
assert T.dtype == torch.long

X, T = dataset("spambase.data")
assert X.shape[1] == 57, X.shape[1]
assert T.shape[1] == 1, T.shape[1]
assert torch.all(T >= 0) and torch.all(T <= 1)
assert T.dtype == torch.float

Loaded dataset with 178 samples


Loaded dataset with 4601 samples


### Task 2: Split Training and Validation Data

The data should be split into 80% for training and 20% for validation. Implement a function that takes the full dataset $(X,T)$ and returns $(X_t, T_t, X_v, T_v)$ accordingly.

Write a function that splits off training and validation samples from a given dataset. What do we need to ensure before splitting?

In [5]:
def split_training_data(X,T,train_percentage=0.8):
  # shuffle data once before splitting:
  # both datasets are sorted by class, so an unshuffled split would leave
  # the validation set with samples from the last class(es) only;
  # reshuffling every epoch is not needed since we train on the whole set at once
  indices = torch.randperm(len(X))
  X = X[indices]
  T = T[indices]
  
  # split into training/validation dataset
  N = X.shape[0]
  # split pytorch tensor into training and validation
  X_train = X[:int(N * train_percentage)]
  T_train = T[:int(N * train_percentage)]
  X_val = X[int(N * train_percentage):]
  T_val = T[int(N * train_percentage):]

  assert X_train.shape[0] + X_val.shape[0] == X.shape[0]
  assert T_train.shape[0] + T_val.shape[0] == T.shape[0]
  return X_train, T_train, X_val, T_val
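
The effect of label-sorted data on an unshuffled split can be seen in a small sketch. The toy tensors `X_toy` and `T_toy` are hypothetical and only mimic the sorted layout of the real files.

```python
import torch

torch.manual_seed(0)  # fixed seed so the sketch is reproducible

# hypothetical toy data: 10 samples sorted by label, like the real files
X_toy = torch.arange(10, dtype=torch.float).reshape(10, 1)
T_toy = torch.tensor([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# without shuffling, the last 20% contain a single class
split = int(0.8 * len(X_toy))
print(T_toy[split:])

# shuffle inputs and targets with the same permutation, then split
indices = torch.randperm(len(X_toy))
X_shuffled, T_shuffled = X_toy[indices], T_toy[indices]
X_train, X_val = X_shuffled[:split], X_shuffled[split:]
T_train, T_val = T_shuffled[:split], T_shuffled[split:]
print(T_val)
```

Using one shared permutation for `X` and `T` keeps every input paired with its own target.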

### Task 3: Input Data Standardization

As we have seen last week, the standardization of the data provides many advantages. 
Hence, in this task you should write a function that takes $(X_t,X_v)$ as input, standardizes both by subtracting the mean and dividing by the standard deviation of $X_t$, and returns the standardized versions of both.
Ensure that each input dimension is standardized individually.

Implement a function that standardizes all input data for the training and validation set.
Return the standardized data.

Note:

1. Use `torch.mean()` and `torch.std()` with the correct dimensions.

In [6]:
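def standardize(X_train, X_val):
  # compute statistics per input dimension (over the sample axis) on the training set only
  mean = torch.mean(X_train, dim=0)
  std = torch.std(X_train, dim=0)

  # standardize both X_train and X_val with the training set statistics
  X_train = (X_train - mean) / std
  X_val = (X_val - mean) / std
  return X_train, X_val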
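
A quick way to check per-dimension standardization is to verify that every training feature ends up with mean 0 and standard deviation 1. The following is a minimal sketch on hypothetical random data with two features on very different scales.

```python
import torch

torch.manual_seed(0)

# hypothetical data: two features on very different scales
scale = torch.tensor([1.0, 50.0])
shift = torch.tensor([0.0, 10.0])
X_train = torch.randn(100, 2) * scale + shift
X_val = torch.randn(20, 2) * scale + shift

# dim=0 reduces over the samples, leaving one statistic per feature
mean = X_train.mean(dim=0)
std = X_train.std(dim=0)

# broadcasting applies the per-feature statistics to every row
X_train_std = (X_train - mean) / std
X_val_std = (X_val - mean) / std

print(X_train_std.mean(dim=0), X_train_std.std(dim=0))
```

The validation set is transformed with the training statistics, so its mean and standard deviation will only be approximately 0 and 1.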

## Network Implementation

We will use a two-layer fully-connected network with $D$ input neurons, $K$ hidden neurons and $O$ output neurons. 
Depending on the task, $D$ and $O$ need to be selected appropriately, while $K$ is a parameter to play around with. 
In PyTorch, the easiest way to implement a network is by providing the requested sequence of layers to `torch.nn.Sequential`, which will build a network containing the given layers. 
We will use two `torch.nn.Linear` layers and one `torch.nn.Tanh` activation function in between. 
The network will return the logits $\vec z$ for a given input $\vec x$.


### Task 4: Implement Network

Implement a two-layer fully-connected network in PyTorch. 
The network uses $\tanh$ as activation function and provides the possibility to change the number of inputs $D$, the number of hidden neurons $K$ and the number of outputs $O$.

In [7]:
import torch

def Network(D, K, O):
  return torch.nn.Sequential(
    torch.nn.Linear(D, K),
    torch.nn.Tanh(),
    torch.nn.Linear(K, O)
  )
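
To see what such a network returns, the sketch below feeds a batch of random inputs through an equivalent `torch.nn.Sequential` model and inspects the shape of the logits. The sizes `D=13`, `K=10`, `O=3` are hypothetical choices for illustration.

```python
import torch

D, K, O = 13, 10, 3  # hypothetical sizes

network = torch.nn.Sequential(
  torch.nn.Linear(D, K),
  torch.nn.Tanh(),
  torch.nn.Linear(K, O),
)

# a batch of 5 random input vectors yields 5 logit vectors of length O
x = torch.randn(5, D)
z = network(x)
print(z.shape)

# each Linear layer holds a weight matrix and a bias vector
n_params = sum(p.numel() for p in network.parameters())
print(n_params)
```

The bias is part of each `torch.nn.Linear` layer, which is why no bias neuron has to be added to the input.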

### Task 5: Accuracy Computation

To monitor the training process, we want to compute the accuracy. 
The function will obtain the logits $\vec z$ extracted from the network and the corresponding target $t$. 
Ensure that this function works both for binary and categorical classification. 
How can we identify which of the two variants is currently required?

Note: you can make use of the following pytorch functions:

1. `torch.mean()` which computes the mean or average of the input tensor.
2. `torch.argmax()` which returns the indices of the maximum values of the tensor, optionally along a given dimension.

In [8]:
def accuracy(Z, T):
  # check if we have binary or categorical classification
  # for binary classification, we will have a two-dimensional target tensor
  if len(T.shape) == 2:
    # binary classification
    # Z contains logits, not probabilities: sigmoid(z) >= 0.5 is equivalent to z >= 0,
    # so we threshold the logits at 0 and predict 1 for z >= 0, otherwise 0
    # (a threshold of 0.5 would only apply after a sigmoid on the network output)
    # we use the .float() function to convert the boolean to a float
    # then we compare the prediction with the target and compute the mean
    return torch.mean(((Z>=0).float() == T).float())

  else:
    # categorical classification
    # the argmax function returns the index of the maximum value
    # we use the .float() function to convert the boolean to a float
    # then we compare the prediction with the target and compute the mean
    # return torch.mean((torch.argmax(Z, dim=1).float() == T).float())
    
    # Y is the index of the maximum value in Z
    Y = torch.argmax(Z, dim=1)
    return torch.mean((Y == T).float())
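
The choice of threshold in the binary case can be checked numerically: thresholding logits at 0 gives the same predictions as thresholding sigmoid probabilities at 0.5. A minimal sketch with hypothetical logit values:

```python
import torch

# hypothetical logits for five samples
z = torch.tensor([[-2.0], [-0.1], [0.0], [0.3], [4.0]])

# prediction directly from the logits
pred_from_logits = (z >= 0).float()

# prediction from probabilities after a sigmoid
pred_from_probs = (torch.sigmoid(z) >= 0.5).float()

print(torch.equal(pred_from_logits, pred_from_probs))
```

Since the sigmoid is monotonic and maps 0 to 0.5, both rules agree, and the accuracy can be computed without applying the sigmoid.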

### Test 2: Test Accuracy Function

Test 2 checks the correctness of your accuracy function in both binary and categorical cases. We make sure that the accuracy function computes the correct values.

In [9]:
# first, test binary classification
ZZ = torch.ones((20,1)) * -5.
ZZ[15:20] = 5
assert(abs(accuracy(ZZ,torch.zeros((20,1))) - 0.75) < 1e-8)
assert(abs(accuracy(ZZ,torch.ones((20,1))) - 0.25) < 1e-8)

# now, test categorical classification with 4 classes
ZZ = torch.ones((20,4)) * -5
ZZ[0:1,0] = 5
ZZ[1:4,1] = 5
ZZ[4:10,2] = 5
ZZ[10:20,3] = 5

assert(abs(accuracy(ZZ,torch.zeros(20)) - 0.05) < 1e-8)
assert(abs(accuracy(ZZ,torch.ones(20)) - 0.15) < 1e-8)
assert(abs(accuracy(ZZ,torch.ones(20)*2) - 0.3) < 1e-8)
assert(abs(accuracy(ZZ,torch.ones(20)*3) - 0.5) < 1e-8)
assert(abs(accuracy(ZZ,torch.tensor((0,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3))) - 1.) < 1e-8)

### Task 6: Training Loop

Implement a function that takes all necessary parameters to run a training on a given dataset.
This week, we will run gradient descent, i.e., we will train on the whole dataset in each training step, so there is no need to define anything related to batches. 
Select the optimizer to be `torch.optim.SGD`. 

Implement a training loop over 10'000 epochs with a learning rate of $\eta=0.1$. 
Make sure that you train on the training data only, and **not** on the validation data.
In each loop, compute and store the training loss, training accuracy, validation loss and validation accuracy. 
At the end, return the lists of these values.

Note:

1. When storing accuracy or loss values in a list, make sure to convert them to float via `v.item()`.

In [10]:
def train(X_train, T_train, X_val, T_val, network, loss_function, epochs=1000, learning_rate=0.1):
  optimizer = torch.optim.SGD(
    network.parameters(), 
    lr=learning_rate,
  )

  # collect loss and accuracy values
  train_loss, train_acc, val_loss, val_acc = [], [], [], []

  for epoch in range(epochs):
    # train on training set
    # ... reset the gradients accumulated in the previous step
    optimizer.zero_grad()
    # ... compute network output on training data
    Z = network(X_train)
    # ... compute loss from network output and target data
    loss = loss_function(Z, T_train)
    # ... perform parameter update
    loss.backward()
    optimizer.step()
    # ... remember loss as a plain float
    train_loss.append(loss.item())
    # ... compute training set accuracy
    train_acc.append(accuracy(Z, T_train).item())

    # test on validation data
    with torch.no_grad():
      # ... compute network output on validation data
      Z = network(X_val)
      # ... compute loss from network output and target data
      loss = loss_function(Z, T_val)
      # ... remember loss as a plain float
      val_loss.append(loss.item())
      # ... compute validation set accuracy
      val_acc.append(accuracy(Z, T_val).item())

  # return the four lists of losses and accuracies
  return train_loss, train_acc, val_loss, val_acc
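
One detail of the update step that is easy to miss is that PyTorch accumulates gradients across calls to `backward()`, so they need to be reset in every iteration. The sketch below shows this on a hypothetical single parameter `w` with the toy loss $w^2$.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

# two backward passes without zero_grad(): the gradients add up
(w ** 2).sum().backward()
(w ** 2).sum().backward()
accumulated = w.grad.clone()  # 2*w + 2*w

# resetting the gradient first gives the gradient of a single pass
optimizer.zero_grad()
(w ** 2).sum().backward()
fresh = w.grad.clone()  # 2*w

# one gradient descent step: w <- w - lr * gradient
optimizer.step()

# .item() turns a one-element tensor into a plain Python float
loss_value = (w ** 2).sum().item()
print(accumulated, fresh, w, loss_value)
```

Storing `loss.item()` instead of `loss` also avoids keeping the computation graph of every epoch alive in the result lists.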

## Training and Evaluation

Finally, we want to train our network on our data and plot the accuracy and loss values that were obtained through the epochs. 
Exemplary plots can be found in the exercise slides.


### Task 7: Plotting Function

Implement a function that takes four lists containing the training loss, the training accuracy, the validation loss and the validation accuracy and plot them into two plots. 
The first plot should contain the loss values for both training and validation. The second plot should contain the corresponding accuracy values.

Note:

1. You might need to convert remaining `torch.tensor` values to `float`, lists, or `numpy.ndarray` for plotting.

In [11]:
from matplotlib import pyplot
def plot(train_loss, train_acc, val_loss, val_acc):
  pyplot.figure(figsize=(10,3))
  ax = pyplot.subplot(121)
  ax.plot(train_loss, "g-", label="Training set loss")
  ax.plot(val_loss, "b-", label="Validation set loss")
  ax.legend()

  ax = pyplot.subplot(122)
  ax.plot(train_acc, "g-", label="Training set accuracy")
  ax.plot(val_acc, "b-", label="Validation set accuracy")
  ax.legend()

### Task 8: Binary Classification


1. Load the data for binary classification, using the ``"spambase.data"`` file.
2. Split the data into training and validation sets.
3. Standardize both training and validation input data using the function from Task 3.
4. Instantiate a network with the correct number of input neurons, a reasonable number of $K$ hidden neurons and one output neuron.

Which loss function do we need for this task?

Train the network with our data for 10'000 epochs and plot the training and validation accuracies and losses.
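
Since the network returns logits rather than probabilities, one suitable choice is `torch.nn.BCEWithLogitsLoss`, which applies the sigmoid internally. The sketch below, on hypothetical logits and targets, checks that it matches `torch.nn.BCELoss` applied to explicit sigmoid outputs.

```python
import torch

# hypothetical logits and float targets of shape (N, 1)
z = torch.tensor([[-1.5], [0.2], [3.0]])
t = torch.tensor([[0.0], [1.0], [1.0]])

# loss computed directly on the logits
loss_from_logits = torch.nn.BCEWithLogitsLoss()(z, t)

# the same loss computed on probabilities after an explicit sigmoid
loss_from_probs = torch.nn.BCELoss()(torch.sigmoid(z), t)

print(loss_from_logits.item(), loss_from_probs.item())
```

The version operating on logits is numerically more stable for large positive or negative values of $z$.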

In [14]:
# define loss function
loss_function = torch.nn.BCEWithLogitsLoss()
# load dataset
X, T = dataset("spambase.data")
# split dataset
X_train, T_train, X_val, T_val = split_training_data(X, T)
# standardize input data
X_train, X_val = standardize(X_train, X_val)
# instantiate network
network = Network(X.shape[1], K=10, O=1)

# train network on our data
results = train(X_train, T_train, X_val, T_val, network, loss_function, epochs=10000)

# plot the results
plot(*results)

Loaded dataset with 4601 samples

### Task 9: Categorical Classification

Perform the same tasks with the ``"wine.data"`` dataset for categorical classification. 
How many input and output neurons do we need?
Change the number of input, hidden, and output neurons accordingly.

Select the appropriate loss function for categorical classification.
Which loss function will we need this time?

How many hidden neurons will we need to get 100% training set accuracy?

In [None]:
# define loss function
loss = ...
# load dataset
X, T = ...
# split dataset
X_train, T_train, X_val, T_val = ...
# standardize input data
X_train, X_val = ...
# instantiate network
network = ...

# train network on our data
results = ...

# plot the results
plot(...)
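
As a rough illustration of the categorical setup, the following sketch trains a small network on synthetic data rather than on the wine dataset. Everything in it is an assumption made for illustration: the three Gaussian blobs, the names `X_toy`, `T_toy` and `toy_network`, the choice of `torch.nn.CrossEntropyLoss`, and the sizes $D=2$, $K=5$, $O=3$.

```python
import torch

torch.manual_seed(0)

# hypothetical synthetic data: three Gaussian blobs with 30 samples each
centers = torch.tensor([[0.0, 0.0], [4.0, 4.0], [-4.0, 4.0]])
X_toy = torch.cat([center + torch.randn(30, 2) for center in centers])
T_toy = torch.arange(3).repeat_interleave(30)  # class indexes of type torch.long

# standardize each input dimension individually
X_toy = (X_toy - X_toy.mean(dim=0)) / X_toy.std(dim=0)

# D=2 inputs, K=5 hidden neurons, O=3 outputs (one logit per class)
toy_network = torch.nn.Sequential(
  torch.nn.Linear(2, 5),
  torch.nn.Tanh(),
  torch.nn.Linear(5, 3),
)
toy_loss = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(toy_network.parameters(), lr=0.1)

# gradient descent on the whole toy set
losses = []
for epoch in range(200):
  optimizer.zero_grad()
  Z = toy_network(X_toy)
  loss = toy_loss(Z, T_toy)
  loss.backward()
  optimizer.step()
  losses.append(loss.item())

toy_accuracy = (Z.argmax(dim=1) == T_toy).float().mean().item()
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}, accuracy {toy_accuracy:.2f}")
```

The number of output neurons equals the number of classes, and the targets stay a flat vector of class indexes instead of an $N\times1$ float matrix.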