# ECE4782 Deep Learning Labs
## 1. Feed-forward Neural Network

In this chapter, we will learn how to implement a feed-forward neural network by using PyTorch.

### Preparing the dataset

We will use the dataset from [Breast Cancer Wisconsin (Diagnostic) Data Set](https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29) at UCI Machine Learning Repository.

We will use Pandas to conveniently load the data from its URL directly.
Pandas is useful for preprocessing such structured data, and it connects easily to most machine learning pipelines, e.g. the Scikit-learn and PyTorch tools we use here, since it uses NumPy as its backend.

In [1]:
import pandas as pd

df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', header=None)
df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10
0,1000025,5,1,1,1,2,1,3,1,1,2
1,1002945,5,4,4,5,7,10,3,2,1,2
2,1015425,3,1,1,1,2,2,3,1,1,2
3,1016277,6,8,8,1,3,4,3,7,1,2
4,1017023,4,1,1,3,2,1,3,1,1,2
...,...,...,...,...,...,...,...,...,...,...,...
694,776715,3,1,1,1,3,2,1,1,1,2
695,841769,2,1,1,1,2,1,1,1,1,2
696,888820,5,10,10,3,7,3,8,10,2,4
697,897471,4,8,6,4,3,4,10,6,1,4


As described in the [dataset description](https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.names), each row contains the following information.

Column 1 is the sample code (ID) number, Columns 2 to 10 contain feature values ranging from 1 to 10, and the last column (Column 11) is the class label (2 for benign and 4 for malignant). Because the file is loaded with `header=None`, these columns are indexed 0 to 10 in the DataFrame.

According to the documentation, there are missing values, which are denoted by "?". We need to take care of them by either removing or imputing some values.

In [2]:
import numpy as np
df = df.replace('?', np.nan)  # np.nan is the canonical spelling; the np.NaN alias was removed in NumPy 2.0

Let's impute the missing values with the median of each column.

In [3]:
df = df.fillna(df.median())
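
The two steps above (marking "?" as missing, then imputing with the median) can also be sketched on a small in-memory sample. This is only an illustration under stated assumptions: the first two rows are copied from the table shown earlier, the last two are made up for the example, and `na_values="?"` is an alternative to the `replace` call that lets Pandas parse the affected column as numeric right away. This may matter on recent Pandas versions, where `median()` on a column that still holds strings may raise an error.

```python
import io

import pandas as pd

# Toy sample in the same layout as the real file (no header, 11 columns).
# Rows 1-2 come from the table above; rows 3-4 are made up for illustration.
sample = io.StringIO(
    "1000025,5,1,1,1,2,1,3,1,1,2\n"
    "1002945,5,4,4,5,7,10,3,2,1,2\n"
    "1111111,8,4,5,1,2,?,7,3,1,4\n"
    "2222222,4,1,1,3,2,1,3,1,1,2\n"
)

# na_values="?" turns every "?" into NaN while parsing,
# so the column is read as float instead of as strings.
toy = pd.read_csv(sample, header=None, na_values="?")

n_missing = int(toy.isna().sum().sum())
print("missing values:", n_missing)

# Impute each column with its own median (NaN entries are skipped).
toy_imputed = toy.fillna(toy.median())
print(toy_imputed[6].tolist())
```

In this toy sample the only missing entry sits in column 6, whose remaining values are 1, 10 and 1, so it is filled with the median 1.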

We will transform the labels from {2, 4} to {0, 1} so that they suit binary classification.

In [4]:
labels = df[10].values
labels

array([2, 2, 2, 2, 2, 4, 2, 2, 2, 2, 2, 2, 4, 2, 4, 4, 2, 2, 4, 2, 4, 4,
       2, 4, 2, 4, 2, 2, 2, 2, 2, 2, 4, 2, 2, 2, 4, 2, 4, 4, 2, 4, 4, 4,
       4, 2, 4, 2, 2, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 2, 4, 4, 2, 4,
       2, 4, 4, 2, 2, 4, 2, 4, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 4, 4, 4,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 4, 4, 4, 2, 4, 4, 4, 4, 4, 2, 4,
       2, 4, 4, 4, 2, 2, 2, 4, 2, 2, 2, 2, 4, 4, 4, 2, 4, 2, 4, 2, 2, 2,
       4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 2, 2, 2, 4, 2, 2, 4, 2, 4, 4, 2,
       2, 4, 2, 2, 2, 4, 4, 2, 2, 2, 2, 2, 4, 4, 2, 2, 2, 2, 2, 4, 4, 4,
       2, 4, 2, 4, 2, 2, 2, 4, 4, 2, 4, 4, 4, 2, 4, 4, 2, 2, 2, 2, 2, 2,
       2, 2, 4, 4, 2, 2, 2, 4, 4, 2, 2, 2, 4, 4, 2, 4, 4, 4, 2, 2, 4, 2,
       2, 4, 4, 4, 4, 2, 4, 4, 2, 4, 4, 4, 2, 4, 2, 2, 4, 4, 4, 4, 2, 2,
       2, 2, 2, 2, 4, 4, 2, 2, 2, 4, 2, 4, 4, 4, 2, 2, 2, 2, 4, 4, 4, 4,
       4, 2, 4, 4, 4, 2, 4, 2, 4, 4, 2, 2, 2, 2, 2, 4, 2, 2, 4, 4, 4, 4,
       4, 2, 4, 4, 2, 2, 4, 4, 2, 4, 2, 2, 2, 4, 4,

In [5]:
labels = labels / 2 - 1
labels

array([0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 1., 0., 1., 1., 0.,
       0., 1., 0., 1., 1., 0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 1., 0.,
       0., 0., 1., 0., 1., 1., 0., 1., 1., 1., 1., 0., 1., 0., 0., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 0., 1., 0., 1.,
       1., 0., 0., 1., 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.,
       1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1.,
       0., 1., 1., 1., 1., 1., 0., 1., 0., 1., 1., 1., 0., 0., 0., 1., 0.,
       0., 0., 0., 1., 1., 1., 0., 1., 0., 1., 0., 0., 0., 1., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 1., 0., 0., 1., 0., 1., 1.,
       0., 0., 1., 0., 0., 0., 1., 1., 0., 0., 0., 0., 0., 1., 1., 0., 0.,
       0., 0., 0., 1., 1., 1., 0., 1., 0., 1., 0., 0., 0., 1., 1., 0., 1.,
       1., 1., 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0.,
       0., 1., 1., 0., 0., 0., 1., 1., 0., 1., 1., 1., 0., 0., 1., 0., 0.,
       1., 1., 1., 1., 0.

In [6]:
labels.shape

(699,)
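
As a small worked example of the label mapping used above: dividing by 2 and subtracting 1 sends 2 to 0 (benign) and 4 to 1 (malignant). The sketch below checks this on a made-up array and compares it with an explicit comparison, which is an alternative formulation and not the one used in this lab.

```python
import numpy as np

# Made-up raw labels in the dataset's encoding: 2 = benign, 4 = malignant.
raw = np.array([2, 4, 4, 2])

# Mapping used in this lab: 2 / 2 - 1 = 0 and 4 / 2 - 1 = 1.
by_arithmetic = raw / 2 - 1

# Alternative (assumption, not from the lab): compare against the malignant code.
by_comparison = (raw == 4).astype("float32")

print(by_arithmetic)
print(by_comparison)
```

Both variants yield the same 0/1 values; the comparison form states the intent more directly, while the arithmetic form is shorter.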

Next, extract the feature columns (columns 1 to 9 of the DataFrame).

In [7]:
data = df.loc[:, 1:9].values
data

array([[5, 1, 1, ..., 3, 1, 1],
       [5, 4, 4, ..., 3, 2, 1],
       [3, 1, 1, ..., 3, 1, 1],
       ...,
       [5, 10, 10, ..., 8, 10, 2],
       [4, 8, 6, ..., 10, 6, 1],
       [4, 8, 8, ..., 10, 4, 1]], dtype=object)

In [8]:
data.shape

(699, 9)

Let's split the dataset into training and test subsets.

In [9]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size=0.2, random_state=4782)

In [10]:
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

(559, 9)
(559,)
(140, 9)
(140,)
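
The split sizes printed above can be reproduced by hand. The sketch below assumes that scikit-learn rounds the test size up and gives the remainder to the training set, which matches the shapes shown; the integer array is a stand-in for the real data.

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_samples, test_size = 699, 0.2

# 0.2 * 699 = 139.8, rounded up to 140 test samples; the rest go to training.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test
print(n_train, n_test)

# Check against an actual split of a stand-in array with 699 entries.
toy_train, toy_test = train_test_split(
    np.arange(n_samples), test_size=test_size, random_state=0
)
print(len(toy_train), len(toy_test))
```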


In this lab, we will use features scaled into values between 0 and 1. Note: `MaxAbsScaler` scales values into the range between -1 and 1, but since all our feature values are positive, the scaled values end up between 0 and 1.

Since we calculate the scale using the training set only and apply the same transformation to the test set, the features in the test set may end up with a different range. The same applies to any other scaler you may want to use.

In [11]:
from sklearn.preprocessing import MaxAbsScaler
scaler = MaxAbsScaler().fit(X_train)
X_train_transformed = scaler.transform(X_train)
X_test_transformed = scaler.transform(X_test)

In [12]:
import numpy as np
print("Train - Min: {}, Max: {}".format(np.min(X_train_transformed), np.max(X_train_transformed)))
print("Test  - Min: {}, Max: {}".format(np.min(X_test_transformed), np.max(X_test_transformed)))

Train - Min: 0.1, Max: 1.0
Test  - Min: 0.1, Max: 1.0
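
Here the training and test ranges happen to coincide, but that is not guaranteed. A minimal sketch with made-up numbers shows what happens when a test value exceeds the largest training value: the scaler still divides by the training maximum, so the scaled test value goes above 1.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Made-up data: two features, two training rows, one test row.
toy_train = np.array([[1.0, 2.0], [5.0, 8.0]])
toy_test = np.array([[10.0, 4.0]])

# The scale (per-feature maximum absolute value) comes from the training set only.
toy_scaler = MaxAbsScaler().fit(toy_train)
print(toy_scaler.max_abs_)

# 10 / 5 = 2.0 exceeds 1, while 4 / 8 = 0.5 stays inside the training range.
toy_test_scaled = toy_scaler.transform(toy_test)
print(toy_test_scaled)
```

Fitting the scaler on the test set as well would hide this effect, but it would leak test information into preprocessing.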


### Feedforward Neural Network

Now, we will train a feed-forward neural network by using PyTorch. We will do the following steps in order:

1. Load the training and test datasets using DataLoader
2. Define a Feed-forward Neural Network
3. Define a loss function
4. Train the network on the training data
5. Test the network on the test data

#### 1. Loading datasets
We will use `DataLoader` and `TensorDataset` (from [torch.utils.data](http://pytorch.org/docs/master/data.html#)) for convenience in data handling. You can also create your own dataset class by subclassing `Dataset` and implementing the required member functions `__len__` and `__getitem__`.

In [13]:
import torch
from torch.utils.data import DataLoader, TensorDataset

# let's fix the random seeds for reproducibility
torch.manual_seed(4782)
if torch.cuda.is_available():
    torch.cuda.manual_seed(4782)

trainset = TensorDataset(torch.from_numpy(X_train_transformed.astype('float32')), torch.from_numpy(y_train.astype('float32')).view(-1,1))
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)

testset = TensorDataset(torch.from_numpy(X_test_transformed.astype('float32')), torch.from_numpy(y_test.astype('float32')).view(-1,1))
testloader = torch.utils.data.DataLoader(testset, batch_size=4, shuffle=False, num_workers=2)
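
The custom-dataset route mentioned above can be sketched as follows. This is a minimal illustration, not part of the lab: the class name `ArrayDataset` and the random toy arrays are hypothetical, and the class simply mimics what `TensorDataset` already does for us.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class ArrayDataset(Dataset):
    """Hypothetical dataset wrapping a feature array and a label array."""

    def __init__(self, features, targets):
        # Convert once up front; labels become a column vector like in the lab.
        self.features = torch.from_numpy(features.astype("float32"))
        self.targets = torch.from_numpy(targets.astype("float32")).view(-1, 1)

    def __len__(self):
        # Number of samples; DataLoader uses this to build its index range.
        return len(self.features)

    def __getitem__(self, idx):
        # One (features, label) pair; DataLoader stacks these into a batch.
        return self.features[idx], self.targets[idx]


# Random stand-in data: 10 samples with 9 features each.
toy_X = np.random.rand(10, 9)
toy_y = np.random.randint(0, 2, size=10)

toy_loader = DataLoader(ArrayDataset(toy_X, toy_y), batch_size=4, shuffle=False)
batch_X, batch_y = next(iter(toy_loader))
print(batch_X.shape, batch_y.shape)
```

A custom class becomes worthwhile once loading needs per-sample logic, such as reading files lazily or applying augmentation.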

Let's check some training samples

In [14]:
# get some random training samples
dataiter = iter(trainloader)
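records, labels = next(dataiter)  # use the built-in next(); the iterator's .next() method is not available in recent PyTorch versions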

print(records)
print(labels)

tensor([[0.8000, 0.6000, 0.4000, 0.3000, 0.5000, 0.9000, 0.3000, 0.1000, 0.1000],
        [0.3000, 0.1000, 0.1000, 0.1000, 0.2000, 0.1000, 0.2000, 0.1000, 0.2000],
        [0.5000, 0.1000, 0.1000, 0.1000, 0.2000, 0.1000, 0.2000, 0.1000, 0.1000],
        [0.5000, 0.8000, 0.8000, 0.8000, 0.5000, 1.0000, 0.7000, 0.8000, 0.1000]])
tensor([[1.],
        [0.],
        [0.],
        [1.]])


#### 2. Define a Feed-forward Neural Network

Next, we will define a model: a feed-forward neural network for this chapter.
For simplicity, we will use a 3-layer feed-forward net with 2 hidden layers and 1 hidden-to-output layer. Each layer is a fully-connected layer, implemented by the module `torch.nn.Linear`. We will apply ReLU activation after each hidden layer; the output layer has no activation.

Whenever we define a class for a customized network, we are required to define the member method `forward(self, x)`. It represents the forward pass of the computational graph, and the backward pass (back-propagation) is performed later by automatic differentiation based on this forward definition.

Usually, we define the layers of the entire network structure in the constructor of the class, `__init__`, which may take some arguments. Then we define the `forward` function to carry out the forward computation with the layers defined in the constructor.

In [15]:
# from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F


class FeedForwardNet(nn.Module):
    def __init__(self, n_input, n_hidden, n_output):
        super(FeedForwardNet, self).__init__()
        self.hidden1 = nn.Linear(n_input, n_hidden)
        self.hidden2 = nn.Linear(n_hidden, n_hidden)
        self.out = nn.Linear(n_hidden, n_output)

    def forward(self, x):
        x = F.relu(self.hidden1(x))
        x = F.relu(self.hidden2(x))
        x = self.out(x)
        return x

net = FeedForwardNet(n_input=9, n_hidden=64, n_output=1)
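
As a quick sanity check of such an architecture, one can push a dummy batch through it and count the trainable parameters. The sketch below rebuilds the same layer stack with `nn.Sequential` so that it is self-contained; the name `sketch_net` and the random input are assumptions made for the example.

```python
import torch
import torch.nn as nn

n_input, n_hidden, n_output = 9, 64, 1

# Same layer stack as FeedForwardNet, written with nn.Sequential for brevity.
sketch_net = nn.Sequential(
    nn.Linear(n_input, n_hidden),
    nn.ReLU(),
    nn.Linear(n_hidden, n_hidden),
    nn.ReLU(),
    nn.Linear(n_hidden, n_output),  # no activation: the output is a raw logit
)

# A dummy mini-batch of 4 records with 9 features each.
dummy_batch = torch.rand(4, n_input)
logits = sketch_net(dummy_batch)
print(logits.shape)

# Weights plus biases: (9*64 + 64) + (64*64 + 64) + (64*1 + 1) = 4865.
n_params = sum(p.numel() for p in sketch_net.parameters() if p.requires_grad)
print(n_params)
```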

#### 3. Define a Loss function and Optimizer
We will use binary cross-entropy loss and SGD with momentum as our optimizer.
PyTorch provides the `BCEWithLogitsLoss` loss function, which combines a sigmoid layer and `BCELoss` in a single class and is more numerically stable than using them separately. **Keep in mind that you should not apply sigmoid activation after the output layer in the model class definition to use this combined loss.** See the last computation in the `forward` function above.
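
To make the relationship concrete, the following sketch compares the combined loss on raw logits with `BCELoss` applied after an explicit sigmoid. The logits and targets are made-up values; for such moderate logits the two results agree up to floating-point error, and the difference in stability only shows for extreme logits.

```python
import torch
import torch.nn as nn

# Made-up raw network outputs (logits) and binary targets.
logits = torch.tensor([[-2.0], [0.5], [3.0]])
targets = torch.tensor([[0.0], [1.0], [1.0]])

# Combined version: the sigmoid is applied inside the loss.
combined = nn.BCEWithLogitsLoss()(logits, targets)

# Separate version: explicit sigmoid followed by plain BCELoss.
separate = nn.BCELoss()(torch.sigmoid(logits), targets)

print(combined.item(), separate.item())
print(torch.allclose(combined, separate))
```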

When we create an optimizer in PyTorch, we need to pass parameters that we want to optimize (train) as input arguments. We can retrieve all trainable parameters of the model by calling `MODEL.parameters()`.

In [16]:
import torch.optim as optim

criterion = nn.BCEWithLogitsLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
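
To see what "SGD with momentum" does in one step, the sketch below runs `optim.SGD` on a single made-up parameter with loss p² and repeats the same update by hand. The hand-written rule (velocity = momentum × velocity + gradient, then p = p − lr × velocity) is how PyTorch's default settings are understood to behave, assuming no dampening, weight decay, or Nesterov momentum; the learning rate here is chosen for readability and differs from the lab's value.

```python
import torch
import torch.optim as optim

lr, momentum = 0.1, 0.9

# One scalar parameter, optimized by PyTorch.
p = torch.tensor([1.0], requires_grad=True)
opt = optim.SGD([p], lr=lr, momentum=momentum)

# The same parameter, updated by hand.
p_manual, velocity = 1.0, 0.0

for _ in range(2):
    # PyTorch side: loss = p^2, so the gradient is 2p.
    opt.zero_grad()
    loss = (p ** 2).sum()
    loss.backward()
    opt.step()

    # Manual side: accumulate the gradient into the velocity, then step.
    grad = 2.0 * p_manual
    velocity = momentum * velocity + grad
    p_manual = p_manual - lr * velocity

print(p.item(), p_manual)
```

After two steps both versions arrive at about 0.46: the first step moves from 1.0 to 0.8, and the second step is larger than plain SGD would take because the velocity still carries the first gradient.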

#### 4. Train the network

Now, we will actually train the model.
In each epoch, i.e. one full pass over the training dataset, we go through the mini-batches one by one: for each mini-batch we do a forward pass, a backward pass to compute the gradients, and then one optimization step.
We need to do this for a reasonable number of iterations.

In [17]:
for epoch in range(10):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data

        # wrap them in Variable (deprecated)
        # inputs, labels = Variable(inputs), Variable(labels)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        # backward
        loss.backward()
        # optimize
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        
        if i % 10 == 9:    # print every 10 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 10))
            running_loss = 0.0

print('Finished Training')

[1,    10] loss: 0.689
[1,    20] loss: 0.687
[1,    30] loss: 0.683
[1,    40] loss: 0.679
[1,    50] loss: 0.677
[1,    60] loss: 0.665
[1,    70] loss: 0.673
[1,    80] loss: 0.656
[1,    90] loss: 0.648
[1,   100] loss: 0.636
[1,   110] loss: 0.634
[1,   120] loss: 0.625
[1,   130] loss: 0.645
[1,   140] loss: 0.639
[2,    10] loss: 0.652
[2,    20] loss: 0.628
[2,    30] loss: 0.639
[2,    40] loss: 0.610
[2,    50] loss: 0.608
[2,    60] loss: 0.588
[2,    70] loss: 0.579
[2,    80] loss: 0.599
[2,    90] loss: 0.612
[2,   100] loss: 0.602
[2,   110] loss: 0.551
[2,   120] loss: 0.603
[2,   130] loss: 0.546
[2,   140] loss: 0.534
[3,    10] loss: 0.555
[3,    20] loss: 0.580
[3,    30] loss: 0.546
[3,    40] loss: 0.549
[3,    50] loss: 0.563
[3,    60] loss: 0.503
[3,    70] loss: 0.522
[3,    80] loss: 0.531
[3,    90] loss: 0.480
[3,   100] loss: 0.513
[3,   110] loss: 0.464
[3,   120] loss: 0.496
[3,   130] loss: 0.488
[3,   140] loss: 0.462
[4,    10] loss: 0.494
[4,    20] 

#### 5. Test the network on the test data

As always, we will calculate the performance on the test set.
To use the scikit-learn packages, we need to convert a PyTorch Tensor to a NumPy ndarray by simply calling `TENSOR.numpy()`. **Note again, Tensor and corresponding ndarray share the memory.**
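
The memory sharing can be observed directly. In this small sketch with a made-up tensor, an in-place change to the tensor shows up in the array obtained from `numpy()`, while an array made from a cloned tensor stays unchanged.

```python
import torch

t = torch.ones(3)

# a is a view on the same memory as t (CPU tensor, no copy is made).
a = t.numpy()

# b comes from a clone, so it owns separate memory.
b = t.clone().numpy()

# In-place update of the tensor.
t.add_(2)

print(a)  # reflects the update
print(b)  # still the original values
```

A tensor that requires gradients has to be detached first (`TENSOR.detach().numpy()`) or produced under `torch.no_grad()`.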

In [18]:
y_true = []
y_scores = []

In [19]:
net.eval()  # evaluation mode; no effect for this model, but good practice
with torch.no_grad():  # gradients are not needed when evaluating
    for data in testloader:
        inputs, labels = data
        outputs = net(inputs)
        outputs = torch.sigmoid(outputs)  # the model returns logits, so apply sigmoid here
        y_true.extend(labels.numpy().flatten().tolist())
        y_scores.extend(outputs.numpy().flatten().tolist())

In [20]:
from sklearn.metrics import roc_curve, auc
fpr, tpr, _ = roc_curve(y_true, y_scores)
auc_ffnet = auc(fpr, tpr)
auc_ffnet

0.991817398794143
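
For intuition about what this number measures, here is a tiny example with made-up labels and scores. AUC equals the fraction of (positive, negative) pairs in which the positive sample gets the higher score; below, 3 of the 4 pairs are ordered correctly, so the AUC is 0.75.

```python
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Made-up ground truth and predicted scores for four samples.
toy_true = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]

# Same two-step computation as above: ROC curve first, then the area under it.
fpr_toy, tpr_toy, _ = roc_curve(toy_true, toy_scores)
auc_toy = auc(fpr_toy, tpr_toy)

# roc_auc_score does both steps in one call.
auc_direct = roc_auc_score(toy_true, toy_scores)

print(auc_toy, auc_direct)
```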

We have finished a simple example to get familiar with PyTorch. Let's try other types of neural networks in the following chapters.

### Exercise 1: Try to use GPU if you have one

### Exercise 2: How does the result compare to the previous lab, e.g. SVM? Is it better or worse? If it is worse, can you improve the performance of the network?

### Exercise 3: How could you check whether the network is underfitting, overfitting, or well fit?