# CSE6250BDH Deep Learning Labs
## 1. Feed-forward Neural Network

In this chapter, we will learn how to implement a feed-forward neural network using PyTorch.
Before moving to neural networks, let's revisit the Scikit-learn model that we built in the [Spark-mllib](http://www.sunlab.org/teaching/cse6250/fall2017/lab/spark-mllib/#Scikit-learn) lab so that we can compare the results. If you have not completed that part, please complete it first.

### SVM

In [1]:
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_svmlight_file
from sklearn.preprocessing import MaxAbsScaler

X, y = load_svmlight_file("patients.svmlight")
X = X.toarray() # make it dense

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=41)

In [2]:
scaler = MaxAbsScaler().fit(X_train)
X_train_transformed = scaler.transform(X_train)
X_test_transformed = scaler.transform(X_test)

In [3]:
from sklearn.svm import SVC
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform as sp_rand

In [4]:
# CV
param_dist = {'C': sp_rand(0, 1000), 'kernel': ['linear', 'poly', 'rbf'], 'class_weight': [None, 'balanced']}

svm_cv = RandomizedSearchCV(estimator=SVC(), param_distributions=param_dist, scoring='roc_auc', cv=10, n_iter=10, n_jobs=-1, verbose=1, random_state=42)
svm_cv.fit(X_train_transformed, y_train)

Fitting 10 folds for each of 10 candidates, totalling 100 fits


[Parallel(n_jobs=-1)]: Done  74 out of 100 | elapsed:    6.2s remaining:    2.2s
[Parallel(n_jobs=-1)]: Done 100 out of 100 | elapsed:    6.6s finished


RandomizedSearchCV(cv=10, error_score='raise',
          estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False),
          fit_params={}, iid=True, n_iter=10, n_jobs=-1,
          param_distributions={'C': <scipy.stats._distn_infrastructure.rv_frozen object at 0x7fd84b14ce10>, 'kernel': ['linear', 'poly', 'rbf'], 'class_weight': [None, 'balanced']},
          pre_dispatch='2*n_jobs', random_state=42, refit=True,
          return_train_score=True, scoring='roc_auc', verbose=1)

In [5]:
# Full training
svm_full = SVC().set_params(**(svm_cv.best_estimator_.get_params()))
svm_full.set_params(probability=True)
svm_full.fit(X_train_transformed, y_train)

y_pred, y_score = svm_full.predict(X_test_transformed), svm_full.predict_proba(X_test_transformed)
y_score = y_score[:, 1]

In [6]:
from sklearn.metrics import roc_curve, auc
fpr, tpr, _ = roc_curve(y_test, y_score)
auc_svm = auc(fpr, tpr)

In [7]:
auc_svm

0.81812169312169325

### Feedforward Neural Network

Now, we will train a feed-forward neural network. We will do the following steps in order:

1. Load the training and test datasets using DataLoader
2. Define a Feed-forward Neural Network
3. Define a loss function
4. Train the network on the training data
5. Test the network on the test data

#### 1. Loading datasets
We will use DataLoader and TensorDataset (from [torch.utils.data](http://pytorch.org/docs/master/data.html#)) for convenience in data handling. You can also create a custom dataset class by subclassing Dataset and implementing its required methods (`__len__` and `__getitem__`).
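
A custom dataset class of the kind mentioned above might look like the following minimal sketch. The class name `PatientDataset` and the toy arrays are hypothetical and not part of the lab; they only stand in for the scaled feature matrix and label vector prepared earlier.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class PatientDataset(Dataset):
    """Hypothetical dataset wrapping a feature matrix and a label vector."""

    def __init__(self, features, labels):
        # Store float32 tensors; labels are reshaped to (N, 1) for a single-output network.
        self.features = torch.from_numpy(np.asarray(features, dtype="float32"))
        self.labels = torch.from_numpy(np.asarray(labels, dtype="float32")).view(-1, 1)

    def __len__(self):
        # Number of samples in the dataset.
        return len(self.features)

    def __getitem__(self, index):
        # Return one (record, label) pair; DataLoader stacks these into batches.
        return self.features[index], self.labels[index]


# Toy data standing in for X_train_transformed and y_train (assumption).
toy_X = np.random.rand(10, 5)
toy_y = np.array([0, 1] * 5)

toy_dataset = PatientDataset(toy_X, toy_y)
toy_loader = DataLoader(toy_dataset, batch_size=4, shuffle=True)

records, labels = next(iter(toy_loader))
print(records.shape, labels.shape)
```

For data that already fits in memory as arrays, TensorDataset does the same job with less code; a custom class becomes more useful when samples need to be loaded or transformed lazily.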

In [8]:
import torch
from torch.utils.data import DataLoader, TensorDataset

trainset = TensorDataset(torch.from_numpy(X_train_transformed.astype('float32')), torch.from_numpy(y_train.astype('float32')).view(-1,1))
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)

testset = TensorDataset(torch.from_numpy(X_test_transformed.astype('float32')), torch.from_numpy(y_test.astype('float32')).view(-1,1))
testloader = torch.utils.data.DataLoader(testset, batch_size=4, shuffle=False, num_workers=2)

Let's check some training samples.

In [9]:
# get some random training samples
dataiter = iter(trainloader)
records, labels = next(dataiter)

print(records)
print(labels)


 0.0000  0.0000  0.0000  ...   0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  ...   0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  ...   0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  ...   0.0000  0.0000  0.0000
[torch.FloatTensor of size 4x9978]


 1
 1
 1
 0
[torch.FloatTensor of size 4x1]



#### 2. Define a Feed-forward Neural Network

In [46]:
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F


class FeedForwardNet(nn.Module):
    def __init__(self, n_input, n_hidden, n_output):
        super(FeedForwardNet, self).__init__()
        self.hidden1 = nn.Linear(n_input, n_hidden)
        self.hidden2 = nn.Linear(n_hidden, n_hidden)
        self.out = nn.Linear(n_hidden, n_output)

    def forward(self, x):
        x = F.relu(self.hidden1(x))
        x = F.relu(self.hidden2(x))
        x = self.out(x)
        return x

net = FeedForwardNet(n_input=9978, n_hidden=256, n_output=1)

#### 3. Define a Loss function and Optimizer
We will use Binary Cross Entropy loss and SGD with momentum as our optimizer.
PyTorch provides the BCEWithLogitsLoss loss function, which combines a Sigmoid layer and BCELoss in a single class and is more numerically stable than applying them separately. Keep in mind that when you use this combined loss, you should not apply a sigmoid activation after the output layer.
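
To illustrate why the combined form is more stable, here is a small sketch in plain Python floats (not PyTorch tensors). The function names are hypothetical, and the stable version uses the standard algebraic rearrangement `max(x, 0) - x*y + log(1 + exp(-|x|))`; it is meant as an illustration of the idea, not as a description of PyTorch's internal implementation.

```python
import math


def naive_bce(logit, target):
    # Sigmoid first, then binary cross entropy: breaks once the sigmoid saturates.
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def stable_bce_with_logits(logit, target):
    # Equivalent form that never takes log(0) and only exponentiates non-positive numbers.
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


for logit, target in [(2.0, 1.0), (40.0, 0.0)]:
    try:
        naive = naive_bce(logit, target)
    except (ValueError, OverflowError) as err:
        # With a large logit the sigmoid rounds to exactly 1.0 and log(1 - p) fails.
        naive = "failed: %s" % err
    print(logit, target, naive, stable_bce_with_logits(logit, target))
```

For moderate logits the two functions agree; for a confidently wrong prediction such as a logit of 40 with target 0, the naive version fails while the rearranged version returns a finite loss close to 40.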

In [47]:
import torch.optim as optim

criterion = nn.BCEWithLogitsLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

#### 4. Train the network

In [48]:
for epoch in range(10):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data

        # wrap them in Variable
        inputs, labels = Variable(inputs), Variable(labels)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        
        if i % 10 == 9:    # print every 10 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 10))
            running_loss = 0.0

print('Finished Training')

[1,    10] loss: 0.689
[1,    20] loss: 0.687
[1,    30] loss: 0.684
[1,    40] loss: 0.674
[1,    50] loss: 0.679
[1,    60] loss: 0.669
[2,    10] loss: 0.666
[2,    20] loss: 0.679
[2,    30] loss: 0.669
[2,    40] loss: 0.653
[2,    50] loss: 0.677
[2,    60] loss: 0.659
[3,    10] loss: 0.651
[3,    20] loss: 0.655
[3,    30] loss: 0.673
[3,    40] loss: 0.644
[3,    50] loss: 0.644
[3,    60] loss: 0.680
[4,    10] loss: 0.632
[4,    20] loss: 0.639
[4,    30] loss: 0.662
[4,    40] loss: 0.688
[4,    50] loss: 0.625
[4,    60] loss: 0.662
[5,    10] loss: 0.603
[5,    20] loss: 0.629
[5,    30] loss: 0.650
[5,    40] loss: 0.679
[5,    50] loss: 0.659
[5,    60] loss: 0.659
[6,    10] loss: 0.606
[6,    20] loss: 0.700
[6,    30] loss: 0.657
[6,    40] loss: 0.646
[6,    50] loss: 0.645
[6,    60] loss: 0.600
[7,    10] loss: 0.528
[7,    20] loss: 0.594
[7,    30] loss: 0.668
[7,    40] loss: 0.681
[7,    50] loss: 0.693
[7,    60] loss: 0.682
[8,    10] loss: 0.679
...

[58,    50] loss: 0.209
[58,    60] loss: 0.113
[59,    10] loss: 0.199
[59,    20] loss: 0.161
[59,    30] loss: 0.182
[59,    40] loss: 0.092
[59,    50] loss: 0.103
[59,    60] loss: 0.111
[60,    10] loss: 0.160
[60,    20] loss: 0.221
[60,    30] loss: 0.101
[60,    40] loss: 0.084
[60,    50] loss: 0.184
[60,    60] loss: 0.086
[61,    10] loss: 0.075
[61,    20] loss: 0.145
[61,    30] loss: 0.174
[61,    40] loss: 0.145
[61,    50] loss: 0.130
[61,    60] loss: 0.147
[62,    10] loss: 0.073
[62,    20] loss: 0.141
[62,    30] loss: 0.142
[62,    40] loss: 0.136
[62,    50] loss: 0.160
[62,    60] loss: 0.145
[63,    10] loss: 0.052
[63,    20] loss: 0.085
[63,    30] loss: 0.156
[63,    40] loss: 0.258
[63,    50] loss: 0.147
[63,    60] loss: 0.087
[64,    10] loss: 0.127
[64,    20] loss: 0.147
[64,    30] loss: 0.148
[64,    40] loss: 0.145
[64,    50] loss: 0.138
[64,    60] loss: 0.066
[65,    10] loss: 0.091
[65,    20] loss: 0.071
[65,    30] loss: 0.121
...

[115,    10] loss: 0.131
[115,    20] loss: 0.042
[115,    30] loss: 0.132
[115,    40] loss: 0.038
[115,    50] loss: 0.143
[115,    60] loss: 0.085
[116,    10] loss: 0.025
[116,    20] loss: 0.090
[116,    30] loss: 0.085
[116,    40] loss: 0.136
[116,    50] loss: 0.092
[116,    60] loss: 0.141
[117,    10] loss: 0.243
[117,    20] loss: 0.134
[117,    30] loss: 0.096
[117,    40] loss: 0.044
[117,    50] loss: 0.034
[117,    60] loss: 0.028
[118,    10] loss: 0.083
[118,    20] loss: 0.037
[118,    30] loss: 0.081
[118,    40] loss: 0.309
[118,    50] loss: 0.049
[118,    60] loss: 0.038
[119,    10] loss: 0.133
[119,    20] loss: 0.073
[119,    30] loss: 0.137
[119,    40] loss: 0.089
[119,    50] loss: 0.042
[119,    60] loss: 0.093
[120,    10] loss: 0.027
[120,    20] loss: 0.089
[120,    30] loss: 0.090
[120,    40] loss: 0.148
[120,    50] loss: 0.145
[120,    60] loss: 0.071
[121,    10] loss: 0.031
[121,    20] loss: 0.079
[121,    30] loss: 0.183
[121,    40] loss: 0.095
...

[170,    10] loss: 0.082
[170,    20] loss: 0.139
[170,    30] loss: 0.076
[170,    40] loss: 0.093
[170,    50] loss: 0.078
[170,    60] loss: 0.088
[171,    10] loss: 0.085
[171,    20] loss: 0.030
[171,    30] loss: 0.137
[171,    40] loss: 0.202
[171,    50] loss: 0.079
[171,    60] loss: 0.032
[172,    10] loss: 0.136
[172,    20] loss: 0.032
[172,    30] loss: 0.030
[172,    40] loss: 0.142
[172,    50] loss: 0.132
[172,    60] loss: 0.083
[173,    10] loss: 0.142
[173,    20] loss: 0.094
[173,    30] loss: 0.071
[173,    40] loss: 0.127
[173,    50] loss: 0.037
[173,    60] loss: 0.080
[174,    10] loss: 0.133
[174,    20] loss: 0.075
[174,    30] loss: 0.038
[174,    40] loss: 0.140
[174,    50] loss: 0.083
[174,    60] loss: 0.084
[175,    10] loss: 0.086
[175,    20] loss: 0.026
[175,    30] loss: 0.139
[175,    40] loss: 0.145
[175,    50] loss: 0.084
[175,    60] loss: 0.079
[176,    10] loss: 0.133
[176,    20] loss: 0.124
[176,    30] loss: 0.091
[176,    40] loss: 0.046


#### 5. Test the network on the test data

In [49]:
y_true = []
y_scores = []

In [50]:
for data in testloader:
    inputs, labels = data
    outputs = net(Variable(inputs))
    outputs = torch.sigmoid(outputs)
    y_true.extend(labels.numpy().flatten().tolist())
    y_scores.extend(outputs.data.numpy().flatten().tolist())

In [51]:
fpr, tpr, _ = roc_curve(y_true, y_scores)
auc_ffnet = auc(fpr, tpr)
auc_ffnet

0.79761904761904767

### Exercise 1: Try to use GPU if you have one
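
One possible starting point is sketched below, assuming a PyTorch version that provides `torch.device` and `.to(device)`. The small `toy_net` model and random inputs are placeholders for the lab's network and data, and the code falls back to the CPU when no GPU is available.

```python
import torch
import torch.nn as nn

# Pick the GPU when one is visible to PyTorch, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder model (assumption); in the lab this would be the FeedForwardNet instance.
toy_net = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 1)).to(device)

# Inputs must live on the same device as the model parameters.
toy_inputs = torch.rand(4, 8).to(device)
toy_outputs = toy_net(toy_inputs)

print(device, toy_outputs.shape)
```

In a training loop, each mini-batch of inputs and labels would need the same `.to(device)` call, and outputs would need to be moved back with `.cpu()` before converting them to NumPy arrays.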

### Exercise 2: The SVM seems to have a better AUC. Can you improve the performance of the network?

### Exercise 3: How do you know whether the network is underfitting, overfitting, or well fit?