# PyTorch - Banknote Dataset

This task looks at performing classification on the Banknote Authentication Dataset.

The dataset consists of 1372 samples, where each sample consists of the following 5 attributes:

    variance of Wavelet Transformed image (continuous)
    skewness of Wavelet Transformed image (continuous)
    curtosis of Wavelet Transformed image (continuous)
    entropy of image (continuous)
    class (integer)

The output (class) is either a 0 (genuine note), or a 1 (forged note). The task is therefore a binary classification task.

More information on the dataset can be found here: https://archive.ics.uci.edu/ml/datasets/banknote+authentication

First of all, we import the necessary libraries.


In [96]:
import csv
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from sklearn.metrics import accuracy_score, precision_score, recall_score

import matplotlib.pyplot as plt
from random import shuffle
import numpy as np

Now, we define a function for reading the dataset.

In [97]:
def read_banknote_file(filename="./datasets/data_banknote_authentication.csv"):
    x = []
    y = []
    
    with open(filename) as csv_file:
        csv_reader = csv.reader(csv_file)
        for row in csv_reader:
            x.append(list(map(float, row[:-1])))
            y.append([int(row[-1])])

    return x, y

data, labels = read_banknote_file()
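
As an alternative to the `csv` loop above, the same file could be loaded with pandas, which is already imported. The following is a minimal sketch, assuming the file has no header row. The column names are illustrative labels that are not part of the file, and a few made-up rows held in memory stand in for the real CSV so that the snippet runs on its own.

```python
import io
import pandas as pd

# Hypothetical column names; the CSV itself is assumed to have no header row.
COLUMNS = ["variance", "skewness", "curtosis", "entropy", "class"]

# A few made-up rows standing in for data_banknote_authentication.csv.
sample_csv = io.StringIO(
    "3.6,8.7,-2.8,-0.4,0\n"
    "4.5,8.2,-2.5,-1.5,0\n"
    "-1.4,-4.9,6.5,0.3,1\n"
)

df = pd.read_csv(sample_csv, header=None, names=COLUMNS)

# Features as nested lists of floats and labels as [int] lists,
# matching the x, y structure returned by read_banknote_file().
x = df[COLUMNS[:-1]].astype(float).values.tolist()
y = [[int(label)] for label in df["class"]]

print(df.shape)
print(df["class"].value_counts().to_dict())
```

With the real file, `sample_csv` would be replaced by the path passed to `read_banknote_file`.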

After defining the function for reading the dataset, we define two more functions: one that shuffles the data to randomize the order of the samples, and one that splits the data into training and test sets with an 80/20 ratio.

In [98]:
def shuffle_data(data, labels):
    combined = list(zip(data, labels))
    shuffle(combined)
    return zip(*combined)

def split_data(x, y, train_ratio=0.8):
    pivot = int(train_ratio * len(x))
    return x[:pivot], x[pivot:], y[:pivot], y[pivot:]

data, labels = shuffle_data(data, labels)

data_train, data_test, labels_train, labels_test = split_data(data, labels)
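
Because `shuffle` draws from the global random state, each run produces a different split. The sketch below shows one possible way to make the split repeatable by using a seeded local generator. The function name `shuffle_and_split`, the `seed` parameter, and the toy data are assumptions made for illustration only.

```python
import random

def shuffle_and_split(data, labels, train_ratio=0.8, seed=0):
    """Shuffle samples and labels together, then split them into train/test."""
    combined = list(zip(data, labels))
    random.Random(seed).shuffle(combined)  # local RNG, so the result is repeatable
    pivot = int(train_ratio * len(combined))
    train, test = combined[:pivot], combined[pivot:]
    x_train, y_train = zip(*train)
    x_test, y_test = zip(*test)
    return list(x_train), list(x_test), list(y_train), list(y_test)

# Toy data: sample i has features [i, i] and label [i % 2].
toy_x = [[float(i), float(i)] for i in range(10)]
toy_y = [[i % 2] for i in range(10)]

x_tr, x_te, y_tr, y_te = shuffle_and_split(toy_x, toy_y)
print(len(x_tr), len(x_te))
```

Shuffling the zipped pairs keeps every sample attached to its own label, which is the same idea as in `shuffle_data`.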

In this part, we stack the Python lists into NumPy arrays and convert them to torch tensors.

In [99]:
data_train = torch.from_numpy(np.stack((data_train)))
data_test = torch.from_numpy(np.stack((data_test)))

labels_train = torch.from_numpy(np.stack((labels_train)))
labels_test = torch.from_numpy(np.stack((labels_test)))
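
The tensors created this way inherit NumPy's dtypes, which is why the later cells call `.float()` on every sample. The sketch below illustrates this and shows one alternative, converting to `float32` once up front. The feature rows are made-up values used only for the demonstration.

```python
import numpy as np
import torch

rows = [[3.6, 8.7, -2.8, -0.4], [-1.4, -4.9, 6.5, 0.3]]  # made-up feature rows
targets = [[0], [1]]

# np.stack on Python floats yields float64, and from_numpy keeps that dtype.
x64 = torch.from_numpy(np.stack(rows))
print(x64.dtype)

# Converting once, up front, to float32, the default dtype of nn.Linear weights.
x32 = torch.tensor(rows, dtype=torch.float32)
y32 = torch.tensor(targets, dtype=torch.float32)
print(x32.dtype, x32.shape, y32.shape)
```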

At this stage, we define the model class, which is built from two fully connected linear layers with a ReLU activation between them.

In [100]:
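class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 4 input features -> 2 hidden units -> 1 output
        self.fc1 = nn.Linear(4, 2)
        self.fc2 = nn.Linear(2, 1)

    def forward(self, X):
        X = self.fc1(X)
        X = F.relu(X)
        X = self.fc2(X)

        # Only the output for the first row is returned; the loops below feed one sample at a time.
        return X[0]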
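
To make the size of this network concrete, the sketch below counts its trainable parameters and runs a forward pass on a small batch. `BatchNet` is a hypothetical variant of `Net`, not part of the original notebook, that returns one output per input row instead of only the first.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BatchNet(nn.Module):
    """Hypothetical variant of Net that returns one output per input row."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 2)
        self.fc2 = nn.Linear(2, 1)

    def forward(self, X):
        return self.fc2(F.relu(self.fc1(X)))

model = BatchNet()

# fc1: 4*2 weights + 2 biases = 10; fc2: 2*1 weights + 1 bias = 3.
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 13

out = model(torch.zeros(5, 4))
print(out.shape)  # torch.Size([5, 1])
```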

To use the network, we create an instance of the class as follows:

In [101]:
net = Net()

Now, we define the loss function and the optimizer:

In [102]:
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.001)
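
`MSELoss` treats the 0/1 labels as regression targets. A common alternative for binary classification, which is not what this notebook uses, is binary cross-entropy applied to the raw network outputs (logits). The sketch below shows that swapped-in loss on two hand-picked values. With a logit of 0 the predicted probability is 0.5, so the loss equals ln 2 for either label.

```python
import math
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

logits = torch.tensor([0.0, 0.0])   # raw network outputs; sigmoid(0) = 0.5
targets = torch.tensor([0.0, 1.0])  # class labels as floats

loss = bce(logits, targets)
print(loss.item(), math.log(2))

# At prediction time, the sigmoid gives a probability that is thresholded at 0.5.
predicted_class = (torch.sigmoid(logits) >= 0.5).long()
```

With this loss, predictions would come from thresholding the sigmoid rather than from rounding the raw output.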

With all the necessary functions and classes defined, we train the model to update its weights.

In [103]:
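losses = []
for epoch in range(10):
    total_loss = 0
    for idx in range(len(data_train)):
        optimizer.zero_grad()
        # One sample at a time, reshaped to a batch of size 1.
        X = data_train[idx, :].float()
        X = torch.unsqueeze(X, 0)
        y = labels_train[idx].float()
        pred = net(X)
        loss = criterion(pred, y)  # MSELoss expects (input, target)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()

        # This check sits inside the sample loop and only epoch 0 satisfies it,
        # so the loss of every sample in the first epoch is printed.
        if epoch % 100 == 0:
            print('number of epoch', epoch, 'loss', loss.item())
    losses.append(total_loss)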

number of epoch 0 loss 2.1832852363586426
number of epoch 0 loss 0.11762145906686783
number of epoch 0 loss 0.08649150282144547
number of epoch 0 loss 0.09686300158500671
number of epoch 0 loss 1.835108995437622
...
number of epoch 0 loss 0.2710210978984833
number of epoch 0 loss 0.060142818838357925
number of epoch 0 loss 0.2978646755218506
...
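
The log above contains one line per training sample because the print statement runs inside the inner loop. The sketch below shows one possible way to log once per epoch while training on mini-batches. The synthetic tensors, the `nn.Sequential` model, and the batch size of 16 are assumptions chosen so the snippet runs on its own; they are not the notebook's data or settings.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for the training tensors: 64 samples, 4 features, 0/1 labels.
features = torch.randn(64, 4)
targets = (features[:, :1] > 0).float()  # shape (64, 1)

model = nn.Sequential(nn.Linear(4, 2), nn.ReLU(), nn.Linear(2, 1))
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
loader = DataLoader(TensorDataset(features, targets), batch_size=16, shuffle=True)

epoch_losses = []
for epoch in range(10):
    total_loss = 0.0
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)  # (input, target) order
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * xb.size(0)  # undo the batch mean
    epoch_losses.append(total_loss / len(features))
    # One log line per epoch instead of one per sample.
    print('epoch', epoch, 'mean loss', epoch_losses[-1])
```

Storing the mean loss per sample keeps the values comparable if the dataset size or batch size changes.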

After training, the model's performance is evaluated on the test set as follows:

In [104]:
test_losses = []
total_test = 0
correct_test = 0
predic = []
labels = []
total_loss_test = 0
net.eval()
# Gradients are not needed for evaluation.
with torch.no_grad():
    for idx in range(len(data_test)):
        X = data_test[idx, :].float()
        X = torch.unsqueeze(X, 0)
        y = labels_test[idx].float()
        pred = net(X)
        predic.append(pred)
        labels.append(y)
        loss = criterion(pred, y)
        total_test += y.size(0)
        # Round the raw output to the nearest integer and compare it with the label.
        correct_test += torch.round(pred).eq(y).sum().item()
        total_loss_test += loss.item()
test_losses.append(total_loss_test)
accuracy = correct_test / total_test
print('Accuracy:\t', accuracy * 100, '%')

Accuracy:	 97.81818181818181 %


In [105]:
predict_y = torch.cat(predic).round()
labels_test = torch.cat(labels)

In [106]:
print ('prediction accuracy', accuracy_score(labels_test.data, predict_y.data))

# Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
print ('macro precision', precision_score(labels_test.data, predict_y.data, average='macro'))
# Calculate metrics globally by counting the total true positives, false negatives and false positives.
print ('micro precision', precision_score(labels_test.data, predict_y.data, average='micro'))
# Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
print ('macro recall', recall_score(labels_test.data, predict_y.data, average='macro'))
# Calculate metrics globally by counting the total true positives, false negatives and false positives.
print ('micro recall', recall_score(labels_test.data, predict_y.data, average='micro'))

prediction accuracy 0.9781818181818182
macro precision 0.9732142857142857
micro precision 0.9781818181818182
macro recall 0.9822485207100592
micro recall 0.9781818181818182
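
In the results above, micro precision and micro recall both equal the accuracy, which is expected for single-label classification, while the macro values differ because each class is weighted equally. The sketch below illustrates the difference on a small set of made-up labels. These labels are not taken from the banknote data.

```python
from sklearn.metrics import precision_score, recall_score

# Made-up labels: six samples of class 0 and two of class 1.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 0]

# Per-class precision: class 0 -> 5/6, class 1 -> 1/2.
macro_p = precision_score(y_true, y_pred, average='macro')  # (5/6 + 1/2) / 2
micro_p = precision_score(y_true, y_pred, average='micro')  # 6 correct out of 8

# Per-class recall: class 0 -> 5/6, class 1 -> 1/2.
macro_r = recall_score(y_true, y_pred, average='macro')
micro_r = recall_score(y_true, y_pred, average='micro')

print(macro_p, micro_p, macro_r, micro_r)
```

Here the macro scores are pulled down by the weaker minority class, while the micro scores stay at the overall fraction of correct predictions.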
