# Binary Classifier for Sonar Readings

In this document I will be copying code from https://machinelearningmastery.com/building-a-binary-classification-model-in-pytorch/. This code uses PyTorch to design a neural network and train it on training data. It then evaluates the performance of the neural network using k-fold cross-validation.

The dataset used here is the Sonar dataset, which describes sonar chirp returns bouncing off different surfaces. There are 60 input variables, each giving the strength of the returns at a different angle. The classification problem is to determine whether the chirp has bounced off a rock or a metal cylinder.

First we import the libraries we need, including pandas so that we can read in the dataset. We then set X to the independent variables (the 60 sonar readings) and y to the label, or dependent variable, which in this case is whether the object is a rock or a metal cylinder.

In [16]:
import copy
 
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
import tqdm
from sklearn.metrics import roc_curve
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.preprocessing import LabelEncoder

In [17]:
# Read data
data = pd.read_csv("sonar.csv", header=None) #read file in
X = data.iloc[:, 0:60] #all independent variables
y = data.iloc[:, 60] #dependent variable or label
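A quick sketch of how the `iloc` slicing above behaves, using a small made-up DataFrame in place of sonar.csv (the four rows and random values are assumptions for illustration only). With `header=None`, pandas numbers the columns 0 to 60, and the slice `0:60` stops before column 60, which leaves the label column on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for sonar.csv: 4 rows of 60 numeric readings plus a label column.
rng = np.random.default_rng(0)
features = pd.DataFrame(rng.random((4, 60)))
labels = pd.Series(["R", "R", "M", "M"])

# ignore_index=True renumbers the columns 0..60, like reading a CSV with header=None.
data = pd.concat([features, labels], axis=1, ignore_index=True)

X = data.iloc[:, 0:60]  # columns 0..59 (the end of the slice is exclusive)
y = data.iloc[:, 60]    # column 60 only, the label

print(X.shape, y.shape)  # (4, 60) (4,)
```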

The label in the y variable needs to be converted from a string to a numeric label. As there are only 2 labels, 'M' and 'R', these can be converted to 0 and 1 respectively.

This can be done using sklearn's LabelEncoder, which does this automatically. It sorts the class names alphabetically and encodes each one as its position in that sorted list.

In [18]:
encoder = LabelEncoder()
encoder.fit(y)
y = encoder.transform(y)
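A minimal sketch of the same LabelEncoder steps on a handful of made-up labels (the list below is hypothetical, not taken from sonar.csv). It shows that `classes_` is sorted, that 'M' becomes 0 and 'R' becomes 1, and that `inverse_transform` maps the codes back to the original strings.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels in the same format as column 60 of sonar.csv.
labels = ["R", "M", "R", "M", "M"]

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)  # fit and transform in one call

# classes_ is sorted, and each label is encoded as its index in classes_.
print(encoder.classes_)  # ['M' 'R']
print(codes)             # [1 0 1 0 0]

# inverse_transform maps integer codes back to the original strings.
print(encoder.inverse_transform([0, 1]).tolist())  # ['M', 'R']
```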

To check this has been done correctly, we can use encoder.classes_ to see the classes, and we can also print y to see the encoded outputs. encoder.classes_ should show us 'M' and 'R', and printing y should show 1's and 0's.

In [19]:
print(encoder.classes_)

['M' 'R']


In [20]:
print(y)

[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]


As seen, this has worked and we have received our expected output. The 0 represents 'M' and the 1 represents 'R', matching the order of encoder.classes_.

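Next we need to convert these into PyTorch tensors so that we can feed them into our PyTorch model. The labels are also reshaped into a single column so that they have the same shape as the model's output, which is one value per sample.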

In [21]:
X = torch.tensor(X.values, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)
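A short sketch of why `reshape(-1, 1)` is used: a model ending in `nn.Linear(..., 1)` outputs shape `(N, 1)`, and `nn.BCELoss` expects the target to have the same shape as its input. The labels and the constant 0.5 prediction below are hypothetical values chosen so the loss is easy to check by hand.

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical encoded labels, shaped like the 1-D output of LabelEncoder.
y_np = np.array([1, 0, 1, 1])

y_flat = torch.tensor(y_np, dtype=torch.float32)  # shape (4,)
y_col = y_flat.reshape(-1, 1)                     # shape (4, 1); -1 lets PyTorch infer the 4

# Stand-in for model output: one probability per sample, shape (4, 1).
y_pred = torch.full((4, 1), 0.5)

# BCELoss compares element-wise, so target and prediction share the shape (4, 1).
loss = nn.BCELoss()(y_pred, y_col)
print(y_flat.shape, y_col.shape)  # torch.Size([4]) torch.Size([4, 1])
print(round(loss.item(), 4))      # 0.6931, i.e. ln 2 for a constant 0.5 prediction
```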

We are going to create a 3 layer neural network (input, hidden and output layers) that has only 1 hidden layer. The model takes 60 input values and predicts one binary variable. As we want a wide model with one hidden layer, a hidden layer of 180 neurons is a reasonable choice, since 180 is three times the number of input features.

In [22]:
#creating a new class which creates a model with one hidden layer
class Wide(nn.Module):
    #initialiser
    def __init__(self):
        super().__init__()
        #creates linear transformation from input layer to hidden layer
        self.hidden = nn.Linear(60, 180)
        self.relu = nn.ReLU()
        #creates linear transformation from hidden layer to output layer
        self.output = nn.Linear(180, 1)
        #applies sigmoid function
        self.sigmoid = nn.Sigmoid()

    #forward pass: moves the data from the input layer through to the output layer, takes in the data as a parameter
    def forward(self, x):
        x = self.relu(self.hidden(x))
        x = self.sigmoid(self.output(x))
        return x
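For comparison, here is a sketch of the same wide architecture written with `nn.Sequential`. This is an alternative style, not the class the notebook uses; pushing a random batch through it (8 samples, an arbitrary choice) confirms that 60 inputs produce one output per sample.

```python
import torch
import torch.nn as nn

# Same layers as the Wide class, expressed as a Sequential container.
wide_seq = nn.Sequential(
    nn.Linear(60, 180),  # input layer -> hidden layer (180 = 3 x 60)
    nn.ReLU(),
    nn.Linear(180, 1),   # hidden layer -> single output
    nn.Sigmoid(),        # squash the output into the range 0 to 1
)

# Push a random batch of 8 samples through to check the shapes.
x = torch.rand(8, 60)
with torch.no_grad():
    out = wide_seq(x)
print(out.shape)  # torch.Size([8, 1])
```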

We are also going to create a second model which uses 3 hidden layers. This is called a deeper model as it has more than one hidden layer. Each of its 3 hidden layers has 60 neurons.

In [23]:
class Deep(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(60, 60)
        self.act1 = nn.ReLU()
        self.layer2 = nn.Linear(60, 60)
        self.act2 = nn.ReLU()
        self.layer3 = nn.Linear(60, 60)
        self.act3 = nn.ReLU()
        self.output = nn.Linear(60, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.act1(self.layer1(x))
        x = self.act2(self.layer2(x))
        x = self.act3(self.layer3(x))
        x = self.sigmoid(self.output(x))
        return x
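If you want to experiment with other depths, the deep model can be generated in a loop. The helper `make_deep` below is hypothetical (it is not part of the tutorial); with its defaults it builds the same layers as the Deep class, and with one hidden layer of 180 neurons it matches the Wide model.

```python
import torch
import torch.nn as nn

def make_deep(n_hidden=3, width=60, n_inputs=60):
    """Hypothetical helper: n_hidden Linear+ReLU blocks followed by a sigmoid output."""
    layers = []
    in_features = n_inputs
    for _ in range(n_hidden):
        layers += [nn.Linear(in_features, width), nn.ReLU()]
        in_features = width
    layers += [nn.Linear(in_features, 1), nn.Sigmoid()]
    return nn.Sequential(*layers)

deep_seq = make_deep()
# numel() counts the elements in each weight and bias tensor.
print(sum(p.numel() for p in deep_seq.parameters()))  # 11041
with torch.no_grad():
    print(deep_seq(torch.rand(5, 60)).shape)  # torch.Size([5, 1])
```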

We can confirm that the 2 models have a similar number of parameters by running the following code. It should print 2 numbers that are roughly similar.

In [24]:
# Compare model sizes
model1 = Wide()
model2 = Deep()
print(sum([x.reshape(-1).shape[0] for x in model1.parameters()]))  # 11161
print(sum([x.reshape(-1).shape[0] for x in model2.parameters()]))  # 11041

11161
11041


As seen in the above cell, both models were created successfully, and their parameter counts (11161 and 11041) are close, so the two models have a comparable size.
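The two counts can be checked by hand. Each `nn.Linear(n_in, n_out)` layer holds an `n_out x n_in` weight matrix plus `n_out` biases, and ReLU and Sigmoid add no parameters. The function name `linear_params` is just an illustrative helper.

```python
def linear_params(n_in, n_out):
    # weights (n_in * n_out) plus one bias per output neuron
    return n_in * n_out + n_out

wide = linear_params(60, 180) + linear_params(180, 1)
deep = 3 * linear_params(60, 60) + linear_params(60, 1)
print(wide)  # 11161  (10980 + 181)
print(deep)  # 11041  (3 * 3660 + 61)
```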

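The training loop is defined below. This function is called once for every fold of the stratified k-fold loop; in this case it is called 5 times because the data is split into 5 folds. The function takes in a model, the training data (X_train, y_train) and the validation data (X_val, y_val). It trains for 250 epochs in mini-batches of 10 samples, checks the accuracy on the validation data at the end of each epoch, restores the weights from the best epoch and returns that best accuracy.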

In [25]:
def model_train(model, X_train, y_train, X_val, y_val):
    # loss function and optimizer
    loss_fn = nn.BCELoss()  # binary cross entropy
    optimizer = optim.Adam(model.parameters(), lr=0.0001)
 
    n_epochs = 250   # number of epochs to run
    batch_size = 10  # size of each batch
    batch_start = torch.arange(0, len(X_train), batch_size)
 
    # Hold the best model
    best_acc = - np.inf   # init to negative infinity
    best_weights = None
 
    #loop through number of epochs
    for epoch in range(n_epochs):
        #train the model
        model.train()
        with tqdm.tqdm(batch_start, unit="batch", mininterval=0, disable=True) as bar:
            bar.set_description(f"Epoch {epoch}")
            for start in bar:
                # take a batch
                X_batch = X_train[start:start+batch_size]
                y_batch = y_train[start:start+batch_size]
                # forward pass
                y_pred = model(X_batch)
                loss = loss_fn(y_pred, y_batch)
                # backward pass
                optimizer.zero_grad()
                loss.backward()
                # update weights
                optimizer.step()
                # print progress
                acc = (y_pred.round() == y_batch).float().mean()
                bar.set_postfix(
                    loss=float(loss),
                    acc=float(acc)
                )
        # evaluate accuracy at end of each epoch
        model.eval()
        with torch.no_grad():  # gradients are not needed for evaluation
            y_pred = model(X_val)
        acc = (y_pred.round() == y_val).float().mean()
        acc = float(acc)
        if acc > best_acc:
            best_acc = acc
            best_weights = copy.deepcopy(model.state_dict())
    # restore model and return best accuracy
    model.load_state_dict(best_weights)
    return best_acc
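The `copy.deepcopy` around `model.state_dict()` matters. The tensors in a state dict share memory with the live parameters, so without a copy the "best" weights would keep changing as training continues. This sketch uses a tiny `nn.Linear(2, 1)` as a stand-in model and an in-place `add_` to imitate an optimizer step.

```python
import copy

import torch
import torch.nn as nn

model = nn.Linear(2, 1)

snapshot_ref = model.state_dict()                  # shares storage with the live parameters
snapshot_copy = copy.deepcopy(model.state_dict())  # independent copy, as in model_train

# Imitate an optimizer step by changing the weights in place.
with torch.no_grad():
    model.weight.add_(1.0)

# The plain state_dict follows the live weights; the deep copy keeps the old values.
print(torch.equal(snapshot_ref["weight"], model.weight))   # True
print(torch.equal(snapshot_copy["weight"], model.weight))  # False

# Restoring the copy rolls the model back, like model.load_state_dict(best_weights).
model.load_state_dict(snapshot_copy)
print(torch.equal(snapshot_copy["weight"], model.weight))  # True
```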

Now that both models and the training function have been created, we need to train and evaluate the models. A testing method that we can use is called k-fold cross-validation. This splits the dataset into k portions (folds) and uses one fold as the test set while the other k-1 folds form the training set. This is repeated k times so that each fold is used as the test set exactly once; the training set is the same size every time.

We use scikit-learn's StratifiedKFold, which means that when separating the data into folds it keeps the proportion of each class (rock and metal) roughly the same in every fold.
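A small sketch of what StratifiedKFold does, on hypothetical toy labels (10 of class 0 and 15 of class 1, with a fixed `random_state` only for repeatability). Every fold trains on 20 rows and tests on 5, each test fold keeps the 2:3 class ratio, and every row appears in a test fold exactly once.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy labels: 10 of class 0 and 15 of class 1 (25 samples).
y_toy = np.array([0] * 10 + [1] * 15)
X_toy = np.zeros((len(y_toy), 1))  # the feature values do not affect how folds are chosen

kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kfold.split(X_toy, y_toy):
    # each line: training size, test size, and class counts in the test fold
    print(len(train_idx), len(test_idx), np.bincount(y_toy[test_idx]))
# every line prints: 20 5 [2 3]
```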

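StratifiedKFold was already imported from sklearn in the first cell, so we can set up the 5-fold cross-validation harness directly. Each fold trains a fresh wide model and prints its best accuracy, and the mean and standard deviation across the folds are printed at the end.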

In [26]:
# define 5-fold cross validation test harness

kfold = StratifiedKFold(n_splits=5, shuffle=True)
cv_scores = []
for train, test in kfold.split(X, y):
    # create model, train, and get accuracy
    model = Wide()
    acc = model_train(model, X[train], y[train], X[test], y[test])
    print("Accuracy (wide): %.2f" % acc)
    cv_scores.append(acc)

# evaluate the model
acc = np.mean(cv_scores)
std = np.std(cv_scores)
print("Model accuracy: %.2f%% (+/- %.2f%%)" % (acc*100, std*100))

Accuracy (wide): 0.86
Accuracy (wide): 0.76
Accuracy (wide): 0.81
Accuracy (wide): 0.63
Accuracy (wide): 0.68
Model accuracy: 74.91% (+/- 8.13%)
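The "+/-" figure printed above is `np.std`, which by default is the population standard deviation (`ddof=0`). This sketch uses two hypothetical fold accuracies, chosen only so the arithmetic is exact, to show the same formatting and how the sample standard deviation (`ddof=1`) would differ.

```python
import numpy as np

# Hypothetical fold accuracies, for illustration only.
scores = [0.875, 0.625]

mean = np.mean(scores)               # 0.75
pop_std = np.std(scores)             # 0.125: default ddof=0, as used in the cell above
sample_std = np.std(scores, ddof=1)  # about 0.177: divides by n - 1 instead of n

print("Model accuracy: %.2f%% (+/- %.2f%%)" % (mean * 100, pop_std * 100))
# Model accuracy: 75.00% (+/- 12.50%)
```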


The code below first splits the dataset into a training set and a test set using a 70/30 split. The data is shuffled before it is split; this matters because sonar.csv lists all the rock samples before the metal ones, so an unshuffled split would put only metal samples in the test set. The test set is held out for the final evaluation, and the same stratified 5-fold cross-validation as above is run on the training set for both the wide and the deep model. Each model prints its accuracy after every fold.

In [27]:
from sklearn.model_selection import StratifiedKFold, train_test_split

# train-test split: Hold out the test set for final model evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True)

# define 5-fold cross validation test harness
kfold = StratifiedKFold(n_splits=5, shuffle=True)
cv_scores_wide = []
for train, test in kfold.split(X_train, y_train):
    # create model, train, and get accuracy
    model = Wide()
    acc = model_train(model, X_train[train], y_train[train], X_train[test], y_train[test])
    print("Accuracy (wide): %.2f" % acc)
    cv_scores_wide.append(acc)
cv_scores_deep = []
for train, test in kfold.split(X_train, y_train):
    # create model, train, and get accuracy
    model = Deep()
    acc = model_train(model, X_train[train], y_train[train], X_train[test], y_train[test])
    print("Accuracy (deep): %.2f" % acc)
    cv_scores_deep.append(acc)

# evaluate the model
wide_acc = np.mean(cv_scores_wide)
wide_std = np.std(cv_scores_wide)
deep_acc = np.mean(cv_scores_deep)
deep_std = np.std(cv_scores_deep)
print("Wide: %.2f%% (+/- %.2f%%)" % (wide_acc*100, wide_std*100))
print("Deep: %.2f%% (+/- %.2f%%)" % (deep_acc*100, deep_std*100))

Accuracy (wide): 0.76
Accuracy (wide): 0.76
Accuracy (wide): 0.90
Accuracy (wide): 0.79
Accuracy (wide): 0.66
Accuracy (deep): 0.90
Accuracy (deep): 0.90
Accuracy (deep): 0.79
Accuracy (deep): 0.79
Accuracy (deep): 0.76
Wide: 77.24% (+/- 7.74%)
Deep: 82.76% (+/- 5.77%)
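The `train_test_split` call above shuffles but does not stratify, so the rock/metal ratio of the held-out test set can drift from that of the full dataset. As an optional variation (not what the notebook does), passing `stratify=` keeps the ratio in both parts. This sketch uses placeholder features and the class counts from the label printout earlier (97 rocks encoded as 1, 111 metal cylinders encoded as 0); `random_state` is fixed only for repeatability.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Labels with the same class counts as sonar.csv: 97 rocks (1) and 111 metal cylinders (0).
y_all = np.array([1] * 97 + [0] * 111)
X_all = np.arange(len(y_all)).reshape(-1, 1)  # placeholder features

# stratify=y_all keeps the rock/metal ratio the same in the training and test parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, train_size=0.7, shuffle=True, stratify=y_all, random_state=0
)
print(len(y_tr), len(y_te))  # 145 63
print(round(float(y_tr.mean()), 3), round(float(y_te.mean()), 3))  # both close to 97/208
```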


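Whichever model has the higher mean cross-validation accuracy is then retrained on the entire training set, and its accuracy on the held-out test set is printed. Note that model_train keeps the weights from the epoch with the best accuracy on X_test, so this final figure is likely to be somewhat optimistic.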

In [28]:
# rebuild model with full set of training data
if wide_acc > deep_acc:
    print("Retrain a wide model")
    model = Wide()
else:
    print("Retrain a deep model")
    model = Deep()
acc = model_train(model, X_train, y_train, X_test, y_test)
print(f"Final model accuracy: {acc*100:.2f}%")

Retrain a deep model
Final model accuracy: 82.54%


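Below we run inference on the first 5 samples of the test set. For each sample, the cell prints the 60 input values, the model's output (the predicted probability that the sample is a rock, label 1) and the expected label. We want the model's output to be as close to the expected label as possible.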
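To turn an output probability into a class name, it can be rounded at 0.5 and passed back through a LabelEncoder. A minimal sketch, assuming a freshly fitted encoder with the same classes as the notebook's and three hypothetical sigmoid outputs:

```python
import torch
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder().fit(["M", "R"])  # classes_ = ['M' 'R'], so M -> 0 and R -> 1

# Hypothetical sigmoid outputs for three samples, shape (3, 1) like the model's output.
probs = torch.tensor([[0.54], [0.935], [0.20]])

labels = probs.round().int().flatten()             # threshold at 0.5
names = encoder.inverse_transform(labels.numpy())  # map codes back to 'M' / 'R'
print(labels.tolist(), names.tolist())             # [1, 1, 0] ['R', 'R', 'M']
```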

In [29]:
model.eval()
with torch.no_grad():
    # Test out inference with 5 samples
    for i in range(5):
        y_pred = model(X_test[i:i+1])
        print(f"{X_test[i].numpy()} -> {y_pred[0].numpy()} (expected {y_test[i].numpy()})")

[0.0132 0.008  0.0188 0.0141 0.0436 0.0668 0.0609 0.0131 0.0899 0.0922
 0.1445 0.1475 0.2087 0.2558 0.2603 0.1985 0.2394 0.3134 0.4077 0.4529
 0.4893 0.5666 0.6234 0.6741 0.8282 0.8823 0.9196 0.8965 0.7549 0.6736
 0.6463 0.5007 0.3663 0.2298 0.1362 0.2123 0.2395 0.2673 0.2865 0.206
 0.1659 0.2633 0.2552 0.1696 0.1467 0.1286 0.0926 0.0716 0.0325 0.0258
 0.0136 0.0044 0.0028 0.0021 0.0022 0.0048 0.0138 0.014  0.0028 0.0064] -> [0.5399592] (expected [1.])
[0.0274 0.0242 0.0621 0.056  0.1129 0.0973 0.1823 0.1745 0.144  0.1808
 0.2366 0.0906 0.1749 0.4012 0.5187 0.7312 0.9062 0.926  0.7434 0.4463
 0.5103 0.6952 0.7755 0.8364 0.7283 0.6399 0.5759 0.4146 0.3495 0.4437
 0.2665 0.2024 0.1942 0.0765 0.3725 0.5843 0.4827 0.2347 0.0999 0.3244
 0.399  0.2975 0.1684 0.1761 0.1683 0.0729 0.119  0.1297 0.0748 0.0067
 0.0255 0.0113 0.0108 0.0085 0.0047 0.0074 0.0104 0.0161 0.022  0.0173] -> [0.93508375] (expected [1.])
[0.0303 0.0353 0.049  0.0608 0.0167 0.1354 0.1465 0.1123 0.1945 0.2354
 0.2898 0.281