<a href="https://colab.research.google.com/github/anand221992/Binary_Classification/blob/main/BINARY_CLASSIFICATION.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Checking the Python version

In [2]:
!python3 --version

Python 3.7.13


# Unzipping the dataset

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
!ls

drive  sample_data


# Make sure that assignment.zip has been uploaded to your Google Drive (MyDrive) before running the next cell

In [5]:
!unzip /content/drive/MyDrive/assignment.zip -d ./

Streaming output truncated to the last 5000 lines.
  inflating: ./assignment/train/1/1000.png  
  inflating: ./assignment/train/1/1001.png  
  inflating: ./assignment/train/1/1002.png  
  inflating: ./assignment/train/1/1003.png  
  inflating: ./assignment/train/1/1004.png  
  inflating: ./assignment/train/1/1005.png  
  inflating: ./assignment/train/1/1006.png  
  inflating: ./assignment/train/1/1007.png  
  inflating: ./assignment/train/1/1008.png  
  inflating: ./assignment/train/1/1009.png  
  inflating: ./assignment/train/1/1010.png  
  inflating: ./assignment/train/1/1011.png  
  inflating: ./assignment/train/1/1012.png  
  inflating: ./assignment/train/1/1013.png  
  inflating: ./assignment/train/1/1014.png  
  inflating: ./assignment/train/1/1015.png  
  inflating: ./assignment/train/1/1016.png  
  inflating: ./assignment/train/1/1017.png  
  inflating: ./assignment/train/1/1018.png  
  inflating: ./assignment/train/1/1019.png  
  inflating: ./assignment/train/1/1
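The same extraction can also be done from Python with the standard-library `zipfile` module instead of the shell `unzip` command. This is a minimal sketch, assuming the archive unpacks to `assignment/train/<class>/*.png` and `assignment/test/<class>/*.png` as the log above and the loading code below suggest; `extract_dataset` is a hypothetical helper, not part of the original notebook:

```python
import zipfile
from pathlib import Path

# Hypothetical helper (not part of the original notebook).
def extract_dataset(zip_path, dest="."):
    """Extract the dataset archive into dest and return the class folder names under assignment/train."""
    zip_path = Path(zip_path)
    if not zip_path.is_file():
        raise FileNotFoundError(f"{zip_path} not found: upload assignment.zip to Google Drive first")
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)  # creates any missing directories
    train_dir = Path(dest) / "assignment" / "train"
    return sorted(p.name for p in train_dir.iterdir() if p.is_dir())
```

On Colab, `extract_dataset("/content/drive/MyDrive/assignment.zip")` would extract into the current directory and return the training class folders (presumably `['0', '1']` for this dataset).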

Check the Python version again to make sure none of the functions used below are deprecated.

In [6]:
!python3 -V

Python 3.7.13


In [7]:
import cv2
import glob
from tqdm import tqdm
import numpy as np
import torch
import torch.nn.functional as F
import torch.nn as nn
import sys

Helper functions for loading the images, shuffling and splitting the data, converting it into fixed-size batches, and training and evaluating a model.

In [10]:
num_classes = 2
# A custom layer that flattens each sample in a batch to 1D
# (the CNN below uses the built-in nn.Flatten instead)
class Flatten(nn.Module):
    """A custom layer that views an input as 1D."""

    def forward(self, input):
        return input.view(input.size(0), -1)

# Helpers
# Group data points and labels into batches of batch_size and convert each
# batch from NumPy arrays to torch tensors on the given device.
def batchify_data(x_data, y_data, batch_size, device):
    # Only take full batch_size chunks (i.e. drop the remainder)
    N = (len(x_data) // batch_size) * batch_size
    batches = []
    for i in range(0, N, batch_size):
        batches.append({
            # np.asarray stacks the list of arrays first; building a tensor directly
            # from a list of NumPy arrays is slow and makes PyTorch emit a warning
            'x': torch.tensor(np.asarray(x_data[i:i+batch_size]), dtype=torch.float32, device=device),
            'y': torch.tensor(np.asarray(y_data[i:i+batch_size]), dtype=torch.long, device=device),
        })
    return batches
# Split the training data into a training set (first 90%) and a validation/dev set (last 10%)
def split_data(X_train, y_train):
    dev_split_index = int(9 * len(X_train) / 10)
    X_dev = X_train[dev_split_index:]
    y_dev = y_train[dev_split_index:]
    X_train = X_train[:dev_split_index]
    y_train = y_train[:dev_split_index]

    return X_train, y_train, X_dev, y_dev

# Shuffle the data points and labels with the same random permutation
def randomise_data(X_train, y_train):
    permutation = np.random.permutation(len(X_train))
    X_train = [X_train[i] for i in permutation]
    y_train = [y_train[i] for i in permutation]
    return X_train, y_train

# compute the accuracy against the targets
def compute_accuracy(predictions, y):
    return torch.mean(torch.eq(predictions, y).double())
    #return np.mean(np.equal(predictions.numpy(), y.numpy()))


# Training Procedure
# Train a model for N epochs given data and hyper-params
def train_model(train_data, dev_data, model, lr=0.01, momentum=0.9, nesterov=False, n_epochs=30):
    # We optimize with SGD
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum, nesterov=nesterov)

    for epoch in range(1, n_epochs+1):
        print("-------------\nEpoch {}:\n".format(epoch))


        # Run training
        loss, acc = run_epoch(train_data, model.train(), optimizer)
        print('Train loss: {:.6f} | Train accuracy: {:.6f}'.format(loss, acc))

        # Run validation NOTE the ".eval" mode below
        val_loss, val_acc = run_epoch(dev_data, model.eval(), optimizer)
        print('Val loss:   {:.6f} | Val accuracy:   {:.6f}'.format(val_loss, val_acc))

        # Save the model after every epoch (the same file is overwritten each time)
        torch.save(model, 'mnist_model_fully_connected.pt')
    return val_acc

# Run the model over one pass of the data and return the average loss and accuracy
def run_epoch(data, model, optimizer):
    # Gather per-batch losses and accuracies
    losses = []
    batch_accuracies = []

    # If the model is in train mode, use the optimizer to update the weights.
    is_training = model.training

    # Only track gradients while training; evaluation does not need them.
    with torch.set_grad_enabled(is_training):
        # Iterate through batches
        for batch in tqdm(data):
            # Grab x and y
            x, y = batch['x'], batch['y']

            # Get the output logits
            out = model(x)

            # Predict the class with the highest score and store the accuracy
            predictions = torch.argmax(out, dim=1)
            batch_accuracies.append(compute_accuracy(predictions, y).item())

            # Compute the loss
            loss = F.cross_entropy(out, y)
            losses.append(loss.item())

            # If training, do an update.
            if is_training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

    # Calculate epoch-level scores
    avg_loss = np.mean(losses)
    avg_accuracy = np.mean(batch_accuracies)
    return avg_loss, avg_accuracy

# Load every PNG in folder_path as a grayscale image normalised to [0, 1].
# With augment=True, also add the 90, 180 and 270 degree rotations of each image.
def load_images(folder_path, augment=False):
    cv_img = []
    for img in glob.glob(folder_path + "/*.png"):
        n = cv2.imread(img, cv2.IMREAD_GRAYSCALE)
        n = normalize(n)
        cv_img.append(n)
        if augment:
            n90 = np.rot90(n)
            n180 = np.rot90(n90)
            n270 = np.rot90(n180)
            cv_img.extend([n90, n180, n270])
    return cv_img

# Min-max normalise an array to [0, 1]; keeping inputs on a small scale
# helps avoid NaN values during training.
# A constant image (max == min) would divide by zero and produce NaN, so map it to zeros.
def normalize(x):
    x = np.asarray(x, dtype=np.float32)
    value_range = np.ptp(x)
    if value_range == 0:
        return np.zeros_like(x)
    return (x - x.min()) / value_range

# Load the complete dataset: training images (augmented with rotations) and
# test images (not augmented), labelled by their folder name (0 or 1)
def get_assignment_data():
    assignment_path = "assignment"
    train_set_0 = load_images(assignment_path + "/train/0", True)
    train_set_0_y = len(train_set_0)*[0]
    train_set_1 = load_images(assignment_path + "/train/1", True)
    train_set_1_y = len(train_set_1)*[1]
    train_set_x = np.array(train_set_0 + train_set_1)
    train_set_y = np.array(train_set_0_y + train_set_1_y)

    test_set_0 = load_images(assignment_path + "/test/0")
    test_set_0_y = len(test_set_0)*[0]
    test_set_1 = load_images(assignment_path + "/test/1")
    test_set_1_y = len(test_set_1)*[1]
    test_set_x = np.array(test_set_0 + test_set_1)
    test_set_y = np.array(test_set_0_y + test_set_1_y)

    return (train_set_x, train_set_y, test_set_x, test_set_y)
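To see what `load_images` does to each image without needing OpenCV or the dataset, here is a NumPy-only sketch on a toy 2x2 array standing in for a 64x64 grayscale PNG. `augment_with_rotations` is a hypothetical name, and mapping a constant image to zeros in `normalize` is a choice made for this sketch:

```python
import numpy as np

def normalize(x):
    """Min-max scale to [0, 1]; a constant array maps to zeros (a choice made for this sketch)."""
    x = np.asarray(x, dtype=np.float32)
    value_range = np.ptp(x)
    return np.zeros_like(x) if value_range == 0 else (x - x.min()) / value_range

def augment_with_rotations(images):
    """Hypothetical helper mirroring load_images(augment=True):
    each image is followed by its 90, 180 and 270 degree rotations."""
    out = []
    for img in images:
        out.extend(np.rot90(img, k) for k in range(4))  # k=0 is the image itself
    return out

toy = np.array([[0, 50], [100, 200]], dtype=np.uint8)  # stand-in for a grayscale PNG
augmented = augment_with_rotations([normalize(toy)])
print(len(augmented))  # 4
```

Every training PNG therefore contributes four arrays, which is why the training set is four times the number of training images on disk, while the test set (loaded with `augment=False`) is not enlarged.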

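In `run_epoch`, the predicted class is the argmax of the two output logits, and the loss is `F.cross_entropy`, which applies a log-softmax and then takes the mean negative log-likelihood of the target class over the batch. The NumPy sketch below reproduces both quantities for illustration; the function names are hypothetical and this is not PyTorch's implementation:

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean negative log-likelihood of the target class under softmax(logits)."""
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the row max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

def accuracy(logits, targets):
    """Fraction of rows whose argmax matches the target label."""
    predictions = np.argmax(logits, axis=1)
    return np.mean(predictions == np.asarray(targets))

logits = [[2.0, 0.0], [0.0, 2.0]]  # two samples, two classes
print(accuracy(logits, [0, 0]))  # 0.5
print(cross_entropy(logits, [0, 0]))
```

With two equally likely classes (all-zero logits) the loss is log 2 ≈ 0.693, which is roughly where an untrained binary classifier starts.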

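`train_model` optimises with `torch.optim.SGD` using `momentum=0.9`, and both runs below pass `nesterov=False`. The sketch below writes out the per-parameter update as the PyTorch SGD documentation describes it, with dampening and weight decay at their default of zero. It is a NumPy illustration only, and `sgd_momentum_step` is a hypothetical name:

```python
import numpy as np

def sgd_momentum_step(param, grad, buf, lr=0.001, momentum=0.9, nesterov=False):
    """One SGD-with-momentum step (no dampening, no weight decay).
    buf is the momentum buffer; pass None on the first step."""
    buf = grad.copy() if buf is None else momentum * buf + grad
    step = grad + momentum * buf if nesterov else buf
    return param - lr * step, buf

# Minimise f(x) = x**2 (gradient 2x) starting from x = 1.
x, buf = np.array([1.0]), None
for _ in range(200):
    x, buf = sgd_momentum_step(x, 2 * x, buf, lr=0.1)
print(abs(x[0]) < 1e-3)  # True
```

The momentum buffer accumulates past gradients, so steps keep moving in a consistent direction even when individual gradients are small.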


Check for GPU availability.

In [11]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cuda')

# Multilayer perceptron (MLP) model (expects flattened input)

In [12]:
hidden_layer = 2**14  # 16384 hidden units
mlp_model = nn.Sequential(
              nn.Linear(64*64, hidden_layer),
              nn.ReLU(),
              nn.Linear(hidden_layer, num_classes),
            )
mlp_model = mlp_model.to(device)
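With 2**14 = 16384 hidden units, the MLP is large for a 64x64 binary task. Counting the weights and biases of each `nn.Linear` (an `in x out` weight matrix plus `out` biases) shows where the parameters sit. A short arithmetic sketch; `linear_params` is a hypothetical helper:

```python
def linear_params(in_features, out_features, bias=True):
    """Trainable parameters in a fully connected layer: weights plus optional bias."""
    return in_features * out_features + (out_features if bias else 0)

hidden_layer = 2**14  # 16384, as in the cell above
num_classes = 2
first = linear_params(64 * 64, hidden_layer)      # 4096 inputs -> 16384 hidden
second = linear_params(hidden_layer, num_classes)  # 16384 hidden -> 2 logits
mlp_params = first + second
print(f"{mlp_params:,}")  # 67,158,018
```

Nearly all of the roughly 67 million parameters (about 269 MB as float32) are in the first layer. On the real model, `sum(p.numel() for p in mlp_model.parameters())` should report the same total.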

Let's train and test the model above.

In [13]:
X_train, y_train, X_test, y_test = get_assignment_data()

# Reshape each 64x64 image into a flat vector of 64*64 = 4096 values
X_train = np.reshape(X_train, (X_train.shape[0], 64*64)) 
X_test = np.reshape(X_test, (X_test.shape[0], 64*64))

X_train, y_train = randomise_data(X_train, y_train)

X_train, y_train, X_dev, y_dev = split_data(X_train, y_train)

# Split dataset into batches
batch_size = 32
train_batches = batchify_data(X_train, y_train, batch_size, device)
dev_batches = batchify_data(X_dev, y_dev, batch_size, device)
test_batches = batchify_data(X_test, y_test, batch_size, device)


train_model(train_batches, dev_batches, mlp_model, nesterov=False, n_epochs = 11, lr=0.001)

## Evaluate the model on test data
loss, accuracy = run_epoch(test_batches, mlp_model.eval(), None)

print ("Loss on test set:"  + str(loss) + " Accuracy on test set: " + str(accuracy))



-------------
Epoch 1:



100%|██████████| 1350/1350 [00:24<00:00, 55.11it/s]


Train loss: 0.103581 | Train accuracy: 0.969931


100%|██████████| 150/150 [00:00<00:00, 495.73it/s]


Val loss:   0.072960 | Val accuracy:   0.978958
-------------
Epoch 2:



100%|██████████| 1350/1350 [00:21<00:00, 61.59it/s]


Train loss: 0.061847 | Train accuracy: 0.981366


100%|██████████| 150/150 [00:00<00:00, 472.03it/s]


Val loss:   0.056691 | Val accuracy:   0.982708
-------------
Epoch 3:



100%|██████████| 1350/1350 [00:21<00:00, 61.73it/s]


Train loss: 0.052422 | Train accuracy: 0.984514


100%|██████████| 150/150 [00:00<00:00, 449.31it/s]


Val loss:   0.048533 | Val accuracy:   0.985208
-------------
Epoch 4:



100%|██████████| 1350/1350 [00:21<00:00, 61.87it/s]


Train loss: 0.046373 | Train accuracy: 0.986458


100%|██████████| 150/150 [00:00<00:00, 434.15it/s]


Val loss:   0.043528 | Val accuracy:   0.986458
-------------
Epoch 5:



100%|██████████| 1350/1350 [00:21<00:00, 61.96it/s]


Train loss: 0.042030 | Train accuracy: 0.988264


100%|██████████| 150/150 [00:00<00:00, 427.98it/s]


Val loss:   0.040159 | Val accuracy:   0.986875
-------------
Epoch 6:



100%|██████████| 1350/1350 [00:21<00:00, 62.07it/s]


Train loss: 0.038711 | Train accuracy: 0.989282


100%|██████████| 150/150 [00:00<00:00, 406.22it/s]


Val loss:   0.037675 | Val accuracy:   0.987292
-------------
Epoch 7:



100%|██████████| 1350/1350 [00:21<00:00, 61.97it/s]


Train loss: 0.036034 | Train accuracy: 0.989676


100%|██████████| 150/150 [00:00<00:00, 384.61it/s]


Val loss:   0.035769 | Val accuracy:   0.987917
-------------
Epoch 8:



100%|██████████| 1350/1350 [00:21<00:00, 62.05it/s]


Train loss: 0.033770 | Train accuracy: 0.990255


100%|██████████| 150/150 [00:00<00:00, 408.87it/s]


Val loss:   0.034218 | Val accuracy:   0.989167
-------------
Epoch 9:



100%|██████████| 1350/1350 [00:21<00:00, 62.11it/s]


Train loss: 0.031793 | Train accuracy: 0.990718


100%|██████████| 150/150 [00:00<00:00, 405.33it/s]


Val loss:   0.032961 | Val accuracy:   0.989583
-------------
Epoch 10:



100%|██████████| 1350/1350 [00:21<00:00, 62.09it/s]


Train loss: 0.030019 | Train accuracy: 0.991250


100%|██████████| 150/150 [00:00<00:00, 403.75it/s]


Val loss:   0.031877 | Val accuracy:   0.989375
-------------
Epoch 11:



100%|██████████| 1350/1350 [00:21<00:00, 62.05it/s]


Train loss: 0.028399 | Train accuracy: 0.991852


100%|██████████| 150/150 [00:00<00:00, 389.09it/s]


Val loss:   0.030938 | Val accuracy:   0.989375


100%|██████████| 62/62 [00:00<00:00, 472.42it/s]

Loss on test set:0.049289979511917 Accuracy on test set: 0.9863911290322581
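The progress bars report 1350 training, 150 validation and 62 test batches. Those counts follow from the 90/10 cut in `split_data` and from `batchify_data` dropping any partial batch. The sketch below reproduces the arithmetic; `batch_counts` is a hypothetical helper, and the input sizes (48,000 augmented training arrays, i.e. 12,000 PNGs x 4 rotations, and 2,000 test PNGs) are assumed values consistent with the logged batch counts, not figures stated in the notebook:

```python
def batch_counts(n_train_total, n_test, batch_size=32):
    """Full batches produced by split_data + batchify_data (partial batches are dropped)."""
    n_train = 9 * n_train_total // 10  # same 90/10 cut as split_data
    n_dev = n_train_total - n_train
    return n_train // batch_size, n_dev // batch_size, n_test // batch_size

# Hypothetical sizes: 12,000 training PNGs x 4 rotations = 48,000 arrays, 2,000 test PNGs
print(batch_counts(48_000, 2_000))  # (1350, 150, 62)
```

Because the test arrays are concatenated class 0 first and then class 1, and are not shuffled before batching, any dropped test remainder (at most 31 images) comes entirely from class 1.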





# Convolutional neural network (CNN)

In [14]:
# note: the input size of the first Linear layer, 10816 = 64 channels * 13 * 13,
# follows from the conv/pool output shapes for a 1x64x64 input
conv_model = nn.Sequential(
              nn.Conv2d(1, 32, (4, 4)),
              nn.ReLU(),
              nn.MaxPool2d((2, 2)),
              nn.Conv2d(32, 64, (4,4)),
              nn.ReLU(),
              nn.MaxPool2d((2, 2)),
              nn.Flatten(),
              nn.Linear(10816, 128),
              nn.Dropout(p=0.5),
              nn.Linear(128, num_classes),
            )
conv_model = conv_model.to(device)
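Instead of finding the input size of the first `nn.Linear` by trial and error, it can be computed with the usual output-size rule for convolution and pooling layers, floor((n + 2p - k) / s) + 1 (dilation 1; `nn.MaxPool2d` defaults its stride to the kernel size). A short sketch tracing a 64x64 input through the layers above; `conv_out` is a hypothetical helper:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a Conv2d/MaxPool2d layer (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 64
size = conv_out(size, 4)            # Conv2d(1, 32, (4, 4))  -> 61
size = conv_out(size, 2, stride=2)  # MaxPool2d((2, 2))      -> 30
size = conv_out(size, 4)            # Conv2d(32, 64, (4, 4)) -> 27
size = conv_out(size, 2, stride=2)  # MaxPool2d((2, 2))      -> 13
flat_features = 64 * size * size    # channels * height * width
print(size, flat_features)          # 13 10816
```

The same function gives 24 for a 28x28 input and a 5x5 kernel, and shows that `padding=1` with a 3x3 kernel keeps the spatial size unchanged.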

Let's train and test the model above.

In [15]:
X_train, y_train, X_test, y_test = get_assignment_data()

# Reshape each sample back into a 1x64x64 (channels x height x width) image
X_train = np.reshape(X_train, (X_train.shape[0], 1, 64,64))
X_test = np.reshape(X_test, (X_test.shape[0], 1, 64,64))

X_train, y_train = randomise_data(X_train, y_train)

X_train, y_train, X_dev, y_dev = split_data(X_train, y_train)

# Split dataset into batches
batch_size = 32
train_batches = batchify_data(X_train, y_train, batch_size, device)
dev_batches = batchify_data(X_dev, y_dev, batch_size, device)
test_batches = batchify_data(X_test, y_test, batch_size, device)


train_model(train_batches, dev_batches, conv_model, nesterov=False, n_epochs = 11, lr=0.001)

## Evaluate the model on test data
loss, accuracy = run_epoch(test_batches, conv_model.eval(), None)

print ("Loss on test set:"  + str(loss) + " Accuracy on test set: " + str(accuracy))

-------------
Epoch 1:



100%|██████████| 1350/1350 [00:10<00:00, 133.41it/s]


Train loss: 0.155433 | Train accuracy: 0.953935


100%|██████████| 150/150 [00:00<00:00, 781.28it/s]


Val loss:   0.082938 | Val accuracy:   0.973542
-------------
Epoch 2:



100%|██████████| 1350/1350 [00:05<00:00, 231.01it/s]


Train loss: 0.073187 | Train accuracy: 0.975856


100%|██████████| 150/150 [00:00<00:00, 762.65it/s]


Val loss:   0.057298 | Val accuracy:   0.979792
-------------
Epoch 3:



100%|██████████| 1350/1350 [00:05<00:00, 230.85it/s]


Train loss: 0.059608 | Train accuracy: 0.980625


100%|██████████| 150/150 [00:00<00:00, 768.03it/s]


Val loss:   0.050369 | Val accuracy:   0.983125
-------------
Epoch 4:



100%|██████████| 1350/1350 [00:05<00:00, 234.71it/s]


Train loss: 0.052947 | Train accuracy: 0.982407


100%|██████████| 150/150 [00:00<00:00, 771.22it/s]


Val loss:   0.045223 | Val accuracy:   0.984375
-------------
Epoch 5:



100%|██████████| 1350/1350 [00:05<00:00, 232.92it/s]


Train loss: 0.048667 | Train accuracy: 0.983866


100%|██████████| 150/150 [00:00<00:00, 769.15it/s]


Val loss:   0.042626 | Val accuracy:   0.985417
-------------
Epoch 6:



100%|██████████| 1350/1350 [00:05<00:00, 232.82it/s]


Train loss: 0.044684 | Train accuracy: 0.985093


100%|██████████| 150/150 [00:00<00:00, 761.29it/s]


Val loss:   0.039787 | Val accuracy:   0.984375
-------------
Epoch 7:



100%|██████████| 1350/1350 [00:05<00:00, 233.29it/s]


Train loss: 0.041571 | Train accuracy: 0.986366


100%|██████████| 150/150 [00:00<00:00, 773.80it/s]


Val loss:   0.037825 | Val accuracy:   0.985417
-------------
Epoch 8:



100%|██████████| 1350/1350 [00:05<00:00, 234.03it/s]


Train loss: 0.039514 | Train accuracy: 0.986875


100%|██████████| 150/150 [00:00<00:00, 771.86it/s]


Val loss:   0.036532 | Val accuracy:   0.985833
-------------
Epoch 9:



100%|██████████| 1350/1350 [00:05<00:00, 233.81it/s]


Train loss: 0.037562 | Train accuracy: 0.987755


100%|██████████| 150/150 [00:00<00:00, 784.27it/s]


Val loss:   0.034623 | Val accuracy:   0.986458
-------------
Epoch 10:



100%|██████████| 1350/1350 [00:05<00:00, 234.47it/s]


Train loss: 0.034830 | Train accuracy: 0.988611


100%|██████████| 150/150 [00:00<00:00, 765.64it/s]


Val loss:   0.033634 | Val accuracy:   0.987083
-------------
Epoch 11:



100%|██████████| 1350/1350 [00:05<00:00, 235.96it/s]


Train loss: 0.033568 | Train accuracy: 0.988981


100%|██████████| 150/150 [00:00<00:00, 780.37it/s]


Val loss:   0.032505 | Val accuracy:   0.986458


100%|██████████| 62/62 [00:00<00:00, 820.27it/s]

Loss on test set:0.04001661799209673 Accuracy on test set: 0.9868951612903226
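Every test batch holds exactly 32 images, so the mean of per-batch accuracies computed by `run_epoch` equals the overall accuracy on the 62 x 32 = 1,984 test images that were scored. The reported accuracies can therefore be converted back into counts of correctly classified images:

```python
batch_size, n_batches = 32, 62
n_evaluated = batch_size * n_batches  # 1984 test images actually scored
reported = {"MLP": 0.9863911290322581, "CNN": 0.9868951612903226}
# With equal-sized batches, accuracy * n_evaluated recovers the number of correct predictions.
correct = {name: round(acc * n_evaluated) for name, acc in reported.items()}
print(correct)  # {'MLP': 1957, 'CNN': 1958}
```

On this test set, the difference between the two models amounts to a single image.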



