<h1 align = center>Linear Classifier with PyTorch</h1>

<h1>Table of Contents</h1>

<div font = 3>

1. <a href = #obj>Objectives</a>
2. <a href = #lib>Import Libraries and Auxiliary Functions</a>
3. <a href = #data>Download Data</a>
4. <a href = #class>Dataset Class</a>
5. <a href = #trans>Transform Object and Dataset Object</a>
6. <a href = #q>Question</a>

</div>

<br>

<hr>

<h2 id = obj>Objectives</h2>

<ul><li>Learn how to use a linear classifier in PyTorch.</li></ul>

Before you use a deep neural network to solve a classification problem, it's a good idea to try the simplest method first. You will need the dataset object from the previous section.
In this lab, we solve the problem with a linear classifier.


You will be asked to determine the maximum accuracy your linear classifier can achieve on the validation data over 5 epochs. We provide values for the free parameters; if you follow the instructions, you will be able to answer the quiz. As in the other labs, there are several steps, but in this lab you will only be quizzed on the final result.


<h2 id = lib>Import Libraries and Auxiliary Functions</h2>

The following are the libraries we are going to use for this lab:

In [1]:
import pandas as pd

import torch
from torch import nn
from torch import optim
from torch.utils.data import Dataset, DataLoader

import torchvision.transforms as transforms

from PIL import Image
import matplotlib.pyplot as plt

import os
import zipfile
import glob

from typing import Tuple,List,Dict,Union, Any

<h2 id = data>Download Data</h2>

In [16]:
# Extract the image dataset from the local zip archive
images_dir = os.path.join(os.getcwd(), '..', 'images')
# Create the target directory if it does not exist yet
os.makedirs(images_dir, exist_ok=True)
# Extract only if the images have not been extracted already
if not os.path.exists(os.path.join(images_dir, 'Negative')):
    with zipfile.ZipFile('concrete_crack_images_for_classification.zip', 'r') as f:
        f.extractall(images_dir)
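
The extraction step above can be wrapped in a small helper that is safe to re-run. The following is a minimal sketch rather than part of the lab: `extract_if_missing`, its parameters, and the throwaway demo archive are hypothetical, and it assumes the archive unpacks into top-level `Positive` and `Negative` folders, as the existence check above implies.

```python
import os
import tempfile
import zipfile


def extract_if_missing(zip_path: str, target_dir: str, marker: str = "Negative") -> bool:
    """Extract zip_path into target_dir unless target_dir/marker already exists.

    Returns True if files were extracted, False if the step was skipped.
    """
    os.makedirs(target_dir, exist_ok=True)  # no error if the folder already exists
    if os.path.exists(os.path.join(target_dir, marker)):
        return False
    with zipfile.ZipFile(zip_path, "r") as archive:
        archive.extractall(target_dir)
    return True


# Demonstration on a tiny throwaway archive (not the real dataset).
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("Negative/00001.jpg", b"placeholder")
        archive.writestr("Positive/00001.jpg", b"placeholder")

    images_dir = os.path.join(tmp, "images")
    first_run = extract_if_missing(demo_zip, images_dir)    # extracts
    second_run = extract_if_missing(demo_zip, images_dir)   # skipped
    extracted = sorted(os.listdir(images_dir))
```

Building paths with `os.path.join` instead of hard-coded backslashes keeps the cell portable across operating systems.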

<h2 id = class>Dataset Class</h2>

In this section, we reuse the earlier code to build a dataset class. As before, the even-indexed samples are positive and the odd-indexed samples are negative. If the parameter <code>train</code> is set to <code>True</code>, the first 10,000 samples are used as training data; otherwise, the last 10,000 samples are used as validation data. Do not forget to sort your files so that they are always in the same order.

**Note:** We use only the first 10,000 samples as training data, instead of the 30,000 available, to reduce training time. To train on all 30,000 samples, pass <code>portion=False</code> when creating the training dataset.

In [17]:
class Data(Dataset):
    # Constructor
    def __init__(self,transform = None,train:bool = True, portion:bool = True):
        directory = os.path.join(os.getcwd(),'..','images')
        positive = 'Positive'
        negative = 'Negative'

        # Paths and files
        positive_path = os.path.join(directory, positive)
        negative_path = os.path.join(directory, negative)
        positive_files = [os.path.join(positive_path,file) for file in  os.listdir(positive_path) if file.endswith(".jpg")]
        positive_files.sort()
        negative_files=[os.path.join(negative_path,file) for file in  os.listdir(negative_path) if file.endswith(".jpg")]
        negative_files.sort()

        self.number_of_samples = len(positive_files) + len(negative_files)

        # Attributes
        self.all_files = [None]*self.number_of_samples
        self.all_files[::2] = positive_files
        self.all_files[1::2] = negative_files 
        # The transform that will be applied to each image
        self.transform = transform
        # Labels as a torch.LongTensor: 1 = positive, 0 = negative
        self.Y = torch.zeros([self.number_of_samples]).type(torch.LongTensor)
        self.Y[::2] = 1
        self.Y[1::2] = 0
        # To use all the training data or not
        if portion:
            part = 10_000
        else:
            part = 30_000
        
        if train:
            self.all_files=self.all_files[0:part]  # the first `part` samples form the training set
            self.Y=self.Y[0:part]
            self.len=len(self.all_files)
        else:
            self.all_files=self.all_files[30000:]
            self.Y=self.Y[30000:]
            self.len=len(self.all_files)    
       
    # Get the length
    def __len__(self):
        return self.len
    
    # Getter
    def __getitem__(self, idx:int):
        image=Image.open(self.all_files[idx])
        y=self.Y[idx]
        # If there is any transform method, apply it onto the image
        if self.transform:
            image = self.transform(image)
        return image, y
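
The even/odd interleaving used in the constructor can be checked on toy data. This is a minimal sketch with made-up file names standing in for the sorted image paths; it uses plain lists instead of tensors so it runs without the dataset.

```python
# Toy file lists standing in for the sorted Positive/Negative image paths.
positive_files = ["pos_0.jpg", "pos_1.jpg", "pos_2.jpg"]
negative_files = ["neg_0.jpg", "neg_1.jpg", "neg_2.jpg"]

number_of_samples = len(positive_files) + len(negative_files)

# Even indices receive positive samples, odd indices receive negative samples.
all_files = [None] * number_of_samples
all_files[::2] = positive_files
all_files[1::2] = negative_files

# Labels follow the same pattern: 1 = positive, 0 = negative.
labels = [0] * number_of_samples
labels[::2] = [1] * len(positive_files)
labels[1::2] = [0] * len(negative_files)

print(all_files)
print(labels)
```

Extended-slice assignment requires both sides to have the same length, so this pattern only works when the two classes contain the same number of files.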

<h2 id = trans>Transform Object and Dataset Object</h2>

Create a transform object using the <code>Compose</code> function. First apply the transform <code>ToTensor()</code>, followed by <code>Normalize(mean, std)</code>. The values for <code>mean</code> and <code>std</code> are provided for you.

In [18]:
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize(mean, std)])
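
To see what the normalization step does to a single pixel, the arithmetic can be reproduced by hand. `ToTensor()` scales pixel values to the range [0, 1], and `Normalize` then applies `(value - mean) / std` per channel. The sketch below uses plain Python and a hypothetical helper `normalize_pixel`; it illustrates the formula and is not the torchvision implementation.

```python
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]


def normalize_pixel(rgb, mean, std):
    """Apply (value - mean) / std to each channel of one RGB pixel in [0, 1]."""
    return [(value - m) / s for value, m, s in zip(rgb, mean, std)]


black = normalize_pixel([0.0, 0.0, 0.0], mean, std)    # darkest possible pixel
white = normalize_pixel([1.0, 1.0, 1.0], mean, std)    # brightest possible pixel
at_mean = normalize_pixel(mean, mean, std)             # a pixel equal to the channel means

print(black)
print(white)
print(at_mean)
```

With these constants, normalized values fall roughly between -2.1 and 2.6, and a pixel equal to the channel means maps to zero.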

Create the dataset objects <code>dataset_train</code> for the training data and <code>dataset_val</code> for the validation data, passing the transform object so that the images are converted to tensors:

In [28]:
dataset_train = Data(transform=transform,train=True)
dataset_val = Data(transform=transform,train=False)

We can find the shape of the image:

In [39]:
dataset_train[0][0].shape

torch.Size([3, 227, 227])
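
A linear layer expects each sample as a flat vector, so an image of shape `[3, 227, 227]` has to be reshaped before it is passed to the model. The sketch below uses a dummy batch of zeros (an assumption for illustration, not real images) to show how the shape changes and where the input size 154587 comes from.

```python
import torch

# A dummy batch laid out like five transformed images:
# (batch, channels, height, width).
batch = torch.zeros(5, 3, 227, 227)

input_size = 3 * 227 * 227          # features per image after flattening
flat = batch.view(-1, input_size)   # same call pattern as in the training loop

print(input_size)   # 154587
print(flat.shape)   # torch.Size([5, 154587])
```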

<h2 id = q> Question </h2>

Create a custom module for Softmax with two classes and instantiate it as <code>model</code>. The input size should be the <code>size_of_image</code>, that is, <code>227*227*3</code> for the images above. Record the maximum accuracy achieved on the validation data across the epochs. For example, if the accuracies over the 5 epochs were 0.5, 0.2, 0.64, 0.77, and 0.66, you would select 0.77.

Train the model with the following free parameter values:

<b>Parameter Values</b>
<ul>
   <li>learning rate: 0.1</li>
   <li>momentum term: 0.1</li>
   <li>batch size training: 5</li>
   <li>loss function: Cross Entropy Loss</li>
   <li>epochs: 5</li>
   <li>seed: <code>torch.manual_seed(0)</code></li>
</ul>

Manual seed:

In [24]:
torch.manual_seed(0)

<torch._C.Generator at 0x1f3c71d4fd0>

<b>Custom Module:</b>

In [65]:
class Softmax(nn.Module):
    # Constructor
    def __init__(self,input_size:int = 227*227*3,output_size:int = 2):
        super(Softmax,self).__init__()
        self.linear = nn.Linear(input_size,output_size)
    def forward(self,x:torch.Tensor) -> torch.Tensor:
        out = self.linear(x)
        return out

Model Object:

In [66]:
model = Softmax()
model

Softmax(
  (linear): Linear(in_features=154587, out_features=2, bias=True)
)

Optimizer:

In [67]:
optimizer = optim.SGD(model.parameters(),lr = 0.1, momentum=0.1)
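
To build intuition for the `lr` and `momentum` arguments, the update can be written out for a single scalar parameter. This is a sketch of the rule documented for `optim.SGD` with default dampening and no Nesterov momentum (velocity = momentum * velocity + gradient, then parameter = parameter - lr * velocity); the function `sgd_momentum_step` and the toy objective are hypothetical.

```python
def sgd_momentum_step(param, grad, velocity, lr=0.1, momentum=0.1):
    """One scalar update: v <- momentum * v + grad; p <- p - lr * v."""
    velocity = momentum * velocity + grad
    param = param - lr * velocity
    return param, velocity


# Minimise f(w) = w**2, whose gradient is 2 * w, starting from w = 1.0.
w, v = 1.0, 0.0
trajectory = [w]
for _ in range(5):
    w, v = sgd_momentum_step(w, 2.0 * w, v)
    trajectory.append(w)

print(trajectory)
```

With a momentum of 0.1, only a small fraction of the previous step carries over, so the behaviour stays close to plain gradient descent.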

Criterion:

In [68]:
criterion = nn.CrossEntropyLoss()
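
`nn.CrossEntropyLoss` takes raw scores (logits) and applies log-softmax internally, which is why the `forward` method of the custom module returns the linear output without a softmax. The plain-Python sketch below spells out that computation for one sample; the logit values are made up for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]   # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability assigned to the correct class index."""
    return -math.log(softmax(logits)[target])


logits = [2.0, 0.0]            # hypothetical scores for classes [0, 1]
probs = softmax(logits)
loss_if_class0 = cross_entropy(logits, 0)   # small: the model favours class 0
loss_if_class1 = cross_entropy(logits, 1)   # large: the model is wrong
predicted = max(range(len(logits)), key=lambda i: logits[i])

print(probs, loss_if_class0, loss_if_class1, predicted)
```

Because softmax preserves the ordering of the scores, taking the largest logit gives the same prediction as taking the largest probability.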

Data Loader Training and Validation:

In [69]:
train_loader = DataLoader(dataset_train,batch_size=5)
val_loader = DataLoader(dataset_val,batch_size=5)
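
The batch size determines how many parameter updates happen per epoch. The short calculation below assumes 10,000 training samples, as stated earlier, and also shows what the first two label batches look like when `shuffle` is left at its default of `False` and the dataset alternates positive and negative samples.

```python
import math

num_train = 10_000     # training samples, as stated in the Dataset Class section
batch_size = 5
epochs = 5

batches_per_epoch = math.ceil(num_train / batch_size)
total_updates = batches_per_epoch * epochs

# Without shuffling, samples arrive in dataset order, so the alternating
# labels 1, 0, 1, 0, ... are simply cut into consecutive chunks of five.
labels = [1 if i % 2 == 0 else 0 for i in range(num_train)]
first_batch = labels[0:batch_size]
second_batch = labels[batch_size:2 * batch_size]

print(batches_per_epoch, total_updates)
print(first_batch, second_batch)
```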

In [70]:
def train(model:Softmax, criterion:nn.CrossEntropyLoss,train_loader:DataLoader,
          validation_loader:DataLoader, optimizer:optim.SGD,
          epochs:int = 100, verbose:int = 0) -> Dict:
    train_loss, val_acc = [],[]

    for epoch in range(epochs):
        # Training pass: accumulate the summed batch losses for this epoch
        model.train()
        total = 0
        for x,y in train_loader:
            optimizer.zero_grad()
            z = model(x.view(-1,227*227*3))
            loss = criterion(z,y)
            loss.backward()
            optimizer.step()
            total += loss.item()

        train_loss.append(total)

        # Validation pass: gradients are not needed to count correct predictions
        model.eval()
        correct = 0
        with torch.no_grad():
            for x,y in validation_loader:
                yhat = model(x.view(-1,227*227*3))
                _,label = torch.max(yhat,1)
                correct += (label==y).sum().item()
        # Use the loader's own dataset instead of the global dataset_val
        accuracy = 100 * (correct / len(validation_loader.dataset))

        val_acc.append(accuracy)

        # Print progress every `verbose` epochs (0 disables printing)
        if verbose != 0 and (epoch + 1) % verbose == 0:
            print(f'Epoch [{(epoch + 1):>4d}]: Cost = {total:>8.4f} | Accuracy = {accuracy:>8.4f}')

    hist = {'training_loss':train_loss,'validation_accuracy':val_acc}

    return hist

In [72]:
hist = train(model,criterion,train_loader,val_loader,optimizer,5,1)

Epoch [   1]: Cost = 1964074.4299 | Accuracy =  81.4200
Epoch [   2]: Cost = 1397367.7476 | Accuracy =  70.6700
Epoch [   3]: Cost = 1265354.7851 | Accuracy =  83.4000
Epoch [   4]: Cost = 1157045.0106 | Accuracy =  69.0800
Epoch [   5]: Cost = 1090768.4316 | Accuracy =  82.6100
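
The quantity asked for in the question is the largest validation accuracy across the epochs. The sketch below copies the accuracies from the printed run above into a list; with the dictionary returned by `train`, the same value would come from `max(hist['validation_accuracy'])`.

```python
# Validation accuracies (in percent) copied from the run printed above.
validation_accuracy = [81.42, 70.67, 83.40, 69.08, 82.61]

best_accuracy = max(validation_accuracy)
best_epoch = validation_accuracy.index(best_accuracy) + 1   # epochs are 1-based

print(f"Best validation accuracy: {best_accuracy:.2f}% (epoch {best_epoch})")
```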


We save the model in the project directory:

In [77]:
torch.save(model, os.path.join('.', 'saved_models', 'linearclassifier.pt'))
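
Saving the whole module pickles a reference to its class, so the class definition must be available when the file is loaded. A common alternative is to save only the parameters through `state_dict()`. The sketch below shows that round trip on a small stand-in `nn.Linear` model and a temporary file with a hypothetical name; the lab's `Softmax` module would be handled the same way.

```python
import os
import tempfile

import torch
from torch import nn

# A small stand-in model; the lab's Softmax module would be used the same way.
model_demo = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "linearclassifier_state.pt")  # hypothetical file name

    # Save only the learned parameters rather than the pickled module object.
    torch.save(model_demo.state_dict(), path)

    # Rebuild the architecture, then load the saved parameters into it.
    restored = nn.Linear(4, 2)
    restored.load_state_dict(torch.load(path))

same_weights = torch.equal(model_demo.weight, restored.weight)
same_bias = torch.equal(model_demo.bias, restored.bias)
print(same_weights, same_bias)
```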