
---



# **Logistic regression from scratch using PyTorch**


Logistic regression is a classification algorithm used to assign observations to a discrete set of classes. Unlike linear regression, which outputs continuous values, logistic regression transforms its output with the logistic sigmoid function to return a probability, which can then be mapped to two or more discrete classes.

In this notebook, we explore binary logistic regression. We turn the 10-class CIFAR-10 classification problem into a binary one by keeping only the images of the classes *'ship'* and *'car'*.

![Logistic regression image](https://drive.google.com/uc?id=1eRF1-2qnQYAkkCDpAwROj5MiMKzct2x0)


**Task 1. Generating the training dataset**

Since binary logistic regression separates two classes, we need a labelled training set that contains exactly two classes. Complete the function to obtain the data points corresponding to the labels *'ship'* and *'car'*.

Steps to build the new dataset:

1. Load the train and test sets of CIFAR-10 from torchvision, using a batch size of 1024.
2. Split the training samples 80:20 into a training set and a validation set, respectively.
3. Filter the datasets to keep only images of class 'ship' or 'car'. Their original CIFAR-10 labels are 8 and 1, respectively.
4. Relabel for the binary classification problem: 'ship' : 0 and 'car' : 1.
5. Define a [torch.utils.data.Dataset](https://pytorch.org/docs/stable/_modules/torch/utils/data/dataset.html) with the filtered image tensors and the new labels.
6. Define dataloaders for the training and validation datasets with batch_size 64.
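The steps above can be checked without downloading CIFAR-10. The sketch below is an illustration only: it builds a hypothetical stand-in dataset of random 3 x 32 x 32 tensors with CIFAR-style integer labels (`fake_images`, `fake_labels` and `LABEL_MAP` are made-up names), keeps the samples whose label is 8 (ship) or 1 (car), relabels them to 0 and 1, and then applies the 80:20 split and the batch size of 64.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader, random_split

torch.manual_seed(0)

# Hypothetical stand-in for CIFAR-10: 100 random images with labels 0..9
fake_images = torch.randn(100, 3, 32, 32)
fake_labels = torch.arange(100) % 10          # every class appears 10 times

# original CIFAR-10 label -> binary label (ship: 8 -> 0, car: 1 -> 1)
LABEL_MAP = {8: 0, 1: 1}

# keep only the ship and car samples, then relabel them
keep = [i for i, t in enumerate(fake_labels.tolist()) if t in LABEL_MAP]
binary_labels = torch.tensor([LABEL_MAP[fake_labels[i].item()] for i in keep])
binary_set = TensorDataset(fake_images[keep], binary_labels)

# 80:20 train/validation split
train_size = int(0.8 * len(binary_set))
val_size = len(binary_set) - train_size
train_part, val_part = random_split(binary_set, [train_size, val_size])

train_loader = DataLoader(train_part, batch_size=64, shuffle=True)
val_loader = DataLoader(val_part, batch_size=64)

print(len(binary_set), len(train_part), len(val_part))
```

With the real data, the 5,000 ship and 5,000 car training images become 8,000 training and 2,000 validation images after the split, so the training loader yields 125 full batches of 64.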



In [0]:
# Imports used throughout the notebook
import numpy as np
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader, random_split
from torchvision import datasets, transforms
import warnings
warnings.filterwarnings("ignore")



In [0]:
transform = transforms.Compose([
    transforms.ToTensor(),
    # ImageNet per-channel (R, G, B) mean and std, not CIFAR-10's own statistics
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
train = datasets.CIFAR10(root='data/cifar/', train=True, transform=transform, target_transform=None, download=True)
test = datasets.CIFAR10(root='data/cifar/', train=False, transform=transform, target_transform=None, download=True)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to data/cifar/cifar-10-python.tar.gz

Extracting data/cifar/cifar-10-python.tar.gz to data/cifar/
Files already downloaded and verified
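`ToTensor` converts an H x W x 3 image with values in [0, 255] into a 3 x H x W float tensor in [0, 1], and `Normalize` then computes `(x - mean) / std` separately for each channel. The sketch below reproduces that second step with plain tensor broadcasting on a made-up 3 x 2 x 2 input, to show which axis the per-channel statistics of the transform above apply to.

```python
import torch

# ImageNet per-channel statistics, as used in the transform above
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)   # (C, 1, 1) broadcasts over H and W
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

# hypothetical image already scaled to [0, 1], layout (C, H, W)
x = torch.full((3, 2, 2), 0.5)

x_norm = (x - mean) / std
print(x_norm[:, 0, 0])   # one normalized value per channel
```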


In [0]:

class TrainCIFARDataset(Dataset):
    """Ship/car subset of a CIFAR-10 split, relabelled to ship -> 0 and car -> 1."""

    # original CIFAR-10 label -> binary label
    LABEL_MAP = {8: 0, 1: 1}

    def __init__(self, data):
        self.samples = self.filter_dataset(data)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        # returns an (image tensor, binary label) pair
        return self.samples[idx]

    def filter_dataset(self, data):
        """Keep only ship and car images and replace their labels with 0 and 1."""
        # torchvision's CIFAR10 exposes the raw labels as data.targets, so each image
        # is loaded (and transformed) only once instead of twice per sample
        keep = [i for i, t in enumerate(data.targets) if t in self.LABEL_MAP]
        return [(data[i][0], self.LABEL_MAP[data.targets[i]]) for i in keep]


class TestCIFARDataset(TrainCIFARDataset):
    """Same filtering and relabelling, applied to the CIFAR-10 test split."""


In [0]:
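# filtered ship/car datasets
train_data = TrainCIFARDataset(train)
test_data = TestCIFARDataset(test)
# note: test_dataset holds a DataLoader, which later cells iterate over
test_dataset = DataLoader(test_data, batch_size=1024)

# 80:20 split of the filtered training set into training and validation sets
train_size = int(0.8 * len(train_data))
val_size = len(train_data) - train_size
train_dataset, val_dataset = random_split(train_data, [train_size, val_size],
                                          generator=torch.Generator().manual_seed(0))

# train_dataset and val_dataset are reassigned to DataLoaders with batch size 64
train_dataset, val_dataset = DataLoader(train_dataset, batch_size=64), DataLoader(val_dataset, batch_size=64)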

**Task 2. Logistic regression hypothesis**

To map predicted values to probabilities, logistic regression needs a function whose output lies between 0 and 1. The logistic function does exactly this: it maps any real value into the open interval (0, 1). In machine learning it is usually called the sigmoid function.

$f(x) = \frac{1}{1 + e^{-x}}$

![Image of logistic regression function](https://en.wikipedia.org/wiki/Logistic_function#/media/File:Logistic-curve.svg)

The following function returns the sigmoid of its input using PyTorch's built-in implementation.



In [0]:
def sigmoid(x):
    """Element-wise logistic sigmoid, 1 / (1 + exp(-x))."""
    # torch.sigmoid avoids building a new nn.Sigmoid module on every call;
    # an equivalent by-hand version is: return 1 / (1 + torch.exp(-x))
    return torch.sigmoid(x)
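Since the notebook title promises a from-scratch implementation, here is a sketch of the sigmoid written directly from the formula $f(x) = \frac{1}{1 + e^{-x}}$ (the name `sigmoid_manual` is hypothetical). It is checked against `torch.sigmoid` and against two properties of the logistic function: $f(0) = 0.5$ and $f(-x) = 1 - f(x)$.

```python
import torch

def sigmoid_manual(x):
    # direct translation of f(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + torch.exp(-x))

x = torch.linspace(-6.0, 6.0, steps=13)
print(sigmoid_manual(x))

# matches the built-in implementation
assert torch.allclose(sigmoid_manual(x), torch.sigmoid(x))
# f(0) = 0.5 and f(-x) = 1 - f(x), up to float32 rounding
assert torch.isclose(sigmoid_manual(torch.tensor(0.0)), torch.tensor(0.5))
assert torch.allclose(sigmoid_manual(-x), 1 - sigmoid_manual(x), atol=1e-6)
```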

**Task 3. Loss function**


A common loss function used when dealing with probabilities in binary classification is binary cross entropy loss.

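$cross\_entropy\_loss(y, \hat y) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat y_i + (1 - y_i) \log (1 - \hat y_i) \right]$

where $y_i \in \{0, 1\}$ is the true label and $\hat y_i$ the predicted probability of sample $i$. Binary cross entropy is the special case of cross entropy with two classes.

Read about cross entropy in this [link](https://en.wikipedia.org/wiki/Cross_entropy).

PyTorch already provides this loss as `nn.BCELoss`, so we do not have to implement it ourselves.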


In [0]:
def bce(y_true, y_pred):
    """Mean binary cross entropy between true labels and predicted probabilities."""
    loss = nn.BCELoss()  # called as loss(input, target), input = probabilities in [0, 1]
    # by hand: -torch.mean(y_true * torch.log(y_pred) + (1 - y_true) * torch.log(1 - y_pred))
    bce_loss = loss(y_pred, y_true)
    return bce_loss
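To see that the formula and `nn.BCELoss` agree, the sketch below evaluates the mean binary cross entropy by hand on a few made-up predictions (`bce_manual` is a hypothetical helper). For a single sample with $y = 1$ and $\hat y = 0.8$ the loss is $-\log 0.8 \approx 0.223$.

```python
import math
import torch
from torch import nn

def bce_manual(y_true, y_pred):
    # -(1/N) * sum_i [ y_i log(y_hat_i) + (1 - y_i) log(1 - y_hat_i) ]
    return -torch.mean(y_true * torch.log(y_pred) + (1 - y_true) * torch.log(1 - y_pred))

# made-up labels and probabilities, shape (1, N) like in the training loop
y_true = torch.tensor([[1.0, 0.0, 1.0, 0.0]])
y_pred = torch.tensor([[0.8, 0.1, 0.6, 0.3]])

manual = bce_manual(y_true, y_pred)
builtin = nn.BCELoss()(y_pred, y_true)   # note the (input, target) argument order
print(manual.item(), builtin.item())

# single sample: y = 1, y_hat = 0.8  ->  -log(0.8)
single = bce_manual(torch.tensor([1.0]), torch.tensor([0.8]))
print(single.item(), -math.log(0.8))
```

One practical difference: `nn.BCELoss` clamps its log outputs at -100, so it stays finite when a predicted probability is exactly 0 or 1, whereas the by-hand version returns `inf` or `nan` in that case.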


**Task 4. Gradient descent to minimize the loss** 

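The logistic regression weights need to be optimized to minimize the loss function.

Given an input vector **x**, the output of logistic regression (this model has no bias term) is:

$f(\mathbf{x}) = \frac{1}{1 + e^{-\mathbf{w}^\top \mathbf{x}}}$

The following function computes the gradient of the binary cross entropy loss with respect to the parameter **w**. It relies on autograd: `loss.backward()` returns `None` and stores the gradient in the `.grad` attribute of the weight tensor.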

In [0]:
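def gradient(loss):
    """Backpropagate `loss`; autograd accumulates d(loss)/dw into weight.grad."""
    # loss.backward() returns None: the gradient is written to the .grad attribute
    # of every leaf tensor created with requires_grad=True (here, the weight)
    loss.backward()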
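For the sigmoid combined with mean binary cross entropy, the gradient has a simple closed form, $\frac{\partial L}{\partial \mathbf{w}} = \frac{1}{N}\sum_{i=1}^{N} (\hat y_i - y_i)\,\mathbf{x}_i$ with $\hat y_i = \sigma(\mathbf{w}^\top \mathbf{x}_i)$. The sketch below checks this against what `loss.backward()` leaves in `w.grad`, using random made-up data with the same `(1, D)` weight layout as the notebook (D = 5 instead of 3072 to keep it small).

```python
import torch
from torch import nn

torch.manual_seed(0)

N, D = 8, 5
x = torch.randn(N, D)                        # N flattened inputs
y = torch.randint(0, 2, (1, N)).float()      # labels, shape (1, N) like y_true in fit()
w = torch.randn(1, D, requires_grad=True)    # weight row vector, as in fit()

y_hat = torch.sigmoid(torch.mm(w, x.t()))    # (1, N) predicted probabilities
loss = nn.BCELoss()(y_hat, y)
loss.backward()                              # autograd stores dL/dw in w.grad

# closed form: (1/N) * sum_i (y_hat_i - y_i) * x_i  ->  shape (1, D)
with torch.no_grad():
    grad_closed = torch.mm(y_hat - y, x) / N

print(w.grad)
print(grad_closed)
```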

**Task 5. Fitting the model** [5 points]

The function below fits a logistic regression model on the given data with the specified learning rate and number of epochs, using mini-batch stochastic gradient descent.

Follow the steps below to complete the function.
```
For each epoch:
  For each mini-batch:
    1. Flatten the images of the batch into vectors of length 3 * 32 * 32 = 3072.
    2. Compute the predicted probabilities for all samples in the batch (y_pred).
    3. Compute the mean loss of the batch using the function defined in Task 3.
    4. Compute the gradient of the loss w.r.t. the weight parameter using the function defined in Task 4.
    5. Keep track of the mean loss during each epoch.
    6. Update the weight parameter using stochastic gradient descent. The batch size is 64, according to the dataloaders defined in Task 1.
Return the optimized weight parameter and the loss.
```


In [0]:
torch.manual_seed(0)  # seeds torch.randn below (np.random.seed does not affect PyTorch's RNG)

def fit(data, epochs, learning_rate):
    """Train logistic regression (no bias term) with mini-batch SGD.

    Each input image is flattened to a 3 * 32 * 32 = 3072 vector x and multiplied
    by the weight row vector w to get a scalar z = w x, which the sigmoid maps to
    a probability. Returns the trained weight and the mean loss of the last epoch.
    """
    num_of_batches = len(data)
    # one weight per input feature, drawn from a standard normal distribution
    weight = torch.randn(1, 3072, requires_grad=True)
    for epoch in range(epochs):
        net_loss = 0.0
        # for each mini-batch
        for batch_x, batch_y in data:
            x_ = batch_x.view(batch_x.size(0), -1)   # (B, 3, 32, 32) -> (B, 3072)
            z = torch.mm(weight, x_.t())              # (1, B)
            y_pred = sigmoid(z)
            y_true = batch_y.view(1, -1).float()      # (1, B)
            loss = bce(y_true, y_pred)
            # .item() keeps a Python float, so the autograd graph of each batch is freed
            net_loss += loss.item()

            # gradient of the loss w.r.t. weight, stored in weight.grad
            gradient(loss)
            # one SGD step, done in place without being recorded by autograd
            with torch.no_grad():
                weight -= learning_rate * weight.grad
                weight.grad.zero_()

        print('epoch: %d net_loss: %6.3f' % (epoch, net_loss / num_of_batches))

    return weight, net_loss / num_of_batches
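Before spending minutes on CIFAR-10, the same training loop can be sanity-checked on a tiny problem where the answer is known. The sketch below is self-contained and entirely hypothetical: it generates two linearly separable Gaussian blobs (data, sizes, learning rate, epoch count and the zero initialization are all made up for the example), then runs the notebook's update rule (forward pass, `BCELoss`, `backward()`, in-place step under `torch.no_grad()`) with mini-batches of 64 and reports the training accuracy.

```python
import torch
from torch import nn
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)

# two hypothetical Gaussian blobs centred at (-2, -2) (label 0) and (2, 2) (label 1);
# they are separable by a line through the origin, so no bias term is needed
n = 256
x0 = torch.randn(n, 2) - 2
x1 = torch.randn(n, 2) + 2
x = torch.cat([x0, x1])
y = torch.cat([torch.zeros(n), torch.ones(n)]).long()
loader = DataLoader(TensorDataset(x, y), batch_size=64, shuffle=True)

weight = torch.zeros(1, 2, requires_grad=True)   # zero init, unlike randn in fit()
criterion = nn.BCELoss()
learning_rate = 0.1

for epoch in range(20):
    net_loss = 0.0
    for batch_x, batch_y in loader:
        y_pred = torch.sigmoid(torch.mm(weight, batch_x.t()))   # (1, B)
        loss = criterion(y_pred, batch_y.view(1, -1).float())
        net_loss += loss.item()
        loss.backward()
        with torch.no_grad():
            weight -= learning_rate * weight.grad
            weight.grad.zero_()

with torch.no_grad():
    predictions = (torch.sigmoid(torch.mm(weight, x.t())) >= 0.5).long().view(-1)
accuracy = (predictions == y).float().mean().item()
print('final mean loss: %.3f, training accuracy: %.3f' % (net_loss / len(loader), accuracy))
```

If a loop like this fails to separate such easy data, the bug is in the loop rather than in the dataset.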

**Task 6. Hyperparameter tuning** 

The learning rate and the number of epochs are hyperparameters that must be set before training.
Complete the function below to select the best hyperparameters from the given combinations; the selection criterion is accuracy on the validation set.

In [0]:
"""
we implemented predict() function before its task becuase we will 
use it in predicting the data in validation dataset in order to fine-tune the hyperparams 
"""
def predict(x, weight):
    z = torch.mm(weight, x.t())
    z = sigmoid(z)
    return z

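def get_validation_accuracy(weight):
    """Fraction of validation images classified correctly with a 0.5 threshold."""
    correct = 0.0
    n = 0
    with torch.no_grad():
        for image, label in val_dataset:
            n += label.shape[0]
            # flatten (B, 3, 32, 32) -> (B, 3072) before the matrix product in predict()
            image = image.view(image.size(0), -1)
            out = predict(image, weight) > 0.5           # (1, B) boolean predictions
            correct += (out == label).float().sum().item()
    return correct / n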


def select_best_hyperparams(data, learning_rates, epochs):
    """Return the [epochs, learning rate] pair with the highest validation accuracy."""
    best_val_acc = 0
    best_hyperparams = None

    # zip pairs the lists element-wise: (lr[0], epochs[0]), (lr[1], epochs[1]), ...
    for learning_rate, epoch in zip(learning_rates, epochs):
        torch.manual_seed(0)  # same initial weights for every combination
        weight, loss = fit(data, epoch, learning_rate)
        validation_acc = get_validation_accuracy(weight)
        if validation_acc > best_val_acc:
            print("changing best val acc from: {0} to {1}".format(best_val_acc, validation_acc))
            best_val_acc = validation_acc
            best_hyperparams = [epoch, learning_rate]

    return best_hyperparams, best_val_acc
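Because `zip` pairs the two lists element-wise, `select_best_hyperparams` trains only three models: (0.01, 25), (0.1, 50) and (1, 100). If every learning rate should be tried with every number of epochs, `itertools.product` builds the full grid instead, at the cost of nine training runs. The sketch below only compares the two candidate lists; it does not train anything.

```python
from itertools import product

learning_rates = [0.01, 0.1, 1]
epochs = [25, 50, 100]

paired = list(zip(learning_rates, epochs))     # what select_best_hyperparams iterates over
grid = list(product(learning_rates, epochs))   # every learning rate with every epoch count

print(len(paired), paired)
print(len(grid), grid)
```

Swapping `zip` for `product` in the loop header would be enough to search the full grid, since the loop body only needs one `(learning_rate, epoch)` pair per iteration.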

**Task 7. Training using the best hyperparameters** [0.5 point]

Complete the code below to select the best hyperparameter combination and then fit the training data using the selected learning rate and number of epochs.



In [0]:
# hyperparameter combinations (paired element-wise by select_best_hyperparams)

torch.manual_seed(0)  # to reproduce the results
learning_rates = [0.01, 0.1, 1]
epochs = [25, 50, 100]
# use the function defined in Task 6 to find the best combination from the lists above
best_hyperparams, best_val_acc = select_best_hyperparams(train_dataset, learning_rates, epochs)

print('Best hyperparameters using validation data.\nLearning rate: %5.3f, Number of epochs: %d' % (best_hyperparams[1], best_hyperparams[0]))

epoch: 0 net_loss: 13.891
epoch: 1 net_loss: 12.061
epoch: 2 net_loss: 10.455
epoch: 3 net_loss:  9.307
epoch: 4 net_loss:  8.513
epoch: 5 net_loss:  7.938
epoch: 6 net_loss:  7.515
epoch: 7 net_loss:  7.228
epoch: 8 net_loss:  6.942
epoch: 9 net_loss:  6.712
epoch: 10 net_loss:  6.520
epoch: 11 net_loss:  6.374
epoch: 12 net_loss:  6.267
epoch: 13 net_loss:  6.179
epoch: 14 net_loss:  6.046
epoch: 15 net_loss:  5.965
epoch: 16 net_loss:  5.880
epoch: 17 net_loss:  5.787
epoch: 18 net_loss:  5.696
epoch: 19 net_loss:  5.619
epoch: 20 net_loss:  5.544
epoch: 21 net_loss:  5.474
epoch: 22 net_loss:  5.411
epoch: 23 net_loss:  5.329
epoch: 24 net_loss:  5.241
epoch: 0 net_loss:  9.941
epoch: 1 net_loss:  6.750
epoch: 2 net_loss:  6.196
epoch: 3 net_loss:  5.875
epoch: 4 net_loss:  5.393
epoch: 5 net_loss:  5.137
epoch: 6 net_loss:  4.828
epoch: 7 net_loss:  4.480
epoch: 8 net_loss:  4.258
epoch: 9 net_loss:  4.032
epoch: 10 net_loss:  3.892
epoch: 11 net_loss:  3.696
epoch: 12 net_loss:  

**Task 8. Logistic regression threshold** 


Logistic regression takes an input and returns a value between 0 and 1, which we read as the probability that the input belongs to class 1 (car). To turn this probability into a class decision, we need a threshold; we use 0.5.

We predict class 1 (car) if f(x) is greater than or equal to 0.5; otherwise we predict class 0 (ship).

The cell below retrains the model with the selected hyperparameters, and the function after it predicts the class (ship or car) of a given input. [1 point]
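Concretely, with the labels used here (ship = 0, car = 1), the sigmoid output is the predicted probability of *car*. The sketch below thresholds a few made-up probabilities laid out as the `(1, B)` row that `torch.mm(weight, x.t())` produces.

```python
import torch

threshold = 0.5
# hypothetical sigmoid outputs for a batch of 5 images, shape (1, B)
probs = torch.tensor([[0.91, 0.50, 0.49, 0.07, 0.66]])

pred_class = (probs >= threshold).long()   # 1 = car, 0 = ship
print(pred_class)
```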

In [0]:
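threshold = 0.5
# retrain on the training split with the best [epochs, learning rate] found in Task 7
torch.manual_seed(0)
lr, num_epochs = best_hyperparams[1], best_hyperparams[0]
weight, loss = fit(train_dataset, num_epochs, lr)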



epoch: 0 net_loss:  9.312
epoch: 1 net_loss:  6.980
epoch: 2 net_loss:  6.266
epoch: 3 net_loss:  5.766
epoch: 4 net_loss:  5.301
epoch: 5 net_loss:  4.920
epoch: 6 net_loss:  4.584
epoch: 7 net_loss:  4.222
epoch: 8 net_loss:  3.962
epoch: 9 net_loss:  3.722
epoch: 10 net_loss:  3.538
epoch: 11 net_loss:  3.402
epoch: 12 net_loss:  3.342
epoch: 13 net_loss:  3.225
epoch: 14 net_loss:  3.164
epoch: 15 net_loss:  3.101
epoch: 16 net_loss:  3.075
epoch: 17 net_loss:  2.973
epoch: 18 net_loss:  2.890
epoch: 19 net_loss:  2.840
epoch: 20 net_loss:  2.688
epoch: 21 net_loss:  2.758
epoch: 22 net_loss:  2.722
epoch: 23 net_loss:  2.646
epoch: 24 net_loss:  2.590
epoch: 25 net_loss:  2.545
epoch: 26 net_loss:  2.502
epoch: 27 net_loss:  2.455
epoch: 28 net_loss:  2.435
epoch: 29 net_loss:  2.421
epoch: 30 net_loss:  2.340
epoch: 31 net_loss:  2.344
epoch: 32 net_loss:  2.288
epoch: 33 net_loss:  2.274
epoch: 34 net_loss:  2.290
epoch: 35 net_loss:  2.260
epoch: 36 net_loss:  2.248
epoch: 37 n

In [0]:
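def predict(x):
    """Predicted class for each row of x: 1 (car) if sigmoid(w x) >= threshold, else 0 (ship)."""
    # uses the `weight` and `threshold` defined in the previous cell
    with torch.no_grad():
        z = torch.mm(weight, x.t())
        return (sigmoid(z) >= threshold).long()   # shape (1, B)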

**Task 9. Reporting accuracy on the test set** [0.5 point]

The test set gives an indication of the model's ability to generalize, that is, how well it classifies data points it has never seen. The filtered test set is balanced between the two classes, so random guessing would score about 0.5.

The code below computes the accuracy of the logistic regression model on the test set. Each test image is first flattened into a vector of 3 x 32 x 32 = 3072 values and then classified with the trained model.
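One detail in the accuracy loop is the shape of the comparison. The model's predictions form a `(1, B)` row, while the DataLoader yields labels of shape `(B,)`; broadcasting aligns them element-wise into `(1, B)`. A `(B, 1)` column compared with `(B,)` labels would instead broadcast to a `(B, B)` matrix and silently produce a meaningless count. The sketch below shows both cases on made-up predictions.

```python
import torch

labels = torch.tensor([1, 0, 1, 1])              # shape (B,), as yielded by the DataLoader
row_pred = torch.tensor([[1, 0, 0, 1]])          # shape (1, B), as produced by torch.mm(weight, x.t())

correct = (row_pred == labels).float().sum()     # element-wise comparison, shape (1, B)
accuracy = correct / labels.shape[0]
print(accuracy.item())

# pitfall: a (B, 1) column broadcasts against (B,) into a (B, B) matrix
col_pred = row_pred.t()
print((col_pred == labels).shape)
```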


In [0]:

# compute the accuracy on the filtered (ship/car) test set
correct = 0.0
n = 0
with torch.no_grad():
    for image, label in test_dataset:            # test_dataset is the DataLoader from Task 1
        n += label.shape[0]
        image = image.view(image.size(0), 3 * 32 * 32)   # flatten to (B, 3072)
        out = predict(image) > 0.5                # (1, B) boolean predictions
        correct += (out == label).float().sum().item()
print(n)  # number of test images
acc = correct / n

print('Accuracy on the test set : %6.3f' % (acc))

2000
Accuracy on the test set :  0.754
