<a href="https://colab.research.google.com/github/VincentGaoHJ/Course-CS5242/blob/master/practice_2_linearLogisticRegression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Practice 2: PyTorch autograd - Linear and Logistic Regression

PyTorch's autograd feature is essential for deep learning: the framework uses it to automatically compute the gradients of a loss function with respect to all the parameters of a model.

In this tutorial, we will learn the PyTorch Autograd feature and use it to solve linear regression and logistic regression problems.
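As a first taste of what autograd does, here is a minimal sketch on a single scalar. The names `t` and `f` are illustrative only and are not used elsewhere in this notebook.

```python
import torch

# A scalar that autograd should track.
t = torch.tensor(3.0, requires_grad=True)

# Build a small computation: f(t) = t^2 + 2t.
f = t ** 2 + 2 * t

# backward() fills t.grad with df/dt = 2t + 2, evaluated at t = 3.
f.backward()
print(t.grad)  # tensor(8.)
```

The same mechanism is what computes the gradients of the regression losses in the questions that follow.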

NOTE: Throughout this notebook, perform all operations and instantiate all variables on a GPU. (Google Colab lets you create environments with a GPU.)

We start by importing PyTorch and NumPy.

In [3]:
import torch
import numpy as np

We need to set the device for the environment. GPUs are indexed starting from 0; since we have only one GPU here, we use index 0.


In [4]:
!nvidia-smi

Wed Jan 27 00:36:31 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   41C    P8     9W /  70W |      0MiB / 15079MiB |      0%      Default |
|                               |                      |                 ERR! |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [5]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device

device(type='cuda', index=0)
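To follow the note about keeping everything on the GPU, tensors can be created directly on the device or moved there afterwards. The sketch below is one way to do it; the names `x_dev`, `w_dev` and `v_dev` are hypothetical, and the code falls back to the CPU when no GPU is available.

```python
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Option 1: allocate directly on the chosen device.
x_dev = torch.randn(10, 3, device=device)

# Parameters created directly on the device stay leaf tensors,
# so their .grad attribute is populated by backward().
w_dev = torch.randn(3, device=device, requires_grad=True)

# Option 2: create on the CPU first, then move with .to(device).
v_dev = torch.randn(3).to(device)

print(x_dev.device, w_dev.device, v_dev.device)
print(w_dev.is_leaf)
```

For tensors with requires_grad switched on, creating them on the device (Option 1) is the safer choice, because moving such a tensor to a different device returns a new non-leaf tensor.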

**Q1. We consider the linear regression case, where we have data x, y and we want to find w,
b so that the data approximately follows y = x * w + b.**
1. Create a random normal tensor x of dimension (10, 3), where 10 denotes the
sample size and 3 denotes the feature size.
2. Create two randn tensors w_true, b_true of size 3 and 1 respectively.
Multiply w_true by 5 and b_true by -2.
3. Generate the tensor y_true using the equation y_true = x * w_true + b_true.

Note that w_true and b_true are not used for fitting from here on. They serve only to generate
the data for our regression problem and, at the end, to compare the fitted w and b
with the true values.

In [None]:
# Create a random normal tensor x having dimension (10, 3), where 10 denotes sample size and 3 denotes feature size
x = torch.randn(10, 3)
print("x:\n", x)

# Create two randn tensors w_true, b_true having size 3 and 1 respectively. Multiply w_true by 5 and b_true by -2
w_true = torch.randn(3) * 5
b_true = torch.randn(1) * -2
print("w_true:\n", w_true)
print("b_true:\n", b_true)

# Generate tensor y_true by using the equation y_true = x * w_true + b_true
y_true = torch.matmul(x, w_true) + b_true
print("y_true = x * w_true + b_true:\n", y_true)

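# The difference between torch.mm, torch.mul, torch.matmul
# torch.mm(a, b): 2-D matrix product only, (3, 4) * (4, 2) -> (3, 2)
# torch.mul(a, b): element-wise product with broadcasting, (3, 4) * (3, 4) -> (3, 4)
# torch.matmul(a, b): matrix product with batch broadcasting, (3, 4) * (5, 4, 2) -> (5, 3, 2)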

x:
 tensor([[ 0.1258,  2.0514, -0.3690],
        [ 0.7158,  1.0307,  0.6046],
        [ 0.8109,  0.7432,  0.4827],
        [-0.1428, -1.0949, -0.6134],
        [-0.8011,  0.1795,  0.0511],
        [-2.0741,  0.2468, -1.5186],
        [-1.1366, -1.4331,  0.1144],
        [-0.8764,  0.5175,  1.3571],
        [ 2.0504, -0.5699,  0.9009],
        [-0.5387, -0.5388,  0.8024]])
w_true:
 tensor([ 1.0543, -4.3083,  3.8480])
b_true:
 tensor([-3.0357])
y_true = x * w_true + b_true:
 tensor([-13.1610,  -4.3954,  -3.5253,  -0.8292,  -4.4570, -12.1294,   2.3804,
         -0.9673,   5.0481,   1.8052])
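The difference between the three product functions is easiest to see from the output shapes. The following is a small sketch with throwaway tensors (`a`, `m`, `c`, `v` are illustrative names, not part of the exercise):

```python
import torch

a = torch.randn(3, 4)
m = torch.randn(4, 2)
c = torch.randn(5, 4, 2)
v = torch.randn(4)

# torch.mm: strict 2-D matrix product.
print(torch.mm(a, m).shape)      # torch.Size([3, 2])

# torch.matmul: matrix product that also broadcasts over batch dimensions.
print(torch.matmul(a, c).shape)  # torch.Size([5, 3, 2])

# torch.matmul with a 1-D second argument is a matrix-vector product,
# which is the case used for x * w in this notebook.
print(torch.matmul(a, v).shape)  # torch.Size([3])

# torch.mul: element-wise product.
print(torch.mul(a, a).shape)     # torch.Size([3, 4])
```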


**Q2. Now define two randn tensors w, b of the same size as w_true and b_true, but with
[requires_grad](https://pytorch.org/docs/stable/notes/autograd.html) switched on. Calculate a tensor y = x * w + b and check whether this y
has requires_grad switched on.**

In [None]:
# Define two randn tensors w, b of the same size as w_true and b_true, with requires_grad switched on
w = torch.randn(3, requires_grad=True)
b = torch.randn(1, requires_grad=True)
print("w:\n", w)
print("b:\n", b)

# Calculate a tensor y = x * w + b
y = torch.matmul(x, w) + b
print("y = x * w + b:\n", y)

# check whether this y has requires_grad switched on.
print("Check requires_grad: ", y.requires_grad)

w:
 tensor([0.1856, 1.8790, 0.1220], requires_grad=True)
b:
 tensor([-1.1403], requires_grad=True)
y = x * w + b:
 tensor([ 2.6926,  1.0029,  0.4654, -3.2989, -0.9454, -1.2467, -4.0301, -0.1650,
        -1.7208, -2.1548], grad_fn=<AddBackward0>)
Check requires_grad:  True
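The reason y has requires_grad switched on is that the flag propagates: any result computed from a tensor that requires gradients also requires them. A minimal sketch of this behaviour, re-creating small tensors so that it runs on its own (`y_no_grad` is a hypothetical name):

```python
import torch

x = torch.randn(10, 3)                   # data: no gradient needed
w = torch.randn(3, requires_grad=True)   # parameter: leaf tensor
b = torch.randn(1, requires_grad=True)   # parameter: leaf tensor

y = torch.matmul(x, w) + b
print(x.requires_grad, y.requires_grad)  # False True
print(w.is_leaf, y.is_leaf)              # True False

# Inside torch.no_grad() nothing is recorded, so the result is untracked.
with torch.no_grad():
    y_no_grad = torch.matmul(x, w) + b
print(y_no_grad.requires_grad)           # False
```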


**Q3. Define a loss function that takes w and b as arguments and returns the
squared error $\sum(y-y_{true})^2$, where y = x * w + b.**

In [8]:
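def loss_func(output, target):
    # Takes the prediction y = x * w + b (output) and y_true (target) rather than
    # w and b directly; the loss still depends on w and b through output.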
    loss = torch.sum((output - target)**2)
    return loss
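For this loss the gradients can also be written by hand: with residual $r = xw + b - y_{true}$, we have $\partial L/\partial w = 2x^\top r$ and $\partial L/\partial b = 2\sum r$. The sketch below checks these formulas against autograd on freshly generated data; the seed and the names `residual`, `grad_w_manual` and `grad_b_manual` are assumptions made for illustration.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed so the check is repeatable

x = torch.randn(10, 3)
y_true = torch.randn(10)
w = torch.randn(3, requires_grad=True)
b = torch.randn(1, requires_grad=True)

def loss_func(output, target):
    return torch.sum((output - target) ** 2)

# Gradients from autograd.
loss = loss_func(torch.matmul(x, w) + b, y_true)
loss.backward()

# Gradients from the closed-form expressions.
with torch.no_grad():
    residual = torch.matmul(x, w) + b - y_true
    grad_w_manual = 2 * torch.matmul(x.T, residual)
    grad_b_manual = 2 * residual.sum().reshape(1)

print(torch.allclose(w.grad, grad_w_manual, atol=1e-4))
print(torch.allclose(b.grad, grad_b_manual, atol=1e-4))
```

Both comparisons are expected to agree up to floating-point rounding.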

**Q4. Create a notebook cell with the following tasks**
1. Call the loss function and check whether the calculated loss has requires_grad
switched on.
2. Call .backward() on the loss. Check the gradients of w and b after this.
3. At this point, repeatedly execute this cell and print the values of loss, w, b and the
gradients of w and b. Notice that the gradients of w and b change on every run (each backward
call adds to the stored gradients), while loss, y, w, b do not change.
4. Now manually set the gradients of w and b to zero, re-execute this whole cell
multiple times, and check the gradients.

In [None]:
# Call the loss function and check whether the calculated loss has requires_grad switched on
loss = loss_func(y, y_true)
print("loss:\n", loss)
print("Check loss requires_grad: ", loss.requires_grad)

def run_backward(reset_grad=False):

    # manually set the gradients of w and b to zero
    if reset_grad:
        w.grad.zero_()
        b.grad.zero_()
    
    # Call .backward() for the loss. retain_graph=True keeps the graph alive
    # so that backward() can be called again on the same loss.
    loss.backward(retain_graph=True)

    # Check w and b
    print("w: ", w)
    print("b: ", b)

    # Check gradients of w and b
    print("Gradients of w: ", w.grad)
    print("Gradients of b: ", b.grad)

run_backward()
# At this point repeatedly execute this cell, print values of loss, w, b and also the gradients of w and b.
# Notice the changes in the gradient value of w, b and no change in loss, y, w, b.
for _ in range(4):
    print('Execute'.center(60, '='))
    run_backward()

# Now manually set the gradients of w and b to zero
# re-execute this whole cell multiple times and check the gradients
for _ in range(4):
    print('Execute with zero gradients'.center(60, '='))
    run_backward(reset_grad=True)

loss:
 tensor(536.5044, grad_fn=<SumBackward0>)
Check loss requires_grad:  True
w:  tensor([0.1856, 1.8790, 0.1220], requires_grad=True)
b:  tensor([-1.1403], requires_grad=True)
Gradients of w:  tensor([-42.2015, 125.3335, -48.8236])
Gradients of b:  tensor([41.6599])
w:  tensor([0.1856, 1.8790, 0.1220], requires_grad=True)
b:  tensor([-1.1403], requires_grad=True)
Gradients of w:  tensor([-84.4031, 250.6670, -97.6473])
Gradients of b:  tensor([83.3198])
w:  tensor([0.1856, 1.8790, 0.1220], requires_grad=True)
b:  tensor([-1.1403], requires_grad=True)
Gradients of w:  tensor([-126.6046,  376.0005, -146.4709])
Gradients of b:  tensor([124.9797])
w:  tensor([0.1856, 1.8790, 0.1220], requires_grad=True)
b:  tensor([-1.1403], requires_grad=True)
Gradients of w:  tensor([-168.8061,  501.3340, -195.2946])
Gradients of b:  tensor([166.6396])
w:  tensor([0.1856, 1.8790, 0.1220], requires_grad=True)
b:  tensor([-1.1403], requires_grad=True)
Gradients of w:  tensor([-211.0077,  626.6674, -244.1
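
The growing values in the output above come from gradient accumulation: every backward call adds to .grad instead of overwriting it. A compact sketch of the same effect on a two-element tensor (the names `first`, `second` and `after_reset` are illustrative):

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
loss = torch.sum(w ** 2)          # d(loss)/dw = 2w = [2, 4]

loss.backward(retain_graph=True)
first = w.grad.clone()
print(first)                      # tensor([2., 4.])

# A second backward call adds to the stored gradient.
loss.backward(retain_graph=True)
second = w.grad.clone()
print(second)                     # tensor([4., 8.])

# Zeroing the gradient first restores the single-call value.
w.grad.zero_()
loss.backward()
after_reset = w.grad.clone()
print(after_reset)                # tensor([2., 4.])
```

This is why a training loop has to reset the gradients after each parameter update.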

**Q5. Linear Regression with shallow networks. Tie all the operations together in a different cell with the following steps**
1. Re-initialize w and b with random numbers.
2. Calculate the loss involving the unknown parameters w, b.
3. Calculate the gradients by calling backward on the loss.
4. Use the gradients to update the values of the parameters (gradient descent update; keep the learning rate at 0.01).
5. Set the gradients to zero.
6. Go to step 2 until convergence (the value of the loss is less than the tolerance; set the tolerance to
1e-5).
7. Check whether the values of w and b are close to the true values, i.e. w_true and b_true.

In [16]:
class LinearRegression():
    def __init__(self):
        self.w = torch.randn(3, requires_grad=True)
        self.b = torch.randn(1, requires_grad=True)
        self.loss = float("inf")
        self.tolerance = 1e-5
        self.iter_num = 0

    def train(self, x_train, y_true):
        print('Start Training'.center(50,'='))
        while self.loss >= self.tolerance:
            self.iteration(x_train, y_true)
            self.iter_num += 1
            if self.iter_num % 10 == 0:
                print(f"iter {self.iter_num} loss: {self.loss}")
            # Safety cap in case the loss never drops below the tolerance
            if self.iter_num >= 1000:
                break
        print('Training Complete'.center(50,'='))

    def iteration(self, x_train, y_true):
        y_pred = self.predict(x_train)
        loss = loss_func(y_pred, y_true) # Calculate loss
        loss.backward()
        self.loss = loss.item() # Store a plain float, not a tensor with a graph
        self.update_parameters()

    def predict(self, inputs):
        y_pred = torch.matmul(inputs, self.w) + self.b
        return y_pred

    def update_parameters(self, learning_rate=0.01):
        # Without torch.no_grad(), the in-place update of a leaf tensor that requires
        # grad raises a RuntimeError; the update itself must not be tracked by autograd.
        with torch.no_grad():
            # Update values of parameters
            self.w -= self.w.grad * learning_rate
            self.b -= self.b.grad * learning_rate
            # Set gradients to zero
            self.w.grad.zero_()
            self.b.grad.zero_()

In [17]:
# Data Generation
x = torch.randn(10, 3) # Create a random normal tensor x
w_true = torch.randn(3) * 5 # Create two randn tensors w_true, b_true
b_true = torch.randn(1) * -2 # Multiply b_true by -2, as in Q1
y_true = torch.matmul(x, w_true) + b_true # Generate tensor y_true

# Model training
lr_model = LinearRegression()
lr_model.train(x_train=x, y_true=y_true)
y_pred = lr_model.predict(x)
loss = loss_func(y_pred, y_true)
print(f'The Final Loss is: {loss.item()}')

iter 10 loss: 65.78162384033203
iter 20 loss: 11.58525562286377
iter 30 loss: 2.0746517181396484
iter 40 loss: 0.3730771541595459
iter 50 loss: 0.06722046434879303
iter 60 loss: 0.012130334042012691
iter 70 loss: 0.002191745676100254
iter 80 loss: 0.0003965335781686008
iter 90 loss: 7.179342355811968e-05
iter 100 loss: 1.3017619494348764e-05
The Final Loss is: 7.807415386196226e-06
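
Step 7 of Q5 asks for a comparison between the fitted parameters and w_true, b_true. The following self-contained sketch repeats the training with a plain loop instead of the class above and then prints the largest absolute differences. The fixed seed, the iteration cap of 5000 and the names `w_err` and `b_err` are assumptions made for this example.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed so the run is repeatable

# Synthetic data, generated as described in Q1.
x = torch.randn(10, 3)
w_true = torch.randn(3) * 5
b_true = torch.randn(1) * -2
y_true = torch.matmul(x, w_true) + b_true

# Parameters to fit.
w = torch.randn(3, requires_grad=True)
b = torch.randn(1, requires_grad=True)

learning_rate, tolerance, max_iter = 0.01, 1e-5, 5000
for _ in range(max_iter):
    loss = torch.sum((torch.matmul(x, w) + b - y_true) ** 2)
    if loss.item() < tolerance:
        break
    loss.backward()
    with torch.no_grad():
        w -= learning_rate * w.grad
        b -= learning_rate * b.grad
        w.grad.zero_()
        b.grad.zero_()

# Step 7: compare the fitted parameters with the true ones.
w_err = (w.detach() - w_true).abs().max().item()
b_err = (b.detach() - b_true).abs().max().item()
print(f"max |w - w_true| = {w_err:.4f}, |b - b_true| = {b_err:.4f}")
```

Because y_true is an exact linear function of x, a loss close to zero should go together with small differences here.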


**Q6. Now we will change the problem to a Logistic Regression Problem**
1. Change y_true to be 1 if y_true > 0 else 0, make it a float tensor and make sure it is
on the device.
2. Define a new loss function that applies a sigmoid transformation to the prediction y to turn it into a
probability. Change the loss to the cross-entropy loss for binary classification with probability y.
3. Repeat the steps of Q5 for this problem and check the final values of w and b. Make the
stopping criterion loss value <= 0.05. Keep the learning rate the same as before.
4. You might notice that you do not recover the w_true and b_true values.
Took 13383 iterations
tensor([ -6.7814,   0.7458, -10.0162], device='cuda:0', requires_grad=True) tensor([0.2812], device='cuda:0', requires_grad=True)
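
The cell that produced the output above is not included here. As a stand-in, the following is a minimal sketch of how Q6 could be approached, not the original solution. It assumes a mean-reduced binary cross-entropy, a fixed seed, and a cap of 50000 iterations; `bce_loss`, `labels`, `eps` and `initial_loss` are hypothetical names, and the iteration count will differ from the one shown above.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed so the run is repeatable
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Data as in Q1, then thresholded into 0/1 float labels on the device.
x = torch.randn(10, 3, device=device)
w_true = torch.randn(3, device=device) * 5
b_true = torch.randn(1, device=device) * -2
labels = (torch.matmul(x, w_true) + b_true > 0).float()

def bce_loss(logits, target, eps=1e-7):
    # Sigmoid turns the linear prediction into a probability; the clamp
    # keeps log() away from zero.
    p = torch.sigmoid(logits).clamp(eps, 1 - eps)
    return -(target * torch.log(p) + (1 - target) * torch.log(1 - p)).mean()

w = torch.randn(3, device=device, requires_grad=True)
b = torch.randn(1, device=device, requires_grad=True)
learning_rate, tolerance, max_iter = 0.01, 0.05, 50_000

initial_loss = bce_loss(torch.matmul(x, w) + b, labels).item()
for it in range(1, max_iter + 1):
    loss = bce_loss(torch.matmul(x, w) + b, labels)
    if loss.item() <= tolerance:
        break
    loss.backward()
    with torch.no_grad():
        w -= learning_rate * w.grad
        b -= learning_rate * b.grad
        w.grad.zero_()
        b.grad.zero_()

print(f"Stopped after {it} iterations, loss = {loss.item():.4f}")
print(w, b)
```

The fitted w and b are not expected to match w_true and b_true: the labels depend only on the sign of x * w_true + b_true, so any positive rescaling of the true parameters, and many other separating hyperplanes, produce exactly the same labels.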


In [None]:
b