### In this Notebook, I will start with the same flow as mentioned in the official textbook.

* How to train a pretrained model in Pytorch
* How to work with pytorch like tensor, numpy to tensor, named tensor, storage in pytorch etc.
* How to implement One-hot encoding in Pytorch.
* How to handle Time series data in Pytorch
* How to handle Images in Pytorch, changing dimension as required in pytorch module.
* How to perform Word to Index conversion
* How to split data between train and validation
* How to build model in Pytorch
* How to find derivative in pytorch
* How to move from training mode to evaluation mode
* How to train a network with help of Optimizer and Criterion (Loss Function).

## How to predict a single image on a Pre-trained Model - Resnet34

### Importing Libraries

Pytorch provides three set of libraries i.e. **torchvision, torchaudio, torchtext for Computer Vision, Audio and Text respectively.**

In [2]:
import torch
from torchvision import models
from torchvision import transforms
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
import os
import cv2
import torch.optim as optim
import torch.nn as nn

### Loading Pretrained Model

Using Models module from Torchvision, we can load all the pretrained models.
I am loading a resnet34 model with **pretrained=True**, which means i will be using weights of the model on which it was trained on. 
Resnet34 was trained on Imagenet dataset.

In [3]:
resnet = models.resnet34(pretrained=True)

### Creating Transforms

Transforms is cool feature in torchvision, because we can apply a list of **transforms/augmentation** on an image by just simply adding it as parameter in transforms module. We can also customize other transforms if required, if its not included in **torchvision.transforms**.

In [4]:
preprocess= transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean = [0.5, 0.5, 0.5],
        std = [0.2, 0.2, 0.2])
    ])

### Load your Single File

List of things happening in the below cell

* I am using Python Image Library (PIL) for loading an single image.
* Applying the transformation declared above.
* Checking out the shape of the image.
* Note the shape of the image, it should apply the transforms.centercrop() and resize the image.

In [10]:
img = Image.open('/home/mayur/Desktop/opencv_tut/Images/traffic.jpeg')
img_p = preprocess(img)
print(img_p.shape)

torch.Size([3, 224, 224])


### Model requires batch size as first dimension, so we reshape the image dimension.

Its quite a standard practice to keep the dimension of the batch size as first dimension.
In pytorch, we use **unsqueeze** to add an dimension to existing matrix.

In [11]:
batch_t = torch.unsqueeze(img_p, 0)
batch_t.shape

torch.Size([1, 3, 224, 224])

In pytorch, it is easy to switch between training phase and testing phase. 

**Model.eval(),** disables the batchnorm and dropout layers of a models. Similarly we have **model.train()** phase for training.

**Since we are predicting using pretrained model, we use model under eval mode. 
And in eval mode, the batchNorm and Dropout layers of the model will be disabled.
we are initializing the resnet34 model and predicting on one image under batch_t variable.
The out variable contains our predicted output over 1000 classes. Since resnet was trained on Imagenet, which has 1000 classes**

In [12]:
resnet.eval()
out = resnet(batch_t)
print(out.shape)

torch.Size([1, 1000])


### Loading Images Classes from txt file

Below, we are loading the class names of the classes in Imagenet.

In [14]:
with open('imagenet_class.txt') as f:
    classes = [line.strip().split(",")[1].strip() for line in f.readlines()]

### Finding Index of the max probability class

The variable **out** is a vector with 1000 elements with set of values providing weights to each class. Higher weight of the class results as predicted class of the image. Using max and dimension=1, we are fetching the index of the vector, where the weight is maximum among 1000 classes.

In [26]:
_, index = torch.max(out, 1)

In [27]:
index

tensor([920])

### Confidence of Prediction
**Softmax function is used in Multiclass classification, it squeezes the value/weight as mentioned above between 0 and 1.** 
So the all 1000 weights are squeezed between 0 to 1 and **all summing up to 1.** We further convert the class index into label of the class and present it as confidence percentage.

In [12]:
percentage = torch.nn.functional.softmax(out, dim=1)[0] * 100
classes[index[0]], percentage[index[0]].item()

('traffic_light', 99.99995422363281)

### Top 5 predictions

Similar to above code, only showing top five predictions of an image.

In [13]:
_, indices = torch.sort(out, descending=True)
[(classes[idx], percentage[idx].item()) for idx in indices[0][:5]]

[('traffic_light', 99.99995422363281),
 ('street_sign', 2.8018390366923995e-05),
 ('pole', 9.717282409837935e-06),
 ('loudspeaker', 2.9554805678344565e-06),
 ('binoculars', 1.4750306718269712e-06)]

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

### Basics of Pytorch

* Declare the matrix with 5x5 dimension with 3 channel.
* Creating a vector of size 3.

In [29]:
img_t = torch.randn(3, 5, 5) # shape [channels, rows, columns]
weights = torch.tensor([0.2126, 0.7152, 0.0722])
print(img_t.shape)
print(weights.shape)

torch.Size([3, 5, 5])
torch.Size([3])


* Creating Matrix with batch dimension in comparision with Image.

In [30]:
batch_t = torch.randn(2, 3, 5, 5) # shape [batch, channels, rows, columns]
batch_t.shape

torch.Size([2, 3, 5, 5])

* Finding mean of the matrix. Three channel are added and the average is found. Combining three channel to one channel i.e. gray scale image.

In [32]:
img_gray_naive = img_t.mean(-3)
batch_gray_naive = batch_t.mean(-3)
print(img_gray_naive.shape, batch_gray_naive.shape)

torch.Size([5, 5]) torch.Size([2, 5, 5])


* Adding dimension to a vector to convert it into matrix, since most of the calculation in deep learning involves weight matrix multiplication.
* Multiplying weights with one matrix.
* Multiplying weights with batch of matrix.

In [36]:
print(weights.shape)
unsqueezed_weights = weights.unsqueeze(-1).unsqueeze_(-1)
print(unsqueezed_weights.shape)
unsqueezed_weights

torch.Size([3])
torch.Size([3, 1, 1])


tensor([[[0.2126]],

        [[0.7152]],

        [[0.0722]]])

In [37]:
img_weights = (img_t * unsqueezed_weights)
img_weights.shape

torch.Size([3, 5, 5])

In [38]:
batch_weights = (batch_t * unsqueezed_weights)
batch_weights.shape

torch.Size([2, 3, 5, 5])

* Converting three channel matrix to one channel matrix.

In [39]:
img_gray_weighted = img_weights.sum(-3)

In [40]:
img_gray_weighted

tensor([[-0.3185,  0.4227,  0.5607,  0.3115, -1.0247],
        [-0.0632, -0.5625,  1.3939, -1.1633, -0.3991],
        [-1.1099, -0.1263, -0.4878,  0.1427, -0.5896],
        [-0.7000,  0.7684,  1.0612, -0.0851, -0.3613],
        [ 0.3097, -1.1336,  1.5068, -1.5912,  0.6566]])

In [41]:
batch_gray_weighted = batch_weights.sum(-3)

In [42]:
batch_gray_weighted.shape

torch.Size([2, 5, 5])

In [43]:
batch_weights.shape, batch_t.shape, unsqueezed_weights.shape

(torch.Size([2, 3, 5, 5]), torch.Size([2, 3, 5, 5]), torch.Size([3, 1, 1]))

### Fancy of doing the mean calculation as mentioned above.

In [31]:
img_gray_weighted_fancy = torch.einsum('...chw,c->...hw', img_t, weights)
batch_gray_weighted_fancy = torch.einsum('...chw,c->...hw', batch_t, weights)
batch_gray_weighted_fancy.shape

torch.Size([2, 5, 5])

In [32]:
img_gray_weighted_fancy

tensor([[-0.4142, -0.2200, -0.2374, -1.1039,  0.3578],
        [ 0.0676,  0.1268,  0.9222, -1.0375,  0.2899],
        [ 1.3845,  0.1224, -0.8231, -0.3893, -0.1400],
        [-0.0285,  0.7510,  0.7330, -0.2020, -1.3866],
        [-0.1159, -0.0665, -1.3296,  1.7022,  0.0150]])

### Named Tensor

* It is new feature in Pytorch.
* It assigns name to dimension and can make calculation using those names.

In [33]:
weights_named = torch.tensor([0.2126, 0.7152, 0.0722], names=['channels'])



In [34]:
weights_named

tensor([0.2126, 0.7152, 0.0722], names=('channels',))

**Below, we are assigning first dim as channel name, second dim as row and third dim as columns. It assigns name from right to left because the 
batch dimension is assigned as None.**

In [35]:
img_named = img_t.refine_names(..., 'channels', 'rows', 'columns')
batch_named = batch_t.refine_names(..., 'channels', 'rows', 'columns')
print("img named:", img_named.shape, img_named.names)
print("batch named:", batch_named.shape, batch_named.names)

img named: torch.Size([3, 5, 5]) ('channels', 'rows', 'columns')
batch named: torch.Size([2, 3, 5, 5]) (None, 'channels', 'rows', 'columns')


In [36]:
weights_aligned = weights_named.align_as(img_named)
weights_aligned.shape, weights_aligned.names

(torch.Size([3, 1, 1]), ('channels', 'rows', 'columns'))

In [37]:
gray_named = (img_named * weights_aligned).sum('channels')
gray_named.shape, gray_named.names

(torch.Size([5, 5]), ('rows', 'columns'))

In [38]:
#gray_named = (img_named[..., :3] * weights_named).sum('channels')

* Renaming the dimensions

In [39]:
gray_plain = gray_named.rename(None)
gray_plain.shape, gray_plain.names

(torch.Size([5, 5]), (None, None))

### Tensor Storage

Tensor Storage makes pytorch quite fast. It assigns block of continous memory for each tensor (matrix or vector) and whenever operation like dimension changes are done, it happens within the same memory without assigning to any other block of memory. checkout book for more details.

In [40]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points.storage()

 4.0
 1.0
 5.0
 3.0
 2.0
 1.0
[torch.FloatStorage of size 6]

In [41]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points_storage = points.storage()
points_storage[0] = 2.0
points

tensor([[2., 1.],
        [5., 3.],
        [2., 1.]])

### Inplace replacement using _

With _, we introduce what is called as inplace replacement of values.

In [42]:
a = torch.ones(3, 2)

In [43]:
a.zero_()

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])

### Playing with Storage

* Offset - tensor's offset in the underlying storage in terms of number of storage elements (not bytes).

In [45]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
second_point = points[1]
print(second_point.storage_offset())
print(points[1])

2
tensor([5., 3.])


* Stride is the jump necessary to go from one element to the next one in the
specified dimension :attr:`dim`. A tuple of all strides is returned when no
argument is passed in. Otherwise, an integer value is returned as the stride in
the particular dimension.

In [54]:
a = list(range(9))
a = torch.tensor(a)
a.size()
a.stride(dim=0)

1

In [55]:
a = a.view(3,3)

In [56]:
a.stride(dim=0)

3

### Check how tensor storage works, a and b are referencing the same memory block.

In [58]:
a.view(3,3)
b = a.view(3,3)

In [59]:
id(a.storage)==id(b.storage)

True

In [61]:
# Manipulating dimension
c = b[1:, 1:]
c

tensor([[4, 5],
        [7, 8]])

### Working with Images

In [62]:
import imageio
image_arr = imageio.imread('/home/mayur/Desktop/opencv_tut/Images/traffic.jpeg')
image_arr.shape

(259, 194, 3)

* Pytorch format for image C * H * W
* Convert from Numpy to torch.tensor
* Using Permute, aligning the dimension as required by pytorch. it moves dimension as mentioned in permute.

In [63]:
img_tensor = torch.from_numpy(image_arr)
img_tensor_chw = img_tensor.permute(2, 0, 1)
img_tensor_chw.shape

torch.Size([3, 259, 194])

### Including Batch 1 at different Position

In [64]:
print(img_tensor_chw.unsqueeze(0).shape)
print(img_tensor_chw.unsqueeze(1).shape)
print(img_tensor_chw.unsqueeze(-1).shape)

torch.Size([1, 3, 259, 194])
torch.Size([3, 1, 259, 194])
torch.Size([3, 259, 194, 1])


### Alternative Way to load images

Declaring Zero matrix with batch size.

In [65]:
batch_size = 3
batch = torch.zeros(batch_size, 3, 256, 256, dtype=torch.int16)

Sometimes images also have an alpha channel indicating transparency.

* Loading 25 Images from a dataset.
* Moving a set of 3 image into batch variable since batch size is 3.

In [69]:
data = '/home/mayur/Desktop/Kaggle Notebooks/Generative Dog Images/all-dogs'
filenames = [name for name in os.listdir(data) if os.path.splitext(name)[-1] == '.jpg']
pil_img_25 = [Image.open(os.path.join(data, f)) for f in filenames[:25]]
pil_transform = transforms.Compose([transforms.Resize((256, 256))])
f = lambda: [pil_transform(img) for img in pil_img_25]

In [70]:
for i, file in enumerate(f()):
    if i == 3:
        break
    file = np.asarray(file)
    file = torch.from_numpy(file).permute(2, 0, 1)
    batch[i] = file



In [71]:
batch.shape

torch.Size([3, 3, 256, 256])

### Normalization

while applying transforms, we included parameters like image mean and standard deviation. Here, we can calculate, how to find the 
image mean and standard deviation of image. **In general, we find image mean and standard deviation of the whole dataset, use that as our parameter in transforms.**

#### One way to normalize a grayscale image is image/=255.0

In [73]:
### Second way to Normalize
batch = batch.float()

n_channels = batch.shape[1]
for c in range(n_channels):
    mean = torch.mean(batch[:, c])
    std = torch.std(batch[:, c])
    batch[:, c] = (batch[:, c] - mean) / std

### One Hot Encoding in Torch

To handle categorical variables like class names or text feature etc, we use one hot encoding.
We create a zero matrix of 25 rows and then plug a value 1 at each index along the dimension mentioned in _scatter. Checkout docstring for more details.

In [80]:
target = torch.randint(1, 5, (25,))
print(target)

tensor([4, 2, 1, 2, 2, 2, 3, 1, 2, 2, 4, 2, 3, 3, 2, 2, 4, 3, 1, 3, 1, 2, 4, 2,
        4])


In [82]:
target_onehot = torch.zeros(target.shape[0], 5)
target_onehot.scatter_(1, target.unsqueeze(1), 1.0)

tensor([[0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])

* Preforming Normalization on data.
* Filtering records based on condition.

In [83]:
data = torch.randn(25, 10)
d_mean = torch.mean(data, dim=0)
d_var = torch.var(data, dim=0)
data_normalized = (data - d_mean) / torch.sqrt(d_var)
print(data_normalized.shape)

torch.Size([25, 10])


In [84]:
bad_index = target<=2
bad_data = data[bad_index]
bad_data.shape

torch.Size([15, 10])

### Working on Time Series using Bike Sharing dataset

* Loading dataset using Numpy
* Skip the column names
* Convert string to float

In [115]:
# 2 Years Data
bikes_numpy = np.loadtxt("/home/mayur/Desktop/Pytorch/data/hour-fixed.csv",
dtype=np.float32,
delimiter=",",
skiprows=1,
converters={1: lambda x: float(x[8:10])})

* Convert numpy to pytorch tensor.
* Check after how many elements the next(second) records starts from zero offset.

In [116]:
print(bikes_numpy.shape)
bikes = torch.from_numpy(bikes_numpy)
print(bikes.stride())

(17520, 17)
(17, 1)


* **Convert the hour based into days as 1st dim, 24 hours as 2nd dim and 17 features as 3rd dim. It is quite common to reshape the timeseries data to find seasonality or trends.**
* The original data is presented on hour bases.
* We have 2 years data, 730 days, each day has 24 hours and each hour represent 17 columns.
* We convert the hours into days, which becomes our records or rows. Each row or record is further segregated among hours our second dim.
* Each row in second dim contains 17 elements.
* Stride value (408, 17, 1), to jump to next day records i need to jump 408 elements.

In [117]:
daily_bikes = bikes.view(-1, 24, bikes.shape[1])
print(daily_bikes.shape)
print(daily_bikes.shape, daily_bikes.stride())

torch.Size([730, 24, 17])
torch.Size([730, 24, 17]) (408, 17, 1)



Stride : 24 * 17 = 408

N: 730 days

* Reshaping the data to make available for training purpose.
* Converting Class into One Hot encoding

In [118]:
daily_bikes = daily_bikes.transpose(1, 2)
print(daily_bikes.shape, daily_bikes.stride())
first_day = bikes[:24].long()
weather_onehot = torch.zeros(first_day.shape[0], 4)
first_day[:,9]

torch.Size([730, 17, 24]) (408, 1, 17)


tensor([1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 2, 2, 2, 2])

In [119]:
weather_onehot.scatter_(
dim=1,
index=first_day[:,9].unsqueeze(1).long() - 1,
value=1.0)

tensor([[1., 0., 0., 0.],
        [1., 0., 0., 0.],
        [1., 0., 0., 0.],
        [1., 0., 0., 0.],
        [1., 0., 0., 0.],
        [0., 1., 0., 0.],
        [1., 0., 0., 0.],
        [1., 0., 0., 0.],
        [1., 0., 0., 0.],
        [1., 0., 0., 0.],
        [1., 0., 0., 0.],
        [1., 0., 0., 0.],
        [1., 0., 0., 0.],
        [0., 1., 0., 0.],
        [0., 1., 0., 0.],
        [0., 1., 0., 0.],
        [0., 1., 0., 0.],
        [0., 1., 0., 0.],
        [0., 0., 1., 0.],
        [0., 0., 1., 0.],
        [0., 1., 0., 0.],
        [0., 1., 0., 0.],
        [0., 1., 0., 0.],
        [0., 1., 0., 0.]])

* Combining the one hot encoded matrix with feature matrix.
* Creating Zero Matrix for Class label.

In [120]:
torch.cat((bikes[:24], weather_onehot), 1)[:1]
daily_weather_onehot = torch.zeros(daily_bikes.shape[0], 4,
daily_bikes.shape[2])

In [121]:
daily_weather_onehot[0]
daily_weather_onehot.scatter_(
1, daily_bikes[:,9,:].long().unsqueeze(1) - 1, 1.0)
daily_weather_onehot.shape

torch.Size([730, 4, 24])

In [122]:
daily_weather_onehot[0]

tensor([[1., 1., 1., 1., 1., 0., 1., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1.,
         0., 0., 1., 1., 1., 1.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         1., 1., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0.]])

In [123]:
daily_bikes = torch.cat((daily_bikes, daily_weather_onehot), dim=1)
daily_bikes.shape

torch.Size([730, 21, 24])

In [124]:
daily_bikes[:, 9, :][0]

tensor([1., 1., 1., 1., 1., 2., 1., 1., 1., 1., 1., 1., 1., 2., 2., 2., 2., 2.,
        3., 3., 2., 2., 2., 2.])

In [105]:
daily_bikes[:, 9, :] = (daily_bikes[:, 9, :] - 1.0) / 3.0

In [126]:
daily_bikes[0].shape

torch.Size([21, 24])

### Keeping Temp values between [0, 1] with standardization

In [107]:
temp = daily_bikes[:, 10, :]
temp_min = torch.min(temp)
temp_max = torch.max(temp)
daily_bikes[:, 10, :] = ((daily_bikes[:, 10, :] - temp_min)
/ (temp_max - temp_min))

In [108]:
temp = daily_bikes[:, 10, :]
daily_bikes[:, 10, :] = ((daily_bikes[:, 10, :] - torch.mean(temp))
/ torch.std(temp))

### Working with Text using pytorch

### Character Level Conversion

There are 128 Ascii character, we convert our text/letter into index or integer. Letters not present in ASCII are turned into 0.

In [130]:
with open("/home/mayur/Desktop/Pytorch/data/anna.txt", encoding='utf-8') as f:
    text = f.read()

In [131]:
lines = text.split("\n")
line = lines[100]
letter_t = torch.zeros(len(line), 128)
letter_t.shape

torch.Size([29, 128])

In [132]:
for i, letter in enumerate(line.lower().strip()):
    letter_index = ord(letter) if ord(letter) < 128 else 0
    #print(letter_index)
    letter_t[i][letter_index] = 1

### Word Level Conversion

* Clean the text
* Sorting the text
* Mapping the words into integer

In [136]:
def clean_words(input_str):
    punctuation = '.,;:"!?”“_-'
    word_list = input_str.lower().replace('\n',' ').split()
    word_list = [word.strip(punctuation) for word in word_list]
    return word_list

In [137]:
words_in_line = clean_words(line)
line, words_in_line

('despair, and found no answer.', ['despair', 'and', 'found', 'no', 'answer'])

In [138]:
word_list = sorted(set(clean_words(text)))
word2index_dict = {word: i for (i, word) in enumerate(word_list)}
len(word2index_dict.keys())

15070

* Creating a Matrix where columns size is length of all words and each row is length of each document.

In [139]:
word_t = torch.zeros(len(words_in_line), len(word2index_dict))
word_t.shape

torch.Size([5, 15070])

In [140]:
for i, word in enumerate(words_in_line):
    word_index = word2index_dict[word]
    word_t[i][word_index] = 1
    print('{:2} {:4} {}'.format(i, word_index, word))

 0 3588 despair
 1  680 and
 2 5362 found
 3 8926 no
 4  732 answer


### How to create a Model

* Creating linear regression
* Mapping of x and y.
* Turing List into Tensor.
* Create a function for basic line equation
* Create a function to measure the loss using mean square error.

In [141]:
t_c = [0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0]
t_u = [35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4]
t_c = torch.tensor(t_c)
t_u = torch.tensor(t_u)

In [142]:
def model(t_u, w, b):
    return w*t_u + b

In [143]:
def loss(t_p, t_c):
    return torch.mean((t_p-t_c)**2)

In [144]:
w = torch.ones(())
b = torch.zeros(())

In [145]:
t_p = model(w, t_u, b)
Loss = loss(t_p, t_c)
Loss

tensor(1763.8846)

* Learning rate - delta
* Update weights w with delta.
* Measure Loss

In [146]:
delta = 0.1

loss_rate_of_change_w = \
(loss(model(t_u, w + delta, b), t_c) -
loss(model(t_u, w - delta, b), t_c)) / (2.0 * delta)

In [147]:
loss_rate_of_change_w

tensor(4517.2979)

In [148]:
#(loss(model(t_u, w + delta, b), t_c) - loss(model(t_u, w - delta, b), t_c)) / (2 * delta)

In [149]:
learning_rate = 1e-2

In [150]:
w = w - learning_rate * loss_rate_of_change_w

In [151]:
loss_rate_of_change_b = \
(loss(model(t_u, w, b + delta), t_c) -
loss(model(t_u, w, b - delta), t_c)) / (2.0 * delta)
b = b - learning_rate * loss_rate_of_change_b

### Computing Derivative

d loss / d w = (d loss / d t_p) * (d t_p / d w)

**grad_fn** - Updating the weight and bias using learning rate after calculating the loss.

In [152]:
def d_loss(t_p, t_c):
    dsq_diff = 2 * (t_p - t_c) / t_p.size(0)
    
    return dsq_diff

In [153]:
#### if d(w*t_u + b)/ dw = t_u + 0 = t_u

def d_model_dw(w, t_u, b):
    return t_u 

In [154]:
#### if d(w*t_u + b)/ db = 0 + 1.0 = 1.0

def d_model_db(w, t_u, b):
    return 1.0

In [155]:
def grad_fn(t_u, t_c, t_p, w, b):
    
    dloss_dtp = d_loss(t_p, t_c)
    dloss_dw = dloss_dtp * d_model_dw(t_u, w, b)
    dloss_db = dloss_dtp * d_model_db(t_u, w, b)
    

    return torch.stack([dloss_dw.sum(), dloss_db.sum()])

* Creating training loop, to iterate over the training data to learn weight by updating it based on the loss.

In [141]:
def training_loop(n_epochs, learning_rate, params, t_u, t_c):
    for epoch in range(1, n_epochs + 1):
        w, b = params
        t_p = model(t_u, w, b) #Forward pass
        
        Loss = loss(t_p, t_c)
        grad = grad_fn(t_u, t_c, t_p, w, b) #Backward pass
        
        params = params - learning_rate * grad
        if epoch % 10 == 1:
            print('Epoch %d, Loss %f' % (epoch, float(Loss)))
            print(f'params: {params}')
            print(f'grad: {grad}')
    return params

In [142]:
params = training_loop(
n_epochs = 100,
learning_rate = 1e-2,
params = torch.tensor([1.0, 0.0]),
t_u = t_u,
t_c = t_c)

Epoch 1, Loss 1763.884644
params: tensor([ 0.1740, -0.8260])
grad: tensor([82.6000, 82.6000])
Epoch 11, Loss 30.432922
params: tensor([ 0.2113, -0.6289])
grad: tensor([-0.1006, -0.4782])
Epoch 21, Loss 29.714087
params: tensor([ 0.2143, -0.6147])
grad: tensor([-0.0065, -0.0305])
Epoch 31, Loss 29.671791
params: tensor([ 0.2145, -0.6138])
grad: tensor([-0.0004, -0.0019])
Epoch 41, Loss 29.669165
params: tensor([ 0.2146, -0.6137])
grad: tensor([-2.5302e-05, -1.1790e-04])
Epoch 51, Loss 29.669001
params: tensor([ 0.2146, -0.6137])
grad: tensor([-1.8179e-06, -8.5831e-06])
Epoch 61, Loss 29.668993
params: tensor([ 0.2146, -0.6137])
grad: tensor([-2.9802e-07, -1.6689e-06])
Epoch 71, Loss 29.668993
params: tensor([ 0.2146, -0.6137])
grad: tensor([-2.9802e-07, -1.6689e-06])
Epoch 81, Loss 29.668993
params: tensor([ 0.2146, -0.6137])
grad: tensor([-2.9802e-07, -1.6689e-06])
Epoch 91, Loss 29.668993
params: tensor([ 0.2146, -0.6137])
grad: tensor([-2.9802e-07, -1.6689e-06])


### Autograd for derivation !

Pytorch provides a **.grad** characteristic to each tensor. If a tensor is created as **require_grad = True**, then that tensor turns into learnable parameter. We can check if the parameters are getting updated after executing loss.backward(). Loss.backward() tells pytorch to update the all learnable parameters to update weight based on loss. We can turn **.grad** to zero by **zero_()** because if we don't turn the **.grad to zero then grad values gets accumulated into .grad after each epoch.**

In [156]:
params = torch.tensor([1.0, 0.0],requires_grad=True)
params.grad is None

True

In [157]:
Loss = loss(model(t_u, *params), t_c)
Loss.backward()
params.grad

tensor([4517.2969,   82.6000])

In [158]:
if params.grad is not None:
    params.grad.zero_()
params.grad

tensor([0., 0.])

In [149]:
def training_loop_AG(n_epochs, learning_rate, params, t_u, t_c):
    
    for epoch in range(1, n_epochs+1):
        
        if params.grad is not None:
            params.grad.zero_()
            
        t_p = model(t_u, *params)
        Loss = loss(t_p, t_c)
        Loss.backward()
        
        with torch.no_grad():
            params -= learning_rate * params.grad
            
        if epoch % 100 == 0:
            print(f'epoch {epoch}, Loss: {Loss}')
            
    return params

In [150]:
t_un = 0.1 * t_u #scaling down

In [151]:
training_loop_AG(
n_epochs = 1000,
learning_rate = 1e-2,
params = torch.tensor([1.0, 0.0], requires_grad=True),
t_u = t_un,
t_c = t_c)

epoch 100, Loss: 22.148710250854492
epoch 200, Loss: 16.608064651489258
epoch 300, Loss: 12.664560317993164
epoch 400, Loss: 9.857802391052246
epoch 500, Loss: 7.8601155281066895
epoch 600, Loss: 6.438284397125244
epoch 700, Loss: 5.426309585571289
epoch 800, Loss: 4.706046104431152
epoch 900, Loss: 4.1934051513671875
epoch 1000, Loss: 3.828537940979004


tensor([  4.8021, -14.1031], requires_grad=True)

### Optimizer

Optimizer is the tool used to update the weights. Pytorch provides wide range of optimizer from SGD to ADAM & many more. Check docstring for more details.
* Iterate through EPOCHs.
* Training the model and find predicted value.
* Pass the predicted value to loss function to calculate the loss.
* Turn the existing parameter/weights to zero.
* calculate the weights using optimizer based on loss.
* Apply the weights update to parameters.

In [152]:
def training_loop_optim(n_epochs, optimizer, params, t_u, t_c):
    for epoch in range(1, n_epochs+1):
        
        t_p = model(t_u, *params)
        Loss = loss(t_p, t_c)
        optimizer.zero_grad()
        
        Loss.backward()
        optimizer.step()
        
        if epoch % 100 == 0:
            print(f'Epoch: {epoch}, Loss: {Loss}')
        
    return params

In [153]:
params = torch.tensor([1.0, 0.0], requires_grad=True)
learning_rate = 1e-2
optimizer = optim.SGD([params], lr=learning_rate)

In [154]:
training_loop_optim(n_epochs=500, optimizer=optimizer,params= params,  t_u = t_un,t_c= t_c)

Epoch: 100, Loss: 22.148710250854492
Epoch: 200, Loss: 16.608068466186523
Epoch: 300, Loss: 12.664565086364746
Epoch: 400, Loss: 9.857809066772461
Epoch: 500, Loss: 7.8601179122924805


tensor([ 4.0443, -9.8133], requires_grad=True)

### Splitting Dataset into Train and Validation

* we are using indexing to split the dataset.
* Not recommend approach for splitting dataset into train and validation.

In [155]:
n_samples = t_u.shape[0]
n_val = int(0.2 * n_samples)
shuffled_indices = torch.randperm(n_samples)
train_indices = shuffled_indices[:-n_val]
val_indices = shuffled_indices[-n_val:]
train_indices, val_indices

(tensor([ 0,  6, 10,  2,  3,  8,  5,  9,  1]), tensor([4, 7]))

In [156]:
train_t_u = t_u[train_indices]
train_t_c = t_c[train_indices]
val_t_u = t_u[val_indices]
val_t_c = t_c[val_indices]
train_t_un = 0.1 * train_t_u
val_t_un = 0.1 * val_t_u

In [157]:
def training_loop(n_epochs, optimizer, params, train_t_u, val_t_u, train_t_c, val_t_c):
    for epoch in range(1, n_epochs + 1):
        train_t_p = model(train_t_u, *params)
        train_loss = loss(train_t_p, train_t_c)
        val_t_p = model(val_t_u, *params)
        val_loss = loss(val_t_p, val_t_c)

        optimizer.zero_grad()
        train_loss.backward()
        optimizer.step()

        if epoch <= 3 or epoch % 100 == 0:
            print(f"Epoch {epoch}, Training loss {train_loss.item():.4f},"
                    f" Validation loss {val_loss.item():.4f}")
    return params

In [158]:
params = torch.tensor([1.0, 0.0], requires_grad=True)
learning_rate = 1e-2
optimizer = optim.SGD([params], lr=learning_rate)

In [159]:
training_loop(
n_epochs = 500,
optimizer = optimizer,
params = params,
train_t_u = train_t_un,
val_t_u = val_t_un,
train_t_c = train_t_c,
val_t_c = val_t_c)

Epoch 1, Training loss 90.7754, Validation loss 33.5146
Epoch 2, Training loss 33.7982, Validation loss 34.4394
Epoch 3, Training loss 27.0283, Validation loss 42.0913
Epoch 100, Training loss 21.0966, Validation loss 35.6010
Epoch 200, Training loss 17.0360, Validation loss 26.2173
Epoch 300, Training loss 13.8714, Validation loss 19.2201
Epoch 400, Training loss 11.4051, Validation loss 14.0459
Epoch 500, Training loss 9.4831, Validation loss 10.2596


tensor([ 3.9074, -8.6521], requires_grad=True)

In [160]:
def training_loop(n_epochs, optimizer, params, train_t_u, val_t_u,train_t_c, val_t_c):
    for epoch in range(1, n_epochs + 1):
        train_t_p = model(train_t_u, *params)
        train_loss = loss(train_t_p, train_t_c)
        
        with torch.no_grad():
            val_t_p = model(val_t_u, *params)
            val_loss = loss(val_t_p, val_t_c)
            assert val_loss.requires_grad == False

        optimizer.zero_grad()
        train_loss.backward()
        optimizer.step()
        
    return params

### Set Grad Enabled - Acts like switch to ON and OFF the autograd

In [161]:
def calc_forward(t_u, t_c, is_train):
    with torch.set_grad_enabled(is_train):
        t_p = model(t_u, *params)
        loss = loss_fn(t_p, t_c)
    return loss

### Model - w2 * t_u ** 2 + w1 * t_u + b

In [162]:
params = torch.randn(3,) 
params.requires_grad=True
criterion = torch.nn.MSELoss()
optimizer = optim.Adam([params], lr=1e-2)

In [163]:
def new_model(t_u, w1, w2, b):
    return w2 * t_u ** 2 + w1 * t_u + b

In [230]:
def train(n_epochs, optimizer, optimizer, train_t_u, val_t_u,train_t_c, val_t_c):
    for epoch in range(1, n_epochs + 1):
        w1, w2, b = params
        train_t_p = new_model(train_t_u, w1, w2, b)
        train_loss = criterion(train_t_p, train_t_c)
        
        with torch.no_grad():
            val_t_p = new_model(val_t_u, *params)
            val_loss = criterion(val_t_p, val_t_c)
            assert val_loss.requires_grad == False

        optimizer.zero_grad()
        train_loss.backward()
        optimizer.step()
        if epoch % 100==0:
            print(f'Epoch: {epoch}, Loss: {train_loss}')
            print(params)
        
    return params

SyntaxError: duplicate argument 'optimizer' in function definition (<ipython-input-230-73a45e2719d4>, line 4)

In [165]:
train(n_epochs = 500,
optimizer = optimizer,
params=params,
train_t_u = train_t_un,
val_t_u = val_t_un,
train_t_c = train_t_c,
val_t_c = val_t_c)

Epoch: 100, Loss: 4.886602401733398
tensor([ 0.4475,  0.3748, -1.6703], requires_grad=True)
Epoch: 200, Loss: 3.933260679244995
tensor([ 0.2843,  0.4087, -2.0471], requires_grad=True)
Epoch: 300, Loss: 3.2933266162872314
tensor([ 0.1325,  0.4411, -2.3996], requires_grad=True)
Epoch: 400, Loss: 2.9597949981689453
tensor([ 0.0151,  0.4665, -2.6855], requires_grad=True)
Epoch: 500, Loss: 2.8161251544952393
tensor([-0.0629,  0.4840, -2.8981], requires_grad=True)


tensor([-0.0629,  0.4840, -2.8981], requires_grad=True)

### Developing Neural Nets Using nn.Module

**Model with one sample with one feature**

In [159]:
x = torch.ones(1)
l_model = nn.Linear(1,1)
l_model(x)
print(l_model.weight)
print(l_model.bias)

Parameter containing:
tensor([[0.3783]], requires_grad=True)
Parameter containing:
tensor([0.7125], requires_grad=True)


In [160]:
x_add_d = x.unsqueeze(1)

In [161]:
x_add_d.shape

torch.Size([1, 1])

**Model which takes batch of samples with one feature**

In [162]:
x = torch.ones(10, 1)
x

tensor([[1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [1.]])

In [163]:
l_model(x)

tensor([[1.0908],
        [1.0908],
        [1.0908],
        [1.0908],
        [1.0908],
        [1.0908],
        [1.0908],
        [1.0908],
        [1.0908],
        [1.0908]], grad_fn=<AddmmBackward>)

### Building Neural Network with One Hidden Layer
* Converting list to tensor
* Pytorch has Sequence Module similar to Keras, We can add layers in sequence, in the order of operation.
* Finding Total number of parameters in the model.
* Assigning names to layers, it is helpful when you want to retrain particular layer.
* Training the network with optimizer and criterion function.

In [218]:
x = torch.tensor(t_u).unsqueeze(1)
y = torch.tensor(t_c).unsqueeze(1)

  """Entry point for launching an IPython kernel.
  


In [219]:
seq_model = nn.Sequential(
    nn.Linear(1, 10),
    nn.Tanh(),
    nn.Linear(10, 1)
    )

In [221]:
[p.shape for p in seq_model.parameters()]

[torch.Size([10, 1]), torch.Size([10]), torch.Size([1, 10]), torch.Size([1])]

In [222]:
for name, param in seq_model.named_parameters():
    print(name, param.shape)

0.weight torch.Size([10, 1])
0.bias torch.Size([10])
2.weight torch.Size([1, 10])
2.bias torch.Size([1])


In [233]:
from collections import OrderedDict
seq_model = nn.Sequential(OrderedDict([
('hidden_linear', nn.Linear(1, 9)),
('hidden_activation', nn.Tanh()),
('output_linear', nn.Linear(9, 1))
]))

In [234]:
seq_model

Sequential(
  (hidden_linear): Linear(in_features=1, out_features=9, bias=True)
  (hidden_activation): Tanh()
  (output_linear): Linear(in_features=9, out_features=1, bias=True)
)

In [235]:
for name, param in seq_model.named_parameters():
    print(name, param.shape)

hidden_linear.weight torch.Size([9, 1])
hidden_linear.bias torch.Size([9])
output_linear.weight torch.Size([1, 9])
output_linear.bias torch.Size([1])


In [236]:
seq_model.output_linear.bias

Parameter containing:
tensor([0.3329], requires_grad=True)

In [237]:
def training_loop(n_epochs, optimizer, model, loss_fn, t_u_train, t_u_val,
t_c_train, t_c_val):
    for epoch in range(1, n_epochs + 1):
        t_p_train = model(t_u_train)
        loss_train = loss_fn(t_p_train, t_c_train)
        t_p_val = model(t_u_val)
        loss_val = loss_fn(t_p_val, t_c_val)
        optimizer.zero_grad()
        loss_train.backward()
        optimizer.step()

        if epoch == 1 or epoch % 1000 == 0:
            print(f"Epoch {epoch}, Training loss {loss_train.item():.4f},"
            f" Validation loss {loss_val.item():.4f}")

In [242]:
optimizer = optim.SGD(seq_model.parameters(), lr=1e-3)

training_loop(
n_epochs = 5000,
optimizer = optimizer,
model = seq_model,
loss_fn = nn.MSELoss(),
t_u_train = train_t_un.unsqueeze(1),
t_u_val = val_t_un.unsqueeze(1),
t_c_train = train_t_c.unsqueeze(1),
t_c_val = val_t_c.unsqueeze(1))

print('output', seq_model(val_t_un.unsqueeze(1)))
print('answer', val_t_c.unsqueeze(1))
print('hidden', seq_model.hidden_linear.weight.grad)

Epoch 1, Training loss 1.8214, Validation loss 6.1820
Epoch 1000, Training loss 1.8137, Validation loss 6.3981
Epoch 2000, Training loss 1.8061, Validation loss 6.6104
Epoch 3000, Training loss 1.7986, Validation loss 6.8202
Epoch 4000, Training loss 1.7910, Validation loss 7.0274
Epoch 5000, Training loss 1.7834, Validation loss 7.2324
output tensor([[12.5741],
        [-0.5377]], grad_fn=<AddmmBackward>)
answer tensor([[11.],
        [-4.]])
hidden tensor([[ 2.8302e-03],
        [ 8.7560e-03],
        [-1.5450e-02],
        [ 8.4116e-03],
        [-9.8095e-03],
        [-2.1610e-03],
        [ 3.1948e-05],
        [ 1.0729e-04],
        [ 1.5020e-03]])
