
In [1]:
import torch

In [2]:
a = torch.empty(6,3)
print(a)

tensor([[4.4402e-35, 0.0000e+00, 7.0065e-44],
        [7.0065e-44, 6.3058e-44, 6.7262e-44],
        [7.4269e-44, 6.3058e-44, 6.8664e-44],
        [7.2868e-44, 1.1771e-43, 6.8664e-44],
        [7.5670e-44, 8.1275e-44, 6.7262e-44],
        [6.8664e-44, 8.1275e-44, 7.2868e-44]])


In [3]:
a = torch.rand(4,3)
print(a)

tensor([[0.5160, 0.6096, 0.6928],
        [0.7994, 0.9095, 0.8827],
        [0.3924, 0.0766, 0.2542],
        [0.6606, 0.8452, 0.0583]])


In [4]:
a = torch.zeros(4,3, dtype=torch.long)
print(a)

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])


In [5]:
a = torch.tensor([7.8, 5])
type(a)

torch.Tensor

In [6]:
a = a.new_ones(6,5, dtype=torch.double)    # new_* methods take the desired size
print(a)

tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]], dtype=torch.float64)


In [7]:
print(a.size())

torch.Size([6, 5])


In [8]:
b = torch.rand(6,5)
print(a + b)

tensor([[1.1740, 1.0313, 1.0562, 1.0073, 1.1170],
        [1.0859, 1.8642, 1.8961, 1.0506, 1.0157],
        [1.4998, 1.9887, 1.4093, 1.8988, 1.3330],
        [1.1299, 1.7396, 1.6964, 1.4656, 1.7564],
        [1.7740, 1.6679, 1.1318, 1.5097, 1.6687],
        [1.0027, 1.1932, 1.5086, 1.2113, 1.2437]], dtype=torch.float64)


In [9]:
print(torch.add(a,b))

tensor([[1.1740, 1.0313, 1.0562, 1.0073, 1.1170],
        [1.0859, 1.8642, 1.8961, 1.0506, 1.0157],
        [1.4998, 1.9887, 1.4093, 1.8988, 1.3330],
        [1.1299, 1.7396, 1.6964, 1.4656, 1.7564],
        [1.7740, 1.6679, 1.1318, 1.5097, 1.6687],
        [1.0027, 1.1932, 1.5086, 1.2113, 1.2437]], dtype=torch.float64)


In [10]:
x = torch.ones(4)
print(x)

tensor([1., 1., 1., 1.])


In [11]:
y = x.numpy()
print(y)

[1. 1. 1. 1.]


In [14]:
import numpy as np
f = np.ones(4)
g = torch.from_numpy(f)
np.add(f, 1, out=f)
print(f)
print(g)

[2. 2. 2. 2.]
tensor([2., 2., 2., 2.], dtype=torch.float64)


In [15]:
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    y = torch.ones_like(x, device=device)  # directly create a tensor on GPU
    x = x.to(device)                       # or just pass a string: .to("cuda")
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # .to() can also change the dtype in the same call

tensor([2., 2., 2., 2.], device='cuda:0')
tensor([2., 2., 2., 2.], dtype=torch.float64)


The key step is between the last convolution and the first Linear layer. Conv2d outputs a tensor of shape [batch_size, n_features_conv, height, width], whereas Linear expects [batch_size, n_features_lin]. To make the two align, you need to flatten the 3 dimensions [n_features_conv, height, width] into a single dimension [n_features_lin]. It follows that n_features_lin == n_features_conv * height * width. In the tutorial code (and in the network below), this flattening is achieved by

x = x.view(-1, self.num_flat_features(x))

and if you inspect num_flat_features, it just computes this n_features_conv * height * width product. In other words, your first Linear layer must have num_flat_features(x) input features, where x is the tensor returned by the preceding convolution and pooling stage. But we need to know this value ahead of time, so that we can initialize the network in the first place.
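
As a minimal sketch of that flattening step, assuming a batch of 4 feature maps of shape [16, 5, 5] (made-up sizes for illustration), the view call can be compared against torch.flatten, which does the same job without a helper method:

```python
import torch

# Stand-in for the output of the last conv + pool stage:
# [batch_size, n_features_conv, height, width] (sizes chosen for illustration).
x = torch.randn(4, 16, 5, 5)

# Number of features per sample: 16 * 5 * 5.
n = x[0].numel()

flat_view = x.view(-1, n)                # what the network's forward() does
flat_fn = torch.flatten(x, start_dim=1)  # keeps dim 0, flattens the rest

print(flat_view.shape)                   # torch.Size([4, 400])
print(torch.equal(flat_view, flat_fn))   # True
```

Both calls leave the batch dimension untouched, which is why the product only runs over the remaining dimensions.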

The calculation follows from inspecting the operations one by one.

The input is 32x32.
We do a 5x5 convolution without padding, so we lose 2 pixels on each side and drop down to 28x28.
We do max pooling with a 2x2 window, which halves each dimension, down to 14x14.
We do another 5x5 convolution without padding and drop down to 10x10.
We do another max pooling and drop down to 5x5.
This 5x5 is why in the tutorial you see self.fc1 = nn.Linear(16 * 5 * 5, 120). It's n_features_conv * height * width, when starting from a 32x32 image. The network below uses 3x3 convolutions instead, so the same steps give 32 -> 30 -> 15 -> 13 -> 6 (pooling rounds 13 / 2 down), hence nn.Linear(16 * 6 * 6, 120). If you want a different input size, you have to redo the above calculation and adjust your first Linear layer accordingly.
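
The step-by-step calculation above can be sketched in plain Python. The helper names conv_out, pool_out and flat_features are hypothetical, and the sketch assumes two conv + pool stages, stride-1 convolutions without padding, 2x2 max pooling with the default floor rounding, and 16 output channels from the last convolution:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride=None):
    """Spatial size after max pooling; stride defaults to the kernel size."""
    stride = stride or kernel
    return (size - kernel) // stride + 1


def flat_features(input_size, kernel, channels=16):
    """Return (in_features for the first Linear layer, final spatial size)."""
    s = input_size
    for _ in range(2):           # two conv + pool stages
        s = conv_out(s, kernel)  # convolution shrinks by kernel - 1
        s = pool_out(s, 2)       # 2x2 pooling halves, rounding down
    return channels * s * s, s


print(flat_features(32, 5))  # (400, 5)  -> nn.Linear(16 * 5 * 5, 120)
print(flat_features(32, 3))  # (576, 6)  -> nn.Linear(16 * 6 * 6, 120)
```

Changing the first argument shows how the first Linear layer would need to be resized for a different square input.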

For the subsequent operations, it's just a chain of matrix multiplications plus a bias (that's what Linear does). So the only rule is that the out_features of each Linear layer matches the in_features of the next one. The values 120 and 84 are not dictated by the shapes; they were probably chosen by the author such that the resulting network performs well.

In [18]:
# LeNet-style architecture

import torch
import torch.nn as nn
import torch.nn.functional as F


class network(nn.Module):

    def __init__(self):
        super(network, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6x6 spatial size after two conv + pool stages on a 32x32 input
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square, you can specify just a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = network()
print(net)

network(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)


In [19]:
params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight

10
torch.Size([6, 1, 3, 3])


In [20]:
inp = torch.randn(1, 1, 32, 32)
out = net(inp)
print(out)

tensor([[-0.0206, -0.0353, -0.0252, -0.1303,  0.0772,  0.0003,  0.0943, -0.0841,
         -0.0413,  0.0454]], grad_fn=<AddmmBackward0>)


In [21]:
output = net(inp)
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

tensor(0.9857, grad_fn=<MseLossBackward0>)


In [None]:
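learning_rate = 0.01
net.zero_grad()   # clear any stale gradients
loss.backward()   # populate .grad for every parameter
with torch.no_grad():  # the update itself must not be tracked by autograd
    for p in net.parameters():
        p.sub_(p.grad * learning_rate)  # plain SGD step: p <- p - lr * grad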
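
The loop above is a hand-written gradient descent step. As a hedged sketch, the same update can be delegated to torch.optim.SGD. The small nn.Linear model, the random data and the names model_a and model_b are stand-ins invented for this comparison, not part of the notebook:

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)
model_a = nn.Linear(4, 2)         # hypothetical stand-in for the network
model_b = copy.deepcopy(model_a)  # identical starting weights
x = torch.randn(8, 4)
target = torch.randn(8, 2)
criterion = nn.MSELoss()
lr = 0.01

# Manual update: p <- p - lr * grad
model_a.zero_grad()
criterion(model_a(x), target).backward()
with torch.no_grad():
    for p in model_a.parameters():
        p.sub_(p.grad * lr)

# The same step through an optimizer (no momentum, no weight decay)
optimizer = torch.optim.SGD(model_b.parameters(), lr=lr)
optimizer.zero_grad()
criterion(model_b(x), target).backward()
optimizer.step()

# Compare the two sets of parameters after one step
same = all(
    torch.allclose(pa, pb)
    for pa, pb in zip(model_a.parameters(), model_b.parameters())
)
print(same)
```

Using an optimizer object also makes it straightforward to switch to other update rules later without rewriting the loop.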