### Tensors

A Tensor is similar to NumPy's ndarray, with the addition that tensors can also be used on GPUs to accelerate computing.

In [1]:
from __future__ import print_function
import torch

Note: An uninitialized matrix is declared but does not contain definite known values before it is used. When an uninitialized matrix is created, whatever values were in the allocated memory at the time appear as the initial values.

Construct a 6x3 matrix, uninitialized:

In [2]:
a = torch.empty(6,3)
print(a)

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])


Construct a randomly initialized matrix:

In [3]:
a = torch.rand(4,3)
print(a)

tensor([[0.8034, 0.0934, 0.6888],
        [0.3241, 0.5029, 0.5854],
        [0.7264, 0.2559, 0.0858],
        [0.5570, 0.8004, 0.2291]])


Construct a matrix filled with zeros and of dtype long:

In [6]:
a = torch.zeros(4,3, dtype = torch.long)
print(a)
print(type(a))
print(a.dtype)

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])
<class 'torch.Tensor'>
torch.int64


### Construct a tensor with data:

In [11]:
a = torch.tensor([7.8,5])
print(a)

tensor([7.8000, 5.0000])


Or we can create a new tensor based on an existing tensor. These methods reuse the properties of the input tensor, e.g. dtype, unless new values are provided by us.

In [12]:
print(a)
a = a.new_ones(6, 5, dtype=torch.double)  # new_* methods take in sizes
print(a)

tensor([7.8000, 5.0000])
tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]], dtype=torch.float64)


In [13]:
a = torch.randn_like(a, dtype = torch.float) # Override dtype
print(a) # result will be the same size

tensor([[-0.0063,  0.1055, -0.7122, -1.1404, -0.0680],
        [-0.7920, -0.4047,  1.8657, -0.2069,  1.2807],
        [ 1.1231, -1.2583, -0.2503,  0.9851,  0.3088],
        [-1.8640,  0.6418,  0.7276,  1.0641, -0.4141],
        [ 0.6147,  0.1966, -0.3201,  0.1480, -0.7002],
        [ 0.9660, -0.8856, -1.2558,  0.0765,  0.9836]])


Let's get the size.

In [14]:
print(a.size())

torch.Size([6, 5])


Note: torch.Size is in fact a tuple, so it supports all tuple operations.
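As a small illustration of the tuple behaviour, the sketch below unpacks and compares a size object. The variable names `rows`, `cols`, and `size` are chosen here for the example only.

```python
import torch

a = torch.randn(6, 5)
size = a.size()

# torch.Size subclasses tuple, so the usual tuple operations apply.
rows, cols = size                       # unpacking
print(isinstance(size, tuple))          # membership in the tuple type
print(rows, cols, len(size))            # indexing and length
print(size == (6, 5))                   # compares equal to a plain tuple
print(size[0] * size[1] == a.numel())   # product of dims is the element count
```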

### Operations

There are multiple syntaxes for operations. In the following examples, we use the addition operation.

Addition: Syntax 1

In [16]:
print(a)

tensor([[-0.0063,  0.1055, -0.7122, -1.1404, -0.0680],
        [-0.7920, -0.4047,  1.8657, -0.2069,  1.2807],
        [ 1.1231, -1.2583, -0.2503,  0.9851,  0.3088],
        [-1.8640,  0.6418,  0.7276,  1.0641, -0.4141],
        [ 0.6147,  0.1966, -0.3201,  0.1480, -0.7002],
        [ 0.9660, -0.8856, -1.2558,  0.0765,  0.9836]])


In [15]:
b = torch.rand(6,5)
print(a+b)

tensor([[ 0.6181,  0.9772, -0.1690, -0.3744,  0.0573],
        [-0.7637, -0.2883,  2.0585,  0.6161,  1.3623],
        [ 1.9069, -1.1586,  0.0023,  1.3490,  0.5960],
        [-1.6303,  1.0654,  1.4319,  1.8383,  0.3667],
        [ 0.9387,  0.5822,  0.6499,  0.7336, -0.2308],
        [ 1.6057, -0.6942, -1.0637,  0.8349,  1.0430]])


Addition: Syntax 2

In [18]:
print(a)
print(b)

tensor([[-0.0063,  0.1055, -0.7122, -1.1404, -0.0680],
        [-0.7920, -0.4047,  1.8657, -0.2069,  1.2807],
        [ 1.1231, -1.2583, -0.2503,  0.9851,  0.3088],
        [-1.8640,  0.6418,  0.7276,  1.0641, -0.4141],
        [ 0.6147,  0.1966, -0.3201,  0.1480, -0.7002],
        [ 0.9660, -0.8856, -1.2558,  0.0765,  0.9836]])
tensor([[0.6244, 0.8717, 0.5432, 0.7661, 0.1253],
        [0.0283, 0.1165, 0.1928, 0.8230, 0.0816],
        [0.7838, 0.0997, 0.2527, 0.3639, 0.2872],
        [0.2337, 0.4236, 0.7043, 0.7742, 0.7808],
        [0.3240, 0.3857, 0.9700, 0.5857, 0.4694],
        [0.6397, 0.1914, 0.1921, 0.7584, 0.0594]])


In [17]:
print(torch.add(a,b))

tensor([[ 0.6181,  0.9772, -0.1690, -0.3744,  0.0573],
        [-0.7637, -0.2883,  2.0585,  0.6161,  1.3623],
        [ 1.9069, -1.1586,  0.0023,  1.3490,  0.5960],
        [-1.6303,  1.0654,  1.4319,  1.8383,  0.3667],
        [ 0.9387,  0.5822,  0.6499,  0.7336, -0.2308],
        [ 1.6057, -0.6942, -1.0637,  0.8349,  1.0430]])


Addition: providing an output tensor as an argument

In [19]:
result = torch.empty(6,5)
torch.add(a,b, out = result)
print(result)

tensor([[ 0.6181,  0.9772, -0.1690, -0.3744,  0.0573],
        [-0.7637, -0.2883,  2.0585,  0.6161,  1.3623],
        [ 1.9069, -1.1586,  0.0023,  1.3490,  0.5960],
        [-1.6303,  1.0654,  1.4319,  1.8383,  0.3667],
        [ 0.9387,  0.5822,  0.6499,  0.7336, -0.2308],
        [ 1.6057, -0.6942, -1.0637,  0.8349,  1.0430]])


Addition: in-place

In [20]:
# adds a to b
b.add_(a)
print(b)

tensor([[ 0.6181,  0.9772, -0.1690, -0.3744,  0.0573],
        [-0.7637, -0.2883,  2.0585,  0.6161,  1.3623],
        [ 1.9069, -1.1586,  0.0023,  1.3490,  0.5960],
        [-1.6303,  1.0654,  1.4319,  1.8383,  0.3667],
        [ 0.9387,  0.5822,  0.6499,  0.7336, -0.2308],
        [ 1.6057, -0.6942, -1.0637,  0.8349,  1.0430]])


Note: Any operation that mutates a tensor in-place is post-fixed with an _. For example, a.copy_(b) and a.t_() will change a.
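The difference between the two forms can be sketched as follows. The tensors here are made up for the example; `add` returns a new tensor, while `copy_` and `t_` modify `a` itself.

```python
import torch

a = torch.zeros(2, 3)
b = torch.ones(2, 3)

c = a.add(b)                 # out-of-place: returns a new tensor
before = a.sum().item()      # a is still all zeros at this point

a.copy_(b)                   # in-place: a now holds the values of b
after = a.sum().item()

a.t_()                       # in-place transpose: a becomes 3x2
print(before, after, a.shape)
```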

We can use standard NumPy-like indexing with all the bells and whistles!

In [21]:
print(a[:,2])

tensor([-0.7122,  1.8657, -0.2503,  0.7276, -0.3201, -1.2558])


Resizing: to resize or reshape a tensor, use Tensor.view:

In [24]:
a = torch.randn(3,3)
b = a.view(9)
c = a.view(-1, 9)  # the size -1 is inferred from the other dimensions
print(a.size())
print(b.size())
print(c.size())

torch.Size([3, 3])
torch.Size([9])
torch.Size([1, 9])
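One detail worth knowing is that a view does not copy data: it shares storage with the original tensor. The sketch below, using example tensors that are not part of the cells above, shows a write through the view appearing in the original, and uses `reshape` for a non-contiguous tensor, where `view` is not expected to work.

```python
import torch

a = torch.zeros(3, 3)
b = a.view(9)            # same storage, different shape
b[0] = 7.0               # writing through the view...
print(a[0, 0].item())    # ...is visible in the original tensor

# A transposed tensor is not contiguous in memory, so reshape is used here;
# it falls back to copying when a view is not possible.
t = a.t()
print(t.is_contiguous())
r = t.reshape(9)
print(r.shape)
```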


If you have a one-element tensor, use .item() to get the value as a Python number:

In [26]:
a = torch.randn(1)
print(a)
print(a.item())

tensor([-0.6473])
-0.6472882628440857


### Numpy Bridge

Converting a Torch Tensor to a NumPy array and vice versa is a breeze.

The Torch Tensor and NumPy array will share their underlying memory locations (if the Torch Tensor is on the CPU), and changing one will change the other.

### Converting a Torch Tensor to a NumPy Array

In [28]:
x = torch.ones(4)
print(x)

tensor([1., 1., 1., 1.])


In [29]:
y = x.numpy()
print(y)

[1. 1. 1. 1.]


See how the NumPy array changed in value:

In [30]:
x.add_(1)
print(x)
print(y)

tensor([2., 2., 2., 2.])
[2. 2. 2. 2.]


### Converting a NumPy Array to a Torch Tensor

Let's see how changing the NumPy array changes the Torch Tensor automatically:

In [31]:
import numpy as np
f = np.ones(4)
g = torch.from_numpy(f)
np.add(f,1,out = f)
print(f)
print(g)

[2. 2. 2. 2.]
tensor([2., 2., 2., 2.], dtype=torch.float64)
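When sharing is not wanted, the data can be copied instead. The sketch below contrasts `torch.from_numpy`, which shares memory, with `torch.tensor`, which copies; the names `shared` and `copied` are introduced only for this example.

```python
import numpy as np
import torch

f = np.ones(4)
shared = torch.from_numpy(f)   # shares memory with f
copied = torch.tensor(f)       # copies the data into a new tensor

f += 1                         # in-place change to the NumPy array

print(shared)                  # follows the change
print(copied)                  # keeps the original values
```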


All tensors on the CPU except a CharTensor support converting to NumPy and back.

## Tensors can be moved onto any device using the .to method.

## A device is usually the CPU or a GPU; in some special cases a TPU is also available.
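A minimal sketch of moving a tensor between devices, assuming only that a GPU may or may not be present: the code falls back to the CPU when CUDA is unavailable, so it should run either way.

```python
import torch

# Pick a GPU if one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.ones(4)
y = x.to(device)                 # returns a tensor on the chosen device
z = y.to("cpu", torch.double)    # .to can change device and dtype together

print(y.device, z.device, z.dtype)
```

Note that a tensor living on a GPU has to be moved back to the CPU before calling .numpy() on it.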

## AUTOGRAD: Automatic Differentiation

In [32]:
import torch

Create a Tensor and set requires_grad = True to track computation with it.

In [33]:
a = torch.ones(2,2, requires_grad = True)
print(a)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)


Do A Tensor Operation

In [34]:
b = a+2
print(b)

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)


b was created as a result of an operation, so it has a grad_fn

In [35]:
print(b.grad_fn)

<AddBackward0 object at 0x0000012AF83A6400>


Do More Operation On b

In [36]:
c = b * b* 3
out = c.mean()

print(c, out)

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)


.requires_grad_(...) changes an existing Tensor's requires_grad flag in-place.
A newly created Tensor's requires_grad flag defaults to False if not given.

In [37]:
p = torch.randn(3,3)
p = ((p * 3) / (p-1))
print(p.requires_grad)


False


In [41]:
p.requires_grad_(True)
print(p.requires_grad)

True


In [43]:
q = (p * p).sum()
print(q)
print(q.grad_fn)

tensor(3154.6455, grad_fn=<SumBackward0>)
<SumBackward0 object at 0x0000012AF37CA710>


### Gradients

Let's backprop now. Because out contains a single scalar, out.backward() is equivalent to out.backward(torch.tensor(1.)).

In [44]:
out.backward()

Print the gradients d(out)/da:

In [46]:
print(a)
print(a.grad)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])


You should get a matrix of 4.5.
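The value 4.5 can be checked by hand. Writing $a_i$, $b_i$, $c_i$ for the entries of a, b and c (index notation introduced here for the derivation), the cells above compute:

$$
out = \frac{1}{4}\sum_i c_i, \qquad c_i = 3\,b_i^2, \qquad b_i = a_i + 2
$$

Applying the chain rule and evaluating at $a_i = 1$:

$$
\frac{\partial\, out}{\partial a_i} = \frac{1}{4}\cdot 6\,b_i = \frac{3}{2}\,(a_i + 2), \qquad \left.\frac{\partial\, out}{\partial a_i}\right|_{a_i = 1} = \frac{9}{2} = 4.5
$$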

### Mathematically - Jacobians and Vectors

Now let's have a look at an example of a vector-Jacobian product.

In [47]:
x = torch.randn(3, requires_grad = True)

y = x * 2

while y.data.norm() < 1000:
    y = y * 2
    
print(y)

tensor([  408.5759, -1318.5188, -1170.6854], grad_fn=<MulBackward0>)


Now in this case y is no longer a scalar. torch.autograd could not compute the full Jacobian directly, but if we just want the vector-Jacobian product, simply pass the vector to backward as an argument:

In [48]:
v = torch.tensor([0.1, 1.0, 0.0001], dtype = torch.float)
y.backward(v)

print(x.grad)

tensor([1.0240e+02, 1.0240e+03, 1.0240e-01])
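The result can be checked on a deterministic variant of the same computation. In the sketch below the random input is replaced by fixed values and the loop by exactly nine doublings (both are assumptions made for the example), so that y = 1024 * x and the Jacobian is 1024 times the identity. The vector-Jacobian product is then simply 1024 * v.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

y = x * 2
for _ in range(9):     # nine more doublings, so y = x * 2**10
    y = y * 2

v = torch.tensor([0.1, 1.0, 0.0001])
y.backward(v)          # computes v^T J, with J = 1024 * I

print(x.grad)
print(torch.allclose(x.grad, v * 1024))
```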


You can also stop autograd from tracking history on Tensors with .requires_grad = True, either by wrapping the code block in with torch.no_grad():

In [49]:
print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print(( x ** 2).requires_grad)

True
True
False


or by using .detach() to get a new Tensor with the same content but that does not require gradients:

In [50]:
print(x.requires_grad)
y = x.detach()
print(y.requires_grad)
print(x.eq(y).all())

True
False
tensor(True)


### Define The Network

Let's define the network:

In [51]:
import torch
import torch.nn as nn
import torch.nn.functional as F


class network(nn.Module):

    def __init__(self):
        super(network, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimensions
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square you can specify just a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = network()
print(net)

network(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)


You just have to define the forward function; the backward function (where gradients are computed) is automatically defined for you using autograd. You can use any of the tensor operations in the forward function.

The learnable parameters of a model are returned by net.parameters()

In [55]:
params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight

10
torch.Size([6, 1, 3, 3])


Let's try a random 32x32 input. Note: the expected input size of this net (LeNet) is 32x32. To use this net on the MNIST dataset, please resize the images from the dataset to 32x32.

In [56]:
inp = torch.randn(1,1,32,32)
out = net(inp)
print(out)

Zero the gradient buffers of all parameters and backprop with random gradients:

In [57]:
net.zero_grad()
out.backward(torch.randn(1,10))

Note: torch.nn only supports mini-batches. The entire torch.nn package only supports inputs that are a mini-batch of samples, not a single sample. For example, nn.Conv2d will take in a 4D tensor of nSamples x nChannels x Height x Width. If you have a single sample, just use input.unsqueeze(0) to add a fake batch dimension.
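A short sketch of adding the batch dimension, using a standalone convolution layer rather than the network above. The names `sample`, `batch`, and `conv` are chosen for this example.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 6, 3)          # expects input of shape (N, C, H, W)

sample = torch.randn(1, 32, 32)    # a single image: (C, H, W)
batch = sample.unsqueeze(0)        # add a batch dimension: (1, C, H, W)

out = conv(batch)
print(batch.shape, out.shape)      # a 3x3 kernel shrinks 32x32 to 30x30
```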

### At this point, we have covered:

 Defining a neural network
 Processing inputs and calling backward

### Loss Function

For Example

In [58]:
output = net(inp)
target = torch.randn(10) # a dummy target for example
target = target.view(1,-1) # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

For illustration, let us follow a few steps backward:

In [59]:
print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # ReLU

### Backprop 

Now we shall call loss.backward(), and have a look at conv1's bias gradients before and after the backward.

In [60]:
net.zero_grad() # zeros the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

### Update The Weights 

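The simplest update rule used in practice is stochastic gradient descent (SGD): weight = weight - learning_rate * gradient. We can implement this using simple Python code: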

In [61]:
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)
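The same update can be traced on a problem small enough to check by hand. The sketch below is not the network above: it uses a hypothetical two-element parameter `w` and a squared-error loss, and it applies the update under torch.no_grad() instead of going through .data, which is one common way to keep the update out of the autograd graph.

```python
import torch

w = torch.zeros(2, requires_grad=True)
target = torch.ones(2)
learning_rate = 0.1

loss = ((w - target) ** 2).sum()
loss.backward()                     # w.grad = 2 * (w - target) = [-2, -2]

with torch.no_grad():               # do not record the update itself
    w -= learning_rate * w.grad     # weight = weight - learning_rate * gradient
w.grad.zero_()                      # clear the buffer before the next step

print(w)
```

After one step each entry of `w` moves from 0 to 0.2, that is 0 - 0.1 * (-2).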

However, as you use neural networks, you will want to use various different update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc. To enable this, PyTorch provides a small package, torch.optim, that implements all these methods. Using it is very simple:

In [63]:
import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr = 0.01)

# in your training loop:
optimizer.zero_grad() # Zero The Gradient Buffers

output = net(inp)
loss = criterion(output, target)
loss.backward()
optimizer.step()  # does the update

Observe how the gradient buffers had to be manually set to zero using optimizer.zero_grad(). This is because gradients are accumulated across calls to backward(), which is why the Backprop section calls net.zero_grad() before loss.backward().
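Gradient accumulation is easy to see on a tiny example. The sketch below uses a standalone tensor `w` (not one of the network's parameters) and calls backward twice without zeroing in between.

```python
import torch

w = torch.ones(2, requires_grad=True)

(w * 3).sum().backward()
first = w.grad.clone()      # gradient of 3 * w is 3 for each entry

(w * 3).sum().backward()    # the second call adds to the existing buffer
second = w.grad.clone()     # 3 + 3 = 6 for each entry

w.grad.zero_()              # reset the buffer by hand
print(first, second, w.grad)
```

Without the reset, every further backward call would keep adding to the same buffer, and the optimizer would step with the summed gradients.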