Author: Timur Sokhin

We will look at the basic concepts and tools of the PyTorch framework.

# Intro

[PyTorch](https://pytorch.org/) is one of the most popular ML frameworks among researchers. It builds the computation graph of a neural network dynamically, as the code runs, which makes model construction flexible and (subjectively) more user-friendly.

A common alternative is [TensorFlow](https://www.tensorflow.org/), which is currently more common in industry, although recent releases have brought the two frameworks much closer together in how they work.

The installation process is described on the official PyTorch website. If you work in Google Colab, PyTorch is already installed in the kernel.

# Tensors

In [None]:
# main framework
import torch
# additional functions on tensors or networks
from torch.nn import functional as F

# NumPy arrays, which interoperate with tensors
import numpy as np

A tensor is a multidimensional array data structure optimized for automatic differentiation. In most cases, working with tensors is similar to working with arrays in [numpy](https://numpy.org/doc/stable/user/whatisnumpy.html).

## Tensor from *list*

In [None]:
tensor = torch.tensor([[1, 2, 3], [2, 3, 4]])
print(tensor)

tensor([[1, 2, 3],
        [2, 3, 4]])


You can access some metadata about a tensor

In [None]:
print(tensor.dtype)
print(tensor.device)

torch.int64
cpu


The shape of a tensor is available through the .shape attribute and the .size() method

In [None]:
# .shape is an alias for .size(), kept for similarity with numpy
print(tensor.shape)
print(tensor.size())

torch.Size([2, 3])
torch.Size([2, 3])


You can change the data type

In [None]:
print('Float\n', tensor.float())
print('Boolean\n', tensor.bool())

Float
 tensor([[1., 2., 3.],
        [2., 3., 4.]])
Boolean
 tensor([[True, True, True],
        [True, True, True]])


Or set the type at initialization

In [None]:
tensor = torch.tensor([[1, 2, 3], [2, 3, 4]], dtype=torch.float)
print(tensor)

tensor([[1., 2., 3.],
        [2., 3., 4.]])


In [None]:
# legacy constructor; torch.tensor(..., dtype=torch.float) is the preferred form
tensor = torch.FloatTensor([[1, 2, 3], [2, 3, 4]])
print(tensor)

tensor([[1., 2., 3.],
        [2., 3., 4.]])


## Tensor from Numpy arrays

In [None]:
import numpy as np

narray = np.array([[1, 2, 3], [2, 3, 4]])
tensor = torch.tensor(narray)
print(tensor)
tensor = tensor * 0
print('Tensor: {}\t Numpy: {}'.format(tensor[0], narray[0]))

tensor([[1, 2, 3],
        [2, 3, 4]])
Tensor: tensor([0, 0, 0])	 Numpy: [1 2 3]


In [None]:
# Be careful!
tensor = torch.from_numpy(narray)
tensor *= 0
print('Tensor: {}\t Numpy: {}'.format(tensor[0], narray[0]))

Tensor: tensor([0, 0, 0])	 Numpy: [0 0 0]


[torch.from_numpy](https://pytorch.org/docs/stable/generated/torch.from_numpy.html?highlight=from_numpy#torch.from_numpy) creates a tensor that shares memory with narray, so any change to one is reflected in the other! torch.tensor, in contrast, always copies the data.

## Tensor from shape
Sometimes you want to create a tensor of a specific shape filled with default values

In [None]:
shape = (2, 5, 5)
tensor = torch.zeros(shape)
print('Zero:\n{}'.format(tensor))
tensor = torch.ones(shape)
print('Ones:\n{}'.format(tensor))
tensor = torch.rand(shape)
print('Random:\n{}'.format(tensor))
tensor = torch.arange(10)
print('Sequence:\n{}'.format(tensor))

Zero:
tensor([[[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]]])
Ones:
tensor([[[1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.]],

        [[1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.]]])
Random:
tensor([[[0.0592, 0.4121, 0.2863, 0.1080, 0.0065],
         [0.1565, 0.2808, 0.6307, 0.9060, 0.0171],
         [0.7711, 0.3198, 0.1926, 0.4217, 0.4660],
         [0.9862, 0.4093, 0.0780, 0.9964, 0.2663],
         [0.8938, 0.4718, 0.7065, 0.3110, 0.4375]],

        [[0.1315, 0.6832, 0.7865, 0.6816, 0.9708],
         [0.3164, 0.8545, 0.7030, 0.1160, 0.

Sometimes you want to create a tensor with the shape of another tensor, but filled with default values

In [None]:
tensor = torch.rand((1, 2, 2))
# for example, you want to create a mask for the image or something
tensor_mask = torch.zeros_like(tensor)
print('Image:\n{}'.format(tensor))
print('Image_mask:\n{}'.format(tensor_mask))

Image:
tensor([[[0.2386, 0.3602],
         [0.2813, 0.4000]]])
Image_mask:
tensor([[[0., 0.],
         [0., 0.]]])


## PyTorch data to pure python/numpy
The default way to convert a PyTorch tensor to a numpy array, which works regardless of the tensor's device or gradient tracking

In [None]:
tensor = torch.rand((1, 2, 2))
narray = tensor.detach().cpu().numpy()
print(narray)

[[[0.6392166  0.8779399 ]
  [0.14973867 0.11761254]]]


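.detach() returns a tensor that is excluded from gradient tracking, and .cpu() moves the data to CPU memory. Both concepts are covered in the section on devices and autograd

In [None]:
# to a plain Python list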
tensor_list = tensor.detach().cpu().tolist()
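
One caveat is worth a small sketch. Assuming the tensor already lives on the CPU, .numpy() returns an array backed by the same memory as the tensor, much like torch.from_numpy in the other direction. Calling .clone() first gives an independent copy. The variable names below are illustrative only.

```python
import torch

tensor = torch.zeros(3)

# on a CPU tensor, .numpy() returns an array backed by the same memory
shared = tensor.numpy()
# cloning first yields an array with its own memory
independent = tensor.clone().numpy()

# in-place change of the tensor
tensor += 1

print('Shared: {}'.format(shared))            # follows the tensor
print('Independent: {}'.format(independent))  # keeps the old values
```

If the array must stay unchanged while the tensor keeps being updated, the cloned variant is the safer choice.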

# Tensor manipulations

The shape of a tensor can be changed in many ways.

For example, you have data with N samples (batch_size), where each sample has shape (W, H, C), the default image shape

In [None]:
N = 2
W, H = 224, 224
C = 3

# NOTE: the PyTorch convention for image shape is channel-first
tensor = torch.rand((N, C, W, H))
print(tensor.shape)

torch.Size([2, 3, 224, 224])


## Represent the data as a flattened list of pixels



*   [.view()](https://pytorch.org/docs/stable/generated/torch.Tensor.view.html?highlight=view#torch.Tensor.view)
*   [.reshape()](https://pytorch.org/docs/stable/generated/torch.reshape.html?highlight=reshape#torch.reshape)

NOTE: many tensor methods are also available as functions in the torch namespace, for example tensor.reshape(shape) and torch.reshape(tensor, shape)



In [None]:
# you may use .view
print('View\n')
print(tensor.view(N, W * H * C))
print(tensor.view(N, W * H * C).shape)

# or reshape
print('Reshape\n')
print(tensor.reshape((N, W * H * C)))
print(tensor.reshape((N, W * H * C)).shape)

View

tensor([[0.5094, 0.7684, 0.1524,  ..., 0.2103, 0.0770, 0.0563],
        [0.8793, 0.0719, 0.8111,  ..., 0.7288, 0.3203, 0.2942]])
torch.Size([2, 150528])
Reshape

tensor([[0.5094, 0.7684, 0.1524,  ..., 0.2103, 0.0770, 0.0563],
        [0.8793, 0.0719, 0.8111,  ..., 0.7288, 0.3203, 0.2942]])
torch.Size([2, 150528])


The .view() method returns a new view of the same data and never copies the data in memory

.reshape() returns a view when that is possible and *can* copy the data in memory when it is not

This behavior follows from the contiguous property: the .view() method is guaranteed to work only for contiguous data. A contiguous tensor stores its elements in a single block of memory, in the same order as its indices. An example of an operation that produces a non-contiguous tensor is taking a transpose.

A good explanation is [here](https://stackoverflow.com/questions/26998223/what-is-the-difference-between-contiguous-and-non-contiguous-arrays/26999092#26999092)
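
The difference can be seen on a tiny tensor. The sketch below uses illustrative variable names: it shows that a view shares memory with its source, that .view() fails on a transposed (non-contiguous) tensor, and that .reshape() falls back to a copy in that case.

```python
import torch

base = torch.arange(6).reshape(2, 3)

# a view shares memory with the original tensor
flat_view = base.view(6)
flat_view[0] = 100
print('Base after writing through the view: {}'.format(base[0, 0]))

# transposing produces a non-contiguous tensor
transposed = base.t()
print('Contiguous: {}'.format(transposed.is_contiguous()))

view_error = None
try:
    transposed.view(6)
except RuntimeError as e:
    view_error = e
    print('View failed: {}'.format(e))

# reshape has to copy here, so writing to the result leaves base untouched
flat_copy = transposed.reshape(6)
flat_copy[0] = -1
print('Base after writing to the reshaped copy: {}'.format(base[0, 0]))
```

Calling .contiguous() before .view() is another common way to handle the non-contiguous case; it makes the copy explicit.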

## Add new dimension to the data

We add a new dimension to the data mostly for technical reasons.

For example, we have two images as tensors and want to concatenate them into a single batch tensor

In [None]:
image_one = torch.rand((3, 224, 224))
image_two = torch.rand((3, 224, 224))
print('Shape: {}'.format(image_one.shape))

# index with None (np.newaxis in numpy is an alias for None)
print('None slice: {}'.format(image_one[None].shape))
# use torch.cat to concatenate tensors
print('Concat shape: {}'.format(torch.cat([image_one[None], image_two[None]]).shape))

# unsqueeze()
print('Unsqueeze: {}'.format(image_one.unsqueeze(0).shape))

Shape: torch.Size([3, 224, 224])
None slice: torch.Size([1, 3, 224, 224])
Concat shape: torch.Size([2, 3, 224, 224])
Unsqueeze: torch.Size([1, 3, 224, 224])
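
As a possible shortcut, torch.stack adds the new dimension and concatenates in one call. The sketch below checks that it gives the same result as torch.cat over unsqueezed tensors.

```python
import torch

image_one = torch.rand((3, 224, 224))
image_two = torch.rand((3, 224, 224))

# torch.stack creates a new leading dimension and joins the tensors along it
stacked = torch.stack([image_one, image_two])
# the same result built manually with unsqueeze + cat
concatenated = torch.cat([image_one.unsqueeze(0), image_two.unsqueeze(0)])

print('Stack shape: {}'.format(stacked.shape))
print('Same result: {}'.format(torch.equal(stacked, concatenated)))
```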


## Slicing and indexing

Apart from features such as gradients and devices, PyTorch tensors behave the same way as numpy arrays when sliced and indexed.

In [None]:
tensor = torch.rand((2, 3, 3))

In [None]:
print('Tensor:\n{}\n'.format(tensor))
print('First sample in a batch:\n{}\n'.format(tensor[0]))
print('First row in the first sample in a batch:\n{}\n'.format(tensor[0][0]))
print('First rows in all samples:\n{}\n'.format(tensor[:, 0]))

Tensor:
tensor([[[0.9688, 0.2760, 0.5376],
         [0.7055, 0.4197, 0.9963],
         [0.7580, 0.8123, 0.4045]],

        [[0.0484, 0.4293, 0.8820],
         [0.9580, 0.0394, 0.4435],
         [0.8137, 0.0960, 0.8236]]])

First sample in a batch:
tensor([[0.9688, 0.2760, 0.5376],
        [0.7055, 0.4197, 0.9963],
        [0.7580, 0.8123, 0.4045]])

First row in the first sample in a batch:
tensor([0.9688, 0.2760, 0.5376])

First rows in all samples:
tensor([[0.9688, 0.2760, 0.5376],
        [0.0484, 0.4293, 0.8820]])



We can also index with a list, a numpy array, or a tensor

In [None]:
array_to_indexing = [0, 2]
print('List\n', tensor[:, array_to_indexing])

array_to_indexing = np.array([0, 2])
print('Numpy array\n', tensor[:, array_to_indexing])

array_to_indexing = torch.tensor([0, 2])
print('Tensor\n', tensor[:, array_to_indexing])

List
 tensor([[[0.9688, 0.2760, 0.5376],
         [0.7580, 0.8123, 0.4045]],

        [[0.0484, 0.4293, 0.8820],
         [0.8137, 0.0960, 0.8236]]])
Numpy array
 tensor([[[0.9688, 0.2760, 0.5376],
         [0.7580, 0.8123, 0.4045]],

        [[0.0484, 0.4293, 0.8820],
         [0.8137, 0.0960, 0.8236]]])
Tensor
 tensor([[[0.9688, 0.2760, 0.5376],
         [0.7580, 0.8123, 0.4045]],

        [[0.0484, 0.4293, 0.8820],
         [0.8137, 0.0960, 0.8236]]])
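
Boolean masks work as in numpy too. The following is a small sketch with made-up values: a comparison produces a boolean tensor, which can then select or overwrite elements.

```python
import torch

tensor = torch.tensor([[1., -2., 3.], [-4., 5., -6.]])

# comparison returns a boolean tensor of the same shape
mask = tensor > 0

# selecting with a mask returns a 1-D tensor of the matching elements
positives = tensor[mask]
print('Positives: {}'.format(positives))

# assigning through the inverted mask overwrites the non-positive elements
clipped = tensor.clone()
clipped[~mask] = 0.
print('Clipped:\n{}'.format(clipped))
```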


## Most common operations

In [None]:
tensor = torch.ones((2, 3, 3))
print(tensor)

tensor([[[1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.]],

        [[1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.]]])


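We can clone a tensor. This creates a new copy of the tensor in memory, so in-place changes to the copy do not affect the original

In [None]:
tensor_copy = tensor.clone()
# in-place operation: modifies only the memory of the copy
tensor_copy *= 0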
print('Origin: {}\n'.format(tensor))
print('Copy: {}\n'.format(tensor_copy))

Origin: tensor([[[1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.]],

        [[1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.]]])

Copy: tensor([[[0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.]],

        [[0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.]]])



Mean, std, median

In [None]:
print('Mean over all tensor: {}'.format(tensor.mean()))
print('Mean over first dimension: {}'.format(tensor.mean(0)))
print('Std over all tensor: {}'.format(tensor.std()))
print('Median over all tensor: {}'.format(tensor.median()))

Mean over all tensor: 1.0
Mean over first dimension: tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])
Std over all tensor: 0.0
Median over all tensor: 1.0
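
A tensor of ones hides some details of these reductions, so here is a small sketch on made-up values. It shows reduction along a dimension, the keepdim argument (handy for broadcasting the result back), and the fact that torch.median returns the lower of the two middle values when the number of elements is even.

```python
import torch

tensor = torch.tensor([[1., 2.], [3., 4.]])

# reduce along dimension 1: one value per row
row_mean = tensor.mean(dim=1)
# keepdim=True keeps the reduced dimension with size 1
row_mean_kept = tensor.mean(dim=1, keepdim=True)
# the kept dimension broadcasts against the original tensor
centered = tensor - row_mean_kept

# for an even number of elements the lower middle value is returned
median = tensor.median()

print('Row mean: {}'.format(row_mean))
print('Kept shape: {}'.format(row_mean_kept.shape))
print('Centered:\n{}'.format(centered))
print('Median: {}'.format(median))
```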


# Devices, autograd

Tensors can be stored in main (CPU) memory, GPU memory, or TPU memory.

NOTE: use Colab with a GPU (Runtime -> Change runtime type -> Hardware accelerator)

In [None]:
tensor = torch.rand((2, 5, 5))

In [None]:
print(tensor.device)

cpu


In [None]:
print(tensor.to('cuda:0').device)

cuda:0


Operations on tensors stored in GPU memory are performed on the GPU.

If you want to convert a PyTorch tensor to numpy, you have to move the tensor to the CPU first

In [None]:
try:
    tensor.to('cuda:0').numpy()
except Exception as e:
    print(e)

can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
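
Hard-coding 'cuda:0' fails on machines without a GPU. A common pattern, sketched here, is to pick the device once with torch.cuda.is_available() and reuse it everywhere. The variable name device is only a convention.

```python
import torch

# fall back to the CPU when no GPU is available
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

tensor = torch.rand((2, 5, 5)).to(device)
# the result of an operation lives on the same device as its inputs
result = (tensor * 2).sum()
print(result.device)

# .cpu() is a no-op for CPU tensors, so this works on both kinds of machines
narray = tensor.cpu().numpy()
print(narray.shape)
```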


## Autograd
Automatic differentiation is at the core of ML frameworks. PyTorch records the operations applied to a tensor and can compute their gradients automatically. Almost all PyTorch operations are differentiable.

requires_grad=True makes PyTorch track operations on this particular tensor and store its gradient. For input data this parameter is usually left at False: we want to update the model weights, not our real data.

In [None]:
# tensor = torch.rand((2, 4, 4), requires_grad=True)
tensor = torch.ones((2, 4, 4))
tensor.requires_grad = True

In [None]:
# define function y
y = 5 * tensor ** 3 - 3
print(y)

tensor([[[2., 2., 2., 2.],
         [2., 2., 2., 2.],
         [2., 2., 2., 2.],
         [2., 2., 2., 2.]],

        [[2., 2., 2., 2.],
         [2., 2., 2., 2.],
         [2., 2., 2., 2.],
         [2., 2., 2., 2.]]], grad_fn=<SubBackward0>)


In [None]:
print(tensor.grad)

None


There is no gradient for our tensor yet, because we first have to call the .backward() method of variable *y*. This method calculates the gradient of *y* with respect to the variable *tensor*

NOTE: without extra arguments, .backward() can only be called on a scalar. The output of *y* is a tensor, so we first reduce it to a scalar with mean, sum, etc

In [None]:
y.sum().backward()
print(tensor.grad)

tensor([[[15., 15., 15., 15.],
         [15., 15., 15., 15.],
         [15., 15., 15., 15.],
         [15., 15., 15., 15.]],

        [[15., 15., 15., 15.],
         [15., 15., 15., 15.],
         [15., 15., 15., 15.],
         [15., 15., 15., 15.]]])
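
The value 15 can be checked by hand: the derivative of 5x^3 - 3 is 15x^2, which equals 15 at x = 1. The sketch below repeats the check at x = 2, an arbitrary point chosen for illustration, and compares autograd with the analytic formula.

```python
import torch

x = torch.full((2, 2), 2.0, requires_grad=True)

y = 5 * x ** 3 - 3
y.sum().backward()

# analytic derivative of 5x^3 - 3 is 15x^2
expected = 15 * x.detach() ** 2

print(x.grad)
print('Matches analytic gradient: {}'.format(torch.allclose(x.grad, expected)))
```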


NOTE: you cannot call .backward() a second time without computing *y* again, because the computation graph is freed after the first call

In [None]:
try:
    y.sum().backward()
except Exception as e:
    print(e)

Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.


If you compute *y* again and call .backward(), the new gradient is added to the one already stored for the *tensor* variable: gradients accumulate until they are reset.

In [None]:
y = 5 * tensor ** 3 - 3
y.sum().backward()
print(tensor.grad)

tensor([[[30., 30., 30., 30.],
         [30., 30., 30., 30.],
         [30., 30., 30., 30.],
         [30., 30., 30., 30.]],

        [[30., 30., 30., 30.],
         [30., 30., 30., 30.],
         [30., 30., 30., 30.],
         [30., 30., 30., 30.]]])
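
Because gradients accumulate, they usually have to be reset between backward passes. A minimal sketch, using an illustrative tensor x, resets the stored gradient in place with .grad.zero_():

```python
import torch

x = torch.ones(3, requires_grad=True)

(5 * x ** 3 - 3).sum().backward()
first = x.grad.clone()

# reset the accumulated gradient in place before the next backward pass
x.grad.zero_()

(5 * x ** 3 - 3).sum().backward()
second = x.grad.clone()

print('First pass: {}'.format(first))
print('Second pass after reset: {}'.format(second))
```

Optimizers in torch.optim expose the same idea through their zero_grad() method.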


# Neural network in PyTorch

A NN in PyTorch is defined as a set of layers. Each layer is a specific function:


*   Linear layer, convolutional layer, etc
*   Activation function
*   Tensor operations

Layers of the first type have parameters called weights and biases. The process of NN training is to change the weights of the layers so that the prediction of the network matches the real target.



## Linear layer

[torch.nn.Linear](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear) takes a tensor of shape (N, \*, H_in) and produces a tensor of shape (N, \*, H_out).

\* means any number of additional dimensions, H_in - input features, H_out - output features

A linear layer is the operation Wx+b, where W is the weight matrix of the layer and b is the bias. The bias can be omitted with bias=False

In [None]:
input_tensor = torch.ones((3, 5))
print('input shape: ', input_tensor.shape)

input shape:  torch.Size([3, 5])


We feed a batch of data with batch_size=3 and 5 features per sample. All samples in a batch are processed independently. This is true for most layers and functions of PyTorch (batch normalization is a notable exception).

In [None]:
layer = torch.nn.Linear(in_features=5, out_features=100, bias=False, device='cuda:0')
# you can print layer info
layer

Linear(in_features=5, out_features=100, bias=False)

Let's take a look at the layer's parameters.
layer.parameters() returns a generator over all weights and biases of this object.

In [None]:
print(type(layer.parameters()))
# iterate over layer parameters and print shapes
for i in layer.parameters():
    print(i.shape)

<class 'generator'>
torch.Size([100, 5])


This layer contains only one weight matrix of shape (H_out, H_in), because the bias is disabled. You can access the weights of a layer directly

In [None]:
print(layer.weight.shape)

torch.Size([100, 5])
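
Since the weight matrix is stored as (H_out, H_in), the layer multiplies the input by the transposed weights. The sketch below reproduces the output of a small CPU layer by hand; it keeps the default bias=True so that the bias term is visible as well.

```python
import torch

torch.manual_seed(0)

layer = torch.nn.Linear(in_features=5, out_features=100)  # bias=True by default
x = torch.rand((3, 5))

# (3, 5) @ (5, 100) + (100,) -> (3, 100)
manual = x @ layer.weight.T + layer.bias

print('Manual shape: {}'.format(manual.shape))
print('Same as layer(x): {}'.format(torch.allclose(layer(x), manual, atol=1e-6)))
```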


You can move a layer between CPU and GPU in the same way as a tensor. Unlike a tensor, a layer is moved in place, so the result does not need to be reassigned

In [None]:
layer.to('cpu') # or layer.cpu() for simplicity

Linear(in_features=5, out_features=100, bias=False)

In [None]:
layer.to('cuda:0')

Linear(in_features=5, out_features=100, bias=False)

NOTE: the input tensor and the layer have to be on the same device

In [None]:
try:
    output = layer(input_tensor)
except Exception as e:
    print(e)

Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking arugment for argument mat2 in method wrapper_mm)


In [None]:
output = layer(input_tensor.to('cuda:0'))
print(output)

tensor([[ 0.2249,  0.1494,  0.0546, -0.5408,  0.4931, -1.0947, -0.0707, -0.3990,
         -1.1058, -0.4487, -0.0599,  0.9939, -0.4911,  0.2746,  0.6051, -0.1861,
         -0.7499,  0.0362, -0.2726, -0.6997, -0.8335,  0.1780,  0.5181, -0.0251,
          0.2601, -0.3752, -0.1893,  0.4160, -0.6927, -0.1374,  0.0378, -1.2172,
          0.1541,  0.0525,  0.2003, -0.6088,  0.5833,  0.9191, -0.6121,  0.0120,
         -0.7556, -0.4457, -0.5579, -1.0691, -0.9166, -0.2394, -0.4141,  0.3409,
         -0.1608,  0.4399, -0.5244,  0.8682, -0.1355, -0.3215, -0.0653,  0.7020,
          0.6238,  0.8382, -0.0121,  0.6483, -0.0820, -0.3357, -0.6359,  0.7003,
         -0.0931, -0.1530, -0.7365, -1.3038, -0.4835,  0.6567, -0.0836, -1.0226,
          0.1445,  0.7913, -0.4086,  0.2391, -0.3196, -0.3540, -0.0102,  0.4607,
          0.4225, -0.2746,  0.4953, -0.4186,  0.1484, -1.2640, -0.5679,  0.9650,
         -1.3487,  0.3041, -0.0664,  0.2455,  0.1330, -0.3949, -0.0134, -0.0120,
          0.2873, -1.0092,  

## Convolution layer
A convolution layer applies a convolution over an input of shape (N, C_in, H_in, W_in); the result has shape (N, C_out, H_out, W_out).

PyTorch uses channel-first notation for conv layers

In [None]:
input_tensor = torch.rand((5, 3, 224, 224))
layer_conv = torch.nn.Conv2d(in_channels=3, out_channels=100, kernel_size=(3, 3), stride=1, padding=(1, 1))

In [None]:
output = layer_conv(input_tensor)
print(output.shape)

torch.Size([5, 100, 224, 224])
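
The spatial size stays at 224 above because a 3x3 kernel with padding 1 and stride 1 preserves it. In general, assuming dilation=1, the output size is floor((size + 2 * padding - kernel_size) / stride) + 1. The helper conv_output_size below is a hypothetical function written for this sketch, not part of PyTorch.

```python
import torch


def conv_output_size(size, kernel_size, stride=1, padding=0):
    # hypothetical helper; the formula assumes dilation=1
    return (size + 2 * padding - kernel_size) // stride + 1


# same settings as the layer above: the size is preserved
print(conv_output_size(224, kernel_size=3, stride=1, padding=1))

# stride 2 roughly halves the spatial size
layer_conv = torch.nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, stride=2, padding=1)
output = layer_conv(torch.rand((1, 3, 224, 224)))
expected = conv_output_size(224, kernel_size=3, stride=2, padding=1)

print(output.shape, expected)
```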


## Recurrent layer
A recurrent neural network is used for sequence processing. The input shape is (L, N, H_in), or (N, L, H_in) if batch_first=True. We recommend the second notation, which is more intuitive.

In [None]:
# you can read this as 5 samples of sequential data, where each sample has 3 elements
# and each element is represented with 224 features
input_tensor = torch.rand((5, 3, 224))
layer_rnn = torch.nn.RNN(input_size=224, hidden_size=512, num_layers=2, batch_first=True)

In [None]:
output, hidden_states = layer_rnn(input_tensor)
print(output.shape, hidden_states.shape)

torch.Size([5, 3, 512]) torch.Size([2, 5, 512])
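
The two return values are related: output holds the hidden state of the last layer at every time step, while hidden_states holds the final hidden state of every layer. For a unidirectional RNN the last time step of output should therefore equal the last entry of hidden_states. The sketch below checks this on a smaller layer (the sizes are arbitrary).

```python
import torch

torch.manual_seed(0)

layer_rnn = torch.nn.RNN(input_size=8, hidden_size=16, num_layers=2, batch_first=True)
input_tensor = torch.rand((5, 3, 8))

output, hidden_states = layer_rnn(input_tensor)

# last time step of the top layer, shape (N, H_out)
last_step = output[:, -1]
# final hidden state of the top layer, shape (N, H_out)
last_layer_state = hidden_states[-1]

print(output.shape, hidden_states.shape)
print('Equal: {}'.format(torch.allclose(last_step, last_layer_state)))
```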


## Neural network

You can write a NN as a set of layers and then apply them sequentially

In [None]:
input_tensor = torch.rand((1, 3, 224, 224))

# 2D layers are required for a (N, C, H, W) input with a (3, 3) kernel
layer1 = torch.nn.ConvTranspose2d(3, 16, (3, 3), 1)
layer2 = torch.nn.ConvTranspose2d(16, 64, (3, 3), 1)

output = layer2(layer1(input_tensor))

In [None]:
print(output.shape)

torch.Size([1, 64, 228, 228])


## Let's put them together

In [None]:
class MyNN(torch.nn.Module):
    def __init__(self):
        super(MyNN, self).__init__()

        # 2D transposed convolutions, matching the (N, C, H, W) input
        self.layer1 = torch.nn.ConvTranspose2d(3, 16, (3, 3), 1)
        self.layer2 = torch.nn.ConvTranspose2d(16, 64, (3, 3), 1)
        # 64 channels * 228 * 228 spatial positions = 3326976 features
        self.layer3 = torch.nn.Linear(3326976, 5)

        self.activation = torch.nn.ReLU6()
        # NOTE: torch.nn.CrossEntropyLoss expects raw scores (logits);
        # Softmax is applied here to show class probabilities
        self.activation_class = torch.nn.Softmax(1)

    def forward(self, input):
        y = self.layer1(input)
        y = self.layer2(y)
        y = self.activation(y)

        # flatten everything except the batch dimension
        y = torch.reshape(y, (y.shape[0], y.shape[1] * y.shape[2] * y.shape[3]))

        y = self.layer3(y)
        y = self.activation_class(y)
        return y

In [None]:
mynn = MyNN()
mynn

MyNN(
  (layer1): ConvTranspose2d(3, 16, kernel_size=(3, 3), stride=(1, 1))
  (layer2): ConvTranspose2d(16, 64, kernel_size=(3, 3), stride=(1, 1))
  (layer3): Linear(in_features=3326976, out_features=5, bias=True)
  (activation): ReLU6()
  (activation_class): Softmax(dim=1)
)

In [None]:
predict = mynn(input_tensor)
print(predict)

tensor([[0.1998, 0.1975, 0.2110, 0.1965, 0.1952]], grad_fn=<SoftmaxBackward>)


In [None]:
real_classes = torch.tensor([2], dtype=torch.long)

print(real_classes)

tensor([2])


## Loss

In [None]:
loss_function = torch.nn.CrossEntropyLoss()

In [None]:
loss_value = loss_function(predict, real_classes)

In [None]:
loss_value

tensor(1.5984, grad_fn=<NllLossBackward>)
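
One detail to keep in mind: torch.nn.CrossEntropyLoss applies log-softmax internally, so it is normally fed raw scores (logits) rather than probabilities. The sketch below, with made-up logits, shows that it matches a manual log-softmax followed by the negative log-likelihood loss.

```python
import torch
from torch.nn import functional as F

# made-up raw scores for one sample and three classes
logits = torch.tensor([[1.0, 2.0, 3.0]])
target = torch.tensor([2])

loss = torch.nn.CrossEntropyLoss()(logits, target)
# the same computation written out in two steps
manual = F.nll_loss(F.log_softmax(logits, dim=1), target)

print(loss, manual)
print('Equal: {}'.format(torch.allclose(loss, manual)))
```

Passing already softmaxed values still runs, but the softmax is then effectively applied twice, which flattens the loss.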

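The next example stacks 100 linear layers and compares the weight gradients of the last, the first, and a middle layer: the closer a layer is to the input, the smaller its gradients become (vanishing gradients).

In [None]:
class MyNN(torch.nn.Module):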
    def __init__(self):
        super(MyNN, self).__init__()

        self.layer1 = torch.nn.Sequential(
            *[torch.nn.Linear(50, 50) for i in range(100)]
        )
        self.layer2 = torch.nn.Linear(50, 2)

    def forward(self, input):
        y = self.layer1(input)
        y = self.layer2(y)
        return y

In [None]:
mynn = MyNN()
mynn

In [None]:
pred = mynn(torch.rand((1, 50)))

In [None]:
loss = loss_function(pred, torch.tensor([1]))

In [None]:
loss.backward()

In [None]:
mynn.layer1[-1].weight.grad.max(), mynn.layer1[-1].weight.grad.min()

(tensor(0.0271), tensor(-0.0244))

In [None]:
mynn.layer1[0].weight.grad.max(), mynn.layer1[0].weight.grad.min()

(tensor(9.4923e-26), tensor(-7.6500e-26))

In [None]:
mynn.layer1[50].weight.grad.max(), mynn.layer1[50].weight.grad.min()

(tensor(1.7274e-14), tensor(-1.2149e-14))
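
To see the trend across the whole stack rather than at three hand-picked layers, the gradient norm of every layer can be collected in a loop. This is a self-contained sketch that rebuilds an equivalent stack with illustrative names (layers, head); the exact numbers depend on the random initialization.

```python
import torch

torch.manual_seed(0)

# same architecture as above: 100 linear layers followed by a classifier
layers = torch.nn.Sequential(*[torch.nn.Linear(50, 50) for _ in range(100)])
head = torch.nn.Linear(50, 2)

pred = head(layers(torch.rand((1, 50))))
loss = torch.nn.CrossEntropyLoss()(pred, torch.tensor([1]))
loss.backward()

# one gradient norm per layer, ordered from the input side to the output side
grad_norms = [layer.weight.grad.norm().item() for layer in layers]

for index in (0, 25, 50, 75, 99):
    print('layer {}: {:.3e}'.format(index, grad_norms[index]))
```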