![avatar](pytorch.png)

Official website: [https://pytorch.org](https://pytorch.org)

PyTorch is one of the most widely used deep learning libraries today, especially in research. This tutorial introduces the basics of PyTorch and how to use it.

We start with `tensors`. You should already be familiar with `arrays` from Mr. Yuehao's tutorial on `NumPy`.

`Tensors` are a specialized data structure that is very similar to `arrays`. In PyTorch, we use `tensors` to encode the inputs and outputs of a model, as well as the model’s parameters.

More precisely, `tensors` are similar to `NumPy`’s `ndarrays`, except that `tensors` can run on GPUs or other hardware accelerators. `Tensors` are also optimized for automatic differentiation.

First, install and import the `PyTorch` and `NumPy` libraries.

In [1]:
# pip install torch numpy
import torch
import numpy as np

### Initializing a Tensor

In [2]:
# Directly from data
data = [[1, 2],[3, 4]]
x_data = torch.tensor(data)

print(type(data), type(x_data))

print(x_data)

<class 'list'> <class 'torch.Tensor'>
tensor([[1, 2],
        [3, 4]])


In [3]:
# From a NumPy array
np_array = np.array(data)
x_np = torch.from_numpy(np_array)

print(type(np_array))
print(type(x_np))

print(x_np)

<class 'numpy.ndarray'>
<class 'torch.Tensor'>
tensor([[1, 2],
        [3, 4]])


In [4]:
# From another tensor
x_ones = torch.ones_like(x_data) # retains the properties of x_data
print(f"Ones Tensor: \n {x_ones} \n")

x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the datatype of x_data
print(f"Random Tensor: \n {x_rand} \n")

Ones Tensor: 
 tensor([[1, 1],
        [1, 1]]) 

Random Tensor: 
 tensor([[0.2208, 0.8925],
        [0.9794, 0.4681]]) 



In [5]:
# With random or constant values
shape = (2, 3)
rand_tensor = torch.rand(shape) # uniform samples from [0, 1)
ones_tensor = torch.ones(shape) 
zeros_tensor = torch.zeros(shape)

print(f"Random Tensor: \n {rand_tensor} \n")
print(f"Ones Tensor: \n {ones_tensor} \n")
print(f"Zeros Tensor: \n {zeros_tensor}")

Random Tensor: 
 tensor([[0.7628, 0.1688, 0.9701],
        [0.3126, 0.7747, 0.2370]]) 

Ones Tensor: 
 tensor([[1., 1., 1.],
        [1., 1., 1.]]) 

Zeros Tensor: 
 tensor([[0., 0., 0.],
        [0., 0., 0.]])
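
Because `torch.rand` draws new values on every run, the random outputs above will differ from yours. The sketch below shows one way to make them repeatable with `torch.manual_seed`, together with two other common constructors, `torch.full` and `torch.arange`. The variable names (`first`, `second`, `sevens`, `steps`) and the seed value are illustrative choices, not part of the original tutorial.

```python
import torch

# Seeding the generator makes the "random" values repeatable.
torch.manual_seed(42)
first = torch.rand(2, 3)
torch.manual_seed(42)
second = torch.rand(2, 3)
print(torch.equal(first, second))  # True: same seed, same values

# A tensor filled with a single constant value
sevens = torch.full((2, 3), 7.0)
print(sevens)

# Evenly spaced integers, similar to Python's range(): 0, 2, 4, 6, 8
steps = torch.arange(0, 10, 2)
print(steps)
```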


### Attributes of a Tensor

In [6]:
tensor = torch.rand(3,4)
array1 = np.random.rand(3, 4)
print(array1.shape)

print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")

(3, 4)
Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu
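
The `dtype` attribute depends on how the tensor was created. As a small sketch (the variable names are illustrative, and the default floating-point type is assumed to be the unchanged `torch.float32`), the example below shows how the type is inferred from the data and how it can be set or converted explicitly.

```python
import torch

# The dtype is inferred from the Python data.
int_tensor = torch.tensor([[1, 2], [3, 4]])
float_tensor = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
print(int_tensor.dtype)    # torch.int64
print(float_tensor.dtype)  # torch.float32

# The dtype can be set at creation time or converted afterwards.
half_ones = torch.ones(2, 3, dtype=torch.float16)
as_float = int_tensor.to(torch.float32)  # returns a new tensor
print(half_ones.dtype, as_float.dtype)

# shape and numel() describe the layout and the total number of elements.
print(half_ones.shape, half_ones.numel())
```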


### Operations on Tensors

In [7]:
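# Move the tensor to the GPU if one is available
if torch.cuda.is_available():
    tensor = tensor.to('cuda')

# .cpu() and .to('cpu') both return a tensor on the CPU;
# they do not modify `tensor` in place
tensor.cpu()
tensor.to('cpu')

# With several GPUs, a specific device can be selected by index:
# device_ids = [0, 1, 2, 3]
# tensor.to('cuda:1')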


tensor([[0.6757, 0.9401, 0.0712, 0.3726],
        [0.4599, 0.9171, 0.8886, 0.9471],
        [0.8193, 0.0179, 0.4294, 0.9568]])
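
A common way to write code that runs with or without a GPU is to pick the device once and reuse it. The following is a minimal sketch of that pattern. The names `device`, `moved`, `created`, and `back_on_cpu` are illustrative.

```python
import torch

# Choose the device once; fall back to the CPU when no GPU is present.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tensor = torch.rand(3, 4)
moved = tensor.to(device)                  # tensor on `device`
created = torch.ones(3, 4, device=device)  # created directly on `device`

# Operands of an operation must live on the same device.
result = moved + created
print(result.device)

# Bring the result back to the CPU, e.g. before converting it to NumPy.
back_on_cpu = result.cpu()
print(back_on_cpu.device)
```

Creating a tensor directly on the target device avoids an extra copy from the CPU.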

In [8]:
# Standard numpy-like indexing and slicing:
tensor = torch.rand(3, 3)
print(tensor)
print('First row: ', tensor[0])
print('First column: ', tensor[:, 0])
print('Last column:', tensor[..., -1])
tensor[:, 1] = 0

print(tensor)

tensor[0, :] = 1

print(tensor)

tensor([[0.4974, 0.1488, 0.3186],
        [0.5636, 0.3844, 0.7200],
        [0.4794, 0.2082, 0.3376]])
First row:  tensor([0.4974, 0.1488, 0.3186])
First column:  tensor([0.4974, 0.5636, 0.4794])
Last column: tensor([0.3186, 0.7200, 0.3376])
tensor([[0.4974, 0.0000, 0.3186],
        [0.5636, 0.0000, 0.7200],
        [0.4794, 0.0000, 0.3376]])
tensor([[1.0000, 1.0000, 1.0000],
        [0.5636, 0.0000, 0.7200],
        [0.4794, 0.0000, 0.3376]])
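
Two further indexing behaviours are worth knowing, both shared with NumPy: basic slices are views on the same memory, and boolean masks select elements by condition. A short sketch with illustrative names (`t`, `row`, `mask`):

```python
import torch

t = torch.arange(12).reshape(3, 4)  # values 0..11 in 3 rows of 4

# Basic indexing returns a view: writing through it changes `t` as well.
row = t[1]
row[0] = -1
print(t[1, 0])  # the original tensor now holds -1 at this position

# A boolean mask selects the matching elements as a 1-D tensor.
mask = t > 5
print(t[mask])  # the values 6, 7, 8, 9, 10, 11

# Use clone() when an independent copy is needed.
copy = t[0].clone()
copy[0] = 100
print(t[0, 0])  # still 0
```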


In [9]:
# Concatenate tensors along an existing dimension
# (the NumPy counterpart is np.concatenate(..., axis=1))
t1 = torch.cat([tensor, tensor, tensor], dim=1)
print(t1)

tensor([[1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000],
        [0.5636, 0.0000, 0.7200, 0.5636, 0.0000, 0.7200, 0.5636, 0.0000, 0.7200],
        [0.4794, 0.0000, 0.3376, 0.4794, 0.0000, 0.3376, 0.4794, 0.0000, 0.3376]])
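
`torch.cat` joins tensors along a dimension that already exists, whereas `torch.stack` creates a new one. The sketch below compares the resulting shapes; the tensors `a` and `b` are illustrative.

```python
import torch

a = torch.zeros(2, 3)
b = torch.ones(2, 3)

# cat: the chosen dimension grows, the number of dimensions stays the same.
rows = torch.cat([a, b], dim=0)
cols = torch.cat([a, b], dim=1)
print(rows.shape)  # torch.Size([4, 3])
print(cols.shape)  # torch.Size([2, 6])

# stack: a new leading dimension is added, so all inputs need equal shapes.
stacked = torch.stack([a, b], dim=0)
print(stacked.shape)  # torch.Size([2, 2, 3])
```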


In [10]:
# This computes the matrix multiplication between two tensors. y1, y2 will have the same value
y1 = tensor @ tensor.T
y2 = torch.matmul(tensor, tensor.T)
print(torch.equal(y1, y2)) # check whether y1 equals y2


# This computes the element-wise product. z1, z2 will have the same value
z1 = tensor * tensor
z2 = torch.mul(tensor, tensor)
print(torch.equal(z1, z2)) # check whether z1 equals z2

True
True
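
To make the difference between the two products concrete, here is a small worked example with fixed integer values (the matrices `a` and `b` are illustrative), so the results can be checked by hand.

```python
import torch

a = torch.tensor([[1, 2],
                  [3, 4]])
b = torch.tensor([[5, 6],
                  [7, 8]])

# Matrix product: rows of `a` combined with columns of `b`.
# [[1*5 + 2*7, 1*6 + 2*8], [3*5 + 4*7, 3*6 + 4*8]] = [[19, 22], [43, 50]]
matrix_product = a @ b
print(matrix_product)

# Element-wise product: each entry multiplied by the entry at the same position.
# [[1*5, 2*6], [3*7, 4*8]] = [[5, 12], [21, 32]]
elementwise_product = a * b
print(elementwise_product)
```

Note that `torch.equal` requires exact equality. For floating-point results computed along different paths, `torch.allclose` is usually the safer comparison.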


In [11]:
# Single-element tensors can be converted to a Python number with item()

agg = tensor.sum()
print(agg)
agg_item = agg.item()
print(agg_item, type(agg_item))

tensor(5.1005)
5.100515842437744 <class 'float'>
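
Aggregations such as `sum` can also be applied along a single dimension, in which case the result is a tensor rather than a single value. A brief sketch on a small tensor `t` (an illustrative name) whose sums are easy to verify:

```python
import torch

t = torch.arange(6.0).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

total = t.sum()             # all elements: 15
column_sums = t.sum(dim=0)  # collapse the rows: 3, 5, 7
row_sums = t.sum(dim=1)     # collapse the columns: 3, 12
print(total, column_sums, row_sums)

# item() only works on single-element tensors such as `total`.
print(total.item())

# tolist() converts a tensor of any shape to nested Python lists.
print(row_sums.tolist())
```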


### Bridge with NumPy

In [12]:
# Tensor to NumPy array

t = torch.ones(5)
print(f"t: {t}")
n = t.numpy()
print(f"n: {n}")

t: tensor([1., 1., 1., 1., 1.])
n: [1. 1. 1. 1. 1.]


In [13]:
# NumPy array to Tensor

n = np.ones(5)
t = torch.from_numpy(n)

print(n)
print(t)

[1. 1. 1. 1. 1.]
tensor([1., 1., 1., 1., 1.], dtype=torch.float64)
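
For tensors on the CPU, `t.numpy()` and `torch.from_numpy(n)` do not copy the data: the tensor and the array share the same memory, so changing one changes the other. The sketch below demonstrates this, and contrasts it with `torch.tensor`, which makes a copy. The names `t2`, `n2`, and `copied` are illustrative.

```python
import numpy as np
import torch

# Tensor -> NumPy: an in-place change to the tensor shows up in the array.
t = torch.ones(5)
n = t.numpy()
t.add_(1)
print(n)  # [2. 2. 2. 2. 2.]

# NumPy -> Tensor: an in-place change to the array shows up in the tensor.
n2 = np.ones(5)
t2 = torch.from_numpy(n2)
np.add(n2, 1, out=n2)
print(t2)

# torch.tensor() copies the data, so later changes to the array are not seen.
copied = torch.tensor(n2)
n2[0] = 100.0
print(copied[0].item(), t2[0].item())  # 2.0 100.0
```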


### Automatic Differentiation

When training neural networks, the most frequently used algorithm is **backpropagation**. In this algorithm, parameters (model weights) are adjusted according to the **gradient** of the loss function with respect to each parameter.

To compute those gradients, PyTorch has a built-in differentiation engine called `torch.autograd`. It supports automatic computation of gradients for any computational graph.

Consider the simplest one-layer neural network, with input `x`, parameters `w` and `b`, and some loss function. It can be defined in PyTorch in the following manner:

In [14]:
x = torch.ones(5)  # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w)+b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

This code defines the following **computational graph**:

![avatar](comp-graph.png)

In this network, `w` and `b` are the parameters we need to optimize. Thus, we need to be able to compute the gradients of the loss function with respect to those variables. To do that, we set the `requires_grad` property of those tensors to `True`.

To optimize the parameters of the neural network, we need to
compute the derivatives of the loss function with respect to them,
namely $\frac{\partial loss}{\partial w}$ and
$\frac{\partial loss}{\partial b}$ for some fixed values of
`x` and `y`. To compute those derivatives, we call
`loss.backward()`, and then retrieve the values from `w.grad` and
`b.grad`:

In [15]:
loss.backward()
print(w.grad)
print(w.grad.shape)
print(b.grad)

tensor([[0.3003, 0.1218, 0.1951],
        [0.3003, 0.1218, 0.1951],
        [0.3003, 0.1218, 0.1951],
        [0.3003, 0.1218, 0.1951],
        [0.3003, 0.1218, 0.1951]])
torch.Size([5, 3])
tensor([0.3003, 0.1218, 0.1951])
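
The gradients above can be checked by hand. For binary cross-entropy on logits with the default mean reduction, the derivative of the loss with respect to each logit is $(\sigma(z_j) - y_j) / N$, where $N$ is the number of logits. Since `z = x @ w + b`, the gradient for `b` is that vector, and the gradient for `w` is the outer product of `x` with it. The sketch below repeats the setup with a fixed seed (an addition for repeatability) and compares the analytic values with autograd; `dz`, `expected_w`, and `expected_b` are illustrative names.

```python
import torch

torch.manual_seed(0)  # fixed seed so the run is repeatable

x = torch.ones(5)   # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward()

# Analytic gradients, computed without building a graph
with torch.no_grad():
    dz = (torch.sigmoid(z) - y) / z.numel()  # d loss / d z
    expected_w = torch.outer(x, dz)          # d loss / d w, shape (5, 3)
    expected_b = dz                          # d loss / d b, shape (3,)

print(torch.allclose(w.grad, expected_w))  # True
print(torch.allclose(b.grad, expected_b))  # True
```

This also explains why every row of `w.grad` is identical in the output above: `x` is a vector of ones, so each row of the outer product equals `dz`.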


By default, all tensors with `requires_grad=True` track their
computational history and support gradient computation. However, there
are cases where we do not need this, for example, when we have
trained the model and just want to apply it to some input data, i.e. we
only want to do *forward* computations through the network. We can stop
tracking computations by wrapping our computation code in a
`torch.no_grad()` block:

In [16]:
z = torch.matmul(x, w)+b
print(z.requires_grad)

with torch.no_grad():
    z = torch.matmul(x, w)+b
print(z.requires_grad)

True
False
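
In practice, the `torch.no_grad()` block is often placed inside a prediction function. The following is a minimal sketch of that usage; `predict` is a hypothetical helper, not a PyTorch function, and the parameters are random stand-ins for a trained model.

```python
import torch

w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

def predict(x):
    """Hypothetical helper: forward pass only, no graph is recorded."""
    with torch.no_grad():
        return torch.sigmoid(torch.matmul(x, w) + b)

probs = predict(torch.ones(5))
print(probs.requires_grad)  # False
print(probs.grad_fn)        # None: no history was recorded

# The parameters themselves are unchanged and can still be trained later.
print(w.requires_grad)      # True
```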


Another way to achieve the same result is to use the `detach()` method
on the tensor:

In [17]:
z = torch.matmul(x, w)+b
z_det = z.detach()
print(z_det.requires_grad)

False
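
A detached tensor is treated as a constant during backpropagation, and it shares its storage with the original tensor. The sketch below illustrates both points on a small example (the names `a`, `b`, `c`, and `out` are illustrative and reuse nothing from the cells above).

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
b = a * 2           # tracked: b depends on a
c = b.detach()      # same values, but cut off from the graph

out = (b * c).sum()  # c acts as a constant here
out.backward()

# Only the path through b contributes: d out / d a = 2 * c = [4, 8, 12].
# Without detach() the gradient would be 8 * a = [8, 16, 24].
print(a.grad)

# detach() does not copy the data; both tensors use the same memory.
print(c.data_ptr() == b.data_ptr())  # True
```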


There are two main reasons you might want to disable gradient tracking:

  - To mark some parameters in your neural network as **frozen parameters**. This is
    a very common scenario when [finetuning a pretrained network](https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html).
  - To **speed up computations** when you are only doing a forward pass, because computations on tensors that do
    not track gradients are more efficient.
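
As a sketch of the first point, the example below freezes the first layer of a small model and trains only the rest. The model is a toy stand-in for a pretrained network, and the layer sizes, learning rate, and the name `trainable` are arbitrary illustrative choices.

```python
import torch
from torch import nn

# Toy stand-in for a pretrained network
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Freeze the first layer: its parameters no longer receive gradients.
for param in model[0].parameters():
    param.requires_grad_(False)

loss = model(torch.randn(3, 4)).sum()
loss.backward()

print(model[0].weight.grad)        # None: the frozen layer has no gradient
print(model[2].weight.grad.shape)  # torch.Size([2, 8])

# Pass only the trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1)
optimizer.step()
```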