# PyTorch Basics: Tensors & Gradients

In [1]:
import torch

## Tensors
At its core, PyTorch is a library for processing tensors. A tensor is a number, vector, matrix or any n-dimensional array. Let's create a tensor with a single number:

In [2]:
t1 = torch.tensor(4.)
t1

tensor(4.)

<code>4.</code> is a shorthand for <code>4.0</code>. It is used to indicate to Python (and PyTorch) that you want to create a floating point number. We can verify this by checking the <code>dtype</code> attribute of our tensor:



In [3]:
t1.dtype

torch.float32
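As a small illustration of how the literal determines the data type, the sketch below compares an integer literal with a floating point one. It assumes PyTorch's default settings, where <code>torch.float32</code> is the default floating point type. The names <code>t_int</code>, <code>t_float</code> and <code>t_explicit</code> are introduced here purely for illustration.

```python
import torch

# An integer literal gives an integer tensor; a trailing dot gives a float tensor.
t_int = torch.tensor(4)
t_float = torch.tensor(4.)
print(t_int.dtype)       # torch.int64
print(t_float.dtype)     # torch.float32

# The data type can also be requested explicitly with the dtype argument.
t_explicit = torch.tensor(4, dtype=torch.float32)
print(t_explicit.dtype)  # torch.float32
```

This matters later on, because only floating point tensors can have <code>requires_grad</code> set to <code>True</code>.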

Let's try creating slightly more complex tensors:



In [4]:
# Vector
t2 = torch.tensor([1., 2, 3, 4])
t2

tensor([1., 2., 3., 4.])

In [5]:
# Matrix
t3 = torch.tensor([[5., 6], 
                   [7, 8], 
                   [9, 10]])
t3

tensor([[ 5.,  6.],
        [ 7.,  8.],
        [ 9., 10.]])

In [6]:
# 3-dimensional array
t4 = torch.tensor([
    [[11, 12, 13], 
     [13, 14, 15]], 
    [[15, 16, 17], 
     [17, 18, 19.]]])
t4

tensor([[[11., 12., 13.],
         [13., 14., 15.]],

        [[15., 16., 17.],
         [17., 18., 19.]]])

Tensors can have any number of dimensions, and different lengths along each dimension. We can inspect the length along each dimension using the <code>.shape</code> property of a tensor.

In [7]:
print(t1)
t1.shape

tensor(4.)


torch.Size([])

In [8]:
print(t2)
t2.shape

tensor([1., 2., 3., 4.])


torch.Size([4])

In [9]:
print(t3)
t3.shape

tensor([[ 5.,  6.],
        [ 7.,  8.],
        [ 9., 10.]])


torch.Size([3, 2])

In [10]:
print(t4)
t4.shape

tensor([[[11., 12., 13.],
         [13., 14., 15.]],

        [[15., 16., 17.],
         [17., 18., 19.]]])


torch.Size([2, 2, 3])
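The shape can also be queried piece by piece. The following is a minimal sketch using a matrix like <code>t3</code> above; the variable name <code>t</code> is only illustrative.

```python
import torch

t = torch.tensor([[5., 6],
                  [7, 8],
                  [9, 10]])

print(t.shape)     # torch.Size([3, 2])
print(t.ndim)      # 2, the number of dimensions
print(t.shape[0])  # 3, the length along the first dimension
print(t.numel())   # 6, the total number of elements
```

Note that <code>torch.Size</code> behaves like a tuple, so it can be indexed and unpacked in the usual way.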

## Tensor operations and gradients
We can combine tensors with the usual arithmetic operations. Let's look at an example:

In [11]:
# Create tensors.
x = torch.tensor(3.)
w = torch.tensor(4., requires_grad=True)
b = torch.tensor(5., requires_grad=True)
x, w, b

(tensor(3.), tensor(4., requires_grad=True), tensor(5., requires_grad=True))

We've created 3 tensors <code>x</code>, <code>w</code> and <code>b</code>, all numbers. <code>w</code> and <code>b</code> have an additional parameter <code>requires_grad</code> set to <code>True</code>. We'll see what it does in just a moment.

Let's create a new tensor <code>y</code> by combining these tensors:

In [12]:
y = w * x + b
y

tensor(17., grad_fn=<AddBackward0>)

As expected, <code>y</code> is a tensor with the value <code>3 * 4 + 5 = 17</code>. What makes PyTorch special is that we can automatically compute the derivative of <code>y</code> w.r.t. the tensors that have <code>requires_grad</code> set to <code>True</code> i.e. <code>w</code> and <code>b</code>. To compute the derivatives, we can call the <code>.backward</code> method on our result <code>y</code>.

In [13]:
# Compute derivatives
y.backward()

The derivatives of <code>y</code> w.r.t. the input tensors are stored in the <code>.grad</code> property of the respective tensors.

In [14]:
# Display gradients
print('dy/dx:', x.grad)
print('dy/dw:', w.grad)
print('dy/db:', b.grad)

dy/dx: None
dy/dw: tensor(3.)
dy/db: tensor(1.)


As expected, <code>dy/dw</code> has the same value as <code>x</code> i.e. <code>3</code>, and <code>dy/db</code> has the value <code>1</code>. Note that <code>x.grad</code> is <code>None</code>, because <code>x</code> doesn't have <code>requires_grad</code> set to <code>True</code>.

The "grad" in <code>w.grad</code> stands for gradient, which is another term for derivative, used mainly when dealing with matrices.
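To check the same mechanism against a derivative worked out by hand, here is a minimal sketch with a slightly different expression. It assumes <code>y = w**2 * x + b</code>, which is not part of the example above and is chosen only because its derivative w.r.t. <code>w</code> is easy to verify: <code>dy/dw = 2 * w * x</code>.

```python
import torch

x = torch.tensor(3.)
w = torch.tensor(4., requires_grad=True)
b = torch.tensor(5., requires_grad=True)

# Hypothetical variation of the expression used above.
y = w ** 2 * x + b

# Compute derivatives of y w.r.t. w and b.
y.backward()

print('dy/dw:', w.grad)  # tensor(24.), i.e. 2 * w * x = 2 * 4 * 3
print('dy/db:', b.grad)  # tensor(1.)
```

The values match the derivatives computed by hand, even though we never wrote the derivative formula ourselves.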


## Interoperability with Numpy
<code>Numpy</code> is a popular open source library used for mathematical and scientific computing in Python. It enables efficient operations on large multi-dimensional arrays, and has a large ecosystem of supporting libraries:

* <code>Matplotlib</code> for plotting and visualization
* <code>OpenCV</code> for image and video processing
* <code>Pandas</code> for file I/O and data analysis

Instead of reinventing the wheel, PyTorch interoperates really well with Numpy to leverage its existing ecosystem of tools and libraries. Here's how we create an array in Numpy:


In [15]:
import numpy as np
x = np.array([[1, 2], [3, 4.]])
x

array([[1., 2.],
       [3., 4.]])

We can convert a Numpy array to a PyTorch tensor using <code>torch.from_numpy</code>.

In [16]:
# Convert the numpy array to a torch tensor.
y = torch.from_numpy(x)
y

tensor([[1., 2.],
        [3., 4.]], dtype=torch.float64)

<code>torch.from_numpy()</code> shares memory with the Numpy array and doesn't create a copy, so in-place changes to one are visible in the other, whereas <code>torch.tensor()</code> always copies the data.
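The difference is easiest to see by modifying the array after the conversion. This is a minimal sketch; the names <code>a</code>, <code>shared</code> and <code>copied</code> are introduced here for illustration only.

```python
import numpy as np
import torch

a = np.array([1., 2., 3.])

shared = torch.from_numpy(a)  # uses the same memory as a
copied = torch.tensor(a)      # independent copy of the data

# Modify the Numpy array in place.
a[0] = 100.

print(shared[0].item())  # 100.0, the change is visible through the tensor
print(copied[0].item())  # 1.0, the copy is unaffected
```

Sharing memory avoids an extra copy for large arrays, but it also means that an in-place change on either side affects both.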

Let's verify that the Numpy array and the torch tensor have the same data type.

In [17]:
x.dtype, y.dtype

(dtype('float64'), torch.float64)
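Numpy creates <code>float64</code> arrays by default, while the tensors created earlier from Python floats were <code>float32</code>. If a <code>float32</code> tensor is needed, one possible approach is to convert after the call to <code>torch.from_numpy</code>, as sketched below. The names <code>arr</code>, <code>t64</code> and <code>t32</code> are illustrative, and the conversion produces a new tensor that no longer shares memory with the array.

```python
import numpy as np
import torch

arr = np.array([[1, 2], [3, 4.]])  # float64 by default

t64 = torch.from_numpy(arr)        # keeps the Numpy data type
t32 = t64.to(torch.float32)        # converted copy with a different data type

print(t64.dtype)  # torch.float64
print(t32.dtype)  # torch.float32
```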

We can convert a PyTorch tensor to a Numpy array using the <code>.numpy</code> method of a tensor.

In [18]:
# Convert a torch tensor to a numpy array
z = y.numpy()
z

array([[1., 2.],
       [3., 4.]])

The interoperability between PyTorch and Numpy is really important because most datasets you'll work with will likely be read and preprocessed as Numpy arrays.

## Further Reading
Tensors in PyTorch support a variety of operations, and what we've covered here is by no means exhaustive. You can learn more about tensors and tensor operations here: https://pytorch.org/docs/stable/tensors.html

Here are some things to try out:

1. What if one or more of <code>x</code>, <code>w</code> or <code>b</code> were matrices, instead of numbers, in the above example? What would the result <code>y</code> and the gradients <code>w.grad</code> and <code>b.grad</code> look like in this case?

2. What if <code>y</code> was a matrix created using <code>torch.tensor</code>, with each element of the matrix expressed as a combination of numeric tensors <code>x</code>, <code>w</code> and <code>b</code>?

3. What if we had a chain of operations instead of just one, i.e. <code>y = x * w + b</code>, <code>z = l * y + m</code>, <code>w = c * z + d</code> and so on? What would accessing <code>w.grad</code> do?

To learn more about matrix derivatives: https://en.wikipedia.org/wiki/Matrix_calculus#Derivatives_with_matrices


## Taken from
1. PyTorch Tutorial for Deep Learning Researchers by Yunjey Choi.
2. FastAI development notebooks by Jeremy Howard.
