In [17]:
import matplotlib.pyplot as plt
%matplotlib inline

In [18]:
import torch
import numpy as np
torch.__version__

'0.4.1'

## 1. Tensor basics

PyTorch is a library for training deep neural networks, and much of it is built around the `Tensor`, an array type similar to the NumPy `ndarray`.

Under the hood, PyTorch runs on compiled C/C++ code and, when available, on CUDA and cuDNN.

In [19]:
tensor = torch.Tensor([0, 1, 2, 3])

In [20]:
print(tensor.shape)
print(tensor.dtype)

torch.Size([4])
torch.float32


In [21]:
tensor.sum()

tensor(6.)

It's easy to convert between NumPy arrays and PyTorch Tensors.

In [22]:
tensor.numpy()  # no copy is made: the returned array shares memory with the tensor

array([0., 1., 2., 3.], dtype=float32)

In [24]:
# A better alternative to torch.Tensor(arr): shares memory with the array and keeps its dtype
torch.from_numpy(np.arange(5))

tensor([0, 1, 2, 3, 4])
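The memory sharing noted above can be checked directly. This is a minimal sketch, assuming an illustrative array named `arr` (not part of the original notebook); `torch.tensor` is included only as a contrast, since it makes an independent copy.

```python
import numpy as np
import torch

arr = np.arange(5, dtype=np.int64)   # illustrative array, dtype fixed for clarity
t = torch.from_numpy(arr)            # no copy: t and arr share one buffer

arr[0] = 100                         # a write through NumPy is visible in the tensor
t[1] = -1                            # a write through PyTorch is visible in the array

copied = torch.tensor(arr)           # explicit copy: later writes to arr do not reach it
arr[2] = 7

print(t)
print(arr)
print(copied)
```

If an independent tensor is needed, copying explicitly (as `copied` does here) avoids surprises from shared writes.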

## 2. Using CUDA

**The following code only works if you have PyTorch set up for a GPU.**

In [8]:
torch.device("cuda")

device(type='cuda')

In [9]:
torch.cuda.device_count()

0

In [10]:
torch.cuda.is_available()

False

In [11]:
torch.has_cudnn

False

In [12]:
assert torch.cuda.is_available() and torch.has_cudnn  # is_available must be called; the bare function is always truthy

AssertionError: 

In [17]:
x = torch.Tensor(range(5))
y = torch.Tensor(np.ones(5))

It's similarly easy to move Tensors onto a GPU.

In [32]:
x.cuda()

RuntimeError: Cannot initialize CUDA without ATen_cuda library. PyTorch splits its backend into two shared libraries: a CPU library and a CUDA library; this error has occurred because you are trying to use some CUDA functionality, but the CUDA library has not been loaded by the dynamic linker for some reason.  The CUDA library MUST be loaded, EVEN IF you don't directly use any symbols from the CUDA library! One common culprit is a lack of -Wl,--no-as-needed in your link arguments; many dynamic linkers will delete dynamic library dependencies if you don't depend on any of their symbols.  You can check if this has occurred by using ldd on your binary to see if there is a dependency on *_cuda.so library.

In [13]:
z = x.cuda() + y.cuda()
print(z)


In [14]:
z.cpu()


In [15]:
if torch.cuda.is_available() and torch.has_cudnn:
    device = torch.device('cuda')
else:
    device = torch.device("cpu")

In [16]:
z = x.to(device) + y.to(device)
print(z)


## 3. Exercises

1) Initialize random tensors A, B, C of size [2,3], [2,3], [3,3,2].

In [37]:
A = torch.rand(2,3)
B = torch.rand(2,3)
C = torch.rand(3,3,2)

2) Fill tensor A with all 10s

In [40]:
A = A.fill_(10)  # a trailing underscore marks an in-place method; fill_ also returns A itself

In [41]:
A

tensor([[10., 10., 10.],
        [10., 10., 10.]])

3) Fill tensor B with elements sampled from the normal distribution

In [45]:
B.normal_(0, 1)  # in-place: overwrite B with samples from the normal distribution N(0, 1)


4) Point-wise multiply A with B, and put the result into tensor B

5) Print the mean and standard deviation of the elements of B
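One possible way to approach exercises 4 and 5 is sketched below. It rebuilds `A` and `B` so the block runs on its own; the fixed seed and the helper name `B_before` are assumptions added for illustration.

```python
import torch

torch.manual_seed(0)            # assumption: fixed seed so the run is repeatable

A = torch.full((2, 3), 10.0)    # A as it looks after exercise 2
B = torch.randn(2, 3)           # B as it looks after exercise 3 (standard normal samples)
B_before = B.clone()            # kept only to check the result afterwards

B.mul_(A)                       # exercise 4: element-wise product, written into B in place

print(B.mean())                 # exercise 5: mean of the elements of B
print(B.std())                  # exercise 5: standard deviation of the elements of B
```

`B.mul_(A)` is used instead of `B = A * B` because the exercise asks for the result to be stored in the existing tensor.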

6) Fill tensor C with elements sampled from the uniform distribution U(-1,1). Print the dimensions of C.

7) Transpose the second and third dimension of tensor C, and put the result into tensor C itself (in-place). Print the dimensions of C.

8) Show the contiguity property of the tensors

In [None]:
### Tensor.is_contiguous()
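Exercises 6 to 8 could be approached along these lines. This is a sketch rather than a reference solution; it recreates `C` so that it is self-contained.

```python
import torch

C = torch.rand(3, 3, 2)

C.uniform_(-1, 1)          # exercise 6: refill C in place from the uniform distribution on [-1, 1)
print(C.shape)

C.transpose_(1, 2)         # exercise 7: swap the second and third dimensions in place
print(C.shape)

# Exercise 8: transposing only changes strides, so C no longer matches
# the row-major memory layout that "contiguous" refers to.
print(C.is_contiguous())
print(C.contiguous().is_contiguous())   # .contiguous() returns a compact copy
```

The in-place transpose moves no data, which is why the contiguity flag changes even though the underlying values do not.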

9) Print the second column of the third dimension of tensor C (note zero-indexed)

10) Perform operation A+B+C (note the broadcasting)
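Exercises 9 and 10 might look like the sketch below. It assumes `C` already has shape `[3, 2, 3]` (its shape after exercise 7); the names `col` and `total` are introduced here for illustration.

```python
import torch

A = torch.full((2, 3), 10.0)
B = torch.randn(2, 3)
C = torch.rand(3, 2, 3)     # assumed shape of C after the transpose in exercise 7

col = C[:, :, 1]            # exercise 9: index 1 (the second entry) along the third dimension
print(col)

# Exercise 10: A + B has shape [2, 3]. Broadcasting aligns trailing
# dimensions, so it is treated as [1, 2, 3] and repeated along C's first dimension.
total = A + B + C
print(total.shape)
```

Broadcasting works here only because the trailing dimensions `[2, 3]` match; with the original `[3, 3, 2]` layout of `C` the addition would raise an error.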

## 4. Autograd

Autograd is a submodule in PyTorch that handles automatic differentiation and gradient computation. This allows you to simply define a model once, in a forward fashion, while the library handles the computation of all gradients in the computational graph.

Here, we create two Tensors, but we only want PyTorch to compute gradients with respect to $x$. By default, PyTorch computes no gradients for arbitrary tensors (e.g. for $y$).

In [None]:
x = torch.randn(5, requires_grad=True)
y = torch.arange(5.)

In [None]:
print(x)
print(x.grad)

In [None]:
print(y)
print(y.grad)

We define $z = \sum_i x_i y_i$, the dot product of $x$ and $y$. Then

$$\frac{dz}{dx} = y$$

Note `z.grad_fn`: it records the operation that produced $z$, which is how PyTorch tracks the dependencies of $z$ in the computation graph.

In [None]:
z = (x * y).sum()
print(z)
print(z.grad)
print(z.grad_fn)

At this point, no gradients have been computed yet. Only when we call `z.backward()` does PyTorch compute the gradients and backpropagate them to every node in the graph that requires gradients (e.g. $x$).

In [None]:
z.backward()

As we can see, $x$ now has gradients associated with it, but $y$ does not.

In [None]:
print(x.grad)
print(y.grad)
print(z.grad)
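The claim that $\frac{dz}{dx} = y$ can be verified numerically. The following self-contained sketch repeats the setup and compares `x.grad` against `y` directly.

```python
import torch

x = torch.randn(5, requires_grad=True)   # gradients are tracked for x
y = torch.arange(5.)                      # plain tensor, no gradient tracking

z = (x * y).sum()                         # z = sum_i x_i * y_i
z.backward()                              # fills in x.grad

print(torch.allclose(x.grad, y))          # the gradient of z with respect to x equals y
print(y.grad)                             # stays None: y never required gradients
```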

With just this, we can compute a very rudimentary form of gradient descent!

In [None]:
# A very silly case of gradient descent:
learning_rate = 0.01
x = torch.tensor([1000.], requires_grad=True)
x_values = []
for i in range(1000):
    
    # Our loss function is: We want x**2 to be small
    loss = x ** 2
    loss.backward()
    
    # Step against the gradient by editing x.data directly, which keeps
    #   the update itself out of the autograd graph. This is clunky;
    #   don't worry, we'll never do this again
    x.data.sub_(x.grad.data * learning_rate)  # subtract: descent moves opposite to the gradient
    
    # Remember to zero out the gradient!
    # PyTorch accumulates gradients across backward() calls
    # instead of resetting them automatically.
    x.grad.zero_()
    x_values.append(x.item())

In [None]:
plt.plot(x_values)
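In practice the manual update is usually delegated to an optimizer. The sketch below repeats the same toy problem with `torch.optim.SGD`, which is an assumption about what replaces the hand-written step, not something the notebook itself shows. For this loss each step multiplies $x$ by $(1 - 2 \cdot \text{learning rate})$, so the final value can be compared with a closed form.

```python
import torch

learning_rate = 0.01
x = torch.tensor([1000.], requires_grad=True)
optimizer = torch.optim.SGD([x], lr=learning_rate)   # plain SGD, no momentum

x_values = []
for _ in range(1000):
    optimizer.zero_grad()        # clear the gradient accumulated in the previous step
    loss = (x ** 2).sum()        # same loss as before, reduced to a scalar
    loss.backward()              # d(loss)/dx = 2x
    optimizer.step()             # x <- x - learning_rate * x.grad
    x_values.append(x.item())

# Closed form for this toy problem: x_t = 1000 * (1 - 2 * learning_rate) ** t
expected = 1000.0 * (1 - 2 * learning_rate) ** 1000
print(x_values[-1], expected)
```

The two printed numbers should agree up to floating-point rounding, which is a quick sanity check that the optimizer performs the same subtraction as the manual loop.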

Lastly, sometimes you want to run things *without* computing gradients:

In [None]:
x = torch.tensor([1000.], requires_grad=True)

# With gradient computation:
loss = x ** 2
print(loss.requires_grad)


# Without gradient computation:
with torch.no_grad():
    loss = x ** 2
print(loss.requires_grad)

In [None]:
## .detach 
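A minimal sketch of what `.detach()` does, assuming the same `x` and `loss` as in the previous cell (the name `detached` is introduced here): it returns a tensor that shares data with the original but is cut off from the computation graph.

```python
import torch

x = torch.tensor([1000.], requires_grad=True)
loss = x ** 2                    # part of the graph: has a grad_fn

detached = loss.detach()         # same values, but no link back to x

print(loss.requires_grad)        # True
print(detached.requires_grad)    # False
print(detached.grad_fn)          # None
```

Unlike `torch.no_grad()`, which disables tracking for a whole block of code, `.detach()` acts on a single tensor that has already been computed.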

**Highly recommended**: https://pytorch.org/docs/stable/autograd.html