Introduction to PyTorch
=============

To get familiar with PyTorch and verify that it is installed and working correctly with CUDA support on Jetson TX1/TX2, let's first run some simple Python commands that load PyTorch and perform basic operations on tensors.  Then we'll create and train a simple neural network.

Loading PyTorch
-----------------
In order to load PyTorch, first we `import torch` from a Python terminal:

In [None]:
import torch

If the above command produces no output, the PyTorch module was installed and loaded successfully.  
If the module is reported as missing or other errors are printed, there was a problem building or installing PyTorch.

Next, let's print and confirm the version of PyTorch that was installed (the output should read `0.3.0b0+af3964a`):

In [None]:
print(torch.__version__)

Verifying CUDA
-----------------
Confirm that CUDA + cuDNN support has been installed and is available using the PyTorch `torch.cuda.is_available()` function:

In [None]:
print('CUDA available: ' + str(torch.cuda.is_available()))
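
As an optional extra check, the following minimal sketch also reports how many CUDA devices PyTorch can see and the name of the first one. It assumes only the `torch.cuda.device_count()` and `torch.cuda.get_device_name()` functions, and it skips the device queries when CUDA is unavailable; the variable name `count` is chosen here for illustration.

```python
import torch

if torch.cuda.is_available():
    # Number of GPUs visible to PyTorch (a Jetson module has one integrated GPU).
    count = torch.cuda.device_count()
    print('CUDA devices: ' + str(count))
    print('Device 0: ' + torch.cuda.get_device_name(0))
else:
    count = 0
    print('CUDA is not available; check that PyTorch was built with CUDA support')
```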

Next, allocate some simple tensors and confirm that CUDA arithmetic operations are working:

In [None]:
a = torch.cuda.FloatTensor(2).zero_()
print('Tensor a = ' + str(a))

b = torch.randn(2).cuda()
print('Tensor b = ' + str(b))

c = a + b
print('Tensor c = ' + str(c))

A tensor is an N-dimensional array (a generalization of vectors and matrices) whose elements share a single data type (like `float` or `half`).  
The `cuda` qualifiers above mean that these tensors are allocated on the GPU rather than the CPU.
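
To make the shape, data type, and device properties concrete, here is a small sketch that inspects them on a tensor. It starts from an explicit CPU `torch.FloatTensor` and only copies it to the GPU when CUDA is present; the names `t` and `t_gpu` are illustrative.

```python
import torch

# A 2x3 tensor of 32-bit floats, explicitly allocated on the CPU and zeroed.
t = torch.FloatTensor(2, 3).zero_()
print(t.size())    # dimensions: 2 rows, 3 columns
print(t.type())    # type name, which encodes both element type and device
print(t.is_cuda)   # False for a CPU tensor

# Copy the tensor to the GPU only when CUDA is actually available.
if torch.cuda.is_available():
    t_gpu = t.cuda()
    print(t_gpu.type())
    print(t_gpu.is_cuda)
```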

From here on, whenever CUDA is available, we set the default PyTorch tensor type to a CUDA tensor type so that new tensors are automatically allocated on the GPU:

In [None]:
use_cuda = torch.cuda.is_available()

FloatTensor = torch.cuda.FloatTensor if use_cuda else torch.FloatTensor
LongTensor = torch.cuda.LongTensor if use_cuda else torch.LongTensor
ByteTensor = torch.cuda.ByteTensor if use_cuda else torch.ByteTensor
Tensor = FloatTensor

if use_cuda:
    torch.set_default_tensor_type('torch.cuda.FloatTensor')
    
# verify that FloatTensor now resolves to a CUDA tensor (when CUDA is available)
d = FloatTensor([[1, 2, 3], [4, 5, 6]])
print('Tensor d = ' + str(d))

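Note that `d` is a `torch.cuda.FloatTensor` even though the code only refers to the generic `FloatTensor` name, because that alias resolves to the CUDA type when CUDA is available.  In addition, the `torch.set_default_tensor_type()` call makes tensors that are created without an explicit type (for example with `torch.randn()`) default to the GPU as well.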

Next let's create a simple neural network in PyTorch to demonstrate the `torch.nn` package and to test training & inference.

Neural Network Example
-----------------

PyTorch employs a tape-based automatic differentiation (autograd) system: it records the operations in your dynamic pipeline so that, given a loss function, it can replay them in reverse for backpropagation during training.  The primary package that implements neural networks and their associated layers is `torch.nn`.  Let's start by importing `torch.nn` along with PyTorch's `Variable` object, which wraps tensors so that PyTorch can record the operations performed on them and recall them during the backward training pass.

In [None]:
import torch.nn as nn
from torch.autograd import Variable
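
To illustrate the recording described above, here is a minimal sketch, separate from the network we build next, that differentiates a tiny expression. The values and the names `x`, `y`, and `grad` are chosen here purely for illustration.

```python
import torch
from torch.autograd import Variable

# Wrap a tensor and ask autograd to track the operations applied to it.
x = Variable(torch.FloatTensor([2.0, 3.0]), requires_grad=True)

# Forward pass: y = x0^2 + x1^2. Autograd records the multiply and the sum.
y = (x * x).sum()

# Backward pass: replay the recorded operations in reverse to get dy/dx = 2x.
y.backward()

grad = x.grad.data.tolist()
print(grad)  # [4.0, 6.0]
```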

Next, let's allocate some tensor variables to hold the network inputs and expected outputs.  
Normally a stored dataset or sensor data would be loaded here, but for simplicity we just fill them with random values.

In [None]:
# N is batch size; dim_in is input dimension;
# H is hidden dimension; dim_out is output dimension.
N, dim_in, H, dim_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and expected outputs, and wrap them in Variables.
x = Variable(torch.randn(N, dim_in))
y = Variable(torch.randn(N, dim_out), requires_grad=False)

print('x = ' + str(x))

Note that, when CUDA is available, the tensor type is still `torch.cuda.FloatTensor` (on the GPU) because of the default tensor type set earlier.  

Now for the fun part: we create our own simple neural network, using only fully-connected layers (`nn.Linear`).  
Later on for the DQN, we will utilize the more advanced `nn.Conv2d` layers that can process images (deep convolutional networks).

In [None]:
# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. Each Linear Module computes output from input using a
# linear function, and holds internal Variables for its weight and bias.
model = nn.Sequential(
    nn.Linear(dim_in, H),
    nn.ReLU(),
    nn.Linear(H, dim_out),
)

# Migrate the model to use CUDA
if use_cuda:
    model.cuda()
    
print(model)

To get the neural network model running on the GPU, you just have to call `model.cuda()` on it.  
Between that and having the tensors allocated with CUDA as above, that's all we need to do to have the example running on GPU.
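
One way to confirm where the model lives is to walk its parameters and check each one. The sketch below rebuilds the same two-layer model with the layer sizes written out so that it runs on its own; the `total` counter is an illustrative addition.

```python
import torch
import torch.nn as nn

# Same architecture as above: 1000 inputs -> 100 hidden units -> 10 outputs.
model = nn.Sequential(nn.Linear(1000, 100), nn.ReLU(), nn.Linear(100, 10))
if torch.cuda.is_available():
    model.cuda()

total = 0
for p in model.parameters():
    # Each Linear layer contributes a weight matrix and a bias vector.
    print(str(tuple(p.size())) + ' on GPU: ' + str(p.data.is_cuda))
    total += p.data.numel()

print('total parameters = ' + str(total))  # 101110
```

The total is 1000 x 100 + 100 for the first layer plus 100 x 10 + 10 for the second.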

Next, we define our loss function, which compares the output of the network to the expected output, and informs the backpropagation system of the gradient updates to apply during the training iterations.

In [None]:
# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
loss_fn = nn.MSELoss(size_average=False)
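
With `size_average=False`, the loss is the sum of the squared differences rather than their mean. As a rough sketch of that arithmetic, using plain tensor operations instead of `nn.MSELoss` and made-up values for `pred` and `target`:

```python
import torch

pred = torch.FloatTensor([1.0, 2.0, 3.0])
target = torch.FloatTensor([1.0, 0.0, 5.0])

# Squared differences are 0, 4 and 4.
diff = pred - target
sum_sq = float((diff * diff).sum())   # summed form, as with size_average=False
mean_sq = sum_sq / pred.numel()       # averaged form, for comparison

print(sum_sq)  # 8.0
print(mean_sq)
```

The summed form grows with the batch size, which is worth keeping in mind when choosing a learning rate.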

Finally comes the training loop.  This is where the current network computes its prediction of the desired output `y` given the input `x` (the forward pass), then computes the loss versus the expected output before performing backpropagation and gradient descent to decrease the loss and improve the accuracy of the network.  In the console output, you should see the loss gradually decreasing over time, meaning our network is learning properly.

In [None]:
learning_rate = 1e-4

for t in range(500):
    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions. When
    # doing so you pass a Variable of input data to the Module and it produces
    # a Variable of output data.
    y_pred = model(x)

    # Compute and print loss. We pass Variables containing the predicted and true
    # values of y, and the loss function returns a Variable containing the
    # loss.
    loss = loss_fn(y_pred, y)
    print('iteration = %03i, loss = ' % t + str(loss.data[0]))

    # Zero the gradients before running the backward pass.
    model.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Variables with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Variable, so
    # we can access its values through .data and its gradients through .grad.
    for param in model.parameters():
        param.data -= learning_rate * param.grad.data

You can try adjusting the `learning_rate` and the number of training passes (set to `500` by default for illustrative purposes) to see how they impact the training.
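
To make such experiments easier to repeat, one option is to wrap the loop in a function. The sketch below is only an illustration: the `train` helper and its return values are hypothetical names introduced here, the summed squared error is computed by hand instead of with `nn.MSELoss`, and the model is left on the default device for simplicity.

```python
import torch
import torch.nn as nn
from torch.autograd import Variable


def train(learning_rate, steps):
    """Train the example network and return the (first, last) loss values."""
    torch.manual_seed(0)  # same random data and initial weights on every call
    N, dim_in, H, dim_out = 64, 1000, 100, 10
    x = Variable(torch.randn(N, dim_in))
    y = Variable(torch.randn(N, dim_out), requires_grad=False)
    model = nn.Sequential(nn.Linear(dim_in, H), nn.ReLU(), nn.Linear(H, dim_out))

    losses = []
    for t in range(steps):
        # Forward pass and summed squared error.
        diff = model(x) - y
        loss = (diff * diff).sum()
        losses.append(float(loss.data.sum()))

        # Backward pass followed by a plain gradient descent update.
        model.zero_grad()
        loss.backward()
        for param in model.parameters():
            param.data -= learning_rate * param.grad.data
    return losses[0], losses[-1]


first, last = train(learning_rate=1e-4, steps=100)
print('first loss = ' + str(first) + ', last loss = ' + str(last))
```

Calling `train` with other values for `learning_rate` or `steps` then allows a side-by-side comparison of the final loss.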