## Getting Started with PyTorch on Cloud TPUs

This notebook will show you how to:

* Install PyTorch/XLA on Colab, which lets you use PyTorch with TPUs.
* Run basic PyTorch functions on TPUs, like creating and adding tensors.
* Run PyTorch modules and autograd on TPUs.
* Run PyTorch networks on TPUs.

PyTorch/XLA is a package that lets PyTorch connect to Cloud TPUs and use TPU cores as devices. Colab provides a free Cloud TPU system (a remote CPU host + four TPU chips with two cores each), and installing PyTorch/XLA only takes a couple of minutes.

Even though Colab offers eight TPU cores, this notebook only uses one for simplicity. More information about running PyTorch on TPUs can be found on [PyTorch.org](http://pytorch.org/xla/), including how to run PyTorch networks on multiple TPU cores simultaneously. Other Colab notebooks also show how to use multiple TPU cores, including [this one](https://colab.research.google.com/github/pytorch/xla/blob/master/contrib/colab/mnist-training.ipynb#scrollTo=Afwo4H7kSd8P) which trains a network on the MNIST dataset and [this one](https://colab.research.google.com/github/pytorch/xla/blob/master/contrib/colab/resnet18-training.ipynb#scrollTo=_2nL4HmloEyl) which trains a ResNet18 architecture on CIFAR10. 

These and other Colab notebooks, as well as Google Cloud Platform (GCP) tutorials, can be found [here](https://github.com/pytorch/xla/tree/master/contrib/colab). Check out our [NeurIPS 2019 Fast Neural Style Transfer demo](https://colab.research.google.com/github/pytorch/xla/blob/master/contrib/colab/style_transfer_inference.ipynb#scrollTo=EozMXwIV9iOJ), where you can apply different styles (filters) to your own images!

To use PyTorch on Cloud TPUs in your own Colab notebook you can copy this one, or copy the setup cell below and configure your Colab environment to use TPUs. 

Finally, this notebook is intended for people already familiar with PyTorch, a popular open-source deep learning framework. If you haven't used PyTorch before you might want to review the tutorials at https://pytorch.org/ before continuing.


<h3>  &nbsp;&nbsp;Use Colab Cloud TPU&nbsp;&nbsp; <a href="https://cloud.google.com/tpu/"><img valign="middle" src="https://raw.githubusercontent.com/GoogleCloudPlatform/tensorflow-without-a-phd/master/tensorflow-rl-pong/images/tpu-hexagon.png" width="50"></a></h3>

* On the main menu, click Runtime and select **Change runtime type**. Set "TPU" as the hardware accelerator.
* The cell below makes sure you have access to a TPU on Colab.


In [1]:
import os
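# .get() returns None when the variable is missing, so the assertion message is shown instead of a KeyError
assert os.environ.get('COLAB_TPU_ADDR'), 'Make sure to select TPU from Edit > Notebook settings > Hardware accelerator'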
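
If you want to reuse this check outside a single assertion, it can be wrapped in a small function. The following is only a sketch: `find_tpu_address` is a hypothetical helper name, and the address in the usage lines is a made-up placeholder, not a real Colab value. The only thing taken from this notebook is the `COLAB_TPU_ADDR` environment variable.

```python
import os


def find_tpu_address(environ=os.environ):
    """Return the value of COLAB_TPU_ADDR, or None if it is unset or empty.

    `environ` is any mapping; it defaults to the real process environment.
    """
    return environ.get("COLAB_TPU_ADDR") or None


# Usage with explicit mappings (the address below is a placeholder)
print(find_tpu_address({}))
print(find_tpu_address({"COLAB_TPU_ADDR": "10.0.0.2:8470"}))
```

Passing the environment as an argument keeps the function easy to try without changing the runtime type.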

## Installing PyTorch/XLA

Run the following cell (or copy it into your own notebook!) to install PyTorch, Torchvision, and PyTorch/XLA. It will take a couple minutes to run.

The PyTorch/XLA package lets PyTorch connect to Cloud TPUs. (It's named PyTorch/XLA, not PyTorch/TPU, because XLA is the name of the TPU compiler.) In particular, PyTorch/XLA makes TPU cores available as PyTorch devices. This lets PyTorch create and manipulate tensors on TPUs.

In [None]:
!pip install cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/torch_xla-1.8.1-cp37-cp37m-linux_x86_64.whl

### Only run the commented-out cell below if you would like a nightly release

In [None]:
# VERSION = "20200325"  #@param ["1.5" , "20200325", "nightly"]
# !curl https://raw.githubusercontent.com/pytorch/xla/master/contrib/scripts/env-setup.py -o pytorch-xla-env-setup.py
# !python pytorch-xla-env-setup.py --version $VERSION

## Creating and Manipulating Tensors on TPUs

PyTorch uses Cloud TPUs just like it uses CPU or CUDA devices, as the next few cells will show. Each core of a Cloud TPU is treated as a different PyTorch device.




In [3]:
# Imports PyTorch
import torch

# Imports the torch_xla package
import torch_xla
import torch_xla.core.xla_model as xm



As mentioned above, the PyTorch/XLA package (torch_xla) lets PyTorch use TPU devices. The `xla_device()` function returns the TPU's "default" core as a device. This lets PyTorch create tensors on TPUs:

In [5]:
# Creates a tensor of ones on xla:1 (a Cloud TPU core)
dev = xm.xla_device()
t1 = torch.ones(3, 3, device = dev)
print(t1)

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], device='xla:1')


See the documentation at http://pytorch.org/xla/ for a description of all public PyTorch/XLA functions. Here `xm.xla_device()` acquired the first Cloud TPU core ('xla:1'). Other cores can be directly acquired, too:

In [6]:
# Creating a tensor on the second Cloud TPU core
second_dev = xm.xla_device(n=2, devkind='TPU')
t2 = torch.zeros(3, 3, device = second_dev)
print(t2)

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]], device='xla:2')


It is recommended that you use functions like `xm.xla_device()` over directly specifying TPU cores. A future Colab tutorial will show how to easily train a network using multiple cores (or you can look at [an example](https://colab.research.google.com/github/pytorch/xla/blob/master/contrib/colab/mnist-training.ipynb#scrollTo=Afwo4H7kSd8P)).
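
One way to follow this recommendation while keeping code runnable off Colab is to hide device selection behind a helper. This is a sketch under assumptions, not part of PyTorch/XLA: `get_device` is a hypothetical name, and falling back to the CPU when `torch_xla` is missing or no XLA device can be acquired is a design choice made here for illustration.

```python
import torch


def get_device():
    """Return the default XLA device if available, otherwise the CPU.

    Assumption: an ImportError (torch_xla not installed) or a RuntimeError
    (no XLA device could be acquired) both mean "fall back to the CPU".
    """
    try:
        import torch_xla.core.xla_model as xm
        return xm.xla_device()
    except (ImportError, RuntimeError):
        return torch.device("cpu")


device = get_device()

# The rest of the code only refers to `device`, never to a specific core
t = torch.ones(3, 3, device=device)
print(t.device)
```

Because nothing after `get_device()` names a core directly, the same cell works on a TPU runtime and on a plain CPU runtime.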


Tensors on TPUs can be manipulated like any other PyTorch tensor. The following cell adds, multiplies, and matrix multiplies two tensors on a TPU core:

In [7]:
a = torch.randn(2, 2, device = dev)
b = torch.randn(2, 2, device = dev)
print(a + b)
print(b * 2)
print(torch.matmul(a, b))

tensor([[ 0.9285, -1.1517],
        [ 0.4875,  1.0669]], device='xla:1')
tensor([[ 1.6515,  0.6532],
        [ 1.8291, -0.5493]], device='xla:1')
tensor([[-1.2652,  0.4401],
        [ 0.8757, -0.5096]], device='xla:1')


This next cell runs a 1D convolution on a TPU core:

In [8]:
# Creates random filters and inputs to a 1D convolution
filters = torch.randn(33, 16, 3, device = dev)
inputs = torch.randn(20, 16, 50, device = dev)
torch.nn.functional.conv1d(inputs, filters)

tensor([[[ 7.4347e+00, -9.3358e+00,  2.8346e+00,  ...,  4.8745e-01,
          -1.1800e+01,  2.3509e+00],
         [-5.6661e+00, -9.2697e+00, -2.7056e+00,  ..., -5.8482e+00,
          -7.8100e+00,  4.6295e+00],
         [-4.4290e+00, -6.9281e+00,  2.1860e+00,  ...,  1.2434e+01,
          -7.8181e+00, -5.3043e-01],
         ...,
         [ 1.6844e+01,  9.5925e+00, -7.4014e+00,  ..., -3.5882e+00,
           8.9396e-01, -1.1485e+01],
         [-1.2405e+01, -9.6386e+00,  5.4843e+00,  ...,  3.7790e+00,
          -3.0254e+00, -1.1809e+00],
         [ 4.5209e+00, -1.1931e+01, -2.5749e+00,  ...,  5.3200e+00,
          -2.9704e+00,  1.5700e+00]],

        [[ 2.5035e+00,  8.3513e+00, -4.9031e+00,  ...,  1.3565e+01,
          -9.2602e+00,  2.9704e+00],
         [-3.6338e+00,  5.2198e+00,  4.7569e+00,  ...,  2.1375e+00,
          -2.6774e+00, -1.1080e+01],
         [ 8.0821e-01, -4.5582e+00, -1.8723e+00,  ..., -4.4451e+00,
           1.5516e+00, -1.0374e+01],
         ...,
         [ 1.1265e+00,  3
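
The printed tensor is long, so it can help to work out its shape by hand. The sketch below uses the output-length formula given in the PyTorch `Conv1d` documentation and applies it to the shapes from the cell above, with the default stride, padding, and dilation. `conv1d_output_length` is a hypothetical helper written for this illustration.

```python
def conv1d_output_length(length_in, kernel_size, stride=1, padding=0, dilation=1):
    """Length of a 1D convolution's output along the last dimension."""
    return (length_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# Shapes from the cell above
inputs_shape = (20, 16, 50)   # (batch, in_channels, length)
filters_shape = (33, 16, 3)   # (out_channels, in_channels, kernel_size)

batch, in_channels, length_in = inputs_shape
out_channels, filter_in_channels, kernel_size = filters_shape

# The channel counts of the input and the filters must agree
assert in_channels == filter_in_channels

output_shape = (batch, out_channels, conv1d_output_length(length_in, kernel_size))
print(output_shape)  # (20, 33, 48)
```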

Tensors can also be transferred between CPU and TPU. In the following cell, a tensor on the CPU is copied to a TPU core, and then copied back to the CPU again. Note that PyTorch makes copies of tensors when transferring them across devices, so `t_cpu` and `t_cpu_again` are different tensors.



In [9]:
# Creates a tensor on the CPU (device='cpu' is unnecessary and only added for clarity)
t_cpu = torch.randn(2, 2, device='cpu')
print(t_cpu)

t_tpu = t_cpu.to(dev)
print(t_tpu)

t_cpu_again = t_tpu.to('cpu')
print(t_cpu_again)

tensor([[-0.9795, -0.3590],
        [-0.7553,  0.9867]])
tensor([[-0.9795, -0.3590],
        [-0.7553,  0.9867]], device='xla:1')
tensor([[-0.9795, -0.3590],
        [-0.7553,  0.9867]])


## Running PyTorch modules and autograd on TPUs

Modules and autograd are fundamental PyTorch components. 

In PyTorch, every stateful function is a module. Modules are Python classes augmented with metadata that lets PyTorch understand how to use them in a neural network. For example, linear layers are modules, as are entire networks. Since modules are stateful, they can be placed on devices, too. PyTorch/XLA lets us place them on TPU cores:


In [12]:
# Creates a linear module
fc = torch.nn.Linear(5, 2, bias=True)

# Copies the module to the XLA device (the first Cloud TPU core)
fc = fc.to(dev)

# Creates a random feature tensor
features = torch.randn(3, 5, device=dev, requires_grad=True)

# Runs and prints the module
output = fc(features)
print(output)

tensor([[-0.6566, -0.8653],
        [ 0.9762,  0.7125],
        [ 0.4230,  0.2303]], device='xla:1', grad_fn=<AddmmBackward>)


Autograd is the system PyTorch uses to populate the gradients of weights in a neural network. See [here](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html#sphx-glr-beginner-blitz-autograd-tutorial-py) for details about PyTorch's autograd. When a module is run on a TPU core, its gradients are also populated on the same TPU core by autograd. The following cell demonstrates this:

In [13]:
output.backward(torch.ones_like(output))
print(fc.weight.grad)

tensor([[-4.2266, -0.8027,  2.4844, -2.2441,  0.4492],
        [-4.2266, -0.8027,  2.4844, -2.2441,  0.4492]], device='xla:1')
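
Both rows of the printed gradient are identical, and that is expected. Calling `backward` with a tensor of ones makes the gradient of each weight row equal to the column-wise sum of `features`, and the gradient of each bias entry equal to the number of rows in `features`. The sketch below checks this on the CPU so that it runs without a TPU; running it on the CPU and fixing the seed are assumptions made here for reproducibility, and the same relationship should hold on an XLA device.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for repeatable values

fc = torch.nn.Linear(5, 2, bias=True)
features = torch.randn(3, 5)

output = fc(features)
output.backward(torch.ones_like(output))

# With an all-ones upstream gradient, d(sum of outputs)/dW[i, j] = sum over rows of features[:, j]
expected_weight_grad = features.sum(dim=0).expand(2, 5)

# Each bias entry contributes once per row of the batch
expected_bias_grad = torch.full((2,), float(features.shape[0]))

print(torch.allclose(fc.weight.grad, expected_weight_grad))
print(torch.allclose(fc.bias.grad, expected_bias_grad))
```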


## Running PyTorch networks on TPUs

As mentioned above, PyTorch networks are also modules, and so they're run in the same way. The following cell runs a relatively simple PyTorch network from the [PyTorch tutorial docs](https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html#sphx-glr-beginner-blitz-neural-networks-tutorial-py) on a TPU core:

In [14]:
import torch.nn as nn
import torch.nn.functional as F

# Simple example network from 
# https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html#sphx-glr-beginner-blitz-neural-networks-tutorial-py
class Net(nn.Module):

    def __init__(self):
        super().__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square you can specify just a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


# Places network on the default TPU core
net = Net().to(dev)

# Creates random input on the default TPU core
image = torch.randn(1, 1, 32, 32, device=dev)

# Runs network (the tensor is named `image` to avoid shadowing Python's built-in `input`)
out = net(image)
print(out)

tensor([[-0.1059, -0.0371,  0.0275,  0.1023, -0.0938,  0.0578,  0.1138,  0.0771,
          0.0062,  0.1206]], device='xla:1', grad_fn=<AddmmBackward>)


As in the previous snippets, running PyTorch on a TPU just requires specifying a TPU core as a device.
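
The `16 * 6 * 6` input size of `fc1` follows from the 32x32 input and the layers before it. The arithmetic can be sketched in plain Python, assuming the defaults used in the network above (no padding, convolution stride 1, pooling stride equal to the window size). `conv_out` and `pool_out` are hypothetical helpers written for this illustration.

```python
def conv_out(size, kernel):
    """Spatial size after a convolution with stride 1 and no padding."""
    return size - kernel + 1


def pool_out(size, window):
    """Spatial size after max pooling with stride equal to the window (floor mode)."""
    return (size - window) // window + 1


size = 32                  # input images are 32x32
size = conv_out(size, 3)   # conv1: 32 -> 30
size = pool_out(size, 2)   # pool:  30 -> 15
size = conv_out(size, 3)   # conv2: 15 -> 13
size = pool_out(size, 2)   # pool:  13 -> 6

flat_features = 16 * size * size  # 16 output channels from conv2
print(size, flat_features)  # 6 576
```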

## More PyTorch on TPUs!

This "Getting Started" notebook showed you how to:

* Install PyTorch/XLA on Colab, which lets you use PyTorch with TPUs.
* Run basic PyTorch functions on TPUs, like creating and adding tensors.
* Run PyTorch modules and autograd on TPUs.
* Run PyTorch networks on TPUs.

Working with tensors, running modules, and running entire networks on a Cloud TPU is as simple as installing PyTorch/XLA and telling PyTorch to use the Colab TPU as its device.

You're encouraged to try PyTorch/XLA on Colab and GCP, too! You can copy this notebook to get started, or just copy the PyTorch/XLA setup cell above. The PyTorch/XLA team would love to hear your ideas/suggestions/comments, too! For now, the best way to contact the team is with an issue on our GitHub: https://github.com/pytorch/xla/issues.

Future tutorials will cover:

* Training a network using multiple TPU cores.
* Loading batched data onto TPUs.

You can already see examples of training a network on multiple TPU cores and loading batched data onto TPUs [here](https://github.com/pytorch/xla/tree/master/contrib/colab).

