## CUDA Semantics

`torch.cuda` is used to set up and run CUDA operations. It keeps track of the currently selected GPU, and all CUDA tensors you allocate will by default be created on that device. The selected device can be changed with a `torch.cuda.device` context manager.

However, once a tensor is allocated, you can do operations on it irrespective of the selected device, and the results will always be placed on the same device as the tensor.

Cross-GPU operations are not allowed by default, with the exception of `copy_()` and other methods with copy-like functionality such as `to()` and `cuda()`. Unless you enable peer-to-peer memory access, any attempt to launch ops on tensors spread across different devices will raise an error.
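The cross-device rule can be illustrated with a minimal sketch. The helper name `add_on_first_device` is hypothetical (it is not part of PyTorch), and the two-GPU branch assumes devices `cuda:0` and `cuda:1` exist; on machines without two GPUs the sketch falls back to CPU tensors so that it still runs.

```python
import torch


def add_on_first_device(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Add two tensors that may live on different devices.

    `b` is first copied to `a`'s device with `Tensor.to`, one of the
    copy-like methods that is allowed to cross devices.
    """
    return a + b.to(a.device)


direct_add_failed = False

if torch.cuda.is_available() and torch.cuda.device_count() >= 2:
    a = torch.tensor([1., 2.], device="cuda:0")
    b = torch.tensor([3., 4.], device="cuda:1")
    try:
        a + b  # operands live on different GPUs
    except RuntimeError as err:
        direct_add_failed = True
        print("direct add failed:", err)
else:
    # Fallback (assumption: fewer than two GPUs): use CPU tensors instead.
    a = torch.tensor([1., 2.])
    b = torch.tensor([3., 4.])

total = add_on_first_device(a, b)
print(total.device, total.tolist())
```

The result lives on the device of the first operand, because the second operand is explicitly copied there before the addition is launched.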

```python
import torch

cuda = torch.device('cuda')     # default CUDA device
cuda0 = torch.device('cuda:0')
cuda2 = torch.device('cuda:2')  # GPU 2 (these are 0-indexed)

x = torch.tensor([1., 2.], device=cuda0)
# x.device is device(type='cuda', index=0)
y = torch.tensor([1., 2.]).cuda()
# y.device is device(type='cuda', index=0)

with torch.cuda.device(1):
    # allocates a tensor on GPU 1
    a = torch.tensor([1., 2.], device=cuda)

    # transfers a tensor from CPU to GPU 1
    b = torch.tensor([1., 2.]).cuda()
    # a.device and b.device are device(type='cuda', index=1)

    # You can also use ``Tensor.to`` to transfer a tensor:
    b2 = torch.tensor([1., 2.]).to(device=cuda)
    # b.device and b2.device are device(type='cuda', index=1)

    c = a + b
    # c.device is device(type='cuda', index=1)

    z = x + y
    # z.device is device(type='cuda', index=0)

    # even within a context, you can specify the device
    # (or give a GPU index to the .cuda call)
    d = torch.randn(2, device=cuda2)
    e = torch.randn(2).to(cuda2)
    f = torch.randn(2).cuda(cuda2)
    # d.device, e.device, and f.device are all device(type='cuda', index=2)
```
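To check which device is selected at a given point, `torch.cuda.current_device()` can be queried. The following is a small sketch rather than a definitive recipe: `selected_devices` is a hypothetical helper name, and the code only exercises CUDA when at least one GPU is available.

```python
import torch


def selected_devices(index: int):
    """Return the selected CUDA device index before, inside, and after
    a `torch.cuda.device(index)` context."""
    before = torch.cuda.current_device()
    with torch.cuda.device(index):
        inside = torch.cuda.current_device()
    after = torch.cuda.current_device()
    return before, inside, after


if torch.cuda.is_available():
    # Select the highest-numbered GPU present on this machine.
    last = torch.cuda.device_count() - 1
    before, inside, after = selected_devices(last)
    print(f"before={before} inside={inside} after={after}")
else:
    print("CUDA is not available; nothing to select")
```

Inside the context the selected device is the requested one, and on exit the previously selected device is restored.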