In [4]:
import math
import torch
import numpy as np
torch.manual_seed(7)

<torch._C.Generator at 0x7fe2e4cc7270>

# 1. Creating Tensors
## 1.1 directly from data
 **torch.tensor(data, dtype=None, device=None, requires_grad=False, pin_memory=False)**

In [5]:
data = [[0.1, 2], [-2, 3], [0.5, 0.2]]
torch.tensor(data)

tensor([[ 0.1000,  2.0000],
        [-2.0000,  3.0000],
        [ 0.5000,  0.2000]])

## 1.2 from numpy
 **torch.from_numpy(ndarray)**

In the broadcasting section later on, you’ll see that PyTorch’s broadcast semantics are compatible with NumPy’s - but the kinship between PyTorch and NumPy goes even deeper than that.

If you have existing ML or scientific code with data stored in NumPy ndarrays, you may wish to express that same data as PyTorch tensors, whether to take advantage of PyTorch’s GPU acceleration, or its efficient abstractions for building ML models. It’s easy to switch between ndarrays and PyTorch tensors:

In [28]:
data = np.array(data)
torch.from_numpy(data)

tensor([[ 0.1000,  2.0000],
        [-2.0000,  3.0000],
        [ 0.5000,  0.2000]], dtype=torch.float64)

PyTorch creates a tensor of the same shape and containing the same data as the NumPy array, going so far as to keep NumPy’s default 64-bit float data type.

The conversion can just as easily go the other way:

In [60]:
pytorch_rand = torch.rand(2, 3)
print(pytorch_rand)

numpy_rand = pytorch_rand.numpy()
print(numpy_rand)

tensor([[0.0896, 0.1464, 0.1304],
        [0.4130, 0.0800, 0.6579]])
[[0.08962834 0.14639658 0.13041377]
 [0.41298044 0.07995176 0.6578799 ]]


It is important to know that these converted objects are using the same underlying memory as their source objects, meaning that changes to one are reflected in the other:

In [61]:
numpy_array = np.ones((2, 3))
pytorch_tensor = torch.from_numpy(numpy_array)

numpy_array[1, 1] = 23
print(pytorch_tensor)

pytorch_rand[1, 1] = 17
print(numpy_rand)

tensor([[ 1.,  1.,  1.],
        [ 1., 23.,  1.]], dtype=torch.float64)
[[ 0.08962834  0.14639658  0.13041377]
 [ 0.41298044 17.          0.6578799 ]]
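The sharing only applies to the from_numpy() / .numpy() route. torch.tensor() always copies its input, so it is the one to use when you want an independent tensor. A minimal sketch (the names arr, shared and copied are just for illustration):

```python
import numpy as np
import torch

arr = np.zeros(3)
shared = torch.from_numpy(arr)  # shares memory with arr
copied = torch.tensor(arr)      # copies the data at creation time

arr[0] = 5.0                    # modify the NumPy array in place
print(shared)  # tensor([5., 0., 0.], dtype=torch.float64)
print(copied)  # tensor([0., 0., 0.], dtype=torch.float64)
```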


## 1.3 from another tensor
**torch.ones_like(input, dtype=None, layout=None, device=None, requires_grad=False)**\
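**torch.zeros_like(input, dtype=None, layout=None, device=None, requires_grad=False)**\
**torch.full_like(input, fill_value, dtype=None, layout=None, device=None, requires_grad=False)**\
**torch.empty_like(input)**\
**torch.rand_like(input)**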

The new tensor retains the properties (shape, datatype) of the argument tensor, unless explicitly overridden.

In [30]:
tensor = torch.from_numpy(data)
torch.ones_like(tensor)

tensor([[1., 1.],
        [1., 1.],
        [1., 1.]], dtype=torch.float64)

In [31]:
torch.zeros_like(tensor)

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]], dtype=torch.float64)

In [41]:
torch.full_like(tensor, 3.14159)

tensor([[3.1416, 3.1416],
        [3.1416, 3.1416],
        [3.1416, 3.1416]], dtype=torch.float64)
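The properties that the *_like() functions copy from their argument can be overridden with the keyword arguments listed above. A short, self-contained sketch (src stands in for the float64 tensor used in the cells above):

```python
import torch

src = torch.zeros(3, 2, dtype=torch.float64)

same = torch.ones_like(src)                     # shape (3, 2), dtype float64
ints = torch.ones_like(src, dtype=torch.int32)  # shape kept, dtype overridden
pi = torch.full_like(src, 3.14159)              # shape and dtype kept

print(same.dtype, ints.dtype, pi.dtype)
# torch.float64 torch.int32 torch.float64
```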

## 1.4 with random or constant values
**torch.empty(*size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)**\
**torch.ones(*size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)**\
**torch.zeros(*size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)**\
**torch.full(size, fill_value, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)**\
**torch.arange(start=0, end, step=1, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)**\
**torch.linspace(start, end, steps, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)**\
**torch.logspace(start, end, steps, base=10.0, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)**\
**torch.eye(n, m=None, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)**

shape is a tuple of tensor dimensions. In the cells below, it determines the dimensionality of the output tensor.



In [34]:
shape = (2, 3,)
torch.rand(shape)

tensor([[0.5349, 0.1988, 0.6592],
        [0.6569, 0.2328, 0.4251]])

Did you notice the call to torch.manual_seed() in the first cell of this notebook? Initializing tensors, such as a model’s learning weights, with random values is common, but there are times - especially in research settings - where you’ll want some assurance of the reproducibility of your results. Manually setting your random number generator’s seed is the way to do this. Let’s look more closely:

In [74]:
torch.manual_seed(1729)
random1 = torch.rand(2, 3)
print(random1)

random2 = torch.rand(2, 3)
print(random2)

torch.manual_seed(1729)
random3 = torch.rand(2, 3)
print(random3)

random4 = torch.rand(2, 3)
print(random4)

tensor([[0.3126, 0.3791, 0.3087],
        [0.0736, 0.4216, 0.0691]])
tensor([[0.2332, 0.4047, 0.2162],
        [0.9927, 0.4128, 0.5938]])
tensor([[0.3126, 0.3791, 0.3087],
        [0.0736, 0.4216, 0.0691]])
tensor([[0.2332, 0.4047, 0.2162],
        [0.9927, 0.4128, 0.5938]])
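torch.manual_seed() reseeds the global generator, which every random call in the program shares. If only one piece of code needs to be reproducible, an alternative is to pass a private torch.Generator to functions that accept a generator argument, as torch.rand() does. A sketch under that assumption; seeded_rand is a hypothetical helper, not a PyTorch function:

```python
import torch

def seeded_rand(seed, *shape):
    # Hypothetical helper: a private generator whose state is
    # independent of the global RNG that torch.manual_seed() controls
    g = torch.Generator().manual_seed(seed)
    return torch.rand(*shape, generator=g)

x = seeded_rand(1729, 2, 3)
y = seeded_rand(1729, 2, 3)
print(torch.equal(x, y))  # True: same seed, same values
```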


In [37]:
torch.empty(shape)

tensor([[0., 0., 0.],
        [0., 0., -0.]])

In [35]:
torch.ones(shape)

tensor([[1., 1., 1.],
        [1., 1., 1.]])

In [36]:
torch.zeros(shape)

tensor([[0., 0., 0.],
        [0., 0., 0.]])

In [40]:
torch.full(shape, 3)

tensor([[3, 3, 3],
        [3, 3, 3]])

In [43]:
torch.arange(1, 2.5, 0.5)

tensor([1.0000, 1.5000, 2.0000])

In [44]:
torch.linspace(-10, 10, 5)

tensor([-10.,  -5.,   0.,   5.,  10.])
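One difference between the two range functions is worth checking: arange() takes a step size and excludes end (like Python’s range()), while linspace() takes a number of points and includes both endpoints. A minimal comparison:

```python
import torch

a = torch.arange(0, 1, 0.25)   # step size given; end (1) is excluded
l = torch.linspace(0, 1, 5)    # number of points given; both ends included

print(a)  # tensor([0.0000, 0.2500, 0.5000, 0.7500])
print(l)  # tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])
```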

In [45]:
torch.logspace(0.1, 1, 5)

tensor([ 1.2589,  2.1135,  3.5481,  5.9566, 10.0000])

In [50]:
torch.eye(4)

tensor([[1., 0., 0., 0.],
        [0., 1., 0., 0.],
        [0., 0., 1., 0.],
        [0., 0., 0., 1.]])

## 1.5 according to probability distribution
**torch.normal(mean, std, out=None)**\
**Normal distribution: torch.randn(*size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)**\
**Uniform distribution: torch.rand(*size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)**\
**Uniform distribution (integers): torch.randint(low=0, high, size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)**\
**Random permutation: torch.randperm(n, out=None, dtype=torch.int64, layout=torch.strided, device=None, requires_grad=False)**\
**Bernoulli distribution: torch.bernoulli(input, *, generator=None, out=None)**



In [55]:
torch.normal(torch.arange(1., 11.), torch.arange(1, 0, -0.1))

tensor([-0.2514,  0.3043,  3.3566,  3.5052,  4.0550,  5.6841,  7.0085,  7.8919,
         9.1132, 10.0214])

In [149]:
torch.normal(torch.tensor(1.), torch.arange(1, 0, -0.1))

tensor([ 0.2239,  1.4996,  1.1819, -0.5354,  0.8123,  1.1935,  1.1532,  0.7853,
         0.7847,  0.7890])

In [157]:
torch.normal(torch.arange(1., 11.), torch.tensor(0.4))

tensor([0.9784, 3.1832, 2.9405, 4.5592, 5.2700, 5.6584, 7.1437, 7.6651, 9.0430,
        9.4740])

In [159]:
torch.normal(torch.tensor(-1.), torch.tensor(2))

tensor(-4.0019)

In [60]:
torch.normal(0, 1, (1, 4))

tensor([[ 0.4589, -0.1495,  2.2695, -0.0911]])

In [61]:
torch.randn(2, 3)

tensor([[ 0.9732, -1.1010, -0.7484],
        [ 1.9863, -0.2902, -2.4220]])

In [62]:
torch.rand(2, 3)

tensor([[0.1049, 0.5137, 0.2674],
        [0.4990, 0.7447, 0.7213]])

In [63]:
torch.randint(3, 10, (2, 2))

tensor([[9, 3],
        [4, 6]])

In [65]:
torch.randperm(8)

tensor([7, 4, 0, 1, 3, 2, 5, 6])

In [66]:
tensor = torch.empty(3, 3).uniform_(0, 1)
torch.bernoulli(tensor)

tensor([[0., 1., 1.],
        [0., 0., 0.],
        [0., 0., 1.]])
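With large samples you can sanity-check what each sampling function promises: randn() has mean 0 and standard deviation 1, rand() stays in [0, 1), randint()’s high bound is excluded, and randperm() uses every index exactly once. A sketch; the sample size of 100,000 is an arbitrary choice:

```python
import torch

torch.manual_seed(0)

normal = torch.randn(100_000)            # standard normal: mean 0, std 1
uniform = torch.rand(100_000)            # uniform on [0, 1)
ints = torch.randint(3, 10, (100_000,))  # integers in [3, 10): 10 is excluded
perm = torch.randperm(8)                 # each of 0..7 exactly once

print(normal.mean().item(), normal.std().item())
print(uniform.min().item(), uniform.max().item())
print(ints.min().item(), ints.max().item())
print(sorted(perm.tolist()))
```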

# 2. Attributes of a Tensor

In [8]:
tensor = torch.empty(3)
print(type(tensor))
print(tensor.dtype)
print(tensor.shape)
print(tensor.device)
print(tensor)

<class 'torch.Tensor'>
torch.float32
torch.Size([3])
cpu
tensor([8.4078e-45, 0.0000e+00, 0.0000e+00])


## 2.1 Tensor Data Types
The type of the object returned is torch.Tensor; by default, PyTorch tensors are populated with 32-bit floating point numbers (torch.float32, whose legacy tensor type name is torch.FloatTensor). (More on data types below.)

You will probably see some random-looking values when printing your tensor. The torch.empty() call allocates memory for the tensor, but does not initialize it with any values - so what you’re seeing is whatever was in memory at the time of allocation.

In [10]:
a = torch.ones((2, 3), dtype=torch.int16)
print(a)

b = torch.rand((2, 3), dtype=torch.float64) * 20.
print(b)

c = b.to(torch.int32)
print(c)

tensor([[1, 1, 1],
        [1, 1, 1]], dtype=torch.int16)
tensor([[18.0429,  7.2532, 19.6519],
        [10.8626,  2.1505, 19.6913]], dtype=torch.float64)
tensor([[18,  7, 19],
        [10,  2, 19]], dtype=torch.int32)


* The simplest way to set the underlying data type of a tensor is with an optional argument at creation time. In the first line of the cell above, we set dtype=torch.int16 for the tensor a. When we print a, we can see that it’s full of `1` rather than `1.` - PyTorch’s subtle cue that this is an integer type rather than floating point.
* Another thing to notice about printing a is that, unlike when we left dtype as the default (32-bit floating point), printing the tensor also specifies its dtype.

* You may have also spotted that we went from specifying the tensor’s shape as a series of integer arguments, to grouping those arguments in a tuple. This is not strictly necessary - PyTorch will take a series of initial, unlabeled integer arguments as a tensor shape - but when adding the optional arguments, it can make your intent more readable.

* The other way to set the datatype is with the .to() method. In the cell above, we create a random floating point tensor b in the usual way. Following that, we create c by converting b to a 32-bit integer with the .to() method. Note that c contains all the same values as b, but truncated to integers.

* Available data types include:

  * torch.bool
  * torch.int8
  * torch.uint8
  * torch.int16
  * torch.int32
  * torch.int64
  * torch.half
  * torch.float
  * torch.double
  * torch.bfloat16
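Two details about dtypes are easy to check directly: converting floats to an integer type with .to() truncates toward zero (it does not round), and mixing dtypes in one operation promotes the result to a common dtype, which torch.result_type() reports. A minimal sketch:

```python
import torch

f = torch.tensor([-1.7, -0.5, 0.5, 1.7])
print(f.to(torch.int32).tolist())   # [-1, 0, 0, 1]: truncation toward zero

# Mixing dtypes in an operation promotes to a common dtype
i16 = torch.ones(3, dtype=torch.int16)
f32 = torch.ones(3)                 # default dtype: float32
print((i16 + f32).dtype)            # torch.float32
print(torch.result_type(i16, f32))  # torch.float32
```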

# 3. Tensor Operations
## 3.1 Standard NumPy-like Indexing and Slicing

In [47]:
tensor = torch.ones(4, 4)
print(tensor)
tensor[:, ::2] = 0
print(tensor)
tensor[:2, :] = 2
print(tensor)

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])
tensor([[0., 1., 0., 1.],
        [0., 1., 0., 1.],
        [0., 1., 0., 1.],
        [0., 1., 0., 1.]])
tensor([[2., 2., 2., 2.],
        [2., 2., 2., 2.],
        [0., 1., 0., 1.],
        [0., 1., 0., 1.]])


In [48]:
print(tensor[-1, ...])

tensor([0., 1., 0., 1.])
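As in NumPy, basic indexing and slicing return views that share memory with the original tensor, which is why the assignments in the cell above modify tensor itself. If you need an independent slice, clone() it. A quick check:

```python
import torch

t = torch.zeros(3, 4)
row = t[1]             # a view of the second row, not a copy
row[:] = 7.0           # writing through the view...
print(t[1])            # ...changes t: tensor([7., 7., 7., 7.])

col = t[:, 0].clone()  # clone() gives an independent copy
col[:] = -1.0
print(t[0, 0].item())  # 0.0: t is unchanged
```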


## 3.2 Joining and Splitting Tensors
* torch.cat(tensors, dim=0, out=None)
* torch.stack(tensors, dim=0, out=None)
* torch.chunk(input, chunks, dim=0)
* torch.split(tensor, split_size_or_sections, dim=0)

In [13]:
tensor = torch.randn(3, 2)
tensor

tensor([[ 0.9329, -0.9412],
        [ 0.5536, -0.6499],
        [-1.5436, -0.5975]])

In [14]:
torch.cat((tensor, tensor, tensor), 0)

tensor([[ 0.9329, -0.9412],
        [ 0.5536, -0.6499],
        [-1.5436, -0.5975],
        [ 0.9329, -0.9412],
        [ 0.5536, -0.6499],
        [-1.5436, -0.5975],
        [ 0.9329, -0.9412],
        [ 0.5536, -0.6499],
        [-1.5436, -0.5975]])

In [15]:
torch.cat((tensor, tensor, tensor), 1)

tensor([[ 0.9329, -0.9412,  0.9329, -0.9412,  0.9329, -0.9412],
        [ 0.5536, -0.6499,  0.5536, -0.6499,  0.5536, -0.6499],
        [-1.5436, -0.5975, -1.5436, -0.5975, -1.5436, -0.5975]])

In [16]:
torch.stack((tensor, tensor, tensor), 0)

tensor([[[ 0.9329, -0.9412],
         [ 0.5536, -0.6499],
         [-1.5436, -0.5975]],

        [[ 0.9329, -0.9412],
         [ 0.5536, -0.6499],
         [-1.5436, -0.5975]],

        [[ 0.9329, -0.9412],
         [ 0.5536, -0.6499],
         [-1.5436, -0.5975]]])

In [17]:
torch.stack((tensor, tensor, tensor), 1)

tensor([[[ 0.9329, -0.9412],
         [ 0.9329, -0.9412],
         [ 0.9329, -0.9412]],

        [[ 0.5536, -0.6499],
         [ 0.5536, -0.6499],
         [ 0.5536, -0.6499]],

        [[-1.5436, -0.5975],
         [-1.5436, -0.5975],
         [-1.5436, -0.5975]]])

In [18]:
torch.stack((tensor, tensor, tensor), 2)

tensor([[[ 0.9329,  0.9329,  0.9329],
         [-0.9412, -0.9412, -0.9412]],

        [[ 0.5536,  0.5536,  0.5536],
         [-0.6499, -0.6499, -0.6499]],

        [[-1.5436, -1.5436, -1.5436],
         [-0.5975, -0.5975, -0.5975]]])
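The shapes summarize the difference: cat() joins along an existing dimension, so the number of dimensions stays the same, while stack() inserts a new dimension at dim. In fact stack() behaves like unsqueezing every input at dim and then concatenating. A sketch:

```python
import torch

x = torch.randn(3, 2)

print(torch.cat((x, x, x), dim=0).shape)    # torch.Size([9, 2])
print(torch.cat((x, x, x), dim=1).shape)    # torch.Size([3, 6])
print(torch.stack((x, x, x), dim=0).shape)  # torch.Size([3, 3, 2])
print(torch.stack((x, x, x), dim=2).shape)  # torch.Size([3, 2, 3])

# stack == unsqueeze each input at dim, then cat along that dim
same = torch.equal(torch.stack((x, x, x), dim=1),
                   torch.cat([x.unsqueeze(1)] * 3, dim=1))
print(same)  # True
```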

## 3.3 Manipulating Tensor Shapes
### 3.3.1 change tensor shape
**torch.reshape(input, shape)**\
**torch.transpose(input, dim0, dim1)**

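Sometimes, you’ll need to change the shape of your tensor. The first cells below continue from section 3.2, splitting a tensor into pieces with chunk() and split(); after that, reshape() and transpose() change the shape of the whole tensor.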

In [22]:
torch.chunk(tensor, 3, 0)

(tensor([[ 0.9329, -0.9412]]),
 tensor([[ 0.5536, -0.6499]]),
 tensor([[-1.5436, -0.5975]]))

In [23]:
torch.chunk(tensor, 2, 1)

(tensor([[ 0.9329],
         [ 0.5536],
         [-1.5436]]),
 tensor([[-0.9412],
         [-0.6499],
         [-0.5975]]))

In [24]:
torch.split(tensor, 1, 0)

(tensor([[ 0.9329, -0.9412]]),
 tensor([[ 0.5536, -0.6499]]),
 tensor([[-1.5436, -0.5975]]))

In [26]:
torch.split(tensor, 1, 1)

(tensor([[ 0.9329],
         [ 0.5536],
         [-1.5436]]),
 tensor([[-0.9412],
         [-0.6499],
         [-0.5975]]))

In [27]:
torch.reshape(tensor, (2, 3))

tensor([[ 0.9329, -0.9412,  0.5536],
        [-0.6499, -1.5436, -0.5975]])

In [28]:
torch.transpose(tensor, 1, 0)

tensor([[ 0.9329,  0.5536, -1.5436],
        [-0.9412, -0.6499, -0.5975]])
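chunk() takes the number of pieces, while split() takes the size of each piece, either as one int or as a list of sizes that must add up to the length of the dimension. When the length does not divide evenly, chunk() makes the last piece smaller. A sketch:

```python
import torch

x = torch.arange(6.).reshape(3, 2)

# split() with a list of sizes: 1 row, then 2 rows
top, rest = torch.split(x, [1, 2], dim=0)
print(top.shape, rest.shape)  # torch.Size([1, 2]) torch.Size([2, 2])

# 3 rows into 2 chunks: the last chunk is smaller
a, b = torch.chunk(x, 2, dim=0)
print(a.shape, b.shape)       # torch.Size([2, 2]) torch.Size([1, 2])
```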

### 3.3.2 Changing the Number of Dimensions
**torch.squeeze(input, dim=None)**\
**torch.unsqueeze(input, dim)**

One case where you might need to change the number of dimensions is passing a single instance of input to your model. PyTorch models generally expect batches of input.

For example, imagine having a model that works on 3 x 226 x 226 images - a 226-pixel square with 3 color channels. When you load and transform it, you’ll get a tensor of shape (3, 226, 226). Your model, though, is expecting input of shape (N, 3, 226, 226), where N is the number of images in the batch. So how do you make a batch of one?

a = torch.rand(3, 226, 226)
b = a.unsqueeze(0)

print(a.shape)
print(b.shape)

The unsqueeze() method adds a dimension of extent 1. unsqueeze(0) adds it as a new zeroth dimension - now you have a batch of one!
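Indexing with None is a common shorthand that also inserts a dimension of extent 1, and when you have several images instead of one, torch.stack() builds the batch dimension directly. A sketch (img stands in for one transformed 3 x 226 x 226 image):

```python
import torch

img = torch.rand(3, 226, 226)

batch1 = img.unsqueeze(0)         # torch.Size([1, 3, 226, 226])
batch2 = img[None]                # same result via None indexing
print(torch.equal(batch1, batch2))  # True

batch = torch.stack([img, img])   # a real batch of two images
print(batch.shape)                # torch.Size([2, 3, 226, 226])
```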

So that’s unsqueezing. What do we mean by squeezing? Both operations take advantage of the fact that adding or removing a dimension of extent 1 does not change the number of elements in the tensor.

In [65]:
c = torch.rand(1, 1, 1, 1, 1)
print(c)

tensor([[[[[0.8847]]]]])


Continuing the example above, let’s say the model’s output is a 20-element vector for each input. You would then expect the output to have shape (N, 20), where N is the number of instances in the input batch. That means that for our single-input batch, we’ll get an output of shape (1, 20).

What if you want to do some non-batched computation with that output - something that’s just expecting a 20-element vector?

In [67]:
a = torch.rand(1, 20)
print(a.shape)
print(a)

b = a.squeeze(0)
print(b.shape)
print(b)

c = torch.rand(2, 2)
print(c.shape)

d = c.squeeze(0)
print(d.shape)

torch.Size([1, 20])
tensor([[0.3916, 0.0970, 0.7466, 0.0299, 0.0504, 0.5025, 0.8819, 0.7465, 0.4263,
         0.9760, 0.8199, 0.6420, 0.7517, 0.9189, 0.5772, 0.7149, 0.1006, 0.9046,
         0.5158, 0.0369]])
torch.Size([20])
tensor([0.3916, 0.0970, 0.7466, 0.0299, 0.0504, 0.5025, 0.8819, 0.7465, 0.4263,
        0.9760, 0.8199, 0.6420, 0.7517, 0.9189, 0.5772, 0.7149, 0.1006, 0.9046,
        0.5158, 0.0369])
torch.Size([2, 2])
torch.Size([2, 2])


You can see from the shapes that our 2-dimensional tensor is now 1-dimensional, and if you look closely at the output of the cell above you’ll see that printing a shows an “extra” set of square brackets [] due to having an extra dimension.

You may only squeeze() dimensions of extent 1: in the cell above, we try to squeeze dimension 0 of c, which has size 2, and get back the same shape we started with. squeeze() and unsqueeze() only remove or add dimensions of extent 1, because doing otherwise would change the number of elements in the tensor.
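Called without a dim argument, squeeze() removes every dimension of extent 1 at once. That is convenient, but it would also strip the batch dimension from a batch of one, so passing dim explicitly is usually the safer habit. A sketch:

```python
import torch

x = torch.rand(1, 3, 1, 2)

print(x.squeeze().shape)   # torch.Size([3, 2]): every size-1 dim removed
print(x.squeeze(2).shape)  # torch.Size([1, 3, 2]): only dim 2 removed
print(x.squeeze(1).shape)  # torch.Size([1, 3, 1, 2]): dim 1 has size 3, no-op
```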

Another place you might use unsqueeze() is to ease broadcasting (covered in section 4.1). Consider this code:

In [68]:
a = torch.ones(4, 3, 2)

c = a * torch.rand(3, 1) # 3rd dim = 1, 2nd dim identical to a
print(c)

tensor([[[0.0187, 0.0187],
         [0.8123, 0.8123],
         [0.0012, 0.0012]],

        [[0.0187, 0.0187],
         [0.8123, 0.8123],
         [0.0012, 0.0012]],

        [[0.0187, 0.0187],
         [0.8123, 0.8123],
         [0.0012, 0.0012]],

        [[0.0187, 0.0187],
         [0.8123, 0.8123],
         [0.0012, 0.0012]]])


The net effect of that was to broadcast the operation over dimensions 0 and 2, causing the random, 3 x 1 tensor to be multiplied element-wise by every 3-element column in a.

What if the random vector had just been a 3-element vector? We’d lose the ability to do the broadcast, because the final dimensions would not match up according to the broadcasting rules. unsqueeze() comes to the rescue:

In [70]:
a = torch.ones(4, 3, 2)
b = torch.rand(3)     # trying to multiply a * b will give a runtime error
c = b.unsqueeze(1)       # change to a 2-dimensional tensor, adding new dim at the end
print(c.shape)
print(a * c)             # broadcasting works again!

torch.Size([3, 1])
tensor([[[0.1476, 0.1476],
         [0.1303, 0.1303],
         [0.8201, 0.8201]],

        [[0.1476, 0.1476],
         [0.1303, 0.1303],
         [0.8201, 0.8201]],

        [[0.1476, 0.1476],
         [0.1303, 0.1303],
         [0.8201, 0.8201]],

        [[0.1476, 0.1476],
         [0.1303, 0.1303],
         [0.8201, 0.8201]]])


In [72]:
output3d = torch.rand(6, 20, 20)
print(output3d.shape)

input1d = output3d.reshape(6 * 20 * 20)
print(input1d.shape)

# can also call it as a function on the torch module:
print(torch.reshape(output3d, (6 * 20 * 20,)).shape)

torch.Size([6, 20, 20])
torch.Size([2400])
torch.Size([2400])


The (6 * 20 * 20,) argument in the final line of the cell above is because PyTorch expects a tuple when specifying a tensor shape - but when the shape is the first argument of a method, it lets us cheat and just use a series of integers. Here, we had to add the parentheses and comma to convince the method that this is really a one-element tuple.

When it can, reshape() will return a view on the tensor to be changed - that is, a separate tensor object looking at the same underlying region of memory. This is important: That means any change made to the source tensor will be reflected in the view on that tensor, unless you clone() it.

There are conditions, beyond the scope of this introduction, where reshape() has to return a tensor carrying a copy of the data.
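One such condition is easy to demonstrate: a transposed tensor is no longer contiguous, so flattening it cannot be expressed as a view. reshape() then quietly copies, while view(), which never copies, raises a RuntimeError instead. A sketch, using data_ptr() to compare where the data lives:

```python
import torch

x = torch.arange(6.)
v = x.reshape(2, 3)          # x is contiguous, so this is a view
v[0, 0] = 100.0
print(x[0].item())           # 100.0: the change shows up in x

t = torch.arange(6.).reshape(2, 3).t()  # transpose: same memory, non-contiguous
print(t.is_contiguous())     # False
flat = t.reshape(6)          # cannot be a view here, so reshape copies
print(flat.data_ptr() == t.data_ptr())  # False

view_failed = False
try:
    t.view(6)                # view() refuses rather than copying
except RuntimeError:
    view_failed = True
print(view_failed)           # True
```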

## 3.4 Altering Tensors in Place
Most operations return a new tensor, but there are times when you may wish to alter a tensor in place - for example, if you’re doing an element-wise computation where you can discard intermediate values. For this, most of the math functions have a version with an appended underscore (_) that will alter a tensor in place.

In [38]:
a = torch.tensor([0, math.pi / 4, math.pi / 2, 3 * math.pi / 4])
print('a:')
print(a)
print(torch.sin(a))   # this operation creates a new tensor in memory
print(a)              # a has not changed

b = torch.tensor([0, math.pi / 4, math.pi / 2, 3 * math.pi / 4])
print('\nb:')
print(b)
print(torch.sin_(b))  # note the underscore
print(b)              # b has changed

a:
tensor([0.0000, 0.7854, 1.5708, 2.3562])
tensor([0.0000, 0.7071, 1.0000, 0.7071])
tensor([0.0000, 0.7854, 1.5708, 2.3562])

b:
tensor([0.0000, 0.7854, 1.5708, 2.3562])
tensor([0.0000, 0.7071, 1.0000, 0.7071])
tensor([0.0000, 0.7071, 1.0000, 0.7071])
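The same in-place operation is also available as a tensor method, b.sin_(). In-place operations return the very tensor they modified, and no new storage is allocated, which you can confirm with data_ptr(). A sketch:

```python
import math
import torch

b = torch.tensor([0, math.pi / 4, math.pi / 2, 3 * math.pi / 4])
ptr_before = b.data_ptr()

out = b.sin_()                     # method form of the in-place sine
print(out is b)                    # True: the same tensor object comes back
print(b.data_ptr() == ptr_before)  # True: no new storage was allocated
```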



## 3.5 Indexing
* torch.index_select(input, dim, index, out=None)
* torch.masked_select(input, mask, out=None)

Like most binary operations on tensors (when we say c = a * b, the new tensor c occupies a region of memory distinct from a and b), these functions return a new tensor rather than a view of the input.


In [32]:
tensor = torch.randn((3, 4))
tensor

tensor([[ 2.1382,  0.3900, -0.9062,  0.8873],
        [-0.7923, -1.5249, -0.1997, -0.0821],
        [-1.0587,  0.3344, -0.7804,  0.0804]])

In [33]:
indices = torch.tensor([0, 2])
torch.index_select(tensor, 0, indices)

tensor([[ 2.1382,  0.3900, -0.9062,  0.8873],
        [-1.0587,  0.3344, -0.7804,  0.0804]])

In [34]:
mask = tensor.ge(0.5)
mask

tensor([[ True, False, False,  True],
        [False, False, False, False],
        [False, False, False, False]])

In [35]:
torch.masked_select(tensor, mask)

tensor([2.1382, 0.8873])
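Both functions have bracket-indexing equivalents that you will often see in practice: an integer index tensor selects along a dimension like index_select(), and a boolean mask gives the same flat 1-D result as masked_select(). A sketch that checks the equivalence:

```python
import torch

x = torch.randn(3, 4)
indices = torch.tensor([0, 2])
mask = x.ge(0.5)

# Integer-tensor indexing along dim 0 matches index_select(x, 0, ...)
rows_equal = torch.equal(x[indices], torch.index_select(x, 0, indices))
# Along dim 1, put the index tensor in the second position
cols_equal = torch.equal(x[:, indices], torch.index_select(x, 1, indices))
# Boolean-mask indexing matches masked_select: a flat 1-D result
mask_equal = torch.equal(x[mask], torch.masked_select(x, mask))

print(rows_equal, cols_equal, mask_equal)  # True True True
```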

# 4. More about tensors
## 4.1 Tensor Broadcasting

In [13]:
rand = torch.rand(2, 4)
doubled = rand * (torch.ones(1, 4) * 2)

print(rand)
print(doubled)

tensor([[0.5062, 0.8469, 0.2588, 0.2707],
        [0.4115, 0.6839, 0.0703, 0.5105]])
tensor([[1.0124, 1.6939, 0.5177, 0.5413],
        [0.8230, 1.3677, 0.1405, 1.0210]])



What’s the trick here? How is it we got to multiply a 2x4 tensor by a 1x4 tensor?

Broadcasting is a way to perform an operation between tensors whose shapes are not identical but compatible. In the example above, the one-row, four-column tensor is multiplied by both rows of the two-row, four-column tensor.

This is an important operation in Deep Learning. The common example is multiplying a tensor of learning weights by a batch of input tensors, applying the operation to each instance in the batch separately, and returning a tensor of identical shape - just like our (2, 4) * (1, 4) example above returned a tensor of shape (2, 4).

The rules for broadcasting are:

* Each tensor must have at least one dimension - no empty tensors.
* Comparing the dimension sizes of the two tensors, going from last to first:
  * Each dimension must be equal, or
  * One of the dimensions must be of size 1, or
  * The dimension does not exist in one of the tensors

Tensors of identical shape, of course, are trivially “broadcastable”, as you saw earlier.
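The rules above are mechanical enough to write down directly. The sketch below is a hypothetical helper, broadcast_shape(), that walks two shapes from the last dimension to the first and treats a missing dimension like a dimension of size 1; its results are compared against torch.broadcast_shapes(), which recent PyTorch versions provide.

```python
import torch

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper applying the broadcasting rules above."""
    result = []
    # Walk both shapes from the last dimension to the first
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        da = shape_a[-i] if i <= len(shape_a) else 1  # missing dim acts like 1
        db = shape_b[-i] if i <= len(shape_b) else 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"incompatible sizes {da} and {db} at dim -{i}")
        result.append(db if da == 1 else da)
    return tuple(reversed(result))

for other in [(3, 2), (3, 1), (1, 2), (2,)]:
    print(other, broadcast_shape((4, 3, 2), other),
          tuple(torch.broadcast_shapes((4, 3, 2), other)))

try:
    broadcast_shape((4, 3, 2), (3, 3))
except ValueError as err:
    print(err)  # incompatible sizes 2 and 3 at dim -1
```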

Here are some examples of situations that honor the above rules and allow broadcasting:



In [17]:
a = torch.ones(4, 3, 2)
a * torch.rand(3, 2) # 3rd & 2nd dims identical to a, dim 1 absent

tensor([[[0.8444, 0.2941],
         [0.3788, 0.4567],
         [0.0649, 0.6677]],

        [[0.8444, 0.2941],
         [0.3788, 0.4567],
         [0.0649, 0.6677]],

        [[0.8444, 0.2941],
         [0.3788, 0.4567],
         [0.0649, 0.6677]],

        [[0.8444, 0.2941],
         [0.3788, 0.4567],
         [0.0649, 0.6677]]])

In [19]:
a * torch.rand(3, 1) # 3rd dim = 1, 2nd dim identical to a

tensor([[[0.4945, 0.4945],
         [0.3857, 0.3857],
         [0.9883, 0.9883]],

        [[0.4945, 0.4945],
         [0.3857, 0.3857],
         [0.9883, 0.9883]],

        [[0.4945, 0.4945],
         [0.3857, 0.3857],
         [0.9883, 0.9883]],

        [[0.4945, 0.4945],
         [0.3857, 0.3857],
         [0.9883, 0.9883]]])

In [20]:
a * torch.rand(1, 2) # 3rd dim identical to a, 2nd dim = 1

tensor([[[0.4762, 0.7242],
         [0.4762, 0.7242],
         [0.4762, 0.7242]],

        [[0.4762, 0.7242],
         [0.4762, 0.7242],
         [0.4762, 0.7242]],

        [[0.4762, 0.7242],
         [0.4762, 0.7242],
         [0.4762, 0.7242]],

        [[0.4762, 0.7242],
         [0.4762, 0.7242],
         [0.4762, 0.7242]]])

Look closely at the values of each tensor above:

* In the first result, a * torch.rand(3, 2), the multiplication was broadcast over every “layer” of a.
* In the second, a * torch.rand(3, 1), the operation was broadcast over every layer and column of a - every 3-element column is identical.
* In the third, a * torch.rand(1, 2), we switched it around - now every row is identical, across all layers.


## 4.2 More Math with Tensors
PyTorch tensors have over three hundred operations that can be performed on them.

Here is a small sample from some of the major categories of operations:

In [26]:
import math
# common functions
a = torch.rand(2, 4) * 2 - 1
print('Common functions:')
print(a)
print(torch.abs(a))
print(torch.ceil(a))
print(torch.floor(a))
print(torch.clamp(a, -0.5, 0.5))

Common functions:
tensor([[ 0.3090, -0.1713, -0.8608, -0.0704],
        [-0.1018,  0.2531,  0.8823, -0.0156]])
tensor([[0.3090, 0.1713, 0.8608, 0.0704],
        [0.1018, 0.2531, 0.8823, 0.0156]])
tensor([[1., -0., -0., -0.],
        [-0., 1., 1., -0.]])
tensor([[ 0., -1., -1., -1.],
        [-1.,  0.,  0., -1.]])
tensor([[ 0.3090, -0.1713, -0.5000, -0.0704],
        [-0.1018,  0.2531,  0.5000, -0.0156]])


In [28]:
# trigonometric functions and their inverses
angles = torch.tensor([0, math.pi / 4, math.pi / 2, 3 * math.pi / 4])
sines = torch.sin(angles)
inverses = torch.asin(sines)
print('Sine and arcsine:')
print(angles)
print(sines)
print(inverses)

Sine and arcsine:
tensor([0.0000, 0.7854, 1.5708, 2.3562])
tensor([0.0000, 0.7071, 1.0000, 0.7071])
tensor([0.0000, 0.7854, 1.5708, 0.7854])


In [29]:
# bitwise operations
print('Bitwise XOR:')
b = torch.tensor([1, 5, 11])
c = torch.tensor([2, 7, 10])
print(torch.bitwise_xor(b, c))

Bitwise XOR:
tensor([3, 2, 1])


In [34]:
# comparisons:
print('Broadcasted, element-wise equality comparison:')
d = torch.tensor([[1., 2.], [3., 4.]])
e = torch.ones(1, 2)  # many comparison ops support broadcasting!
print(torch.eq(d, e)) # returns a tensor of type bool

Broadcasted, element-wise equality comparison:
tensor([[ True, False],
        [False, False]])


In [36]:
# reductions:
print('Reduction ops:')
print(torch.max(d))        # returns a single-element tensor
print(torch.max(d).item()) # extracts the value from the returned tensor
print(torch.mean(d))       # average
print(torch.std(d))        # standard deviation
print(torch.prod(d))       # product of all numbers
print(torch.unique(torch.tensor([1, 2, 1, 2, 1, 2]))) # filter unique elements

Reduction ops:
tensor(4.)
4.0
tensor(2.5000)
tensor(1.2910)
tensor(24.)
tensor([1, 2])
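The reductions above collapse the whole tensor to one value. Most of them also take a dim argument to reduce along a single dimension; torch.max() then returns both the values and their indices, and keepdim=True keeps the reduced dimension with extent 1, which is handy for broadcasting the result back. A sketch using the same d:

```python
import torch

d = torch.tensor([[1., 2.], [3., 4.]])

values, indices = torch.max(d, dim=0)  # per-column maximum and its row
print(values, indices)                 # tensor([3., 4.]) tensor([1, 1])
print(torch.sum(d, dim=1))             # tensor([3., 7.]): one sum per row
print(torch.mean(d, dim=1, keepdim=True).shape)  # torch.Size([2, 1])
```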


In [37]:
# vector and linear algebra operations
v1 = torch.tensor([1., 0., 0.])         # x unit vector
v2 = torch.tensor([0., 1., 0.])         # y unit vector
m1 = torch.rand(2, 2)                   # random matrix
m2 = torch.tensor([[3., 0.], [0., 3.]]) # three times identity matrix

print('Vectors & Matrices:')
print(torch.cross(v2, v1)) # negative of z unit vector (v1 x v2 == -v2 x v1)
print(m1)
m3 = torch.matmul(m1, m2)
print(m3)                  # 3 times m1
print(torch.svd(m3))       # singular value decomposition

Vectors & Matrices:
tensor([ 0.,  0., -1.])
tensor([[0.5461, 0.5396],
        [0.3053, 0.1973]])
tensor([[1.6382, 1.6188],
        [0.9160, 0.5919]])
torch.return_types.svd(
U=tensor([[-0.9060, -0.4232],
        [-0.4232,  0.9060]]),
S=tensor([2.5402, 0.2020]),
V=tensor([[-0.7369,  0.6760],
        [-0.6760, -0.7369]]))
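A useful check on any SVD is that the factors rebuild the original matrix. Note that the PyTorch documentation marks torch.svd() as deprecated in favor of torch.linalg.svd(), which returns Vh (V transposed) instead of V. A sketch with an arbitrary 2 x 2 matrix m:

```python
import torch

m = torch.tensor([[3., 1.], [1., 2.]])

# torch.linalg.svd returns U, S and Vh, with m = U @ diag(S) @ Vh
U, S, Vh = torch.linalg.svd(m)
new_ok = torch.allclose(U @ torch.diag(S) @ Vh, m, atol=1e-5)

# The legacy torch.svd returns V, with m = U @ diag(S) @ V.T
U2, S2, V2 = torch.svd(m)
old_ok = torch.allclose(U2 @ torch.diag(S2) @ V2.T, m, atol=1e-5)

print(new_ok, old_ok)  # True True
```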


In [39]:
a = torch.ones(2, 2)
b = torch.rand(2, 2)

print('Before:')
print(a)
print(b)
print('\nAfter adding:')
print(a.add_(b))
print(a)
print(b)
print('\nAfter multiplying')
print(b.mul_(b))
print(b)

Before:
tensor([[1., 1.],
        [1., 1.]])
tensor([[0.3285, 0.5655],
        [0.0065, 0.7765]])

After adding:
tensor([[1.3285, 1.5655],
        [1.0065, 1.7765]])
tensor([[1.3285, 1.5655],
        [1.0065, 1.7765]])
tensor([[0.3285, 0.5655],
        [0.0065, 0.7765]])

After multiplying
tensor([[1.0794e-01, 3.1982e-01],
        [4.2797e-05, 6.0299e-01]])
tensor([[1.0794e-01, 3.1982e-01],
        [4.2797e-05, 6.0299e-01]])


Note that these in-place arithmetic functions are methods on the torch.Tensor object, not attached to the torch module like many other functions (e.g., torch.sin()). As you can see from a.add_(b), the calling tensor is the one that gets changed in place.

There is another option for placing the result of a computation in an existing, allocated tensor. Many of the methods and functions we’ve seen so far - including creation methods! - have an out argument that lets you specify a tensor to receive the output. If the out tensor is the correct shape and dtype, this can happen without a new memory allocation:

In [40]:
a = torch.rand(2, 2)
b = torch.rand(2, 2)
c = torch.zeros(2, 2)
old_id = id(c)

print(c)
d = torch.matmul(a, b, out=c)
print(c)                # contents of c have changed

assert c is d           # test c & d are same object, not just containing equal values
assert id(c) == old_id  # make sure that our new c is the same object as the old one

torch.rand(2, 2, out=c) # works for creation too!
print(c)                # c has changed again
assert id(c) == old_id  # still the same object!

tensor([[0., 0.],
        [0., 0.]])
tensor([[0.5991, 0.7677],
        [0.8968, 1.2237]])
tensor([[0.4704, 0.6077],
        [0.4757, 0.5874]])


## 4.3 Copying Tensors
As with any object in Python, assigning a tensor to a variable makes the variable a label of the tensor, and does not copy it. For example:

In [41]:
a = torch.ones(2, 2)
b = a

a[0][1] = 561  # we change a...
print(b)       # ...and b is also altered

tensor([[  1., 561.],
        [  1.,   1.]])


But what if you want a separate copy of the data to work on? The clone() method is there for you:

In [42]:
a = torch.ones(2, 2)
b = a.clone()

assert b is not a      # different objects in memory...
print(torch.eq(a, b))  # ...but still with the same contents!

a[0][1] = 561          # a changes...
print(b)               # ...but b is still all ones

tensor([[True, True],
        [True, True]])
tensor([[1., 1.],
        [1., 1.]])


There is an important thing to be aware of when using clone(): if your source tensor has autograd enabled, then so will the clone.

In many cases, this will be what you want. For example, if your model has multiple computation paths in its forward() method, and both the original tensor and its clone contribute to the model’s output, then to enable model learning you want autograd turned on for both tensors. If your source tensor has autograd enabled (which it generally will if it’s a set of learning weights or derived from a computation involving the weights), then you’ll get the result you want.

On the other hand, if you’re doing a computation where neither the original tensor nor its clone need to track gradients, then as long as the source tensor has autograd turned off, you’re good to go.

There is a third case, though: Imagine you’re performing a computation in your model’s forward() function, where gradients are turned on for everything by default, but you want to pull out some values mid-stream to generate some metrics. In this case, you don’t want the cloned copy of your source tensor to track gradients - performance is improved with autograd’s history tracking turned off. For this, you can use the .detach() method on the source tensor:

In [43]:
a = torch.rand(2, 2, requires_grad=True) # turn on autograd
print(a)

b = a.clone()
print(b)

c = a.detach().clone()
print(c)

print(a)

tensor([[0.4363, 0.6339],
        [0.3208, 0.4323]], requires_grad=True)
tensor([[0.4363, 0.6339],
        [0.3208, 0.4323]], grad_fn=<CloneBackward>)
tensor([[0.4363, 0.6339],
        [0.3208, 0.4323]])
tensor([[0.4363, 0.6339],
        [0.3208, 0.4323]], requires_grad=True)


* We create a with requires_grad=True turned on. We haven’t covered this optional argument yet, but will during the unit on autograd.
* When we print a, it informs us that the property requires_grad=True - this means that autograd and computation history tracking are turned on.
* We clone a and label it b. When we print b, we can see that it’s tracking its computation history: it has inherited a’s autograd settings, and the clone operation itself has been recorded (grad_fn=<CloneBackward>).
* We clone a into c, but we call detach() first.
* Printing c, we see no computation history, and no requires_grad=True.

The detach() method detaches the tensor from its computation history. It says, “do whatever comes next as if autograd were off.” It does this without changing a: you can see that when we print a again at the end, it retains its requires_grad=True property.
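On its own, detach() returns a tensor that shares memory with its source, so writing into it in place would also change a. Calling clone() afterwards gives the copy its own storage. The sketch below checks this with `data_ptr()`. It also shows cloning inside a `torch.no_grad()` block, which is another way to get a copy that records no history:

```python
import torch

a = torch.rand(2, 2, requires_grad=True)

d = a.detach()            # no history, but shares a's memory
c = a.detach().clone()    # no history and its own memory

print(d.requires_grad, c.requires_grad)   # False False
print(d.data_ptr() == a.data_ptr())       # True
print(c.data_ptr() == a.data_ptr())       # False

# Alternative: operations inside torch.no_grad() are not recorded by autograd
with torch.no_grad():
    e = a.clone()
print(e.requires_grad)                    # False
```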

# 5. Moving to GPU
One of the major advantages of PyTorch is its robust acceleration on CUDA-compatible Nvidia GPUs. (“CUDA” stands for Compute Unified Device Architecture, which is Nvidia’s platform for parallel computing.) So far, everything we’ve done has been on CPU. How do we move to the faster hardware?

First, we should check whether a GPU is available, with the torch.cuda.is_available() function.

In [63]:
if torch.cuda.is_available():
    print('We have a GPU!')
else:
    print('Sorry, CPU only.')

Sorry, CPU only.


Once we’ve determined that one or more GPUs are available, we need to put our data someplace where the GPU can see it. Your CPU does computation on data in your computer’s RAM. Your GPU has dedicated memory attached to it. Whenever you want to perform a computation on a device, you must move all the data needed for that computation to memory accessible by that device. (Colloquially, “moving the data to memory accessible by the GPU” is shortened to “moving the data to the GPU”.)

There are multiple ways to get your data onto your target device. You may do it at creation time:

In [45]:
if torch.cuda.is_available():
    gpu_rand = torch.rand(2, 2, device='cuda')
    print(gpu_rand)
else:
    print('Sorry, CPU only.')

Sorry, CPU only.


By default, new tensors are created on the CPU, so we have to use the optional device argument when we want to create our tensor on the GPU. When you print a tensor that is not on the CPU, PyTorch also reports which device it lives on.
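Besides reading the printed output, you can ask any tensor where it lives through its `device` attribute, or check `is_cuda`. This is a small sketch. The GPU branch only runs when CUDA is available, and the index it prints depends on your current default GPU.

```python
import torch

cpu_rand = torch.rand(2, 2)
print(cpu_rand.device)      # cpu
print(cpu_rand.is_cuda)     # False

if torch.cuda.is_available():
    gpu_rand = torch.rand(2, 2, device='cuda')
    print(gpu_rand.device)  # e.g. cuda:0
```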

You can query the number of GPUs with torch.cuda.device_count(). If you have more than one GPU, you can specify them by index: device='cuda:0', device='cuda:1', etc.
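To make that concrete, the loop below lists every visible GPU by index and creates a tiny tensor on each. It is a sketch: on a CPU-only machine `torch.cuda.device_count()` returns 0, so the loop body simply never runs.

```python
import torch

n_gpus = torch.cuda.device_count()   # 0 on a CPU-only machine
print('GPUs found:', n_gpus)

for i in range(n_gpus):
    # each GPU is addressed by index: 'cuda:0', 'cuda:1', ...
    t = torch.zeros(1, device=f'cuda:{i}')
    print(i, torch.cuda.get_device_name(i), t.device)
```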

As a coding practice, specifying our devices everywhere with string constants is pretty fragile. In an ideal world, your code would perform robustly whether you’re on CPU or GPU hardware. You can do this by creating a device handle that can be passed to your tensors instead of a string:

In [46]:
if torch.cuda.is_available():
    my_device = torch.device('cuda')
else:
    my_device = torch.device('cpu')
print('Device: {}'.format(my_device))

x = torch.rand(2, 2, device=my_device)
print(x)

Device: cpu
tensor([[0.1811, 0.6962],
        [0.8073, 0.2125]])
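The same device selection is often written as a single expression. A `torch.device` can also carry an explicit GPU index. Constructing such a handle does not check that the GPU exists; only placing a tensor on it does. The index 1 below is purely illustrative.

```python
import torch

# Same logic as the cell above, as a one-liner
my_device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# A handle with an explicit (illustrative) GPU index; creating it allocates nothing
second_gpu = torch.device('cuda', 1)
print(second_gpu)                         # cuda:1

x = torch.zeros(2, 2, device=my_device)
print(x.device.type == my_device.type)    # True
```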


If you have an existing tensor living on one device, you can move it to another with the to() method. The following cell creates a tensor on the CPU and moves it to whichever device handle you acquired in the previous cell.

In [47]:
y = torch.rand(2, 2)
y = y.to(my_device)
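Note that to() returns its result rather than modifying the tensor in place, which is why the cell above reassigns y. If the tensor already has the requested device and dtype, no copy is made. to() can also convert the dtype. This is a short sketch that runs on the CPU:

```python
import torch

y = torch.rand(2, 2)

y_same = y.to('cpu')                      # already on the CPU: no copy is made
print(y_same.data_ptr() == y.data_ptr())  # True

y64 = y.to(torch.float64)                 # to() can also change the dtype
print(y64.dtype)                          # torch.float64
print(y.dtype)                            # torch.float32: the original is unchanged
```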

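It is important to know that in order to do computation involving two or more tensors, all of the tensors must be on the same device. Be careful with device strings as well: PyTorch calls an NVIDIA GPU 'cuda', not 'gpu'. In the following cell, 'gpu' is not a recognized device type. The RuntimeError below is therefore raised as soon as y is created, before the addition is attempted, whether or not you have a GPU available:

In [48]:
x = torch.rand(2, 2)
y = torch.rand(2, 2, device='gpu')  # 'gpu' is not a valid device type, so this line raises
z = x + y  # never reached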

RuntimeError: Expected one of cpu, cuda, xpu, mkldnn, opengl, opencl, ideep, hip, msnpu, mlc, xla, vulkan, meta, hpu device type at start of device string: gpu
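Here is a minimal sketch of an actual device mismatch, using the valid 'cuda' string. It assumes a CUDA-enabled build with a GPU present; on a CPU-only machine the mismatch branch is skipped. `add_on_same_device` is a hypothetical helper, not part of PyTorch. It resolves the mismatch by moving one operand with to() before the operation.

```python
import torch

def add_on_same_device(a, b):
    """Hypothetical helper: move `a` onto `b`'s device, then add."""
    return a.to(b.device) + b

x = torch.rand(2, 2)                        # lives on the CPU
if torch.cuda.is_available():
    y = torch.rand(2, 2, device='cuda')     # lives on the GPU
    try:
        x + y                               # operands on different devices
    except RuntimeError as err:
        print('RuntimeError:', err)
else:
    y = torch.rand(2, 2)                    # CPU-only fallback so the helper still runs

z = add_on_same_device(x, y)
print(z.device)                             # the device of y
```

Moving one operand explicitly keeps the device choice visible in the code. Relying on where tensors happened to be created hides it.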