Python programmers are likely already familiar with the ndarray from the NumPy package.
A tensor in PyTorch is similar to NumPy's ndarray, with two key additions:
support for computation on GPUs, and automatic differentiation.
These features are critical for deep learning.

A tensor represents a (possibly multidimensional) array of numerical values. With one axis, a tensor corresponds to a vector.
With two axes, a tensor corresponds to a matrix. 
Tensors with more than two axes do not have special mathematical names.

This notebook will walk you through the basics of working with tensors in PyTorch. 

The first thing to do is to import PyTorch.

In [1]:
import torch

# Tensor Creation

We can use arange to create a row vector x containing the first 12 integers, starting with 0.

In [2]:
x = torch.arange(12)
x

tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

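Because we passed an integer, the elements are stored as 64-bit integers (torch.int64); note the missing decimal points in the output. Passing a float instead, e.g. torch.arange(12.0), creates 32-bit floats.
Each of the values in a tensor is called an element of the tensor.
Unless specified otherwise, a new tensor is stored on the CPU, so operations on it run on the CPU.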
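The following sketch shows how the element type depends on the arguments. It compares arange called with an integer, with a float, and with an explicit dtype, and it also checks the default device. It assumes PyTorch's default floating-point type has not been changed from float32.

```python
import torch

x_int = torch.arange(12)                        # integer argument -> integer elements
x_float = torch.arange(12.0)                    # float argument -> default float type
x_f32 = torch.arange(12, dtype=torch.float32)   # an explicit dtype overrides inference

print(x_int.dtype)    # torch.int64
print(x_float.dtype)  # torch.float32
print(x_f32.dtype)    # torch.float32
print(x_int.device)   # cpu
```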

Typically, we want our matrices initialized either with zeros, ones, or other constants.
We can create a tensor with all elements set to 0 and a shape of (2, 3, 4) as follows:

In [3]:
torch.zeros((2, 3, 4))

tensor([[[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]],

        [[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]]])

Similarly, we can create tensors with each element set to 1 as follows:

In [4]:
torch.ones((2, 3, 4))

tensor([[[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]])

Often we want to randomly sample the values for each element in a tensor from some probability distribution.
For example, we typically initialize parameters in a neural network randomly.

In [5]:
torch.randn(3, 4)

tensor([[ 0.4754, -2.3061,  1.0743,  1.6016],
        [-0.1406, -0.7777,  0.0898, -0.3613],
        [ 0.3359, -0.0960,  1.0118, -0.1215]])

Each element is randomly sampled from a standard Gaussian distribution with a mean of 0 and a standard deviation of 1.
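This sketch covers two practical details. Random draws become reproducible when you fix the seed with torch.manual_seed. Samples from a Gaussian with a different mean and standard deviation can be obtained by shifting and scaling standard normal samples. The mean 5 and standard deviation 2 below are arbitrary illustration values.

```python
import torch

torch.manual_seed(0)
a = torch.randn(3, 4)
torch.manual_seed(0)          # resetting the seed replays the same random sequence
b = torch.randn(3, 4)
print(torch.equal(a, b))      # True

# If Z ~ N(0, 1), then 5 + 2 * Z ~ N(5, 2^2)
samples = 5.0 + 2.0 * torch.randn(10000)
print(samples.mean().item(), samples.std().item())   # close to 5 and 2
```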

We can also specify the exact values for each element by supplying a Python list (or list of lists).

In [6]:
torch.tensor([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

tensor([[2, 1, 4, 3],
        [1, 2, 3, 4],
        [4, 3, 2, 1]])

Here, the outermost list corresponds to axis 0, and the inner list to axis 1.
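You can check the axis correspondence through the tensor's shape. The outer list has 3 entries (the length of axis 0), and each inner list has 4 entries (the length of axis 1).

```python
import torch

t = torch.tensor([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
print(t.shape)   # torch.Size([3, 4]): 3 inner lists along axis 0, 4 numbers each along axis 1
print(t.ndim)    # 2
print(t[0])      # tensor([2, 1, 4, 3]): the first inner list
```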

One of the most important features of PyTorch is that it can use graphics processing units (GPUs) to accelerate its tensor operations. We can easily check whether PyTorch is configured to use GPUs:

In [7]:
if torch.cuda.is_available():  # call the function; the bare function object is always truthy
  print('PyTorch can use GPUs!')
else:
  print('PyTorch cannot use GPUs.')

PyTorch can use GPUs!


You can enable GPUs in Colab via Runtime -> Change Runtime Type -> Hardware Accelerator -> GPU.

PyTorch tensors have a device attribute specifying where the tensor is stored:
either the CPU, or CUDA for NVIDIA GPUs.

In [8]:
# Construct a tensor on the CPU
x0 = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
print('x0 device:', x0.device)

x0 device: cpu


We can also construct a tensor directly on the GPU by passing device='cuda' (this requires a GPU to be available):

In [9]:
y = torch.tensor([[1, 2, 3], [4, 5, 6]], device='cuda')
print('y device:', y.device)

y device: cuda:0


We can also use the .cuda() and .cpu() methods to move tensors between the CPU and the GPU.


In [10]:
# Move it to the GPU using .cuda()
x1 = x0.cuda()
print('x1 device:', x1.device)

# Move it back to the CPU using .cpu()
x2 = x1.cpu()
print('x2 device:', x2.device)

x1 device: cuda:0
x2 device: cpu
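The cells above assume a GPU is present. On a machine without one, .cuda() and device='cuda' raise an error. A common pattern is to pick the device once and pass it to .to() or to a constructor's device argument, so the same code runs with or without a GPU. The sketch below shows this; the variable name device is just a local choice. Operations generally expect their input tensors to live on the same device.

```python
import torch

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x0 = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
x1 = x0.to(device)                    # move to the chosen device (no copy if already there)
z = torch.zeros(2, 2, device=device)  # construct directly on the chosen device
s = x1 + z                            # both operands live on the same device
x2 = s.to('cpu')                      # bring the result back to the CPU

print(x1.device, s.device, x2.device)
```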


# Tensor Datatypes

PyTorch provides a set of numeric datatypes for tensors.
* torch.float32 or torch.float: 32-bit floating-point
* torch.float64 or torch.double: 64-bit, double-precision floating-point 
* torch.float16 or torch.half: 16-bit, half-precision floating-point
* torch.int8: signed 8-bit integers
* torch.uint8: unsigned 8-bit integers
* torch.int16 or torch.short: signed 16-bit integers
* torch.int32 or torch.int: signed 32-bit integers
* torch.int64 or torch.long: signed 64-bit integers
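The bit widths in the list can be confirmed with element_size(), which reports the bytes used per element. The short names are aliases for the same dtype objects. A quick sketch:

```python
import torch

dtypes = [torch.float32, torch.float64, torch.float16, torch.int8,
          torch.uint8, torch.int16, torch.int32, torch.int64]
for dt in dtypes:
    # element_size() is the number of bytes used by one element
    print(dt, torch.zeros(1, dtype=dt).element_size(), 'bytes')

# The short names refer to the same dtypes
print(torch.float == torch.float32, torch.long == torch.int64)   # True True
```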

In the examples above, you may have noticed that some of our tensors contained floating-point values, while others contained integer values.
PyTorch tries to guess a datatype when you create a tensor.

In [11]:
# Let torch choose the datatype
x0 = torch.tensor([1, 2])   # List of integers
x1 = torch.tensor([1., 2.]) # List of floats
x2 = torch.tensor([1., 2])  # Mixed list

Each tensor has a dtype attribute that you can use to check its data type:

In [12]:
print('List of integers:', x0.dtype)
print('List of floats:', x1.dtype)
print('Mixed list:', x2.dtype)

List of integers: torch.int64
List of floats: torch.float32
Mixed list: torch.float32
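The same inference applies to other inputs and to operations. Python floats map to the default float type, and Python booleans map to torch.bool. When an integer tensor meets a float tensor, the result is promoted to float, and true division (/) returns floats even for integer inputs. Here is a minimal sketch, assuming the default float type is float32.

```python
import torch

print(torch.get_default_dtype())            # torch.float32: used for Python floats
print(torch.tensor([True, False]).dtype)    # torch.bool

i = torch.tensor([1, 2])                    # torch.int64
f = torch.tensor([0.5, 0.5])                # torch.float32
print((i + f).dtype)                        # torch.float32: integers are promoted to float
print((i / 2).dtype)                        # torch.float32: true division returns floats
```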


Functions that construct tensors typically have a dtype argument that you can use to explicitly specify a datatype.

In [13]:
y0 = torch.tensor([1, 2], dtype=torch.float32)  # 32-bit float
y1 = torch.tensor([1, 2], dtype=torch.int32)    # 32-bit integer
print('32-bit float: ', y0.dtype)
print('32-bit integer: ', y1.dtype)

32-bit float:  torch.float32
32-bit integer:  torch.int32


We can cast a tensor to another datatype using the .to() method.

In [14]:
x0 = torch.ones(1, 2, dtype=torch.int16)
x1 = x0.to(torch.float32)
x2 = x0.to(torch.float64)
print('x0:', x0.dtype)
print('x1:', x1.dtype)
print('x2:', x2.dtype)

x0: torch.int16
x1: torch.float32
x2: torch.float64
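Two details of casting are worth knowing. Converting floats to integers drops the fractional part, rounding toward zero. Shorthand methods such as .long() and .double() are equivalent to .to() with the matching dtype. A short sketch:

```python
import torch

x = torch.tensor([1.7, -1.7, 2.5])
y = x.to(torch.int64)   # float -> int drops the fractional part (rounds toward zero)
print(y)                # tensor([ 1, -1,  2])

# Shorthand methods equivalent to .to(...) with the corresponding dtype
print(x.long().dtype, x.int().dtype, x.double().dtype, x.half().dtype)
```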


# Tensor Reshaping

We can access a tensor’s shape (the length along each axis) by inspecting its shape property.

In [15]:
x = torch.arange(12)
x.shape

torch.Size([12])

The numel() method returns the total number of elements in a tensor.

In [16]:
x.numel()

12
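For a tensor with several axes, a few related accessors are useful. The shape can be indexed like a tuple, dim() counts the axes, and numel() equals the product of the axis lengths. Here is a sketch on an arbitrary (2, 3, 4) tensor.

```python
import math
import torch

t = torch.zeros(2, 3, 4)
print(t.shape)       # torch.Size([2, 3, 4])
print(t.shape[1])    # 3: the length of axis 1
print(t.size())      # same information as t.shape
print(t.dim())       # 3: the number of axes
# numel() is the product of the lengths of all axes
print(t.numel(), math.prod(t.shape))   # 24 24
```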

To change the shape of a tensor without altering either the number of elements or their values, we can invoke the reshape function.

In [17]:
X = x.reshape(3, 4)
X

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])

The new tensor contains the same values, arranged as a matrix with 3 rows and 4 columns.

We do not need to specify every dimension manually when reshaping.
Since the total number of elements is known, PyTorch can work out one dimension given the rest.
We invoke this by placing -1 for the dimension that we would like PyTorch to infer automatically.
Instead of calling x.reshape(3, 4), we could have equivalently called x.reshape(-1, 4) or x.reshape(3, -1).

In [18]:
x.reshape(-1, 4)

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])

In [19]:
x.reshape(3, -1)

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
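Only one dimension may be -1, and the other dimensions must divide the number of elements evenly. The sketch below shows -1 in a three-axis shape, -1 alone used to flatten, and an incompatible request that raises an error.

```python
import torch

x = torch.arange(12)
print(x.reshape(2, -1, 2).shape)          # torch.Size([2, 3, 2]): 12 / (2 * 2) = 3
print(x.reshape(3, 4).reshape(-1).shape)  # torch.Size([12]): -1 alone flattens to a vector
try:
    x.reshape(5, -1)                      # 12 elements cannot be split into rows of 5
except RuntimeError as err:
    print('RuntimeError:', err)
```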

Another common reshape operation you might want to perform is transposing a matrix. 
The reshape() function takes elements in row-major order, so you cannot transpose matrices with .reshape(). 

In [20]:
x = torch.tensor([[1, 2, 3], [4, 5, 6]])
print(x)
print(x.reshape(3, 2))

tensor([[1, 2, 3],
        [4, 5, 6]])
tensor([[1, 2],
        [3, 4],
        [5, 6]])


For matrices, the simplest way to swap the two axes is the .t() method, which returns the transpose.

In [21]:
print(x)
print(x.t())

tensor([[1, 2, 3],
        [4, 5, 6]])
tensor([[1, 4],
        [2, 5],
        [3, 6]])
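The defining property of the transpose is that it swaps indices, so xt[i, j] equals x[j, i]. The sketch below checks this. It also shows that the .T attribute gives the same result for a matrix, while reshape does not.

```python
import torch

x = torch.tensor([[1, 2, 3], [4, 5, 6]])
xt = x.t()
# Transposition swaps the two indices: xt[i, j] == x[j, i]
print(all(xt[i, j] == x[j, i] for i in range(3) for j in range(2)))   # True
print(torch.equal(xt, x.T))                 # True: .T is shorthand for the matrix transpose
print(torch.equal(xt, x.reshape(3, 2)))     # False: reshape keeps row-major order
```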


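For tensors with more than two dimensions, we can use the .permute() method to rearrange the dimensions in an arbitrary order. Its arguments list, for each axis of the result, which axis of the input it comes from.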

In [22]:
# Create a tensor of shape (2, 3, 4)
x0 = torch.tensor([
     [[1,  2,  3,  4],
      [5,  6,  7,  8],
      [9, 10, 11, 12]],
     [[13, 14, 15, 16],
      [17, 18, 19, 20],
      [21, 22, 23, 24]]])
print('shape:', x0.shape)
x1 = x0.permute(1, 2, 0)
print('shape:', x1.shape)

shape: torch.Size([2, 3, 4])
shape: torch.Size([3, 4, 2])
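To see what permute(1, 2, 0) does to individual elements, the sketch below rebuilds the same (2, 3, 4) tensor with arange. Because new axis 0 is old axis 1, new axis 1 is old axis 2, and new axis 2 is old axis 0, x1[a, b, c] equals x0[c, a, b]. The sketch also shows transpose(), which swaps exactly two axes.

```python
import torch

x0 = torch.arange(1, 25).reshape(2, 3, 4)   # same values as the tensor above
x1 = x0.permute(1, 2, 0)
# x1[a, b, c] == x0[c, a, b]
print(x1[0, 0, 1].item(), x0[1, 0, 0].item())   # 13 13
x2 = x0.transpose(0, 2)    # swap axes 0 and 2, leave axis 1 in place
print(x2.shape)            # torch.Size([4, 3, 2])
```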


# Tensor Elementwise Operations

Some of the most useful operations are the elementwise operations. 
These apply a standard scalar operation to each element of an array. 

In [23]:
x = torch.tensor([1.0, 2, 4, 8])
torch.exp(x)

tensor([2.7183e+00, 7.3891e+00, 5.4598e+01, 2.9810e+03])
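Many other unary functions work the same way, one element at a time. This small sketch applies a few of them. It also checks that log undoes exp elementwise, up to floating-point rounding.

```python
import torch

x = torch.tensor([1.0, 2, 4, 8])
print(torch.sqrt(x))     # square root of each element
print(torch.log2(x))     # base-2 logarithm of each element: 0, 1, 2, 3
print(x.abs(), -x)       # many operations are also available as methods or operators
print(torch.allclose(torch.log(torch.exp(x)), x))   # True
```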

For functions that take two arrays as inputs, elementwise operations apply on each pair of corresponding elements from the two arrays.

In [24]:
x = torch.tensor([1.0, 2, 4, 8])
y = torch.tensor([2, 2, 2, 2])
x + y, x - y, x * y, x / y, x**y  # The ** operator is exponentiation

(tensor([ 3.,  4.,  6., 10.]),
 tensor([-1.,  0.,  2.,  6.]),
 tensor([ 2.,  4.,  8., 16.]),
 tensor([0.5000, 1.0000, 2.0000, 4.0000]),
 tensor([ 1.,  4., 16., 64.]))
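Each arithmetic operator corresponds to a torch function, and a Python scalar is applied to every element. A quick check:

```python
import torch

x = torch.tensor([1.0, 2, 4, 8])
y = torch.tensor([2, 2, 2, 2])
print(torch.equal(torch.add(x, y), x + y))   # True
print(torch.equal(torch.mul(x, y), x * y))   # True
print(torch.equal(torch.pow(x, y), x ** y))  # True
print(x + 1)   # the scalar 1 is added to every element
```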

We can also construct a binary (boolean) tensor via logical statements such as X == Y.

If X and Y are equal at a position, the corresponding entry in the new tensor is True; otherwise it is False.

In [25]:
X = torch.arange(12, dtype=torch.float32).reshape((3, 4))
Y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

X == Y

tensor([[False,  True, False,  True],
        [False, False, False, False],
        [False, False, False, False]])
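The result has dtype torch.bool. Casting it to a number type maps True to 1 and False to 0, which makes it easy to count matches. The sketch below uses the same X and Y and also tries another comparison operator.

```python
import torch

X = torch.arange(12, dtype=torch.float32).reshape((3, 4))
Y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
mask = X == Y
print(mask.dtype)                 # torch.bool
print(mask.to(torch.float32))     # True -> 1., False -> 0.
print(mask.sum().item())          # 2: number of positions where X equals Y
print((X > Y).sum().item())       # 8: number of positions where X exceeds Y
```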

# Tensor Broadcasting

So far, we have performed elementwise operations on two tensors of the same shape.
Under certain conditions, we can perform elementwise operations even when the shapes differ, by invoking the broadcasting mechanism. It works in two steps:
* First, expand one or both tensors by (conceptually) copying elements so that the two tensors have the same shape.
* Then, carry out the elementwise operation on the resulting tensors.

In most cases, we broadcast along an axis where a tensor has length 1.


In [26]:
a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))
a, b

(tensor([[0],
         [1],
         [2]]), tensor([[0, 1]]))

The shapes of a and b, (3, 1) and (1, 2) respectively, do not match.

For elementwise operations, PyTorch broadcasts both matrices into a larger (3, 2) matrix as follows:

In [27]:
a + b

tensor([[0, 1],
        [1, 2],
        [2, 3]])

For matrix a it replicates the columns and for matrix b it replicates the rows.
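PyTorch follows NumPy's broadcasting rules. Shapes are compared starting from the last axis, and two axis lengths are compatible if they are equal or one of them is 1 (missing leading axes count as 1). The sketch below makes the implicit replication explicit with expand() and computes the broadcast shape with torch.broadcast_shapes, which assumes a PyTorch version that provides it (1.8 or later). It also shows an incompatible pair of shapes.

```python
import torch

a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))
# expand() repeats size-1 axes without copying memory
a_big = a.expand(3, 2)   # the single column of a, replicated
b_big = b.expand(3, 2)   # the single row of b, replicated
print(torch.equal(a + b, a_big + b_big))        # True
print(torch.broadcast_shapes((3, 1), (1, 2)))   # torch.Size([3, 2])
try:
    torch.arange(3) + torch.arange(2)   # trailing lengths 3 and 2: neither is 1
except RuntimeError as err:
    print('RuntimeError:', err)
```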

# Tensor Indexing and Slicing

Elements of a tensor can be accessed by index, as in a Python list: the first element has index 0.

We can use negative indices to access elements according to their position relative to the end of the list.

In [28]:
X[-1], X[1:3]

(tensor([ 8.,  9., 10., 11.]), tensor([[ 4.,  5.,  6.,  7.],
         [ 8.,  9., 10., 11.]]))

Since X is a matrix, indexing along axis 0 selects rows: [-1] selects the last row, and [1:3] selects the second and third rows (the stop index 3 is excluded).

We can also write elements of a matrix by specifying indices.

In [29]:
X[1, 2] = 9
X

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  9.,  7.],
        [ 8.,  9., 10., 11.]])

We can assign multiple elements the same value using indexing.

In [30]:
X[0:2, :] = 12
X

tensor([[12., 12., 12., 12.],
        [12., 12., 12., 12.],
        [ 8.,  9., 10., 11.]])

Here, [0:2, :] accesses the first and second rows, where : takes all the elements along axis 1 (the columns).

# Tensor Concatenation

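We can also concatenate multiple tensors end-to-end to form a larger tensor with torch.cat, using dim to choose the axis. Concatenating along axis 0 (dim=0) appends rows, giving a (6, 4) tensor; concatenating along axis 1 (dim=1) appends columns, giving a (3, 8) tensor.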

In [31]:
X = torch.arange(12, dtype=torch.float32).reshape((3, 4))
Y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
torch.cat((X, Y), dim=0), torch.cat((X, Y), dim=1)

(tensor([[ 0.,  1.,  2.,  3.],
         [ 4.,  5.,  6.,  7.],
         [ 8.,  9., 10., 11.],
         [ 2.,  1.,  4.,  3.],
         [ 1.,  2.,  3.,  4.],
         [ 4.,  3.,  2.,  1.]]),
 tensor([[ 0.,  1.,  2.,  3.,  2.,  1.,  4.,  3.],
         [ 4.,  5.,  6.,  7.,  1.,  2.,  3.,  4.],
         [ 8.,  9., 10., 11.,  4.,  3.,  2.,  1.]]))
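torch.cat joins tensors along an existing axis, so all other axes must match. A related function, torch.stack, creates a new axis and requires identical shapes. A sketch comparing the resulting shapes:

```python
import torch

X = torch.arange(12, dtype=torch.float32).reshape((3, 4))
Y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

print(torch.cat((X, Y), dim=0).shape)    # torch.Size([6, 4]): rows appended
print(torch.cat((X, Y), dim=1).shape)    # torch.Size([3, 8]): columns appended
print(torch.stack((X, Y)).shape)         # torch.Size([2, 3, 4]): new leading axis
print(torch.stack((X, Y), dim=2).shape)  # torch.Size([3, 4, 2]): new trailing axis
```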

# Tensor Reduction

We can calculate the sum of all elements of PyTorch tensors.

In [32]:
x = torch.arange(4, dtype=torch.float32)
x, x.sum()

(tensor([0., 1., 2., 3.]), tensor(6.))

We can express sums over the elements of tensors of arbitrary shape. 

In [33]:
A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
A.shape, A.sum()

(torch.Size([5, 4]), tensor(190.))

We specify axis=0 to reduce along axis 0 (the rows): the elements of each column are summed, so one value remains per column.

In [34]:
A_sum_axis0 = A.sum(axis=0)
A_sum_axis0, A_sum_axis0.shape

(tensor([40., 45., 50., 55.]), torch.Size([4]))

We specify axis=1 to reduce along axis 1 (the columns): the elements of each row are summed, so one value remains per row.

In [35]:
A_sum_axis1 = A.sum(axis=1)
A_sum_axis1, A_sum_axis1.shape

(tensor([ 6., 22., 38., 54., 70.]), torch.Size([5]))

Sometimes it is useful to keep the number of axes unchanged by passing keepdims=True; the reduced axis is then kept with length 1.

In [36]:
sum_A = A.sum(axis=1, keepdims=True)
sum_A

tensor([[ 6.],
        [22.],
        [38.],
        [54.],
        [70.]])
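A common reason to keep the reduced axis is to combine a reduction with broadcasting. The sketch below is an illustration, not part of the original notebook. It normalizes each row of A so that it sums to 1, and shows that axis and keepdims are NumPy-style aliases for PyTorch's own dim and keepdim arguments. It also shows what goes wrong without keepdims.

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
row_sums = A.sum(axis=1, keepdims=True)   # shape (5, 1)
# (5, 4) / (5, 1) broadcasts: each row is divided by its own sum
P = A / row_sums
print(P.sum(axis=1))    # every row of P now sums to 1

# PyTorch's native argument names are dim and keepdim
print(torch.equal(A.sum(dim=1, keepdim=True), row_sums))   # True
print(A.mean(axis=0))   # column means, equal to A.sum(axis=0) / A.shape[0]

try:
    A / A.sum(axis=1)   # (5, 4) / (5,): trailing lengths 4 and 5 do not match
except RuntimeError as err:
    print('RuntimeError:', err)
```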

# Tensor Matrix Operations

One of the most fundamental linear algebra operations is the dot product of two vectors.

The dot product is the sum of the products of the elements at the same position in the two vectors.


In [37]:
x = torch.arange(4, dtype=torch.float32)
y = torch.ones(4, dtype=torch.float32)
x, y, torch.dot(x, y)

(tensor([0., 1., 2., 3.]), tensor([1., 1., 1., 1.]), tensor(6.))
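The definition translates directly into an elementwise multiplication followed by a sum, which matches torch.dot:

```python
import torch

x = torch.arange(4, dtype=torch.float32)
y = torch.ones(4, dtype=torch.float32)
# The dot product is the sum of elementwise products: 0*1 + 1*1 + 2*1 + 3*1
print(torch.sum(x * y))                                  # tensor(6.)
print(torch.equal(torch.dot(x, y), torch.sum(x * y)))    # True
```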

torch.dot only works for vectors (1-D tensors); it raises an error for tensors with more than one dimension.
For matrix-matrix products, we use the torch.mm function instead:

In [38]:
x = torch.tensor([[1,2],[3,4]], dtype=torch.float32)
y = torch.tensor([[5,6],[7,8]], dtype=torch.float32)
print(torch.mm(x, y))

tensor([[19., 22.],
        [43., 50.]])
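Each entry (i, j) of the product is the dot product of row i of x with column j of y. The sketch below checks the top-left entry by hand. It shows that the @ operator computes the same product, and contrasts it with elementwise multiplication.

```python
import torch

x = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
y = torch.tensor([[5, 6], [7, 8]], dtype=torch.float32)
print(torch.dot(x[0], y[:, 0]))             # tensor(19.) = 1*5 + 2*7
print(torch.equal(x @ y, torch.mm(x, y)))   # True: @ is the matrix multiplication operator
print(x * y)                                # elementwise products 5, 12, 21, 32, not a matrix product
```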


To compute matrix-vector products we can use torch.mv; or we can use torch.matmul.

In [39]:
v = torch.tensor([9,10], dtype=torch.float32)
print(torch.mv(x, v))
print(torch.matmul(x, v))

tensor([29., 67.])
tensor([29., 67.])
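torch.mv and torch.mm accept only 1-D and 2-D inputs. torch.matmul also handles batches: leading axes are treated as batch dimensions and broadcast. The sketch below builds an arbitrary batch of three matrices and shows the resulting shapes, together with torch.bmm for batched matrix-matrix products.

```python
import torch

x = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
v = torch.tensor([9, 10], dtype=torch.float32)
batch = torch.stack([x, 2 * x, 3 * x])   # shape (3, 2, 2): a batch of three matrices

print(torch.matmul(batch, v).shape)      # torch.Size([3, 2]): one matrix-vector product per matrix
print(torch.matmul(batch, x).shape)      # torch.Size([3, 2, 2]): x is broadcast over the batch
print(torch.bmm(batch, batch).shape)     # torch.Size([3, 2, 2]): batched matrix-matrix product
```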


# Tensor Norms

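In linear algebra, a vector norm is a function that maps a vector to a scalar.
Informally, the norm of a vector $\mathbf{x}$ tells us how large the vector is, i.e. the overall magnitude of its components.

The $L_1$ norm is the sum of the absolute values of the vector elements:
$$\|\mathbf{x}\|_1 = \sum_{i=1}^n |x_i|.$$
The $L_2$ norm is the square root of the sum of the squares of the vector elements:
$$\|\mathbf{x}\|_2 = \sqrt{\sum_{i=1}^n x_i^2}.$$

In deep learning, we often use the squared $L_2$ norm to define our loss function. By default, torch.norm computes the $L_2$ norm: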

In [40]:
u = torch.tensor([3.0, -4.0])
torch.norm(u)

tensor(5.)

We also frequently use the $L_1$ norm, which is less influenced by outliers. It can be computed as the sum of the absolute values of the elements:

In [41]:
torch.abs(u).sum()

tensor(7.)
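Both norms can be computed straight from their formulas and compared with the library functions. This sketch assumes a PyTorch version that includes the torch.linalg module (recent releases do). The last line applies torch.norm to a matrix, where it returns the square root of the sum of all squared entries.

```python
import torch

u = torch.tensor([3.0, -4.0])
l2 = torch.sqrt(torch.sum(u ** 2))   # sqrt(9 + 16) = 5
l1 = torch.sum(torch.abs(u))         # 3 + 4 = 7
print(l2, l1)
print(torch.linalg.norm(u), torch.linalg.norm(u, ord=1))   # the same values via torch.linalg
print(torch.sum(u ** 2))             # squared L2 norm: 25
print(torch.norm(torch.ones((4, 9))))   # matrix: sqrt(36) = 6
```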

# Tensor Memory-management

Running operations can cause new memory to be allocated to hold their results.

In [42]:
before = id(Y)
Y = Y + X
id(Y) == before

False

After running Y = Y + X, id(Y) points to a different object: Python first evaluates Y + X into newly allocated memory and then rebinds the name Y to the result.

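In deep learning, we update millions of parameters multiple times per second. Allocating new memory for every update is undesirable for two reasons:

* It wastes time and memory on unnecessary allocations.
* Other variables may still reference the old tensors, so parts of our code could inadvertently use stale parameters.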

We can assign the result of an operation to a previously allocated tensor with slice notation, e.g. X[:] = <expression>.

In [43]:
before = id(X)
X[:] = X + Y
id(X) == before

True

If the value of X is not reused in subsequent computations, we can write X[:] = X + Y or X += Y to reduce the memory overhead of the operation.
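Besides slice assignment, PyTorch offers several other ways to update a tensor in place: augmented assignment such as +=, methods with a trailing underscore such as add_(), and the out= argument of functions such as torch.add. The sketch below compares data_ptr(), the address of a tensor's underlying storage, before and after each operation. The tensors here are fresh illustration values.

```python
import torch

X = torch.arange(12, dtype=torch.float32).reshape((3, 4))
Y = torch.ones(3, 4)
ptr = X.data_ptr()            # address of X's underlying storage
X += Y                        # in-place addition: no new storage is allocated
X.add_(Y)                     # a trailing underscore marks an in-place method
print(X.data_ptr() == ptr)    # True

Z = torch.zeros_like(X)
z_ptr = Z.data_ptr()
torch.add(X, Y, out=Z)        # write the result into Z's existing storage
print(Z.data_ptr() == z_ptr)  # True

Y2 = Y + X                    # out-of-place: the result gets fresh storage
print(Y2.data_ptr() == Y.data_ptr())   # False
```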

# Tensor Conversion

Converting a PyTorch tensor to a NumPy ndarray, or vice versa, is easy.

In [44]:
A = X.numpy()
B = torch.tensor(A)
type(A), type(B)

(numpy.ndarray, torch.Tensor)
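Whether the data is copied during conversion depends on the function. For a CPU tensor, .numpy() returns an array that shares memory with the tensor, and torch.from_numpy likewise shares memory with the array. torch.tensor always copies. A GPU tensor must be moved with .cpu() before calling .numpy(). The sketch below demonstrates this sharing on a small zero tensor.

```python
import numpy as np
import torch

X = torch.zeros(2, 3)
A = X.numpy()             # shares memory with X (CPU tensors only)
print(isinstance(A, np.ndarray))   # True
X[0, 0] = 5.0
print(A[0, 0])            # 5.0: the change is visible through A

B = torch.tensor(A)       # copies the data
C = torch.from_numpy(A)   # shares memory with A (and therefore with X)
A[1, 1] = 7.0
print(B[1, 1].item(), C[1, 1].item())   # 0.0 7.0
```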

To convert a size-1 tensor to a Python scalar, we can invoke the item() method or Python's built-in functions such as float() and int().

In [45]:
a = torch.tensor([3.5])
a, a.item(), float(a), int(a)

(tensor([3.5000]), 3.5, 3.5, 3)
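For tensors with more than one element, item() raises an error. In that case, .tolist() converts the tensor to nested Python lists. Note also that int() truncates, just like Python's own int(). A short sketch:

```python
import torch

a = torch.tensor([3.7])
print(int(a))          # 3: int() truncates the fractional part
m = torch.tensor([[1, 2], [3, 4]])
print(m.tolist())      # [[1, 2], [3, 4]]: nested lists of Python ints
try:
    m.item()           # item() only works for tensors with exactly one element
except RuntimeError as err:
    print('RuntimeError:', err)
```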