## Tensors

> Understanding tensors, the basic data structure in PyTorch
>
> \- Deep Learning with PyTorch, by Eli Stevens, Luca Antiga and Thomas Viehmann -

In this notebook we explain what tensors are and how you can use them to prepare your data. Tensors are central to using PyTorch to develop and train neural networks. A lot has been written about tensors and the PyTorch framework; we collected information from many sources, summarized it in this notebook, and added exercises to test your knowledge of the topic.

As this is meant only as a refresher, it is not a complete summary. Feel free to consult the sources mentioned or look up other sources if certain parts are unclear or you want more background information.

**From Input to Tensor**

The world as floating-point numbers. To build an artificial neural network that can learn from any form of input and produce the desired output, we need to transform the input into a form the network can digest. In PyTorch, this means turning our input into floating-point numbers, which we group together in so-called *tensors*.

![title](images/world_as_floating_points.png)

Image is taken from the book *Deep Learning with PyTorch*.

You can think of tensors as multidimensional arrays. The number of dimensions of a tensor is the number of indices needed to refer to a single scalar value inside it. Some important features of tensors are:

- They can be stored on graphics processing units (GPUs) for fast, parallel computation.
- Operations on them can be distributed across multiple devices or machines.
- They can keep track of the graph of computations that created them, which comes in handy for backpropagation.
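As a small sketch of the "number of indices" idea, the snippet below builds tensors of increasing rank and checks `ndim`; the values are arbitrary. The last lines move a tensor to a GPU only if one is available (an assumption about your machine), otherwise it stays on the CPU.

```python
import torch

scalar = torch.tensor(3.0)              # 0 indices needed -> 0D
vector = torch.tensor([1.0, 2.0, 3.0])  # 1 index          -> 1D
matrix = torch.ones(2, 3)               # 2 indices        -> 2D
cube = torch.zeros(2, 3, 4)             # 3 indices        -> 3D

for t in (scalar, vector, matrix, cube):
    print(t.ndim, tuple(t.shape))

# Reaching one scalar value takes exactly `ndim` indices
value = cube[1, 2, 3]
print(value.ndim)  # 0: a single scalar remains

# Move to a GPU when one is available, otherwise stay on the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
matrix_on_device = matrix.to(device)
print(matrix_on_device.device)
```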

**Note:** The information and exercises below are taken from the following link: [Basics of PyTorch](https://deeplearning.neuromatch.io/tutorials/W1D1_BasicsAndPytorch/student/W1D1_Tutorial1.html#section-2-1-creating-tensors). You will find more information on PyTorch there but this notebook only focuses on tensors.

### Let's import PyTorch (and NumPy)

In [1]:
import torch
import numpy as np

### Creating Tensors

#### Construct tensors directly

You can create a tensor directly from common Python iterables, such as lists or tuples, or from a NumPy array. Nested iterables also work, as long as the dimensions are compatible.

In [2]:
# tensor from a list
a = torch.tensor([0, 1, 2])

# tensor from a tuple of tuples
b = ((1.0, 1.1), (1.2, 1.3))
b = torch.tensor(b)

# tensor from a numpy array
c = np.ones([2, 3])
c = torch.tensor(c)

print(f"Tensor a: {a}")
print(f"Tensor b: {b}")
print(f"Tensor c: {c}")

Tensor a: tensor([0, 1, 2])
Tensor b: tensor([[1.0000, 1.1000],
        [1.2000, 1.3000]])
Tensor c: tensor([[1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)
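The `dtype=torch.float64` in the output for `c` comes from NumPy: `np.ones` produces float64 values and `torch.tensor` keeps that precision. The sketch below shows how the element type is inferred and how the `dtype` argument overrides it, assuming the default floating-point type has not been changed with `torch.set_default_dtype`.

```python
import numpy as np
import torch

ints = torch.tensor([0, 1, 2])           # Python ints   -> torch.int64
floats = torch.tensor([0.5, 1.5])        # Python floats -> torch.float32 (default float type)
from_np = torch.tensor(np.ones([2, 3]))  # NumPy float64 -> torch.float64
forced = torch.tensor(np.ones([2, 3]), dtype=torch.float32)  # explicit override

print(ints.dtype, floats.dtype, from_np.dtype, forced.dtype)
```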


#### Some common tensor constructors:

The numerical arguments you pass to these constructors determine the shape of the output tensor.

In [4]:
x = torch.ones(5, 3)
y = torch.zeros(2)
z = torch.empty(1, 1, 5)
print(f"Tensor x: {x}")
print(f"Tensor y: {y}")
print(f"Tensor z: {z}")

Tensor x: tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])
Tensor y: tensor([0., 0.])
Tensor z: tensor([[[0., 0., 0., 0., 0.]]])


Notice that `.empty()` does not necessarily return zeros. Unlike `.zeros()`, which initialises every element with zero, `.empty()` only allocates the memory, so the tensor contains whatever values happened to be there (in the run above they happened to be zeros). This makes it slightly faster, but it is only safe when you are going to overwrite every element anyway.
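A minimal sketch of the safe pattern: allocate with `torch.empty()`, overwrite every element in place, and compare with `torch.full()`, which allocates and fills in a single call.

```python
import torch

buf = torch.empty(2, 3)  # contents are arbitrary until written
buf.fill_(7.0)           # in-place fill overwrites every element

same = torch.full((2, 3), 7.0)  # allocate and fill in one step
print(torch.equal(buf, same))   # True
```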

#### Creating random tensors and tensors like other tensors:

You can also use constructors for random numbers.

In [5]:
# Uniform distribution
a = torch.rand(1, 3)

# Normal distribution
b = torch.randn(3, 4)

print(f"Tensor a: {a}")
print(f"Tensor b: {b}")

Tensor a: tensor([[0.5958, 0.4116, 0.7309]])
Tensor b: tensor([[-0.4440, -0.3617,  0.3509,  0.8056],
        [ 0.1305, -0.6926,  0.3647,  0.9627],
        [-0.7011, -0.6628,  0.7141,  0.3672]])


There are also constructors that build a tensor, just like the ones above, but with the same shape (and, by default, the same dtype and device) as another tensor.

In [6]:
c = torch.zeros_like(a)
d = torch.rand_like(c)

print(f"Tensor c: {c}")
print(f"Tensor d: {d}")

Tensor c: tensor([[0., 0., 0.]])
Tensor d: tensor([[0.1130, 0.0768, 0.4136]])
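Because the `*_like` constructors also copy the dtype, random constructors need care with integer inputs: uniform sampling is not defined for integer dtypes, so `torch.rand_like` on an integer tensor raises an error unless a floating-point `dtype` is passed. A short sketch:

```python
import torch

int_tensor = torch.tensor([[1, 2, 3]])  # dtype torch.int64
zeros = torch.zeros_like(int_tensor)    # same shape and same dtype (int64)

# torch.rand_like(int_tensor) would raise an error; request a float dtype instead
uniform = torch.rand_like(int_tensor, dtype=torch.float32)

print(zeros.dtype, uniform.dtype, uniform.shape)
```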


#### Reproducibility

PyTorch's random number generator (RNG) can be seeded for all devices (both CPU and GPU) with `torch.manual_seed()`:

In [7]:
import torch
torch.manual_seed(0)

<torch._C.Generator at 0x155119868d0>

For custom operators, you might need to set the Python seed as well:

In [8]:
import random
random.seed(0)

For random number generators in other libraries (e.g., NumPy):

In [9]:
import numpy as np
np.random.seed(0)
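To see what seeding buys you, the sketch below draws the same numbers twice by resetting the seed in between. It also shows a dedicated `torch.Generator`, which can be passed to sampling functions so that the global RNG state is left untouched.

```python
import torch

torch.manual_seed(0)
first = torch.rand(3)

torch.manual_seed(0)  # reset the global generator to the same state
second = torch.rand(3)

print(torch.equal(first, second))  # True: same seed, same sequence

# A dedicated generator keeps its own state, separate from the global RNG
gen = torch.Generator().manual_seed(0)
third = torch.rand(3, generator=gen)
print(third)
```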

#### Numpy-like number ranges:

`torch.arange()` and `torch.linspace()` behave as you would expect if you are familiar with NumPy. Note that the keyword for the number of points differs: `steps` in PyTorch versus `num` in NumPy.

In [10]:
a = torch.arange(0, 10, step=1)
b = np.arange(0, 10, step=1)

c = torch.linspace(0, 5, steps=11)
d = np.linspace(0, 5, num=11)

print(f"Tensor a: {a}\n")
print(f"Numpy array b: {b}\n")
print(f"Tensor c: {c}\n")
print(f"Numpy array d: {d}\n")

Tensor a: tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Numpy array b: [0 1 2 3 4 5 6 7 8 9]

Tensor c: tensor([0.0000, 0.5000, 1.0000, 1.5000, 2.0000, 2.5000, 3.0000, 3.5000, 4.0000,
        4.5000, 5.0000])

Numpy array d: [0.  0.5 1.  1.5 2.  2.5 3.  3.5 4.  4.5 5. ]
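The key difference between the two: `arange` excludes the end value, while `linspace` includes both end points. The PyTorch documentation also notes that a non-integer `arange` step is subject to floating-point rounding when compared against the end, so `linspace` with an explicit count is the more predictable choice for float grids.

```python
import torch

evens = torch.arange(0, 10, 2)        # end value 10 is excluded
grid = torch.linspace(0, 1, steps=5)  # both end points are included

print(evens)  # tensor([0, 2, 4, 6, 8])
print(grid)   # tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])
```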



#### Exercises

**Creating Tensors**

Below you will find some incomplete code. Fill in the missing code to construct the specified tensors.

We want the tensors:

1) 20 by 21 tensor consisting of ones
2) a tensor with elements equal to the elements of numpy array `Z`
3) a tensor with the same number of elements as `A` but with values `~U(0,1)`*
4) a 1D tensor containing the even numbers between 4 and 40 inclusive. 

\*$U(\alpha, \beta)$ denotes the uniform distribution from $\alpha$ to $\beta$, with $\alpha, \beta \in \mathbb{R}$.

**You will find the answers at the end of this notebook.**

In [18]:
def tensor_creation(Z):
  """
  A function that creates various tensors.

  Args:
    Z: numpy.ndarray
      An array of shape (3,4)

  Returns:
    A : Tensor
      20 by 21 tensor consisting of ones
    B : Tensor
      A tensor with elements equal to the elements of numpy array Z
    C : Tensor
      A tensor with the same number of elements as A but with values ∼U(0,1)
    D : Tensor
      A 1D tensor containing the even numbers between 4 and 40 inclusive.
  """
  #################################################
  ## TODO for students: fill in the missing code
  #################################################
  A = torch.ones(20, 21)
  B = torch.tensor(Z)
  C = torch.rand_like(A)  # same shape as A (20 x 21), values drawn from U(0, 1)
  D = torch.arange(4, 41, step=2)

  return A, B, C, D


# numpy array to copy later
Z = np.vander([1, 2, 3], 4)

# Uncomment below to check your function!
A, B, C, D = tensor_creation(Z)
print(f"Z = \n {Z}")
print(f"Shape Z = {Z.shape}")
print()
print(f"A = \n {A}")
print(f"Shape A = {A.shape}")
print()
print(f"B = \n {B}")
print()
print(f"C = \n {C}")
print(f"Shape C = {C.shape}")
print()
print(f"D = \n {D}")
print(f"Shape D = {D.shape}")

Z = 
 [[ 1  1  1  1]
 [ 8  4  2  1]
 [27  9  3  1]]
Shape Z = (3, 4)

A = 
 tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
         1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
         1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
         1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
         1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
         1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
         1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
         1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
         1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
         1., 1., 1.

### Operations in PyTorch

#### Tensor-Tensor Operations

You can perform operations on tensors using functions from the `torch` namespace.

In [None]:
a = torch.ones(5, 3)
b = torch.rand(5, 3)
c = torch.empty(5, 3)
d = torch.empty(5, 3)

# this only works if c and d already exist
torch.add(a, b, out=c)

# Pointwise Multiplication of a and b
torch.multiply(a, b, out=d)

print(c)
print(d)
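Besides passing `out=`, many operations also exist as in-place methods whose names end in an underscore, such as `add_()`. A short sketch comparing the out-of-place operator with its in-place counterpart:

```python
import torch

a = torch.ones(5, 3)
b = torch.rand(5, 3)

c = a + b   # out-of-place: allocates a new tensor, a is unchanged
a.add_(b)   # in-place: the trailing underscore means a itself is modified

print(torch.equal(a, c))  # True
```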

However, in PyTorch, most common Python operators are overridden. The common standard arithmetic operators ($+$,$-$,$*$,$/$, and $**$) have all been lifted to elementwise operations.

In [19]:
x = torch.tensor([1, 2, 4, 8])
y = torch.tensor([1, 2, 3, 4])
x + y, x - y, x * y, x / y, x**y  # The `**` is the exponentiation operator

(tensor([ 2,  4,  7, 12]),
 tensor([0, 0, 1, 4]),
 tensor([ 1,  4, 12, 32]),
 tensor([1.0000, 1.0000, 1.3333, 2.0000]),
 tensor([   1,    4,   64, 4096]))
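Note the dtypes in the output above: `/` always performs true division, so two integer tensors produce a floating-point result, whereas `//` performs floor division and keeps the integer dtype. A small sketch using the same inputs:

```python
import torch

x = torch.tensor([1, 2, 4, 8])
y = torch.tensor([1, 2, 3, 4])

true_div = x / y    # true division: integer inputs give a float result
floor_div = x // y  # floor division keeps the integer dtype

print(true_div.dtype, floor_div)  # torch.float32 tensor([1, 1, 1, 2])
```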

#### Tensor Methods

Tensors also come with many common arithmetic operations built in as methods. You can find the full list in the [PyTorch tensor documentation](https://pytorch.org/docs/stable/tensors.html).

These methods have a syntax similar to their NumPy equivalents.

In [20]:
x = torch.rand(3, 3)
print(x)
print("\n")
# sum() - note the axis is the axis you move across when summing
print(f"Sum of every element of x: {x.sum()}")
print(f"Sum of the columns of x: {x.sum(axis=0)}")
print(f"Sum of the rows of x: {x.sum(axis=1)}")
print("\n")

print(f"Mean value of all elements of x {x.mean()}")
print(f"Mean values of the columns of x {x.mean(axis=0)}")
print(f"Mean values of the rows of x {x.mean(axis=1)}")

tensor([[0.2783, 0.4820, 0.8198],
        [0.9971, 0.6984, 0.5675],
        [0.8352, 0.2056, 0.5932]])


Sum of every element of x: 5.477059841156006
Sum of the columns of x: tensor([2.1106, 1.3860, 1.9805])
Sum of the rows of x: tensor([1.5800, 2.2631, 1.6340])


Mean value of all elements of x 0.6085622310638428
Mean values of the columns of x tensor([0.7035, 0.4620, 0.6602])
Mean values of the rows of x tensor([0.5267, 0.7544, 0.5447])
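Reductions remove the dimension they run over; passing `keepdim=True` keeps it as a size-1 dimension instead (`dim` and `axis` are interchangeable here). The sketch below uses this to divide every row by its own sum, which works because the `(3, 1)` result broadcasts back against the `(3, 3)` tensor.

```python
import torch

x = torch.rand(3, 3)

col_sums = x.sum(dim=0)                     # shape (3,): one value per column
row_sums_kept = x.sum(dim=1, keepdim=True)  # shape (3, 1): reduced dim kept as size 1

# The kept dimension lets the result broadcast against x, e.g. to normalise rows
row_normalised = x / row_sums_kept
print(col_sums.shape, row_sums_kept.shape)
print(row_normalised.sum(dim=1))  # each row now sums to (approximately) 1
```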


#### Matrix Operations

The `@` operator is overridden to represent matrix multiplication; `torch.matmul()` does the same. For the dot product of two 1D tensors you can use `torch.dot()`, or manipulate the axes of your tensors and use matrix multiplication (covered in the next section).

Transposes of 2D tensors are obtained with `torch.t()` or `Tensor.T`. Note the lack of parentheses for `Tensor.T`: it is an attribute, not a method. For 0D and 1D tensors both return the tensor unchanged; recent PyTorch versions warn that using `.T` on tensors that are not 2D is deprecated.

In [21]:
# matrix multiplication

# vector x vector
tensor1 = torch.randn(3)
tensor2 = torch.randn(3)
print(torch.matmul(tensor1, tensor2).size())
print((tensor1 @ tensor2).size())
print(torch.matmul(tensor1, tensor2))
print(tensor1 @ tensor2)

print()
# matrix x vector
tensor1 = torch.randn(3, 4)
tensor2 = torch.randn(4)
print(torch.matmul(tensor1, tensor2).size())
print((tensor1 @ tensor2).size())
print(torch.matmul(tensor1, tensor2))
print(tensor1 @ tensor2)

print()
# batched matrix x broadcasted vector
tensor1 = torch.randn(10, 3, 4)
tensor2 = torch.randn(4)
print(torch.matmul(tensor1, tensor2).size())
print((tensor1 @ tensor2).size())
print(torch.matmul(tensor1, tensor2))
print(tensor1 @ tensor2)

print()
# batched matrix x batched matrix
tensor1 = torch.randn(10, 3, 4)
tensor2 = torch.randn(10, 4, 5)
print(torch.matmul(tensor1, tensor2).size())
print((tensor1 @ tensor2).size())
print(torch.matmul(tensor1, tensor2))
print(tensor1 @ tensor2)

print()
# batched matrix x broadcasted matrix
tensor1 = torch.randn(10, 3, 4)
tensor2 = torch.randn(4, 5)
print(torch.matmul(tensor1, tensor2).size())
print((tensor1 @ tensor2).size())
print(torch.matmul(tensor1, tensor2))
print(tensor1 @ tensor2)

torch.Size([])
torch.Size([])
tensor(2.3173)
tensor(2.3173)

torch.Size([3])
torch.Size([3])
tensor([-2.5860,  1.1349,  0.3903])
tensor([-2.5860,  1.1349,  0.3903])

torch.Size([10, 3])
torch.Size([10, 3])
tensor([[-0.5891,  0.5291,  0.0727],
        [-0.3065, -0.4123, -0.1747],
        [ 1.0066,  0.3399,  1.0905],
        [ 0.1664, -0.4457, -0.3017],
        [-0.0118,  0.5034,  0.2226],
        [ 0.6001,  0.3434,  1.0768],
        [-0.3405, -0.3924,  0.9940],
        [-0.0932, -1.0798,  0.4816],
        [ 0.2131, -0.8353, -1.5588],
        [ 0.6071,  0.3234, -0.3470]])
tensor([[-0.5891,  0.5291,  0.0727],
        [-0.3065, -0.4123, -0.1747],
        [ 1.0066,  0.3399,  1.0905],
        [ 0.1664, -0.4457, -0.3017],
        [-0.0118,  0.5034,  0.2226],
        [ 0.6001,  0.3434,  1.0768],
        [-0.3405, -0.3924,  0.9940],
        [-0.0932, -1.0798,  0.4816],
        [ 0.2131, -0.8353, -1.5588],
        [ 0.6071,  0.3234, -0.3470]])

torch.Size([10, 3, 5])
torch.Size([10, 3, 5])

In [22]:
# dot product

torch.dot(torch.tensor([2, 3]), torch.tensor([2, 1]))

tensor(7)
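`torch.dot()` only accepts two 1D tensors of the same dtype and does not broadcast. For 1D inputs the same value can be obtained with `@`, or by multiplying elementwise and summing, as this sketch shows:

```python
import torch

u = torch.tensor([2, 3])
v = torch.tensor([2, 1])

print(torch.dot(u, v))  # tensor(7)
print(u @ v)            # tensor(7): @ on two 1D tensors is a dot product
print((u * v).sum())    # tensor(7): elementwise product, then sum
```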

In [23]:
# transpose

x = torch.randn(())
print(x)
print(torch.t(x))
print(x.T)

print()
x = torch.randn(3)
print(x)
print(torch.t(x))
print(x.T)

print()
x = torch.randn(2, 3)
print(x)
print(torch.t(x))
print(x.T)

tensor(0.5694)
tensor(0.5694)
tensor(0.5694)

tensor([-1.0221,  0.9506,  0.5398])
tensor([-1.0221,  0.9506,  0.5398])
tensor([-1.0221,  0.9506,  0.5398])

tensor([[-0.4088, -0.4296, -0.6978],
        [ 0.1943, -0.2045,  0.4200]])
tensor([[-0.4088,  0.1943],
        [-0.4296, -0.2045],
        [-0.6978,  0.4200]])
tensor([[-0.4088,  0.1943],
        [-0.4296, -0.2045],
        [-0.6978,  0.4200]])
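For tensors with more than two dimensions, `torch.t()` raises an error. To transpose only the last two dimensions of a batch of matrices, `transpose(-2, -1)` works; recent PyTorch versions also provide the `.mT` attribute for this. A short sketch:

```python
import torch

batch = torch.randn(10, 3, 4)  # 10 matrices of shape 3x4

swapped = batch.transpose(-2, -1)  # swap only the last two dims
print(swapped.shape)               # torch.Size([10, 4, 3])
print(batch.mT.shape)              # torch.Size([10, 4, 3])
```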




#### Exercises

**Simple tensor operations**

Below are two expressions involving operations on matrices:

$$\mathbf{A}= \begin{bmatrix} 2 & 4 \\ 5 & 7\end{bmatrix}\begin{bmatrix} 1 & 1 \\ 2 & 3 \end{bmatrix} + \begin{bmatrix} 10 & 10 \\ 12 & 1\end{bmatrix}$$

$$\text{and}$$

$$\mathbf{b}= \begin{bmatrix} 3 \\ 5 \\ 7 \end{bmatrix} \cdot \begin{bmatrix} 2 \\ 4 \\ 8 \end{bmatrix}$$

The code below, which computes these expressions using PyTorch, is incomplete; fill in the missing lines.

**You will find the answers at the end of this notebook.**

In [28]:
def simple_operations(a1: torch.Tensor, a2: torch.Tensor, a3: torch.Tensor):
  """
  Helper function to demonstrate simple operations
  i.e., matrix multiplication of tensor a1 with tensor a2, then addition of tensor a3

  Args:
    a1: Torch tensor
      Tensor of size ([2,2])
    a2: Torch tensor
      Tensor of size ([2,2])
    a3: Torch tensor
      Tensor of size ([2,2])

  Returns:
    answer: Torch tensor
      Tensor of size ([2,2]) resulting from a1 multiplied with a2, added with a3
  """
  ################################################
  ## TODO for students: complete the first computation using the argument matrices
  ################################################
  answer = a1 @ a2 + a3
  return answer

# Computing expression 1:

# init our tensors
a1 = torch.tensor([[2, 4], [5, 7]])
a2 = torch.tensor([[1, 1], [2, 3]])
a3 = torch.tensor([[10, 10], [12, 1]])
## uncomment to test your function
A = simple_operations(a1, a2, a3)
print(A)
assert torch.equal(A,torch.tensor([[20, 24],[31, 27]]))

tensor([[20, 24],
        [31, 27]])


In [30]:
def dot_product(b1: torch.Tensor, b2: torch.Tensor):
  """
  Helper function to demonstrate dot product operation
  Dot product is an algebraic operation that takes two equal-length sequences
  (usually coordinate vectors), and returns a single number.
  Geometrically, it is the product of the Euclidean magnitudes of the
  two vectors and the cosine of the angle between them.

  Args:
    b1: Torch tensor
      Tensor of size ([3])
    b2: Torch tensor
      Tensor of size ([3])

  Returns:
    product: Tensor
      0-dimensional tensor (size []) resulting from the dot product of b1 and b2
  """
  ###############################################
  ## TODO for students: complete the second computation using the argument vectors
  ###############################################
  # Use torch.dot() (or the equivalent Tensor.dot method) to compute the dot product
  product = b1.dot(b2)
  return product

# Computing expression 2:
b1 = torch.tensor([3, 5, 7])
b2 = torch.tensor([2, 4, 8])
## Uncomment to test your function
b = dot_product(b1, b2)
print(b)
assert torch.equal(b, torch.tensor(82))

tensor(82)


### Manipulating Tensors in PyTorch

#### Tensor Indexing

You can access elements of a tensor by index. As with NumPy arrays, the first element has index 0, and a range `start:stop` includes `start` but excludes `stop` (it runs up to `stop - 1`). Negative indices count from the end of the tensor. Selecting a range of elements this way is called slicing.

For example, `[-1]` picks the last element from a tensor; `[1:3]` selects the second and the third elements and `[:-2]` will select all elements excluding the last and second-to-last elements.

In [31]:
x = torch.arange(0, 10)
print(x)
print(x[-1])
print(x[1:3])
print(x[:-2])

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
tensor(9)
tensor([1, 2])
tensor([0, 1, 2, 3, 4, 5, 6, 7])
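A few more slicing patterns are sketched below. Two differences from plain Python lists are worth knowing: PyTorch does not support negative steps such as `x[::-1]` (`torch.flip` returns a reversed copy instead), and a basic slice is a view that shares memory with the original tensor.

```python
import torch

x = torch.arange(0, 10)

print(x[::2])                   # tensor([0, 2, 4, 6, 8]): every second element
print(x[-3:])                   # tensor([7, 8, 9]): the last three elements
print(torch.flip(x, dims=[0]))  # reversed copy; x[::-1] is not supported

# Basic slicing returns a view that shares memory with x
first_three = x[:3]
first_three[0] = 100
print(x[0])                     # tensor(100): the original tensor changed as well
```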


When you have multidimensional tensors, indexing rules work the same way as in NumPy.

In [32]:
# make a 5D tensor
x = torch.rand(1, 2, 3, 4, 5)

print(f" shape of x[0]:{x[0].shape}")
print(f" shape of x[0][0]:{x[0][0].shape}")
print(f" shape of x[0][0][0]:{x[0][0][0].shape}")

 shape of x[0]:torch.Size([2, 3, 4, 5])
 shape of x[0][0]:torch.Size([3, 4, 5])
 shape of x[0][0][0]:torch.Size([4, 5])
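As in NumPy, chained indexing like `x[0][0][0]` can also be written with one comma-separated index per dimension. The sketch below also uses `:` to keep a whole dimension and `...` to stand in for all leading dimensions.

```python
import torch

x = torch.rand(1, 2, 3, 4, 5)

print(torch.equal(x[0, 1, 2], x[0][1][2]))  # True: same element block
print(x[0, :, 0].shape)  # torch.Size([2, 4, 5]): ':' keeps a whole dimension
print(x[..., 0].shape)   # torch.Size([1, 2, 3, 4]): '...' covers the leading dims
```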


#### Flatten and Reshape

To reshape tensors, you can make use of various methods. It is common to have to express 2D data in 1D format or vice versa. You can achieve this with the `.flatten()` and `.reshape()` methods.

In [33]:
z = torch.arange(12).reshape(6, 2)
print(f"Original z: \n {z}")

# 2D -> 1D
z = z.flatten()
print(f"Flattened z: \n {z}")

# and back to 2D
z = z.reshape(3, 4)
print(f"Reshaped (3x4) z: \n {z}")

z = z.flatten()
print(f"Flattened z: \n {z}")

z = z.view(6, 2)
print(f"Using view to reshape to (6x2) z: \n {z}")

Original z: 
 tensor([[ 0,  1],
        [ 2,  3],
        [ 4,  5],
        [ 6,  7],
        [ 8,  9],
        [10, 11]])
Flattened z: 
 tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
Reshaped (3x4) z: 
 tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
Flattened z: 
 tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
Using view to reshape to (6x2) z: 
 tensor([[ 0,  1],
        [ 2,  3],
        [ 4,  5],
        [ 6,  7],
        [ 8,  9],
        [10, 11]])


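You will also see the `.view()` method used a lot to reshape tensors. There is a subtle [difference](https://stackoverflow.com/questions/49643225/whats-the-difference-between-reshape-and-view-in-pytorch) between `.view()` and `.reshape()`: `.view()` never copies data and only works when the requested shape is compatible with the tensor's memory layout, whereas `.reshape()` returns a view when possible and a copy otherwise.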
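A minimal sketch of that difference: transposing changes the strides, so the result is no longer contiguous. `.reshape()` copies as needed, `.view()` fails, and `.contiguous()` makes a copy on which `.view()` works again. Passing `-1` lets PyTorch infer that dimension from the number of elements.

```python
import torch

z = torch.arange(12).reshape(3, 4)
zt = z.t()                 # transposed: same data, different strides
print(zt.is_contiguous())  # False

flat = zt.reshape(-1)      # reshape copies when needed; -1 infers the size

view_failed = False
try:
    zt.view(-1)            # view never copies, so it fails on this layout
except RuntimeError:
    view_failed = True
print("view failed:", view_failed)  # view failed: True

flat_via_view = zt.contiguous().view(-1)  # contiguous copy first, then view
print(torch.equal(flat, flat_via_view))   # True
```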

#### Squeezing Tensors

When processing batches of data, you will quite often be left with singleton dimensions, e.g., `[1, 10]` or `[256, 1, 3]`. Such a dimension can easily mess up your matrix operations if you don't plan for it.

To remove singleton dimensions from a tensor, we can use the `.squeeze()` method; `.unsqueeze()` does the opposite.

In [34]:
x = torch.randn(1, 10)
# printing the zeroth element of the tensor will not give us the first number!

print(x.shape)
print(f"x[0]: {x[0]}")

torch.Size([1, 10])
x[0]: tensor([ 0.5369, -0.8644,  1.0802, -0.2326,  1.0315,  0.7151,  1.0977,  0.0353,
        -1.5014,  1.3895])


Because of the singleton dimension, `x[0]` gives us the first row instead.

In [35]:
# Let's get rid of that singleton dimension and see what happens now
x = x.squeeze(0)
print(x.shape)
print(f"x[0]: {x[0]}")

torch.Size([10])
x[0]: 0.5368556380271912


In [36]:
# Adding singleton dimensions works in a similar way, and is often used when
# tensors that are combined need the same number of dimensions

y = torch.randn(5, 5)
print(f"Shape of y: {y.shape}")

# let's insert a singleton dimension
y = y.unsqueeze(1)
print(f"Shape of y: {y.shape}")

Shape of y: torch.Size([5, 5])
Shape of y: torch.Size([5, 1, 5])
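Called without an argument, `.squeeze()` removes every size-1 dimension; with a dimension index it only touches that dimension and leaves non-singleton dimensions alone. Indexing with `None` is a common shorthand for `unsqueeze`. A short sketch:

```python
import torch

x = torch.randn(256, 1, 3, 1)

print(x.squeeze().shape)   # torch.Size([256, 3]): all size-1 dims removed
print(x.squeeze(1).shape)  # torch.Size([256, 3, 1]): only dim 1 removed
print(x.squeeze(0).shape)  # torch.Size([256, 1, 3, 1]): dim 0 is not size 1

v = torch.randn(5)
print(v[None, :].shape)    # torch.Size([1, 5]): same as v.unsqueeze(0)
```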


#### Permutations

Sometimes your dimensions will be in the wrong order. For example, you may be dealing with RGB images with dim `[3×48×64]`, but your pipeline expects the colour dimension to be the last dimension, i.e., `[48×64×3]`. To get around this you can use the `.permute()` method.

In [37]:
# `x` has dimensions [color,image_height,image_width]
x = torch.rand(3, 48, 64)

# We want to permute our tensor to be [ image_height , image_width , color ]
x = x.permute(1, 2, 0)
# permute(1,2,0) means:
# The 0th dim of my new tensor = the 1st dim of my old tensor
# The 1st dim of my new tensor = the 2nd
# The 2nd dim of my new tensor = the 0th
print(x.shape)

torch.Size([48, 64, 3])


You may also see `.transpose()` used. It works similarly to `.permute()`, but swaps exactly two dimensions at a time.
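To make the comparison concrete, the sketch below shows a single `transpose` matching the equivalent `permute`, and the colour-last reordering from above built from two swaps.

```python
import torch

x = torch.rand(3, 48, 64)

swapped = x.transpose(0, 2)    # swap dims 0 and 2 -> [64, 48, 3]
permuted = x.permute(2, 1, 0)  # the same reordering expressed with permute
print(swapped.shape, torch.equal(swapped, permuted))  # torch.Size([64, 48, 3]) True

# permute reorders all dims at once; with transpose this takes two swaps
hwc = x.transpose(0, 1).transpose(1, 2)      # [3,48,64] -> [48,3,64] -> [48,64,3]
print(torch.equal(hwc, x.permute(1, 2, 0)))  # True
```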

#### Concatenation

In this example, two matrices are concatenated along rows (axis 0, the first element of the shape) vs. columns (axis 1, the second element of the shape). You can see that the first output tensor’s axis-0 length (`6`) is the sum of the two input tensors’ axis-0 lengths (`3+3`); while the second output tensor’s axis-1 length (`8`) is the sum of the two input tensors’ axis-1 lengths (`4+4`).

In [38]:
# Create two tensors of the same shape
x = torch.arange(12, dtype=torch.float32).reshape((3, 4))
y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])


# Concatenate along rows
cat_rows = torch.cat((x, y), dim=0)

# Concatenate along columns
cat_cols = torch.cat((x, y), dim=1)

# Printing outputs
print('Concatenated by rows: shape{} \n {}'.format(list(cat_rows.shape), cat_rows))
print('\n Concatenated by columns: shape{}  \n {}'.format(list(cat_cols.shape), cat_cols))

Concatenated by rows: shape[6, 4] 
 tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [ 2.,  1.,  4.,  3.],
        [ 1.,  2.,  3.,  4.],
        [ 4.,  3.,  2.,  1.]])

 Concatenated by columns: shape[3, 8]  
 tensor([[ 0.,  1.,  2.,  3.,  2.,  1.,  4.,  3.],
        [ 4.,  5.,  6.,  7.,  1.,  2.,  3.,  4.],
        [ 8.,  9., 10., 11.,  4.,  3.,  2.,  1.]])
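A related function is `torch.stack()`: while `torch.cat()` joins tensors along an existing dimension, `torch.stack()` creates a new dimension, so the inputs must have identical shapes. A quick comparison of the resulting shapes:

```python
import torch

x = torch.arange(12, dtype=torch.float32).reshape(3, 4)
y = torch.ones(3, 4)

print(torch.cat((x, y), dim=0).shape)    # torch.Size([6, 4]): existing dim grows
print(torch.stack((x, y), dim=0).shape)  # torch.Size([2, 3, 4]): new leading dim
```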


#### Conversion to Other Python Objects

Converting a tensor to a `numpy.ndarray`, or vice versa, is easy. For tensors on the CPU, `Tensor.numpy()` and `torch.from_numpy()` share memory with their source, so an in-place change to one is visible in the other; `torch.tensor()` always copies the data, as in the example below.

A NumPy array cannot carry the information a tensor tracks, such as its computation graph. Calling `.numpy()` on a tensor that requires gradients raises an error; call `.detach()` first.

In [39]:
x = torch.randn(5)
print(f"x: {x}  |  x type:  {x.type()}")

y = x.numpy()
print(f"y: {y}  |  y type:  {type(y)}")

z = torch.tensor(y)
print(f"z: {z}  |  z type:  {z.type()}")

x: tensor([-0.6709,  0.8899,  1.3470, -1.7338,  0.2982])  |  x type:  torch.FloatTensor
y: [-0.67085654  0.8898792   1.3470085  -1.7338079   0.29823947]  |  y type:  <class 'numpy.ndarray'>
z: tensor([-0.6709,  0.8899,  1.3470, -1.7338,  0.2982])  |  z type:  torch.FloatTensor
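The sketch below checks the memory-sharing behaviour directly: an in-place update of a CPU tensor shows up in the array returned by `.numpy()`, `torch.from_numpy()` shares memory while `torch.tensor()` copies, and a tensor that requires gradients is detached before conversion.

```python
import numpy as np
import torch

x = torch.zeros(3)
arr = x.numpy()  # shares memory with x (CPU tensors only)
x += 1           # in-place update on the tensor...
print(arr)       # [1. 1. 1.]: ...is visible in the NumPy array

copied = torch.tensor(arr)      # torch.tensor always copies
shared = torch.from_numpy(arr)  # torch.from_numpy shares memory
arr[0] = 5.0
print(copied[0].item(), shared[0].item())  # 1.0 5.0

w = torch.ones(2, requires_grad=True)
w_np = w.detach().numpy()  # w.numpy() would raise; detach from the graph first
```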


To convert a size-1 tensor to a Python scalar, you can call its `item()` method or use Python's built-in functions such as `float()` and `int()`.

In [40]:
a = torch.tensor([3.5])
a, a.item(), float(a), int(a)

(tensor([3.5000]), 3.5, 3.5, 3)
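`item()` works for any shape as long as the tensor holds exactly one element; for more elements it raises an error, and `tolist()` is the usual way to get several values back as Python numbers. A short sketch:

```python
import torch

single = torch.tensor([[3.5]])  # one element, so the shape does not matter
print(single.item())            # 3.5

many = torch.tensor([1.0, 2.0])
item_failed = False
try:
    many.item()                 # item() only works for one-element tensors
except RuntimeError:
    item_failed = True
print(item_failed)              # True

print(many.tolist())            # [1.0, 2.0]
```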

#### Exercises

**Simple tensor operations**

Using a combination of the methods discussed above, complete the functions below:

**Function A**

This function takes in two 2D tensors $A$ and $B$ and returns the column sums of $A$ multiplied by the sum of all elements of $B$ (a scalar), e.g.,

$$\text{If } A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \text{ and } B = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \end{bmatrix} \text{ then } Out = \begin{bmatrix} 2 & 2 \end{bmatrix} \cdot 12 = \begin{bmatrix} 24 & 24 \end{bmatrix}$$

**Function B**

This function takes in a square matrix $C$ and returns a 2D tensor in which each row holds the index of an element of the flattened $C$, followed by the element itself, e.g.,

$$\text{If } C = \begin{bmatrix} 2 & 3 \\ -1 & 10 \end{bmatrix} \text{ then } Out = \begin{bmatrix} 0 & 2 \\ 1 & 3 \\ 2 & -1 \\ 3 & 10 \end{bmatrix}$$

**Hint:** Pay close attention to singleton dimensions.

**Function C**

This function takes in two 2D tensors $D$ and $E$. If their numbers of elements allow it, the function returns the elementwise sum of $D$ and $E$ reshaped to the shape of $D$; otherwise it returns a 1D tensor that is the concatenation of the two flattened tensors, e.g.,

$$\text{If } D = \begin{bmatrix} 1 & -1 \\ -1 & 3 \end{bmatrix} \text{ and } E = \begin{bmatrix} 2 & 3 & 0 & 2 \end{bmatrix} \text{ then } Out = \begin{bmatrix} 3 & 2 \\ -1 & 5 \end{bmatrix}$$

$$\text{If } D = \begin{bmatrix} 1 & -1 \\ -1 & 3 \end{bmatrix} \text{ and } E = \begin{bmatrix} 2 & 3 & 0 \end{bmatrix} \text{ then } Out = \begin{bmatrix} 1 & -1 & -1 & 3 & 2 & 3 & 0 \end{bmatrix}$$

**Hint:** `torch.numel()` is an easy way of finding the number of elements in a tensor.

**You will find the answers at the end of this notebook.**

In [67]:
def functionA(my_tensor1, my_tensor2):
  """
  This function takes in two 2D tensors `my_tensor1` and `my_tensor2`
  and returns the column sums of `my_tensor1` multiplied by the sum of
  all the elements of `my_tensor2` (a scalar).

  Args:
    my_tensor1: torch.Tensor
    my_tensor2: torch.Tensor

  Returns:
    output: torch.Tensor
      The column sums of `my_tensor1` multiplied by the sum of
      `my_tensor2`.
  """
  ################################################
  ## TODO for students: complete functionA
  ################################################
  # TODO multiply the column sums (reduce over rows, dim 0) by the total sum
  output = my_tensor1.sum(dim=0) * my_tensor2.sum()

  return output


def functionB(my_tensor):
  """
  This function takes in a square matrix `my_tensor` and returns a 2D tensor
  consisting of a flattened `my_tensor` with the index of each element
  appended to this tensor in the row dimension.

  Args:
    my_tensor: torch.Tensor

  Returns:
    output: torch.Tensor
      Concatenated tensor.
  """
  ################################################
  ## TODO for students: complete functionB
  ################################################
  # TODO flatten the tensor `my_tensor`
  my_tensor = my_tensor.flatten()
  # TODO create the idx tensor to be concatenated to `my_tensor`
  idx_tensor = torch.arange(len(my_tensor))
  # TODO concatenate the two tensors
  output = torch.cat((idx_tensor.unsqueeze(dim=1), my_tensor.unsqueeze(dim=1)), 1)

  return output


def functionC(my_tensor1, my_tensor2):
  """
  This function takes in two 2D tensors `my_tensor1` and `my_tensor2`.
  If their numbers of elements allow it, it returns the elementwise sum of
  `my_tensor1` and `my_tensor2` reshaped to the shape of `my_tensor1`;
  else this function returns a 1D tensor that is the concatenation of the
  two flattened tensors.

  Args:
    my_tensor1: torch.Tensor
    my_tensor2: torch.Tensor

  Returns:
    output: torch.Tensor
      Elementwise sum or concatenated tensor.
  """
  ################################################
  ## TODO for students: complete functionC
  ################################################
  # TODO check we can reshape `my_tensor2` into the shape of `my_tensor1`
  if my_tensor1.numel() == my_tensor2.numel():
    # TODO reshape `my_tensor2` into the shape of `my_tensor1`
    my_tensor2 = torch.reshape(my_tensor2, my_tensor1.shape)
    # TODO sum the two tensors
    output = my_tensor1 + my_tensor2
  else:
    # TODO flatten both tensors
    my_tensor1 = my_tensor1.flatten()
    my_tensor2 = my_tensor2.flatten()
    # TODO concatenate the two tensors along their only dimension
    output = torch.cat((my_tensor1, my_tensor2), dim=0)

  return output


# Implement the functions above and then uncomment the following lines to test your code
print(functionA(torch.tensor([[1, 1], [1, 1]]), torch.tensor([[1, 2, 3], [1, 2, 3]])))
print(functionB(torch.tensor([[2, 3], [-1, 10]])))
print(functionC(torch.tensor([[1, -1], [-1, 3]]), torch.tensor([[2, 3, 0, 2]])))
print(functionC(torch.tensor([[1, -1], [-1, 3]]), torch.tensor([[2, 3, 0]])))

assert torch.equal(functionA(torch.tensor([[1, 1], [1, 1]]), torch.tensor([[1, 2, 3], [1, 2, 3]])), torch.tensor([24, 24]))
assert torch.equal(functionB(torch.tensor([[2, 3], [-1, 10]])), torch.tensor([[0, 2], [1, 3], [2, -1], [3, 10]]))
assert torch.equal(functionC(torch.tensor([[1, -1], [-1, 3]]), torch.tensor([[2, 3, 0, 2]])), torch.tensor([[3, 2], [-1, 5]]))
assert torch.equal(functionC(torch.tensor([[1, -1], [-1, 3]]), torch.tensor([[2, 3, 0]])), torch.tensor([1, -1, -1, 3, 2, 3, 0]))

tensor([24, 24])
tensor([[ 0,  2],
        [ 1,  3],
        [ 2, -1],
        [ 3, 10]])
tensor([[ 3,  2],
        [-1,  5]])
tensor([ 1, -1, -1,  3,  2,  3,  0])


### Dynamic Computation Graph and Backpropagation

The information on backpropagation and the dynamic computation graph can be found in the notebook [Introduction to PyTorch](https://colab.research.google.com/github/phlippe/uvadlc_notebooks/blob/master/docs/tutorial_notebooks/tutorial2/Introduction_to_PyTorch.ipynb). There you will also find additional information on Tensors similar to the information we already shared in this notebook. 

As mentioned above, one of the features of tensors is their ability to keep track of the graph of computations that created them, which comes in handy for backpropagation. With PyTorch tensors you can automatically get **gradients/derivatives** of functions that you define. In this course we will mainly implement neural networks, which are just fancy functions.

In those functions, the weight matrices we want to learn are called **parameters** or simply the **weights**. If a neural network outputs a single scalar value, we talk about taking the derivative of that value. However, we often have to deal with **multiple** output values, in which case we talk about **gradients**.

We define our function by **manipulating** a given input, usually by matrix multiplications with weight matrices and additions of so-called bias vectors. A **computation graph** is created automatically while we manipulate the input; it records how to arrive at the output from the input. PyTorch is a **define-by-run** framework: we just perform our manipulations, and PyTorch keeps track of the computation graph for us (dynamically).

In short: we only need to compute the output using our function and we can ask PyTorch to automatically get the **gradients**. 

The first thing we have to do is to specify which tensors require gradients. The default setting is that a newly created tensor does not require gradients.

In [68]:
x = torch.ones((3,))
print(x.requires_grad)

False


You can change this for an existing tensor using the method `requires_grad_()` (the trailing underscore indicates an in-place operation). Alternatively, when creating a tensor, you can pass the argument `requires_grad=True` to most of the constructors shown above.

In [69]:
x.requires_grad_(True)
print(x.requires_grad)

True
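A short sketch of the constructor route, plus one restriction worth knowing: only floating-point (and complex) tensors can require gradients, so asking for gradients on an integer tensor raises an error.

```python
import torch

w = torch.zeros(3, requires_grad=True)  # set the flag at construction time
print(w.requires_grad)                  # True

int_failed = False
try:
    torch.zeros(3, dtype=torch.int64, requires_grad=True)
except RuntimeError:
    int_failed = True                   # integer tensors cannot require gradients
print(int_failed)                       # True
```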


In order to get familiar with the concept of a computation graph, we will create one for the following function:

$$y = \frac{1}{|x|}\sum_i \left[(x_i + 2)^2 + 3\right]$$

You can imagine that $\mathbf{x}$ holds our parameters, and we want to optimize (either maximize or minimize) the output $y$. For this, we need the gradients $\partial y / \partial \mathbf{x}$. In our example, we'll use $\mathbf{x}=[0,1,2]$ as the input.

In [70]:
x = torch.arange(3, dtype=torch.float32, requires_grad=True) # Only float tensors can have gradients
print("X", x)

X tensor([0., 1., 2.], requires_grad=True)


Now let's build the computation graph step by step. You can combine multiple operations in a single line, but we will separate them here to get a better understanding of how each operation is added to the computation graph.

In [72]:
a = x + 2
b = a ** 2
c = b + 3
y = c.mean()
print("Y", y)
print(y.grad_fn)

Y tensor(12.6667, grad_fn=<MeanBackward0>)
<MeanBackward0 object at 0x00000155150EFD30>
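The `grad_fn` printed above is only the last node of the graph. As a sketch, the nodes can be followed back towards the input through each node's `next_functions` attribute, which holds `(parent_node, input_index)` pairs. Note that `next_functions` is an autograd implementation detail rather than a documented, stable API, and the exact node class names may differ between PyTorch versions; the helper `graph_nodes` is hypothetical.

```python
import torch

x = torch.arange(3, dtype=torch.float32, requires_grad=True)
y = ((x + 2) ** 2 + 3).mean()  # the same computation as above, written in one line

def graph_nodes(fn):
    """Hypothetical helper: collect the names of the grad_fn nodes reachable from fn."""
    names = []
    stack = [fn]
    while stack:
        node = stack.pop()
        if node is None:  # skip entries for inputs that need no gradient (they are None)
            continue
        names.append(type(node).__name__)
        # next_functions holds (parent_node, input_index) pairs
        stack.extend(parent for parent, _ in node.next_functions)
    return names

print(graph_nodes(y.grad_fn))
```

The walk starts at the mean and ends at the node that accumulates the gradient into the leaf `x`, which mirrors the reversed arrows in the figure.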


Using the statements above, we have created a computation graph that looks similar to the figure below:

<center style="width: 100%"><img src="images/pytorch_computation_graph.svg" width="200px"></center>

We calculate $a$ based on the input $\mathbf{x}$ and the constant $2$, $b$ is $a$ squared, and so on. The visualization is an abstraction of the dependencies between the inputs and outputs of the operations we have applied.
Each node of the computation graph automatically gets a function, `grad_fn`, for calculating the gradients with respect to its inputs. You can see it (`grad_fn=<MeanBackward0>`) in the printed output tensor $y$. This is why the computation graph is usually visualized in the reverse direction (arrows point from the result to the inputs). We can perform backpropagation on the computation graph by calling the function `backward()` on the last output, which computes the gradients and stores them in the `.grad` attribute of every leaf tensor that has `requires_grad=True`, such as $\mathbf{x}$:

In [73]:
y.backward()

`x.grad` now contains the gradient $\partial y/ \partial \mathbf{x}$, which indicates how a change in $\mathbf{x}$ will affect the output $y$, given the current input $\mathbf{x}=[0,1,2]$:

In [74]:
print(x.grad)

tensor([1.3333, 2.0000, 2.6667])
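By default, `.grad` is only filled in for leaf tensors such as `x`; intermediate results such as `a` do not keep their gradient unless you call `retain_grad()` on them before `backward()`. A minimal sketch of that behavior, rebuilding the same graph:

```python
import torch

x = torch.arange(3, dtype=torch.float32, requires_grad=True)
a = x + 2
a.retain_grad()  # also keep the gradient of this intermediate (non-leaf) tensor
y = (a ** 2 + 3).mean()
y.backward()

print(x.is_leaf, a.is_leaf)  # True False
print(x.grad)  # tensor([1.3333, 2.0000, 2.6667])
print(a.grad)  # the same values here, because da_i/dx_i = 1
```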


We can also verify these gradients by hand. We compute them with the chain rule, in the same way PyTorch does:

$$\frac{\partial y}{\partial x_i} = \frac{\partial y}{\partial c_i}\frac{\partial c_i}{\partial b_i}\frac{\partial b_i}{\partial a_i}\frac{\partial a_i}{\partial x_i}$$

Note that we have simplified this equation to index notation by using the fact that all operations except the mean act element-wise, i.e., they do not combine elements of the tensor. The partial derivatives are:

$$
\frac{\partial a_i}{\partial x_i} = 1,\hspace{1cm}
\frac{\partial b_i}{\partial a_i} = 2\cdot a_i,\hspace{1cm}
\frac{\partial c_i}{\partial b_i} = 1,\hspace{1cm}
\frac{\partial y}{\partial c_i} = \frac{1}{3}
$$

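Multiplying these factors gives $\partial y/\partial x_i = \frac{1}{3}\cdot 1\cdot 2a_i\cdot 1 = \frac{2}{3}(x_i + 2)$. Hence, with the input $\mathbf{x}=[0,1,2]$, our gradients are $\partial y/\partial \mathbf{x}=[4/3,2,8/3]$, which matches the result printed by `print(x.grad)` above.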
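The same check can be automated. The sketch below compares the autograd result with the closed form $\frac{2}{3}(x_i+2)$ and, as an extra independent check that is not part of the derivation above, with central finite differences. The step size `eps` and the use of `float64` are our own choices to keep rounding error small.

```python
import torch

def f(x: torch.Tensor) -> torch.Tensor:
    # y = 1/|x| * sum_i [(x_i + 2)^2 + 3]
    return ((x + 2) ** 2 + 3).mean()

x = torch.tensor([0.0, 1.0, 2.0], dtype=torch.float64, requires_grad=True)
f(x).backward()

# Closed form from the chain rule: dy/dx_i = (1/3) * 2 * (x_i + 2)
manual = 2 * (x.detach() + 2) / 3

# Central finite differences: (f(x + eps*e_i) - f(x - eps*e_i)) / (2*eps)
eps = 1e-6
numeric = torch.zeros(3, dtype=torch.float64)
with torch.no_grad():
    for i in range(3):
        step = torch.zeros(3, dtype=torch.float64)
        step[i] = eps
        numeric[i] = (f(x + step) - f(x - step)) / (2 * eps)

print(x.grad)
print(torch.allclose(x.grad, manual), torch.allclose(x.grad, numeric))
```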

**Note:** There are no exercises for this section.

### Answers

#### Exercise Creating Tensors

In [None]:
def tensor_creation(Z):
  """
  A function that creates various tensors.
  Args:
    Z: numpy.ndarray
      An array of shape (3,4)
  Returns:
    A : Tensor
      20 by 21 tensor consisting of ones
    B : Tensor
      A tensor with elements equal to the elements of NumPy array Z
    C : Tensor
      A tensor with the same number of elements as A but with values ∼U(0,1)
    D : Tensor
      A 1D tensor containing the even numbers between 4 and 40 inclusive.
  """

  A = torch.ones(20, 21)
  B = torch.tensor(Z)
  C = torch.rand_like(A)
  D = torch.arange(4, 41, step=2)

  return A, B, C, D

# numpy array to copy later
Z = np.vander([1, 2, 3], 4)

# Check your function
A, B, C, D = tensor_creation(Z)
print(f"Z = \n {Z}")
print(f"Shape Z = {Z.shape}")
print()
print(f"A = \n {A}")
print(f"Shape A = {A.shape}")
print()
print(f"B = \n {B}")
print(f"Shape B = {B.shape}")
print()
print(f"C = \n {C}")
print(f"Shape C = {C.shape}")
print()
print(f"D = \n {D}")
print(f"Shape D = {D.shape}")
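`torch.tensor(Z)` copies the NumPy data. If you would rather share memory with the array, `torch.from_numpy` does that instead; the sketch below (variable names are illustrative) shows the difference after modifying `Z` in place.

```python
import numpy as np
import torch

Z = np.vander([1, 2, 3], 4)
copied = torch.tensor(Z)      # torch.tensor always copies the data
shared = torch.from_numpy(Z)  # torch.from_numpy shares memory with the NumPy array

Z[0, 0] = 100
print(copied[0, 0].item(), shared[0, 0].item())  # 1 100
```

Sharing memory avoids a copy for large arrays, at the price that changes on either side become visible on the other.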

#### Exercises Operations in PyTorch

In [None]:
def simple_operations(a1: torch.Tensor, a2: torch.Tensor, a3: torch.Tensor):
  """
  Helper function to demonstrate simple operations,
  i.e., matrix multiplication of tensor a1 with tensor a2, followed by addition of tensor a3
  Args:
    a1: Torch tensor
      Tensor of size ([2,2])
    a2: Torch tensor
      Tensor of size ([2,2])
    a3: Torch tensor
      Tensor of size ([2,2])
  Returns:
    answer: Torch tensor
      Tensor of size ([2,2]) resulting from a1 matrix-multiplied with a2, plus a3
  """
  answer = a1 @ a2 + a3
  return answer

# Computing expression 1:

# init our tensors
a1 = torch.tensor([[2, 4], [5, 7]])
a2 = torch.tensor([[1, 1], [2, 3]])
a3 = torch.tensor([[10, 10], [12, 1]])
## Test your function
A = simple_operations(a1, a2, a3)
print(A)
assert torch.equal(A,torch.tensor([[20, 24],[31, 27]]))
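The solution relies on `@` being matrix multiplication. A quick sketch with the same `a1` and `a2` shows how it differs from `*`, which multiplies matching entries element-wise:

```python
import torch

a1 = torch.tensor([[2, 4], [5, 7]])
a2 = torch.tensor([[1, 1], [2, 3]])

elementwise = a1 * a2  # multiplies matching entries
matmul = a1 @ a2       # matrix product, equivalent to torch.matmul(a1, a2)

print(elementwise)  # entries [[2, 4], [10, 21]]
print(matmul)       # entries [[10, 14], [19, 26]]
```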

In [None]:
def dot_product(b1: torch.Tensor, b2: torch.Tensor):
  """
  Helper function to demonstrate dot product operation
  Dot product is an algebraic operation that takes two equal-length sequences
  (usually coordinate vectors), and returns a single number.
  Geometrically, it is the product of the Euclidean magnitudes of the
  two vectors and the cosine of the angle between them.
  Args:
    b1: Torch tensor
      Tensor of size ([3])
    b2: Torch tensor
      Tensor of size ([3])
  Returns:
    product: Tensor
      0-dimensional tensor (a scalar) holding the dot product of b1 and b2
  """
  # Use torch.dot() to compute the dot product of two tensors
  product = torch.dot(b1, b2)
  return product

# Computing expression 2:
b1 = torch.tensor([3, 5, 7])
b2 = torch.tensor([2, 4, 8])
## Test your function
b = dot_product(b1, b2)
print(b)
assert torch.equal(b, torch.tensor(82))
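For 1D tensors, `torch.dot` agrees with two other spellings of the same computation: an element-wise product followed by a sum, and `@`. The sketch also confirms that the result is a 0-dimensional tensor rather than a tensor of size `[1]`.

```python
import torch

b1 = torch.tensor([3, 5, 7])
b2 = torch.tensor([2, 4, 8])

via_dot = torch.dot(b1, b2)  # expects two 1D tensors of equal length and dtype
via_sum = (b1 * b2).sum()    # multiply element-wise, then add up: 6 + 20 + 56
via_matmul = b1 @ b2         # @ on two 1D tensors is also a dot product

print(via_dot, via_sum, via_matmul)  # tensor(82) tensor(82) tensor(82)
print(via_dot.dim(), via_dot.shape)  # 0 torch.Size([])
```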

#### Exercises Manipulating Tensors in PyTorch

In [None]:
def functionA(my_tensor1, my_tensor2):
  """
  This function takes in two 2D tensors `my_tensor1` and `my_tensor2`
  and returns the column sums of `my_tensor1` multiplied by the sum of
  all the elements of `my_tensor2` (a scalar).
  Args:
    my_tensor1: torch.Tensor
    my_tensor2: torch.Tensor
  Returns:
    output: torch.Tensor
      1D tensor with one entry per column of `my_tensor1`: the column sums
      of `my_tensor1` multiplied by the sum of `my_tensor2`.
  """
  # Sum `my_tensor1` over its rows (dim=0) to get the column sums,
  # then scale them by the total sum of `my_tensor2`
  output = my_tensor1.sum(dim=0) * my_tensor2.sum()

  return output


def functionB(my_tensor):
  """
  This function takes in a square matrix `my_tensor` and returns a 2D tensor
  with one row per element of the flattened `my_tensor`: the first column
  holds the index of the element and the second column holds its value.
  Args:
    my_tensor: torch.Tensor
  Returns:
    output: torch.Tensor
      Concatenated tensor of shape (number of elements, 2).
  """
  # Flatten `my_tensor` into a 1D tensor
  my_tensor = my_tensor.flatten()
  # Create the index tensor 0, 1, ..., n-1 to be concatenated with `my_tensor`
  idx_tensor = torch.arange(0, len(my_tensor))
  # Turn both 1D tensors into columns and concatenate them side by side
  output = torch.cat([idx_tensor.unsqueeze(1), my_tensor.unsqueeze(1)], dim=1)

  return output


def functionC(my_tensor1, my_tensor2):
  """
  This function takes in two 2D tensors `my_tensor1` and `my_tensor2`.
  If both tensors have the same number of elements, it returns the
  elementwise sum of `my_tensor1` and `my_tensor2` reshaped to the shape
  of `my_tensor1`; otherwise it returns a 1D tensor that is the
  concatenation of the two flattened tensors.
  Args:
    my_tensor1: torch.Tensor
    my_tensor2: torch.Tensor
  Returns:
    output: torch.Tensor
      Either the elementwise sum (shaped like `my_tensor1`) or the 1D
      concatenation of both tensors.
  """
  # `my_tensor2` can be reshaped into the shape of `my_tensor1`
  # only if both contain the same number of elements
  if torch.numel(my_tensor1) == torch.numel(my_tensor2):
    # Reshape `my_tensor2` into the shape of `my_tensor1`
    my_tensor2 = my_tensor2.reshape(my_tensor1.shape)
    # Sum the two tensors elementwise
    output = my_tensor1 + my_tensor2
  else:
    # Flatten both tensors to 1D
    my_tensor1 = my_tensor1.flatten()
    my_tensor2 = my_tensor2.flatten()
    # Concatenate the two 1D tensors along their only dimension
    output = torch.cat([my_tensor1, my_tensor2], dim=0)

  return output

## Test the functions above
print(functionA(torch.tensor([[1, 1], [1, 1]]), torch.tensor([[1, 2, 3], [1, 2, 3]])))
print(functionB(torch.tensor([[2, 3], [-1, 10]])))
print(functionC(torch.tensor([[1, -1], [-1, 3]]), torch.tensor([[2, 3, 0, 2]])))
print(functionC(torch.tensor([[1, -1], [-1, 3]]), torch.tensor([[2, 3, 0]])))

assert torch.equal(functionA(torch.tensor([[1, 1], [1, 1]]), torch.tensor([[1, 2, 3], [1, 2, 3]])), torch.tensor([24, 24]))
assert torch.equal(functionB(torch.tensor([[2, 3], [-1, 10]])), torch.tensor([[0, 2], [1, 3], [2, -1], [3, 10]]))
assert torch.equal(functionC(torch.tensor([[1, -1], [-1, 3]]), torch.tensor([[2, 3, 0, 2]])), torch.tensor([[3, 2], [-1, 5]]))
assert torch.equal(functionC(torch.tensor([[1, -1], [-1, 3]]), torch.tensor([[2, 3, 0]])), torch.tensor([1, -1, -1, 3, 2, 3, 0]))
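The three solutions above hinge on which dimension an operation acts on. A small sketch on a made-up tensor `m` and vector `v`: `sum(dim=0)` collapses the rows (giving column sums), `unsqueeze` adds a size-1 dimension, and `torch.cat` joins tensors along an existing dimension.

```python
import torch

m = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])

print(m.sum(dim=0))  # column sums: [5, 7, 9]
print(m.sum(dim=1))  # row sums: [6, 15]

v = torch.tensor([10, 20, 30])
print(v.unsqueeze(1).shape)  # torch.Size([3, 1]): one column
print(v.unsqueeze(0).shape)  # torch.Size([1, 3]): one row

cols = torch.cat([v.unsqueeze(1), v.unsqueeze(1)], dim=1)  # columns placed side by side
rows = torch.cat([v.unsqueeze(0), v.unsqueeze(0)], dim=0)  # rows stacked on top of each other
print(cols.shape, rows.shape)  # torch.Size([3, 2]) torch.Size([2, 3])
```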