# A World of Tensors and Differentiable Computing
Created by [Santiago Pascual](https://scholar.google.es/citations?user=7cVOyh0AAAAJ&hl=ca) ([UPC School](https://www.talent.upc.edu/ing/estudis/formacio/curs/310400/postgrau-artificial-intelligence-deep-learning/) 2019).

Updated by [Gerard I. Gállego](https://www.linkedin.com/in/gerard-gallego/) and [José A. R. Fonollosa](https://www.kaggle.com/jarfo1)

In these lab exercises we are going to see:

1. What tensors are, specifically in the PyTorch framework.
2. How to operate with them, and typical operations for deep learning modeling.
3. Broadcasting Semantics. Added by [Jose A. R. Fonollosa](https://www.kaggle.com/jarfo1)

## What is a Tensor?


A Tensor is the generalization of a vector into k dimensions.

![](https://miro.medium.com/max/644/1*SGqhI_WpSaEr17wo8ycUhg.png)

Table taken from [1].

Because of this, a tensor is any k-dimensional structure, including matrices, vectors and scalars. PyTorch is a deep learning framework (https://pytorch.org) widely used for both research and production. As in any other deep learning framework, its core data structure is the tensor.
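As a quick illustration of the paragraph above, the following sketch builds a scalar, a vector, a matrix and a 3-dimensional tensor and prints how many dimensions each one has. The variable names are chosen only for this example.

```python
import torch

scalar = torch.tensor(3.14)              # 0 dimensions
vector = torch.tensor([1.0, 2.0, 3.0])   # 1 dimension
matrix = torch.ones(2, 3)                # 2 dimensions
cube = torch.zeros(2, 3, 4)              # 3 dimensions

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix), ("cube", cube)]:
    # .dim() returns the number of dimensions (k); .shape lists the size of each one
    print(name, t.dim(), tuple(t.shape))
```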

In [1]:
# We first import PyTorch and Numpy libraries as fundamental tools to work with arrays and tensors
import torch
import numpy
# set the random seed so that every execution produces the same sequence of random numbers
torch.manual_seed(1)

<torch._C.Generator at 0x7a997a7c7b30>

#### Creating tensors with PyTorch

In [2]:
# We can initialize an empty structure with certain dimensions:
a = torch.empty(5, 7)
# and we can check its dimensionality with the .shape attribute or .size() function
print(a.shape)
print(a.size())

torch.Size([5, 7])
torch.Size([5, 7])


Dimensions in PyTorch tensors are indexed from 0 onwards, so the first axis, of size 5, is *dim=0*.

In [3]:
# YAYY we have created a tensor of size 5x7, but what does it contain?
print(a)
# Rubbish, nonsense, random stuff: torch.empty() does not initialize memory, so it could be zero, it could be nan, it could be whatever

tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,
          0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,
          0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,
          0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  1.4013e-45,  0.0000e+00,
          0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00,  1.1351e-43,  0.0000e+00, -1.7193e-31,
          4.3980e-41, -1.7193e-31]])


In [4]:
# We can easily fill the tensor with some fixed value with the .fill_(val) function
a.fill_(10)
print(a)

tensor([[10., 10., 10., 10., 10., 10., 10.],
        [10., 10., 10., 10., 10., 10., 10.],
        [10., 10., 10., 10., 10., 10., 10.],
        [10., 10., 10., 10., 10., 10., 10.],
        [10., 10., 10., 10., 10., 10., 10.]])


In [5]:
# Of course we could go k-dimensional, we just have to put more numbers in the init function
a = torch.empty(2,4,6,8)
print(a.shape)

torch.Size([2, 4, 6, 8])


#### There are functions in PyTorch to initialize some special tensors:

* **torch.randn** samples from a Gaussian distribution (mean=0, std=1)
* **torch.rand** samples from a uniform distribution [0, 1)
* **torch.ones** creates a tensor with 1s
* **torch.zeros** creates a tensor with 0s
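A small sketch of the four initializers listed above. The shapes and variable names are arbitrary choices for this example; the printed statistics should only be close to the nominal mean, std and range, not exactly equal to them.

```python
import torch

torch.manual_seed(1)

g = torch.randn(1000, 100)   # Gaussian samples, mean ~0 and std ~1
u = torch.rand(1000, 100)    # uniform samples in [0, 1)
o = torch.ones(2, 3)         # all ones
z0 = torch.zeros(2, 3)       # all zeros

print('randn mean/std:', g.mean().item(), g.std().item())
print('rand min/max:  ', u.min().item(), u.max().item())
print(o)
print(z0)
```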

### Exercise 1

Create a tensor *z* drawn from a Gaussian distribution of dimensions (16, 1024)

In [7]:
# TODO: Gaussian tensor 16x1024
z = torch.randn(16,1024)

#### Tensors data type

Importantly, tensors have a data type (just as numeric variables are int, float, double, etc.). We can check the type with the **tensor.dtype** attribute. We can also change the dtype of our tensor with a simple cast named after the data type: **tensor.float()**, **tensor.int()**, **tensor.long()**, etc.

In [8]:
a = torch.ones(5)
print(a.dtype)

# change to float64 (aka. double)
print(a.double().dtype)

# change to float16 (aka. half)
print(a.half().dtype)

# change to int16 (aka. short)
print(a.short().dtype)

# change to int64 (aka. long)
print(a.long().dtype)


torch.float32
torch.float64
torch.float16
torch.int16
torch.int64


And the way to create a tensor with a specific data type at initialization is either by specifying the **dtype=torch.<dtype\>** during the tensor initialization, or using an explicit tensor constructor like **torch.FloatTensor()**, **torch.LongTensor()**, etc.

In [9]:
# Initialize a tensor with type short()
a = torch.empty(5, 7, dtype=torch.short)
print(a.dtype)

# Directly create a short tensor
a = torch.ShortTensor(5, 7)
print(a.dtype)

# Remember: there should be rubbish in these results, we just specified a data type, not any value yet! (hence random memory is depicted)

torch.int16
torch.int16


#### Tips about tensor data types applied to deep learning:

Keep in mind the following relations, they might be very useful for your future selves!

* Float32 --> Data type for the neural network parameters and GPU operations!
* Long (Int64) --> Data type for text inputs (e.g. indexes of words in a dictionary)
* Float16 (Half) --> Data type for currently fastest GPU operations (with less precision) on advanced GPU implementations.

Remember: this is only a sticky note for your future selves, in case you have to deal with any of the above-mentioned things (embeddings, fastest GPU stuff, etc.).
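To make the Float32 / Long pairing above a bit more concrete, here is a minimal sketch of a word-embedding lookup done with plain tensor indexing. The sizes `vocab_size` and `emb_dim` and the word indexes are hypothetical values picked for this example.

```python
import torch

vocab_size, emb_dim = 10, 4                  # hypothetical sizes
table = torch.randn(vocab_size, emb_dim)     # float32 "embedding table"

# Python integers become int64 (long) tensors by default
word_ids = torch.tensor([2, 5, 5, 9])
print(word_ids.dtype)

# Indexing the float table with a long tensor picks one row per word
vectors = table[word_ids]
print(vectors.shape, vectors.dtype)
```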

**TO SEE THE FULL SET OF PYTORCH TENSOR DATA TYPES, CHECK THE DOCUMENTATION AT https://pytorch.org/docs/stable/tensors.html**

#### Bringing the tensors from Python and Numpy

You may be familiar with Numpy and Python lists. The former is a MUST for any scientific programming in Python, so if you need a refresher, a quick review is recommended: https://becominghuman.ai/an-essential-guide-to-numpy-for-machine-learning-in-python-5615e1758301 . The latter, lists, are Python's built-in mechanism to create an ordered collection of elements (like a k-dimensional array, as we can nest lists inside lists and so on).

PyTorch is very well integrated with Numpy (actually PyTorch is supposed to be an enhanced Numpy, with algebraic operations also running on GPU!) and Python. We can hence convert our Numpy and lists into PyTorch tensors VERY EASILY!

In [10]:
# Creating a 1-D tensor from the Numpy array [1, 2, 3]
a = torch.tensor(numpy.array([1, 2, 3]))

# Creating a 1-D tensor from the Python list [1, 2, 3]
a = torch.tensor([1, 2, 3])
# Values 1, 2, 3
print('Tensor a values: ', a)
# 1 dimension of size 3
print('Tensor a shape: ', a.shape)

Tensor a values:  tensor([1, 2, 3])
Tensor a shape:  torch.Size([3])


In [11]:
# k-dimensional arrays are also turned into PyTorch tensors as easily as that
A = torch.tensor(numpy.ones((16, 1024)))
print(A.dtype)
print(A.shape)

torch.float64
torch.Size([16, 1024])
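One detail worth knowing about the conversions above: *torch.tensor()* copies the Numpy data, whereas *torch.from_numpy()* returns a tensor that shares memory with the array. The sketch below shows the difference; the variable names are arbitrary.

```python
import numpy
import torch

arr = numpy.zeros(3)

copied = torch.tensor(arr)       # independent copy of the data
shared = torch.from_numpy(arr)   # shares memory with arr

# Modify the Numpy array after creating both tensors
arr[0] = 7.0

print(copied)   # unchanged, still all zeros
print(shared)   # first element is now 7
```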


#### Converting tensors back to Numpy!

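Converting back to Numpy arrays is as easy as calling the tensor's *.numpy()* method. The cell below goes through the *.data* attribute first, which is an older idiom; for tensors that track gradients, *.detach().numpy()* is the usual choice nowadays.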


In [12]:
A = torch.rand(10, 10)
Anpy = A.data.numpy()
print('A type: ', type(A)) # torch.Tensor
print('Anpy type: ', type(Anpy)) # numpy.ndarray

A type:  <class 'torch.Tensor'>
Anpy type:  <class 'numpy.ndarray'>
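As a complement to the cell above, this sketch shows two things under simple assumptions: the array returned by *.numpy()* shares memory with a CPU tensor, and a tensor that requires gradients has to be detached before conversion. Variable names are made up for the example.

```python
import torch

# 1) .numpy() shares memory with the (CPU) tensor
t = torch.zeros(2)
n = t.numpy()
t.add_(1)          # in-place change on the tensor...
print(n)           # ...is visible in the Numpy array

# 2) Tensors that require grad must be detached first
w = torch.ones(3, requires_grad=True)
try:
    w.numpy()
    raised = False
except RuntimeError:
    raised = True
print('Direct .numpy() failed:', raised)

w_np = w.detach().numpy()
print(type(w_np))
```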


### Exercise 2

Create an **int16** *it* tensor in PyTorch (however you want) from the following numpy array *na*

In [13]:
na = 10 * numpy.random.rand(8, 8)

# TODO: create the short tensor out of 'na'
it = torch.tensor(na, dtype=torch.short)

## Operations with tensors

The documentation of PyTorch tensors can be found online in: https://pytorch.org/docs/stable/tensors.html

This section introduces two important types of operations in PyTorch:

1. In-place operations
2. Algebraic operations on tensors: transposing, squeezing/unsqueezing, slicing, chunking and concatenating

#### In-place operations

In [14]:
# In-place operations are those whose function name ends with an underscore '_' as in fill_(val), add_(val), etc.
a = torch.empty(2, 2)
a.fill_(1)
print(a)


tensor([[1., 1.],
        [1., 1.]])


In [15]:
# Some operations come in both in-place and normal (out-of-place) versions
# For example, to add a value to the tensor
a.add_(1)
print(a) # prints a tensor of values "2"
# This, though, has no effect on a: the result is returned but never stored
a.add(1)
print(a)

tensor([[2., 2.],
        [2., 2.]])
tensor([[2., 2.],
        [2., 2.]])


In [16]:
# So yes, you guessed right! We have to actually assign the result to an output tensor to actually
# get the outcome of this operation
b = a.add(1)
print(b) # NOW it prints a tensor of values "3"!

tensor([[3., 3.],
        [3., 3.]])
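One way to see the difference between the two flavors is to look at where the data lives. The sketch below uses *data_ptr()*, which returns the address of the tensor's first element, to check that *add_* reuses the same buffer while *add* allocates a new one.

```python
import torch

a = torch.ones(2, 2)
ptr_before = a.data_ptr()

# In-place: the values change, the buffer stays the same
a.add_(1)
same_buffer = a.data_ptr() == ptr_before
print('add_ reuses the buffer:', same_buffer)

# Out-of-place: a new tensor with its own buffer is returned
b = a.add(1)
new_buffer = b.data_ptr() != a.data_ptr()
print('add allocates a new buffer:', new_buffer)
```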


### Exercise 3

Do you notice an important difference between these in-place and normal operations? Perhaps not yet... what if I tell you that I want to apply an operation to a 10000x10000 FloatTensor? Knowing that we have 32 bits per float value, compute the memory required to store that tensor in Megabytes (1 MB = 1,000,000 Bytes)

In [18]:
# TODO
total_mem = (32*10000*10000)/(8*1000000)

print(total_mem)

400.0
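The same figure can be cross-checked from PyTorch itself. The sketch below measures a smaller 1000x1000 float32 tensor with *element_size()* (bytes per element) and *nelement()* (number of elements), then scales the arithmetic to 10000x10000 without allocating it. A normal (out-of-place) operation on the big tensor would need roughly the same amount again for its result, which is what in-place operations avoid.

```python
import torch

t = torch.empty(1000, 1000)            # float32 by default

# bytes per element times number of elements
n_bytes = t.element_size() * t.nelement()
print(n_bytes / 1_000_000, 'MB')       # 4.0 MB

# Same arithmetic for 10000 x 10000, without allocating the tensor
big_mb = t.element_size() * 10000 * 10000 / 1_000_000
print(big_mb, 'MB')                    # 400.0 MB
```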


### Now let's get closer to deep learning, using only raw tensor operations

A neuron computes a weighted sum of its inputs plus a bias (a linear operation), followed by a non-linearity. We thus have a tensor of weights *w*, a scalar bias *b*, and a non-linearity (such as ReLU, *max(0, x)*, which lets only the positive components through to the output *y*).

![](https://www.researchgate.net/profile/Haroldo_Campos_Velho2/publication/235901708/figure/fig1/AS:669443441049602@1536619162135/Artificial-neuron-Equation-neuron-output.ppm)


In [19]:
# We will have 1 input vector with 100,000 dimensions (features)
x = torch.ones(1, 100000)

# Our weight tensor for a single neuron is hence 100,000 x 1
w = 0.02 * torch.randn(100000, 1)
print(w)


# Let's define the function that will perform the operation of a neuron
def forward_neuron(x, w, b):
  v = x.mm(w) + b # .mm() is the matmul function (http://pytorch.org/docs/stable/torch.html#torch.mm)
  y = v.clamp(min=0) # relu is defined as a truncation of the negative activations to zero (clamp function does the trick)
  return y

# Now we can see examples of operation through the relu

# Our bias is just a scalar
bp = 10 * torch.ones(1)

print(forward_neuron(x, w, bp))

# Shifting the bias to a large negative value should yield zero
bn = -10 * torch.ones(1)
print(forward_neuron(x, w, bn))

tensor([[ 0.0260],
        [ 0.0021],
        [-0.0126],
        ...,
        [ 0.0302],
        [-0.0061],
        [-0.0152]])
tensor([[2.0175]])
tensor([[0.]])


### Exercise 4

Continuing with the in-place vs. normal operations rationale above, change the *forward_neuron* function to apply the ReLU in-place. This is very useful to save memory when constructing very deep nets.



**NOTE:** for the record, this answers a question you will likely have some day: "what is this inplace=True in the *nn.ReLU(inplace=True)* object?", when you build neural networks with the PyTorch *torch.nn* API.

In [39]:
def forward_neuron(x, w, b):
  v = x.mm(w) + b # .mm() is the matmul function (http://pytorch.org/docs/stable/torch.html#torch.mm)
  # TODO: make the inplace clamping
  return v.clamp_(min=0)

print(forward_neuron(x, w, bp))

tensor([[2.0175]])
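To connect the note on *nn.ReLU(inplace=True)* with the exercise, here is a small sketch comparing the two variants of the module on toy inputs. The tensors `v1` and `v2` are made up for the example.

```python
import torch

v1 = torch.tensor([-1.0, 0.5, 2.0])
out1 = torch.nn.ReLU()(v1)               # out-of-place: v1 is left untouched
print(v1, out1)

v2 = torch.tensor([-1.0, 0.5, 2.0])
out2 = torch.nn.ReLU(inplace=True)(v2)   # in-place: v2 itself is overwritten
print(v2, out2)
```

With `inplace=True` the negative entry of `v2` is gone for good, so this option should only be used when the pre-activation values are not needed afterwards.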


#### Transpositions and beyond

Bear in mind the following FUNDAMENTAL operations to work with deep learning:

* Tensor transposition: swapping dimensions in the tensor
* Tensor chunking: breaking down a tensor into sub-pieces through a certain dimension
* Tensor concatenation: merging different tensors into a single one.
* Tensor squeezing/unsqueezing for dimension adjustments

In [22]:
# Transposition

A = torch.empty(10, 20, 5)

# Swap axis 2 and 1
A_21 = A.transpose(2, 1)

print('{} transposed axis (2, 1) to: {}'.format(A.shape, A_21.shape))

# Swap axis 2 and 0
A_20 = A.transpose(2, 0)

print('{} transposed axis (2, 0) to: {}'.format(A.shape, A_20.shape))

torch.Size([10, 20, 5]) transposed axis (2, 1) to: torch.Size([10, 5, 20])
torch.Size([10, 20, 5]) transposed axis (2, 0) to: torch.Size([5, 20, 10])


**NOTE:** The utility of transpositions will become clearer later on, when dealing with different neural architecture designs.

In [23]:
# Different axes can be merged with the .view() operator
B = A.view(200, 5)
print('{} axis (0, 1) merged to: {}'.format(A.shape, B.shape))

torch.Size([10, 20, 5]) axis (0, 1) merged to: torch.Size([200, 5])
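When merging axes it is often convenient to let PyTorch work out one of the sizes. Passing `-1` for a single dimension in *.view()* tells it to infer that size from the total number of elements, as in this short sketch:

```python
import torch

A = torch.empty(10, 20, 5)

# -1 is inferred as 10 * 20 = 200
B = A.view(-1, 5)
print(B.shape)

# -1 is inferred as 20 * 5 = 100
C = A.view(10, -1)
print(C.shape)
```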


In [24]:
# Note that giving a wrong size in the dimensions for .view() raises an error
try:
  B = A.view(201, 5)
except RuntimeError:
  print('Wrong dimension sizes specified in .view()!')

Wrong dimension sizes specified in .view()!
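A second way *.view()* can fail involves transposition. A transposed tensor is a view over the same memory with the axes swapped, so it is generally no longer contiguous, and *.view()* may refuse to reinterpret it. The sketch below shows the failure on a toy tensor and two common workarounds, *.reshape()* and *.contiguous().view()*.

```python
import torch

A = torch.arange(6).view(2, 3)
At = A.transpose(0, 1)           # shape (3, 2), same memory as A
print(At.is_contiguous())        # False

try:
    At.view(6)
    view_failed = False
except RuntimeError:
    view_failed = True
print('view on the transposed tensor failed:', view_failed)

# Both alternatives copy the data when needed
flat_a = At.reshape(6)
flat_b = At.contiguous().view(6)
print(flat_a)
```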


In [25]:
# Chunking the tensor with .chunk() requires specifying how many chunks we want and along which dimension
# For example for tensor A: (10, 20, 5), we can chunk it into 5 sub-tensors of shape (10, 4, 5) each
Achunks = torch.chunk(A, 5, dim=1)
for i, achunk in enumerate(Achunks):
  print('Chunk {} shape: {}'.format(i, achunk.shape))

Chunk 0 shape: torch.Size([10, 4, 5])
Chunk 1 shape: torch.Size([10, 4, 5])
Chunk 2 shape: torch.Size([10, 4, 5])
Chunk 3 shape: torch.Size([10, 4, 5])
Chunk 4 shape: torch.Size([10, 4, 5])
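The example above divides evenly. When the dimension size is not a multiple of the requested number of chunks, the last chunk is smaller, and in some cases fewer chunks than requested are returned. A minimal sketch on 1-D tensors:

```python
import torch

# 10 elements into 3 chunks: sizes 4, 4 and 2
sizes_10_3 = [c.shape[0] for c in torch.chunk(torch.arange(10), 3)]
print(sizes_10_3)

# 6 elements into 4 chunks: only 3 chunks of size 2 come back
sizes_6_4 = [c.shape[0] for c in torch.chunk(torch.arange(6), 4)]
print(sizes_6_4)
```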


In [26]:
# Tensors can be merged back into a single tensor Amerged with the .cat() operator, specifying the dimension along which we concatenate
# So to go back to the same tensor as we had prior to chunking, we concatenate along dimension 1

Amerged = torch.cat(Achunks, dim=1)
print('Amerged shape: ', Amerged.shape)

Amerged shape:  torch.Size([10, 20, 5])
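The round trip can also be checked on values, not only on shapes. The sketch below uses a random tensor (so that the comparison is meaningful) and additionally contrasts *torch.cat*, which joins along an existing dimension, with *torch.stack*, which creates a new one.

```python
import torch

A = torch.rand(10, 20, 5)
chunks = torch.chunk(A, 5, dim=1)

# cat along the chunked dimension restores the original tensor
merged = torch.cat(chunks, dim=1)
print(torch.equal(merged, A))    # True

# stack adds a new leading dimension of size 5 instead
stacked = torch.stack(chunks, dim=0)
print(stacked.shape)             # torch.Size([5, 10, 4, 5])
```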


Finally, we may want to add dimensions to our tensor or remove them from it.
We do so with [.unsqueeze()](https://pytorch.org/docs/stable/torch.html#torch.unsqueeze) and [.squeeze()](https://pytorch.org/docs/stable/torch.html#torch.squeeze), respectively.



In [27]:
# Define an empty tensor to start from
A = torch.empty(2, 2)

# 1) Add an extra dimension in axis 0 (unsqueeze)
A = A.unsqueeze(0)

# 2) Add an extra dimension in axis 2
A = A.unsqueeze(2)

# 3) Add an extra dimension in axis 2 again
A = A.unsqueeze(2)

print('Current A shape after unsqueezing dimensions=(0, 2, 2): ', A.shape)

Current A shape after unsqueezing dimensions=(0, 2, 2):  torch.Size([1, 2, 1, 1, 2])


In [28]:
# 4) Remove the dimension 0 from step (1)

A = A.squeeze(0)

print('Current A shape after squeezing dim=0: ', A.shape)

Current A shape after squeezing dim=0:  torch.Size([2, 1, 1, 2])


In [29]:
# 5) We will remove all remaining dimensions of size 1 ("useless") when we do not specify the dimension

A = A.squeeze()

print('Current A after squeezing all remaining dimensions of size 1: ', A.shape)

Current A after squeezing all remaining dimensions of size 1:  torch.Size([2, 2])
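Indexing with `None` is a common shorthand for *.unsqueeze()*: each `None` in the index inserts a dimension of size 1 at that position. A short sketch of the equivalence:

```python
import torch

A = torch.empty(2, 2)

front = A[None]          # same shape as A.unsqueeze(0)
middle = A[:, None]      # same shape as A.unsqueeze(1)
back = A[:, :, None]     # same shape as A.unsqueeze(2)

print(front.shape, middle.shape, back.shape)
```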


### Exercise 5

Unsqueezing and squeezing dimensions can also be achieved with the [.view()](https://pytorch.org/docs/stable/tensors.html?highlight=view#torch.Tensor.view) function. "View" the tensor *A* to achieve the same shape as the one after step (3) in the previous section with a single function call to *.view()*

In [32]:
A = torch.empty(2, 2)

# TODO: use view to unsqueeze dimensions 0, 2 and 2 (as in the previous section)
A = A.view(1, 2, 1, 1, 2)

print('Current A shape: ', A.shape)

Current A shape:  torch.Size([1, 2, 1, 1, 2])


### Exercise 6 (Broadcasting semantics)

Many PyTorch operations support [Broadcasting Semantics](https://pytorch.org/docs/stable/notes/broadcasting.html#broadcasting-semantics).

If a PyTorch operation supports broadcast, then its Tensor arguments can be
automatically expanded to be of equal sizes (without making copies of the data).

Two tensors, each with at least one dimension, are "broadcastable" if, when iterating over their dimensions jointly starting from the last one, every pair of dimension sizes fulfills one of the following conditions:
- The two sizes are equal
- One of the sizes is 1
- One of the dimensions does not exist

Check these examples:
```
A.shape = torch.Size([      1])
B.shape = torch.Size([3, 1, 7])
C = A + B
C.shape = torch.Size([3, 1, 7])
```

```
A.shape = torch.Size([5, 1, 4, 1])
B.shape = torch.Size([   3, 1, 1])
C = A + B
C.shape = torch.Size([5, 3, 4, 1])
```

```
A.shape = torch.Size([5, 2, 4, 1])
B.shape = torch.Size([   3, 1, 1])
C = A + B  # Error, the broadcasting condition is broken in the second dimension
```
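The rules and the three examples above can be sketched in plain Python. The helper `broadcast_shape` below is a hypothetical function written for this notebook, not part of PyTorch; it walks both shapes from the last dimension and treats a missing dimension as size 1.

```python
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: return the broadcast shape or raise ValueError."""
    result = []
    # Iterate jointly from the last dimension; a missing dimension counts as 1
    for da, db in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"sizes {da} and {db} are not broadcastable")
    return tuple(reversed(result))

print(broadcast_shape((1,), (3, 1, 7)))          # (3, 1, 7)
print(broadcast_shape((5, 1, 4, 1), (3, 1, 1)))  # (5, 3, 4, 1)

try:
    broadcast_shape((5, 2, 4, 1), (3, 1, 1))
except ValueError as e:
    print('ERROR:', e)
```

Recent PyTorch versions ship a built-in with the same purpose, *torch.broadcast_shapes*, which is preferable in real code.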

In [33]:
a = torch.ones(5, 3, dtype=torch.int16)
print(f"a = {a}\n")
print(f"a.shape = {a.shape}\n")

b = torch.tensor([1, 2, 3])
print(f"b = {b}\n")
print(f"b.shape = {b.shape}\n")

print(f"a*b = {a*b}\n")
print(f"(a*b).shape = {(a*b).shape}")

a = tensor([[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]], dtype=torch.int16)

a.shape = torch.Size([5, 3])

b = tensor([1, 2, 3])

b.shape = torch.Size([3])

a*b = tensor([[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]])

(a*b).shape = torch.Size([5, 3])
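The remark that broadcasting expands tensors "without making copies of the data" can be checked by doing the expansion explicitly with *.expand()*. In the sketch below the expanded tensor has stride 0 along the new dimension, meaning every row points at the same three stored values.

```python
import torch

b = torch.tensor([1, 2, 3])
b_exp = b.expand(5, 3)           # what broadcasting against a (5, 3) tensor amounts to

print(b_exp.shape)               # torch.Size([5, 3])
print(b_exp.stride())            # (0, 1): moving along dim 0 does not move in memory

shares_memory = b_exp.data_ptr() == b.data_ptr()
print('Same underlying data:', shares_memory)
```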


In [34]:
a = torch.ones(5, 3, dtype=torch.int16)
print(f"a = {a}\n")
print(f"a.shape = {a.shape}\n")

b = torch.tensor([1, 2, 3, 4, 5])
print(f"b = {b}\n")
print(f"b.shape = {b.shape}\n")

try:
    # a and b are not broadcastable, because in the trailing dimension 3 != 5
    print(f"a*b = {a*b}\n") # Error
except Exception as e:
    print(f"ERROR: {e}")

a = tensor([[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]], dtype=torch.int16)

a.shape = torch.Size([5, 3])

b = tensor([1, 2, 3, 4, 5])

b.shape = torch.Size([5])

ERROR: The size of tensor a (3) must match the size of tensor b (5) at non-singleton dimension 1


In [37]:
# TODO: add a trailing dimension to b to make a and b broadcastable
b = b.unsqueeze(1)

print(f"a*b = {a*b}\n")
print(f"(a*b).shape = {(a*b).shape}")

a*b = tensor([[1, 1, 1],
        [2, 2, 2],
        [3, 3, 3],
        [4, 4, 4],
        [5, 5, 5]])

(a*b).shape = torch.Size([5, 3])

### Exercise 7 (Grand Finale)

Given the tensor *A*, shuffle its elements along the first dimension with the *random.shuffle* Python function.

**Clue:** use the functions *torch.chunk*, *random.shuffle* (which acts in-place over Python lists), and *torch.cat*.

In [None]:
import torch
import random
A = torch.rand(4, 2, 4)
print('A before shuffling:\n ', A)

# TODO: chunk the tensor, and convert the resulting
# tuple into a Python list
A = ...

# TODO: operate with shuffle over the list
...

# TODO: concatenate the sub-tensors in list "A" back
# to tensor "A"
A = ...

print('A after shuffling:\n ', A)
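For reference, one possible way to fill in the TODOs is sketched below. It assumes the exercise means shuffling the 4 sub-tensors of shape (2, 4) along dimension 0; the names `original` and `pieces` and the fixed seed are additions made only for this sketch.

```python
import random
import torch

random.seed(0)                   # fixed seed, only to make the sketch repeatable
A = torch.rand(4, 2, 4)
original = A.clone()             # kept aside to compare afterwards

# Chunk into 4 pieces of shape (1, 2, 4) and turn the tuple into a list
pieces = list(torch.chunk(A, 4, dim=0))

# random.shuffle reorders the list in-place
random.shuffle(pieces)

# Concatenate the pieces back along dimension 0
A = torch.cat(pieces, dim=0)

print('A after shuffling:\n ', A)
```

An alternative that skips the Python list altogether is to index with a random permutation, for example `A[torch.randperm(A.shape[0])]`.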

## Final Words

Congrats! You reached the end of this introductory tutorial to PyTorch's most fundamental data structure. Managing dimensions, casting dtypes, in-place operations and more are the EVERYDAY to-dos of a deep learner. So get ready to master these before delving into the coolest projects ever, in which you'll build deep nets. Being confident with tensor operations is very important to properly design neural networks and avoid bugs!

### References

[1] https://medium.com/datadriveninvestor/from-scalar-to-tensor-fundamental-mathematics-for-machine-learning-with-intuitive-examples-part-163727dfea8d

[2] https://pytorch.org/tutorials/beginner/former_torchies/tensor_tutorial.html

[3] https://pytorch.org/docs/stable/tensors.html
