<a href="https://colab.research.google.com/github/ahmedyusef9/deep_learning_course/blob/t01_PyTorchIntro/t01_pytorchintro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

$$
\newcommand{\bb}[1]{\boldsymbol{#1}}
$$

# Deep Learning - Tutorial 1: Introduction to PyTorch

In this tutorial, we will cover:
* Course info
* Environment setup with `conda`
* Jupyter/Colab: Using notebooks
* PyTorch:
  - Tensors basics: indexing, datatypes, math
  - Broadcasting
  - Intro to automatic differentiation

Also in this tutorial, but for self-study:
* Basic Python: Basic data types (Containers, Lists, Dictionaries, Sets, Tuples), Functions, Classes


## Administration and General Info

My info:
- Nareed H. Farhat
- nhashe01@campus.haifa.ac.il
- Office hour: Monday, 15:00-16:00, or on Zoom, scheduled via email.


Course:
- The course website is on Moodle.
- Updates will be posted there (emails will be sent as well).
- Post questions regarding assignments and project on **Moodle** only! (Not email)
- For personal administrative requests/delays: email Rotem.
- For appeals/questions about grades: Email Rotem/Nareed.

Lectures
- Provide a high level presentation of most core topics of Deep Learning, including very recent topics.
- Give mathematical background and justifications.
- Supplementary material with more in-depth examples or advanced topics.

Tutorials:
- Structure is usually a short theory reminder followed by a step-by-step technical implementation of a real problem.
- Technical, meant to help you understand the implementation details behind deep learning.
- **Highly relevant** for success in the homework assignments.
<!-- - After this tutorial you should clone the [tutorials repo](https://github.com/vistalab-technion/cs236781-tutorials), install the conda env and play with the code. -->

Homework:
- Two HW assignments, quite heavy load. Best to tackle them after you have sufficient programming experience.
- Almost entirely "wet" i.e. implementation of real algorithms with real data.
- Should be done in pairs.
<!-- - Some will require use of GPUs. We will provide access to course servers - **please register**. -->
<!-- - Read the [getting started page](https://vistalab-technion.github.io/cs236781/assignments/getting-started) and [collaboration policy](https://vistalab-technion.github.io/cs236781/info/#administration) carefully! -->

## Environment setup

Two setup options are available:
1. Setting up a local environment on your computer.
2. Working on Colab.

Follow one of the two setups below, depending on how you plan to work. In the tutorials we'll work with option (2), Colab.

### 1. Local installation and setup

To install and manage all the necessary packages and dependencies for the
course tutorials and assignments, we use [conda](https://conda.io), a popular package-manager for python.

- The tutorial notebooks come with an `environment.yml` file which defines which third-party libraries we depend on.
- Conda will use this file to create a virtual environment for you.
- This virtual environment includes python and all other packages and tools we specified, separated from any preexisting python installation you may have.

#### Installation



1. Install the python3 version of [miniconda](https://conda.io/miniconda.html).
Follow the [installation instructions](https://conda.io/docs/user-guide/install/index.html)
for your platform.

2. Install all dependencies (into a virtual env) with `conda`:

    ```shell
    conda env update -f environment.yml
    ```
    
    This will also create a new virtual env (`dl317024-tutorials`) if it doesn't already exist.

3. To activate the virtual environment (set up `$PATH`):

    ```shell
    conda activate dl317024-tutorials
    ```

To check which conda environments you have and which one is active, run

```shell
conda env list
```

#### Short demo of environment setup

A video demonstrating environment installation and working with `conda` locally will be uploaded to Moodle.

#### Running Jupyter



From a terminal, enter the folder containing the tutorial notebooks.
1. Make sure that the active conda environment is `dl317024-tutorials`:

    ```shell
    conda activate dl317024-tutorials
    ```

2. Run jupyter with

    ```shell
    jupyter lab
    ```
    
    This will start a [jupyter lab](https://jupyterlab.readthedocs.io/en/stable/)
    server and open your browser at the local server's url. You can now start working with the notebooks.

If you're new to jupyter notebooks, you can get started by reading the
[UI guide](https://jupyter-notebook.readthedocs.io/en/stable/notebook.html#notebook-user-interface)
and also about how to use notebooks in
[JupyterLab](https://jupyterlab.readthedocs.io/en/latest/user/notebook.html).

### 2. Using *Google Colab*

Google Colab, short for Colaboratory, is an incredible platform that provides free access to GPU (Graphics Processing Unit) and TPU (Tensor Processing Unit) resources, making it an ideal environment for running deep learning experiments without the need for specialized hardware.

#### Key Features:

1. **Free GPU/TPU Resources:**
Colab allows you to leverage powerful GPUs and TPUs at no cost. This is a game-changer for training complex deep learning models, significantly speeding up computation times.

2. **Jupyter Notebook Integration:**
Colab is based on the Jupyter Notebook, allowing you to create interactive and shareable documents that combine code, text, and visualizations.

3. **Easy Collaboration:**
You can share your Colab notebooks just like you would with Google Docs or Sheets. Collaborate in real-time with colleagues or students, making it an easy tool for group projects.

4. **Pre-installed Libraries:**
Colab comes with many popular machine learning libraries pre-installed, such as TensorFlow, PyTorch, and scikit-learn, saving you time on setup.

5. **Cloud Storage Integration:**
Easily import datasets or export your trained models using Google Drive. Colab seamlessly integrates with Google Cloud, making data management and sharing straightforward.

#### Getting Started:

To begin your deep learning journey with Colab, simply follow these steps:

1. **Open a Colab Notebook:**
   - Go to [Google Colab](https://colab.research.google.com/).
   - Create a new notebook or open an existing one.

2. **Choose Runtime Type:**
   - Navigate to "Runtime" in the menu.
   - Select "Change runtime type" to choose between CPU, GPU, or TPU.

3. **Run Code Cells:**
   - Write your Python code in cells.
   - Execute cells using the play button or `Shift + Enter`.

4. **Save and Share:**
   - Save your work on Google Drive or GitHub.
   - Share your notebook link with others for collaboration.
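
Once a runtime type is selected, a quick check along the following lines can confirm whether a GPU is actually visible to PyTorch. This is only a minimal sketch: it assumes `torch` is installed (as it is by default on Colab), the names `device` and `x` are illustrative, and the code falls back to the CPU when no GPU is present.

```python
import torch

# Pick the GPU if the runtime exposes one, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Allocate a small tensor directly on the selected device.
x = torch.ones(2, 2, device=device)
print(f'Using device: {device}, tensor lives on: {x.device}')
```

If this reports `cpu` on Colab, the runtime type is probably still set to CPU.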

## Libraries used in this tutorial




*   math
*   sys
*   torch
*   torchviz
*   IPython

**Note:** The conda environment includes all of them except `torchviz`, which we'll install with `pip install`.


In [None]:
# Run in colab setup only.

!pip install torchviz

#clear display
from IPython import display
import time
display.clear_output()

## Jupyter basics

Jupyter notebooks consist mainly of code and markdown cells.
The code cells contain code that is run by a `kernel`, an
interpreter for some programming language, python in our case.

In [None]:
# This is a code cell; it can contain arbitrary python code.

foo = 'bar'
print(foo)

def the_answer():
    return 42

# The output of the last expression in a cell is shown
2*the_answer()
the_answer()

bar


42

Variables and functions defined in a code cell are available in subsequent cells.

In [None]:
ans = the_answer()

In [None]:
ans

42

This is a markdown cell. You can use markdown syntax to format your text, and also include equations
written in $\LaTeX$:

$$
e^{i\pi} + 1 = 0
$$

Other useful things to know about:
* Managing runtimes
* Restarting kernel

## Basics of Python

### Introduction

Python is a great general-purpose programming language on its own and with the addition of a few
popular libraries such as `numpy`, `scipy`, `pandas`, `scikit-learn`, `matplotlib` and others it becomes an
effective scientific computing environment.

Today it is also the most-used language for machine learning both in research and industry.

Recently many **Deep Learning frameworks** have emerged for python.
Arguably the most notable ones in 2021 are **TensorFlow** (with the Keras frontend) and **PyTorch**.

In this course we'll use PyTorch, which is currently [the leading DL framework](https://thegradient.pub/state-of-ml-frameworks-2019-pytorch-dominates-research-tensorflow-dominates-industry) for research.

<center><img src="https://thegradient.pub/content/images/2019/10/number_medium.png" width="700"/></center>

Many of you may have some experience with Python and numpy; for the rest of you, this notebook can serve as a quick crash course both on the Python programming language and on the use of PyTorch for scientific computing with tensors.

However, we recommend getting up to speed with python using the numerous available online resources.

Credit: Parts of the Python tutorial here were adapted from the [CS231n Python tutorial](http://cs231n.github.io/python-numpy-tutorial/) by Justin Johnson.

Python is a high-level, dynamically typed multiparadigm programming language. Python code is often said to be almost like pseudocode, since it allows you to express very powerful ideas in very few lines of code while being very readable. As an example, here is an implementation of the classic quicksort algorithm in Python:

In [None]:
def quicksort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)

print(quicksort([3,6,8,10,1,2,1]))

Python has great [documentation](https://docs.python.org/3)! Use it often.

### Packages and modules

A python **module** is simply a python file (`.py`), which can contain functions, classes and even top-level code.

A **package** is a collection of modules within a directory. Python comes with a standard library which
includes many useful packages.

A package must be imported before use. They can be imported like so:

In [None]:
# Import packages from the python standard library
import math
import sys
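
Besides the plain `import` form, a few other import styles are common. The sketch below uses only standard-library modules; the alias `m` is an arbitrary choice made for illustration.

```python
import math as m              # import a module under a shorter alias
from math import pi, floor    # import specific names into the current namespace
from os import path           # import a submodule from a package

print(m.sqrt(16))             # 4.0
print(floor(pi))              # 3
print(path.join('data', 'train.csv'))  # separator depends on the operating system
```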

### Basic data types

#### Numbers

Integers and floats work as you would expect from other languages:

In [None]:
x = 3
print(x, type(x))

In [None]:
print(x + 1)  # Addition;
print(x - 1)  # Subtraction;
print(x * 2)  # Multiplication;
print(x ** 2)  # Exponentiation;

In [None]:
x += 1
print(x)
x *= 2
print(x)

In [None]:
y = 2.5
print(type(y))
print(y, y + 1, y * 2, y ** 2, y / 2, y // 2)

Note that unlike many languages, Python does not have unary increment (x++) or decrement (x--) operators.

Python integers have arbitrary precision (Python 3 has no separate long type), and complex numbers are also built in; you can find all of the details in the [documentation](https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-long-complex).
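
As a short illustration of these numeric types, the following sketch shows a very large integer and some basic complex arithmetic. The variable names `big` and `z` are just for this example.

```python
# Integers have arbitrary precision: they grow as needed and do not overflow.
big = 2 ** 100
print(big, type(big))

# Complex literals use the suffix `j` for the imaginary unit.
z = 3 + 4j
print(z.real, z.imag)   # 3.0 4.0
print(abs(z))           # 5.0 (the modulus)
print(z * z)            # (-7+24j)
```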

#### Booleans

Python implements all of the usual operators for Boolean logic, but uses English words rather than symbols (`&&`, `||`, etc.):

In [None]:
t, f = True, False

Now let's look at the operations:

In [None]:
print(t and f) # Logical AND
print(t or f ) # Logical OR
print(not t  ) # Logical NOT
print(t != f ) # Logical XOR

#### Strings

In [None]:
hello = 'hello'   # String literals can use single quotes
world = "world"   # or double quotes; it does not matter.
hello, len(hello)

In [None]:
# String concatenation
'aaa ' + 'bbb'

There are several ways to create formatted strings; here are a couple:

In [None]:
s = 'hello'
a = [1,2,3]

# sprintf style string formatting
print('%s %s: pi=%.5f' % (s, a, math.pi))

# formatting with f-string literals (python 3.6+)
print(f'{s} {a}: pi={math.pi:.5f}')

String objects have a bunch of useful methods; for example:

In [None]:
s = "hello"
print(s.capitalize() ) # Capitalize a string; prints "Hello"
print(s.upper()      ) # Convert a string to uppercase; prints "HELLO"
print(s.rjust(7)     ) # Right-justify a string, padding with spaces; prints "  hello"
print(s.center(7)    ) # Center a string, padding with spaces; prints " hello "
print(s.replace('l', '(ell)'))  # Replace all instances of one substring with another
print('  world '.strip())  # Strip leading and trailing whitespace; prints "world"

You can find a list of all string methods in the [documentation](https://docs.python.org/3/library/stdtypes.html#string-methods).

### Containers

Python includes several built-in container types: lists, dictionaries, sets, and tuples.

#### Lists

A list is the Python equivalent of an array, but is resizeable and can contain elements of different types:

In [None]:
xs = [3, 1, 2]   # Create a list
print(xs)
print(xs[2], xs[-1]) # Negative indices count from the end of the list; prints "2 2"

In [None]:
xs[2] = 'foo'    # Lists can contain elements of different types
print(xs)

In [None]:
xs.append('bar') # Add a new element to the end of the list
print(xs)

In [None]:
x = xs.pop()     # Remove and return the last element of the list
x, xs

#### Slicing

In addition to accessing list elements one at a time, Python provides concise syntax to access sublists; this is known as slicing:

In [None]:
nums = list(range(5))
nums

In [None]:
nums[2:4]    # Get a slice from index 2 to 4 (exclusive)

In [None]:
nums[2:]     # Get a slice from index 2 to the end

In [None]:
nums[:2]     # Get a slice from the start to index 2 (exclusive)

In [None]:
nums[:]      # Get a slice of the whole list

In [None]:
nums[:-1]    # Slice indices can be negative

In [None]:
nums[0:4:2]  # Can also specify slice step size

In [None]:
nums[2:4] = [8, 9] # Assign a new sublist to a slice

In [None]:
# Delete elements from a list
nums[0:1] = []
del nums[-1]
nums

#### Loops

You can loop over the elements of a list like this:

In [None]:
animals = ['cat', 'dog', 'monkey']
for animal in animals:
    print(animal)

If you want access to the index of each element within the body of a loop, use the built-in `enumerate` function:

In [None]:
animals = ['cat', 'dog', 'monkey']
for idx, animal in enumerate(animals):
    print(f'#{idx+1}: {animal}')

#### List comprehensions

When programming, frequently we want to transform one type of data into another. As a simple example, consider the following code that computes square numbers:

In [None]:
nums = [0, 1, 2, 3, 4]
squares = []
for x in nums:
    squares.append(x ** 2)
squares

You can make this code simpler using a list comprehension:

In [None]:
squares = [x ** 2 for x in nums]
squares

List comprehensions can also contain conditions:

In [None]:
even_squares = [x ** 2 for x in nums if x % 2 == 0]
even_squares

List comprehensions can be nested:

In [None]:
nums2 = [-1, 1]
[x * y for x in nums for y in nums2]

#### Dictionaries

A dictionary stores (key, value) pairs. In other languages this is known as a `Map` or `Hash`.

In [None]:
d = {'cat': 'cute', 'dog': 'furry'}  # Create a new dictionary with some data
print(d['cat'])       # Get an entry from a dictionary
print('cat' in d)     # Check if a dictionary has a given key

In [None]:
d['fish'] = 'wet'    # Set an entry in a dictionary
d

In [None]:
# Trying to access a non-existing key raises a KeyError
try:
    d['monkey']
except KeyError as e:
    print(e, file=sys.stderr)

In [None]:
print(d.get('monkey', 'N/A'))  # Get an element with a default
print(d.get('fish', 'N/A'))    # Get an element with a default

In [None]:
del d['fish']        # Remove an element from a dictionary
d

In [None]:
# Iteration over keys
d = {'person': 2, 'cat': 4, 'spider': 8}
for animal in d:
    print(f'A {animal} has {d[animal]} legs')

In [None]:
# Iterate over key-value pairs
d = {'person': 2, 'cat': 4, 'spider': 8}
for animal, num_legs in d.items():
    print(f'A {animal} has {num_legs} legs')

In [None]:
# Create a dictionary using the built-in dict() function
dict(foo=1, bar=2, baz=3)

#### Dictionary comprehensions

These are similar to list comprehensions, but allow you to easily construct dictionaries. For example:

In [None]:
nums = [0, 1, 2, 3, 4]
even_num_to_square = {x: x ** 2 for x in nums if x % 2 == 0}
even_num_to_square

#### Sets

A set is an unordered collection of distinct elements.

In [None]:
animals = {'cat', 'dog'}
print(animals)
print('cat' in animals )  # Check if an element is in a set
print('fish' in animals) # prints "False"

In [None]:
animals.add('fish') # Add an element to a set
print('fish' in animals)
len(animals) # Number of elements in a set

In [None]:
animals.add('cat')       # Adding an element that is already in the set does nothing
animals

_Loops_: Iterating over a set has the same syntax as iterating over a list; however since sets are unordered, you cannot make assumptions about the order in which you visit the elements of the set:

In [None]:
animals = {'cat', 'dog', 'fish'}
for idx, animal in enumerate(animals):
    print(f'#{idx}: {animal}')

#### Set comprehensions

Like lists and dictionaries, we can easily construct sets using set comprehensions:

In [None]:
from math import sqrt
s = {int(sqrt(x)) for x in range(37)}
s

#### Tuples

A tuple is an **immutable** ordered list of values.

In [None]:
t = (1, 2, 'three')
t

It can be used in ways similar to a list:

In [None]:
t[0:1], t[1:3], t[-1], len(t)

A tuple can be used as a key in a dictionary and as an element of a set, while **lists cannot**.

In [None]:
d = {(x, x + 1): x for x in range(10)}  # Create a dictionary with tuple keys
d
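
To see why lists cannot play this role, the sketch below tries to use a list as a dictionary key. A list is mutable and therefore unhashable, so Python raises a `TypeError`. The names `points`, `seen` and `error_message` are illustrative only, and the exact wording of the error text may differ between Python versions.

```python
points = {(0, 0): 'origin'}      # a tuple works as a dictionary key
seen = {(1, 2), (3, 4)}          # and as a set element

try:
    points[[1, 1]] = 'list key'  # a list is mutable, hence unhashable
    error_message = None
except TypeError as e:
    error_message = str(e)

print(error_message)
```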

A tuple (and also a list) can be **unpacked**:

In [None]:
one, two, three = t
one, two, three

Note that when returning multiple values from a function (or code block in a jupyter notebook, as above)
your values get wrapped in a tuple, and the tuple is what's returned.
Unpacking the return value of a function can make it seem as if multiple values were returned.
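
A small sketch of this behaviour, using a hypothetical helper `min_max` that is not part of the tutorial code:

```python
def min_max(values):
    # The comma builds a tuple, so a single tuple object is returned.
    return min(values), max(values)

result = min_max([3, 1, 2])
print(result, type(result))   # (1, 3) <class 'tuple'>

lo, hi = min_max([3, 1, 2])   # unpack the returned tuple into two names
print(lo, hi)                 # 1 3
```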

### Functions

Python functions are defined using the `def` keyword. For example:

In [None]:
def sign(x):
    if x > 0:
        return 'positive'
    elif x < 0:
        return 'negative'
    else:
        return 'zero'

for x in [-1, 0, 1]:
    print(sign(x))

We will often define functions to take optional keyword arguments, like this:

In [None]:
def hello(name, loud=False):
    if loud:
        print('HELLO, %s' % name.upper())
    else:
        print('Hello, %s!' % name)

hello('Bob')
hello('Fred', loud=True)

#### Positional and Keyword arguments

Python functions are very flexible in the way they accept arguments. Both positional (regular) and keyword
arguments are supported and can be mixed in the same definition. Additionally, extra arguments can be passed in with the `*args` and `**kwargs` constructs.

Here's a function with three positional arguments and three keyword arguments which also accepts extra
positional and keyword arguments.

In [None]:
def myfunc(a1, a2, a3, *extra_args, kw1='foo', kw2='bar', kw3=3, **extra_kwargs):
    print(f'Got positional args: {(a1, a2, a3)}')
    print(f'Got keyword args   : {dict(kw1=kw1, kw2=kw2, kw3=kw3)}')
    print(f'Got extra positional args: {extra_args}')
    print(f'Got extra keyword args: {extra_kwargs}')

It can be called in many ways:

In [None]:
myfunc(1,2,3,4,5,6)

In [None]:
my_args = [1,2,3,4]
myfunc(*my_args)

In [None]:
myfunc(1,2,3, kw3=3, kw2=2, foo='bar')

In [None]:
my_kwargs = dict(kw1=1, kw2=2, kw3=3, kw4=4)
myfunc(1,2,3, **my_kwargs)

Note that keyword args can be omitted, while positional args cannot:

In [None]:
try:
    myfunc(1,2)
except TypeError as e:
    print(e, file=sys.stderr)

### Classes

The syntax for defining classes in Python is straightforward:

In [None]:
class Greeter:

    # Constructor
    def __init__(self, name):
        self.name = name  # Create an instance variable

    # Instance method
    def greet(self, loud=False):
        if loud:
            print('HELLO, %s!' % self.name.upper())
        else:
            print('Hello, %s' % self.name)

g = Greeter('Fred')  # Construct an instance of the Greeter class
g.greet()            # Call an instance method
g.greet(loud=True)   # Call an instance method

Classes can implement special **magic functions** that enable them to be integrated nicely with other python code. Magic functions have special names that start and end with `__`.

For example, here's a class that can be indexed with `[]` and iterated over with a `for` loop.

In [None]:
class ExampleCollection(object):
    def __init__(self):
        self.items = [100, 200, 300]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        return self.items[idx]

    def __iter__(self):
        class ExampleIter():
            def __init__(self, collection):
                self.idx = 0
                self.collection = collection

            def __next__(self):
                if self.idx >= len(self.collection):
                    raise StopIteration()
                x = self.collection[self.idx]
                self.idx += 1
                return x

        return ExampleIter(self)


In [None]:
example = ExampleCollection()
print('length=', len(example)) # invokes __len__
print('example[0]=', example[0]) # invokes __getitem__

In [None]:
for x in example: # invokes __iter__ and its __next__
    print(x)

Many other magic functions exist. Consult the docs and see if you can catch 'em all!
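
As a taste of a few more of them, here is a minimal sketch of a hypothetical `Vec2` class (not used elsewhere in the course) that customizes printing, equality and the `+` operator:

```python
class Vec2:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):            # used by print() and by the notebook display
        return f'Vec2({self.x}, {self.y})'

    def __eq__(self, other):       # invoked by ==
        return isinstance(other, Vec2) and (self.x, self.y) == (other.x, other.y)

    def __add__(self, other):      # invoked by +
        return Vec2(self.x + other.x, self.y + other.y)

v = Vec2(1, 2) + Vec2(3, 4)
print(v)                 # Vec2(4, 6)
print(v == Vec2(4, 6))   # True
```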

## PyTorch

PyTorch is a relatively new yet widely used deep learning framework for python.

It can also be used as a general scientific computing library, as it provides a high-performance multidimensional array object, with GPU support.

We'll refer to such n-dimensional arrays as **tensors** in accordance with the deep learning terminology.
Crucially, PyTorch supports **automatic differentiation** through arbitrary computations performed on its tensors.

During the course we'll use it extensively and learn many parts of its API.
You should also familiarize yourself with the [PyTorch Documentation](https://pytorch.org/docs/stable/) as it will greatly assist you when implementing your own models.

This notebook will show only **a small part** of PyTorch's API, the `Tensor` class.
This class is very similar to numpy's `ndarray`, and provides much of the same functionality.
However, it also has two important distinctions:
- Support for GPU computations.
- Can store extra data needed for implementing **automatic differentiation** used for back propagation:
    - A tensor of the same dimensions containing the gradient of this tensor w.r.t. some scalar (e.g. loss).
    - A node representing an operation in the computational graph that produced this tensor.
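
As a small preview of these two pieces of extra data, the sketch below computes a scalar from a tensor and asks PyTorch for the gradient. The function $\sum_i x_i^2$ and the names `x` and `loss` are chosen only for illustration; the gradient is $2x$.

```python
import torch

# requires_grad=True asks PyTorch to track operations on this tensor.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

loss = (x ** 2).sum()   # a scalar computed from x
loss.backward()         # back-propagate to populate x.grad

print(x.grad)           # gradient of loss w.r.t. x; same shape as x
print(loss.grad_fn)     # the graph node (operation) that produced `loss`
```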

In the next tutorials we will examine these concepts further.

This notebook will show some brief examples, just to get a feel for it and compare it to the usual numpy `ndarray`s.

You will be using both PyTorch tensors and numpy `ndarray`s extensively throughout the course homework assignments, and in general when implementing deep learning algorithms.
Although we'll mainly use **PyTorch** tensors for implementing our Deep Learning systems, it's still important to be proficient with `numpy`, since:
1. The concepts are very similar. Understanding one will help you quickly become proficient with the other.
1. You'll find that you need to switch between the two when working with real DL systems.

To use pytorch, we first need to import the `torch` package:

In [None]:
import torch

In [None]:
torch.__version__

'2.5.0+cu121'

### Tensors: n-dimensional arrays

A tensor represents an n-dimensional grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. The name "tensor" is a generalization of concepts you already know.
For instance, a vector is a 1-D tensor, and a matrix a 2-D tensor.

- The **rank** of the tensor is the number of dimensions it has.
- The **shape** of a tensor is a tuple of integers giving the number of elements along each dimension.

Most common functions you know from numpy can be used on tensors as well.
Actually, since numpy arrays are so similar to tensors, we can convert most tensors to numpy arrays (and back) but we don't need it too often.
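
For the cases where the conversion is needed, a minimal sketch follows. It assumes `numpy` is installed and that the tensor lives on the CPU; the variable names are illustrative. Note that `torch.from_numpy` and `Tensor.numpy()` share memory with the original data rather than copying it.

```python
import numpy as np
import torch

np_arr = np.array([[1, 2], [3, 4]])

t = torch.from_numpy(np_arr)   # numpy -> tensor (shares memory with np_arr)
back = t.numpy()               # tensor -> numpy (also shares memory)

t[0, 0] = 100                  # modifying the tensor in place...
print(np_arr[0, 0])            # ...is visible through the numpy array: 100
```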

We can initialize tensors from nested Python lists, and access elements using square brackets:

In [None]:
a = torch.tensor([1, 2, 3.])  # Create a rank 1 array
a

tensor([1., 2., 3.])

In [None]:
# Indexing always returns tensors
a[0]

tensor(1.)

In [None]:
# Use .item() to get a python scalar
a[0].item()

1.0

Two very important properties of any tensor are its `shape` and `dtype`.

In [None]:
def print_arr(arr, pre_text=''):
    print(f'shape={tuple(arr.shape)} dtype={arr.dtype}:')
    print(f'{pre_text}{arr}\n')

In [None]:
print_arr(a)

shape=(3,) dtype=torch.float32:
tensor([1., 2., 3.])



In [None]:
a[0] = 5                 # Change an element of the array
a

tensor([5., 2., 3.])

You can obtain the shape of a tensor in the same way as in numpy (`x.shape`), or using the `.size()` method:

In [None]:
# Create a tensor with random values between 0 and 1 with the shape [2, 3, 4]
a = torch.rand(2, 3, 4)
print(a)

tensor([[[0.4950, 0.2661, 0.2261, 0.4103],
         [0.9319, 0.1015, 0.8076, 0.6862],
         [0.4194, 0.2464, 0.2088, 0.4357]],

        [[0.6235, 0.6372, 0.8816, 0.5087],
         [0.2533, 0.9897, 0.9401, 0.7171],
         [0.1993, 0.6503, 0.3754, 0.8313]]])


In [None]:
shape = a.shape
print("Shape:", a.shape)

size = a.size()
print("Size:", size)

dim1, dim2, dim3 = a.size()
print("Size:", dim1, dim2, dim3)

Shape: torch.Size([2, 3, 4])
Size: torch.Size([2, 3, 4])
Size: 2 3 4


In [None]:
b = torch.tensor([[[1,2,3],[4,5,6]], [[11, 22, 33], [44, 55, 66]]])   # Create a rank 3 array
print_arr(b)

shape=(2, 2, 3) dtype=torch.int64:
tensor([[[ 1,  2,  3],
         [ 4,  5,  6]],

        [[11, 22, 33],
         [44, 55, 66]]])



In a general n-dimensional tensor, `b[i, j, k, ...]` accesses a specific element based on its position in each dimension, where i, j, k, etc., are zero-based indices for each dimension.

For an n-dimensional tensor, you interpret each index as follows:
*   `i`: Index in the first dimension (e.g., row in 2D, depth in 3D).
*   `j`: Index in the second dimension (e.g., column in 2D, row in 3D).
*   `k`: Index in the third dimension, and so forth.



For example, in a 3D tensor (e.g., a tensor with shape `[depth, rows, columns]`):

`b[i, j, k]` accesses the element located at the i-th depth slice, j-th row, and k-th column.

In [None]:
# Given b is a 3D tensor with shape [depth, rows, columns].

# Access the element at depth=0, row=0, column=0
print(b[0, 0, 0])  # Accesses the element in the first depth slice, first row, first column

# Access the element at depth=0, row=1, column=2
print(b[0, 1, 2])  # Accesses the element in the first depth slice, second row, third column

# Access the element at depth=1, row=0, column=1
print(b[1, 0, 1])  # Accesses the element in the second depth slice, first row, second column

tensor(1)
tensor(6)
tensor(22)
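
Beyond single elements, tensors also support slicing and boolean masks, much like Python lists and numpy arrays. The following sketch re-creates the tensor `b` from above so that it is self-contained; the other variable names are illustrative.

```python
import torch

b = torch.tensor([[[1, 2, 3], [4, 5, 6]],
                  [[11, 22, 33], [44, 55, 66]]])

first_slice = b[0]        # the whole first depth slice, shape (2, 3)
first_rows = b[:, 0, :]   # row 0 of every depth slice, shape (2, 3)
last_cols = b[..., -1]    # last column of every row, shape (2, 2)
large = b[b > 10]         # boolean mask; the result is a 1-D tensor

print(first_slice.shape, first_rows.shape, last_cols.shape)
print(large)
```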


The function `torch.Tensor` allocates memory for the desired tensor but does not initialize it, so the tensor contains whatever values were already in that memory.
To directly assign values to the tensor during initialization, there are many alternatives including:

* `torch.zeros`: Creates a tensor filled with zeros
* `torch.ones`: Creates a tensor filled with ones
* `torch.rand`: Creates a tensor with random values uniformly sampled between 0 and 1
* `torch.randn`: Creates a tensor with random values sampled from a normal distribution with mean 0 and variance 1
* `torch.full`: Creates a tensor of the specified size filled with a specified value.
* `torch.arange`: Creates a tensor containing the values $N,N+1,N+2,...,M-1$ (the end value $M$ is excluded)
* `torch.Tensor` (input list): Creates a tensor from the list elements you provide
* `torch.eye`: Creates an identity matrix, i.e., a tensor with ones on the diagonal and zeros elsewhere.


In [None]:
torch.zeros((2, 2))  # Create an array of all zeros

tensor([[0., 0.],
        [0., 0.]])

In [None]:
torch.ones((1, 10))   # Create an array of all ones

tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])

In [None]:
torch.full((3, 3), 7.2) # Create a constant array

tensor([[7.2000, 7.2000, 7.2000],
        [7.2000, 7.2000, 7.2000],
        [7.2000, 7.2000, 7.2000]])

In [None]:
torch.eye(4, dtype=torch.int) # Create an identity matrix of integers

tensor([[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 1]], dtype=torch.int32)

In [None]:
t = torch.rand((4,4,3)) # Create a 3d-array filled with U[0,1] random values
t

tensor([[[0.3995, 0.9339, 0.3191],
         [0.4145, 0.7216, 0.2637],
         [0.9357, 0.4412, 0.9199],
         [0.9731, 0.0884, 0.1306]],

        [[0.3607, 0.5795, 0.9463],
         [0.5126, 0.3313, 0.9309],
         [0.3766, 0.1333, 0.5225],
         [0.9910, 0.0877, 0.8627]],

        [[0.3386, 0.9977, 0.7395],
         [0.1263, 0.3367, 0.5406],
         [0.4243, 0.9504, 0.0715],
         [0.8552, 0.4077, 0.7534]],

        [[0.7129, 0.9001, 0.5558],
         [0.6862, 0.2317, 0.6383],
         [0.6033, 0.0994, 0.1707],
         [0.8280, 0.5969, 0.4900]]])
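
Two of the creation functions listed earlier, `torch.arange` and `torch.randn`, can be sketched in the same way. Note that `torch.arange` excludes the end value. The seed value `0` is an arbitrary choice that only makes the random samples reproducible.

```python
import torch

r = torch.arange(2, 7)           # values 2, 3, 4, 5, 6 (the end value is excluded)
evens = torch.arange(0, 10, 2)   # values 0, 2, 4, 6, 8 (step size of 2)

torch.manual_seed(0)             # fix the seed so the samples are reproducible
n = torch.randn(2, 3)            # samples from a standard normal distribution

print(r, evens, n.shape)
```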

#### Array rank

In `torch` **rank** means **number of dimensions**.

**rank-0** arrays are scalars.

In [None]:
a0 = torch.tensor(17)
print_arr(a0)

shape=() dtype=torch.int64:
17



In [None]:
# Get the scalar as a python number (an int in this case)
a0.item()

17

**rank-1** arrays of length `n` have a shape of `(n,)`.

In [None]:
# A rank-1 array
a1 = torch.tensor([1,2,3])

print_arr(a1)

shape=(3,) dtype=torch.int64:
tensor([1, 2, 3])



In [None]:
# A rank-1 array with a single element
print_arr(torch.tensor([3.14]))

shape=(1,) dtype=torch.float32:
tensor([3.1400])



**rank-2** arrays have a shape of `(n,m)`.

In [None]:
a2 = torch.tensor([[1,2,3], [4,5,6]])

print_arr(a2)

shape=(2, 3) dtype=torch.int64:
tensor([[1, 2, 3],
        [4, 5, 6]])



A column vector is also rank-2!

In [None]:
a_col = a1.reshape(-1, 1)

print_arr(a_col)

shape=(3, 1) dtype=torch.int64:
tensor([[1],
        [2],
        [3]])



And a row vector is also rank-2:

In [None]:
a_row = a1.reshape(1, -1)

print_arr(a_row)

shape=(1, 3) dtype=torch.int64:
tensor([[1, 2, 3]])
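
Besides `reshape`, the same row and column vectors can be obtained by adding or removing dimensions of size 1 with `unsqueeze` and `squeeze`. A short sketch, re-creating `a1` so it can run on its own:

```python
import torch

a1 = torch.tensor([1, 2, 3])   # rank 1, shape (3,)

col = a1.unsqueeze(1)          # add a dimension at position 1 -> shape (3, 1)
row = a1.unsqueeze(0)          # add a dimension at position 0 -> shape (1, 3)
flat = col.squeeze()           # drop all size-1 dimensions    -> shape (3,)

print(col.shape, row.shape, flat.shape)
```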



**rank-k** arrays have a shape of `(n1,...,nk)`.

In [None]:
print_arr(torch.zeros((2,3,4)))

shape=(2, 3, 4) dtype=torch.float32:
tensor([[[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]],

        [[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]]])



In [None]:
print_arr(torch.ones((2,2,2,2)))

shape=(2, 2, 2, 2) dtype=torch.float32:
tensor([[[[1., 1.],
          [1., 1.]],

         [[1., 1.],
          [1., 1.]]],


        [[[1., 1.],
          [1., 1.]],

         [[1., 1.],
          [1., 1.]]]])



### Tensor math

#### Elementwise operations
Basic mathematical functions **operate elementwise** on arrays, and are available both as operator overloads and as functions in the `torch` module:

In [None]:
x = torch.tensor([[1,2],[3,4]], dtype=torch.float)
y = torch.tensor([[5,6],[7,8]], dtype=torch.float)

# Elementwise basic math
print_arr(x + y)
print_arr(torch.add(x, y))

shape=(2, 2) dtype=torch.float32:
tensor([[ 6.,  8.],
        [10., 12.]])

shape=(2, 2) dtype=torch.float32:
tensor([[ 6.,  8.],
        [10., 12.]])



In [None]:
print_arr(x - y)
print_arr(torch.sub(x, y))

shape=(2, 2) dtype=torch.float32:
tensor([[-4., -4.],
        [-4., -4.]])

shape=(2, 2) dtype=torch.float32:
tensor([[-4., -4.],
        [-4., -4.]])



In [None]:
print_arr(x * y)
print_arr(torch.mul(x, y))

shape=(2, 2) dtype=torch.float32:
tensor([[ 5., 12.],
        [21., 32.]])

shape=(2, 2) dtype=torch.float32:
tensor([[ 5., 12.],
        [21., 32.]])



In [None]:
print_arr(x / y)
print_arr(torch.div(x, y))

shape=(2, 2) dtype=torch.float32:
tensor([[0.2000, 0.3333],
        [0.4286, 0.5000]])

shape=(2, 2) dtype=torch.float32:
tensor([[0.2000, 0.3333],
        [0.4286, 0.5000]])



In [None]:
# Elementwise functions
print_arr(torch.sqrt(x))

shape=(2, 2) dtype=torch.float32:
tensor([[1.0000, 1.4142],
        [1.7321, 2.0000]])



In [None]:
print_arr(torch.exp(x))

shape=(2, 2) dtype=torch.float32:
tensor([[ 2.7183,  7.3891],
        [20.0855, 54.5981]])



In [None]:
print_arr(torch.log(x))

shape=(2, 2) dtype=torch.float32:
tensor([[0.0000, 0.6931],
        [1.0986, 1.3863]])



There are of course many more elementwise operations implemented by `torch`.

#### Inner and outer products

Other commonly used operations include matrix multiplications, which are essential for neural networks.
Quite often, we have an input vector $\mathbf{x}$, which is transformed using a learned weight matrix $\mathbf{W}$.
There are multiple functions for performing matrix multiplication, including:

* `torch.matmul`: Performs the matrix product over two tensors, where the specific behavior depends on the dimensions.
If both inputs are matrices (2-dimensional tensors), it performs the standard matrix product.
For higher dimensional inputs, the function supports broadcasting (for details see the [documentation](https://pytorch.org/docs/stable/generated/torch.matmul.html?highlight=matmul#torch.matmul)).
Can also be written as `a @ b`, similar to numpy, or as `torch.dot(a, b)` for 1D tensors only.
* `torch.mm`: Performs the matrix product over two matrices, but doesn't support broadcasting (see [documentation](https://pytorch.org/docs/stable/generated/torch.mm.html?highlight=torch%20mm#torch.mm))
* `torch.bmm`: Performs the matrix product with support for a batch dimension.
If the first tensor $T$ is of shape ($b\times n\times m$), and the second tensor $R$ ($b\times m\times p$), the output $O$ is of shape ($b\times n\times p$), and has been calculated by performing $b$ matrix multiplications of the submatrices of $T$ and $R$: $O_i = T_i @ R_i$
* `torch.einsum`: Performs matrix multiplications and more (i.e. sums of products) using the Einstein summation convention.
Explanation of the Einstein sum can be found in assignment 1.
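
To make the relationship between these functions concrete, here is a minimal sketch that computes the same batched product in four ways. The names `T` and `R` follow the notation above; the sizes ($b=2$, $n=3$, $m=4$, $p=5$) are arbitrary choices for illustration.

```python
import torch

# Arbitrary illustrative sizes: b=2 batches, each T_i is (3, 4), each R_i is (4, 5)
T = torch.arange(24, dtype=torch.float32).reshape(2, 3, 4)
R = torch.ones(2, 4, 5)

out_bmm = torch.bmm(T, R)                        # batched matrix product
out_matmul = torch.matmul(T, R)                  # matmul treats the leading dim as a batch dim
out_einsum = torch.einsum('bnm,bmp->bnp', T, R)  # sum over the shared index m

# torch.mm only accepts 2D inputs, so we loop over the batch explicitly
out_loop = torch.stack([torch.mm(T[i], R[i]) for i in range(T.shape[0])])

print(out_bmm.shape)  # torch.Size([2, 3, 5])
```

All four results should agree; the explicit loop is only there to show what the batched versions compute.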

Usually, we use `torch.matmul` or `torch.bmm`. We can try a matrix multiplication with `torch.matmul` below.

In [None]:
v = torch.tensor([9, 10])
w = torch.tensor([11, 12])

# Inner product of vectors
# This computes the dot product of vectors v and w:
# (9 * 11) + (10 * 12) = 99 + 120 = 219
print(torch.matmul(v, w))  # torch.matmul works for 1D tensors by performing an inner product
print(torch.dot(v, w))      # torch.dot is specifically for 1D tensors (vectors) and performs the same inner product
print(v @ w)                # @ is syntactic sugar for matmul and works for inner product in 1D


tensor(219)
tensor(219)
tensor(219)


In [None]:
v = torch.tensor([[0, 1, 2],
                  [3, 4, 5]])

w = torch.tensor([[0, 1, 2],
                  [3, 4, 5],
                  [6, 7, 8]])

# Matrix multiplication of 2D tensors v and w
# v has shape (2, 3) and w has shape (3, 3)
# The result will have shape (2, 3) because it multiplies the 2x3 matrix by a 3x3 matrix.
# Each element in the result is computed as the dot product of the rows of v and columns of w:
#
# For example, the top-left element (0,0) of the result is:
# (0 * 0) + (1 * 3) + (2 * 6) = 0 + 3 + 12 = 15
#
# Similarly, other elements are calculated by performing a dot product of each row of v with each column of w.
print(torch.matmul(v, w))

tensor([[15, 18, 21],
        [42, 54, 66]])


Rank-1 arrays are somewhat special in that `torch` can treat them either as column or as row vectors.
Arrays of different rank have different semantics when using them in vector-vector or vector-matrix products, so always make sure you know what shapes you're working with:

In [None]:
print_arr(a1, 'a1\t\t')
print_arr(a_row, 'a_row\t\t')
print_arr(a_col, 'a_col\t\t')

shape=(3,) dtype=torch.int64:
a1		tensor([1, 2, 3])

shape=(1, 3) dtype=torch.int64:
a_row		tensor([[1, 2, 3]])

shape=(3, 1) dtype=torch.int64:
a_col		tensor([[1],
        [2],
        [3]])



In [None]:
# Inner products, but output dimensions are different
print_arr(torch.matmul(a1, a1), 'a1 @ a1 =\t')
assert torch.matmul(a1, a1) == a1 @ a1

print_arr(torch.matmul(a_row, a1), 'a_row @ a1 =\t')
assert torch.matmul(a_row, a1) == a_row @ a1

print_arr(torch.matmul(a1, a_col), 'a1 @ a_col =\t')
assert torch.matmul(a1, a_col) == a1 @ a_col

print_arr(torch.matmul(a_row, a_col), 'a_row @ a_col =\t')
assert torch.matmul(a_row, a_col) == a_row @ a_col

# Outer product
print_arr(torch.matmul(a_col, a_row), 'a_col @ a_row =\n')
assert torch.all(torch.matmul(a_col, a_row) == a_col @ a_row)

shape=() dtype=torch.int64:
a1 @ a1 =	14

shape=(1,) dtype=torch.int64:
a_row @ a1 =	tensor([14])

shape=(1,) dtype=torch.int64:
a1 @ a_col =	tensor([14])

shape=(1, 1) dtype=torch.int64:
a_row @ a_col =	tensor([[14]])

shape=(3, 3) dtype=torch.int64:
a_col @ a_row =
tensor([[1, 2, 3],
        [2, 4, 6],
        [3, 6, 9]])



#### Non-elementwise operations

Torch provides many useful functions for performing computations on arrays.

In [None]:
x = torch.tensor([[1,2,3],[3,4,5]], dtype=torch.float)
print_arr(x)

shape=(2, 3) dtype=torch.float32:
tensor([[1., 2., 3.],
        [3., 4., 5.]])



In [None]:
print_arr(torch.sum(x))  # Compute sum of all elements
print_arr(torch.mean(x, dim=0))  # Compute mean of each column
print_arr(torch.mean(x, dim=1))  # Compute mean of each row
print_arr(torch.prod(x, dim=1)) # Compute product of each row

shape=() dtype=torch.float32:
18.0

shape=(3,) dtype=torch.float32:
tensor([2., 3., 4.])

shape=(2,) dtype=torch.float32:
tensor([2., 4.])

shape=(2,) dtype=torch.float32:
tensor([ 6., 60.])



In [None]:
# In many cases, it's useful to aggregate but keep the original rank
print_arr(torch.mean(x, dim=0, keepdim=True))
print_arr(torch.mean(x, dim=1, keepdim=True))

print_arr(torch.prod(x, dim=0, keepdim=True))
print_arr(torch.prod(x, dim=1, keepdim=True))

shape=(1, 3) dtype=torch.float32:
tensor([[2., 3., 4.]])

shape=(2, 1) dtype=torch.float32:
tensor([[2.],
        [4.]])

shape=(1, 3) dtype=torch.float32:
tensor([[ 3.,  8., 15.]])

shape=(2, 1) dtype=torch.float32:
tensor([[ 6.],
        [60.]])



You can find the full list of mathematical functions provided by torch in the [documentation](https://pytorch.org/docs/stable/index.html).
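
As a small illustration of why `keepdim=True` is useful, the sketch below normalizes each row of `x` so that it sums to 1. It relies on broadcasting, which is covered later in this tutorial; the variable names `row_sums` and `x_norm` are chosen here for illustration only.

```python
import torch

x = torch.tensor([[1, 2, 3], [3, 4, 5]], dtype=torch.float)

row_sums = x.sum(dim=1, keepdim=True)  # shape (2, 1) instead of (2,)
x_norm = x / row_sums                  # (2, 3) / (2, 1): each row is divided by its own sum

print(row_sums.shape)
print(x_norm.sum(dim=1))               # every row now sums to (approximately) 1

# Without keepdim the sum has shape (2,), which does not line up with (2, 3)
try:
    x / x.sum(dim=1)
    division_failed = False
except RuntimeError as err:
    division_failed = True
    print("shape mismatch:", err)
```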

### Indexing

We often have the situation where we need to select a part of a tensor.
Indexing works just like in numpy, so let's try it:

In [None]:
x = torch.arange(12).view(3, 4)
print("X", x)

X tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])


In [None]:
print(x[:, 1])  # Second column

tensor([1, 5, 9])


In [None]:
print(x[0])  # First row

tensor([0, 1, 2, 3])


In [None]:
print(x[:2, -1])  # First two rows, last column

tensor([3, 7])


In [None]:
print(x[1:3, :])  # Middle two rows

tensor([[ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])


**More about slicing**

Similar to Python lists, tensors can be sliced. Since they may be multidimensional, you can specify **a slice for each dimension** of the array (trailing dimensions that you omit are taken in full):

In [None]:
a = torch.tensor([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

print_arr(a)

shape=(3, 4) dtype=torch.int64:
tensor([[ 1,  2,  3,  4],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12]])



In [None]:
b = a[:2, 1:3]

print_arr(b)

shape=(2, 2) dtype=torch.int64:
tensor([[2, 3],
        [6, 7]])



A slice of an array is a **view** into the same in-memory data, so modifying it will modify the original array.

In [None]:
# Changing a view
b[0, 0] = 77777

# ...modifies original
a

tensor([[    1, 77777,     3,     4],
        [    5,     6,     7,     8],
        [    9,    10,    11,    12]])

You can also mix integer indexing with slice indexing.
However, doing so will yield an array of **lower rank** than the original array.


Below are several ways of accessing the data in the middle row of the array:
- Mixing integer indexing with slices yields an array of lower rank
- Using only slices yields an array of the same rank as the original array

In [None]:
a = torch.tensor([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
a

In [None]:
row_r1 = a[1, :]    # Rank 1 view of the second row of a
row_r2 = a[1:2, :]  # Rank 2 view of the second row of a
row_r3 = a[[1], :]  # Rank 2 copy of the second row of a (integer-array indexing copies the data)

print_arr(row_r1)
print_arr(row_r2)
print_arr(row_r3)

In [None]:
# We can make the same distinction when accessing columns of an array:
col_r1 = a[:, 1]    # Rank 1 view of the second column of a
col_r2 = a[:, 1:2]  # Rank 2 view of the second column of a

print_arr(col_r1)
print_arr(col_r2)

**Integer-array indexing**

- When you slice, the resulting array view will always be a subarray of the original array.
- Integer array indexing allows you to construct arbitrary arrays using the data from another array.
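
One practical consequence of this distinction, shown as a minimal sketch: a slice shares memory with the original tensor, while integer-array indexing returns a new tensor with copied data. The variable names `sliced`, `picked` and `safe` are illustrative.

```python
import torch

a = torch.arange(12).reshape(3, 4)

sliced = a[:2, 1:3]         # basic slicing: a view that shares memory with a
picked = a[[0, 1], :]       # integer-array indexing: a new tensor with copied data
safe = a[:2, 1:3].clone()   # an explicit copy of a slice

sliced[0, 0] = -1           # visible through a
picked[0, 0] = -100         # does not touch a
safe[0, 1] = -200           # does not touch a either

print(a[0, 1].item(), a[0, 0].item(), a[0, 2].item())  # -1 0 2
```

If you need to modify a slice without affecting the original, `.clone()` is the usual way to get an independent copy.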


In [None]:
a = torch.tensor([[1,2], [3, 4], [5, 6]])
print_arr(a)

In [None]:
# An example of integer array indexing.
# The returned array will have shape (3,)
print_arr(a[ [0, 1, 2], [0, 1, 0] ])

In [None]:
# The above example of integer array indexing is equivalent to this:
print_arr(torch.tensor([a[0, 0], a[1, 1], a[2, 0]]))

One useful trick with integer array indexing is selecting or mutating one element from each row of a matrix:

In [None]:
# Create a new array from which we will select elements
a = torch.tensor([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
a

In [None]:
# Create an array of column indices (notice it can repeat)
col_idx = torch.tensor([0, 2, 0, 1])

# Select one element from each row of a using the indices in col_idx
a[torch.arange(4), col_idx]

In [None]:
# Mutate one element from each row of a using the indices in col_idx
a[torch.arange(4), col_idx] += 1000
a

**Boolean array indexing**

This type of indexing is used to select the elements of an array that satisfy some condition
(similar to MATLAB's logical indexing).

In [None]:
a = torch.tensor([[1,2], [3, 4], [5, 6]])
print_arr(a)

shape=(3, 2) dtype=torch.int64:
tensor([[1, 2],
        [3, 4],
        [5, 6]])



In [None]:
# Find the elements that are bigger than 2;
# this returns a torch tensor of Booleans of the same
# shape as a, where each slot of bool_idx tells
# whether that element of a is > 2.
bool_idx = (a > 2)
bool_idx

tensor([[False, False],
        [ True,  True],
        [ True,  True]])

In [None]:
# We use boolean array indexing to construct a rank 1 array
# consisting of the elements of a corresponding to the True values
# of bool_idx
a[a>2]

tensor([3, 4, 5, 6])
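
A Boolean mask can also appear on the left-hand side of an assignment. The sketch below zeroes out the negative entries of a tensor in two ways; the example values and the names `mask`, `clipped` and `clipped_where` are made up for illustration.

```python
import torch

a = torch.tensor([[1, -2], [-3, 4], [5, -6]])

mask = a < 0                  # bool tensor with the same shape as a
clipped = a.clone()           # clone so that a itself stays intact
clipped[mask] = 0             # write 0 at every position where mask is True

# torch.where builds the same result without any in-place modification
clipped_where = torch.where(mask, torch.zeros_like(a), a)

print(mask.sum().item())      # number of negative entries: 3
```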

### Datatypes

Every tensor is a grid of elements of the same type. PyTorch provides a large set of numeric datatypes that you can use to construct arrays.
PyTorch tries to guess a datatype when you create an array, but functions that construct arrays usually also include an optional argument to explicitly specify the datatype.

Here’s a list of commonly supported data types in PyTorch:

**Floating Point Types**

  *  `torch.float32` or `torch.float`: 32-bit floating-point (default for floating point)
  *  `torch.float64` or `torch.double`: 64-bit floating-point
  *  `torch.float16` or `torch.half`: 16-bit floating-point (useful for mixed-precision training)

**Integer Types**

  *  `torch.int8`: 8-bit signed integer
  *  `torch.uint8`: 8-bit unsigned integer
  *  `torch.int16` or `torch.short`: 16-bit signed integer
  *  `torch.int32` or `torch.int`: 32-bit signed integer
  *  `torch.int64` or `torch.long`: 64-bit signed integer (default for integers)

**Boolean Type**

  *  `torch.bool`: Boolean type (stores True or False values)

**Complex Number Types**

  * `torch.complex64`: 64-bit complex number (real and imaginary parts as float32)
  * `torch.complex128`: 128-bit complex number (real and imaginary parts as float64)

**Quantized Types** (used in quantization for model optimization)

  *  `torch.qint8`: 8-bit quantized integer
  *  `torch.quint8`: 8-bit unsigned quantized integer
  * `torch.qint32`: 32-bit quantized integer
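
The sketch below shows how these types interact in practice: explicit conversion with `.to()`, automatic promotion when integer and floating-point tensors are mixed, and the difference between `/` and `//`. It assumes the default floating-point type has not been changed from `torch.float32`.

```python
import torch

x_int = torch.tensor([1, 2, 3])                  # Python ints -> torch.int64
x_float = x_int.to(torch.float32)                # explicit conversion, returns a new tensor
mixed = x_int + torch.tensor([0.5, 0.5, 0.5])    # int64 + float32 is promoted to float32

true_div = x_int / 2      # "/" is true division and yields a floating-point result
floor_div = x_int // 2    # "//" keeps the integer dtype

print(x_int.dtype, x_float.dtype, mixed.dtype, true_div.dtype, floor_div.dtype)
```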

Here is an example:

In [None]:
x = torch.tensor([1, 2])  # Let torch choose the datatype
y = torch.tensor([1.0, 2.0])  # Let torch choose the datatype
z = torch.tensor([1, 2], dtype=torch.int64)  # Force a particular datatype

x.dtype, y.dtype, z.dtype

(torch.int64, torch.float32, torch.int64)

### Changing and adding dimensions

You can **transpose** a tensor, i.e. swap two of its dimensions, with `transpose()` (or `.T` for a 2D tensor).

In [None]:
a = torch.ones((3, 5))
print_arr(a)
print_arr(a.transpose(0, 1))
print_arr(a.T)

shape=(3, 5) dtype=torch.float32:
tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]])

shape=(5, 3) dtype=torch.float32:
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])

shape=(5, 3) dtype=torch.float32:
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])



Use `permute()` to reorder multiple dimensions simultaneously; see the [documentation about permuting](https://pytorch.org/docs/stable/generated/torch.permute.html#torch.permute).

In [None]:
a = torch.ones((2, 4, 6))
a[1,2,3] = 777

print_arr(a)

print_arr(a.permute(1,0,2))

shape=(2, 4, 6) dtype=torch.float32:
tensor([[[  1.,   1.,   1.,   1.,   1.,   1.],
         [  1.,   1.,   1.,   1.,   1.,   1.],
         [  1.,   1.,   1.,   1.,   1.,   1.],
         [  1.,   1.,   1.,   1.,   1.,   1.]],

        [[  1.,   1.,   1.,   1.,   1.,   1.],
         [  1.,   1.,   1.,   1.,   1.,   1.],
         [  1.,   1.,   1., 777.,   1.,   1.],
         [  1.,   1.,   1.,   1.,   1.,   1.]]])

shape=(4, 2, 6) dtype=torch.float32:
tensor([[[  1.,   1.,   1.,   1.,   1.,   1.],
         [  1.,   1.,   1.,   1.,   1.,   1.]],

        [[  1.,   1.,   1.,   1.,   1.,   1.],
         [  1.,   1.,   1.,   1.,   1.,   1.]],

        [[  1.,   1.,   1.,   1.,   1.,   1.],
         [  1.,   1.,   1., 777.,   1.,   1.]],

        [[  1.,   1.,   1.,   1.,   1.,   1.],
         [  1.,   1.,   1.,   1.,   1.,   1.]]])



Note that an element `[x,y,z]` moves to position `[y,x,z]` after a transpose with this permutation (1,0,2).
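
The index rule above can be checked directly. The sketch below also shows that `permute` returns a view with rearranged strides rather than a copy; the indices `i, j, k` are arbitrary illustrative choices.

```python
import torch

a = torch.arange(2 * 4 * 6).reshape(2, 4, 6)
p = a.permute(1, 0, 2)                 # shape (4, 2, 6)

# Element [i, j, k] of a is found at [j, i, k] in p
i, j, k = 1, 2, 3
print(a[i, j, k].item() == p[j, i, k].item())   # True

# permute does not copy data: both tensors start at the same memory address
print(a.data_ptr() == p.data_ptr())             # True

# The result is generally not contiguous in memory; .contiguous() makes a compact copy
print(p.is_contiguous())                        # False
print(p.contiguous().is_contiguous())           # True
```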

Another important feature is **reshaping** an array into different dimensions. More about [reshaping](https://pytorch.org/docs/stable/generated/torch.reshape.html#torch.reshape).

In [None]:
a = torch.randint(0, 100, (3, 6))
print_arr(a)

# Reshape `a` from its original shape (3, 6) to a new shape (2, 9)
# This keeps the total number of elements the same (3*6 = 2*9 = 18)
reshaped_a = torch.reshape(a, (2, 9))
print_arr(reshaped_a)

shape=(3, 6) dtype=torch.int64:
tensor([[29, 98,  4, 72, 44, 47],
        [30, 46, 41, 11, 22, 60],
        [40, 56, 87, 85, 91,  7]])

shape=(2, 9) dtype=torch.int64:
tensor([[29, 98,  4, 72, 44, 47, 30, 46, 41],
        [11, 22, 60, 40, 56, 87, 85, 91,  7]])



When reshaping, we need to make sure to preserve the same number of elements.
Use `-1` in one of the dimensions to tell PyTorch to "figure it out".
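
A short sketch of how `-1` is inferred, and of what happens when the element counts cannot be reconciled (the flag `reshape_failed` is only there for illustration):

```python
import torch

a = torch.arange(12)

print(a.reshape(3, -1).shape)             # -1 is inferred as 4
print(a.reshape(-1, 2, 3).shape)          # -1 is inferred as 2
print(a.reshape(2, 6).reshape(-1).shape)  # flatten back to 1D (12 elements)

# 12 elements cannot be split into 5 equal rows, so reshape raises an error
try:
    a.reshape(5, -1)
    reshape_failed = False
except RuntimeError as err:
    reshape_failed = True
    print("reshape failed:", err)
```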

You can also combine multiple arrays with **concatenation** along an arbitrary axis.

In [None]:
# Define tensor `a` with shape (2, 2)
# Define tensor `b` with shape (1, 2)

a = torch.tensor([[1, 2], [3, 4]])
b = torch.tensor([[5, 6]])

print_arr(a)
print_arr(b)

shape=(2, 2) dtype=torch.int64:
tensor([[1, 2],
        [3, 4]])

shape=(1, 2) dtype=torch.int64:
tensor([[5, 6]])



In [None]:
# Concatenate `a` and `b` along dimension 0 (row-wise)
# This stacks `b` as an additional row below `a`, resulting in a tensor with shape (3, 2)
print_arr(torch.cat((a, b), dim=0))

shape=(3, 2) dtype=torch.int64:
tensor([[1, 2],
        [3, 4],
        [5, 6]])



In [None]:
# Transpose `b` to shape (2, 1) so it can be concatenated with `a` along dimension 1 (column-wise)
# Concatenate `a` and the transposed `b` along dimension 1, resulting in a tensor with shape (2, 3)
print_arr(torch.cat((a, b.T), dim=1))

shape=(2, 3) dtype=torch.int64:
tensor([[1, 2, 5],
        [3, 4, 6]])



### Broadcasting

Broadcasting is a powerful mechanism that allows PyTorch to work with arrays of **different shapes** when performing arithmetic operations. This mechanism also exists in NumPy with the same semantics.

Frequently we have a smaller array and a larger array, and we want to use the smaller array multiple times to perform some operation on the larger array.

For example, suppose that we want to add a constant vector to each row of a matrix.

In [None]:
# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = torch.tensor([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = torch.tensor([1, 0, 1])

print_arr(x,'x=\n')
print_arr(v, '\nv=')

shape=(4, 3) dtype=torch.int64:
x=
tensor([[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9],
        [10, 11, 12]])

shape=(3,) dtype=torch.int64:

v=tensor([1, 0, 1])



**Naïve approach**: Use a loop.

In [None]:
y = torch.empty_like(x)   # Create an empty matrix with the same shape as x

# Add the vector v to each row of the matrix x with an explicit loop
for i in range(4):
    y[i, :] = x[i, :] + v

y

tensor([[ 2,  2,  4],
        [ 5,  5,  7],
        [ 8,  8, 10],
        [11, 11, 13]])

This works; however, explicit loops in Python are **slow**.

**Naïve approach 2**: adding the vector v to each row of the matrix `x` is equivalent to forming a matrix `vv` by stacking multiple copies of `v` vertically, then performing elementwise summation of `x` and `vv`.

We could implement this approach like this:

In [None]:
vv = torch.tile(v, (4, 1))  # Stack 4 copies of v on top of each other
vv

tensor([[1, 0, 1],
        [1, 0, 1],
        [1, 0, 1],
        [1, 0, 1]])

In [None]:
y = x + vv  # Add x and vv elementwise
y

tensor([[ 2,  2,  4],
        [ 5,  5,  7],
        [ 8,  8, 10],
        [11, 11, 13]])

Nice, but:
- A new array was allocated and memory was copied.
- We had to explicitly define how many times to replicate v.

**Broadcasting** allows us to perform this computation without actually creating multiple copies of v. Consider this version, using broadcasting:

In [None]:
x

tensor([[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9],
        [10, 11, 12]])

In [None]:
v

tensor([1, 0, 1])

In [None]:
x = torch.tensor([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = torch.tensor([1, 0, 1])

# Add v to each row of x using broadcasting
y = x + v

print(f'shapes: x={tuple(x.shape)}, v={tuple(v.shape)}\n')
print_arr(y)

shapes: x=(4, 3), v=(3,)

shape=(4, 3) dtype=torch.int64:
tensor([[ 2,  2,  4],
        [ 5,  5,  7],
        [ 8,  8, 10],
        [11, 11, 13]])



The line `y = x + v` works even though `x` has shape `(4, 3)` and `v` has shape `(3,)` due to broadcasting; this line works **as if** v actually had shape `(4, 3)`, where each row was a copy of `v`, and the sum was performed elementwise.
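
The "as if" can be made concrete with `expand`, which returns a `(4, 3)` view of `v` without copying any data. This is meant as an illustration of the idea, not as a claim about how PyTorch implements broadcasting internally.

```python
import torch

x = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
v = torch.tensor([1, 0, 1])

v_expanded = v.expand(4, 3)   # looks like 4 stacked copies of v, but shares v's memory

# A stride of 0 along dim 0 means every "row" reads the same 3 values
print(v_expanded.stride())                   # (0, 1)
print(torch.equal(x + v, x + v_expanded))    # True
```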

Broadcasting two tensors together follows these rules:

1. All input tensors with rank smaller than the input tensor of largest rank, have **1’s prepended to their shapes**.
  
  * Example:

    Tensor `x` has shape `(4, 3)`.
    Tensor `v` has shape `(3,)` (1D).

    To align their ranks, Tensor `v`'s shape becomes `(1, 3)` by prepending a dimension of size `1`, so now both tensors have two dimensions.

2. The shape of the output tensor is determined by taking the maximum size in each dimension across all input tensors.

  * Example:

    After aligning ranks, where `x` has shape `(4, 3)` and `v` has shape `(1, 3)`.
    The output shape will be `(4, 3)`, as this is the largest dimension size in each corresponding position.

3. For broadcasting to work, each dimension of the input tensors must either: **Match the size of the output shape in that dimension**, or **Have a size of 1, which allows it to be "stretched" to match the output size**.

  * Example:

    For `x` with shape `(4, 3)` and `v` with shape `(1, 3)`, both can broadcast to an output shape of `(4, 3)`.
    The first dimension of `v` is `1`, so it stretches to `4` to match `x`.

4. If a tensor has a dimension size of 1, it means there is only one value along that dimension. Broadcasting will automatically repeat this value across the expanded dimension to match the size of the other tensors.

  * Example:

    For `v` with shape `(1, 3)`, the single row (size 1) will be repeated `4` times to match the `(4, 3)` shape of `x`.

Our example in short:
- `x` has shape `(4,3)`
- `v` has shape `(3,)`.

Following the Broadcasting logic, we can say the following is equivalent to what happened:
1. `v` has fewer dims than `x` so a dimension of `1` is **prepended** -> `v` is now `(1, 3)`.
1. Output shape will be `(max(1,4), max(3,3)) = (4,3)`.
1. Dim 1 of `v` matches exactly (3): so it's clear which data to use.
1. Dim 0 is exactly 1 for `v` and 4 for `x`: we can use the first data entry (row 0) of `v` for each time any of its rows is accessed. This is effectively like converting `v` from `(1,3)` to `(4,3)` by replicating.
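
The rules above can be sketched as a small function that computes the output shape for two input shapes. `broadcast_shape` is a hypothetical helper written for this tutorial, not a PyTorch function; its result is compared against the shape that PyTorch actually produces.

```python
import torch

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rules to two shapes."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: prepend 1s to the shorter shape
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    out = []
    for dim_a, dim_b in zip(a, b):
        # Rule 3: sizes must match, or one of them must be 1
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
        # Rules 2 and 4: a size-1 dimension is stretched to the other size
        out.append(dim_b if dim_a == 1 else dim_a)
    return tuple(out)

for sa, sb in [((4, 3), (3,)), ((3, 1), (2,)), ((2, 3), ())]:
    actual = tuple((torch.zeros(sa) + torch.zeros(sb)).shape)
    print(sa, sb, "->", broadcast_shape(sa, sb), "| torch:", actual)
```

Shapes such as `(4, 3)` and `(4,)` are rejected by rule 3, because the trailing dimensions 3 and 4 differ and neither of them is 1.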

Broadcasting is incredibly useful and necessary for writing **vectorized** code,
i.e. code that avoids explicit Python loops, which are very slow.
Instead, this approach leverages the underlying compiled (C++) implementation.
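
A rough way to see the difference is to time both versions on a larger input. The sizes below are arbitrary, and the measured times depend heavily on the machine, so treat this as a sketch rather than a benchmark.

```python
import time
import torch

x = torch.randn(2000, 3)
v = torch.tensor([1.0, 0.0, 1.0])

# Explicit Python loop over the rows
start = time.perf_counter()
y_loop = torch.empty_like(x)
for i in range(x.shape[0]):
    y_loop[i, :] = x[i, :] + v
loop_time = time.perf_counter() - start

# Vectorized version using broadcasting
start = time.perf_counter()
y_vec = x + v
vec_time = time.perf_counter() - start

print(f"loop: {loop_time:.5f}s, broadcast: {vec_time:.5f}s")
```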

#### Another Example: Calculating an outer product with elementwise product and broadcasting

(Self-study: go over this example on your own.)

In [None]:
# Compute outer product of the vectors
v = torch.tensor([1,2,3])  # v has shape (3,)
w = torch.tensor([4,5])    # w has shape (2,)
print_arr(v)
print_arr(w)

To compute an outer product, we first reshape `v` to be a column
vector of shape `(3, 1)`; we can then broadcast it against `w` to yield
an output of shape `(3, 2)`, which is the outer product of `v` and `w`:

In [None]:
# (3,1) * (2,) -> (3,1) * (1, 2) -> (3, 2) * (3, 2)
torch.reshape(v, (3, 1)) * w # note that * is elementwise!

In [None]:
torch.reshape(v, (3, 1))

In [None]:
# Multiply a matrix by a constant:
x = torch.ones((2,3))

# x has shape (2, 3). PyTorch treats scalars as tensors of shape ();
# these can be broadcast together to shape (2, 3).

# (2,3) * () -> (2,3) * (1,1) -> (2,3) * (2,3)
x * 2

Broadcasting typically makes your code more concise and faster, so you should strive to use it where possible.
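
For the outer-product example above, the size-1 dimension can be added in several equivalent ways. The sketch below compares them with `torch.outer`, which is assumed to be available (it is in recent PyTorch releases).

```python
import torch

v = torch.tensor([1, 2, 3])   # shape (3,)
w = torch.tensor([4, 5])      # shape (2,)

outer_reshape = torch.reshape(v, (3, 1)) * w        # as in the cell above
outer_index = v[:, None] * w[None, :]               # None inserts a size-1 dimension
outer_unsqueeze = v.unsqueeze(1) * w.unsqueeze(0)   # same idea with explicit calls
outer_builtin = torch.outer(v, w)                   # dedicated function

print(outer_builtin.shape)    # torch.Size([3, 2])
```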

### Dynamic Computation Graph and Backpropagation



One of the main reasons for using PyTorch in Deep Learning projects is that we can automatically get **gradients/derivatives** of functions that we define.
We will mainly use PyTorch for implementing neural networks, and they are just fancy functions.
If we use weight matrices in our function that we want to learn, then those are called the **parameters** or simply the **weights**.

If our neural network would output a single scalar value, we would talk about taking the **derivative**, but you will see that quite often we will have **multiple** output variables ("values"); in that case we talk about **gradients**.
It's a more general term.

Given an input $\mathbf{x}$, we define our function by **manipulating** that input, usually by matrix-multiplications with weight matrices and additions with so-called bias vectors.
As we manipulate our input, we are automatically creating a **computational graph**.
This graph shows how to arrive at our output from our input.
PyTorch is a **define-by-run** framework; this means that we can just do our manipulations, and PyTorch will keep track of that graph for us.
Thus, we create a dynamic computation graph along the way.

So, to recap: the only thing we have to do is to compute the **output**, and then we can ask PyTorch to automatically get the **gradients**.

> **Why do we want gradients?**

> Consider that we have defined a function, a neural net, that is supposed to compute a certain output $y$ for an input vector $\mathbf{x}$.
We then define an **error measure** that tells us how wrong our network is; how bad it is in predicting output $y$ from input $\mathbf{x}$.
Based on this error measure, we can use the gradients to **update** the weights $\mathbf{W}$ that were responsible for the output, so that the next time we present input $\mathbf{x}$ to our network, the output will be closer to what we want.

The first thing we have to do is to specify which tensors require gradients.
By default, when we create a tensor, it does not require gradients.

In [None]:
x = torch.ones((3,))
print(x.requires_grad)

False


In [None]:
x

tensor([1., 1., 1.])

We can change this for an existing tensor using the function `requires_grad_()` (the underscore indicates that this is an in-place operation).
Alternatively, when creating a tensor, you can pass the argument
`requires_grad=True` to most initializers we have seen above.

In [None]:
x.requires_grad_(True)
print(x.requires_grad)

# alternative: set from creation e.g. x = torch.ones((3,), requires_grad=True)

True
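
The `requires_grad` flag propagates to everything computed from such a tensor. The sketch below also shows two common ways of opting out of gradient tracking, `torch.no_grad()` and `.detach()`; the tensor `w` is an illustrative stand-in for a tensor that does not need gradients.

```python
import torch

x = torch.ones(3, requires_grad=True)
w = torch.ones(3)                   # default: requires_grad=False

y = (x * w).sum()
print(y.requires_grad)              # True: y depends on x

with torch.no_grad():               # operations in this block are not tracked
    z = (x * w).sum()
print(z.requires_grad)              # False

x_detached = x.detach()             # same data, cut off from the graph
print(x_detached.requires_grad)     # False
```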


In order to get familiar with the concept of a computation graph, we will create one for the following function:

$$y = \frac{1}{|x|}\sum_i \left[(x_i + 2)^2 + 3\right]$$

You could imagine that $x$ are our parameters, and we want to optimize (either maximize or minimize) the output $y$.
For this, we want to obtain the gradients $\partial y / \partial \mathbf{x}$.
For our example, we'll use $\mathbf{x}=[0,1,2]$ as our input.

In [None]:
x = torch.arange(3, dtype=torch.float32, requires_grad=True)  # Only floating-point (and complex) tensors can require gradients
print("X", x)

X tensor([0., 1., 2.], requires_grad=True)


Now let's build the computation graph step by step.
You can combine multiple operations in a single line, but we will
separate them here to get a better understanding of how each operation
is added to the computation graph.

In [None]:
a = x + 2
b = a**2
c = b + 3
y = c.mean()
print("Y", y)

Y tensor(12.6667, grad_fn=<MeanBackward0>)


Using the statements above, we have created a computation graph that looks similar to the figure below:

<center style="width: 100%"><img src="https://github.com/Lightning-AI/lightning-tutorials/raw/main/course_UvA-DL/01-introduction-to-pytorch/pytorch_computation_graph.svg" width="200px"></center>

We calculate $a$ based on the inputs $x$ and the constant $2$, $b$ is $a$ squared, and so on.
The visualization is an abstraction of the dependencies between inputs and outputs of the operations we have applied.
Each node of the computation graph has automatically defined a function for calculating the gradients with respect to its inputs, `grad_fn`.
You can see this when we printed the output tensor $y$.
This is why the computation graph is usually visualized in the reverse direction (arrows point from the result to the inputs).
We can perform backpropagation on the computation graph by calling the
function `backward()` on the last output, which effectively calculates
the gradients for each tensor that has the property
`requires_grad=True`:

In [None]:
y.backward()

In [None]:
x

tensor([0., 1., 2.], requires_grad=True)

`x.grad` will now contain the gradient $\partial y/ \partial \mathbf{x}$, and this gradient indicates how a change in $\mathbf{x}$ will affect output $y$ given the current input $\mathbf{x}=[0,1,2]$:

In [None]:
print(x.grad)

tensor([1.3333, 2.0000, 2.6667])


We can also verify these gradients by hand.
We will calculate the gradients using the chain rule, in the same way as PyTorch did it:

$$\frac{\partial y}{\partial x_i} = \frac{\partial y}{\partial c_i}\frac{\partial c_i}{\partial b_i}\frac{\partial b_i}{\partial a_i}\frac{\partial a_i}{\partial x_i}$$

Note that we have simplified this equation to index notation, using the fact that all operations besides the mean do not combine the elements in the tensor.
The partial derivatives are:

$$
\frac{\partial a_i}{\partial x_i} = 1,\hspace{1cm}
\frac{\partial b_i}{\partial a_i} = 2\cdot a_i,\hspace{1cm}
\frac{\partial c_i}{\partial b_i} = 1,\hspace{1cm}
\frac{\partial y}{\partial c_i} = \frac{1}{3}
$$

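Multiplying these factors and substituting $a_i = x_i + 2$ gives $\partial y/\partial x_i = \frac{2}{3}(x_i + 2)$.
Hence, with the input being $\mathbf{x}=[0,1,2]$, our gradients are $\partial y/\partial \mathbf{x}=[4/3,2,8/3]$.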
The previous code cell should have printed the same result.
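
The hand calculation can also be checked numerically. The sketch below recomputes the gradient, compares it with the closed form $\frac{2}{3}(x_i+2)$ from the chain rule, and then shows that calling `backward()` a second time adds to `x.grad` instead of overwriting it. The names `analytic`, `first_grad` and `accumulated` are illustrative.

```python
import torch

x = torch.arange(3, dtype=torch.float32, requires_grad=True)

y = ((x + 2) ** 2 + 3).mean()
y.backward()
first_grad = x.grad.clone()

# Closed form from the chain rule: dy/dx_i = (1/3) * 1 * 2*(x_i + 2) * 1
analytic = 2 * (x.detach() + 2) / x.numel()
print(torch.allclose(first_grad, analytic))          # True

# Gradients accumulate: a second backward pass adds to x.grad
y = ((x + 2) ** 2 + 3).mean()                        # rebuild the graph first
y.backward()
accumulated = x.grad.clone()
print(torch.allclose(accumulated, 2 * first_grad))   # True

x.grad.zero_()                                       # reset before the next pass
```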

### GPU support



A crucial feature of PyTorch is the support of GPUs, short for Graphics Processing Unit.
A GPU can perform many thousands of small operations in parallel, making it well suited to the large matrix operations in neural networks.
When comparing GPUs to CPUs, we can list the following main differences (credit: [Kevin Krewell, 2009](https://blogs.nvidia.com/blog/2009/12/16/whats-the-difference-between-a-cpu-and-a-gpu/))

<center style="width: 100%"><img src="https://github.com/Lightning-AI/lightning-tutorials/raw/main/course_UvA-DL/01-introduction-to-pytorch/comparison_CPU_GPU.png" width="700px"></center>

CPUs and GPUs each have different advantages and disadvantages, which is why many computers contain both components and use them for different tasks.
In case you are not familiar with GPUs, you can read up more details in this [NVIDIA blog post](https://blogs.nvidia.com/blog/2009/12/16/whats-the-difference-between-a-cpu-and-a-gpu/) or [here](https://www.intel.com/content/www/us/en/products/docs/processors/what-is-a-gpu.html).

GPUs can accelerate the training of your network by up to a factor of $100$ (depending on the model and hardware), which is essential for large neural networks.
PyTorch implements a lot of functionality for supporting GPUs (mostly those of NVIDIA due to the libraries [CUDA](https://developer.nvidia.com/cuda-zone) and [cuDNN](https://developer.nvidia.com/cudnn)).
First, let's check whether you have a GPU available:

In [None]:
gpu_avail = torch.cuda.is_available()
print(f"Is the GPU available? {gpu_avail}")

Is the GPU available? False


If you have a GPU on your computer but the command above returns False, make sure you have the correct CUDA-version installed.
The `dl2020` environment comes with the CUDA-toolkit 10.1, which is selected for the Lisa supercomputer.
Please change it if necessary (CUDA 10.2 is currently common).
On Google Colab, make sure that you have selected a GPU in your runtime setup (in the menu, check under `Runtime -> Change runtime type`).

By default, all tensors you create are stored on the CPU.
We can push a tensor to the GPU by using the function `.to(...)`, or `.cuda()`.
However, it is often a good practice to define a `device` object in your code which points to the GPU if you have one, and otherwise to the CPU.
Then, you can write your code with respect to this device object, and it allows you to run the same code on both a CPU-only system, and one with a GPU.
Let's try it below.
We can specify the device as follows:

In [None]:
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
print("Device", device)

Device cpu


Now let's create a tensor and push it to the device:

In [None]:
x = torch.zeros(2, 3)
x = x.to(device)
print("X", x)

X tensor([[0., 0., 0.],
        [0., 0., 0.]])


In case you have a GPU, you should now see the attribute `device='cuda:0'` being printed next to your tensor.
The zero next to cuda indicates that this is the zero-th GPU device on your computer.
PyTorch also supports multi-GPU systems, but this you will only need once you have very big networks to train (if interested, see the [PyTorch documentation](https://pytorch.org/docs/stable/distributed.html#distributed-basics)).
We can also compare the runtime of a large matrix multiplication on the CPU with the same operation on the GPU:

In [None]:
import time

x = torch.randn(5000, 5000)

# CPU version
start_time = time.time()
_ = torch.matmul(x, x)
end_time = time.time()
print(f"CPU time: {(end_time - start_time):6.5f}s")

# GPU version
if torch.cuda.is_available():
    x = x.to(device)
    # CUDA is asynchronous, so we need to use different timing functions
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    _ = torch.matmul(x, x)
    end.record()
    torch.cuda.synchronize()  # Waits for everything to finish running on the GPU
    print(f"GPU time: {0.001 * start.elapsed_time(end):6.5f}s")  # Milliseconds to seconds

Depending on the size of the operation and the CPU/GPU in your system, the speedup of this operation can be >50x.
As `matmul` operations are very common in neural networks, we can already see the great benefit of training a NN on a GPU.
The time estimate can be relatively noisy here because we run each operation only once.
Feel free to extend this, but it also takes longer to run.
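
One possible extension, as a minimal sketch: average the time over several runs after a warm-up run, and write the code against a `device` object so that it runs on both CPU-only and GPU systems. `time_matmul` is a hypothetical helper, and the matrix size and repeat count are arbitrary.

```python
import time
import torch

def time_matmul(size, device, repeats=5):
    """Average wall-clock time in seconds of a (size x size) matmul on `device`."""
    x = torch.randn(size, size, device=device)   # create the tensor directly on the device
    torch.matmul(x, x)                           # warm-up run, not timed
    if device.type == "cuda":
        torch.cuda.synchronize()                 # wait for queued GPU work before timing
    start = time.perf_counter()
    for _ in range(repeats):
        torch.matmul(x, x)
    if device.type == "cuda":
        torch.cuda.synchronize()                 # make sure all timed work has finished
    return (time.perf_counter() - start) / repeats

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
print(f"{device}: {time_matmul(500, device):.5f}s per matmul")
```

Using wall-clock timing with explicit synchronization is a simpler alternative to the CUDA events used above; both account for the fact that CUDA calls are asynchronous.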

**Credits**

Parts of this tutorial were adapted from [CS236781 Technion Tutorial 00](https://github.com/vistalab-technion/cs236781-tutorials/tree/master/t00%20-%20python%2C%20numpy%2C%20torch) which was written by [Aviv A. Rosenberg](https://avivr.net).<br>


Some images in this tutorial were taken and/or adapted from the following sources:
- https://thegradient.pub/state-of-ml-frameworks-2019-pytorch-dominates-research-tensorflow-dominates-industry

The Python section here was adapted from:
- [CS231n Python tutorial](http://cs231n.github.io/python-numpy-tutorial/) by Justin Johnson.