A.S. Lundervold, 29.01.2024

# The building blocks of neural networks, Part 1: Tensors and tensor operations

> This two-part tutorial is meant to increase your familiarity with PyTorch and with the basic building blocks of artificial neural networks.

> This notebook is partly based on Chapter 2 of Chollet's textbook "Deep learning with Python", 2nd edition: https://livebook.manning.com/book/deep-learning-with-python-second-edition.

> Note that this notebook accompanies the lectures and is not complete in itself. It is meant as a reference for you to look at later.

As deep neural networks consist of chains of operations on objects called _tensors_, we'll take a closer look at _tensors_ and _tensor operations_.

**Plan:**

1. Define tensors
2. Vocabulary and examples
3. Tensor operations
4. A quick linear algebra refresher: geometric interpretations of tensor operations


**Takeaway**:

> Our main takeaway will be that **deep neural networks can be viewed as a long chain of geometric transformations**.

# Setup

In [2]:
# This is a quick check of whether the notebook is currently running on Google Colaboratory
# or on Kaggle, as that makes some difference for the code below.
try:
    import google.colab  # only importable when running on Google Colab
    colab = True
except ImportError:
    colab = False

import os
kaggle = os.environ.get('KAGGLE_KERNEL_RUN_TYPE', '')

In [3]:
if (colab or kaggle):
    !pip3 install torch torchvision

In [4]:
%matplotlib inline
import numpy as np, matplotlib.pyplot as plt, pandas as pd, sklearn.datasets
from pathlib import Path

Set up data directories:

In [5]:
NB_DIR = Path.cwd()
# Change this if you want to store the images that are downloaded
# below elsewhere on your computer.
if colab:
    from google.colab import drive
    drive.mount("/content/gdrive")
    DATADIR = Path("/content/gdrive/MyDrive/Colab Notebooks/data")
else:
    DATADIR = Path.home()/'data'
# parents=True also creates missing parent folders (e.g. "Colab Notebooks")
DATADIR.mkdir(parents=True, exist_ok=True)

In [6]:
import torch
import torchvision
import torch.nn.functional as F
import torchvision.transforms as transforms

# Load some data

In [7]:
transform = transforms.Compose([
    transforms.ToTensor()
])

mnist = torchvision.datasets.MNIST(root=DATADIR, train=True, download=True, transform=transform)
cifar = torchvision.datasets.CIFAR10(root=DATADIR, train=True, download=True, transform=transform)

In [8]:
housing = sklearn.datasets.fetch_california_housing()

housing_df = pd.DataFrame(housing.data, columns=housing.feature_names)

# Tensors

Tensors are multidimensional arrays. 

## Vocabulary: rank, shape and data type

The **rank** of a tensor is the number of axes it has. PyTorch calls this the number of dimensions and exposes it as the attribute `ndim` (equivalently, the method `dim()`). For example, a matrix, with its rows and columns, is a rank-2 tensor.

The **shape** of a tensor gives the size along each of its axes. For instance, a tensor of shape (3, 2) has 3 elements along the first axis (rows) and 2 along the second (columns).

A tensor's **data type** (`dtype`) is the type of the values stored in it, such as `torch.int64` or `torch.float32`. Unlike general-purpose containers such as Python lists, a tensor requires all its elements to have the same data type. Every element then occupies the same number of bytes and the elements can be laid out in one compact block of memory, which is what makes efficient, highly parallel linear algebra on CPUs, GPUs and other accelerators possible.
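A minimal sketch of this uniformity: when you pass a mix of Python ints and floats to `torch.tensor`, PyTorch picks one common dtype for every element. The byte counts come from `Tensor.element_size()` and assume PyTorch's default floating-point dtype has not been changed from `torch.float32`.

```python
import torch

# Integers only: PyTorch infers a 64-bit integer dtype
ints = torch.tensor([1, 2, 3])
print(ints.dtype, ints.element_size())    # torch.int64 8

# Mixing ints and a float promotes *every* element to the default float dtype
mixed = torch.tensor([1, 2.5, 3])
print(mixed.dtype, mixed.element_size())  # torch.float32 4
print(mixed)                              # tensor([1.0000, 2.5000, 3.0000])
```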

## Rank 0 tensors: scalars

Rank 0 tensors store a single scalar (an integer or a floating-point number).

In [9]:
data = 42

tns = torch.tensor(data)

In [10]:
tns

tensor(42)

In [11]:
tns.dtype

torch.int64

In [12]:
tns.shape

torch.Size([])

In [13]:
tns.ndim

0
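A rank 0 tensor is still a tensor, not a Python number. When you need the plain value (for printing or logging, say), `Tensor.item()` extracts it. A small sketch:

```python
import torch

tns = torch.tensor(42)
value = tns.item()         # plain Python int
print(type(value), value)  # <class 'int'> 42

# Arithmetic between rank 0 tensors gives another rank 0 tensor
total = tns + torch.tensor(8)
print(total, total.ndim)   # tensor(50) 0
```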

## Rank 1 tensors: vectors

Rank 1 tensors are _vectors_, i.e., _arrays of numbers_ with a single axis.

In [14]:
data = [12,13,14]

tns = torch.tensor(data)

In [15]:
tns

tensor([12, 13, 14])

In [16]:
tns.dtype

torch.int64

In [17]:
tns.shape

torch.Size([3])

In [18]:
tns.ndim

1

> **Warning:** When speaking about vectors, the **dimension of a vector** usually means the number of entries in it. This can be confusing: the vector [1,2,3] has dimension 3 (it is a point in 3D space), while as a tensor it has only one axis. It is therefore safer to use the word **rank** for the number of axes of a tensor: [1,2,3] is a rank 1 tensor with 3 entries along its single **axis**.
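The two notions of "dimension" correspond to different attributes in PyTorch. A small sketch contrasting them:

```python
import torch

v = torch.tensor([1, 2, 3])
print(v.ndim)     # 1 -> rank: the number of axes
print(len(v))     # 3 -> length of the first axis: the "dimension" of the vector
print(v.numel())  # 3 -> total number of elements
print(v.shape)    # torch.Size([3])
```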

### Examples

You'll often deal with rank 1 tensors. For example, each instance of a tabular data set can be represented as a vector containing all its feature values, as in this row of the California housing data:

In [20]:
housing_df.head(1)

|   | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 |
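To turn such a row into a rank 1 tensor, you can go through its NumPy values. The sketch below rebuilds the single row shown above by hand (values copied from the displayed output) so that it runs without downloading the data set; with the real data you would use `housing_df.iloc[0]` directly.

```python
import pandas as pd
import torch

# The first row of the California housing data, copied from the output above
row_df = pd.DataFrame(
    [[8.3252, 41.0, 6.984127, 1.02381, 322.0, 2.555556, 37.88, -122.23]],
    columns=["MedInc", "HouseAge", "AveRooms", "AveBedrms",
             "Population", "AveOccup", "Latitude", "Longitude"],
)

# .iloc[0] selects the first row as a Series; .to_numpy() gives a 1D float64 array
instance = torch.tensor(row_df.iloc[0].to_numpy())
print(instance.shape, instance.ndim, instance.dtype)  # torch.Size([8]) 1 torch.float64
```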


## Rank 2 tensors: matrices

Rank 2 tensors correspond to standard matrices, i.e., 2D arrays:

In [21]:
data = [[12,13,14], [15,16,17]]

tns = torch.tensor(data)

In [22]:
tns

tensor([[12, 13, 14],
        [15, 16, 17]])

In [23]:
tns.dtype

torch.int64

In [24]:
tns.shape

torch.Size([2, 3])

In [25]:
tns.ndim

2

### Examples

A common source of rank 2 tensors in machine learning is tabular data. Each data instance is a vector (rank 1 tensor) of feature values (think price, color, age, etc.), and a batch of data is a collection of such instances stacked on top of each other.

You end up with a matrix where the first axis is the sample axis and the second axis is the feature axis: (`samples`, `features`)
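A minimal sketch of how stacking instances produces the (`samples`, `features`) layout, using three hypothetical instances with 4 features each:

```python
import torch

# Three hypothetical instances, each a rank 1 tensor of 4 feature values
instances = [
    torch.tensor([1.0, 2.0, 3.0, 4.0]),
    torch.tensor([5.0, 6.0, 7.0, 8.0]),
    torch.tensor([9.0, 10.0, 11.0, 12.0]),
]

# Stacking along a new first axis gives shape (samples, features)
batch = torch.stack(instances)
print(batch.shape)  # torch.Size([3, 4])

# Indexing the first axis gives one sample; slicing the second gives one feature
print(batch[1])     # tensor([5., 6., 7., 8.])
print(batch[:, 0])  # tensor([1., 5., 9.])
```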

In [26]:
housing_df.head()

|   | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 |


In [27]:
tns = torch.tensor(housing.data)

In [28]:
tns

tensor([[   8.3252,   41.0000,    6.9841,  ...,    2.5556,   37.8800,
         -122.2300],
        [   8.3014,   21.0000,    6.2381,  ...,    2.1098,   37.8600,
         -122.2200],
        [   7.2574,   52.0000,    8.2881,  ...,    2.8023,   37.8500,
         -122.2400],
        ...,
        [   1.7000,   17.0000,    5.2055,  ...,    2.3256,   39.4300,
         -121.2200],
        [   1.8672,   18.0000,    5.3295,  ...,    2.1232,   39.4300,
         -121.3200],
        [   2.3886,   16.0000,    5.2547,  ...,    2.6170,   39.3700,
         -121.2400]], dtype=torch.float64)

In [29]:
tns.shape

torch.Size([20640, 8])

In [30]:
tns.dtype

torch.float64

In [31]:
tns.ndim

2

Time series and sequence data are another source of rank 2 tensors: a single sequence has shape (`timesteps`, `features`), with one feature vector per time step.
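For instance, a hypothetical record of 24 hourly measurements of 3 quantities is a single rank 2 tensor with one row per time step. Random values stand in for real measurements in this sketch:

```python
import torch

timesteps, features = 24, 3
sequence = torch.randn(timesteps, features)  # placeholder data

print(sequence.shape)        # torch.Size([24, 3])
print(sequence[0].shape)     # torch.Size([3]) -> all features at the first time step
print(sequence[:, 0].shape)  # torch.Size([24]) -> one feature over time
```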

## Tensors of rank 3 and more

If you stack several rank 2 tensors of the same shape along a new axis, you obtain a rank 3 tensor. Stacking rank 3 tensors gives a rank 4 tensor, and so on.

In [32]:
tns1 = torch.tensor(
            [[1,2,3], [4,5,6]]
        )

tns2 = torch.tensor(
            [[7,8,9], [10,11,12]]
        )

In [33]:
tns1.shape

torch.Size([2, 3])

In [34]:
tns2.shape

torch.Size([2, 3])

In [35]:
tns1.ndim, tns2.ndim

(2, 2)

In [36]:
tns = torch.stack((tns1, tns2))

In [37]:
tns

tensor([[[ 1,  2,  3],
         [ 4,  5,  6]],

        [[ 7,  8,  9],
         [10, 11, 12]]])

In [38]:
tns.shape

torch.Size([2, 2, 3])

In [39]:
tns.ndim

3

In [40]:
tns4 = torch.stack((tns, tns))

In [41]:
tns4.ndim

4
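`torch.stack` always inserts a new axis, which is why the rank goes up. Its sibling `torch.cat` instead joins tensors along an *existing* axis, so the rank stays the same. A small comparison using tensors shaped like `tns1` and `tns2` above:

```python
import torch

a = torch.tensor([[1, 2, 3], [4, 5, 6]])     # shape (2, 3)
b = torch.tensor([[7, 8, 9], [10, 11, 12]])  # shape (2, 3)

stacked = torch.stack((a, b))                # new axis in front: (2, 2, 3), rank 3
stacked_last = torch.stack((a, b), dim=-1)   # new axis at the end: (2, 3, 2)
concatenated = torch.cat((a, b))             # joined along axis 0: (4, 3), still rank 2

print(stacked.shape, stacked.ndim)            # torch.Size([2, 2, 3]) 3
print(stacked_last.shape)                     # torch.Size([2, 3, 2])
print(concatenated.shape, concatenated.ndim)  # torch.Size([4, 3]) 2
```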

### Examples

If you're dealing with **time series or sequence data**, where each instance is a rank 2 tensor (`timesteps`, `features`), a batch of instances is a rank 3 tensor: (`samples`, `timesteps`, `features`).
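A sketch of building such a batch from hypothetical sequences with random placeholder values. Note that `torch.stack` requires all sequences to have the same length.

```python
import torch

# Hypothetical: 5 sequences, each with 24 time steps and 3 features
sequences = [torch.randn(24, 3) for _ in range(5)]
batch = torch.stack(sequences)

print(batch.shape)  # torch.Size([5, 24, 3]) -> (samples, timesteps, features)
print(batch.ndim)   # 3
```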

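Individual **images** are also typically represented as rank 3 tensors. The three axes of an image tensor are height, width and channel (typically color channel): (`height`, `width`, `channels`). For color images, the three channels are typically R, G and B. Grayscale images have a single channel, so one typically inserts a channel axis of size 1. Note that PyTorch models usually expect the channel axis first, (`channels`, `height`, `width`); `transforms.ToTensor()` performs this conversion (and scales pixel values to [0, 1]).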

If you're dealing with data consisting of multiple images, you'll typically represent it as a rank 4 tensor: (`samples`, `height`, `width`, `channels`).
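The raw CIFAR-10 arrays are stored channels-last, while PyTorch layers such as `torch.nn.Conv2d` expect channels-first. A minimal sketch of the axis reordering, using a random `uint8` array of CIFAR's shape (32, 32, 3) as a stand-in for a real image:

```python
import numpy as np
import torch

# Stand-in for one CIFAR-10 image: (height, width, channels), uint8 pixel values
image_hwc = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)

# permute reorders the axes to (channels, height, width) without copying data
image_chw = torch.from_numpy(image_hwc).permute(2, 0, 1)
print(image_chw.shape)          # torch.Size([3, 32, 32])

# Stacking several images adds the samples axis: a rank 4 tensor (samples, C, H, W)
batch = torch.stack([image_chw] * 4)
print(batch.shape, batch.ndim)  # torch.Size([4, 3, 32, 32]) 4

# Grayscale (MNIST-sized): insert a channel axis of size 1 in front
gray = torch.zeros(28, 28).unsqueeze(0)
print(gray.shape)               # torch.Size([1, 28, 28])
```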

In [42]:
cifar.data[0].shape

(32, 32, 3)

In [43]:
cifar.data[0].ndim

3

In [44]:
mnist.data[0].shape

torch.Size([28, 28])

In [45]:
mnist_example = mnist.data[0].unsqueeze(-1)

In [46]:
mnist_example.shape

torch.Size([28, 28, 1])

In [47]:
mnist_example.ndim

3

A batch of images will then be a rank 4 tensor:

In [48]:
cifar.data.shape

(50000, 32, 32, 3)

In [49]:
cifar.data.ndim

4

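#### 3D images

Volumetric images, such as MRI scans (a single sagittal slice of a brain MRI is shown below), add a depth axis: (`depth`, `height`, `width`, `channels`). A single 3D image is therefore a rank 4 tensor, and a batch of them a rank 5 tensor.

<img src="https://upload.wikimedia.org/wikipedia/commons/c/c5/MRI_brain_sagittal_section.jpg">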

#### Video

A video is a series of image frames. As the images are rank 3 tensors, a video can be represented as a rank 4 tensor by stacking the frames. A batch of videos will then be a rank 5 tensor: 

`(samples, frames, height, width, channels)`
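A sketch of the shapes involved, for a hypothetical clip of 16 RGB frames of 64 x 64 pixels filled with random placeholder values:

```python
import torch

# Hypothetical clip: 16 frames, each an RGB image of 64 x 64 pixels
frames = [torch.rand(64, 64, 3) for _ in range(16)]
video = torch.stack(frames)           # (frames, height, width, channels)
print(video.shape, video.ndim)        # torch.Size([16, 64, 64, 3]) 4

# A batch of 2 such videos is a rank 5 tensor
videos = torch.stack([video, video])  # (samples, frames, height, width, channels)
print(videos.shape, videos.ndim)      # torch.Size([2, 16, 64, 64, 3]) 5
```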

# Tensor operations and linear algebra

All the operations in a deep neural network are based on a few simple tensor operations, like addition, multiplication and simple nonlinear functions applied to tensors. 
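As a minimal sketch, a single fully connected (dense) layer computes `relu(W @ x + b)`: one matrix multiplication, one addition and one elementwise nonlinearity. The sizes and random weights below are placeholders chosen for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(8)     # one input vector with 8 features
W = torch.randn(4, 8)  # weight matrix: 8 inputs -> 4 outputs
b = torch.randn(4)     # bias vector

z = W @ x + b          # matrix-vector product followed by tensor addition
out = F.relu(z)        # elementwise nonlinearity: max(z, 0)

print(out.shape)                  # torch.Size([4])
print(torch.all(out >= 0).item())  # True
```

A deep network chains many such layers, each feeding its output tensor into the next.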

Since tensors are multidimensional arrays, **linear algebra** is at the heart of deep learning. And because linear algebra has a natural **geometric** interpretation, this gives us a geometric point of view on deep learning.

We discussed this in class. Here's a simple example: 

## Example: matrix multiplication

Consider a neural network layer transforming a 2D input $(x, y)$ into another 2D output. This transformation can be represented by multiplying the input vector by a $2 \times 2$ matrix $W$ (the weights of the layer).

Given an input vector $[x, y]$ and a weight matrix $W = \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix}$, the output is computed as:

$$
\text{Output} = \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} w_{11} \cdot x + w_{12} \cdot y \\ w_{21} \cdot x + w_{22} \cdot y \end{bmatrix}
$$
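The formula can be checked numerically. The weights and input below are arbitrary example values; the sketch compares PyTorch's `@` operator with the component-by-component expression:

```python
import torch

# Example weights (w11, w12, w21, w22) = (2, 1, 0, 3) and input (x, y) = (1, 2)
W = torch.tensor([[2.0, 1.0],
                  [0.0, 3.0]])
v = torch.tensor([1.0, 2.0])

out = W @ v  # matrix-vector product

# The same result written out component by component, as in the formula
manual = torch.stack([W[0, 0] * v[0] + W[0, 1] * v[1],
                      W[1, 0] * v[0] + W[1, 1] * v[1]])

print(out)                       # tensor([4., 6.])
print(torch.equal(out, manual))  # True
```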

This operation can be understood as mapping the 2D input to a new position determined by the weight matrix $W$. Depending on the values of $w_{11}, w_{12}, w_{21}$ and $w_{22}$, the transformation can rotate, scale, shear or reflect the input, or combine these effects (and it collapses the plane onto a line or a point when $W$ is singular).
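A sketch of a few standard choices of $W$ applied to the example input $(1, 1)$. The matrices are textbook examples picked for illustration, not weights from a trained network. The last lines show that applying two transformations one after the other equals applying their matrix product, a small instance of the "chain of geometric transformations" view (ignoring nonlinearities).

```python
import math
import torch

v = torch.tensor([1.0, 1.0])  # example input vector (x, y) = (1, 1)

theta = math.pi / 2  # rotate by 90 degrees counter-clockwise
transforms_2d = {
    "rotation": torch.tensor([[math.cos(theta), -math.sin(theta)],
                              [math.sin(theta),  math.cos(theta)]]),
    "scaling": torch.tensor([[2.0, 0.0],
                             [0.0, 0.5]]),      # stretch x by 2, halve y
    "shear": torch.tensor([[1.0, 1.0],
                           [0.0, 1.0]]),        # shift x in proportion to y
    "reflection": torch.tensor([[1.0, 0.0],
                                [0.0, -1.0]]),  # mirror across the x-axis
}

for name, W in transforms_2d.items():
    print(f"{name:>10}: {W @ v}")

# Chaining two linear layers = multiplying their matrices
R, S = transforms_2d["rotation"], transforms_2d["shear"]
print(torch.allclose(S @ (R @ v), (S @ R) @ v))  # True
```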


Here's a visualization of this operation:

In [50]:
import ipywidgets as widgets
from IPython.display import display

In [51]:
def plot_vectors(input_vector, transformed_vector, weight_matrix_str):
    """Plot the original and transformed vectors."""
    plt.figure(figsize=(6, 6))
    plt.plot([0, input_vector[0]], [0, input_vector[1]], 'bo-', label='Original Vector')
    plt.plot([0, transformed_vector[0]], [0, transformed_vector[1]], 'ro-', label='Transformed Vector')
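    # Widen the axes when needed: weights in [-2, 2] and inputs in [-5, 5]
    # can give components up to 20 in absolute value.
    lim = max(10, np.abs(transformed_vector).max() + 1)
    plt.xlim(-lim, lim)
    plt.ylim(-lim, lim)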
    plt.axhline(0, color='grey', lw=1)
    plt.axvline(0, color='grey', lw=1)
    plt.grid(True)
    plt.legend()
    plt.text(1, 5, weight_matrix_str, fontsize=12, bbox=dict(facecolor='white', alpha=0.8))
    plt.show()

def plot_transformed_point(w11, w12, w21, w22, input_x, input_y):
    """Calculate the transformed vector and plot it."""
    weight_matrix = np.array([[w11, w12], [w21, w22]])
    input_vector = np.array([input_x, input_y])
    transformed_vector = np.dot(weight_matrix, input_vector)
    weight_matrix_str = f"Weight Matrix:\n[[{w11:.2f}, {w12:.2f}]\n [{w21:.2f}, {w22:.2f}]]"
    plot_vectors(input_vector, transformed_vector, weight_matrix_str)

In [52]:
def create_slider(description, value=1.0, min=-2.0, max=2.0, step=0.1):
    """Create a slider with the given properties."""
    return widgets.FloatSlider(value=value, min=min, max=max, step=step, description=description)

# Create a dictionary to store the sliders
sliders = {
    'w11': create_slider('Weight w11:'),
    'w12': create_slider('Weight w12:', value=0.0),
    'w21': create_slider('Weight w21:', value=0.0),
    'w22': create_slider('Weight w22:'),
    'input_x': create_slider('Input x:', min=-5.0, max=5.0),
    'input_y': create_slider('Input y:', min=-5.0, max=5.0),
}

# Assembling the UI
ui = widgets.VBox(list(sliders.values()))
out = widgets.interactive_output(plot_transformed_point, sliders)

In [53]:
display(ui, out)
