A.S. Lundervold, 14.10.2022

# The building blocks of neural networks, Part 2: Tensors and tensor operations

> This two-part tutorial is meant to increase your familiarity with the basics of PyTorch and the basic building blocks of artificial neural networks. 

> This notebook is partly based on Chapter 2 of Chollet's textbook "Deep learning with Python", 2nd edition: https://livebook.manning.com/book/deep-learning-with-python-second-edition.

As deep neural networks consist of chained operations on objects called _tensors_, we'll take a closer look at _tensors_ and _tensor operations_.

**Plan:**

1. Define tensors
2. Vocabulary and examples
3. Tensor operations
4. A quick linear algebra refresher: geometric interpretations of tensor operations


**Takeaway**:

> Our main takeaway will be that **deep neural networks can be viewed as a long chain of geometric transformations**.

# Setup

In [1]:
# This is a quick check of whether the notebook is currently running on Google Colaboratory,
# as that makes some difference for the code below.
try:
    import google.colab  # only importable on Colab
    colab = True
except ImportError:
    colab = False

In [2]:
#if colab:
#    !pip3 install -U torch torchvision

In [3]:
%matplotlib inline
import numpy as np, matplotlib.pyplot as plt, pandas as pd, sklearn.datasets
from pathlib import Path

Set up data directories:

In [4]:
NB_DIR = Path.cwd()
# Change DATADIR if you want to store the datasets that are downloaded
# below elsewhere on your computer.
if colab:
    from google.colab import drive
    drive.mount("/content/gdrive")
    DATADIR = Path("/content/gdrive/MyDrive/Colab Notebooks/data")
else:
    DATADIR = Path.home()/'data'
# Create the directory (and any missing parent folders) if needed
DATADIR.mkdir(parents=True, exist_ok=True)

In [5]:
import torch
import torchvision
import torch.nn.functional as F
import torchvision.transforms as transforms

# Load some data

In [6]:
transform = transforms.Compose([
    transforms.ToTensor()
])

mnist = torchvision.datasets.MNIST(root=DATADIR, train=True, download=True, transform=transform)
cifar = torchvision.datasets.CIFAR10(root=DATADIR, train=True, download=True, transform=transform)

Files already downloaded and verified


In [7]:
housing = sklearn.datasets.fetch_california_housing()

housing_df = pd.DataFrame(housing.data, columns=housing.feature_names)

# Tensors

Tensors are multidimensional arrays. 

## Vocabulary: the rank of a tensor

The **rank** of a tensor is the **number of axes**. In PyTorch this is called the **number of dimensions**, or `ndim`.

The **shape** of a tensor lists the number of entries along each axis.

The **data type** of a tensor is the data type of the values stored in it. Unlike a general container such as a Python list, a tensor has a single data type shared by all its entries. This uniformity, combined with GPUs or other accelerators, makes linear algebra computations on tensors far more efficient.
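A minimal sketch of how these three attributes relate, using an illustrative tensor `t`: the rank equals the length of the shape, the total number of entries is the product of the axis sizes, and all entries share one `dtype`. Mixing integers and floats in the input makes PyTorch pick a single floating-point type for the whole tensor.

```python
import math
import torch

# An illustrative rank 2 tensor with 2 rows and 3 columns
t = torch.tensor([[1, 2, 3], [4, 5, 6]])

print(t.ndim)    # rank: number of axes
print(t.shape)   # number of entries along each axis
print(t.dtype)   # one data type shared by all entries

# The rank is the number of entries in the shape
assert t.ndim == len(t.shape)
# The total number of entries is the product of the axis sizes
assert t.numel() == math.prod(t.shape)

# Mixing ints and floats: PyTorch chooses one float dtype for every entry
mixed = torch.tensor([1, 2.5, 3])
print(mixed.dtype)  # torch.float32 with PyTorch's default settings

# Converting to another dtype creates a new tensor
as_float = t.to(torch.float32)
print(as_float.dtype)
```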

## Rank 0 tensors: scalars

Rank 0 tensors store scalars (integers or floats).

In [8]:
data = 42

tns = torch.tensor(data)

In [9]:
tns

tensor(42)

In [10]:
tns.dtype

torch.int64

In [11]:
tns.shape

torch.Size([])

In [12]:
tns.ndim

0

...
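Two small additions that often come in handy, sketched with the same `torch` API: indexing a single entry of a vector gives back a rank 0 tensor, and `.item()` converts a rank 0 tensor into a plain Python number.

```python
import torch

vec = torch.tensor([12, 13, 14])

# Picking out one entry of a rank 1 tensor gives a rank 0 tensor
entry = vec[1]
print(entry, entry.ndim)   # tensor(13) 0

# .item() turns a single-entry tensor into a Python scalar
value = entry.item()
print(type(value), value)  # <class 'int'> 13
```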

## Rank 1 tensors: vectors

Rank 1 tensors are _vectors_ or _arrays of numbers_.

In [13]:
data = [12,13,14]

tns = torch.tensor(data)

In [14]:
tns

tensor([12, 13, 14])

In [15]:
tns.dtype

torch.int64

In [16]:
tns.shape

torch.Size([3])

In [17]:
tns.ndim

1

> **Warning:** When speaking about vectors, the **dimension of a vector** is the number of entries in the vector. This can be a bit confusing. The number of dimensions of the vector [1,2,3] is 3, while its dimension as a tensor is 1. It is therefore safer to use the word **rank** when referring to the tensor dimension (as in, [1,2,3] is a rank 1 tensor). In other words, it has only one **axis**.
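To make the distinction concrete: the vector below is three-dimensional in the linear algebra sense (three entries), but as a tensor it has rank 1 (one axis).

```python
import torch

v = torch.tensor([1, 2, 3])

rank = v.ndim             # number of axes: 1
vector_dim = v.shape[0]   # number of entries along that axis: 3
print(rank, vector_dim)   # 1 3

# len() returns the size of the first axis, not the rank
assert len(v) == vector_dim
```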

### Examples

You'll often deal with rank 1 tensors. For example, each instance of a tabular data set can be represented as a vector containing all the corresponding feature values.

In [18]:
housing_df.head(1)

| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 |
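A minimal sketch of turning that first row into a rank 1 tensor. The values are copied from the table above, so the sketch does not need the dataset; with the real data, `torch.from_numpy(housing.data[0])` should give the same kind of tensor. NumPy arrays of floats default to `float64`, whereas PyTorch models usually work in `float32`, so a conversion is common.

```python
import numpy as np
import torch

# First row of the California housing data (values copied from the table above)
row = np.array([8.3252, 41.0, 6.984127, 1.02381, 322.0, 2.555556, 37.88, -122.23])

x = torch.from_numpy(row)   # shares memory with the NumPy array, keeps float64
print(x.shape, x.dtype)     # torch.Size([8]) torch.float64

x32 = x.float()             # a float32 copy
print(x32.dtype)            # torch.float32
```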


## Rank 2 tensors: matrices

Rank 2 tensors correspond to standard 2D matrices or arrays:

In [19]:
data = [[12,13,14], [15,16,17]]

tns = torch.tensor(data)

In [20]:
tns

tensor([[12, 13, 14],
        [15, 16, 17]])

In [21]:
tns.dtype

torch.int64

In [22]:
tns.shape

torch.Size([2, 3])

In [23]:
tns.ndim

2

### Examples

A common way to end up with rank 2 tensors in machine learning is as a representation of tabular data sets. Each data instance consists of a vector (rank 1 tensor) of feature values (think price, color, age, etc.), and a batch of data is a collection of such instances.

You end up with a matrix where the first axis is the sample axis and the second axis is the feature axis: (`samples`, `features`)

In [24]:
housing_df.head()

| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 |


In [25]:
tns = torch.tensor(housing.data)

In [26]:
tns

tensor([[   8.3252,   41.0000,    6.9841,  ...,    2.5556,   37.8800,
         -122.2300],
        [   8.3014,   21.0000,    6.2381,  ...,    2.1098,   37.8600,
         -122.2200],
        [   7.2574,   52.0000,    8.2881,  ...,    2.8023,   37.8500,
         -122.2400],
        ...,
        [   1.7000,   17.0000,    5.2055,  ...,    2.3256,   39.4300,
         -121.2200],
        [   1.8672,   18.0000,    5.3295,  ...,    2.1232,   39.4300,
         -121.3200],
        [   2.3886,   16.0000,    5.2547,  ...,    2.6170,   39.3700,
         -121.2400]], dtype=torch.float64)

In [27]:
tns.shape

torch.Size([20640, 8])

In [28]:
tns.dtype

torch.float64

In [29]:
tns.ndim

2
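With data laid out as (`samples`, `features`), indexing along the first axis picks out instances and indexing along the second picks out features. A minimal sketch, using a random stand-in tensor with the same number of columns as the housing data (the 100 rows are arbitrary):

```python
import torch

# Stand-in for a (samples, features) table: 100 samples, 8 features
data = torch.randn(100, 8)

first_sample = data[0]        # one instance: rank 1, shape (8,)
first_feature = data[:, 0]    # one feature across all samples: shape (100,)
batch = data[:32]             # a mini-batch of 32 samples: shape (32, 8)

print(first_sample.shape, first_feature.shape, batch.shape)
```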

Another way to end up with rank 2 tensors is through time series or sequence data: (`timesteps`, `features`).
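For instance, a sketch assuming a hypothetical sensor that records temperature and humidity once per hour for a day (the values are made up): stacking the two per-feature series along a new second axis gives a (`timesteps`, `features`) tensor.

```python
import torch

timesteps = 24  # hypothetical: one measurement per hour for one day

# Two made-up feature series, each a rank 1 tensor of length 24
temperature = torch.linspace(10.0, 20.0, timesteps)
humidity = torch.linspace(0.8, 0.5, timesteps)

# dim=1 places the features side by side: shape (timesteps, features)
series = torch.stack((temperature, humidity), dim=1)
print(series.shape)  # torch.Size([24, 2])
```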

## Tensors of rank 3 and more

If you stack several rank 2 tensors you'll obtain a rank 3 tensor. If you stack rank 3 tensors, you'll have a rank 4 tensor. And so on.

In [30]:
tns1 = torch.tensor(
            [[1,2,3], [4,5,6]]
        )

tns2 = torch.tensor(
            [[7,8,9], [10,11,12]]
        )

In [31]:
tns1.shape

torch.Size([2, 3])

In [32]:
tns2.shape

torch.Size([2, 3])

In [33]:
tns1.ndim, tns2.ndim

(2, 2)

In [34]:
tns = torch.stack((tns1, tns2))

In [35]:
tns

tensor([[[ 1,  2,  3],
         [ 4,  5,  6]],

        [[ 7,  8,  9],
         [10, 11, 12]]])

In [36]:
tns.shape

torch.Size([2, 2, 3])

In [37]:
tns.ndim

3

In [38]:
tns4 = torch.stack((tns, tns))

In [39]:
tns4.ndim

4
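`torch.stack` always adds a new axis, so the rank goes up by one. The closely related `torch.cat` instead joins tensors along an existing axis and keeps the rank unchanged. A sketch with the same two matrices as above:

```python
import torch

tns1 = torch.tensor([[1, 2, 3], [4, 5, 6]])
tns2 = torch.tensor([[7, 8, 9], [10, 11, 12]])

# stack: adds a new axis, rank 2 -> rank 3
stacked0 = torch.stack((tns1, tns2))         # new first axis
stacked2 = torch.stack((tns1, tns2), dim=2)  # new last axis
print(stacked0.shape)  # torch.Size([2, 2, 3])
print(stacked2.shape)  # torch.Size([2, 3, 2])

# cat: joins along an existing axis, rank stays 2
cat0 = torch.cat((tns1, tns2))         # more rows
cat1 = torch.cat((tns1, tns2), dim=1)  # more columns
print(cat0.shape)  # torch.Size([4, 3])
print(cat1.shape)  # torch.Size([2, 6])
```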

### Examples

If you're dealing with **time series or sequence data** (i.e., where each instance is a rank 2 tensor (`timesteps`, `features`)), you'll end up with rank 3 tensors: (`samples`, `timesteps`, `features`).
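A minimal sketch with made-up sizes (4 sequences, 10 timesteps, 3 features): stacking several (`timesteps`, `features`) tensors along a new first axis gives the rank 3 batch. `torch.stack` requires all inputs to have the same shape, so sequences of different lengths would first need to be padded or truncated.

```python
import torch

# Four hypothetical sequences, each of shape (timesteps=10, features=3)
sequences = [torch.randn(10, 3) for _ in range(4)]

batch = torch.stack(sequences)  # shape (samples, timesteps, features)
print(batch.shape)              # torch.Size([4, 10, 3])
print(batch.ndim)               # 3
```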

Individual **images** are also typically represented as rank 3 tensors. The three axes of an image tensor are height, width and channel (typically color channel): (`height`, `width`, `channels`). For color images, the three channels are typically R, G and B. For grayscale images, one typically inserts a single channel axis.

If you're dealing with image data consisting of multiple images, you'll typically represent your data as rank 4 tensors: (`samples`, `height`, `width`, `channels`).

In [40]:
cifar.data[0].shape

(32, 32, 3)

In [41]:
cifar.data[0].ndim

3

In [42]:
mnist.data[0].shape

torch.Size([28, 28])

In [43]:
mnist_example = mnist.data[0].unsqueeze(-1)

In [44]:
mnist_example.shape

torch.Size([28, 28, 1])

In [45]:
mnist_example.ndim

3
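`unsqueeze(-1)` adds the channel axis at the end, matching the (`height`, `width`, `channels`) layout described above. PyTorch's own image layers and `transforms.ToTensor()` use a channels-first layout instead, which corresponds to `unsqueeze(0)`. A sketch with a blank stand-in for an MNIST digit:

```python
import torch

img = torch.zeros(28, 28)  # stand-in for one grayscale 28x28 image

channels_last = img.unsqueeze(-1)  # (28, 28, 1)
channels_first = img.unsqueeze(0)  # (1, 28, 28)
also_last = img[..., None]         # indexing with None also inserts an axis

print(channels_last.shape, channels_first.shape, also_last.shape)

# squeeze removes an axis of size 1 again
print(channels_last.squeeze(-1).shape)  # torch.Size([28, 28])
```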

A batch of images will then be a rank 4 tensor:

In [46]:
cifar.data.shape

(50000, 32, 32, 3)

In [47]:
cifar.data.ndim

4
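Note that `cifar.data` is a NumPy array (its shape prints as a plain tuple), while `mnist.data` is already a PyTorch tensor. A sketch of converting such a batch and rearranging it to the channels-first layout (`samples`, `channels`, `height`, `width`) that PyTorch convolution layers such as `torch.nn.Conv2d` expect; a small random array stands in for the 50,000 CIFAR images:

```python
import numpy as np
import torch

# Stand-in for cifar.data: 10 RGB images of 32x32 pixels stored as uint8
images = np.random.randint(0, 256, size=(10, 32, 32, 3), dtype=np.uint8)

batch = torch.from_numpy(images)      # (samples, height, width, channels)
batch_cf = batch.permute(0, 3, 1, 2)  # (samples, channels, height, width)

print(batch.shape)     # torch.Size([10, 32, 32, 3])
print(batch_cf.shape)  # torch.Size([10, 3, 32, 32])
```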

#### 3D images

<img src="https://upload.wikimedia.org/wikipedia/commons/c/c5/MRI_brain_sagittal_section.jpg">
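A volumetric image such as an MRI scan adds one more spatial axis. A sketch with hypothetical sizes (64 slices of 128x128 pixels, a single intensity channel), keeping the channels-last ordering used above:

```python
import torch

# Hypothetical MRI volume: (depth, height, width, channels)
volume = torch.zeros(64, 128, 128, 1)
print(volume.ndim)  # 4

# A batch of such volumes is a rank 5 tensor:
# (samples, depth, height, width, channels)
scans = torch.stack([volume, volume, volume])
print(scans.shape)  # torch.Size([3, 64, 128, 128, 1])
```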

#### Video

A video is a series of image frames. As the images are rank 3 tensors, a video can be represented as a rank 4 tensor by stacking the frames. A batch of videos will then be a rank 5 tensor: 

`(samples, frames, height, width, color_depth)` 
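A sketch with made-up sizes (2 clips of 16 RGB frames, each 64x64 pixels): stacking frames gives a rank 4 video tensor, and stacking videos gives the rank 5 batch.

```python
import torch

# 16 hypothetical RGB frames, each (height, width, color_depth)
frames = [torch.rand(64, 64, 3) for _ in range(16)]

video = torch.stack(frames)           # (frames, height, width, color_depth)
videos = torch.stack([video, video])  # (samples, frames, height, width, color_depth)

print(video.shape)   # torch.Size([16, 64, 64, 3])
print(videos.shape)  # torch.Size([2, 16, 64, 64, 3])
```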

# Tensor operations: A quick linear algebra refresher and deep learning from a geometric point of view

All the operations in a deep neural network are based on a few simple tensor operations, like addition, multiplication and simple nonlinear functions applied to tensors. 
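For instance, a single fully connected (dense) layer needs only three such operations: a matrix multiplication, an addition and an elementwise nonlinearity (here ReLU). A minimal sketch with random weights and made-up sizes:

```python
import torch

torch.manual_seed(0)

x = torch.randn(4, 8)   # a batch of 4 samples with 8 features each
W = torch.randn(8, 3)   # weights mapping 8 features to 3 outputs
b = torch.randn(3)      # one bias per output

# Matrix multiplication, then addition (b is broadcast over the 4 rows),
# then ReLU, which sets negative entries to zero
out = torch.relu(x @ W + b)
print(out.shape)  # torch.Size([4, 3])
```

Stacking several such layers, each feeding its output to the next, is exactly the "long chain" of tensor operations mentioned in the takeaway.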

Since tensors are multidimensional arrays, **linear algebra** is at the heart of deep learning. And since linear algebra is very **geometric**, this gives us a geometric point of view on deep learning.
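As a first taste of this geometric view, a sketch: multiplying 2D points by a rotation matrix rotates them about the origin. The matrix below rotates by 90 degrees counterclockwise, so the point (1, 0) ends up at (0, 1).

```python
import math
import torch

theta = math.pi / 2  # 90 degrees

# 2x2 matrix for a counterclockwise rotation by theta
R = torch.tensor([[math.cos(theta), -math.sin(theta)],
                  [math.sin(theta),  math.cos(theta)]])

points = torch.tensor([[1.0, 0.0],
                       [0.0, 1.0]])  # one point per row

# Points stored as rows are rotated by multiplying with the transpose of R
rotated = points @ R.T
print(rotated)  # approximately [[0, 1], [-1, 0]]
```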

To see what I mean by this, let's quickly refresh your linear algebra knowledge!

> See the accompanying slides here: [URL](https://alexander.lundervold.com/slides/PCS956-DL-linalg/PCS-DL-linalg.html)