# 00. PyTorch Fundamentals

https://www.learnpytorch.io/00_pytorch_fundamentals/

## Table of Contents

- [All Links in Document](#links)
- [Theoretical Introduction](#theory)
- [Introduction to Tensors](#intro)
- [Random Tensors](#random)
- [Zeros and Ones](#zeroone)
- [Range of Tensors and Tensor-like](#rangelike)
- [Tensor Data Types](#datatypes)
- [Manipulating Tensors](#manipulate)
- [Tensor Aggregation](#tensoragg)
- [Indexing](#indexing)
- [Reproducibility](#reproduce)
- [GPU Usage](#gpus)
- [Exercises](#exercises)

## All Links in Document <a name="links" />

- https://www.learnpytorch.io/00_pytorch_fundamentals/
- https://paperswithcode.com/trends
- What's a Tensor? https://www.youtube.com/watch?v=f5liqUk0ZTw
- https://pytorch.org/docs/stable/tensors
- https://www.mathsisfun.com/algebra/matrix-multiplying
- http://matrixmultiplication.xyz/
- https://pytorch.org/docs/stable/notes/randomness
- https://en.wikipedia.org/wiki/Random_seed
- https://pytorch.org/get-started/locally/
- https://pytorch.org/docs/stable/notes/cuda#best-practices

## Theoretical Introduction <a name="theory" />

PyTorch is a very popular deep learning framework (see https://paperswithcode.com/trends).

Machine learning can be explained as turning data into numbers and finding patterns in those numbers. The computer finds these patterns using code and math. Artificial Intelligence (AI) is typically the overarching umbrella term: Machine Learning is a subfield of AI, and Deep Learning is in turn a subfield of Machine Learning.

<img src="images/00_difference.png" width="750"/>

The above image shows a typical difference between traditional programming and Machine Learning. With traditional programming, the inputs are given and the rules for what to do with them are designed upfront; given the same inputs, the result is always the same, much like a person following a recipe to prepare a dish. With machine learning, the inputs and the desired outputs are given, and it's up to the algorithm to discover the rules that connect them. Note that the above example describes supervised learning, as both the inputs and the desired outputs are given to the model.

But why use Machine Learning or Deep Learning at all? For very complex problems, it may not be possible to write down all the rules by hand. This is where Deep Learning can help. On the flip side, rule 1 of Google's Machine Learning Handbook states: **"If you can build a simple rule-based system that doesn't require machine learning, do that."** Deep Learning is therefore not a panacea for every situation. So what should Deep Learning be used for? Examples are:
- **Problems with long lists of rules** - when the traditional approach fails, machine learning/deep learning may help
- **Continually changing environments** - deep learning can adapt ('learn') to new scenarios
- **Discovering insights within large collections of data** - can you imagine trying to hand-craft rules for what 101 different kinds of food look like?

It's also important to zoom in on when Deep Learning tends to not be useful. Examples are:
- **When you need explainability** - the patterns learned by a deep learning model are typically uninterpretable by a human
- **When the traditional approach is a better option** - if you can accomplish what you need with a simple rule-based system
- **When errors are unacceptable** - since the outputs of a deep learning model aren't always predictable
- **When you don't have much data** - deep learning models usually require a fairly large amount of data to produce great results

While the terms are occasionally used interchangeably, there are some general differences between Machine Learning and Deep Learning, and both fields keep developing and changing. Traditional Machine Learning algorithms are typically used on structured data, such as tables, whereas Deep Learning tends to be more useful for unstructured data, such as audio files or text. As always, there are exceptions to these rules.

Examples of Machine Learning algorithms are:
- Random Forest
- Gradient Boosted Models
- Naive Bayes
- Nearest Neighbour
- Support Vector Machine
- ...many more

Examples of Deep Learning algorithms are:
- Neural Networks
- Fully Connected Neural Network
- Convolutional Neural Network
- Recurrent Neural Network
- Transformer
- ...many more

Depending on how a specific problem is represented, many algorithms can be used for both structured and unstructured data. Working with PyTorch in this research will mainly focus on Neural Networks, specifically Fully Connected Neural Networks and Convolutional Neural Networks.

<img src="images/00_neural.png" width="750"/>

The above image shows the typical workflow of a Neural Network. First, input is given, which can come in various forms; the example image shows unstructured data in the form of a tweet. This data needs to be transformed into numbers through encoding, resulting in arrays (or tensors) of numbers. Through its network of nodes and layers, the Neural Network then learns representations of the data, such as patterns, features, and weights. The output is also numerical, so it needs to be transformed back into a form that humans can interpret.
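The encoding step can be made concrete with a deliberately simplified sketch. It assumes a hypothetical example sentence (not the tweet from the image) and a toy word-level vocabulary that maps each unique word to an integer ID; real pipelines use proper tokenizers and learned embeddings instead.

```python
import torch

# Hypothetical example input, standing in for the tweet in the image
text = "pytorch makes tensors easy"

# Toy vocabulary: every unique word gets an integer ID (sorted for determinism)
vocab = {word: idx for idx, word in enumerate(sorted(set(text.split())))}

# Encode the text as a tensor of integer IDs
encoded = torch.tensor([vocab[word] for word in text.split()])
print(vocab)    # {'easy': 0, 'makes': 1, 'pytorch': 2, 'tensors': 3}
print(encoded)  # tensor([2, 1, 3, 0])
```

The resulting tensor is what a network would actually receive; the human-readable words only come back by mapping IDs through the vocabulary in reverse.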

<img src="images/00_anatomy.png" width="750"/>

It's also important to zoom in on the anatomy of a Neural Network. It is highly customizable, but a Neural Network typically consists of an input layer, one or more hidden layers, and an output layer. As the name suggests, the input layer is responsible for accepting input and passing the data into the Neural Network; in the example image, the input layer has two neurons. The hidden layers are responsible for learning patterns in the data; the example image shows one hidden layer with 3 neurons, though networks can have many more layers. The output layer outputs the results of the Neural Network; the image shows an output layer with a single neuron.
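The 2-3-1 layout from the image can be sketched with `torch.nn`. This is one possible reading of the diagram: it assumes fully connected `nn.Linear` layers and a ReLU activation between them, neither of which the image specifies.

```python
import torch
from torch import nn

torch.manual_seed(42)

# 2 input features -> 3 hidden neurons -> 1 output neuron
model = nn.Sequential(
    nn.Linear(in_features=2, out_features=3),  # input layer -> hidden layer
    nn.ReLU(),                                  # assumed non-linearity (not shown in the image)
    nn.Linear(in_features=3, out_features=1),  # hidden layer -> output layer
)

x = torch.rand(4, 2)  # a batch of 4 samples with 2 features each
y = model(x)
print(y.shape)        # torch.Size([4, 1]): one output per sample

# Learnable numbers: 2*3 weights + 3 biases, then 3*1 weights + 1 bias
n_params = sum(p.numel() for p in model.parameters())
print(n_params)       # 13
```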

Some examples of Deep Learning use cases are:
- Recommendation
- Translation
- Speech Recognition
- Computer Vision
- Natural Language Processing (NLP)

Translation and Speech Recognition are also referred to as "sequence to sequence" problems, as a certain sequence (text or audio) is given and then transformed into another sequence. Computer Vision and Natural Language Processing use cases typically fall under the classification or regression categories. In Computer Vision object detection, for example, regression can be used to predict the coordinates of the corners of a box drawn around a detected object, and classification is then used to determine which object is visible within the box. Classification is also used in Natural Language Processing, for example to determine whether an e-mail is spam or not.

What exactly is a tensor? Tensors are the numerical encodings of a model's inputs, as well as the numerical representations a model outputs. This means that they can represent almost anything, as long as it is encoded as numbers, which is the form computers can work with. In practice, a tensor is a multi-dimensional array of numbers (an array of arrays).

<img src="images/00_tensors.png" width="750"/>

Dan Fleisch's "What's a Tensor?": https://www.youtube.com/watch?v=f5liqUk0ZTw

## Introduction to Tensors <a name="intro" />

In [1]:
import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
print(torch.__version__)

1.13.1+cu117


In [2]:
torch.cuda.is_available()

True

In [3]:
# Scalar
scalar = torch.tensor(7)
scalar

tensor(7)

In [4]:
# Number of dimensions of the scalar
scalar.ndim

0

In [5]:
# Item inside the scalar as int
scalar.item()

7

In [6]:
# Vector
vector = torch.tensor([7, 7, 7])
vector

tensor([7, 7, 7])

In [7]:
# Number of dimensions of the vector
vector.ndim

1

In [8]:
# Shape of the vector
vector.shape

torch.Size([3])

In [9]:
# Matrix
matrix = torch.tensor([[7, 8],[9, 10]])
matrix

tensor([[ 7,  8],
        [ 9, 10]])

In [10]:
# Number of dimensions of the matrix
matrix.ndim

2

In [11]:
# Rows of the matrix
print(matrix[0], matrix[1])

tensor([7, 8]) tensor([ 9, 10])


In [12]:
# Shape of the matrix
matrix.shape

torch.Size([2, 2])

In [13]:
# Tensor
tensor = torch.tensor([[[1, 2, 3], [4, 5, 6], [7, 8, 9]]])
tensor

tensor([[[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]])

In [14]:
# Dimensions of the tensor
tensor.ndim

3

In [15]:
tensor.shape

torch.Size([1, 3, 3])

<img src="images/00_tensor_dimensions.png" width=750 />

<img src="images/00_svmt.png" width="500"/>

<img src="images/00_svmt_explanation.png" width="750"/>

In [16]:
tt1 = torch.tensor(
    [
        [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
        [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
    ],
)
tt1

tensor([[[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]],

        [[9, 8, 7],
         [6, 5, 4],
         [3, 2, 1]]])

In [17]:
tt2 = torch.tensor(
    [
        [[1, 2, 3], [4, 5, 6]],
        [[9, 8, 7], [6, 5, 4]]
    ],
)
tt2

tensor([[[1, 2, 3],
         [4, 5, 6]],

        [[9, 8, 7],
         [6, 5, 4]]])

## Random Tensors <a name="random" />

Why random tensors?

Random tensors are important because many Neural Networks start the learning process with random numbers and then adjust those numbers to better represent the data.
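The "start random, then adjust" idea can be seen in a single training step. This sketch uses `nn.Linear`, `mse_loss` and `torch.optim.SGD`, which this notebook has not introduced yet; the inputs, targets and learning rate are arbitrary assumptions chosen only to produce a visible update.

```python
import torch
from torch import nn

torch.manual_seed(0)

layer = nn.Linear(in_features=4, out_features=1)  # weights are initialised randomly
weights_before = layer.weight.detach().clone()

# Hypothetical data: 8 random inputs and 8 random targets
x = torch.rand(8, 4)
y = torch.rand(8, 1)

optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)
loss = nn.functional.mse_loss(layer(x), y)
loss.backward()   # gradients of the loss with respect to the weights
optimizer.step()  # nudge the random weights in the direction that lowers the loss

weights_after = layer.weight.detach().clone()
print(weights_before)
print(weights_after)
```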

In [18]:
# Creating a random tensor manually of size (3, 4)
ts_rnd = torch.rand(3, 4)
ts_rnd

tensor([[0.6494, 0.0301, 0.6100, 0.7627],
        [0.9845, 0.9777, 0.5928, 0.9440],
        [0.5815, 0.7746, 0.9446, 0.6265]])

In [19]:
ts_rnd.ndim

2

In [20]:
# Creating a random tensor with similar shape to an image
# The size argument's parameters represent height, width, and colour channels (RGB)
# size= is optional but clearer
ts_img = torch.rand(size=(224, 224, 3))
ts_img.shape, ts_img.ndim

(torch.Size([224, 224, 3]), 3)

<img src="images/00_rgb.png" width="750"/>

## Zeros and Ones <a name="zeroone" />

In [21]:
# Create tensors of all zeros
ts_zero = torch.zeros(size=(3, 4))
ts_zero

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

In [22]:
ts_zero * ts_rnd

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

In [23]:
# Create tensors of all ones
ts_one = torch.ones(size=(3, 4))
ts_one

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])

In [24]:
# The default data type is torch.float32, unless another dtype is specified
ts_one.dtype

torch.float32
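The default float type can be queried directly, and `dtype=` overrides it when creating a tensor. A short sketch (it assumes the default dtype has not been changed earlier in the session):

```python
import torch

print(torch.get_default_dtype())                     # torch.float32

# Override the dtype at creation time
ts_one_int = torch.ones(size=(3, 4), dtype=torch.int64)
print(ts_one_int.dtype)                              # torch.int64

# torch.tensor() infers int64 from Python integers, not the default float dtype
print(torch.tensor([1, 2, 3]).dtype)                 # torch.int64
```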

## Range of Tensors and Tensor-like <a name="rangelike" />

In [25]:
# torch.range() is deprecated, so torch.arange() is used instead
# The end value of torch.arange() is exclusive: arange(1, 11) stops at 10
one_to_tensor = torch.arange(1, 11)
one_to_tensor

tensor([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [26]:
# Using named start, end, and step parameters
step_to_thousand = torch.arange(start=0, end=1000, step=77)
step_to_thousand

tensor([  0,  77, 154, 231, 308, 385, 462, 539, 616, 693, 770, 847, 924])

In [27]:
# Naming the start, end, and step parameters is optional: positional arguments give the same result
step_to_thousand = torch.arange(0, 1000, 77)
step_to_thousand

tensor([  0,  77, 154, 231, 308, 385, 462, 539, 616, 693, 770, 847, 924])

In [28]:
# Creating a 0-filled tensor with the same shape (and dtype) as another tensor
# input= is optional but clearer
ten_zeros = torch.zeros_like(input=one_to_tensor)
ten_zeros

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
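`zeros_like()` has siblings that follow the same pattern. The sketch below uses `ones_like()` and `full_like()`, which copy the shape and dtype of the input, and shows that `dtype=` can override the copied dtype:

```python
import torch

one_to_tensor = torch.arange(1, 11)

ones = torch.ones_like(one_to_tensor)      # same shape and dtype, filled with 1
sevens = torch.full_like(one_to_tensor, 7) # same shape and dtype, filled with 7
floats = torch.zeros_like(one_to_tensor, dtype=torch.float32)  # override the dtype

print(ones)
print(sevens)
print(floats.dtype)  # torch.float32
```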

## Tensor Data Types <a name="datatypes" />

Available tensor data types: https://pytorch.org/docs/stable/tensors

In [29]:
# Default datatype is torch.float32 when the numbers are floats
# This also applies when dtype is explicitly set to None
ts_float32 = torch.tensor([3.0, 6.0, 9.0], dtype=None, device=None, requires_grad=False)
ts_float32, ts_float32.dtype

(tensor([3., 6., 9.]), torch.float32)

- dtype = What datatype the tensor will be
- device = Which device the tensor is stored on (CPU by default)
- requires_grad = If you want PyTorch to keep track of the gradients during certain numerical calculations (explained later)

Tensor datatypes are one of the three most common sources of errors when working with tensors. These three common errors are:
- Tensor is not of the right datatype
- Tensor is not of the right size
- Tensor is not on the right device

Many operations require tensors to have the same datatype: matrix multiplication of a float32 tensor with a float16 tensor, for example, results in an error, although element-wise operations often promote the datatypes automatically (see below). Computations also require tensors to have compatible shapes. Finally, tensors need to be on the same device: if one tensor is created on the CPU and another on the GPU, they cannot be used in the same operation.
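A minimal sketch of the datatype error and its fix, using `torch.matmul` on CPU tensors. The exact error message differs between PyTorch versions, so the sketch only checks that a `RuntimeError` is raised. The device error is not reproduced here because it needs a GPU.

```python
import torch

a = torch.ones(2, 2)                       # float32
b = torch.ones(2, 2, dtype=torch.float16)  # float16

# Matrix multiplication does not promote mismatched dtypes
try:
    torch.matmul(a, b)
    dtype_error = False
except RuntimeError as err:
    dtype_error = True
    print(f"dtype mismatch: {err}")

# Fix: convert one tensor so that both share a dtype
result = torch.matmul(a, b.to(torch.float32))
print(result.dtype)          # torch.float32

# Element-wise multiplication promotes float16 * float32 to float32 instead
print((a * b).dtype)         # torch.float32
```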

In [30]:
# Conversion of tensor of one datatype to another
ts_float16 = ts_float32.type(torch.float16)
ts_float16

tensor([3., 6., 9.], dtype=torch.float16)

In [31]:
# Different datatypes don't necessarily result in an error!
ts_float16 * ts_float32

tensor([ 9., 36., 81.])

In [32]:
# Default datatype is torch.int64 when the numbers are integers
# This also applies when dtype is explicitly set to None
ts_int64 = torch.tensor([3, 6, 9], dtype=None)
ts_int64, ts_int64.dtype

(tensor([3, 6, 9]), torch.int64)

In [33]:
# Different datatypes don't necessarily result in an error!
ts_int64 * ts_float32

tensor([ 9., 36., 81.])

In [34]:
# Getting information about a tensor
ts_info = torch.rand(3, 4)

print(f"Datatype: {ts_info.dtype}")
print(f"Shape: {ts_info.shape}")
print(f"Size: {ts_info.size()}")
print(f"Device: {ts_info.device}")

Datatype: torch.float32
Shape: torch.Size([3, 4])
Size: torch.Size([3, 4])
Device: cpu


## Manipulating Tensors <a name="manipulate" />

Tensor manipulations include, but are not limited to:
- Addition
- Subtraction
- Multiplication (element-wise)
- Division
- Matrix multiplication

In [35]:
# Addition
ts_add = torch.tensor([1, 2, 3])
ts_add + 100

tensor([101, 102, 103])

In [36]:
# Subtraction
ts_sub = torch.tensor([101, 102, 103])
ts_sub - 100

tensor([1, 2, 3])

In [37]:
# Multiplication
ts_mul = torch.tensor([1, 2, 3])
ts_mul * 10

tensor([10, 20, 30])

In [38]:
# Division
# Note that true division (/) returns a float tensor, even for integer inputs
ts_div = torch.tensor([10, 20, 30])
ts_div / 10

tensor([1., 2., 3.])
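When an integer result is wanted instead, `torch.div` accepts a `rounding_mode` argument. The sketch below adds a negative value (an assumption, not part of the original example) to show where `"floor"` and `"trunc"` differ:

```python
import torch

ts_div = torch.tensor([10, 20, 35, -35])

print(ts_div / 10)                                    # float result: 1.0, 2.0, 3.5, -3.5
print(torch.div(ts_div, 10, rounding_mode="floor"))   # rounds down: 1, 2, 3, -4
print(torch.div(ts_div, 10, rounding_mode="trunc"))   # rounds toward zero: 1, 2, 3, -3
```

Both rounding modes keep the integer dtype of the inputs, unlike plain `/`.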

In [39]:
# PyTorch built-in functions
print(torch.add(ts_add, 100))
print(torch.sub(ts_sub, 100))
print(torch.mul(ts_mul, 10))
print(torch.div(ts_div, 10))

tensor([101, 102, 103])
tensor([1, 2, 3])
tensor([10, 20, 30])
tensor([1., 2., 3.])


There are two main ways of performing multiplication in Neural Networks and Deep Learning:

- Element-wise multiplication
- Matrix multiplication (dot product)

In [40]:
# Element-wise operation:
ts_add * ts_add

tensor([1, 4, 9])
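The two kinds of multiplication are closely related: summing the element-wise products of two vectors gives their dot product, which is the building block of matrix multiplication. A short sketch with the same `ts_add` vector:

```python
import torch

ts_add = torch.tensor([1, 2, 3])

elementwise = ts_add * ts_add           # [1, 4, 9]
dot_manual = elementwise.sum()          # 1 + 4 + 9 = 14
dot_builtin = torch.dot(ts_add, ts_add) # dot product of two 1D tensors
dot_matmul = ts_add @ ts_add            # @ on two 1D tensors also returns the dot product

print(dot_manual, dot_builtin, dot_matmul)
```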

Matrix multiplication is more intricate and requires some explanation. Each row of the first matrix is combined with each column of the second matrix by taking their dot product: matching elements are multiplied and the products are summed. In the example below, the first matrix has 2 rows and the second matrix has 2 columns, which results in 4 dot products in total, one for each element of the 2x2 result. It's important to note that **the number of columns of the first matrix MUST equal the number of rows of the second matrix**.

<img src="images/00_dotproduct.svg" />

The above example is the multiplication of a **2x3** matrix (2 rows and 3 columns) with a **3x2** matrix (3 rows and 2 columns).

In general: To multiply an **m**×**n** matrix by an **n**×**p** matrix, the **n**s must be the same,
and the result is an **m**×**p** matrix.

<img src="images/00_mnp.svg" />

For these operations, the dot product of a row and a column is used. The operations of the above example are as follows:
- (1, 2, 3) · (7, 9, 11) = 1×7 + 2×9 + 3×11 = 58
- (1, 2, 3) · (8, 10, 12) = 1×8 + 2×10 + 3×12 = 64
- (4, 5, 6) · (7, 9, 11) = 4×7 + 5×9 + 6×11 = 139
- (4, 5, 6) · (8, 10, 12) = 4×8 + 5×10 + 6×12 = 154

Matrix multiplication on Math is Fun: https://www.mathsisfun.com/algebra/matrix-multiplying
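The row-times-column procedure can be written out with explicit loops to check it against `torch.matmul`. `manual_matmul` is a hypothetical helper for illustration only; it is far slower than the built-in and only handles 2D tensors.

```python
import torch

def manual_matmul(a, b):
    """Naive 2D matrix multiplication: each output cell is a row-column dot product."""
    m, n = a.shape
    n2, p = b.shape
    if n != n2:
        raise ValueError(f"inner dimensions must match: {n} != {n2}")
    out = torch.zeros(m, p, dtype=a.dtype)
    for i in range(m):          # each row of the first matrix
        for j in range(p):      # each column of the second matrix
            out[i, j] = (a[i, :] * b[:, j]).sum()
    return out

ts_m1 = torch.tensor([[1, 2, 3], [4, 5, 6]])
ts_m2 = torch.tensor([[7, 8], [9, 10], [11, 12]])

manual = manual_matmul(ts_m1, ts_m2)
print(manual)                                      # [[58, 64], [139, 154]]
print(torch.equal(manual, torch.matmul(ts_m1, ts_m2)))  # True
```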

In [41]:
# Matrix multiplication recreation of the above example
ts_m1 = torch.tensor([[1, 2, 3], [4, 5, 6]])
ts_m2 = torch.tensor([[7, 8], [9, 10], [11, 12]])
ts_m1, ts_m2

(tensor([[1, 2, 3],
         [4, 5, 6]]),
 tensor([[ 7,  8],
         [ 9, 10],
         [11, 12]]))

In [42]:
# Matrix multiplication
torch.matmul(ts_m1, ts_m2)

tensor([[ 58,  64],
        [139, 154]])

Two more examples of matrix multiplication:

<img src="images/00_matmul1.png" />

In [43]:
# Matrix multiplication recreation of the above example
ts_m1 = torch.tensor([[1, 2, 3]])
ts_m2 = torch.tensor([[4], [5], [6]])
ts_m1, ts_m2

(tensor([[1, 2, 3]]),
 tensor([[4],
         [5],
         [6]]))

In [44]:
# Matrix multiplication
torch.matmul(ts_m1, ts_m2)

tensor([[32]])

<img src="images/00_matmul2.png" />

In [45]:
# Matrix multiplication recreation of the above example
ts_m1 = torch.tensor([[4], [5], [6]])
ts_m2 = torch.tensor([[1, 2, 3]])
ts_m1, ts_m2

(tensor([[4],
         [5],
         [6]]),
 tensor([[1, 2, 3]]))

In [46]:
# Matrix multiplication
torch.matmul(ts_m1, ts_m2)

tensor([[ 4,  8, 12],
        [ 5, 10, 15],
        [ 6, 12, 18]])

In [47]:
# @ also performs matrix multiplication
ts_m1 @ ts_m2

tensor([[ 4,  8, 12],
        [ 5, 10, 15],
        [ 6, 12, 18]])

To recap, there are two main rules that matrix multiplication needs to satisfy:
- The **inner dimensions** must match
- The resulting matrix has the shape of the **outer dimensions**

Rule 1 refers to how the number of columns of matrix 1 needs to match the number of rows of matrix 2:
- (3, 2) @ (3, 2) won't work
- (2, 3) @ (3, 2) will work
- (3, 2) @ (2, 4) will work

Rule 2 refers to how the outer dimensions define the resulting matrix's shape:
- (2, 3) @ (3, 2) → (2, 2)
- (3, 2) @ (2, 4) → (3, 4)
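Both rules fit in a few lines of plain Python, and PyTorch enforces the same check at runtime. `matmul_shape` is a hypothetical helper for 2D shapes only; the transpose at the end previews the fix discussed further down.

```python
import torch

def matmul_shape(shape_a, shape_b):
    """Hypothetical helper: predict the result shape of a 2D matrix multiplication."""
    (m, n1), (n2, p) = shape_a, shape_b
    if n1 != n2:                      # Rule 1: inner dimensions must match
        raise ValueError(f"inner dimensions differ: {n1} != {n2}")
    return (m, p)                     # Rule 2: result has the outer dimensions

print(matmul_shape((2, 3), (3, 2)))   # (2, 2)
print(matmul_shape((3, 2), (2, 4)))   # (3, 4)

# PyTorch raises a RuntimeError when the inner dimensions differ
try:
    torch.rand(3, 2) @ torch.rand(3, 2)
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True

# Transposing the second matrix makes them compatible: (3, 2) @ (2, 3) -> (3, 3)
fixed = torch.rand(3, 2) @ torch.rand(3, 2).T
print(fixed.shape)                    # torch.Size([3, 3])
```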

In [48]:
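# Outer dimensions example: (3, 2) @ (2, 4) -> (3, 4)
# torch.mm performs matrix multiplication on 2D tensors only; torch.matmul also handles 1D and batched tensors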
torch.mm(torch.rand(3, 2), torch.rand(2, 4))

tensor([[0.5413, 0.8675, 1.0208, 0.9037],
        [0.3075, 0.4887, 0.6929, 0.6500],
        [0.2678, 0.4194, 0.7842, 0.7847]])

Interactive visualisation of matrix multiplication: http://matrixmultiplication.xyz/

When dealing with shape issues (e.g., (3, 2) @ (3, 2)), it's possible to manipulate one of the tensors using transpose.

In [49]:
# Original matrix and transposed version
ts_m1, ts_m1.T

(tensor([[4],
         [5],
         [6]]),
 tensor([[4, 5, 6]]))

In [50]:
# Original matrix and transposed version shapes
ts_m1.shape, ts_m1.T.shape

(torch.Size([3, 1]), torch.Size([1, 3]))

In [51]:
# Matrix multiplication of the original matrix and the transposed version
torch.mm(ts_m1, ts_m1.T)

tensor([[16, 20, 24],
        [20, 25, 30],
        [24, 30, 36]])

## Tensor Aggregation <a name="tensoragg" />

Finding the min, max, mean, sum, etc., of tensors:

In [52]:
ts_agg = torch.arange(1, 100, 10)
ts_agg

tensor([ 1, 11, 21, 31, 41, 51, 61, 71, 81, 91])

In [53]:
# Min value of a tensor
torch.min(ts_agg), ts_agg.min()

(tensor(1), tensor(1))

In [54]:
# Max value of a tensor
torch.max(ts_agg), ts_agg.max()

(tensor(91), tensor(91))

In [55]:
# Mean of a tensor
# Mean requires a floating point or complex datatype
# ts_agg consists of integers and thus requires conversion
torch.mean(ts_agg.type(torch.float32)), ts_agg.type(torch.float32).mean()

(tensor(46.), tensor(46.))

In [56]:
# Tensors that already contain floats (float32 by default) can be averaged directly
torch.mean(torch.rand(3, 4))

tensor(0.4221)

In [57]:
# Sum of a tensor
torch.sum(ts_agg), ts_agg.sum()

(tensor(460), tensor(460))

In [58]:
# Finding the index (position) of the min value of a tensor
torch.argmin(ts_agg), ts_agg.argmin()

(tensor(0), tensor(0))

In [59]:
# Finding the index (position) of the max value of a tensor
torch.argmax(ts_agg), ts_agg.argmax()

(tensor(9), tensor(9))
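Two follow-ups to the aggregation cells: the indices returned by `argmin()`/`argmax()` can be used to look the values back up, and on multi-dimensional tensors the `dim` argument selects which dimension is reduced. The 2D tensor below is an assumed example; the last part confirms that `mean()` fails on an integer tensor, as noted above.

```python
import torch

ts_agg = torch.arange(1, 100, 10)

# argmax() returns an index, which can be used to retrieve the value
print(ts_agg[ts_agg.argmax()])    # tensor(91)

# dim= chooses which dimension gets reduced
ts_2d = torch.arange(1, 7).reshape(2, 3)  # [[1, 2, 3], [4, 5, 6]]
col_sums = ts_2d.sum(dim=0)       # collapse the rows: 5, 7, 9
row_sums = ts_2d.sum(dim=1)       # collapse the columns: 6, 15
print(col_sums, row_sums)

# mean() on an integer tensor raises a RuntimeError
try:
    ts_agg.mean()
    int_mean_failed = False
except RuntimeError:
    int_mean_failed = True
```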

There are also various other operations available to manipulate tensors in several ways. Some of these are:
- Reshaping = Reshapes an input tensor to a defined shape
- View = Returns a view of an input tensor with a certain shape, sharing the same memory as the original tensor
- Stacking = Combines multiple tensors along a new dimension (stack), on top of each other (vstack), or side by side (hstack)
- Squeeze = Removes all dimensions of size 1 from a tensor
- Unsqueeze = Adds a dimension of size 1 at a specified position
- Permute = Returns a view of the input tensor with its dimensions permuted (reordered) in a certain way

In [60]:
ts_x = torch.arange(1., 13.)
ts_x, ts_x.shape

(tensor([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.]),
 torch.Size([12]))

In [61]:
# Reshaping a tensor to a different row & column distribution
# reshape(3, 4) is possible because the original size (12) equals the product of the chosen rows and columns (3 × 4)
x_reshaped = ts_x.reshape(3, 4)
x_reshaped, x_reshaped.shape

(tensor([[ 1.,  2.,  3.,  4.],
         [ 5.,  6.,  7.,  8.],
         [ 9., 10., 11., 12.]]),
 torch.Size([3, 4]))

In [62]:
# Create a view of a tensor, sharing the same memory
# The view tensor shares the same underlying data with its base tensor
# This view avoids explicit data copy, allowing fast and memory efficient reshaping, slicing and element-wise operations
ts_z = ts_x.view(2, 6)
ts_z, ts_z.shape

(tensor([[ 1.,  2.,  3.,  4.,  5.,  6.],
         [ 7.,  8.,  9., 10., 11., 12.]]),
 torch.Size([2, 6]))

In [63]:
# Changing an element of ts_z also changes the same element in ts_x due to this shared memory
ts_z[0, 0] = 99
ts_z, ts_x

(tensor([[99.,  2.,  3.,  4.,  5.,  6.],
         [ 7.,  8.,  9., 10., 11., 12.]]),
 tensor([99.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.]))
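`view()` and `reshape()` differ when a tensor is not contiguous in memory, for example after a transpose. This sketch uses `data_ptr()` (the address of a tensor's first element) as a rough check for shared memory: `view()` refuses the non-contiguous case, while `reshape()` quietly falls back to a copy.

```python
import torch

ts_x = torch.arange(1., 13.)

# On a contiguous tensor, reshape() returns a view that shares memory
ts_r = ts_x.reshape(3, 4)
print(ts_r.data_ptr() == ts_x.data_ptr())   # True

# The transpose is not contiguous, so view() cannot flatten it
ts_t = ts_r.T
try:
    ts_t.view(12)
    view_failed = False
except RuntimeError:
    view_failed = True

# reshape() copies the data when a view is impossible
ts_flat = ts_t.reshape(12)
print(ts_flat)
print(ts_flat.data_ptr() == ts_x.data_ptr())  # False: this is a copy
```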

In [64]:
# Stack tensors on top of each other
# The tensors have to be of equal shape! (ts_z's shape differs from ts_x's, so they can't be stacked together)
# dim=0 is default
x_stacked = torch.stack([ts_x, ts_x, ts_x], dim=0)
x_stacked

tensor([[99.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.],
        [99.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.],
        [99.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.]])
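The list above also mentions `vstack` and `hstack`. For 1D tensors, a short sketch shows how they relate to `torch.stack` with different `dim` values:

```python
import torch

a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])

print(torch.stack([a, b], dim=0).shape)  # torch.Size([2, 3]): new first dimension
print(torch.stack([a, b], dim=1).shape)  # torch.Size([3, 2]): new second dimension
print(torch.vstack([a, b]))              # on top of each other, shape (2, 3)
print(torch.hstack([a, b]))              # side by side, shape (6,)
```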

In [65]:
# Remove all dimensions of size 1 from a tensor
ts_sqz = torch.tensor([[1, 2, 3]])
ts_sqzd = ts_sqz.squeeze()
ts_sqz, ts_sqz.shape, ts_sqzd, ts_sqzd.shape

(tensor([[1, 2, 3]]), torch.Size([1, 3]), tensor([1, 2, 3]), torch.Size([3]))

In [66]:
# Add a single dimension to a tensor at a specific dimension
ts_sqz = ts_sqzd.unsqueeze(dim=0)
ts_sqz, ts_sqz.shape

(tensor([[1, 2, 3]]), torch.Size([1, 3]))
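Both operations also accept a `dim` argument. A sketch with an assumed `(1, 3, 1)` tensor shows that `squeeze(dim=...)` only removes that dimension if its size is 1, and that `unsqueeze` can insert the new dimension at different positions:

```python
import torch

ts = torch.zeros(1, 3, 1)

print(ts.squeeze().shape)        # torch.Size([3]): every size-1 dimension removed
print(ts.squeeze(dim=0).shape)   # torch.Size([3, 1]): only dimension 0 removed
print(ts.squeeze(dim=1).shape)   # torch.Size([1, 3, 1]): dimension 1 has size 3, so nothing changes

vec = torch.tensor([1, 2, 3])
print(vec.unsqueeze(dim=0).shape)  # torch.Size([1, 3]): a row
print(vec.unsqueeze(dim=1).shape)  # torch.Size([3, 1]): a column
```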

In [67]:
# Rearranging the dimensions of a tensor in a specified order
# Permute orders the dimensions based on their indices
# Dimension sizes: index 0 = 2, index 1 = 3, index 2 = 5, so the permuted order (2, 0, 1) gives [5, 2, 3]
# Permute also returns a view
ts_perm = torch.randn(2, 3, 5)
ts_perm.size(), torch.permute(ts_perm, (2, 0, 1)).size()

(torch.Size([2, 3, 5]), torch.Size([5, 2, 3]))

In [68]:
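# Set an element of the original tensor; the permuted tensor shows the same value at position [0, 0, 0]
ts_perm[0, 0, 0] = 0.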
ts_perm, torch.permute(ts_perm, (2, 0, 1))

(tensor([[[ 0.0000,  0.7601, -0.7658, -0.3008,  0.1359],
          [ 0.4907, -0.0621,  0.3325, -0.5783, -1.3454],
          [-2.2861, -0.0870, -1.2042,  0.7755, -0.4459]],
 
         [[ 0.2296, -0.0950,  0.0355, -0.2550, -1.1073],
          [-1.1641,  0.5569, -0.1903,  0.7410,  0.1902],
          [ 0.4659,  0.7517,  0.5118, -1.7440, -0.1842]]]),
 tensor([[[ 0.0000,  0.4907, -2.2861],
          [ 0.2296, -1.1641,  0.4659]],
 
         [[ 0.7601, -0.0621, -0.0870],
          [-0.0950,  0.5569,  0.7517]],
 
         [[-0.7658,  0.3325, -1.2042],
          [ 0.0355, -0.1903,  0.5118]],
 
         [[-0.3008, -0.5783,  0.7755],
          [-0.2550,  0.7410, -1.7440]],
 
         [[ 0.1359, -1.3454, -0.4459],
          [-1.1073,  0.1902, -0.1842]]]))
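A common use of `permute` is reordering the image tensor from earlier, stored as (height, width, colour channels), into the channels-first order (C, H, W) that many PyTorch vision layers such as `nn.Conv2d` expect. The sketch also confirms that the permuted tensor is a view of the original:

```python
import torch

ts_img = torch.rand(size=(224, 224, 3))  # height, width, colour channels
ts_img_chw = ts_img.permute(2, 0, 1)     # colour channels, height, width
print(ts_img.shape, ts_img_chw.shape)

# permute() returns a view: a write through the original is visible in the permuted tensor
ts_img[0, 0, 0] = 1.0
print(ts_img_chw[0, 0, 0])               # tensor(1.)
```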

## Indexing <a name="indexing" />

In [69]:
# Indexing works like indexing Python lists or NumPy arrays
# Indices are 0-indexed
ts_index = torch.arange(1, 13).reshape(1, 3, 4)
ts_index, ts_index.shape

(tensor([[[ 1,  2,  3,  4],
          [ 5,  6,  7,  8],
          [ 9, 10, 11, 12]]]),
 torch.Size([1, 3, 4]))

In [70]:
# Index of dimension 0
ts_index[0]

tensor([[ 1,  2,  3,  4],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12]])

In [71]:
# Index of dimension 1
ts_index[0, 0], ts_index[0][1]

(tensor([1, 2, 3, 4]), tensor([5, 6, 7, 8]))

In [72]:
# Index of dimension 2, selecting final element
ts_index[0,2,3]

tensor(12)

In [73]:
# Select all of a certain dimension and certain ranges
ts_index[:,:,1:3]

tensor([[[ 2,  3],
         [ 6,  7],
         [10, 11]]])
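A few more indexing patterns on the same tensor: selecting a single column across all rows, counting from the end with negative indices, and using `...` to fill in all leading dimensions.

```python
import torch

ts_index = torch.arange(1, 13).reshape(1, 3, 4)

col_1 = ts_index[0, :, 1]     # second column of every row: 2, 6, 10
last_col = ts_index[0, :, -1] # negative indices count from the end: 4, 8, 12
mid_row = ts_index[0, 1, :]   # the whole middle row: 5, 6, 7, 8
first_col = ts_index[..., 0]  # ... stands in for all leading dimensions, shape (1, 3)

print(col_1, last_col, mid_row, first_col)
```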

NumPy is a popular Python library for scientific and numerical computing. Because of this, PyTorch has functionality to interact with it: data that starts out in NumPy can be converted to a PyTorch tensor, and the reverse is possible as well.

In [74]:
# NumPy array conversion to PyTorch tensor
np_arr = np.arange(1., 8.)
ts_arr = torch.from_numpy(np_arr)
np_arr, ts_arr

(array([1., 2., 3., 4., 5., 6., 7.]),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))

In [75]:
# The default NumPy data type is float64, which turns the datatype of the resulting tensor into float64 as well
np_arr.dtype, ts_arr.dtype

(dtype('float64'), torch.float64)

In [76]:
# Said datatype can be converted
ts_arr.type(torch.float32).dtype

torch.float32

In [77]:
# Changing the values of the NumPy array this way does not affect the tensor
# np_arr + 1 creates a new array and rebinds the name np_arr, so the tensor still shares memory with the old array
np_arr = np_arr + 1
np_arr, ts_arr

(array([2., 3., 4., 5., 6., 7., 8.]),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))

In [78]:
# In-place addition (+=) modifies the original array, so the change is visible in the tensor that shares its memory
np_arr = np.arange(1., 8.)
ts_arr = torch.from_numpy(np_arr)
np_arr += 1
np_arr, ts_arr

(array([2., 3., 4., 5., 6., 7., 8.]),
 tensor([2., 3., 4., 5., 6., 7., 8.], dtype=torch.float64))

In [79]:
# Converting a tensor to a NumPy array
ts_arr = torch.ones(7)
np_arr = ts_arr.numpy()
ts_arr, np_arr

(tensor([1., 1., 1., 1., 1., 1., 1.]),
 array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))
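Memory sharing also works in the tensor-to-NumPy direction for CPU tensors, while `torch.tensor()` always copies. A short sketch of both behaviours:

```python
import numpy as np
import torch

ts_arr = torch.ones(3)
np_shared = ts_arr.numpy()       # shares memory with ts_arr
ts_arr.add_(1)                   # in-place change to the tensor
print(np_shared)                 # [2. 2. 2.]: the array sees the change

np_arr = np.arange(1., 4.)
ts_copy = torch.tensor(np_arr)   # torch.tensor() copies the data
np_arr += 1
print(ts_copy)                   # unchanged: still 1., 2., 3.
```

When an independent copy is needed from an existing tensor, `.clone()` (or `np.copy()` on the NumPy side) breaks the link.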

## Reproducibility <a name="reproduce" />

Neural networks learn by starting with random numbers and continuously updating them to better represent the data. Computer-generated random numbers, however, are pseudorandom: a random number generator returns exactly the same sequence of numbers whenever it starts from the same state, and that starting state is set by a "seed". To make the sequence differ from run to run (and thus appear random from a user's perspective), a constantly changing value such as the current time is commonly used as the seed. Conversely, setting the seed manually to a fixed value reduces this randomness and makes results reproducible.

In [80]:
ts_A = torch.rand(3, 4)
ts_B = torch.rand(3, 4)
ts_A, ts_B

(tensor([[0.8919, 0.2108, 0.4254, 0.5120],
         [0.8458, 0.8510, 0.8972, 0.8534],
         [0.3139, 0.2183, 0.5733, 0.4214]]),
 tensor([[0.9355, 0.7816, 0.1570, 0.6089],
         [0.1528, 0.6789, 0.7044, 0.9028],
         [0.9412, 0.4875, 0.8442, 0.1539]]))

In [100]:
# Manually setting the seed
# Resetting the seed restarts the number sequence and allows for the same outputs to be given again
seed = 42
torch.manual_seed(seed)
ts_C = torch.rand(1, 3) # First values after seeding
ts_D = torch.rand(1, 3) # Next values in the sequence: differ from ts_C
torch.manual_seed(seed)
ts_E = torch.rand(1, 3) # Sequence restarted: same as ts_C
ts_C, ts_D, ts_E

(tensor([[0.8823, 0.9150, 0.3829]]),
 tensor([[0.9593, 0.3904, 0.6009]]),
 tensor([[0.8823, 0.9150, 0.3829]]))

- https://pytorch.org/docs/stable/notes/randomness
- https://en.wikipedia.org/wiki/Random_seed
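Besides the global seed set by `torch.manual_seed`, PyTorch also supports separate `torch.Generator` objects, which many random functions accept through a `generator=` argument. A sketch showing that two generators seeded identically produce identical values without touching the global random state:

```python
import torch

# Two independent generators with the same seed
gen_a = torch.Generator().manual_seed(42)
gen_b = torch.Generator().manual_seed(42)

ts_a = torch.rand(1, 3, generator=gen_a)
ts_b = torch.rand(1, 3, generator=gen_b)
print(torch.equal(ts_a, ts_b))   # True
```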

## GPU Usage <a name="gpus" />

Utilizing a GPU can make computations much faster, thanks to NVIDIA GPUs, the CUDA platform, and PyTorch's CUDA support working together behind the scenes. There are various ways to access a GPU:
- Use a GPU in Google Colab, which offers both free and paid options
- Use your own GPU, which requires some setup and investment
- Use a Cloud Computing service such as AWS or Azure

Refer to https://pytorch.org/get-started/locally/ for instructions on setting up PyTorch with your own GPU.

In [82]:
# Check whether PyTorch can access a CUDA-capable GPU
torch.cuda.is_available()

True

Because PyTorch can run on both a CPU and a GPU, it's best practice to write device-agnostic code. See https://pytorch.org/docs/stable/notes/cuda#best-practices for more details.

In [83]:
# Setting up device agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cuda'

In [84]:
torch.cuda.device_count()

1

In [88]:
# Creating a tensor defaults to using the CPU
ts_cpu = torch.tensor([1, 2, 3])
ts_cpu, ts_cpu.device

(tensor([1, 2, 3]), device(type='cpu'))

In [89]:
# Moving the tensor to the target device (my GPU is old, hence the warning below :))
ts_gpu = ts_cpu.to(device)
ts_gpu, ts_gpu.device

    Found GPU0 GeForce GTX 780 Ti which is of cuda capability 3.5.
    PyTorch no longer supports this GPU because it is too old.
    The minimum cuda capability supported by this library is 3.7.
    


(tensor([1, 2, 3], device='cuda:0'), device(type='cuda', index=0))

NumPy cannot work with tensors on the GPU; it can only work with tensors on the CPU. If you want to use NumPy on a GPU tensor, you first need to convert it to a CPU tensor.

<img src="images/00_numpyfail.png" />

In [94]:
# Conversion of a GPU tensor to a CPU tensor to a NumPy array
ts_cpu2 = ts_gpu.cpu().numpy()
ts_cpu2

array([1, 2, 3], dtype=int64)
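A device-agnostic version of the conversion above runs on machines with or without a GPU. It assumes the tensor may also track gradients: `.numpy()` refuses tensors with `requires_grad=True`, so `.detach()` is added, and `.cpu()` returns the tensor unchanged when it already lives on the CPU.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Create the tensor directly on the target device
ts_dev = torch.tensor([1., 2., 3.], device=device, requires_grad=True)

# detach() drops gradient tracking, cpu() moves GPU data to host memory
np_result = ts_dev.detach().cpu().numpy()
print(np_result)
```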