Introduction to the pytorch core library
========================================

Pytorch is a numerical computation library that makes it convenient to build, train, and test deep neural networks.  It provides five main idioms that build upon each other.  These form the backbone classes that are further specialized when building specific networks for vision, NLP, etc.

* Tensor arithmetic: a math library for manipulating n-dimensional arrays of numbers on CPU or GPU.
* Autograd: evaluates pointwise derivatives of any computed tensor with respect to input tensors.
* Optimization: updates input tensors to optimize any computed objective, using autograd gradients.
* Network modules: organizes parameterized tensor computations into modular neural network components.
* Dataloader: a utility to simplify efficient multithreaded prefetching of large streams of data.

This notebook is a quick overview of these five pieces, and gives links to other reference material and resources.

pytorch Tensors
===============

The trick for doing math fast on a modern computer is to do big array operations all at once.  To facilitate this, pytorch provides a [`torch.Tensor`](https://pytorch.org/docs/stable/tensors.html) class that is a lookalike to the older [`numpy.ndarray`](https://numpy.org/doc/1.18/reference/arrays.ndarray.html).  Just like a numpy `ndarray`, the pytorch `Tensor` stores an n-dimensional array of numbers, where n can be zero or more, and where the numbers can be various float or integer types.  Almost all the numpy operations are also available on torch tensors, but if something is missing, torch tensors can be directly converted to numpy using `x.numpy()` and from numpy using `torch.from_numpy(n)`.
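As a rough illustration of the round trip described above, the sketch below wraps a numpy array as a tensor, applies a torch operation, and converts the result back.  The variable names are made up for this example, and it assumes the tensor lives on the CPU (a GPU tensor would need `.cpu()` first).

```python
import numpy
import torch

# Start from a numpy array and wrap it as a torch tensor (no copy is made).
n = numpy.arange(6, dtype=numpy.float32).reshape(2, 3)
t = torch.from_numpy(n)

# Apply a torch operation, then convert the result back to numpy.
doubled = (t * 2).numpy()
print(type(t), type(doubled))

# from_numpy shares memory with the original array, so an in-place
# change made through the tensor is visible in the numpy array too.
t[0, 0] = 100.0
print(n[0, 0])
```

Because the memory is shared, the conversion itself is cheap; call `.clone()` on the tensor if an independent copy is needed.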

**What makes a torch.Tensor different from a numpy.ndarray.**  There are two things that torch provides beyond numpy arrays:

 1. pytorch Tensors can live on either GPU or CPU (numpy is cpu-only)
 2. pytorch can automatically track tensor computations to enable automatic differentiation.
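A minimal sketch of both differences at once, assuming nothing beyond a working torch install: the tensor is placed on the GPU when one is available (falling back to the CPU otherwise), and `requires_grad=True` asks pytorch to track computations so that a derivative can be requested afterwards.

```python
import torch

# Pick the GPU if there is one; otherwise stay on the CPU.
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# requires_grad=True asks pytorch to track computations on this tensor.
x = torch.tensor([1.0, 2.0, 3.0], device=device, requires_grad=True)

# Compute a scalar from x, then ask for its derivative with respect to x.
y = (x ** 2).sum()
y.backward()

# The derivative of sum(x^2) with respect to x is 2x.
print(x.grad)
```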
    


Basic operations
----------------

**Elementwise operations.** Most tensor operations are simple (embarrassingly parallelizable) elementwise operations, where the same math is done on every element of the array.

**Common reduction operations.**  Some common operations such as `max`, `min`, `mean`, and `sum` reduce the array along one or more dimensions.  You can usually choose which dimension to reduce by passing an argument such as `dim=1`.
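To make the effect of `dim` concrete, here is a small sketch on a hand-written 2x3 tensor.  The `keepdim=True` argument is not mentioned above; it is included only to show how to keep the reduced dimension with size one.

```python
import torch

a = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

total = a.sum()                     # reduce every dimension: a zero-d tensor
col_sums = a.sum(dim=0)             # reduce dimension 0 (rows): shape [3]
row_means = a.mean(dim=1)           # reduce dimension 1 (columns): shape [2]
kept = a.sum(dim=1, keepdim=True)   # same sums as dim=1, but shape [2, 1]

print(total.shape, col_sums.shape, row_means.shape, kept.shape)
print(col_sums, row_means)
```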

**Copy semantics by default.** Almost all operations, including things like `x.sort()`, return a new copy of the tensor without overwriting the input tensors.  The exceptions are functions that end in an underscore, such as `x.mul_(2)`, which doubles the contents of `x` in-place.
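The difference between the copying and in-place forms can be sketched as follows; the values are arbitrary and chosen only so the effect is easy to see.

```python
import torch

x = torch.tensor([3.0, 1.0, 2.0])

# sort() returns new tensors; x itself is left untouched.
sorted_x = x.sort().values
after_sort = x.tolist()
print(sorted_x, x)

# mul(2) also returns a new tensor and leaves x alone.
doubled = x.mul(2)
after_mul = x.tolist()

# mul_(2), with the trailing underscore, overwrites x itself.
x.mul_(2)
print(x)
```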

**Where is argmin?** Note that `x.sort(dim=0)` and `x.min(dim=0)` return a pair containing both the answer and the index values, so you do not need to separately recompute `argsort` or `argmin` when you need them.
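A short sketch of unpacking those pairs, using a small hand-written tensor.  The comparisons against `argmin` and `argsort` at the end are there only to confirm that the returned indices agree with them on this input.

```python
import torch

x = torch.tensor([[4.0, 1.0],
                  [2.0, 3.0],
                  [0.0, 5.0]])

# min along dimension 0 returns (values, indices) for each column.
values, indices = x.min(dim=0)
print(values, indices)

# sort along dimension 0 likewise returns (sorted values, original positions).
sorted_vals, sort_idx = x.sort(dim=0)
print(sorted_vals)
print(sort_idx)

# The returned indices match what argmin and argsort would give here.
print(torch.equal(indices, x.argmin(dim=0)))
print(torch.equal(sort_idx, x.argsort(dim=0)))
```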

See the [reference for Tensor methods](https://pytorch.org/docs/stable/tensors.html#torch.Tensor) for what comes built-in.



In [None]:
import math, numpy, torch
from matplotlib import pyplot as plt

# Make a vector of 100 equally spaced numbers from 0 to 5.
x = torch.linspace(0, 5, 100)

y1, y2 = x.sin(), x ** x.cos()
y3 = y2 - y1
y4 = y3.min()

print(f'The shape of x is {x.shape}')
print(f'The shape of y1=x.sin() is {y1.shape}')
print(f'The shape of y2=x ** x.cos() is {y2.shape}')
print(f'The shape of y3=y2 - y1 is {y3.shape}')
print(f'The shape of y4=y3.min() is {y4.shape}, a zero-d scalar')

plt.plot(x, y1, 'red', x, y2, 'blue', x, y3, 'green')
plt.axhline(y4, color='green', linestyle='--')
plt.show()

Subscripts and multiple dimensions
----------------------------------

**Slicing.** As normal in python, you can use `[min:max:stride]` to slice ranges, and multidimensional subscripts work as you would expect.
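For instance, a sketch of one-dimensional and two-dimensional slicing on tensors built with `torch.arange` (the names `v` and `g` are just placeholders):

```python
import torch

v = torch.arange(10)            # 0, 1, ..., 9
every_other = v[2:8:2]          # start at 2, stop before 8, step by 2
print(every_other)

g = torch.arange(12).reshape(3, 4)
# Rows 1 onward, and every second column.
block = g[1:, ::2]
print(block)
```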

**Unsqueezing to add a dimension.** While the subscript `[0]` eliminates a dimension, the special subscript `[None]` does the reverse and adds an extra dimension instead of reducing one.
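The shapes below sketch how `[0]` and `[None]` change the number of dimensions.  The `unsqueeze` method is shown as an equivalent spelling, and the final line illustrates why the extra dimension is useful for broadcasting.

```python
import torch

v = torch.tensor([1.0, 2.0, 3.0])

print(v.shape)            # one dimension of size 3
print(v[0].shape)         # [0] removes the dimension: a zero-d tensor
print(v[None].shape)      # [None] adds a leading dimension: shape [1, 3]
print(v[:, None].shape)   # None can go in any position: shape [3, 1]

# unsqueeze(0) is a method that does the same thing as [None].
same = torch.equal(v[None], v.unsqueeze(0))

# A [3, 1] column times a [1, 3] row broadcasts to a [3, 3] table.
table = v[:, None] * v[None, :]
print(table.shape)
```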

**Fancy indexing.** By passing arrays as subscripts things can get quite fancy; the rules are modeled on the capabilities in numpy.  For details see [Numpy fancy indexing](https://numpy.org/doc/stable/user/basics.indexing.html).
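Three common forms of fancy indexing are sketched below: selecting rows with an index tensor, filtering with a boolean mask, and picking individual elements with paired index tensors.  The tensors are illustrative only.

```python
import torch

m = torch.arange(12).reshape(3, 4)

# An index tensor selects (and reorders) whole rows.
rows = m[torch.tensor([2, 0])]
print(rows)

# A boolean mask of the same shape selects matching elements as a 1-d tensor.
evens = m[m % 2 == 0]
print(evens)

# Paired index tensors pick out individual elements: m[0,3], m[1,2], m[2,1].
picked = m[torch.tensor([0, 1, 2]), torch.tensor([3, 2, 1])]
print(picked)
```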

In [None]:
# Make an array of normally distributed randoms.
m = torch.randn(2, 3)
print(f'm is {m}, and m[1,2] is {m[1,2]}\n')
print(f'column zero, m[:,0] is {m[:,0]}')
print(f'row zero m[0,:] is {m[0,:]}\n')
dot_product = (m[0,:] * m[1,:]).sum()
print(f'The dot product of rows (m[0,:] * m[1,:]).sum() is {dot_product}\n')
outer_product = m[0,:][None,:] * m[1,:][:,None]
print(f'the outer product of rows m[0,:][None,:] * m[1,:][:,None] is:\n{outer_product}')

Devices and types
-----------------

**Single precision CPU default.** By default a torch tensor will be stored on the CPU and will store single-precision 32-bit floating-point values.

**Specifying data type.** To store a different data type such as integers, say `dtype=torch.long` when you create it.
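A brief sketch of the default type and of requesting other types at creation time.  It assumes the global default dtype has not been changed from its usual `torch.float32`.

```python
import torch

f = torch.zeros(3)                                 # default: 32-bit float
i = torch.zeros(3, dtype=torch.long)               # 64-bit integers
d = torch.tensor([1, 2, 3], dtype=torch.double)    # 64-bit floats

print(f.dtype, i.dtype, d.dtype)
print(f.device)                                    # tensors start on the CPU
```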

**Specifying GPU.** To put the tensor on the GPU, specify `device='cuda'`.

**Copying a tensor to a different device or type.** You cannot directly combine tensors that are on different devices (e.g., GPU vs CPU or even different GPUs) and most different-type combinations are also prohibited, so you may occasionally find that you need to convert types and move devices explicitly to make tensors compatible before combining them.  The `x.to(device and/or type)` function makes a copy of the tensor on the specified device or type.

There are also commonly-used convenience synonyms `x.cpu()`, `x.cuda()`, `x.float()`, `x.long()`, etc. for making a copy of `x` with the specified device or type.
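The sketch below shows `to` and the convenience synonyms side by side.  So that it runs on machines without a GPU, it picks the device with `torch.cuda.is_available()`; that fallback is an addition for this example.

```python
import torch

# Use the GPU when one is available; otherwise everything stays on the CPU.
device = 'cuda' if torch.cuda.is_available() else 'cpu'

x = torch.arange(4)                          # 64-bit integers on the CPU

# Change device and type in a single call.
y = x.to(device=device, dtype=torch.float)

# The synonyms do one conversion each.
z = y.cpu().long()
print(y.dtype, y.device, z.dtype, z.device)

# Match device and type explicitly before combining two tensors.
w = torch.ones(4, device=device) + x.to(device).float()
print(w)
```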

Keep in mind that copying a tensor between devices can take much more time than doing an operation on the same device: to make your computation fast, move all the data you need to the same GPU, and keep it there.

See the [Tensor reference](https://pytorch.org/docs/stable/tensors.html) for all the types.


In [None]:
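# Use the GPU when one is available; otherwise fall back to the CPU.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
r_orig = torch.randn(2, 3, device=device, dtype=torch.double)
print(r_orig)
# Copy to the CPU, then convert to unsigned 8-bit integers (a lossy conversion).
r_moved = r_orig.cpu().byte()
print(r_moved)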

pytorch Tensor dimension-ordering conventions
---------------------------------------------

**Multidimensional data convention.** As soon as you have more than one dimension, you need to decide how to order them.  Much image-related data in pytorch is four dimensional, and the dimensions are ordered like this: `data[batch_index, channel_index, y_position, x_position]`.  That is:
* Dimension 0 is used to index separate images within a batch.
* Dimension 1 indexes channels within an image representation (e.g., 0,1,2 = R,G,B)
* Dimension 2 indexes the row position (y-value, starting from the top)
* Dimension 3 indexes the column position (x-value, starting from the left)

There are good reasons for this ordering: the last dimensions are stored physically closer together in RAM, and since they are often combined with each other, this helps with locality.  The first (batch) dimension, by contrast, usually just groups separate independent data points which are not combined much, except maybe to average them together at the end.  Stream-oriented data will drop one or two dimensions, and 3d data will add a dimension, adding a depth z before y.  The same ordering is used in caffe; tensorflow instead defaults to a channels-last ordering, `data[batch_index, y_position, x_position, channel_index]`.
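To make the ordering concrete, the sketch below builds a dummy batch and indexes it along each dimension.  The sizes (8 images, 3 channels, 32 rows, 48 columns) are arbitrary choices for illustration; `stride()` reports how many elements apart neighboring entries are in memory along each dimension.

```python
import torch

# A dummy batch: 8 images, 3 channels, 32 rows (y), 48 columns (x).
batch = torch.zeros(8, 3, 32, 48)

first_image = batch[0]         # one image: shape [3, 32, 48]
red = batch[:, 0]              # channel 0 of every image: shape [8, 32, 48]
pixel = batch[0, :, 5, 7]      # all channels at y=5, x=7: shape [3]
print(first_image.shape, red.shape, pixel.shape)

# Neighbors along x are 1 element apart in memory, neighbors along y are
# 48 apart, and separate images in the batch are 3*32*48 apart.
print(batch.stride())
```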

**Multidimensional linear operation convention.** When storing matrix weights or convolution weights, linear algebra conventions are followed:
* Dimension 0 (number of rows) matches the output channel dimension
* Dimension 1 (number of columns) matches the input channel dimension
* Dimension 2 (if present) is the convolutional kernel y-dimension
* Dimension 3 (if present) is the convolutional kernel x-dimension
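One way to check this convention is to inspect the weight shapes of two standard layers.  `torch.nn.Linear` and `torch.nn.Conv2d` are chosen here as examples, and the layer sizes are arbitrary.

```python
import torch

# A linear layer mapping 5 input features to 2 output features.
linear = torch.nn.Linear(in_features=5, out_features=2)
print(linear.weight.shape)     # [output, input] = [2, 5]

# A convolution from 3 input channels to 16 output channels,
# with a kernel that is 5 high (y) and 7 wide (x).
conv = torch.nn.Conv2d(in_channels=3, out_channels=16, kernel_size=(5, 7))
print(conv.weight.shape)       # [output, input, kernel y, kernel x]

# With rows as outputs, the layer computes x @ W.T + b for a batch x.
x = torch.randn(4, 5)
manual = x @ linear.weight.T + linear.bias
print(torch.allclose(linear(x), manual, atol=1e-6))
```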