# PyTorch Fundamentals

This notebook closely follows the material available at learnpytorch.io [[1]](https://www.learnpytorch.io/), with occasional refactoring and extension for consistency of style and to make connections with other parts of the package. It also offers more examples and revisits lower-level concepts less often once they have been discussed.

### About PyTorch

PyTorch is an open source machine learning framework (or deep learning framework, depending on the classification one accepts) with a highly optimized tensor workflow that can use both the CPU and the GPU for computation. It is widely used in both industry and academia and, as of 2022, it is the most used deep learning framework on Papers with Code [[2]](https://paperswithcode.com/trends). Its design philosophy is based on the following principles.
- Usability over performance
- Simple over easy
- Python first with best in class language interoperability

### Install PyTorch and confirm version

The recommended way to install PyTorch with CUDA 11.7 support in a virtual environment is the following.

`pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117`

The version string of `torch` references both the PyTorch version and the CUDA version the binaries are linked against.

In [None]:
import torch
print(torch.__version__)
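
As a small illustration of how the version string combines the two pieces of information, the sketch below splits it at the `+` separator. It assumes the string has the form `<version>+<build tag>` (for example `2.0.0+cu117`). The helper name `split_torch_version` and the sample strings are hypothetical, and strings without a `+` suffix are treated as having no build tag.

```python
from typing import Optional, Tuple


def split_torch_version(version: str) -> Tuple[str, Optional[str]]:
    """Split a version string such as '2.0.0+cu117' into (version, build tag)."""
    base, separator, build = version.partition("+")
    return base, (build if separator else None)


# Sample strings are assumptions for illustration, not output captured from a real install.
print(split_torch_version("2.0.0+cu117"))  # ('2.0.0', 'cu117')
print(split_torch_version("2.0.0"))        # ('2.0.0', None)
```

In the notebook it could be called as `split_torch_version(torch.__version__)`.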

To check whether an Nvidia GPU is present, we can run the `nvidia-smi` command on the command line. From within the notebook, we can achieve the same by prefixing it with a bang `!`.

In [None]:
!nvidia-smi

**Note.** If the CUDA version listed differs from the one that shipped with the PyTorch binaries, this is not necessarily a problem. For instance, the 11.7 version that comes with PyTorch is able to work with 12.1 on the GPU side. *This was tested with an NVIDIA GeForce GTX 1050 Ti 4GB GPU and driver version 531.41.*

CUDA availability can also be checked explicitly, which makes it possible to use the GPU conditionally.

In [None]:
torch.cuda.is_available()

### Tensors

Tensors are higher-order generalizations of vector and matrix concepts and as such are implemented using the abstraction of higher-dimensional arrays. An object $T$ is called an $n$-order or $n$-dimensional tensor if there are vector spaces $V_i$ of dimensions $d_i$ for $i = 0, 1, \dots, n - 1$ such that

$$
T \in V_0 \otimes V_1 \otimes \dots \otimes V_{n - 1}
$$

In PyTorch, the underlying domains have finitely many elements by nature (finiteness of representation), and each corresponds to one of the supported data types.

In [None]:
torch_types = {
    "32-bit floating point": (torch.float32, torch.float),
    "64-bit floating point": (torch.float64, torch.double),
    "64-bit complex": (torch.complex64, torch.cfloat),
    "128-bit complex": (torch.complex128, torch.cdouble),
    "16-bit floating point (1 sign, 5 exponent, 10 significand bits)": (torch.float16, torch.half),
    "16-bit Brain floating point (1 sign, 8 exponent, 7 significand bits)": (torch.bfloat16,),
    "8-bit integer (unsigned)": (torch.uint8,),
    "8-bit integer (signed)": (torch.int8,),
    "16-bit integer (signed)": (torch.int16, torch.short),
    "32-bit integer (signed)": (torch.int32, torch.int),
    "64-bit integer (signed)": (torch.int64, torch.long),
    "Boolean": (torch.bool,),  # kept as a tuple for consistency with the other entries
}
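
The bit widths and value ranges listed above can be inspected programmatically. The following is a minimal sketch using `torch.finfo` and `torch.iinfo`; the selection of data types shown is arbitrary.

```python
import torch

# Floating point types expose their bit width and smallest increment via torch.finfo.
for dtype in (torch.float16, torch.bfloat16, torch.float32, torch.float64):
    info = torch.finfo(dtype)
    print(f"{dtype}: {info.bits} bits, eps={info.eps}, max={info.max}")

# Integer types expose their bit width and value range via torch.iinfo.
for dtype in (torch.uint8, torch.int8, torch.int16, torch.int32, torch.int64):
    info = torch.iinfo(dtype)
    print(f"{dtype}: {info.bits} bits, range=[{info.min}, {info.max}]")

# Aliases refer to the same data types.
print(torch.float == torch.float32, torch.long == torch.int64)
```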

Perhaps not surprisingly, standard nested `list` objects that respect the homogeneous shape requirement can be used to construct the first `Tensor` examples.

In [None]:
vector = torch.tensor([1, 0])
matrix = torch.tensor(
    [
        [0., 1.],
        [1., 0.],
    ]
)
tensor = torch.tensor(
    [
        [
            [True, False],
            [False, True],
        ],
        [
            [True, True],
            [False, False],
        ]
    ]
)
for t in [vector, matrix, tensor]:
    print(f"{t}\n\n\ttensor of type {t.dtype}\n")

**Note.** The examples above also show the default `dtype` chosen for Python types.
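
The inference of the data type can be made explicit with a few checks. This is a sketch that assumes the default floating point type has not been changed through `torch.set_default_dtype`; the sample values are arbitrary.

```python
import torch

# Python ints map to 64-bit integers, floats to the default floating point type
# (float32 unless changed via torch.set_default_dtype), and bools to torch.bool.
assert torch.tensor([1, 0]).dtype == torch.int64
assert torch.tensor([0.0, 1.0]).dtype == torch.get_default_dtype()
assert torch.tensor([True, False]).dtype == torch.bool

# Mixing ints and floats promotes everything to the floating point type.
mixed = torch.tensor([1, 2.5])
print(mixed, mixed.dtype)

# An explicit dtype overrides the inference.
explicit = torch.tensor([1, 0], dtype=torch.float64)
print(explicit, explicit.dtype)
```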

PyTorch supports not-a-number values in the form of `float("nan")` and also positive and negative infinite values `float("Inf")` and `float("-Inf")`, respectively.
It is worth mentioning that the Boolean type in PyTorch is not nullable: NaN values are considered `True`.

In [None]:
torch.tensor([float(item) for item in ["nan", "Inf", "-Inf"]], dtype=torch.bool)
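
Since casting to Boolean cannot tell special values apart, they are better detected with dedicated predicates. A minimal sketch with `torch.isnan`, `torch.isinf`, and `torch.isfinite` follows; the sample values are arbitrary.

```python
import torch

values = torch.tensor([float("nan"), float("inf"), float("-inf"), 1.0])

nan_mask = torch.isnan(values)        # True only for NaN
inf_mask = torch.isinf(values)        # True for both infinities
finite_mask = torch.isfinite(values)  # True for ordinary numbers

print(nan_mask, inf_mask, finite_mask)

# NaN never compares equal to itself, so `==` cannot be used to find it.
print(values == values)
```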

In mathematics, the most straightforward explanation of a tensor would be a multilinear mapping. However, in PyTorch, tensors are meant to be understood more as real-world inputs mapped onto supported data types, at least initially. For example, a 2 by 2 pixel image can be considered a tensor of order 3 over numeric domains with dimensions 2, 2, and 3, where the first and second axes, of dimension 2 each, stand for the height and the width, respectively, and the third axis, of dimension 3, reflects the color channels (RGB).

In [None]:
image = torch.tensor(
    [
        [
            [0.1, 0.2, 0.3],  # RGB of pixel (0, 0)
            [0.2, 0.2, 0.3],  # RGB of pixel (0, 1)
        ],
        [
            [0.4, 0.5, 0.5],  # RGB of pixel (1, 0)
            [0.5, 0.6, 0.6],  # RGB of pixel (1, 1)
        ]
    ]
)
image.shape
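
The order of the axes is a matter of convention. The example above uses a height, width, channel layout, while a channel-first layout is also common in PyTorch code. The sketch below shows one way to switch between the two with `permute`; the variable names and the random pixel values are assumptions for illustration.

```python
import torch

# Height x width x channel layout, as in the example above (values are arbitrary).
image_hwc = torch.rand(2, 2, 3)

# Reorder the axes to channel x height x width; permute returns a view, not a copy.
image_chw = image_hwc.permute(2, 0, 1)
print(image_hwc.shape, "->", image_chw.shape)

# The red channel holds the same data in both layouts.
print(torch.equal(image_hwc[:, :, 0], image_chw[0]))
```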

### Deterministic tensor constructors

Let us begin with the convention that scalars are thought of as tensors of dimension 0 rather than dimension 1 over any of the allowed domains.

In [None]:
scalar = torch.tensor(0)
print(f"{scalar} is an object {type(scalar)} of dimension {scalar.ndim}")
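
To make the distinction between dimension 0 and dimension 1 concrete, the following sketch compares a scalar tensor with a one-element vector and extracts the Python number with `.item()`. The value 3.5 is arbitrary.

```python
import torch

scalar = torch.tensor(3.5)
vector = torch.tensor([3.5])

print(scalar.ndim, scalar.shape)  # 0 torch.Size([])
print(vector.ndim, vector.shape)  # 1 torch.Size([1])

# .item() extracts the underlying Python number from a single-element tensor.
print(scalar.item(), type(scalar.item()))  # 3.5 <class 'float'>
```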

We have already seen the conversion of Python types into tensors by means of `torch.tensor`. This is fairly inefficient because of the overhead of Python objects, and it is also rarely needed since, under normal circumstances, inputs flow in through I/O streams. We may additionally need to create tensors of special forms to supplement the flow of computation.

In [None]:
zeros = torch.zeros((2, 2))
print(zeros)

In [None]:
ones = torch.ones((3, 2, 1))
print(ones)

In [None]:
arith_range = torch.arange(30).reshape(2, 3, 5)
print(arith_range)

In [None]:
equispaced = torch.linspace(0, 9, 10).reshape(2, 5)
print(equispaced)

In [None]:
equiscaled = torch.logspace(0, 1, 10).reshape(2, 5)
print(equiscaled)
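
The range-like constructors differ in how they treat the end point, which is easy to get wrong. The sketch below contrasts them and adds a few related constructors (`ones_like`, `full`, `eye`) that are not used above; all arguments are arbitrary.

```python
import torch

# arange excludes the end point, like Python's range.
print(torch.arange(0, 10, 2))   # 0, 2, 4, 6, 8

# linspace includes both end points and takes the number of points instead of a step.
print(torch.linspace(0, 1, 5))  # 0.00, 0.25, 0.50, 0.75, 1.00

# logspace spaces the exponents evenly: base ** linspace(start, end, steps), base 10 by default.
print(torch.logspace(0, 2, 3))  # 1, 10, 100

# The *_like constructors reuse an existing shape; torch.full fills a given one.
template = torch.zeros((2, 3))
print(torch.ones_like(template))
print(torch.full((2, 3), 7.0))
print(torch.eye(3))
```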

### Tensor algebra

Tensors generalize matrices not only in terms of representation, but also in terms of the operations defined on them. PyTorch tensors support pointwise addition, subtraction, multiplication, and division. Also, as expected from vector spaces, they support scalar multiplication. Matrix multiplication is available along axes of identical dimensions, and the transpose operation is implemented as well.

In [None]:
zeros = torch.zeros((2, 3))
ones = torch.ones((3, 2))
transponent = ones.t()
print(f"Transpose of\n{ones}\n  is:\n{transponent}")
print()
print(f"Sum of\n{zeros}\n  and\n{transponent}\n  is:\n{zeros + transponent}")
print()
print(f"Difference of\n{zeros}\n  and\n{transponent}\n  is:\n{zeros - transponent}")
print()
print(f"Pointwise product of\n{zeros}\n  and\n{transponent}\n  is:\n{zeros * transponent}")
print()
print(f"Pointwise quotient of\n{zeros}\n  and\n{transponent}\n  is:\n{zeros / transponent}")
print()
print(f"Double of\n{ones}\n  is:\n{2 * ones}")
print()
print(f"Matrix product of\n{zeros}\n  and\n{ones}\n  is:\n{torch.matmul(zeros, ones)}")
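
The shape rule for matrix multiplication is easier to see with operands that are not all zeros or ones. The following is a small sketch with arbitrary values; it also shows the `@` operator as shorthand for `torch.matmul` and what happens when the inner dimensions do not agree.

```python
import torch

a = torch.arange(6, dtype=torch.float32).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
b = torch.ones((3, 2))

# (2, 3) @ (3, 2) -> (2, 2): the inner dimensions must agree.
product = a @ b
print(product)  # row sums of `a`, repeated in both columns

# `@` is shorthand for torch.matmul.
assert torch.equal(product, torch.matmul(a, b))

# Pointwise multiplication needs matching (or broadcastable) shapes instead.
print(a * b.t())

# Mismatched inner dimensions raise an error.
try:
    a @ a
except RuntimeError as error:
    print(f"matmul failed: {error}")
```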

Determinant and matrix inversion are also available for higher-dimensional tensors. PyTorch applies them batchwise: the last two dimensions are treated as square matrices and the leading dimensions as batch dimensions.

In [None]:
tensor = torch.linspace(0, 1, 16).reshape(2, 2, 2, 2)
print(f"Batched determinant of\n{tensor}\n  is:\n{tensor.det()}")
print()
print(f"Inverse of\n{tensor}\n  is:\n{tensor.inverse()}")

**Note.** A legitimate question at this point is: *"under what circumstances can the determinant and the inverse be thought of as useful measures of input data?"* In fact, these become more relevant once the tensor is thought of as a multilinear mapping or, in an overly simplified manner, as a state transition expression. The determinant helps in understanding enlarging/shrinking effects, while the inverse can be considered a reversal of the transition.
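
The "reversal" and "scaling" readings can be checked numerically. The sketch below uses `torch.linalg.det` and `torch.linalg.inv`, which operate on the last two dimensions, so a tensor of shape (2, 2, 2) is treated as two 2 by 2 matrices. The matrix entries are chosen for illustration only.

```python
import torch

# A batch of two invertible 2x2 matrices (values chosen for illustration).
batch = torch.tensor(
    [
        [[2.0, 1.0], [1.0, 3.0]],  # determinant 5
        [[1.0, 2.0], [3.0, 4.0]],  # determinant -2
    ]
)

dets = torch.linalg.det(batch)      # one determinant per matrix
inverses = torch.linalg.inv(batch)  # one inverse per matrix
print(dets)

# Applying a matrix and then its inverse gives back the identity ("reversal").
identity = torch.eye(2).expand(2, 2, 2)
print(torch.allclose(batch @ inverses, identity, atol=1e-5))

# The inverse scales volumes by the reciprocal factor.
print(torch.allclose(torch.linalg.det(inverses), 1 / dets, atol=1e-5))
```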

Comparison is understood pointwise between tensors of identical shape. It is also possible to compare against scalars.

In [None]:
print(f"Positions where\n{zeros}\n  is less than\n{transponent}\n  are:\n{zeros < transponent}")
print()
print(f"Positions where\n{zeros}\n  is greater than\n{transponent}\n  are:\n{zeros > transponent}")
print()
print(f"Positions where\n{zeros}\n  is equal to\n{transponent}\n  are:\n{zeros == transponent}")
print()
print(f"Positions where\n{zeros}\n  is not equal to\n{transponent}\n  are:\n{zeros != transponent}")
print()

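Boolean operations apply pointwise too. The truth value of a tensor with more than one element is ambiguous, so the operators `|`, `&`, and `~` are used in place of Python's `or`, `and`, and `not`.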

In [None]:
tensor = torch.arange(9).reshape(3, 3)
print(f"Positions where\n{tensor}\n  is less than 3 or larger than 6 are\n{(tensor < 3) | (tensor > 6)}")
print()
print(f"Positions where\n{tensor}\n  is less than 6 and larger than 3 are\n{(tensor > 3) & (tensor < 6)}")
print()
print(f"Positions where\n{tensor}\n  is less than or equal to 3 or larger than or equal to 6\n{~((tensor > 3) & (tensor < 6))}")
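
The ambiguity of the truth value can be demonstrated directly. This is a minimal sketch: converting a multi-element tensor with `bool` fails, while the reductions `any` and `all` state explicitly what is being asked.

```python
import torch

tensor = torch.arange(9).reshape(3, 3)
mask = tensor > 3

# Converting a multi-element tensor to a single bool is rejected.
try:
    bool(mask)
except RuntimeError as error:
    print(f"bool(mask) failed: {error}")

# Reduce explicitly instead: is any / is every element True?
print(mask.any().item(), mask.all().item())  # True False

# A single-element tensor converts without ambiguity.
print(bool(torch.tensor([True])))  # True
```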

### Stochastic tensor constructors

Often, we need to sample the values of a tensor from some probability distribution. PyTorch offers stochastic constructors for many common distributions.

In [None]:
uniform_tensor = torch.rand(2, 3, 4)
print(uniform_tensor)

In [None]:
# Unconventional constructor that takes a pointwise free parameter p
# It has to do with Markov chains and evolutionary processes
bernoulli_tensor = torch.bernoulli(
    0.5 * torch.ones((10, 10))  # Fix p to be 0.5 across all the sampling
)
print(bernoulli_tensor)
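
The pointwise nature of the parameter becomes visible when the probabilities differ between entries. The following sketch uses hypothetical probabilities: entries with probability 0 or 1 have a certain outcome, and for a large sample the mean should land close to the chosen probability.

```python
import torch

# Each entry of the input is the success probability of the matching output entry.
# Probabilities 0 and 1 make the outcome certain; 0.5 is a fair coin flip.
p = torch.tensor([0.0, 1.0, 0.5])
sample = torch.bernoulli(p)
print(sample)  # first entry is always 0, second always 1, third varies

# With many draws, the sample mean should land close to p.
many = torch.bernoulli(torch.full((10_000,), 0.3))
print(many.mean())
```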

In [None]:
gaussian_tensor = torch.normal(0, 1, (5, 5))
print(gaussian_tensor)

In [None]:
# Same comment applies as for Bernoulli sampling
poisson_tensor = torch.poisson(
    torch.ones((5, 5))  # Fix lambda to be 1 across all the sampling
)
print(poisson_tensor)

If sampling scenarios are to be repeated, the random seed can be fixed to ensure identical outputs. Note that the state of the generator advances with each sampling, so the seed needs to be reset before each repetition.

In [None]:
def reset_seed():
    torch.manual_seed(seed=1)

reset_seed()
print(torch.rand(3, 3))
reset_seed()
print(torch.rand(3, 3))
reset_seed()
print(torch.rand(3, 3))
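
An alternative to resetting the global seed is to pass a dedicated `torch.Generator` to the sampling call, which leaves the global random state untouched. The sketch below illustrates this; the seed value and variable names are arbitrary.

```python
import torch

# Two independent generators seeded identically produce identical streams.
gen_a = torch.Generator().manual_seed(1)
gen_b = torch.Generator().manual_seed(1)

first_a = torch.rand(3, 3, generator=gen_a)
first_b = torch.rand(3, 3, generator=gen_b)
print(torch.equal(first_a, first_b))  # True

# Drawing again from the same generator continues the stream instead of repeating it.
second_a = torch.rand(3, 3, generator=gen_a)
print(torch.equal(first_a, second_a))  # False
```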

### Indexing

Tensors implement the standard accessors by position: explicit indexing in one or multiple dimensions and slicing.

In [None]:
tensor = torch.arange(24).reshape(2, 3, 4)
print(f"Tensor example used is:\n{tensor}\nWe are going to think along x, y, and z axes respectively.")
print()
print(f"Accessing first element along axis x:\n{tensor[0]}")
print()
print(f"Accessing first element along axis y:\n{tensor[:, 0]}")
print()
print(f"Accessing first element along axis z:\n{tensor[:, :, 0]}")
print()
print(f"Accessing first element along axis x and y:\n{tensor[0, 0]}")
print()
print(f"Accessing first element along axis x and z:\n{tensor[0, :, 0]}")
print()
print(f"Accessing first element along axis y and z:\n{tensor[:, 0, 0]}")
print()
print(f"Accessing first element along axis x, second and third along y:\n{tensor[0, 1:]}")
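
A few further indexing details are worth a quick check: integer indices drop an axis while slices keep it, an ellipsis stands for all remaining axes, and negative indices count from the end. The sketch reuses the same example tensor.

```python
import torch

tensor = torch.arange(24).reshape(2, 3, 4)

# An integer index removes the axis, a slice keeps it.
print(tensor[0].shape)      # torch.Size([3, 4])
print(tensor[0:1].shape)    # torch.Size([1, 3, 4])
print(tensor[0, 1:].shape)  # torch.Size([2, 4])

# Ellipsis stands for "all remaining axes": the two expressions select the same data.
print(torch.equal(tensor[..., 0], tensor[:, :, 0]))  # True

# Negative indices count from the end.
print(tensor[-1, -1, -1])   # tensor(23)
```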

In addition, tensors support Boolean masking, like NumPy arrays and pandas DataFrames.

In [None]:
tensor = torch.normal(0, 1, (3, 3, 3))
print(f"Tensor example used is:\n{tensor}")
print()
print(f"Elements less than 0 are:\n{tensor[tensor < 0]}")
print()
print(f"Elements between -1 and 1:\n{tensor[(-1 < tensor) & (tensor < 1)]}")
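
Masking returns the selected elements as a flat, 1-dimensional tensor. When the original shape has to be preserved, `torch.where` or masked assignment are possible alternatives, as in this sketch with small deterministic values.

```python
import torch

tensor = torch.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
mask = tensor > 2

# Masking flattens the selection into a 1-dimensional tensor.
selected = tensor[mask]
print(selected)  # tensor([3, 4, 5])

# torch.where keeps the shape and substitutes a value where the mask is False.
kept_shape = torch.where(mask, tensor, torch.zeros_like(tensor))
print(kept_shape)  # [[0, 0, 0], [3, 4, 5]]

# A mask can also be used on the left-hand side to assign in place.
clipped = tensor.clone()
clipped[mask] = 2
print(clipped)  # [[0, 1, 2], [2, 2, 2]]
```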

### Aggregation

Since tensors are used to represent input data, in many scenarios we are interested in aggregate measures, more precisely statistics, along various axes. The most common descriptive statistics are implemented as methods of `Tensor` objects. Note that some aggregates (for instance minimum, maximum, and median), when taken along an axis, return a pair consisting of the resulting values and a tensor of the indices where they were found.

In [None]:
sample = torch.normal(0, 1, (5, 5, 5))

print(f"Tensor example used is:\n{sample}")
print()
print(f"Mean across all three axes is:\n{sample.mean()}")
print()
print(f"Means across x axis are:\n{sample.mean(0)}")
print()
print(f"Maximums across y axis are:\n{sample.max(1)}")
print()
print(f"Minimums across z axis are:\n{sample.min(2)}")
print()
print(f"Standard deviations across x and y axes are:\n{sample.std((0, 1))}")
print()
print(f"Medians across x and z axes are:\n{sample.median(0)[0].median(1)}")
print()
print(f"Cumulative sums across y and z axes are:\n{sample.cumsum(1).cumsum(2)}")
print()
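
The pair of values and indices is easier to read on a small deterministic input. The sketch below uses hypothetical values and also shows `keepdim=True`, which keeps the reduced axis with size 1.

```python
import torch

sample = torch.tensor([[1.0, 5.0, 3.0], [4.0, 2.0, 6.0]])

# Reducing along an axis with max returns a (values, indices) pair.
row_max = sample.max(1)
print(row_max.values)   # tensor([5., 6.])
print(row_max.indices)  # tensor([1, 2])

# The pair can be unpacked like a tuple.
values, indices = sample.min(0)
print(values, indices)  # tensor([1., 2., 3.]) tensor([0, 1, 0])

# Without an axis, max returns a single 0-dimensional tensor.
print(sample.max())     # tensor(6.)

# keepdim=True keeps the reduced axis with size 1, which helps with broadcasting.
print(sample.mean(0, keepdim=True).shape)  # torch.Size([1, 3])
```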

### CPU vs. GPU

When a GPU is available, we should ideally leverage it for computation. This requires the tensor to be stored on the GPU device rather than on the CPU, which is the default.

In [None]:
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
print(DEVICE)

In [None]:
tensor_on_cpu = torch.rand(3, 3)
tensor_on_gpu = torch.rand(3, 3, device=DEVICE)
print(f"Tensor\n{tensor_on_cpu}\n  is stored on {tensor_on_cpu.device}.")
print()
print(f"Tensor\n{tensor_on_gpu}\n  is stored on {tensor_on_gpu.device}.")
print()
print(f"We can also move between the devices with `.to(<device>)`: {tensor_on_cpu.to(DEVICE).device}")
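
Operands of a computation have to live on the same device, so tensors are typically moved before they are combined and moved back afterwards. The following is a sketch of that round trip; it falls back to the CPU when no GPU is present, and the variable names are arbitrary.

```python
import torch

# Fall back to the CPU when no GPU is present, so the sketch runs anywhere.
device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.rand(3, 3)                 # created on the CPU by default
b = torch.rand(3, 3, device=device)  # created directly on the target device

# Operands must live on the same device, so move `a` before combining.
total = a.to(device) + b
print(total.device)

# Move the result back to the CPU, e.g. before handing it to NumPy or plotting code.
total_cpu = total.cpu()
print(total_cpu.device)  # cpu
```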

### References

[1] Learn PyTorch for Deep Learning: Zero to Mastery book, accessed online on 2023.03.25.

[2] Papers with Code trends, accessed online on 2023.03.25.