In [1]:
import torch

# Introduction to PyTorch

## Schedule

### PyTorch Overview

- What is PyTorch?
- Applications
- History
- Overview of PyTorch Components

### What is PyTorch?

> PyTorch is an optimized tensor library for deep learning using GPUs and CPUs.

But what does that really mean?

#### What is a tensor?
For the purposes of machine learning, a tensor is a multi-dimensional array.
You are already familiar with one-dimensional arrays; for example, a row vector:
$$
\left(
  \begin{array}{cccc}
    1 & 2 & 3 & 4
  \end{array}
\right)
$$
and with two-dimensional arrays, which are just matrices:
$$
\left(
  \begin{array}{cc}
    1 & 0 \\
    0 & 1
  \end{array}
\right)
$$
A tensor generalizes this idea to any number of dimensions.
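
As a minimal sketch, the row vector and matrix above can be built as PyTorch tensors, and stacking two matrices gives a three-dimensional tensor. The variable names (`row`, `identity`, `stacked`) are chosen here for illustration only.

```python
import torch

# The row vector from above as a one-dimensional tensor
row = torch.tensor([1, 2, 3, 4])

# The 2 x 2 identity matrix from above as a two-dimensional tensor
identity = torch.tensor([[1, 0], [0, 1]])

# Stacking two 2 x 2 matrices gives a three-dimensional tensor
stacked = torch.stack([identity, 2 * identity])

print(row.ndim, row.shape)            # 1 torch.Size([4])
print(identity.ndim, identity.shape)  # 2 torch.Size([2, 2])
print(stacked.ndim, stacked.shape)    # 3 torch.Size([2, 2, 2])
```

Here `ndim` is the number of dimensions and `shape` is the size along each dimension.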

The main reason we need this generalization is that the data we will work with is more naturally stored in tensors.
For example, images are typically stored in RGB (red, green, blue) format.
Each image therefore consists of a red part, a green part, and a blue part.
Each part is a matrix that encodes the intensity of every pixel for that color.
An image is thus stored as three $H \times W$ matrices, where $H$ denotes the height and $W$ the width. In other words, which we will make more precise, it is a tensor with dimensions or shape $(3, H, W)$ or $(H, W, 3)$, depending on the storage format.
When we train a machine learning model, we typically "batch" multiple samples together, so a batch of images is a tensor with dimensions or shape $(B, 3, H, W)$ or $(B, H, W, 3)$, where $B$ is the number of samples in the batch.
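
The shapes described above can be checked with a short sketch. The sizes `H`, `W`, and `B` are arbitrary illustrative values, and random numbers stand in for real pixel data.

```python
import torch

H, W, B = 4, 6, 8  # illustrative height, width, and batch size

# One RGB image in channels-first layout: shape (3, H, W)
image_chw = torch.rand(3, H, W)

# The same image in channels-last layout: shape (H, W, 3)
image_hwc = image_chw.permute(1, 2, 0)

# The red part alone is an H x W matrix of intensities
red = image_chw[0]

# A batch of B images: shape (B, 3, H, W), or (B, H, W, 3) after permuting
batch_chw = torch.stack([torch.rand(3, H, W) for _ in range(B)])
batch_hwc = batch_chw.permute(0, 2, 3, 1)

print(image_chw.shape, image_hwc.shape, red.shape)
print(batch_chw.shape, batch_hwc.shape)
```

`permute` only reorders the dimensions; the pixel values themselves are unchanged.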

Similarly, text is first tokenized (we'll talk more about that later) and then stored in a multi-dimensional tensor.
Typically, that tensor has shape $(B, L)$ or $(B, 1, L)$, where $L$ is the number of tokens in each sequence.
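
To make the $(B, L)$ shape concrete, here is a rough sketch that assumes a toy whitespace tokenizer and a hypothetical vocabulary built on the fly. Real tokenizers are considerably more involved; the point is only how a batch of sentences becomes a rectangular tensor of integer ids.

```python
import torch

sentences = ["the cat sat", "the dog sat down"]

# Hypothetical toy vocabulary: id 0 is reserved for padding
vocab = {"<pad>": 0}
for sentence in sentences:
    for word in sentence.split():
        vocab.setdefault(word, len(vocab))

# Pad every sentence to the length of the longest one
L = max(len(sentence.split()) for sentence in sentences)
rows = []
for sentence in sentences:
    ids = [vocab[word] for word in sentence.split()]
    rows.append(ids + [vocab["<pad>"]] * (L - len(ids)))

tokens = torch.tensor(rows)        # shape (B, L)
tokens_b1l = tokens.unsqueeze(1)   # shape (B, 1, L)

print(tokens.shape, tokens_b1l.shape)
```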

#### What is deep learning?

Deep learning is a term that is beginning to fall out of fashion, replaced by terms that are older yet newly popular, such as artificial intelligence.
Deep learning is the field of neural networks with many layers.
Mathematically, a deep neural network is a composition $f_{d} \circ \dots \circ f_{1}$ of layers $f_{k}$, $k = 1, 2, \dots, d$, each of the form
$$
f_{k}(\mathbf{x}) = g_{k}(\mathbf{A}_{k}\mathbf{x} + \mathbf{b}_{k}, \mathbf{x})
$$
where $\mathbf{x} \in \mathbb{R}^{n_{k - 1} \times 1}$, $\mathbf{A}_{k}$ is an $n_{k} \times n_{k - 1}$ matrix, $\mathbf{b}_{k} \in \mathbb{R}^{n_{k} \times 1}$, $n_{0} = m$ is the dimension of the input, and $g_{k}: \mathbb{R}^{n_{k} \times 1} \times \mathbb{R}^{n_{k - 1} \times 1} \to \mathbb{R}^{n_{k} \times 1}$ is a non-linear function.
In other words, a layer is an affine function composed with a non-linear function.
Note that the range chosen for $g_{k}$ is a matter of mathematical convenience.
Note also that the $g_{k}$ functions often operate component-wise.
That is, if we have a vector
$$
\mathbf{x} =
\left(
\begin{array}{c}
1 \\
2 \\
3 \\
4
\end{array}
\right)
$$
then
$$
g(\mathbf{x}) =
\left(
\begin{array}{c}
g(1) \\
g(2) \\
g(3) \\
g(4)
\end{array}
\right)
$$ 
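
As a small check of the component-wise behaviour, the sketch below uses `torch.tanh` as a stand-in for $g$ (an arbitrary choice for illustration) and compares applying it to the whole column vector against applying it entry by entry.

```python
import math

import torch

# The column vector from above, with shape (4, 1)
x = torch.tensor([[1.0], [2.0], [3.0], [4.0]])

# torch.tanh stands in for g here; it is applied to every entry independently
g = torch.tanh
whole = g(x)

# The same result computed one component at a time
by_hand = torch.tensor([[math.tanh(value)] for value in [1.0, 2.0, 3.0, 4.0]])

print(torch.allclose(whole, by_hand))
```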

There is no fixed number of layers a neural network must have before it is considered a deep neural network.
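
The layered structure can be sketched with plain tensors. This is a simplified illustration, not a recommended way to build models: it assumes each $\mathbf{A}_{k}$ has shape $n_{k} \times n_{k - 1}$ so the matrix product is defined, uses ReLU for every $g_{k}$, and ignores the second argument of $g_{k}$. The layer sizes are arbitrary.

```python
import torch

torch.manual_seed(0)

# Illustrative sizes: n_0 = m = 4, then n_1 = 8, n_2 = 8, n_3 = 3, so d = 3
sizes = [4, 8, 8, 3]

# One (A_k, b_k) pair per layer, with A_k of shape (n_k, n_{k-1})
params = [
    (torch.randn(n_out, n_in), torch.randn(n_out, 1))
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]

def network(x):
    # Apply f_1, then f_2, ..., then f_d: an affine map followed by ReLU
    for A, b in params:
        x = torch.relu(A @ x + b)
    return x

x = torch.randn(sizes[0], 1)   # input column vector in R^{n_0 x 1}
y = network(x)
print(y.shape)  # torch.Size([3, 1])
```

Adding more entries to `sizes` makes the network deeper without changing any other code.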

### Why GPUs?

PyTorch is optimized for both GPUs and CPUs, but the primary use case is GPUs.
So, why GPUs?
As noted before, a few neural network components account for the majority of FLOPs (FLoating point OPerations):
- matrix-matrix multiplication
- matrix-vector multiplication
- matrix and vector addition
- component-wise functions

All of these operations are highly parallelizable.
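
Each of the operations listed above is a single call or operator in PyTorch. The sketch below runs them on a GPU when one is available and falls back to the CPU otherwise; the tensor sizes are arbitrary.

```python
import torch

# Use a GPU if PyTorch can see one, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

A = torch.randn(512, 256, device=device)
B = torch.randn(256, 128, device=device)
v = torch.randn(256, device=device)
b = torch.randn(512, device=device)

mat_mat = A @ B                  # matrix-matrix multiplication, shape (512, 128)
mat_vec = A @ v                  # matrix-vector multiplication, shape (512,)
added = mat_vec + b              # vector addition, shape (512,)
activated = torch.relu(added)    # component-wise function, shape (512,)

print(device, mat_mat.shape, mat_vec.shape, activated.shape)
```

Every output entry in these operations can be computed independently of the others, which is what makes them a good fit for parallel hardware.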

CPUs are typically composed of a large amount of cache and a large control unit with a few arithmetic logic units (ALUs).
GPUs, on the other hand, have many small caches and many small control units with a large number of ALUs.
Further, the chip is divided up in such a way that each individual cache and each individual control unit serves multiple ALUs.
This is possible because multiple ALUs are expected to perform the same type of operation at the same time.

### Applications

Deep learning has found a wide range of applications:
- image recognition
- object detection
- image segmentation
- automatic speech transcription