# PyTorch basics

In [1]:
import torch
import numpy as np

# Check PyTorch version
torch.__version__

'1.5.0'

## Creating Torch Tensors
See https://pytorch.org/docs/stable/tensors.html on tensor types.
The default tensor dtype is 32 bit float. If a tensor has another dtype, this is shown when the tensor is printed. 16 bit precision is frequently sufficient.
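
To make the note on precision concrete, here is a minimal sketch comparing the storage used per element by three floating point dtypes. The variable names `half` and `single` are illustrative only.

```python
import torch

# Compare the storage cost per element of common floating point dtypes
for dtype in (torch.float16, torch.float32, torch.float64):
    t = torch.ones(2, 2, dtype=dtype)
    # element_size() returns the number of bytes used by one element
    print(t.dtype, t.element_size(), "bytes per element")

half = torch.ones(2, 2, dtype=torch.float16)
single = torch.ones(2, 2)  # no dtype given, so the default is used
print(half.dtype, single.dtype)
```

Halving the precision halves the memory needed for the same number of values, which is the usual reason for considering 16 bit tensors.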

In [2]:
# Create an empty (uninitialised) tensor; values are whatever was in memory (32 bit float, by default)
a = torch.empty(3,2)
print (a, a.dtype)

tensor([[1.0779e+04, 4.5602e-41],
        [1.0779e+04, 4.5602e-41],
        [4.1486e-08, 1.3556e-19]]) torch.float32


In [3]:
# Create a zero tensor (32 bit float, by default)
a = torch.zeros(4, 3)
print (a, a.dtype)

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]]) torch.float32


In [4]:
# Create a ones tensor (32 bit float, by default)
a = torch.ones(4, 3)
print (a, a.dtype)

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]]) torch.float32


In [5]:
# Create a zero tensor using integers
a = torch.zeros(4, 3,  dtype=torch.int64)
print (a, a.dtype)

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]]) torch.int64


In [6]:
# Use PyTorch arange like NumPy arange:
a = torch.arange(0, 18, 2).reshape(3,3)
print (a)
print (a.dtype)

# Use decimals to create a float tensor
a = torch.arange(0.0, 18.0, 2.0).reshape(3,3)
print()
print (a)
print (a.dtype)

tensor([[ 0,  2,  4],
        [ 6,  8, 10],
        [12, 14, 16]])
torch.int64

tensor([[ 0.,  2.,  4.],
        [ 6.,  8., 10.],
        [12., 14., 16.]])
torch.float32


In [7]:
# Or linspace:
a = torch.linspace(0.0, 22.0, 12).reshape(3,4)
print()
print (a)
print (a.dtype)


tensor([[ 0.,  2.,  4.,  6.],
        [ 8., 10., 12., 14.],
        [16., 18., 20., 22.]])
torch.float32


In [8]:
# Uniform random distribution (0-1)
a = torch.rand(4,3)
print (a)
print (a.dtype)

tensor([[0.0477, 0.5256, 0.5988],
        [0.6072, 0.9047, 0.5129],
        [0.5905, 0.7084, 0.9485],
        [0.9110, 0.8052, 0.2731]])
torch.float32


In [9]:
# Standard normal distribution (mean 0, standard deviation 1)
a = torch.randn(4,3)
print (a)
print (a.dtype)

tensor([[ 0.3883,  0.7024, -0.3425],
        [ 1.1361,  0.3308,  0.2970],
        [-0.2772, -0.6546,  0.0425],
        [-0.9947,  1.4162, -0.8947]])
torch.float32


In [10]:
# Random integers drawn uniformly from low (inclusive) to high (exclusive)
a = torch.randint(low=0, high=4, size=(5,5))
print (a)
print (a.dtype)

tensor([[3, 3, 0, 3, 3],
        [3, 3, 3, 1, 1],
        [2, 1, 0, 2, 1],
        [2, 0, 3, 1, 2],
        [3, 3, 2, 2, 3]])
torch.int64


In [11]:
# Use dimensions of another tensor and generate random numbers
x = torch.zeros(2,5)
a = torch.randn_like(x)
print (a)
print (a.dtype)

# Similar methods are:
# torch.zeros_like(input)
# torch.ones_like(input)

tensor([[-0.8796,  1.1050, -0.3258, -0.5240, -0.7160],
        [ 0.8189, -0.3603,  0.6720,  0.3699, -1.5484]])
torch.float32


In [12]:
# Set random seed
torch.manual_seed(42)
a = torch.rand(2, 3)
print(a)

tensor([[0.8823, 0.9150, 0.3829],
        [0.9593, 0.3904, 0.6009]])


### Changing tensor data types

In [13]:
# Create a zero tensor using integers
a = torch.zeros(4, 3,  dtype=torch.int64)
print (a, a.dtype)

# Convert to 32 bit float
b = a.type(torch.float32)
print ()
print (b, b.dtype)

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]]) torch.int64

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]]) torch.float32
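
As a small complement to `.type()`, the sketch below shows other conversion calls that should give the same kind of result: `.to()` with a dtype, and the `.float()` and `.double()` shorthands. The tensor names are illustrative.

```python
import torch

a = torch.arange(3)        # integer input, so the dtype is int64

b = a.to(torch.float32)    # general-purpose conversion
c = a.float()              # shorthand for 32 bit float
d = a.double()             # shorthand for 64 bit float

print(a.dtype, b.dtype, c.dtype, d.dtype)
```

Each call returns a new tensor; the original `a` keeps its integer dtype.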


### Create Torch Tensor from NumPy array
Note: NumPy works in 64 bit precision by default, but we usually work in 32 bit in PyTorch (for efficiency).

In addition to the methods below, `torch.from_numpy` creates a link between a NumPy array and a Torch Tensor (they share memory), so that if one is changed, so is the other. Usually, though, we want them to be separate entities, with the Torch Tensor being a copy of the NumPy array.
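
The difference between a linked tensor and a copy can be sketched as follows. The names `linked` and `copied` are illustrative.

```python
import numpy as np
import torch

n = np.zeros(3, dtype=np.float32)

linked = torch.from_numpy(n)  # shares memory with n
copied = torch.tensor(n)      # independent copy of n

n[0] = 5.0  # modify the NumPy array in place

print(linked)  # reflects the change made to n
print(copied)  # still holds the original zeros
```

Because `torch.tensor` copies the data, it is the safer choice when the NumPy array will be reused or modified later.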

In [14]:
# Create from numpy array
# Note: the default NumPy float array is 64 bit, and torch.tensor keeps that dtype
n = np.zeros(shape=(3,2))
b = torch.tensor(n)
b

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]], dtype=torch.float64)

In [15]:
# Define NumPy array as 32 bit first
n = np.zeros(shape=(3,2), dtype=np.float32)
b = torch.tensor(n)
b

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])

In [16]:
# Or change Torch tensor to 32 bit
n = np.zeros(shape=(3,2))
b = torch.tensor(n, dtype=torch.float32)
b

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])

To get tensor values as NumPy arrays use `.numpy()`

In [17]:
b.numpy()

array([[0., 0.],
       [0., 0.],
       [0., 0.]], dtype=float32)

`torch.tensor` (lower case t) infers the data type from the original object. `torch.Tensor` (upper case T) is by default an alias for `torch.FloatTensor` and will create a 32 bit floating point tensor object.

See the example below, passing first an integer array to Torch, and then a 64 bit NumPy array.

In [18]:
n = np.array([1,2,3])
a = torch.tensor(n)
b = torch.Tensor(n)

print (a, a.dtype)
print (b, b.dtype)

tensor([1, 2, 3]) torch.int64
tensor([1., 2., 3.]) torch.float32


In [19]:
n = np.random.random(3)
a = torch.tensor(n)
b = torch.Tensor(n)

print (a, a.dtype)
print (b, b.dtype)

tensor([0.4714, 0.5480, 0.2013], dtype=torch.float64) torch.float64
tensor([0.4714, 0.5480, 0.2013]) torch.float32


## Scalar tensors

In [20]:
a = torch.tensor([1,2,3])
a

tensor([1, 2, 3])

In [21]:
s = a.sum()
s

tensor(6)

Use `.item()` to get the value of a scalar tensor as an ordinary Python scalar.

In [22]:
s.item()

6

### Tensor attributes

In [23]:
a = torch.zeros(4, 3)

print (a.dtype)
print (a.shape)
print (a.device)

torch.float32
torch.Size([4, 3])
cpu


## Tensor operations

See all operations at https://pytorch.org/docs/stable/index.html.

In-place operations have a trailing `_` in their name (e.g. `a.add_(b)`) and work on the tensor itself, rather than producing a copy. This avoids allocating a new tensor, which is usually more efficient.
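
A minimal sketch of the difference between the two forms, using `add` and `add_` as the example pair:

```python
import torch

a = torch.ones(3)
b = torch.tensor([1.0, 2.0, 3.0])

c = a.add(b)   # out-of-place: returns a new tensor, a is unchanged
print(a, c)

a.add_(b)      # in-place: a itself is overwritten with a + b
print(a)
```

After the in-place call, `a` holds the same values as `c`, but no extra tensor was created to store them.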

PyTorch has many functions that parallel NumPy.

<table style="display: inline-block">
<caption style="text-align: center"><strong>Arithmetic</strong></caption>
<tr><th>OPERATION</th><th>FUNCTION</th><th>DESCRIPTION</th></tr>
<tr><td>a + b</td><td>a.add(b)</td><td>element wise addition</td></tr>
<tr><td>a - b</td><td>a.sub(b)</td><td>subtraction</td></tr>
<tr><td>a * b</td><td>a.mul(b)</td><td>multiplication</td></tr>
<tr><td>a / b</td><td>a.div(b)</td><td>division</td></tr>
<tr><td>a % b</td><td>a.remainder(b)</td><td>modulo (remainder after division; the result takes the sign of b, whereas a.fmod(b) takes the sign of a)</td></tr>
<tr><td>a<sup>b</sup></td><td>a.pow(b)</td><td>power</td></tr>
</table>

<table style="display: inline-block">
<caption style="text-align: center"><strong>Monomial Operations</strong></caption>
<tr><th>OPERATION</th><th>FUNCTION</th><th>DESCRIPTION</th></tr>
<tr><td>|a|</td><td>torch.abs(a)</td><td>absolute value</td></tr>
<tr><td>1/a</td><td>torch.reciprocal(a)</td><td>reciprocal</td></tr>
<tr><td>$\sqrt{a}$</td><td>torch.sqrt(a)</td><td>square root</td></tr>
<tr><td>log(a)</td><td>torch.log(a)</td><td>natural log</td></tr>
<tr><td>e<sup>a</sup></td><td>torch.exp(a)</td><td>exponential</td></tr>
<tr><td>12.34  ==>  12.</td><td>torch.trunc(a)</td><td>truncated integer</td></tr>
<tr><td>12.34  ==>  0.34</td><td>torch.frac(a)</td><td>fractional component</td></tr>
</table>

<table style="display: inline-block">
<caption style="text-align: center"><strong>Summary Statistics</strong></caption>
<tr><th>OPERATION</th><th>FUNCTION</th><th>DESCRIPTION</th></tr>
<tr><td>$\sum a$</td><td>torch.sum(a)</td><td>sum</td></tr>
<tr><td>$\bar a$</td><td>torch.mean(a)</td><td>mean</td></tr>
<tr><td>a<sub>max</sub></td><td>torch.max(a)</td><td>maximum</td></tr>
<tr><td>a<sub>min</sub></td><td>torch.min(a)</td><td>minimum</td></tr>
<tr><td colspan="3">torch.max(a,b) returns a tensor of size a<br>containing the element wise max between a and b</td></tr>
</table>
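
A few of the tabulated operations can be tried out as in the sketch below. The input values are arbitrary and chosen only for illustration.

```python
import torch

a = torch.tensor([1.5, -2.5, 3.0])
b = torch.tensor([2.0, 2.0, 2.0])

# Operator and method forms give the same result
print(a + b, a.add(b))

print(torch.trunc(a))    # integer part of each element
print(torch.frac(a))     # fractional part of each element
print(torch.sum(a))      # sum over all elements
print(torch.max(a))      # single largest value
print(torch.max(a, b))   # element-wise maximum of a and b
```

Note how `torch.max` behaves differently with one argument (a reduction to a scalar tensor) and with two (an element-wise comparison).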

## Dot products
A <a href='https://en.wikipedia.org/wiki/Dot_product'>dot product</a> is the sum of the products of the corresponding entries of two 1D tensors. If the tensors are both vectors, the dot product is given as:<br>

$\begin{bmatrix} a & b & c \end{bmatrix} \;\cdot\; \begin{bmatrix} d & e & f \end{bmatrix} = ad + be + cf$

If one of the tensors is written as a column vector, the result is the same sum of element-wise products. For example:<br>
$\begin{bmatrix} a & b & c \end{bmatrix} \;\cdot\; \begin{bmatrix} d \\ e \\ f \end{bmatrix} = ad + be + cf$<br><br>
Dot products can be expressed as <a href='https://pytorch.org/docs/stable/torch.html#torch.dot'><strong><tt>torch.dot(a,b)</tt></strong></a> or `a.dot(b)` or `b.dot(a)`

In [36]:
a = torch.tensor([1,2,3], dtype=torch.float)
b = torch.tensor([4,5,6], dtype=torch.float)
print(a.mul(b)) # for reference
print()
print(a.dot(b))

tensor([ 4., 10., 18.])

tensor(32.)


<div class="alert alert-info"><strong>NOTE:</strong> There's a slight difference between <tt>torch.dot()</tt> and <tt>numpy.dot()</tt>. While <tt>torch.dot()</tt> only accepts 1D arguments and returns a dot product, <tt>numpy.dot()</tt> also accepts 2D arguments and performs matrix multiplication. We show matrix multiplication below.</div>
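
The difference described in the note can be checked with a short sketch. The flag `raised` is an illustrative name used only to record whether the error occurred.

```python
import numpy as np
import torch

m = np.array([[1.0, 2.0], [3.0, 4.0]])

# numpy.dot on two 2D arrays performs matrix multiplication
print(np.dot(m, m))

# torch.dot only accepts 1D tensors, so 2D input raises an error
t = torch.tensor(m)
try:
    torch.dot(t, t)
    raised = False
except RuntimeError as err:
    raised = True
    print("torch.dot failed:", err)
```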

## Matrix multiplication
2D <a href='https://en.wikipedia.org/wiki/Matrix_multiplication'>Matrix multiplication</a> is possible when the number of columns in tensor <strong><tt>A</tt></strong> matches the number of rows in tensor <strong><tt>B</tt></strong>. In this case, the product of tensor <strong><tt>A</tt></strong> with size $(x,y)$ and tensor <strong><tt>B</tt></strong> with size $(y,z)$ results in a tensor of size $(x,z)$
<div>
<div align="left"><img src='./Images/Matrix_multiplication_diagram.png' align="left"><br><br>

$\begin{bmatrix} a & b & c \\
d & e & f \end{bmatrix} \;\times\; \begin{bmatrix} m & n \\ p & q \\ r & s \end{bmatrix} = \begin{bmatrix} (am+bp+cr) & (an+bq+cs) \\
(dm+ep+fr) & (dn+eq+fs) \end{bmatrix}$</div></div>

<div style="clear:both">Image source: <a href='https://commons.wikimedia.org/wiki/File:Matrix_multiplication_diagram_2.svg'>https://commons.wikimedia.org/wiki/File:Matrix_multiplication_diagram_2.svg</a></div>

Matrix multiplication can be computed using <a href='https://pytorch.org/docs/stable/torch.html#torch.mm'><strong><tt>torch.mm(a,b)</tt></strong></a> or `a.mm(b)` or `a @ b`

In [39]:
a = torch.tensor([[0,2,4],[1,3,5]], dtype=torch.float)
b = torch.tensor([[6,7],[8,9],[10,11]], dtype=torch.float)

print('a: ',a.size())
print('b: ',b.size())
print('a x b: ',torch.mm(a,b).size())

a:  torch.Size([2, 3])
b:  torch.Size([3, 2])
a x b:  torch.Size([2, 2])


In [40]:
print(torch.mm(a,b))

tensor([[56., 62.],
        [80., 89.]])


In [41]:
# Or (the same)

print(a.mm(b))

tensor([[56., 62.],
        [80., 89.]])


In [42]:
# Or (the same, but a bit obscure shorthand)

print(a @ b)

tensor([[56., 62.],
        [80., 89.]])


## GPU tensors

Tensors may be moved to and from GPUs using the `cuda` device.

See https://pytorch.org/docs/stable/notes/cuda.html

By default, tensors are created on the CPU.

In [24]:
# Uninitialised 32 bit float tensor, created on the CPU
a = torch.empty(3, 2)

# Pass to GPU if one exists (and PyTorch is enabled to use GPUs)
if torch.cuda.is_available():
    a = a.cuda()

a.device

device(type='cpu')
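
A common way to write code that runs with or without a GPU is to choose a device once and pass it around. The following is a minimal sketch of that pattern; the variable names are illustrative.

```python
import torch

# Pick the GPU when one is usable, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.zeros(3, 2)
a = a.to(device)                      # move an existing tensor
b = torch.ones(3, 2, device=device)   # or create it on the device directly

print(a.device, b.device)

# Tensors must live on the same device to be combined
c = a + b

# Bring the result back to the CPU, e.g. before converting to NumPy
c_cpu = c.cpu()
print(c_cpu.device)
```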

## Indexing and slicing

Indexing and slicing work as in NumPy.

In [25]:
x = torch.arange(6).reshape(3,2)
print(x)

tensor([[0, 1],
        [2, 3],
        [4, 5]])


In [26]:
# Grabbing the right hand column values
x[:,1]

tensor([1, 3, 5])

In [27]:
# Grabbing the right hand column as a (3,1) slice
x[:,1:]

tensor([[1],
        [3],
        [5]])

## Reshape with view and reshape

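`view` and `reshape` give the same result for contiguous tensors. `view` always shares memory with the original tensor and fails if the memory layout is incompatible with the new shape, whereas `reshape` copies the data when a view is not possible.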
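
A minimal sketch of a case where the two calls behave differently, using a transposed (non-contiguous) tensor. The names `xt`, `flat` and `view_failed` are illustrative.

```python
import torch

x = torch.arange(6).reshape(2, 3)
xt = x.t()  # transposing makes the tensor non-contiguous in memory

print(xt.is_contiguous())

# reshape works on any layout, copying the data if it has to
flat = xt.reshape(-1)
print(flat)

# view cannot reinterpret this non-contiguous layout and raises an error
try:
    xt.view(-1)
    view_failed = False
except RuntimeError:
    view_failed = True
print(view_failed)
```

For the contiguous tensors used in the cells below, either call can be used.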

In [28]:
x = torch.arange(10)
print(x)

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])


In [29]:
x = x.reshape(2,5)

By passing in <tt>-1</tt> for one dimension, PyTorch will infer its size from the number of elements in the given tensor.

In [30]:
x.view(2,-1)

tensor([[0, 1, 2, 3, 4],
        [5, 6, 7, 8, 9]])

In [31]:
x.view(-1,5)

tensor([[0, 1, 2, 3, 4],
        [5, 6, 7, 8, 9]])

Adopt another tensor's shape

In [32]:
z = x.view(2,5)
x.view_as(z)

tensor([[0, 1, 2, 3, 4],
        [5, 6, 7, 8, 9]])

## Neural net building blocks

The `torch.nn` package contains lots of basic building blocks.
Example of building a sequential ANN:

In [33]:
import torch.nn as nn

# Three layer nn ending in Softmax across dimension 1

s = nn.Sequential(
    nn.Linear(2, 5),
    nn.ReLU(),
    nn.Linear(5,20),
    nn.ReLU(),
    nn.Linear(20,10),
    nn.Dropout(p=0.3),
    nn.Softmax(dim=1)
)

s(torch.FloatTensor([[1,2]]))


tensor([[0.1021, 0.1021, 0.1021, 0.0779, 0.1021, 0.1222, 0.0988, 0.0882, 0.1021,
         0.1021]], grad_fn=<SoftmaxBackward>)

## PyTorch modules

We can create our own building blocks with the `nn.Module` class. This class also allows easy handling of the whole network (e.g. getting all parameters across the net).

In [34]:
class OurModule(nn.Module):
    def __init__(self, num_inputs, num_classes, dropout_prob=0.3):
        super().__init__()
        self.pipe = nn.Sequential(
                            nn.Linear(num_inputs, 5),
                            nn.ReLU(),
                            nn.Linear(5,20),
                            nn.ReLU(),
                            nn.Linear(20,num_classes),
                            # Use the dropout probability passed to the constructor
                            nn.Dropout(p=dropout_prob),
                            nn.Softmax(dim=1)
                            )

    # Override the class's forward method
    def forward(self,x):
        return self.pipe(x)
    
net = OurModule(num_inputs=2, num_classes=3)
v = torch.FloatTensor([[2,3]])
out = net(v)
print(net)
print(out)

OurModule(
  (pipe): Sequential(
    (0): Linear(in_features=2, out_features=5, bias=True)
    (1): ReLU()
    (2): Linear(in_features=5, out_features=20, bias=True)
    (3): ReLU()
    (4): Linear(in_features=20, out_features=3, bias=True)
    (5): Dropout(p=0.3, inplace=False)
    (6): Softmax(dim=1)
  )
)
tensor([[0.4366, 0.3595, 0.2039]], grad_fn=<SoftmaxBackward>)
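
The point about handling the whole network can be illustrated by listing and counting parameters. The sketch below uses a hypothetical `TinyNet` module, deliberately smaller than the one above, so that the count is easy to verify by hand.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical two-layer module used only for illustration."""

    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(2, 5)
        self.out = nn.Linear(5, 3)

    def forward(self, x):
        return self.out(torch.relu(self.hidden(x)))


tiny = TinyNet()

# named_parameters() walks every sub-module registered on the network
for name, param in tiny.named_parameters():
    print(name, tuple(param.shape))

# Total number of trainable values: (2*5 + 5) + (5*3 + 3) = 33
n_params = sum(p.numel() for p in tiny.parameters())
print(n_params)
```

The same `parameters()` call is what gets handed to an optimizer, so the module structure never has to be unpacked manually.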


## Loss functions

Loss functions usually accept two arguments: the prediction and the observed (target) value.

The most common loss functions are:

* `nn.MSELoss`: mean square error (regression)
* `nn.BCELoss` and `nn.BCEWithLogitsLoss`: binary cross entropy (binary classification). The with-logits version applies the sigmoid transformation itself and is usually more numerically stable. The version without logits expects a single probability value (i.e. sigmoid already applied).
* `nn.CrossEntropyLoss` and `nn.NLLLoss`: multiclass maximum likelihood. The first applies LogSoftmax internally, while the second expects log probabilities as the input.
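
The relationships between these loss functions can be checked numerically. This is a minimal sketch on random inputs; the tensor names and sizes are illustrative, and the with-logits loss is referred to by its full class name, `nn.BCEWithLogitsLoss`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Multiclass: raw scores (logits) for 4 samples and 3 classes
logits = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 2])

ce = nn.CrossEntropyLoss()(logits, labels)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), labels)
print(ce, nll)  # the two values should agree

# Binary: one raw score per sample, targets given as floats
scores = torch.randn(4)
targets = torch.tensor([0.0, 1.0, 1.0, 0.0])

bce_logits = nn.BCEWithLogitsLoss()(scores, targets)
bce = nn.BCELoss()(torch.sigmoid(scores), targets)
print(bce_logits, bce)  # the two values should agree
```

In practice this means a network trained with `nn.CrossEntropyLoss` or `nn.BCEWithLogitsLoss` should output raw scores, without a final Softmax or Sigmoid layer.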

## Optimizers

Commonly used optimizers are:

* `SGD`: vanilla stochastic gradient descent
* `RMSprop`: the optimizer proposed by Hinton
* `Adagrad`: adaptive gradients optimizer
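
These optimizers live in `torch.optim` and are constructed from the parameters they should update. A minimal sketch, assuming a placeholder `nn.Linear` model and arbitrary learning rates:

```python
import torch
import torch.nn as nn

model = nn.Linear(2, 1)  # placeholder model, only here to supply parameters

# Each optimizer receives the parameters to update plus its hyperparameters;
# the learning rates are arbitrary illustrative values
sgd = torch.optim.SGD(model.parameters(), lr=0.01)
rmsprop = torch.optim.RMSprop(model.parameters(), lr=0.001)
adagrad = torch.optim.Adagrad(model.parameters(), lr=0.01)

for opt in (sgd, rmsprop, adagrad):
    print(type(opt).__name__, opt.param_groups[0]["lr"])
```

In a real training run only one optimizer would normally be created for a given set of parameters.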

## Blueprint of a training loop

In [35]:
"""
# Loop that iterates over data creating X and y batches
for batch_samples, batch_labels in iterate_batches(data, batch_size=32):
    # Create tensors from batch X and y
    batch_samples_t = torch.tensor(batch_samples)
    batch_labels_t = torch.tensor(batch_labels)
    # Calculate output of net
    out_t = net(batch_samples_t)
    # Calculate loss
    loss_t = loss_function(out_t, batch_labels_t)
    # Backpropagate loss and calculate gradients
    loss_t.backward()
    # Apply optimizer
    optimizer.step()
    # Reset network gradients (can also be done at start of loop)
    optimizer.zero_grad()
""";
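
The blueprint can be turned into a runnable sketch by filling in the missing pieces. Everything not named in the blueprint is an assumption made for illustration: the synthetic data, the single `nn.Linear` model, the choice of `nn.MSELoss` and `SGD`, and the body of the hypothetical `iterate_batches` helper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic regression data (assumption): y = 3x + 1 plus a little noise
X = torch.rand(256, 1)
y = 3.0 * X + 1.0 + 0.01 * torch.randn(256, 1)

net = nn.Linear(1, 1)
loss_function = nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)


def iterate_batches(X, y, batch_size=32):
    """Hypothetical helper: yield consecutive mini-batches of X and y."""
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]


losses = []
for epoch in range(50):
    epoch_loss = 0.0
    for batch_samples_t, batch_labels_t in iterate_batches(X, y):
        # Reset gradients left over from the previous step
        optimizer.zero_grad()
        # Calculate output of net
        out_t = net(batch_samples_t)
        # Calculate loss
        loss_t = loss_function(out_t, batch_labels_t)
        # Backpropagate loss and calculate gradients
        loss_t.backward()
        # Apply optimizer
        optimizer.step()
        epoch_loss += loss_t.item()
    losses.append(epoch_loss)

print(losses[0], losses[-1])
```

Here the gradients are reset at the start of each step rather than at the end; as the blueprint notes, either position works as long as it happens once per step.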