# Neural Network Programming – Deep Learning with PyTorch

Link: http://deeplizard.com/learn/video/v5cngxo4mIg

# PyTorch Explained - Python Deep Learning Neural Network API

Link: http://deeplizard.com/learn/video/iTKbyFh-7GM

PyTorch is a Python package that provides two high-level features:

- Tensor computation (like numpy) with strong GPU acceleration
- Deep Neural Networks built on a tape-based autodiff system
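
Here is a minimal sketch of the tape-based autodiff idea. PyTorch records the operations applied to tensors created with `requires_grad=True`, and `backward()` walks that record in reverse to compute gradients. The function `y = x**2 + 2*x` is only an illustrative choice.

```python
import torch

# A leaf tensor whose operations will be recorded for differentiation
x = torch.tensor(3.0, requires_grad=True)

# Each operation on x is recorded as it runs
y = x ** 2 + 2 * x

# Replay the record backward to compute dy/dx = 2x + 2
y.backward()
print(x.grad)  # tensor(8.)
```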

# PyTorch Install – Quick and Easy

Link: http://deeplizard.com/learn/video/UWlFM0R_x6I

## Verify the PyTorch install

In [None]:
import torch

In [None]:
print(torch.__version__)

In [None]:
torch.cuda.is_available()

In [None]:
torch.version.cuda

# CUDA Explained – Why Deep Learning Uses GPUs

Link: http://deeplizard.com/learn/video/6stDhEA0wFQ

In [None]:
t = torch.tensor([1,2,3])
t

In [None]:
# Moving a tensor to the GPU requires a CUDA-capable device
if torch.cuda.is_available():
    t = t.cuda()
t

# Tensors Explained – Data Structures of Deep Learning

## Indexes

In [None]:
a = [1,2,3,4]

In [None]:
a[2]

In [None]:
dd = [
[1,2,3],
[4,5,6],
[7,8,9]
]

In [None]:
dd[0][2]

## Rank, Axes, and Shape Explained – Tensors for Deep Learning

## Rank

A tensor's *rank* tells us how many indexes are needed to refer to a specific element within the tensor.
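
For example, the 3 × 3 list of lists `dd` used below needs two indexes to reach a single number, so as a tensor it has rank 2. A small sketch, using `Tensor.dim()` to report the rank:

```python
import torch

dd = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
t = torch.tensor(dd)

# The rank is the number of axes, which equals the length of the shape
print(t.dim())       # 2
print(len(t.shape))  # 2

# Two indexes pick out a single element
print(t[0][2].item())  # 3
```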

## Axes

In [None]:
dd = [
[1,2,3],
[4,5,6],
[7,8,9]
]

In [None]:
dd[0]

In [None]:
dd[1]

In [None]:
dd[2]

The first index of axis 2:

In [None]:
dd[0][0]

In [None]:
dd[1][0]

In [None]:
dd[2][0]

The second index of axis 2:

In [None]:
dd[0][1]

In [None]:
dd[1][1]

In [None]:
dd[2][1]

The third index of axis 2:

In [None]:
dd[0][2]

In [None]:
dd[1][2]

In [None]:
dd[2][2]
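
In other words, each element along axis 1 is an array, and each element along axis 2 is a number. As a sketch, PyTorch's slicing syntax can collect the same index of axis 2 from every row in one step (the `:` selects every index of axis 1):

```python
import torch

dd = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
t = torch.tensor(dd)

# Index 0 of axis 2 from every row: dd[0][0], dd[1][0], dd[2][0]
print(t[:, 0].tolist())  # [1, 4, 7]

# Index 2 of axis 2 from every row: dd[0][2], dd[1][2], dd[2][2]
print(t[:, 2].tolist())  # [3, 6, 9]

# Indexing only axis 1 returns a whole row
print(t[1].tolist())     # [4, 5, 6]
```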

## Shape

In [None]:
dd = [
[1,2,3],
[4,5,6],
[7,8,9]
]

In [None]:
t = torch.tensor(dd)
t

In [None]:
type(t)

In [None]:
t.shape

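A tensor's shape is important because it gives the length of each axis, and therefore how many elements the tensor holds. Reshaping changes the shape but must keep the element count (here 3 × 3 = 9):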

In [None]:
t.reshape(1,9)

In [None]:
t.reshape(1,9).shape
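
Reshaping only works when the new shape holds exactly the same number of elements. A short sketch (the shape `(2, 5)` is an intentionally invalid choice):

```python
import torch

t = torch.tensor([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

print(t.numel())           # 9 elements
print(t.reshape(9).shape)  # torch.Size([9]), since 9 = 9

# 2 * 5 = 10 elements would be needed, so PyTorch raises an error
try:
    t.reshape(2, 5)
except RuntimeError as err:
    print("reshape failed:", err)
```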

# PyTorch Tensors Explained - Neural Network Programming

In [None]:
import torch
import numpy as np

## Tensor class

Tensors in PyTorch are represented using the ```torch.Tensor``` class. 

In [None]:
t = torch.Tensor()
type(t)

## Tensor attributes

In [None]:
print(t.dtype)
print(t.device)
print(t.layout)

In [None]:
device = torch.device('cuda:0')
device
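
Creating a `torch.device` object does not require a GPU, but moving data onto one does. A common pattern, sketched here under the assumption that we want to fall back to the CPU when CUDA is unavailable, is to pick the device once and move tensors with `.to()`:

```python
import torch

# Use the first GPU if one is available, otherwise the CPU
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

t1 = torch.tensor([1, 2, 3])  # created on the CPU by default
t2 = t1.to(device)            # a tensor with the same data on `device`

print(t1.device)  # cpu
print(t2.device)

# Operands of an operation must live on the same device
result = t1.to(device) + t2
print(result.tolist())  # [2, 4, 6]
```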

## Creation options using data

In [None]:
data = np.array([1,2,3])
type(data)

In [None]:
torch.Tensor(data)

In [None]:
torch.tensor(data)

In [None]:
torch.as_tensor(data)

In [None]:
torch.from_numpy(data)

## Creation options without data

In [None]:
torch.eye(2)

In [None]:
torch.zeros(2,2)

In [None]:
torch.ones(2,2)

In [None]:
torch.rand(2,2)
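
These factory functions also accept keyword arguments such as `dtype`. A quick sketch checking a few properties of their output (the specific shapes and dtype are arbitrary choices):

```python
import torch

z = torch.zeros(2, 2, dtype=torch.int64)  # the dtype can be set explicitly
r = torch.rand(2, 2)                      # uniform samples from [0, 1)
e = torch.eye(2)                          # ones on the diagonal, zeros elsewhere

print(z.dtype)                                    # torch.int64
print(bool(((r >= 0) & (r < 1)).all()))           # True
print(torch.equal(e, torch.diag(torch.ones(2))))  # True
```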

## Creating PyTorch Tensors – Best Options

In [None]:
import torch
import numpy as np

In [None]:
data = np.array([1,2,3])

In [None]:
t1 = torch.Tensor(data)
t2 = torch.tensor(data)
t3 = torch.as_tensor(data)
t4 = torch.from_numpy(data)

In [None]:
print(t1)
print(t2)
print(t3)
print(t4)

In [None]:
print(t1.dtype)
print(t2.dtype)
print(t3.dtype)
print(t4.dtype)

The class constructor ```torch.Tensor``` used the default dtype rather than the dtype of the NumPy array.

In [None]:
torch.get_default_dtype()

In [None]:
t1.dtype == torch.get_default_dtype()

The factory functions (```torch.tensor```, ```torch.as_tensor```, ```torch.from_numpy```) infer the dtype from the input data.

In [None]:
torch.tensor(np.array([1,2,3]))

In [None]:
torch.tensor(np.array([1.,2.,3.]))

The dtype can be explicitly set.

In [None]:
torch.tensor(np.array([1,2,3]), dtype=torch.float64)

## Memory: Sharing vs Copying

In [None]:
data = np.array([1,2,3])
data

In [None]:
t1 = torch.Tensor(data)
t2 = torch.tensor(data)
t3 = torch.as_tensor(data)
t4 = torch.from_numpy(data)

In [None]:
data[0] = 0
data[1] = 0
data[2] = 0

In [None]:
print(t1)
print(t2)

In [None]:
print(t3)
print(t4)
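
The output shows the difference: `torch.Tensor()` and `torch.tensor()` copy the input data, while `torch.as_tensor()` and `torch.from_numpy()` share memory with the NumPy array, so only `t3` and `t4` see the zeros. A minimal sketch that checks the sharing directly with NumPy's `np.shares_memory()`:

```python
import numpy as np
import torch

data = np.array([1, 2, 3])

copied = torch.tensor(data)      # copies the data
shared = torch.from_numpy(data)  # shares memory with `data`

data[0] = 100  # modify the NumPy array in place

print(copied[0].item())  # 1: the copy is unaffected
print(shared[0].item())  # 100: the shared tensor sees the change

# .numpy() on a CPU tensor returns an array that views the tensor's memory
print(np.shares_memory(data, shared.numpy()))  # True
print(np.shares_memory(data, copied.numpy()))  # False
```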

# Tensor Operations for Deep Learning

Operations by category:
- Reshaping ops
- Element-wise ops
- Reduction ops
- Access ops


## Reshaping Explained - PyTorch Tensors for Deep Learning

In [None]:
import torch

### Shape/Size of a tensor

In [None]:
t = torch.tensor([
    [1,1,1,1],
    [2,2,2,2],
    [3,3,3,3]
], dtype=torch.float32)

In [None]:
t.size()

In [None]:
t.shape

In [None]:
len(t.shape)

In [None]:
torch.tensor(t.shape).prod()

In [None]:
t.numel()

Since our tensor ```t``` has 12 elements, any reshaping must produce a shape whose dimensions multiply to ```12```.

### Reshape/View

In [None]:
t.reshape(1,12)

In [None]:
t.reshape(2,6)

In [None]:
t.reshape(3,4)

In [None]:
t.reshape(4,3)

In [None]:
t.reshape(6,2)

In [None]:
t.reshape(12,1)

We can also reshape to a different rank, as long as the product of the new dimensions is still 12 (2 × 2 × 3 = 12):

In [None]:
t.reshape(2,2,3)

### Squeeze/Unsqueeze

In [None]:
print(t.reshape(1,12))
print(t.reshape(1,12).shape)

In [None]:
print(t.reshape(1,12).squeeze())
print(t.reshape(1,12).squeeze().shape)

In [None]:
print(t.reshape(1,12).squeeze().unsqueeze(dim=0))
print(t.reshape(1,12).squeeze().unsqueeze(dim=0).shape)
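
`squeeze()` removes axes of length 1 and `unsqueeze()` adds one at the given position. Both also accept a `dim` argument; here is a sketch with an arbitrary shape `(1, 3, 1, 4)`:

```python
import torch

t = torch.zeros(1, 3, 1, 4)

# Without dim, every length-1 axis is removed
print(t.squeeze().shape)          # torch.Size([3, 4])

# With dim, only that axis is removed (if its length is 1)
print(t.squeeze(dim=0).shape)     # torch.Size([3, 1, 4])

# unsqueeze inserts a new length-1 axis; -1 means "at the end"
print(t.unsqueeze(dim=-1).shape)  # torch.Size([1, 3, 1, 4, 1])
```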

Here is a use case for squeeze, flattening a tensor into a rank-1 tensor:

In [None]:
def flatten(t):
    t = t.reshape(1, -1)
    t = t.squeeze()
    return t

In [None]:
flatten(t)
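
The `-1` in `reshape(1, -1)` asks PyTorch to infer that axis' length from the total number of elements. One caveat about this `flatten()`: because `squeeze()` removes every length-1 axis, a single-element input comes back as a rank-0 tensor rather than a rank-1 tensor. A sketch of both points:

```python
import torch

def flatten(t):
    t = t.reshape(1, -1)  # -1: infer this axis' length from the element count
    t = t.squeeze()       # remove every axis of length 1
    return t

t = torch.ones(3, 4)
print(t.reshape(2, -1).shape)  # torch.Size([2, 6]) because 12 / 2 = 6
print(flatten(t).shape)        # torch.Size([12])

# Edge case: with a single element, squeeze() removes the last remaining axis too
single = torch.tensor([[5.]])
print(flatten(single).shape)     # torch.Size([]): a rank-0 tensor
print(single.reshape(-1).shape)  # torch.Size([1]): still rank 1
```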

### Concatenate two tensors

In [None]:
t1 = torch.tensor([
    [1,2],
    [3,4]
])
t2 = torch.tensor([
    [5,6],
    [7,8]
])

In [None]:
torch.cat((t1, t2), dim=0)

In [None]:
torch.cat((t1, t2), dim=0).shape

In [None]:
torch.cat((t1, t2), dim=1)

In [None]:
torch.cat((t1, t2), dim=1).shape
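
`torch.cat()` joins tensors along an axis they already have, so the rank stays the same. For comparison, `torch.stack()` (used in the next section) joins them along a new axis. A sketch:

```python
import torch

t1 = torch.tensor([[1, 2], [3, 4]])
t2 = torch.tensor([[5, 6], [7, 8]])

# cat joins along an existing axis, so the rank stays 2
print(torch.cat((t1, t2), dim=0).shape)    # torch.Size([4, 2])

# stack inserts a new axis first, so the rank grows to 3
print(torch.stack((t1, t2), dim=0).shape)  # torch.Size([2, 2, 2])

# stack is equivalent to unsqueezing each tensor and then concatenating
same = torch.cat((t1.unsqueeze(0), t2.unsqueeze(0)), dim=0)
print(torch.equal(same, torch.stack((t1, t2))))  # True
```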

## CNN Flatten Operation Visualized – Tensor Batch Processing for Deep Learning

In [None]:
import torch

In [None]:
t1 = torch.tensor([
    [1,1,1,1],
    [1,1,1,1],
    [1,1,1,1],
    [1,1,1,1]
])

t2 = torch.tensor([
    [2,2,2,2],
    [2,2,2,2],
    [2,2,2,2],
    [2,2,2,2]
])

t3 = torch.tensor([
    [3,3,3,3],
    [3,3,3,3],
    [3,3,3,3],
    [3,3,3,3]
])

In [None]:
t = torch.stack((t1,t2,t3))
t.shape

In [None]:
t

Add the color channel axis so we'll have: `(Batch size, Color channels, Height, Width)`

In [None]:
t = t.reshape(3,1,4,4)
t
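
Since the new color channel axis has length 1, the same result can be obtained with `unsqueeze(dim=1)`, which avoids spelling out the other dimensions. A sketch with stand-in image tensors built from `torch.ones()`:

```python
import torch

# Three stand-in 4x4 "images" stacked into a batch of shape (3, 4, 4)
t = torch.stack((torch.ones(4, 4), torch.ones(4, 4) * 2, torch.ones(4, 4) * 3))

a = t.reshape(3, 1, 4, 4)  # spell out every dimension
b = t.unsqueeze(dim=1)     # insert a length-1 axis at position 1

print(a.shape)            # torch.Size([3, 1, 4, 4])
print(torch.equal(a, b))  # True
```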

In [None]:
t[0]

In [None]:
t[0][0]

In [None]:
t[0][0][0]

In [None]:
t[0][0][0][0]

### Shout outs for alternative implementations of the `flatten()` function!

In [None]:
t.reshape(1,-1)[0] # Thank you Mick!

In [None]:
t.reshape(-1) # Thank you Aamir!

In [None]:
t.view(t.numel()) # Thank you Ulm!

In [None]:
t.flatten() # Thank you PyTorch!

### Flatten only the last 3 axes

In [None]:
t.flatten(start_dim=1).shape

In [None]:
t.flatten(start_dim=1)
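
With `start_dim=1`, axis 0 (the batch axis) is kept and the color channel, height, and width axes are merged, giving one flattened row of 1 × 4 × 4 = 16 values per image. A sketch showing the equivalent `reshape()` call, using a zero tensor with the same shape as `t` above:

```python
import torch

t = torch.zeros(3, 1, 4, 4)  # (batch size, color channels, height, width)

# Keep axis 0 (the batch) and flatten everything after it
flat = t.flatten(start_dim=1)
print(flat.shape)  # torch.Size([3, 16])

# The same result with reshape: keep the batch size, infer the rest
print(torch.equal(flat, t.reshape(t.shape[0], -1)))  # True
```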

## Element-wise tensor operations - Deep Learning and Neural Networks

Element-wise means that we operate on corresponding elements.  
Two elements are said to correspond if the two elements occupy the same position within the tensor.  
The position is determined by the indexes used to specify a given element.

In [None]:
import torch
import numpy as np

In [None]:
t1 = torch.tensor([
    [1,2],
    [3,4]
], dtype=torch.float32)

t2 = torch.tensor([
    [9,8],
    [7,6]
], dtype=torch.float32)

In [None]:
# Example from the first axis
t1[0]

In [None]:
# Example from the second axis
t1[0][0]

In [None]:
t1[0][0]

In [None]:
t2[0][0]

### Addition is an element-wise operation

In [None]:
t1 + t2
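
To see what element-wise means, here is a sketch that builds the same sum with explicit loops, adding the elements that sit at the same indexes. This is only for illustration; the vectorized `t1 + t2` is what you would use in practice.

```python
import torch

t1 = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
t2 = torch.tensor([[9, 8], [7, 6]], dtype=torch.float32)

# Each output element combines the elements at the same indexes
manual = torch.empty_like(t1)
for i in range(t1.shape[0]):
    for j in range(t1.shape[1]):
        manual[i][j] = t1[i][j] + t2[i][j]

print(manual.tolist())               # every entry is 10.0
print(torch.equal(manual, t1 + t2))  # True
```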

### Arithmetic operations are element-wise operations

Let's see how arithmetic operations with scalar values work on tensors:

In [None]:
t1 + 2

In [None]:
t1 - 2

In [None]:
t1 * 2

In [None]:
t1 / 2

In [None]:
t1.add(2)

In [None]:
t1.sub(2)

In [None]:
t1.mul(2)

In [None]:
t1.div(2)

These examples make it look like element-wise means *for each element*.  
However, this is misleading.  
A concept called broadcasting is being used here.

#### Broadcasting tensors

What is really happening in ```t1 + 2``` is that the scalar ```2``` is being broadcast to the shape of ```t1```.  
This can be done using NumPy like this:

In [None]:
np.broadcast_to(2, t1.shape)

After the new tensor is created from the broadcast operation, the element-wise addition is performed.  
So this:

In [None]:
t1 + 2

is really this:

In [None]:
t1 + torch.tensor(
    np.broadcast_to(2, t1.shape)
    ,dtype=torch.float32
)

Trickier example:

In [None]:
t1 = torch.tensor([
    [1,1],
    [1,1]
], dtype=torch.float32)

t2 = torch.tensor([2,4], dtype=torch.float32)

In [None]:
# What does t1 + t2 give when the shapes differ?

In [None]:
t1.shape

In [None]:
t2.shape

In [None]:
np.broadcast_to(t2.numpy(), t1.shape)

In [None]:
t1 + t2
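
Broadcasting compares shapes starting from the last axis: two axes are compatible when their lengths are equal or one of them is 1 (a missing axis counts as length 1). A sketch contrasting a row-like and a column-like tensor, plus a shape that cannot be broadcast:

```python
import torch

t1 = torch.ones(2, 2)
t2 = torch.tensor([2., 4.])      # shape (2,): treated like one row
t3 = torch.tensor([[2.], [4.]])  # shape (2, 1): one column

print((t1 + t2).tolist())  # [[3.0, 5.0], [3.0, 5.0]]: t2 repeated for each row
print((t1 + t3).tolist())  # [[3.0, 3.0], [5.0, 5.0]]: t3 repeated for each column

# A length-3 axis cannot be broadcast against a length-2 axis
try:
    t1 + torch.tensor([1., 2., 3.])
except RuntimeError as err:
    print("not broadcastable:", err)
```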

See this video for a deeper discussion of broadcasting: http://deeplizard.com/learn/video/6_33ulFDuCg

### Element-wise comparison operations

In [None]:
t = torch.tensor([
    [0,5,7],
    [6,0,7],
    [0,8,0]
], dtype=torch.float32)

In [None]:
t.eq(0)

In [None]:
t.ge(0)

In [None]:
t.gt(0)

In [None]:
t.lt(0)

In [None]:
t.le(7)

Thinking about these types of operations from a broadcasting perspective, we can see that ```t.le(7)``` is the same thing as:

In [None]:
t <= torch.tensor(
    np.broadcast_to(7, t.shape)
    ,dtype=torch.float32
)

and this:

In [None]:
t <= torch.tensor([
    [7,7,7],
    [7,7,7],
    [7,7,7]
], dtype=torch.float32)
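
Comparison operations return a tensor of booleans (`torch.bool` in recent PyTorch versions; very old versions returned `torch.uint8`). A sketch that uses such a mask to count the zeros, and checks that the operator form `<=` matches `le()`:

```python
import torch

t = torch.tensor([
    [0, 5, 7],
    [6, 0, 7],
    [0, 8, 0],
], dtype=torch.float32)

mask = t.eq(0)
print(mask.dtype)         # torch.bool
print(mask.sum().item())  # 4: the number of zeros

# The operator form gives the same result as the method form
print(torch.equal(t.le(7), t <= 7))  # True
```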

### Element-wise operations using functions

In [None]:
t.abs()

In [None]:
t.sqrt()

In [None]:
t.neg()

In [None]:
t.neg().abs()

## Argmax and Reduction Ops - Tensors for Deep Learning

In [None]:
import torch
import numpy as np

A tensor *reduction operation* is an operation that reduces the number of elements contained within a given tensor. 

Suppose we have the following tensor:

In [None]:
t = torch.tensor([
    [0,1,0],
    [2,0,2],
    [0,3,0]
], dtype=torch.float32)

Observe that the sum function returns the sum of all the elements in the tensor:

In [None]:
t.sum()

In [None]:
t.numel()

In [None]:
t.sum().numel()

In [None]:
t.sum().numel() < t.numel()

Therefore, we can conclude that the ```sum()``` function is a reduction operation.
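
Calling `sum()` without arguments reduces the whole tensor to a single value, stored as a rank-0 tensor. A sketch:

```python
import torch

t = torch.tensor([
    [0, 1, 0],
    [2, 0, 2],
    [0, 3, 0],
], dtype=torch.float32)

s = t.sum()
print(s.shape)   # torch.Size([]): no axes left
print(s.dim())   # 0
print(s.item())  # 8.0
```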

### Common reduction operations

In [None]:
t.sum()

In [None]:
t.prod()

In [None]:
t.mean()

In [None]:
t.std()
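
One detail worth knowing about `std()`: by default PyTorch computes the sample standard deviation (dividing by n - 1), whereas NumPy's `np.std()` divides by n unless `ddof=1` is passed. A sketch comparing the two:

```python
import numpy as np
import torch

t = torch.tensor([
    [0, 1, 0],
    [2, 0, 2],
    [0, 3, 0],
], dtype=torch.float32)

# PyTorch's default: sample standard deviation (divide by n - 1)
print(t.std().item())

# NumPy's default: population standard deviation (divide by n)
print(np.std(t.numpy()))

# Passing ddof=1 makes NumPy match PyTorch's default
print(np.isclose(t.std().item(), np.std(t.numpy(), ddof=1)))  # True
```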

### Reduce specific axes

In [None]:
t = torch.tensor([
    [1,1,1,1],
    [2,2,2,2],
    [3,3,3,3]
], dtype=torch.float32)

In [None]:
t.sum(dim=0)

In [None]:
t.sum(dim=1)

Element-wise operations are in play here:

In [None]:
t.sum(dim=0)

In [None]:
t[0]

In [None]:
t[1]

In [None]:
t[2]

In [None]:
t[0] + t[1] + t[2]

With ```dim=1```, each of the three rows, a group of four elements, is summed separately:

In [None]:
t[0].sum()

In [None]:
t[1].sum()

In [None]:
t[2].sum()

In [None]:
t.sum(dim=1)
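
In general, the `dim` argument names the axis that gets collapsed, and the result has one value for each index of the remaining axes. The `keepdim=True` option keeps the collapsed axis with length 1, which is handy for broadcasting the result back against `t`. A sketch:

```python
import torch

t = torch.tensor([
    [1, 1, 1, 1],
    [2, 2, 2, 2],
    [3, 3, 3, 3],
], dtype=torch.float32)

print(t.sum(dim=0).shape)  # torch.Size([4]): one value per column
print(t.sum(dim=1).shape)  # torch.Size([3]): one value per row

row_sums = t.sum(dim=1, keepdim=True)  # shape (3, 1)
print(row_sums.shape)                  # torch.Size([3, 1])

# Broadcasting divides each row by its own sum
normalized = t / row_sums
print(normalized.sum(dim=1).tolist())  # [1.0, 1.0, 1.0]
```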

### Max/Argmax

In [None]:
t = torch.tensor([
    [1,0,0,2],
    [0,3,3,0],
    [4,0,0,5]
], dtype=torch.float32)

In [None]:
t.max()

In [None]:
t.argmax()

In [None]:
t.flatten()

If we specify a dimension, the ```max()``` function returns two tensors: the maximum values along that dimension and the indices where those values occur.  

In [None]:
t.max(dim=0)

In [None]:
t.argmax(dim=0)

In [None]:
t.max(dim=1)

In [None]:
t.argmax(dim=1)
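
Without a `dim` argument, `argmax()` returns an index into the flattened tensor, which is why `t.flatten()` is shown above. A sketch checking that, and checking that the indices returned by `max(dim=1)` point at each row's maximum:

```python
import torch

t = torch.tensor([
    [1, 0, 0, 2],
    [0, 3, 3, 0],
    [4, 0, 0, 5],
], dtype=torch.float32)

# Without dim, argmax indexes into the flattened tensor
idx = t.argmax()
print(idx.item())               # 11
print(t.flatten()[idx].item())  # 5.0

# With dim, max returns the values and their indices
values, indices = t.max(dim=1)
print(values.tolist())  # [2.0, 3.0, 5.0]

# Each index selects that row's maximum value
print(torch.equal(t[torch.arange(t.shape[0]), indices], values))  # True
```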

## Accessing operations

In [None]:
t = torch.tensor([
    [1,2,3],
    [4,5,6],
    [7,8,9]
], dtype=torch.float32)

In [None]:
t.mean()

In [None]:
t.mean().item()

In [None]:
t.mean(dim=0).tolist()

In [None]:
t.mean(dim=0).numpy()
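
`item()` converts a single-element tensor to a plain Python number, while `tolist()` and `numpy()` handle tensors with more elements (`numpy()` needs a CPU tensor). A sketch, including what happens when `item()` is called on a multi-element tensor (the exact exception type has varied across PyTorch versions, so both are caught):

```python
import torch

t = torch.tensor([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
], dtype=torch.float32)

mean = t.mean().item()
print(mean, type(mean))        # 5.0 <class 'float'>
print(t.mean(dim=0).tolist())  # [4.0, 5.0, 6.0]

# item() only works on tensors that hold exactly one element
try:
    t.mean(dim=0).item()
except (RuntimeError, ValueError) as err:
    print("item() failed:", err)
```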