### Learning PyTorch
### PyTorch, led by Facebook AI Research (FAIR), is a lean Python library built on the Torch framework (which was originally written for Lua). It can use CUDA GPUs directly for accelerated processing and has a clean, readable code flow.


### The main modules of PyTorch are listed below; their names are largely self-explanatory.

<img src = "images/pytorch_modules.png">

<br><br>
- torch.nn = the package of neural network building blocks; torch.nn.Module is the base class for all neural network modules.
    - torch.nn.functional = functional (stateless) versions of network operations, such as activations and loss functions
<br><br>
- torch.Tensor = a multi-dimensional matrix containing elements of a single data type.
<br><br>
- torch.autograd = provides classes and functions implementing automatic differentiation. A recorder records which operations were performed during the forward pass, and then replays them backward to compute the gradients. This is especially powerful when building neural networks, because everything needed to differentiate the parameters is already recorded by the time the forward pass finishes.
<br><br>
- torch.utils.data = data loading utilities. At the heart of them is the torch.utils.data.DataLoader class, which represents a Python iterable over a dataset.
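
As a rough illustration of how these modules fit together, here is a minimal sketch. The toy data, the layer sizes, and every variable name are assumptions made up for this example; they are not part of any real project.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

# torch.autograd: a tensor created with requires_grad=True is tracked.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()   # forward pass: the operations are recorded
y.backward()         # backward pass: the record is replayed to get dy/dx = 2x
print(x.grad)        # tensor([2., 4., 6.])

# torch.nn / torch.nn.functional: a layer object plus a stateless activation.
features = torch.arange(10, dtype=torch.float32).reshape(5, 2)  # toy data
layer = nn.Linear(in_features=2, out_features=1)
out = F.relu(layer(features))
print(out.shape)     # torch.Size([5, 1])

# torch.utils.data: DataLoader yields the dataset in batches.
labels = torch.tensor([0, 1, 0, 1, 0])  # toy labels
loader = DataLoader(TensorDataset(features, labels), batch_size=2, shuffle=False)
batch_shapes = [xb.shape for xb, yb in loader]
print(batch_shapes)  # two batches of 2 samples, then one batch of 1
```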

#### Make sure you have a virtual environment set up with Anaconda, with pytorch, cudatoolkit, and jupyter installed through conda.

### Verify the install


In [66]:
# Should succeed without hesitation
import torch

In [67]:
print(torch.__version__)

1.7.1


In [68]:
# Verify that a CUDA-capable NVIDIA GPU is available
torch.cuda.is_available()

True

In [69]:
# Should match the version of cudatoolkit installed
torch.version.cuda

'10.2'

### Sidenote: Why GPUs
For tasks such as mathematical computation, it is possible to reach a solution through parallel computing, because even complex equations can be broken down into smaller computations that are independent of one another. GPUs are much more capable of parallel computation because of their massive core counts.

CUDA is NVIDIA's API and software platform (installed here through cudatoolkit) that enables this kind of computing on NVIDIA GPUs.

PyTorch is versatile in that it can selectively hold mathematical objects such as tensors on different devices, such as the GPU or the CPU.
#### Why not use a GPU for everything?
Moving a simple computation to the GPU can actually slow the code down: a modern CPU can often finish the work in the time it takes just to hand the task to the GPU. Remember that GPUs only shine on mathematically dense tasks that can be done in parallel.
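
A common way to act on this is to choose the device once and move tensors to it explicitly. The sketch below is only an illustration: the tensor sizes and variable names are assumptions, and it falls back to the CPU so that it also runs on a machine without a CUDA GPU.

```python
import torch

# Use the first GPU if one is available, otherwise stay on the CPU.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

small = torch.tensor([1., 2., 3.])             # created on the CPU by default
large = torch.rand(1000, 1000, device=device)  # created directly on the chosen device

# Tensors must live on the same device before they can be combined.
small_on_device = small.to(device)
result = small_on_device.sum() + large.sum()

print(small.device)   # cpu
print(result.device)  # same device type as `device`
```

Note that `.to(device)` returns a tensor on the target device; `small` itself stays on the CPU.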

In [70]:
# simply creating Pytorch tensor object
t = torch.Tensor([1,2,3])
t

tensor([1., 2., 3.])

In [71]:
# creating a device object that refers to the first GPU
device = torch.device('cuda:0')
device

device(type='cuda', index=0)

In [72]:
# Moving tensor object to GPU
t = t.cuda()
t

tensor([1., 2., 3.], device='cuda:0')

In [73]:
print(t.dtype)
print(t.device)
print(t.layout)

torch.float32
cuda:0
torch.strided


### Tensors
Tensor, in short, is a general term for a representation of numbers that has a rank, axes, and a shape. Think of a matrix, but with any number of dimensions.

Rank is the number of dimensions present within a tensor, that is, how many indices you need to access an element. For example, the tensor 't' above has a rank of 1 because each element needs only one index for access, like t[1]. An axis is one specific dimension of a tensor; its length says how many indices are available along it, and each additional axis adds a dimension in which elements can exist.

The shape of a tensor encodes all the relevant information about its rank, axes, and indices. Tensors are constantly reshaped and transformed. Tensors can represent the same underlying data but have different shapes. Think about it: if 'a' in the code below were a list of toothpick lengths (or something weird like that), changing its shape wouldn't change the lengths of the toothpicks.


Shapes can have more than 2 or 3 dimensions. For example, an image typically has three dimensions: color channels (such as RGB), height, and width, written [C,H,W]. In deep learning, more dimensions can be added, such as the batch size (the number of images in a subgroup). A batch of images then has 4 dimensions and looks like [B,C,H,W]. With this rank 4 tensor we can navigate to a specific pixel in a specific color channel of a specific image.
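
To make the [B,C,H,W] layout concrete, here is a small sketch. The batch of 8 RGB images of 32 by 32 pixels is a made-up example, and the tensor is filled with zeros because only the shapes matter here.

```python
import torch

# Hypothetical batch: 8 images, 3 color channels, 32 x 32 pixels each.
batch = torch.zeros(8, 3, 32, 32)   # [B, C, H, W]
print(len(batch.shape))             # 4 (rank 4)

image = batch[2]            # third image in the batch
channel = batch[2, 0]       # first color channel of that image
pixel = batch[2, 0, 5, 7]   # row 5, column 7 of that channel

print(image.shape)    # torch.Size([3, 32, 32])
print(channel.shape)  # torch.Size([32, 32])
print(pixel.shape)    # torch.Size([]), a single scalar value
```

Each index that is supplied removes one axis, so four indices are needed to reach a single value.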

Features can be extracted from these values by transforming them with specific operations. See more later on this.


In [74]:
a = [
    [4,5,6],
    [7,8,9]
]

In [75]:
a[1][2]

9

In [76]:
type(a)

list

In [77]:
#list to tensors; the two constructors produce different dtypes but the same shape
b = torch.tensor(a)
c = torch.Tensor(a)
print(type(b), b.dtype, b)
print(type(c), c.dtype,  c)

<class 'torch.Tensor'> torch.int64 tensor([[4, 5, 6],
        [7, 8, 9]])
<class 'torch.Tensor'> torch.float32 tensor([[4., 5., 6.],
        [7., 8., 9.]])


In [78]:
b.shape

torch.Size([2, 3])

In [79]:
#reshaping a tensor
b.reshape(1,6)

tensor([[4, 5, 6, 7, 8, 9]])

In [80]:
b.reshape(1,6).shape

torch.Size([1, 6])

In [81]:
#remember that reshape returns a new tensor; b itself keeps its original shape unless you reassign it
b.shape

torch.Size([2, 3])

In [82]:
import numpy as np

In [83]:
data = np.array([1,2,3])
type(data)

numpy.ndarray

#### Different ways to initiate a tensor and exploring their data types

In [84]:
#notice this construction differs from the next 3: the torch.Tensor class constructor copies the data and converts it to the default dtype (float32) - unchanged by numpy array manipulation

t1 = torch.Tensor(data)
print(t1.dtype)

torch.float32


In [85]:
#factory function (accepts parameter inputs and outputs tensor objects) - unchanged by numpy array manipulation

t2 = torch.tensor(data)
print(t2.dtype)

torch.int64


In [86]:
#also a factory function --> changing the numpy array (variable 'data') will change these values

t3 = torch.as_tensor(data)
print(t3.dtype)

torch.int64


In [87]:
#also a factory function --> changing the numpy array (variable 'data') will change these values

t4 = torch.from_numpy(data)
print(t4.dtype)

torch.int64


### Recap
torch.Tensor(data) & torch.tensor(data) = **data is copied**, but the resulting data types differ (default float32 vs. the dtype of the input)
<br>
torch.as_tensor(data) & torch.from_numpy(data) = **data is shared** (saves memory) and the data type of the input is kept
<br><br>
torch.tensor() = best for everyday use
<br>
torch.as_tensor() = best for performance tuning
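
The copy versus share distinction can be checked directly by changing the NumPy array after the tensors have been created. This is a minimal sketch; it assumes the array lives in CPU memory and that no dtype conversion is requested, which are the conditions under which `torch.as_tensor` can share memory.

```python
import numpy as np
import torch

data = np.array([1, 2, 3])

t1 = torch.Tensor(data)      # copy, converted to the default dtype
t2 = torch.tensor(data)      # copy, keeps the NumPy dtype
t3 = torch.as_tensor(data)   # shares memory with `data`
t4 = torch.from_numpy(data)  # shares memory with `data`

# Modify the original array in place.
data[0] = 0

print(t1)  # first element is still 1.
print(t2)  # first element is still 1
print(t3)  # first element is now 0
print(t4)  # first element is now 0
```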

In [88]:
# the default dtype, used by torch.Tensor() and when a tensor is built from Python floats
print(torch.get_default_dtype()) 

torch.float32


In [89]:
#type inference - what you put in is what it becomes
a = torch.tensor(np.array([1,2,3]))
print(a)
a.dtype

tensor([1, 2, 3])


torch.int64

In [90]:
a = torch.tensor(np.array([1.,2.,3.]))
print(a)
a.dtype

tensor([1., 2., 3.], dtype=torch.float64)


torch.float64

In [91]:
#explicitly set
a = torch.tensor(np.array([1,2,3]), dtype=torch.float64)
print(a)
a.dtype

tensor([1., 2., 3.], dtype=torch.float64)


torch.float64

#### Creating multi-dimensional tensors

In [92]:
torch.eye(2)

tensor([[1., 0.],
        [0., 1.]])

In [93]:
print(torch.zeros(3,2))
print(torch.zeros(3,2).dtype)

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])
torch.float32


In [94]:
print(torch.ones(2,3))
print(torch.ones(2,3).dtype)

tensor([[1., 1., 1.],
        [1., 1., 1.]])
torch.float32


In [95]:
#uniform random values in [0, 1); the printout shows four decimal places
torch.rand(2,2)

tensor([[0.2401, 0.2106],
        [0.1259, 0.6181]])

# Main Tensor operation types:

   **1. Reshaping operations**
   <br><br>2. Element-wise operations
   <br>3. Reduction operations
   <br>4. Access operations


##    1. Reshaping operations 

In [96]:
# rank 2 tensor with two axes; the first axis has a length of 3 and is made up of arrays,
#   the second axis has a length of 4 and is made up of numbers
t = torch.tensor([
    [1,1,1,1],
    [2,2,2,2],
    [3,3,3,3]
], dtype=torch.float32)

In [97]:
t.size()

torch.Size([3, 4])

In [98]:
t.shape

torch.Size([3, 4])

In [99]:
# length of shape gives rank of a tensor
len(t.shape)

2

Find the number of elements in a tensor by taking the product of its dimensions, or by calling numel()

In [100]:
torch.tensor(t.shape).prod()

tensor(12)

In [101]:
t.numel()

12

All reshaping operations for this 3 by 4 tensor will always produce a shape whose product is 12, because the elements (and the number of elements) are unchanged.

Sidenote: this is a handy way to think about tensors. The elements themselves are unchanged when the tensor undergoes a reshaping transformation.
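
The element-count rule can be seen with a quick sketch (the tensor values and names here are arbitrary). Passing -1 for one dimension asks PyTorch to infer it from the number of elements, and a shape whose product cannot equal 12 is rejected.

```python
import torch

t = torch.arange(12).reshape(3, 4)

# -1 lets PyTorch infer the missing dimension: 12 / 2 = 6
inferred = t.reshape(2, -1)
print(inferred.shape)   # torch.Size([2, 6])

# 12 elements cannot be split into 5 equal rows, so this raises an error.
failed = False
try:
    t.reshape(5, -1)
except RuntimeError as err:
    failed = True
    print("reshape failed:", err)
```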

In [102]:
t.reshape(1,12)

tensor([[1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.]])

In [103]:
t.reshape(2,6)

tensor([[1., 1., 1., 1., 2., 2.],
        [2., 2., 3., 3., 3., 3.]])

In [104]:
t.reshape(6,2)

tensor([[1., 1.],
        [1., 1.],
        [2., 2.],
        [2., 2.],
        [3., 3.],
        [3., 3.]])

In [105]:
t.reshape(12,1)

tensor([[1.],
        [1.],
        [1.],
        [1.],
        [2.],
        [2.],
        [2.],
        [2.],
        [3.],
        [3.],
        [3.],
        [3.]])

In [106]:
# can change the rank!
t.reshape(2,2,3)

tensor([[[1., 1., 1.],
         [1., 2., 2.]],

        [[2., 2., 3.],
         [3., 3., 3.]]])

**Squeezing** removes every axis of length 1, which lowers the rank without changing the number of elements; **unsqueezing** adds an axis of length 1 (both change the rank)

In [107]:
print(t.reshape(1,12))
print(t.reshape(1,12).shape)

tensor([[1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.]])
torch.Size([1, 12])


In [108]:
print(t.reshape(1,12).squeeze())
print(t.reshape(1,12).squeeze().shape)

tensor([1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.])
torch.Size([12])


In [109]:
print(t.reshape(1,12).squeeze().unsqueeze(dim=0))
print(t.reshape(1,12).squeeze().unsqueeze(dim=0).shape)

tensor([[1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.]])
torch.Size([1, 12])
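
Both operations also accept a `dim` argument to target a single axis. The sketch below uses an arbitrary example shape; note that squeezing an axis whose length is not 1 leaves the tensor unchanged.

```python
import torch

t4d = torch.ones(1, 3, 1, 4)

print(t4d.squeeze().shape)          # torch.Size([3, 4]): all length-1 axes removed
print(t4d.squeeze(dim=0).shape)     # torch.Size([3, 1, 4]): only axis 0 removed
print(t4d.squeeze(dim=1).shape)     # torch.Size([1, 3, 1, 4]): axis 1 has length 3, nothing happens
print(t4d.unsqueeze(dim=-1).shape)  # torch.Size([1, 3, 1, 4, 1]): new last axis
```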


**Flattening** removes all axes except one, making it a rank 1 tensor (change rank to 1)


In [110]:
def flatten(t):
    t = t.reshape(1,-1)
    t = t.squeeze()
    return t

In [111]:
#reshape and flatten (removes a set of brackets!)!
flatten(t)

tensor([1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.])

In [112]:
#reshape only
t.reshape(1,12)

tensor([[1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.]])

In [113]:
#same thing as above
t.reshape(1,-1)

tensor([[1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.]])

**Concatenating** tensors (doesn't change rank)

In [114]:
t1 = torch.tensor([
    [1,2],
    [3,4]
])
t2 = torch.tensor([
    [5,6],
    [7,8]
])

#original shape
print(t1.shape)

torch.Size([2, 2])


In [115]:
#combine t1 and t2 row-wise (axis-0)
torch.cat((t1, t2), dim=0)

tensor([[1, 2],
        [3, 4],
        [5, 6],
        [7, 8]])

In [116]:
#combine them column-wise (axis-1) 
torch.cat((t1, t2), dim=1)

tensor([[1, 2, 5, 6],
        [3, 4, 7, 8]])

In [117]:
#dim=0 is the first entry of the shape --> concatenating along axis 0 adds the lengths there (2 + 2 = 4)
torch.cat((t1, t2), dim=0).shape

torch.Size([4, 2])

In [118]:
#likewise for axis 1
torch.cat((t1, t2), dim=1).shape

torch.Size([2, 4])

### CNN Flatten Operations - Tensor Batch Processing



In [119]:
# three rank 2 tensors with shape (4,4)
t1 = torch.tensor([
    [1,1,1,1],
    [1,1,1,1],
    [1,1,1,1],
    [1,1,1,1]
])

t2 = torch.tensor([
    [2,2,2,2],
    [2,2,2,2],
    [2,2,2,2],
    [2,2,2,2]
])

t3 = torch.tensor([
    [3,3,3,3],
    [3,3,3,3],
    [3,3,3,3],
    [3,3,3,3]
])

In [120]:
#stack joins the tensors along a new axis (here a new axis 0 of length 3)
t = torch.stack((t1,t2,t3))
t.shape

torch.Size([3, 4, 4])
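
The difference between stacking and concatenating is easy to miss, so here is a small sketch with two made-up tensors: `torch.cat` grows an existing axis, while `torch.stack` creates a new one, which is the same as unsqueezing each tensor first and then concatenating.

```python
import torch

a = torch.ones(4, 4)
b = torch.full((4, 4), 2.0)

catted = torch.cat((a, b), dim=0)   # existing axis 0 grows: 4 + 4 = 8
stacked = torch.stack((a, b))       # new axis 0 of length 2

print(catted.shape)   # torch.Size([8, 4])
print(stacked.shape)  # torch.Size([2, 4, 4])

# stack is equivalent to adding a length-1 axis to each tensor, then cat
manual = torch.cat((a.unsqueeze(0), b.unsqueeze(0)), dim=0)
print(torch.equal(stacked, manual))  # True
```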

In [121]:
#rank 3 tensor
t

tensor([[[1, 1, 1, 1],
         [1, 1, 1, 1],
         [1, 1, 1, 1],
         [1, 1, 1, 1]],

        [[2, 2, 2, 2],
         [2, 2, 2, 2],
         [2, 2, 2, 2],
         [2, 2, 2, 2]],

        [[3, 3, 3, 3],
         [3, 3, 3, 3],
         [3, 3, 3, 3],
         [3, 3, 3, 3]]])

In [130]:
# as we've seen, we can increase the rank of a tensor with the reshape operation
t = t.reshape(3,1,4,4)
t

tensor([[[[1, 1, 1, 1],
          [1, 1, 1, 1],
          [1, 1, 1, 1],
          [1, 1, 1, 1]]],


        [[[2, 2, 2, 2],
          [2, 2, 2, 2],
          [2, 2, 2, 2],
          [2, 2, 2, 2]]],


        [[[3, 3, 3, 3],
          [3, 3, 3, 3],
          [3, 3, 3, 3],
          [3, 3, 3, 3]]]])

That tensor has lots of information stored. For example if each original tensor (t1, t2, t3) were 4 by 4 pixel grayscale images, then this final tensor *t* would have information on (batch, channel, height, width)

Notice how the additional axis of length 1 for the channel doesn't change the number of elements in the tensor. This is because the product of the shape's component values doesn't change when we multiply by one.

#### The first axis has 3 elements. Each element of the first axis represents an image (an image in a batch). For each image, we have a single color channel on the channel axis (b/c it's grayscale). Each of these channels contains 4 arrays, and each array contains 4 numbers or scalar components.

Observe this by indexing:

In [123]:
# the first image
t[0]

tensor([[[1, 1, 1, 1],
         [1, 1, 1, 1],
         [1, 1, 1, 1],
         [1, 1, 1, 1]]])

In [124]:
#first color channel in first image
t[0][0]

tensor([[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]])

In [125]:
# first row of the first color channel of the first image
t[0][0][0]

tensor([1, 1, 1, 1])

In [126]:
# first pixel value in the first row of the first color channel of the first image
t[0][0][0][0]

tensor(1)

In [127]:
# Flattens entire batch
t.flatten()

tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3])

In [128]:
# Keep the batch axis intact so the images stay separate
t.flatten(start_dim=1).shape

torch.Size([3, 16])

In [129]:
# Flatten every axis from index 1 (the channel axis) onward, keeping the batch axis
t.flatten(start_dim=1)

tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
        [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]])
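
One reason this per-image flatten matters in a CNN is that a fully connected layer expects one row of features per image. The sketch below is an assumption-laden illustration: the random batch and the layer with 10 outputs are hypothetical and only there to show the shapes.

```python
import torch
import torch.nn as nn

# Hypothetical batch: 3 grayscale images of 4 x 4 pixels -> [B, C, H, W]
batch = torch.rand(3, 1, 4, 4)

flat = batch.flatten(start_dim=1)          # [3, 16]: one row per image
same = batch.reshape(batch.shape[0], -1)   # equivalent reshape
print(torch.equal(flat, same))             # True

# A fully connected layer sized for the 16 flattened features (10 outputs is arbitrary).
layer = nn.Linear(in_features=16, out_features=10)
out = layer(flat)
print(out.shape)                           # torch.Size([3, 10])
```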

   1. Reshaping operations
   <br><br>**2. Element-wise operations**
   <br><br>3. Reduction operations
   <br>4. Access operations

## Element-wise Tensor Operation

An element-wise operation operates on corresponding elements between tensors.
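
Two elements correspond when they sit at the same position in their tensors. A minimal sketch, with arbitrary variable names and values, shows a few element-wise operations:

```python
import torch

a = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
b = torch.tensor([[9, 8], [7, 6]], dtype=torch.float32)

total = a + b     # each element of a is added to the element of b at the same index
product = a * b   # element-wise multiplication, not matrix multiplication
scaled = a * 2    # the scalar is applied to every element
mask = a > 2      # comparisons are element-wise too

print(total)    # every entry is 10.
print(product)  # [[9., 16.], [21., 24.]]
print(scaled)   # [[2., 4.], [6., 8.]]
print(mask)     # [[False, False], [True, True]]
```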

In [131]:
t1 = torch.tensor([
    [1,2],
    [3,4]
], dtype=torch.float32)

t2 = torch.tensor([
    [9,8],
    [7,6]
], dtype=torch.float32)

In [133]:
# Example of the first axis
print(t1[0])

# Example of the second axis
print(t1[0][0])

tensor([1., 2.])
tensor(1.)