### Learning PyTorch
### PyTorch, led by Facebook AI Research (FAIR), is a minimal Python library built on the Torch library (which was originally scripted in Lua). It can directly use CUDA GPUs for accelerated processing and has a clean code flow.


### The main modules of PyTorch are shown below; their names are largely self-explanatory.

<img src = "images/pytorch_modules.png">

<br><br>
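- torch.nn = building blocks for neural networks; torch.nn.Module is the base class for all neural network modules.
    - torch.nn.functional = functional (stateless) versions of those building blocks, such as activation and loss functions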
<br><br>
- torch.Tensor = multi-dimensional matrix containing elements of a single data type.
<br><br>
- torch.autograd = provides classes and functions implementing automatic differentiation. A recorder records which operations have been performed, and then replays them backward to compute the gradients. This method is especially powerful when building neural networks: it saves time in each epoch because the information needed to differentiate the parameters is recorded during the forward pass.
<br><br>
- torch.utils.data = the heart of PyTorch's data loading utilities is the torch.utils.data.DataLoader class. It represents a Python iterable over a dataset.
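
As a minimal sketch of the autograd recording described in the list above (the names `x` and `y` are illustrative and not part of the original notebook), the following builds a tiny computation and asks PyTorch for its gradient:

```python
import torch

# requires_grad=True asks autograd to record every operation applied to x
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# forward pass: y = sum(x^2); the operations are recorded as they run
y = (x ** 2).sum()

# backward pass: replay the recorded operations in reverse to get dy/dx
y.backward()

# the gradient of sum(x^2) with respect to x is 2 * x
print(x.grad)
```

The gradient is stored on the tensor that was marked with `requires_grad=True`, which is how model parameters receive their gradients during training.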

#### Make sure you have a virtual env set up with Anaconda, with pytorch, cudatoolkit, and jupyter installed through conda.

### Verify the install


In [None]:
# Should succeed without hesitation
import torch

In [None]:
print(torch.__version__)

In [None]:
# Returns True if a CUDA-capable NVIDIA GPU is available
torch.cuda.is_available()

In [None]:
# Should match the version of cudatoolkit installed
torch.version.cuda

### Sidenote: Why GPUs
For tasks such as mathematical computation, it is possible to reach a solution through parallel computing, because even complex equations can be broken down into smaller computations that are independent of one another. GPUs are much more capable of parallel computation because of their massive core counts.

CUDA is NVIDIA's parallel computing platform and API; its libraries have to be installed (for example through cudatoolkit) to enable this feature on NVIDIA GPUs.

PyTorch is versatile in that it can selectively hold mathematical objects such as tensors on different devices, such as the GPU or the CPU.
#### Why not use a GPU for everything?
Moving a simple computation to the GPU can actually slow the code down, because a modern CPU may finish the work in the time it takes just to hand the task over to the GPU. Remember that GPUs only shine on mathematically dense tasks that can be done in parallel.
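
Because tensors can live on either device, a common pattern is to pick the device once and fall back to the CPU when no GPU is present. The following is a minimal sketch of that idea; the names `t_small` and `t_moved` are illustrative only:

```python
import torch

# fall back to the CPU when no CUDA-capable GPU is available
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

t_small = torch.tensor([1.0, 2.0, 3.0])  # created on the CPU by default
t_moved = t_small.to(device)             # moved to the chosen device

print(t_small.device, t_moved.device)
```

Written this way, the same notebook runs on machines with and without a GPU.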

In [None]:
# simply creating Pytorch tensor object
t = torch.Tensor([1,2,3])
t

In [None]:
# creating a device object that refers to the first GPU
device = torch.device('cuda:0')
device

In [None]:
# Moving tensor object to GPU (raises an error if no CUDA GPU is available)
t = t.cuda()
t

In [None]:
print(t.dtype)
print(t.device)
print(t.layout)

### Tensors
Tensor, in short, is a general term for a container of numbers that has a rank, axes, and a shape. Think of a matrix, but with any number of dimensions.

Rank is the number of dimensions present within a tensor, i.e. how many indices you need to access an element. For example, the tensor 't' above has a rank of 1 because each element needs only one index for access, like t[1]. An axis is a specific dimension of a tensor; its length tells you how many indices are available along it, and each additional axis adds a dimension in which elements can exist.

The shape of a tensor encodes all the relevant info about its rank, axes, and even indices. Tensors are constantly reshaped and transformed (which is why they're named tensors). Tensors can represent the same underlying data but have different shapes. Think about it: if 'a' was a list of lengths of toothpicks (or something weird like that) in the code below, changing its shape wouldn't change the lengths of the toothpicks.


Shapes can have more than 2 or 3 dimensions. For example, an image typically has three dimensions: color channel (e.g. RGB), height, and width: [C,H,W]. In deep learning, more dimensions can be added, such as the batch size (the number of images in a subgroup). A batch of images then has 4 dimensions and looks like: [B,C,H,W]. With this rank 4 tensor we can navigate to a specific pixel of a specific color channel in a specific image.
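
To make the [B,C,H,W] layout concrete, here is a small sketch with a made-up batch. The sizes (8 images, 3 channels, 32 x 32 pixels) and the RGB channel order are assumptions chosen purely for illustration:

```python
import torch

# hypothetical batch: 8 RGB images, each 32 x 32 pixels -> [B, C, H, W]
batch = torch.zeros(8, 3, 32, 32)

print(batch.shape)       # the shape lists the length of every axis
print(len(batch.shape))  # the rank is the number of axes

# image 2, channel 0 (red, assuming RGB order), row 10, column 5
pixel = batch[2, 0, 10, 5]
print(pixel)
```

Four indices are needed to reach a single number, which is exactly what rank 4 means.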

Features can be extracted from these values by transforming them with specific operations. See more later on this.


In [None]:
a = [
    [4,5,6],
    [7,8,9]
]

In [None]:
a[1][2]

In [None]:
type(a)

In [None]:
# list to tensors; torch.tensor infers the dtype (int64 here) while torch.Tensor uses the default dtype (float32); the shape is the same
b = torch.tensor(a)
c = torch.Tensor(a)
print(type(b), b.dtype, b)
print(type(c), c.dtype,  c)

In [None]:
b.shape

In [None]:
#reshaping a tensor
b.reshape(1,6)

In [None]:
b.reshape(1,6).shape

In [None]:
# remember that reshape returns a new tensor; b itself is unchanged unless you reassign it (b = b.reshape(1,6))
b.shape

In [None]:
import numpy as np

In [None]:
data = np.array([1,2,3])
type(data)

#### Different ways to initiate a tensor and exploring their data types

In [None]:
# notice this is a different construction of a tensor than the next 3: the PyTorch class constructor - the result is unchanged by later numpy array manipulation

t1 = torch.Tensor(data)
print(t1.dtype)

In [None]:
#factory function (accepts parameter inputs and outputs tensor objects) - unchanged by numpy array manipulation

t2 = torch.tensor(data)
print(t2.dtype)

In [None]:
#also a factory function --> changing the numpy array (variable 'data') will change these values

t3 = torch.as_tensor(data)
print(t3.dtype)

In [None]:
#also a factory function --> changing the numpy array (variable 'data') will change these values

t4 = torch.from_numpy(data)
print(t4.dtype)

### Recap
torch.Tensor(data) & torch.tensor(data) = **data is copied**, but the data types differ: torch.Tensor() uses the default dtype (float32) while torch.tensor() infers the dtype from the data
<br>
torch.as_tensor(data) & torch.from_numpy(data) = **data is shared** with the NumPy array (saves memory space) and the dtype is kept
<br><br>
torch.tensor() = best for everyday use
<br>
torch.as_tensor() = best for performance tuning
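
The copy-versus-share distinction in the recap can be observed directly. This is a minimal sketch; the names `arr`, `copied`, and `shared` are illustrative, and it assumes a CPU NumPy array whose dtype is not converted:

```python
import numpy as np
import torch

arr = np.array([1, 2, 3])

copied = torch.tensor(arr)     # copies the data
shared = torch.as_tensor(arr)  # shares memory with the NumPy array

arr[0] = 100                   # modify the NumPy array in place

print(copied)  # unaffected by the change
print(shared)  # reflects the change
```

Sharing avoids an extra copy, but it also means that edits to the array silently show up in the tensor, which is why `torch.tensor()` is the safer everyday choice.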

In [None]:
# the default dtype, used by torch.Tensor() and for floating-point input
print(torch.get_default_dtype()) 

In [None]:
#type inference - what you put in is what it becomes
a = torch.tensor(np.array([1,2,3]))
print(a)
a.dtype

In [None]:
a = torch.tensor(np.array([1.,2.,3.]))
print(a)
a.dtype

In [None]:
#explicitly set
a = torch.tensor(np.array([1,2,3]), dtype=torch.float64)
print(a)
a.dtype

#### Creating multi-dimensional tensors

In [None]:
torch.eye(2)

In [None]:
print(torch.zeros(3,2))
print(torch.zeros(3,2).dtype)

In [None]:
print(torch.ones(2,3))
print(torch.ones(2,3).dtype)

In [None]:
# uniform random values in [0, 1), displayed to four decimal places
torch.rand(2,2)

# Main Tensor operation types:

   **1. Reshaping operations**
   <br><br>2. Element-wise operations
   <br>3. Reduction operations
   <br>4. Access operations


##    1. Reshaping operations 

In [None]:
# rank 2 tensor with two axes; the first axis has a length of 3 and is made up of arrays,
#   the second axis has a length of 4 and is made up of numbers
t = torch.tensor([
    [1,1,1,1],
    [2,2,2,2],
    [3,3,3,3]
], dtype=torch.float32)

In [None]:
t.size()

In [None]:
t.shape

In [None]:
# length of shape gives rank of a tensor
len(t.shape)

Find the number of elements in a tensor, either as the product of its dimensions or with numel():

In [None]:
torch.tensor(t.shape).prod()

In [None]:
t.numel()

Every reshape of this 3 by 4 tensor must have dimensions whose product is 12, b/c the elements (and the # of elements) are unchanged.

Sidenote: that's kinda why they're called tensors. Tensor elements are unchanged when undergoing transformations.

In [None]:
t.reshape(1,12)

In [None]:
t.reshape(2,6)

In [None]:
t.reshape(6,2)

In [None]:
t.reshape(12,1)

In [None]:
# can change the rank!
t.reshape(2,2,3)

**Squeezing** removes every axis of length 1, and **unsqueezing** adds an axis of length 1. Both change the rank (and therefore the shape) of a tensor without changing its elements.

In [None]:
print(t.reshape(1,12))
print(t.reshape(1,12).shape)

In [None]:
print(t.reshape(1,12).squeeze())
print(t.reshape(1,12).squeeze().shape)

In [None]:
print(t.reshape(1,12).squeeze().unsqueeze(dim=0))
print(t.reshape(1,12).squeeze().unsqueeze(dim=0).shape)

**Flattening** removes all axes except one, making it a rank 1 tensor (change rank to 1)


In [None]:
def flatten(t):
    t = t.reshape(1,-1)
    t = t.squeeze()
    return t

In [None]:
# reshape, then squeeze (removes a set of brackets!)
flatten(t)

In [None]:
#reshape only
t.reshape(1,12)

In [None]:
#same thing as above
t.reshape(1,-1)
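
The hand-written `flatten` above can be compared with two shorter spellings. This is a sketch using the same 3 x 4 tensor; the names `a`, `b`, and `c` are illustrative:

```python
import torch

t = torch.tensor([
    [1, 1, 1, 1],
    [2, 2, 2, 2],
    [3, 3, 3, 3]
], dtype=torch.float32)

a = t.reshape(1, -1).squeeze()  # the hand-written approach: reshape, then squeeze
b = t.reshape(-1)               # -1 lets PyTorch infer the length (12 here)
c = t.flatten()                 # built-in method with the same result

print(a.shape, b.shape, c.shape)
```

All three produce the same rank 1 tensor, so in practice `t.flatten()` is usually the most readable option.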

**Concatenating** tensors (doesn't change rank)

In [None]:
t1 = torch.tensor([
    [1,2],
    [3,4]
])
t2 = torch.tensor([
    [5,6],
    [7,8]
])

#original shape
print(t1.shape)

In [None]:
#combine t1 and t2 row-wise (axis-0)
torch.cat((t1, t2), dim=0)

In [None]:
#combine them column-wise (axis-1) 
torch.cat((t1, t2), dim=1)

In [None]:
# dim=0 is the first number in the shape --> concatenating along axis 0 adds 2 to it
torch.cat((t1, t2), dim=0).shape

In [None]:
#likewise for axis 1
torch.cat((t1, t2), dim=1).shape

### CNN Flatten Operations - Tensor Batch Processing



In [None]:
# three rank 2 tensors with shape (4,4)
t1 = torch.tensor([
    [1,1,1,1],
    [1,1,1,1],
    [1,1,1,1],
    [1,1,1,1]
])

t2 = torch.tensor([
    [2,2,2,2],
    [2,2,2,2],
    [2,2,2,2],
    [2,2,2,2]
])

t3 = torch.tensor([
    [3,3,3,3],
    [3,3,3,3],
    [3,3,3,3],
    [3,3,3,3]
])

In [None]:
# stack joins tensors along a new axis (cat joins along an existing one)
t = torch.stack((t1,t2,t3))
t.shape
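
The difference between `torch.stack` and `torch.cat` is easiest to see from the resulting shapes. A minimal sketch, with illustrative tensors `a` and `b` that are not part of the original notebook:

```python
import torch

a = torch.ones(4, 4)
b = torch.ones(4, 4) * 2

stacked = torch.stack((a, b))  # new axis at position 0 -> shape (2, 4, 4)
joined = torch.cat((a, b))     # existing axis 0 grows  -> shape (8, 4)

# stack behaves like unsqueezing each tensor and then concatenating
same = torch.cat((a.unsqueeze(0), b.unsqueeze(0)))

print(stacked.shape, joined.shape, torch.equal(stacked, same))
```

Stacking is what turns separate images into a batch, because the new first axis becomes the batch axis.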

In [None]:
#rank 3 tensor
t

In [None]:
# as we've seen, we can increase the rank of a tensor with the reshape operation
t = t.reshape(3,1,4,4)
t

That tensor has lots of information stored. For example if each original tensor (t1, t2, t3) were 4 by 4 pixel grayscale images, then this final tensor *t* would have information on (batch, channel, height, width)

Notice how the additional axis of length 1 as a channel doesn't change the number of elements in the tensor. This is because the product of the components values doesn't change when we multiply by one. 

#### The first axis has 3 elements. Each element of the first axis represents an image (an image in a batch). For each image, we have a single color channel on the channel axis (b/c it's grayscale). Each of these channels contains 4 arrays that contain 4 numbers or scalar components.

Observe this by indexing:

In [None]:
# the first image
t[0]

In [None]:
#first color channel in first image
t[0][0]

In [None]:
# first row of the first color channel of the first image
t[0][0][0]

In [None]:
# first pixel value in the first row of the first color channel of the first image
t[0][0][0][0]

In [None]:
# Flattens entire batch
t.flatten()

In [None]:
# Leave the batch axis intact to keep the images separate
t.flatten(start_dim=1).shape

In [None]:
# Flatten from axis 1 (the channel axis) onward, so each image becomes one row
t.flatten(start_dim=1)

   1. Reshaping operations
  
   **2. Element-wise operations**
   <br><br>3. Reduction operations
   <br>4. Access operations

## Element-wise Tensor Operation

An element-wise operation operates on corresponding elements between tensors.

In [None]:
t1 = torch.tensor([
    [1,2],
    [3,4]
], dtype=torch.float32)

t2 = torch.tensor([
    [9,8],
    [7,6]
], dtype=torch.float32)

In [None]:
# Indexing the first axis returns an array (a row)
print('first axis: ',t1[0])

# Indexing the second axis as well returns a number
print('second axis: ',t1[0][0])

In [None]:
print(t1[0][0])
print(t2[0][0])

These are elements of the tensors. They can be accessed by their positions along the axes (dimensions) of the tensor.

Two elements correspond when they occupy the same index position in their tensors.

### All Arithmetic operations (add, subtract, multiply, divide) are Element-wise Operations

In [None]:
t1 + t2

In [None]:
t1 + 2

In [None]:
t2 - t1

In [None]:
t1 - 4

In [None]:
t1 * 2

In [None]:
 t1 / 2

In [None]:
t2 / t1

In [None]:
t1 * t2

In [None]:
# There are also built-in tensor object methods for arithmetic operations
print(t1.add(2),'\n')
print(t1.sub(2),'\n')
print(t1.mul(2),'\n')
print(t1.div(2),'\n')

These examples seem to break the rule that element-wise operations operate on tensors of the same shape. Scalar values are rank-0 tensors, which means they have no shape, and our tensor t1 is a rank-2 tensor of shape 2 x 2.

So how does this fit in? Let's break it down.

The first explanation that may come to mind is that the operation simply takes the single scalar value and applies it to each element within the tensor. This logic kind of works. However, it's a bit misleading, and it breaks down in more general situations where we're not using a scalar.

To think about these operations differently, we need to introduce the concept of tensor broadcasting, or simply broadcasting.

### Broadcasting tensors
Broadcasting is the mechanism that allows us to add scalars to higher-dimensional tensors.

In [None]:
np.broadcast_to(2, t1.shape)

In [None]:
t1 

In [None]:
# broadcasting is under the hood
t1+2

In [None]:
t1 + torch.tensor(
    np.broadcast_to(2, t1.shape),
    dtype=torch.float32
    )

Broadcasting is basically making the shapes the same for compatibility of the operation. It can even work for other tensors, not just scalars!

In [None]:
#rank 2 shape=[2, 2]
t1 = torch.ones([2,2], dtype=torch.float32)
t1

In [None]:
#rank 1 shape=[2]
t2 = torch.tensor([2,4], dtype=torch.float32)
t2

In [None]:
#same thing
np.broadcast_to(t2.numpy(),[2,2]) 
np.broadcast_to(t2.numpy(),t1.shape) 

In [None]:
#tensor broadcasting to perform operations
t1 + t2
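
Broadcasting also applies when both tensors need to be stretched. The sketch below uses `expand()` to show the shapes that broadcasting works with; the tensors `col` and `row` are illustrative and not part of the original notebook:

```python
import torch

col = torch.tensor([[1.0], [2.0], [3.0]])  # shape (3, 1)
row = torch.tensor([10.0, 20.0])           # shape (2,)

# expand() produces the broadcast shapes explicitly, without copying data
col_b = col.expand(3, 2)
row_b = row.expand(3, 2)

result = col + row                         # broadcast to shape (3, 2)

print(result)
print(torch.equal(result, col_b + row_b))
```

The addition of the original tensors gives the same result as adding the explicitly expanded ones, which is what "under the hood" means here.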

### Comparison operations are also element-wise operations.

For a given comparison operation between two tensors, a new tensor of the same shape is returned with each element containing either a torch.bool value of *True or False*. 

In [None]:
t = torch.tensor([
    [0,5,0],
    [6,0,7],
    [0,8,0]
], dtype=torch.float32)

In [None]:
t.eq(0)

In [109]:
t.ge(0)

tensor([[True, True, True],
        [True, True, True],
        [True, True, True]])

In [110]:
t.gt(0)

tensor([[False,  True, False],
        [ True, False,  True],
        [False,  True, False]])

In [111]:
t.le(0)

tensor([[ True, False,  True],
        [False,  True, False],
        [ True, False,  True]])

In [112]:
t.lt(0)

tensor([[False, False, False],
        [False, False, False],
        [False, False, False]])

In [113]:
# under the hood
t <= torch.tensor(
    np.broadcast_to(7, t.shape)
    ,dtype=torch.float32
)

tensor([[ True,  True,  True],
        [ True,  True,  True],
        [ True, False,  True]])

In [115]:
# same thing (equivalent tensor under the hood)
t <= torch.tensor([
    [7,7,7],
    [7,7,7],
    [7,7,7]
], dtype=torch.float32)

tensor([[ True,  True,  True],
        [ True,  True,  True],
        [ True, False,  True]])

With element-wise operations that are functions, it's fine to assume that the function is applied to each element of the tensor. 

In [116]:
t.abs()

tensor([[0., 5., 0.],
        [6., 0., 7.],
        [0., 8., 0.]])

In [117]:
t.sqrt()

tensor([[0.0000, 2.2361, 0.0000],
        [2.4495, 0.0000, 2.6458],
        [0.0000, 2.8284, 0.0000]])

In [118]:
t.neg()

tensor([[-0., -5., -0.],
        [-6., -0., -7.],
        [-0., -8., -0.]])

   1. Reshaping operations
   2. Element-wise operations
   **3. Reduction operations**
   <br>
   <br>
   4. Access operations

## Reduction Operations

A reduction operation on a tensor is an operation that reduces the number of elements contained within the tensor.
<br>
Tensors give us the ability to manage our data.
<br>
<br>
Reshaping operations gave us the ability to position our elements along particular axes. Element-wise operations allow us to perform operations on elements between two tensors, and **reduction operations allow us to perform operations on elements within a single tensor.**

In [120]:
t = torch.tensor([
    [0,1,0],
    [2,0,2],
    [0,3,0]
], dtype=torch.float32)

In [121]:
t.sum()

tensor(8.)

In [123]:
# get NUMber of ELements in a tensor
t.numel()

9

In [125]:
t.sum().numel()

1

In [128]:
t.sum().numel() < t.numel()

True

Since the number of elements has been reduced by the operation, we can conclude that the sum() method is a reduction operation.

**Other Common tensor reduction operations**:

In [129]:
t.sum()

tensor(8.)

In [130]:
t.prod()

tensor(0.)

In [131]:
t.mean()

tensor(0.8889)

In [132]:
t.std()

tensor(1.1667)

*Friendly Reminder<br>
To calculate the standard deviation of those numbers:*

    1. Work out the Mean (the simple average of the numbers)
    2. Then for each number: subtract the Mean and square the result
    3. Then work out the mean of those squared differences (by default, t.std() divides by N - 1 instead of N here, which is why it returns 1.1667 rather than about 1.10)
    4. Take the square root of that and we are done!
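
The steps above can be followed by hand and compared with `t.std()`. This is a minimal sketch that tries both divisors, N and N - 1; the names `population_std` and `sample_std` are illustrative, and it assumes the default behaviour of `t.std()`:

```python
import torch

t = torch.tensor([
    [0, 1, 0],
    [2, 0, 2],
    [0, 3, 0]
], dtype=torch.float32)

mean = t.mean()                 # step 1
squared_diffs = (t - mean) ** 2 # step 2

# steps 3 and 4, once dividing by N and once by N - 1
population_std = (squared_diffs.sum() / t.numel()).sqrt()
sample_std = (squared_diffs.sum() / (t.numel() - 1)).sqrt()

print(population_std, sample_std, t.std())
```

The value that agrees with `t.std()` is the one that divides by N - 1.
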
But...<br>
  Do reduction operations always reduce to a tensor with a single element? 
  <br>NO<br><br>
#### We can reduce specific axes at a time.<br>
It's just like we saw with reshaping when we aimed to flatten the image tensors within a batch while still maintaining the batch axis. 

In [136]:
#3 x 4 rank-2 tensor
t = torch.tensor([
    [1,1,1,1],
    [2,2,2,2],
    [3,3,3,3]
], dtype=torch.float32)

In [137]:
t.sum(dim=0)

tensor([6., 6., 6., 6.])

In [138]:
t.sum(dim=1)

tensor([ 4.,  8., 12.])

Confused?? Let's go over what happened.

### Understanding reductions by axes

We'll tackle the first axis first. When we take the summation over the first axis, we are summing the elements of the first axis, which are the three arrays t[0], t[1], and t[2].

In [158]:
t[0]

tensor([1., 1., 1., 1.])

In [141]:
t[1]

tensor([2., 2., 2., 2.])

In [142]:
t[2]

tensor([3., 3., 3., 3.])

In [145]:
t[0] + t[1]

tensor([3., 3., 3., 3.])

In [146]:
t[0] + t[1] + t[2]

tensor([6., 6., 6., 6.])

### Reducing along the first axis is an element-wise sum of the arrays on that axis. Reducing along the second axis instead sums the numbers inside each array.

In [151]:
t[0].sum()

tensor(4.)

In [152]:
t[1].sum()

tensor(8.)

In [153]:
t[2].sum()

tensor(12.)

In [155]:
t.sum(dim=1)

tensor([ 4.,  8., 12.])

### Argmax tensor reduction operation 
**Argmax returns the index location of the maximum value inside a tensor.**

When we call the <code>argmax()</code> method on a tensor, the tensor is reduced to a new tensor that contains an index value indicating where the max value is inside the tensor.

In [159]:
t = torch.tensor([
    [1,0,0,2],
    [0,3,3,0],
    [4,0,0,5]
], dtype=torch.float32)

In [160]:
t.max()

tensor(5.)

In [161]:
t.argmax()

tensor(11)

In [163]:
t.flatten()

tensor([1., 0., 0., 2., 0., 3., 3., 0., 4., 0., 0., 5.])

#### If we don't specify an axis to the <code>argmax()</code> method, it returns the index location of the max value from the flattened tensor, which in this case is indeed 11. 

In [164]:
t.max(dim=0)

torch.return_types.max(
values=tensor([4., 3., 3., 5.]),
indices=tensor([2, 1, 1, 2]))

In [165]:
t.argmax(dim=0)

tensor([2, 1, 1, 2])

In [166]:
t.max(dim=1)

torch.return_types.max(
values=tensor([2., 3., 5.]),
indices=tensor([3, 1, 3]))

In [167]:
t.argmax(dim=1)

tensor([3, 1, 3])

Notice how the call to the **max() method returns two tensors**. The first tensor contains the max values and the second tensor contains the index locations of the max values. The second tensor is what argmax gives us. <br><br>
*dim=0*: For the first axis, the max values are 4, 3, 3, and 5. These values are determined by taking the **element-wise maximum across each array running across the first axis.** <br><br>
*dim=1*: For the second axis, the max values are 2, 3, and 5. These values are determined by taking the maximum inside each array of the first axis. We have three groups of four, which gives us 3 maximum values. 


<br>The argmax values here tell us the index inside each respective array where the max value lives.

### In practice, we often use the argmax() function on a network's output prediction tensor, to determine which category has the highest prediction value. 
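
As a sketch of that use case, the following takes a made-up prediction tensor and counts how many predictions match the labels. The values in `preds` and `labels` are hypothetical and chosen only for illustration:

```python
import torch

# hypothetical network output: 3 samples, 4 class scores each
preds = torch.tensor([
    [0.1, 2.5, 0.3, 0.0],
    [1.7, 0.2, 0.1, 0.9],
    [0.0, 0.4, 0.2, 3.1]
])

labels = torch.tensor([1, 0, 2])  # hypothetical ground-truth classes

predicted = preds.argmax(dim=1)   # best-scoring class per sample
num_correct = predicted.eq(labels).sum().item()

print(predicted)    # tensor([1, 0, 3])
print(num_correct)  # 2
```

This combines a reduction (`argmax` along the class axis), an element-wise comparison (`eq`), and another reduction (`sum`).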

   1. Reshaping operations
   2. Element-wise operations
   3. Reduction operations
   <br>
   <br>
   **4. Access operations**
## Accessing Elements Inside Tensors

In [168]:
t = torch.tensor([
    [1,2,3],
    [4,5,6],
    [7,8,9]
], dtype=torch.float32)

In [169]:
t.mean()

tensor(5.)

In [170]:
t.mean().item()

5.0

In [171]:
t.mean(dim=0).tolist()

[4.0, 5.0, 6.0]

In [172]:
t.mean(dim=0).numpy()

array([4., 5., 6.], dtype=float32)

We can **access** the numeric values by transforming the output tensor into a Python list or a NumPy array.
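
Besides converting a whole result, individual values can be pulled out with indexing and slicing. A minimal sketch using the same 3 x 3 tensor; the names `first_row`, `middle_col`, and `center` are illustrative:

```python
import torch

t = torch.tensor([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
], dtype=torch.float32)

first_row = t[0]         # tensor([1., 2., 3.])
middle_col = t[:, 1]     # tensor([2., 5., 8.])
center = t[1, 1].item()  # 5.0 as a plain Python float

print(first_row, middle_col, center)
```

Indexing returns tensors; `.item()` is only needed when a plain Python number is required.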