## 🔮 Deep Learning Frameworks

$\text{In Supervised Machine Learning:}$

- We have a dataset $(x,y)$ representing the true target function $f_t(x)$ (i.e., $y=f_t(x)$)

- Any machine learning model is equivalent to some hypothesis set represented by a mathematical function $f(x;θ)$

- The objective of training is to learn the optimal value of the parameters $θ$ (i.e., find $θ^*$ such that $f_{θ^*}(x)≈f_t(x)$)

- Now use $f_{θ^*}(x)$ on real examples; it's a good approximation of $f_t(x)$

Hence, mathematically, supervised machine learning is just function approximation!
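The four steps above can be sketched with plain Numpy. This is only an illustration under stated assumptions: the target function `f_t`, the linear hypothesis `f`, and the use of least squares as the "training" procedure are all hypothetical choices, not something prescribed by this notebook.

```python
import numpy as np

# Hypothetical true target function f_t (unknown in practice)
def f_t(x):
    return 3.0 * x + 2.0

# Dataset (x, y) sampled from the target function
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=50)
y = f_t(x)

# Hypothesis set f(x; θ) = θ[0] * x + θ[1]
def f(x, theta):
    return theta[0] * x + theta[1]

# "Training": find θ* by least squares on the dataset
A = np.stack([x, np.ones_like(x)], axis=1)
theta_star, *_ = np.linalg.lstsq(A, y, rcond=None)

# Use f(x; θ*) on unseen inputs and compare with f_t
x_new = np.array([0.25, 0.5])
print(f(x_new, theta_star), f_t(x_new))
```

Because the hypothesis set here contains the target function and the data is noise-free, the fit recovers it almost exactly; real problems are rarely this kind.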

$\text{In Deep Learning:}$


Exactly the same concept applies, except that $f(x;θ)$ is a matrix function that may involve tons of matrix multiplications (one for each layer).
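As a rough sketch of what "one matrix multiplication per layer" means, here is a two-layer network written as a single function $f(x;θ)$. The layer sizes, the ReLU activation, and the random parameter values are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 input features, 8 hidden units, 1 output
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

def f(x, theta):
    """Two-layer network: one matrix multiplication per layer."""
    W1, b1, W2, b2 = theta
    h = np.maximum(x @ W1 + b1, 0.0)   # layer 1 followed by ReLU
    return h @ W2 + b2                 # layer 2 (linear output)

x = rng.normal(size=(5, 4))            # batch of 5 examples
out = f(x, (W1, b1, W2, b2))
print(out.shape)
```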

**Awesome Numpy can handle matrix and nd-array operations quite well.**

- Not so fast, because it can only do so on the CPU 

- If $f(x;θ)$ involves more matrices (a deeper network), the computation becomes much slower

- Meanwhile, GPUs were designed for graphics processing, which also involves tons of matrix multiplications

  - They are composed of many (e.g., 1000s) cores, each with a much lower clock rate, plus VRAM to store the results of the computations (pixels for the screen), supported by higher memory bandwidth and a higher memory clock rate.

  - 💡 Idea: use the GPU for the matrix computations (but ignore the screen-related facets)

  - 🚨 Problem: Numpy cannot run its array operations on the GPU

  - 💡 Solution: Deep Learning Frameworks provide this and many other features!

## Tensors

- `PyTorch`'s array type; just like `Numpy` arrays, but tensors can run on the GPU

- Another distinction from Numpy arrays is that they support differentiation (we will see why later)

 - It follows that most if not all operations we can perform in Numpy have their equivalents in PyTorch
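To make the Numpy/PyTorch correspondence concrete, here is a small side-by-side sketch. The array values are arbitrary; the point is only that the same operations exist under nearly the same names.

```python
import numpy as np
import torch

a_np = np.array([[1.0, 2.0], [3.0, 4.0]])
a_pt = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

# Column sums: Numpy names the argument `axis`, PyTorch names it `dim`
sum_np = a_np.sum(axis=0)
sum_pt = a_pt.sum(dim=0)

# Matrix multiplication with the transpose uses the same operator in both
mm_np = a_np @ a_np.T
mm_pt = a_pt @ a_pt.T

print(sum_np, sum_pt)
print(np.allclose(mm_np, mm_pt.numpy()))
```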

The code that follows is adapted from PyTorch's official tutorial and [this article](https://medium.com/codex/tensor-basics-in-pytorch-252a34288f2).

In [1]:
import torch
import numpy as np

### 1. Initializing Tensors

In [2]:
import torch
import numpy as np

# from a list
z = torch.tensor([[1, 2],[3, 4]]) 

# from a Numpy array 
z = torch.tensor(np.array(z))

# an empty multi-dimensional (2 * 3 * 2) tensor
z = torch.empty(2, 3, 2)

# a 1-D tensor with 12 elements (0 to 11)
z = torch.arange(12)

# a random 1*2 matrix 
z = torch.rand(1, 2)

# a random 1*2 matrix drawn from Gaussian distribution 
z = torch.randn(1, 2)

# a zero-filled 1*2 matrix 
z = torch.zeros(1, 2)

# a 1*2 matrix filled with only 1
z = torch.ones(1, 2)

# Specifying the type of the elements of the tensor
z = torch.ones(2, 2, dtype=torch.int)

# From another tensor
z = torch.zeros_like(z)

z

tensor([[0, 0],
        [0, 0]], dtype=torch.int32)

### 2. Attributes of a Tensor

In [3]:
Z = torch.rand(3,4)

print(f"Shape of tensor: {Z.shape}")
print(f"Datatype of tensor: {Z.dtype}")
print(f"Number of parameters: {Z.numel()}")      
print(f"Device tensor is stored on: {Z.device}")                   
print(f"Requires Gradient: {Z.requires_grad}")
print(f"Gradient: {Z.grad}")

Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Number of parameters: 12
Device tensor is stored on: cpu
Requires Gradient: False
Gradient: None


### 3. Operations

Also mostly just like Numpy in indexing/slicing, masking, element-wise operations, mathematical operations, broadcasting, aggregation, etc. 
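A brief sketch of the Numpy-like behaviours listed above, using a small hand-made tensor so the results are easy to check by eye. The tensor `t` and the values added to it are illustrative choices.

```python
import torch

t = torch.arange(12, dtype=torch.float32).reshape(3, 4)

first_row = t[0]                 # indexing
last_two_cols = t[:, -2:]        # slicing
large = t[t > 5]                 # boolean masking, returns a 1-D tensor
doubled = t * 2                  # element-wise operation
shifted = t + torch.tensor([10.0, 20.0, 30.0, 40.0])   # broadcast to every row
col_means = t.mean(dim=0)        # aggregation along a dimension

print(large)
print(col_means)
```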

Let's explore some differences:

##### Reshaping:

In [20]:
# tensor.view is like tensor.reshape but never copies: it requires a memory layout compatible with the new shape
x = Z.view(12)
x = Z.view(6, -1)      
print(x)
assert torch.allclose(x, Z.reshape(6, -1))

tensor([[0.4513, 0.6858],
        [0.3407, 0.5679],
        [0.0326, 0.5236],
        [0.8209, 0.5186],
        [0.2622, 0.8980],
        [0.5572, 0.8952]])


In Numpy, `ndarray.view` does not reshape; it only reinterprets the same data with a different type.
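The difference between `view` and `reshape` only shows up when the memory layout is not compatible with the requested shape. A minimal sketch, using a transposed tensor as the non-contiguous example (the variable names are illustrative):

```python
import numpy as np
import torch

t = torch.arange(6).reshape(2, 3)
t_T = t.t()                      # transposed: same storage, non-contiguous layout

try:
    t_T.view(6)                  # view cannot express this without copying
    view_ok = True
except RuntimeError:
    view_ok = False

flat = t_T.reshape(6)            # reshape falls back to a copy when needed
print(view_ok, flat)

# Numpy's ndarray.view reinterprets the same bytes with another dtype
a = np.zeros(2, dtype=np.int32)
print(a.view(np.int16).shape)    # 2 int32 values seen as 4 int16 values
```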

##### Slicing

In [5]:
x = torch.rand(5, 3)

col_0 = x[:, 0]         # like Numpy
row_0 = x[0, :]         # like Numpy

elem_1_2 = x[1, 2]      # returns a Tensor!
elem_1_2.item()         # .item() converts to Python scalar    

0.9873909950256348

Numpy's `axis` argument is usually called `dim` in PyTorch:

In [6]:
x1 = torch.rand(5, 3)
x2 = torch.rand(5, 3)

x12 = torch.cat((x1, x2), dim=0)            # np.concatenate((x1, x2), axis=0) in Numpy
x12.shape

torch.Size([10, 3])

A trailing underscore marks the in-place version of an operation:

In [36]:
a = torch.tensor([1, 2, 3, 4])
a.add_(20)    #torch.add(a, 20, out=a)
a

tensor([21, 22, 23, 24])

Saving and loading

In [7]:
Count = torch.tensor([1, 2, 3, 4])
torch.save(Count, 'Count.pt')                   # np.save('Count.npy', a)
Count_revived = torch.load('Count.pt')

Type casting

In [86]:
Count.float()           # Count.astype(float) in Numpy

tensor([1., 2., 3., 4.])

In general, you can assume PyTorch and Numpy follow the same syntax, and quickly look things up when an error occurs.

### 4. GPU Support

In [45]:
# Creating a PyTorch tensor on GPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

z = torch.tensor([[1, 2, 3], [4, 5, 6]], device=device)
print("z lives on", z.device)

x_gpu = x.to(device)
print("x_gpu lives on", x_gpu.device, "and x lives on", x.device)

z lives on cpu
x_gpu lives on cpu and x lives on cpu
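One practical consequence is that the operands of an operation have to live on the same device. The sketch below assumes nothing about the machine: it falls back to the CPU when no CUDA device is available, in which case the `.to(device)` and `.cpu()` calls are effectively no-ops. The matrix size and tolerance are arbitrary.

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

A = torch.rand(256, 256)                 # created on the CPU
B = torch.rand(256, 256, device=device)  # created directly on `device`

# Operands must live on the same device, so move A before multiplying
C = A.to(device) @ B
print(C.device, C.shape)

# Bring the result back to the CPU and compare with a CPU-only computation
C_cpu = A @ B.cpu()
print(torch.allclose(C.cpu(), C_cpu, atol=1e-3))
```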


- Conversion to Numpy is possible, but only if the tensor lives on the CPU (hence the `.cpu()` call)

In [10]:
z.cpu().numpy()                        

array([[0, 0],
       [0, 0]], dtype=int32)
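It is worth knowing that, for CPU tensors, the conversion does not copy data: the tensor and the Numpy array share the same memory. A small sketch of this behaviour, with illustrative variable names:

```python
import numpy as np
import torch

t = torch.ones(3)                # CPU tensor
a = t.numpy()                    # shares memory with t, no copy is made
t.add_(1)                        # an in-place change on the tensor...
print(a)                         # ...is visible through the Numpy array

b = np.zeros(3)
u = torch.from_numpy(b)          # also shares memory, in the other direction
b += 5
print(u)
```

If an independent copy is needed, calling `.clone()` on the tensor or `.copy()` on the array before converting avoids the sharing.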

### 5. Storing Gradients

Consider
$$Z_{1×1}=Y_{1×1}^2$$
where
$$Y_{1×1} = X_{1×4}^TW_{4×1}$$

We can implement this with:

In [11]:
import torch

# Define the tensors with actual values
x = torch.tensor([2.0, 3.0, 4.0, 5.0], requires_grad=True)
w = torch.tensor([1.0, 0.0, -1.0, 2.0], requires_grad=True)

# Compute the equations:
y = torch.dot(x, w)                 # automatically inherits requires_grad=True
z = y ** 2


Now suppose we want to find $\frac{\partial Z}{\partial W}$ then by chain rule we have:

$$\frac{\partial Z}{\partial W} = \frac{\partial Z}{\partial Y} * \frac{\partial Y}{\partial W} $$

Clearly, 
$$\frac{\partial Z}{\partial Y}=2Y \quad \text{and} \quad \frac{\partial Y}{\partial W}=X \quad \text{thus,} \quad \frac{\partial Z}{\partial W}=2*Y*X$$

In [77]:
მzⳆმw = 2 * y * x
მzⳆმw

tensor([32., 48., 64., 80.], grad_fn=<MulBackward0>)

Well, PyTorch can automatically do this for us!

In [12]:
z.backward()            # compute მzⳆმ☘️ for each ☘️ (leaf) in z's computational graph (let's draw it)

print(x.grad, w.grad)

tensor([ 16.,   0., -16.,  32.]) tensor([32., 48., 64., 80.])
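One way to gain confidence in these numbers is to compare the autograd result with the chain-rule formula $2*Y*X$ and with a numerical estimate. The sketch below uses central finite differences; the step size `eps`, the helper `Z`, and the switch to `float64` for numerical accuracy are assumptions of this example.

```python
import torch

x = torch.tensor([2.0, 3.0, 4.0, 5.0], dtype=torch.float64)
w = torch.tensor([1.0, 0.0, -1.0, 2.0], dtype=torch.float64, requires_grad=True)

def Z(w):
    return torch.dot(x, w) ** 2

# Gradient from autograd
Z(w).backward()

# Central finite differences, one coordinate of w at a time
eps = 1e-6
w0 = w.detach()                  # plain values, not tracked by autograd
numeric = torch.zeros(4, dtype=torch.float64)
for i in range(4):
    step = torch.zeros(4, dtype=torch.float64)
    step[i] = eps
    numeric[i] = (Z(w0 + step) - Z(w0 - step)) / (2 * eps)

analytic = 2 * torch.dot(x, w0) * x     # 2 * Y * X from the chain rule
print(w.grad, numeric, analytic)
```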


**As we will see later** this (automatic differentiation) is the most fantastic feature provided by deep learning frameworks to deep learning engineers and researchers.

- Can be turned off

In [96]:
x = torch.tensor([2.0, 3.0, 4.0, 5.0], requires_grad=True)
w = torch.tensor([1.0, 0.0, -1.0, 2.0], requires_grad=True)

with torch.no_grad():       # results of operations in this scope get requires_grad=False
    y = torch.dot(x, w)
    z = y ** 2

# z.backward() error!
print(y.requires_grad) 
print(y.grad)  

False
None


- Sometimes must be turned off

In [101]:
with torch.no_grad():
    w_numpy = w.cpu().numpy()

# alternatively:
w.detach().cpu().numpy()               # Numpy conversion: if it requires gradient need to detach it from the graph first!

<class 'numpy.ndarray'>
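Another situation where gradient tracking typically has to be switched off is a manual parameter update: PyTorch refuses an in-place change to a leaf tensor that requires gradients while autograd is recording. The following is a sketch of a single gradient-descent step on the earlier example; the learning rate `lr` is a hypothetical value chosen for illustration.

```python
import torch

x = torch.tensor([2.0, 3.0, 4.0, 5.0])
w = torch.tensor([1.0, 0.0, -1.0, 2.0], requires_grad=True)
lr = 0.001                       # hypothetical learning rate

z_before = torch.dot(x, w) ** 2
z_before.backward()              # fills w.grad

# An in-place update of a leaf that requires grad is only allowed
# when autograd is not recording, hence the no_grad block
with torch.no_grad():
    w -= lr * w.grad
w.grad.zero_()                   # gradients accumulate, so reset them

z_after = torch.dot(x, w) ** 2
print(z_before.item(), z_after.item())
```

After the step, `z_after` should be smaller than `z_before`, since `w` moved against the gradient of `z`.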


**We covered in this notebook:**

- Why GPUs are important for deep learning

- How deep learning frameworks (e.g., PyTorch) make up for Numpy's lack of GPU support

- Initialization, Attributes and Operations over PyTorch tensors

- GPU and automatic differentiation support!