<a href="https://colab.research.google.com/github/AndreSlavescu/EasyAI/blob/main/MLSystemsGroup_Lecture1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lecture 1: Introduction to PyTorch and Graph Compilation

## Basics of Model Definitions with `nn.Module`

### Model Forward Pass

In PyTorch, you define a model by subclassing `nn.Module`: create the layers in `__init__`, then describe how an input flows through them in `forward`.

### Model Backward Pass, Powered by Autodiff Semantics

PyTorch uses automatic differentiation (autodiff) to compute gradients. The backward pass is abstracted away from the developer: you write only the forward pass, and PyTorch derives the gradient computation from it. It is nonetheless a necessary component of training models, as future lectures will show.
 
### How Autodiff Works

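Under the hood, PyTorch builds a dynamic computational graph as operations are performed on tensors. Each node in this graph represents an operation, recorded as the `grad_fn` of the tensor it produced, and each edge links an operation to the operations that produced its inputs. The graph is rebuilt on every forward pass, which is why it is called dynamic.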
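
A minimal sketch of how this graph can be inspected, assuming a toy expression rather than a full model (the variable names are illustrative). A tensor produced by a differentiable operation exposes the operation that created it through `grad_fn`, and `next_functions` links that operation to the ones that produced its inputs.

```python
import torch

# Leaf tensors are created directly by the user, so no operation produced them.
a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)

c = a * b      # recorded as a multiplication
d = c + 1.0    # recorded as an addition

print(a.grad_fn)                  # None: a leaf has no producing operation
print(d.grad_fn)                  # the addition that produced d
print(d.grad_fn.next_functions)   # links back to the multiplication that produced c
```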

### Explicit Backward Calls

When you call `.backward()` on a tensor (typically a scalar loss), PyTorch traverses the dynamic computational graph from that output tensor back to the input tensors, applying the chain rule along the way. The resulting gradients are accumulated in the `.grad` attribute of every leaf tensor that has `requires_grad=True`, such as the model's parameters, and can then be used to update those parameters.
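
As a small illustration, the sketch below calls `.backward()` on a scalar and then applies one manual gradient step. The function `y = sum(x^2)` and the step size `0.1` are assumptions chosen only to make the numbers easy to check by hand.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()    # y = x1^2 + x2^2 + x3^2

y.backward()          # traverse the graph from y back to x
print(x.grad)         # dy/dx = 2 * x, so tensor([2., 4., 6.])

# One manual gradient-descent step. The update itself must not be recorded
# in the graph, so it runs under torch.no_grad().
with torch.no_grad():
    x -= 0.1 * x.grad
print(x)
```

In practice an optimizer from `torch.optim` performs this update for every parameter of the model.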

In [1]:
"""
Simple Definition of a Neural Network
"""

import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(10, 5)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(5, 2)

    def forward(self, x):
        x = self.relu(self.linear1(x))
        x = self.linear2(x)
        return x

x = torch.randn(1, 10)
model = SimpleNet()
output = model(x)
print(output)

tensor([[ 0.5874, -0.0648]], grad_fn=<AddmmBackward0>)


## Representation of the Compute Graph

### The `fx.Graph`

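The `fx.Graph` is the data structure that `torch.fx` uses to represent the sequence of operations in your model. It is an ordered list of nodes, each describing an input, an operation such as a module or function call, or the output. Printing it gives a readable listing of the model's computation steps, which makes the model easier to debug and optimize.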

### Symbolic Tracing with `torch.fx`

PyTorch provides a module called `torch.fx` that allows for symbolic tracing of the computation graph. Symbolic tracing captures the operations performed on tensors and represents them in a graph structure. This can be useful for debugging, optimization, and understanding the flow of data through the model.

`torch.fx` traces symbolically: rather than running the model on real data, the `symbolic_trace` function feeds placeholder `Proxy` values through `forward` and records every operation applied to them as a node in the graph. This graph can then be analyzed, transformed, and optimized.
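
A minimal sketch of what a traced graph contains, using a hypothetical `TinyNet` that is not part of this lecture's code. Each node has an `op` (the kind of step), a `name`, and a `target` (what is called), and the traced module also exposes the Python code generated from the graph.

```python
import torch
import torch.nn as nn
from torch.fx import symbolic_trace

class TinyNet(nn.Module):  # hypothetical model, for illustration only
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)

    def forward(self, x):
        return torch.relu(self.linear(x))

net = TinyNet()
traced = symbolic_trace(net)

# Walk the recorded nodes in execution order.
for node in traced.graph.nodes:
    print(node.op, node.name, node.target)

# The forward function that torch.fx generated from the graph.
print(traced.code)

# The traced module computes the same result as the original.
sample = torch.randn(3, 4)
print(torch.allclose(traced(sample), net(sample)))
```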

In [None]:
"""
Comparing the FX Graphs with and without torch.no_grad

Analysis:
  The `torch.no_grad` context manager disables gradient tracking, so no autograd
  graph is built during the forward pass. This can speed up inference and is
  useful when you only perform forward passes through the network and do not
  need to compute gradients or perform backpropagation.

  Note that the fx.Graph representation is the same with and without
  torch.no_grad: tracing records the operations in forward, not whether
  autograd is tracking them.
"""

from torch.fx import symbolic_trace
import time

# Accumulated wall-clock time over all iterations; divided by iters when printed.
total_time_no_grad = 0.0
total_time_with_grad = 0.0
iters = 100

for _ in range(iters):
    # Forward pass with gradient tracking disabled.
    x = torch.randn(1, 10)
    with torch.no_grad():
        # perf_counter is a high-resolution clock, better suited to short intervals than time.time
        start_no_grad = time.perf_counter()
        output_no_grad = model(x)
        end_no_grad = time.perf_counter()
    total_time_no_grad += end_no_grad - start_no_grad

    # Same forward pass with gradient tracking enabled (the default).
    x = torch.randn(1, 10)
    start_with_grad = time.perf_counter()
    output_with_grad = model(x)
    end_with_grad = time.perf_counter()
    total_time_with_grad += end_with_grad - start_with_grad

print(f'Time with no_grad: {round(total_time_no_grad / iters, 6)} seconds')
print(f'Time with grad: {round(total_time_with_grad / iters, 6)} seconds')

with torch.no_grad():
    traced_model_no_grad = symbolic_trace(model)

traced_model_with_grad = symbolic_trace(model)

print("\nGraph with torch.no_grad:")
print(traced_model_no_grad.graph)

print("\nGraph without torch.no_grad:")
print(traced_model_with_grad.graph)

Time with no_grad: 5.3e-05 seconds
Time with grad: 0.000113 seconds

Graph with torch.no_grad:
graph():
    %x : [num_users=1] = placeholder[target=x]
    %linear1 : [num_users=1] = call_module[target=linear1](args = (%x,), kwargs = {})
    %relu : [num_users=1] = call_module[target=relu](args = (%linear1,), kwargs = {})
    %linear2 : [num_users=1] = call_module[target=linear2](args = (%relu,), kwargs = {})
    return linear2

Graph without torch.no_grad:
graph():
    %x : [num_users=1] = placeholder[target=x]
    %linear1 : [num_users=1] = call_module[target=linear1](args = (%x,), kwargs = {})
    %relu : [num_users=1] = call_module[target=relu](args = (%linear1,), kwargs = {})
    %linear2 : [num_users=1] = call_module[target=linear2](args = (%relu,), kwargs = {})
    return linear2
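
The two graphs above are identical because gradient tracking is a runtime property of tensors, not part of the operations that `forward` calls. A minimal sketch of where `torch.no_grad` does make a difference, assuming a hypothetical single-layer model: the output values match, but only the tracked output carries a `grad_fn`.

```python
import torch
import torch.nn as nn

layer = nn.Linear(10, 2)   # hypothetical stand-in for a model
x = torch.randn(1, 10)

out_with_grad = layer(x)
with torch.no_grad():
    out_no_grad = layer(x)

# Tracked output: part of an autograd graph.
print(out_with_grad.requires_grad, out_with_grad.grad_fn)
# Untracked output: same values, but no graph was recorded.
print(out_no_grad.requires_grad, out_no_grad.grad_fn)
```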


# Profiling

## Torch Profiler

The `torch.profiler` module lets you collect detailed information about the execution of your model, including CPU and GPU activity, memory usage, and operator-level statistics. This information can provide insight into where a model can be improved.
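
A minimal sketch of reading the operator-level statistics directly in the notebook, assuming a small hypothetical `nn.Sequential` model and CPU-only profiling. `key_averages()` groups the recorded events by operator name, and `table()` formats them as text.

```python
import torch
import torch.nn as nn
import torch.profiler

# Hypothetical model with the same layer sizes as SimpleNet.
model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))
x = torch.randn(1, 10)

with torch.profiler.profile(
    activities=[torch.profiler.ProfilerActivity.CPU],
    record_shapes=True,
) as prof:
    with torch.no_grad():
        model(x)

# Aggregate events per operator and show the five most expensive by total CPU time.
averages = prof.key_averages()
summary = averages.table(sort_by="cpu_time_total", row_limit=5)
print(summary)
```

Sorting by `cpu_time_total` is one choice among several; the exact timings will differ from machine to machine and run to run.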

## Trace

To inspect your model's execution in detail, export a trace with `export_chrome_trace` and open it in the Chrome trace viewer. The timeline shows exactly when each operator is dispatched, which helps locate the more granular bottlenecks.


In [3]:
"""
Looking at dispatched operators with trace.json
"""

import torch.profiler

with torch.no_grad():
    with torch.profiler.profile(
        activities=[torch.profiler.ProfilerActivity.CPU],
        record_shapes=True,
        profile_memory=True
    ) as prof_no_grad:
        output_no_grad = model(x)

prof_no_grad.export_chrome_trace("trace_no_grad.json")

with torch.profiler.profile(
    activities=[torch.profiler.ProfilerActivity.CPU],
    record_shapes=True,
    profile_memory=True
) as prof_with_grad:
    output_with_grad = model(x)

prof_with_grad.export_chrome_trace("trace_with_grad.json")

## View Trace

Open [chrome://tracing/](chrome://tracing) in Chrome and load `trace_no_grad.json` or `trace_with_grad.json` to view the trace.
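
The exported file is plain JSON, so it can also be inspected without a browser. The sketch below is illustrative only: the model and the file name are hypothetical, and it assumes the events are stored under a `traceEvents` key, falling back to a bare list if they are not.

```python
import json
import os
import tempfile

import torch
import torch.nn as nn
import torch.profiler

# Hypothetical model with the same layer sizes as SimpleNet.
model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))
x = torch.randn(1, 10)

with torch.profiler.profile(
    activities=[torch.profiler.ProfilerActivity.CPU]
) as prof:
    with torch.no_grad():
        model(x)

# Write the trace to a temporary directory (file name chosen for this example).
trace_path = os.path.join(tempfile.mkdtemp(), "trace_example.json")
prof.export_chrome_trace(trace_path)

with open(trace_path) as f:
    trace = json.load(f)

# Assumption: events live under "traceEvents"; otherwise treat the file as a list.
events = trace["traceEvents"] if isinstance(trace, dict) else trace

# Collect the names of the ATen operators that appear in the trace.
op_names = sorted({
    e["name"] for e in events
    if isinstance(e, dict) and str(e.get("name", "")).startswith("aten::")
})
print(f"{len(events)} events, ATen operators seen: {op_names}")
```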