# PyTorch Profiler
We will set up a simple ResNet model and use it to explore the profiler.

## Steps
1. Import all necessary libraries
2. Instantiate a simple ResNet model
3. Using profiler to analyze execution time
4. Using profiler to analyze memory consumption
5. Using tracing functionality
6. Examining stack traces
7. Visualizing data as a flamegraph
8. Using profiler to analyze long-running jobs

### 1, 2. Import libraries and instantiate ResNet


In [2]:
import torch
import torchvision.models as models
from torch.profiler import profile, record_function, ProfilerActivity

model = models.resnet18()
inputs = torch.randn(5,3,224,244)

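### 3. Using the profiler to analyze execution time
The PyTorch profiler is enabled through a context manager and accepts several parameters, including:
- `activities` - a list of activities to profile:
  - `ProfilerActivity.CPU` - PyTorch operators and user-defined code labels
  - `ProfilerActivity.CUDA` - on-device CUDA kernels
- `record_shapes` - whether to record the shapes of operator inputs
- `profile_memory` - whether to report the memory consumed by the model's tensors
- `use_cuda` - whether to measure the execution time of CUDA kernels (deprecated in recent releases in favor of `ProfilerActivity.CUDA`)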

Note that we can use the `record_function` context manager to label arbitrary code ranges with user-provided names. `model_inference` is the label used below.

The profiler lets us check which operators were called during the execution of a code range wrapped in the profiler context manager.
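
As a minimal sketch of how labels work, the cell below wraps two separate code ranges in `record_function` and checks that both labels show up among the recorded events. The model `tiny_model`, the input size, and the label names `forward_pass` and `postprocess` are hypothetical and chosen only to keep the example small; they are not part of the ResNet walkthrough.

```python
import torch
import torch.nn as nn
from torch.profiler import profile, record_function, ProfilerActivity

# Hypothetical stand-in model, kept tiny so the sketch runs quickly on CPU.
tiny_model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 4),
)
tiny_inputs = torch.randn(2, 3, 32, 32)

with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as tiny_prof:
    # Each record_function range is reported as its own event.
    with record_function("forward_pass"):
        outputs = tiny_model(tiny_inputs)
    with record_function("postprocess"):
        predictions = outputs.argmax(dim=1)

# Collect the names of all recorded events, including the user labels.
event_names = {event.key for event in tiny_prof.key_averages()}
print("forward_pass" in event_names, "postprocess" in event_names)
```

Splitting a workload into several labeled ranges like this makes it easier to see which stage dominates the total time in the summary table.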

In [3]:
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    with record_function("model_inference"):
        model(inputs)

In [4]:
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))

---------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
                             Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls  
---------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
                  model_inference         2.71%       5.570ms        98.72%     202.889ms     202.889ms             1  
                     aten::conv2d         0.08%     170.000us        64.62%     132.821ms       6.641ms            20  
                aten::convolution         0.09%     178.000us        64.54%     132.651ms       6.633ms            20  
               aten::_convolution         0.15%     309.000us        64.45%     132.473ms       6.624ms            20  
         aten::mkldnn_convolution        64.17%     131.884ms        64.30%     132.164ms       6.608ms            20  
                 aten::batch_norm       

To get finer granularity, we can pass `group_by_input_shape=True` to `key_averages()` (this requires running the profiler with `record_shapes=True` as well):

In [5]:
print(prof.key_averages(group_by_input_shape=True).table(sort_by="cpu_time_total", row_limit=10))

---------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  --------------------------------------------------------------------------------  
                             Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls                                                                      Input Shapes  
---------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  --------------------------------------------------------------------------------  
                  model_inference         2.71%       5.570ms        98.72%     202.889ms     202.889ms             1                                                                                []  
                     aten::conv2d         0.02%      32.000us        15.52%      31.889ms       7.972ms             4                             [[5, 64, 56, 61], [64, 64, 3, 3], [], [], [], 
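
The effect of grouping can also be inspected programmatically rather than through the printed table. The sketch below assumes a hypothetical two-convolution model (`shape_model`) whose layers receive differently shaped inputs, and compares the number of `aten::conv2d` entries with and without `group_by_input_shape=True`.

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

# Hypothetical model: two convolutions that see different input shapes.
shape_model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.MaxPool2d(2),
    nn.Conv2d(8, 8, kernel_size=3, padding=1),
)

with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as shape_prof:
    shape_model(torch.randn(2, 3, 32, 32))

# Without grouping: one aggregated entry per operator name.
by_name = [e for e in shape_prof.key_averages() if e.key == "aten::conv2d"]

# With grouping: one entry per (operator name, input shapes) combination.
by_shape = [
    e
    for e in shape_prof.key_averages(group_by_input_shape=True)
    if e.key == "aten::conv2d"
]

for e in by_shape:
    print(e.key, e.input_shapes, e.count)
```

Here the single `aten::conv2d` entry from the ungrouped view is split into one entry per distinct input shape, which is what the extra `Input Shapes` column in the table reflects.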

We can also use the profiler to analyze GPU-based performance:

In [6]:
model = models.resnet18().cuda()
inputs = torch.randn(5, 3, 224, 224).cuda()

with profile(activities=[
        ProfilerActivity.CPU, ProfilerActivity.CUDA], record_shapes=True) as prof:
    with record_function("model_inference"):
        model(inputs)

print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls  
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                        model_inference         0.66%       2.617ms        99.99%     396.822ms     396.822ms       0.000us         0.00%       7.691ms       7.691ms             1  
                                           aten::conv2d         0.03%     120.000us        96.88%     384.472ms      19.224ms       0.000us         0.00%       5.861ms     293.050us            20  
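
The cell above requires a CUDA-capable machine. A possible variation, sketched below, selects the device, the profiled activities, and the sort column depending on whether CUDA is available, so the same code also runs on a CPU-only machine. The names `small_model`, `small_inputs`, and `sort_key` are hypothetical, and the small model is used only to keep the run short.

```python
import torch
import torch.nn as nn
from torch.profiler import profile, record_function, ProfilerActivity

# Pick the device and the profiler settings based on what is available.
use_gpu = torch.cuda.is_available()
device = torch.device("cuda" if use_gpu else "cpu")
activities = [ProfilerActivity.CPU]
if use_gpu:
    activities.append(ProfilerActivity.CUDA)
sort_key = "cuda_time_total" if use_gpu else "cpu_time_total"

# Hypothetical small model used only to keep the sketch fast.
small_model = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.ReLU()).to(device)
small_inputs = torch.randn(2, 3, 32, 32, device=device)

with profile(activities=activities, record_shapes=True) as dev_prof:
    with record_function("model_inference"):
        small_model(small_inputs)

report = dev_prof.key_averages().table(sort_by=sort_key, row_limit=10)
print(report)
```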
         

## More information
More information on profiling memory, examining stack traces, and other features can be found in the [tutorial](https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html#using-profiler-to-analyze-memory-consumption).
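
As a brief sketch of the memory profiling mentioned above, the cell below enables `profile_memory=True` and sorts the summary by the memory allocated by each operator itself. The model `mem_model` and its layer sizes are hypothetical; the linked tutorial covers this topic in more detail.

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

# Hypothetical small model; the layer sizes are arbitrary.
mem_model = nn.Sequential(nn.Linear(256, 512), nn.ReLU(), nn.Linear(512, 10))
mem_inputs = torch.randn(64, 256)

with profile(
    activities=[ProfilerActivity.CPU],
    profile_memory=True,  # track tensor memory allocated and released by operators
    record_shapes=True,
) as mem_prof:
    mem_model(mem_inputs)

# Sort by the memory allocated by each operator itself, excluding child calls.
mem_report = mem_prof.key_averages().table(
    sort_by="self_cpu_memory_usage", row_limit=10
)
print(mem_report)
```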