# Profiling PyTorch models

https://pytorch.org/tutorials/beginner/profiler.html

The profiler identifies the time and memory costs of PyTorch operations. Results can be printed as a table or exported as a JSON trace.
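The two output forms can be sketched as follows. This is a minimal CPU-only example, not part of the original notebook: the tensor size and the temporary `trace.json` path are assumptions chosen for illustration. It uses `key_averages().table(...)` for the table view and `export_chrome_trace(...)` for the JSON view.

```python
import json
import os
import tempfile

import torch
import torch.autograd.profiler as profiler

x = torch.rand(64, 64)  # small CPU tensor, size chosen only for illustration

with profiler.profile() as prof:
    y = x @ x

# Table view: one row per operator, sorted by self CPU time.
table = prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=5)
print(table)

# JSON view: a Chrome trace file that can be opened in chrome://tracing.
# The path below is a hypothetical temporary location.
trace_path = os.path.join(tempfile.mkdtemp(), "trace.json")
prof.export_chrome_trace(trace_path)

# The exported file is plain JSON, so it can also be inspected programmatically.
with open(trace_path) as f:
    trace = json.load(f)
```

The table is convenient for a quick look at the most expensive operators, while the trace file is better suited to inspecting the timeline of a whole run.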

In [3]:
import torch
import numpy as np
from torch import nn
import torch.autograd.profiler as profiler

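Define a simple model. Each `profiler.record_function` block labels a region of code so that it appears as a named row in the profiler output.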

In [4]:
class MyModule(nn.Module):
  def __init__(self,
               in_features: int,
               out_features: int,
               bias: bool = True):
    super(MyModule, self).__init__()
    self.linear = nn.Linear(in_features, out_features, bias)
    
  def forward(self, inputs, mask):
    # Label the linear layer so it appears as its own row in the profile.
    with profiler.record_function("LINEAR PASS"):
      out = self.linear(inputs)
    
    # Label the masking step; .item() and .cpu() each copy data from GPU to host.
    with profiler.record_function("MASK INDICES"):
      threshold = out.sum(axis=1).mean().item()
      hi_idx = np.argwhere(mask.cpu().numpy() > threshold)
      hi_idx = torch.from_numpy(hi_idx).cuda()

    return out, hi_idx

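## Profile the forward pass

Initialize the input, mask, and model.

Warm up CUDA first so that one-time initialization costs do not skew the measurements, then run the forward pass inside the profiler context manager.

`with_stack=True` records the file and line number of each operation in the trace. `profile_memory=True` additionally reports the memory allocated by each operation.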

In [5]:
model = MyModule(100, 10).cuda()
inputs = torch.rand(32, 100).cuda()
mask = torch.rand((100, 100, 100), dtype=torch.double).cuda()

# warm up 
model(inputs, mask)

with profiler.profile(with_stack=True, profile_memory=True) as prof:
  out, idx = model(inputs, mask)
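The cell above requires a GPU. As a hedged sketch of the same warm-up-then-profile pattern, the variant below falls back to the CPU when CUDA is unavailable. The `device` selection, the explicit `torch.cuda.synchronize()` call, and the use of a plain `nn.Linear` as a stand-in for `MyModule` are assumptions made for this example, not part of the original notebook.

```python
import torch
from torch import nn
import torch.autograd.profiler as profiler

# Assumption: fall back to the CPU when no GPU is present.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(100, 10).to(device)  # stand-in for MyModule
inputs = torch.rand(32, 100, device=device)

# Warm-up: the first call pays one-off setup costs that would
# otherwise show up in the profile.
model(inputs)
if device.type == "cuda":
    torch.cuda.synchronize()  # wait for queued GPU work before profiling

with profiler.profile(with_stack=True, profile_memory=True) as prof:
    with profiler.record_function("LINEAR PASS"):
        out = model(inputs)

# Names of the aggregated events, including the custom label.
event_names = [evt.key for evt in prof.key_averages()]
```

Running the warm-up call outside the `with profiler.profile(...)` block keeps it out of the recorded events, so only the second forward pass is measured.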

## Printing profiler results

`prof.key_averages()` aggregates results by operator name and, optionally, by input shapes and/or stack traces.

Use `group_by_stack_n=5` to aggregate by operation and by the top 5 entries of its traceback.

In [6]:
print(prof.key_averages(group_by_stack_n=5).table(sort_by='self_cpu_time_total', row_limit=5))

-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ---------------------------------------------------------------------------  
                         Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg       CPU Mem  Self CPU Mem      CUDA Mem  Self CUDA Mem    # of Calls  Source Location                                                              
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ---------------------------------------------------------------------------  
                  aten::zeros        90.70%     224.923ms        90.80%     225.173ms     225.173ms           4 b           0 b           0 b           0 b             1  ..._3/lib/python3.7/site-packages/torch/autograd/profiler.py(611): __init__  
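Grouping by input shape, mentioned above as another option of `key_averages`, can be sketched in a similar way. This is an illustrative CPU-only example with made-up tensor sizes; it assumes the profiler is created with `record_shapes=True` so that shapes are available for grouping.

```python
import torch
import torch.autograd.profiler as profiler

a = torch.rand(8, 16)
b = torch.rand(16, 4)
c = torch.rand(32, 16)

# record_shapes=True stores the input shapes of each operator call.
with profiler.profile(record_shapes=True) as prof:
    torch.mm(a, b)  # inputs of shape (8, 16) and (16, 4)
    torch.mm(c, b)  # inputs of shape (32, 16) and (16, 4)

# Default grouping: both calls are merged into one aten::mm row.
by_name = prof.key_averages()

# Shape grouping: calls with different input shapes get separate rows.
by_shape = prof.key_averages(group_by_input_shape=True)
print(by_shape.table(sort_by="self_cpu_time_total", row_limit=5))

mm_by_name = [e for e in by_name if e.key == "aten::mm"]
mm_by_shape = [e for e in by_shape if e.key == "aten::mm"]
```

Shape-based grouping is useful when the same operator is called with inputs of very different sizes and the averaged row would hide which call is expensive.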
   