A visualization tool to investigate bottlenecks of the computational graph in PyTorch.
This tool embeds NVIDIA profiling data into the model's execution graph, including:
- Kernel latency (a.k.a. duration)
- Kernel memory and computation throughput usage
- Tensor shapes
The main feature of this tool is correlating the low-level kernels profiled by NVIDIA Nsight Compute (e.g. volta_sgemm_XXXX) with high-level PyTorch operations (e.g. torch.bmm).
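To illustrate the idea of correlating kernels with operations, here is a minimal sketch that groups kernel records by the NVTX range that enclosed them. The record layout (`kernel`, `nvtx_range`, `duration_us`), the helper `group_by_operation`, and all numbers are hypothetical and made up for illustration; this is not the tool's actual data format or implementation.

```python
from collections import defaultdict

# Hypothetical kernel records: kernel name, enclosing NVTX range, and
# duration in microseconds. All values are invented for illustration.
kernels = [
    {"kernel": "volta_sgemm_XXXX", "nvtx_range": "torch.bmm", "duration_us": 120.0},
    {"kernel": "elementwise_kernel", "nvtx_range": "torch.relu", "duration_us": 8.0},
    {"kernel": "volta_sgemm_XXXX", "nvtx_range": "torch.bmm", "duration_us": 118.0},
]


def group_by_operation(records):
    """Group kernel records by the NVTX range (PyTorch op) that enclosed them."""
    per_op = defaultdict(lambda: {"kernels": [], "total_us": 0.0})
    for rec in records:
        entry = per_op[rec["nvtx_range"]]
        entry["kernels"].append(rec["kernel"])
        entry["total_us"] += rec["duration_us"]
    return dict(per_op)


summary = group_by_operation(kernels)
for op, info in summary.items():
    print(f"{op}: {len(info['kernels'])} kernel(s), {info['total_us']:.1f} us")
```

Keying on the NVTX range name is what lets a low-level kernel name be attributed to a high-level operation, which is why the profiling step needs the annotations.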
torch-graph-visualizer works in 2 steps:
- profiling with NVTX annotations
- drawing
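As a rough picture of what the drawing step could produce, the sketch below renders an operation graph annotated with per-node statistics as Graphviz DOT text. The output format, the function `to_dot`, the field names `latency_us` and `shape`, and the sample values are all assumptions made for illustration; the source does not state how the tool actually draws.

```python
def to_dot(nodes, edges):
    """Render an annotated graph as Graphviz DOT text.

    nodes: dict mapping an op name to its (hypothetical) profiling stats.
    edges: list of (source_op, destination_op) pairs.
    """
    lines = ["digraph model {", "  node [shape=box];"]
    for name, stats in nodes.items():
        # "\\n" emits a literal backslash-n, which DOT treats as a line break.
        label = f"{name}\\n{stats['latency_us']:.1f} us\\nshape={stats['shape']}"
        lines.append(f'  "{name}" [label="{label}"];')
    for src, dst in edges:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


# Invented example values, not real measurements.
nodes = {
    "torch.bmm": {"latency_us": 238.0, "shape": (8, 64, 64)},
    "torch.relu": {"latency_us": 8.0, "shape": (8, 64, 64)},
}
edges = [("torch.bmm", "torch.relu")]

dot = to_dot(nodes, edges)
print(dot)
```

The resulting text can be saved to a file and rendered with any Graphviz-compatible viewer.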
The tool assumes that both steps operate on the same model, with the same optimizations applied, so the drawing step must be given the same (optimized) model that was profiled. To make this easier, the tool provides basic boilerplate for using PyTorch JIT options:
- torch_graph_visualizer.run_model: uses NVTX to annotate and run the model
- torch_graph_visualizer.default_draw_model: parses the profiling data, assuming a set of NVTX annotations
See this file for an example using VGG.