Description
This is a bizarre issue.
The observed behavior is that our pytest benchmark using torch.profiler.profile seems to non-deterministically drop events across consecutive benchmark runs.
In PR branch #3743, running the backward benchmarks as a whole generates numbers like this (on H100):
running NVFUSER_DISABLE=kernel_reuse pytest --benchmark-thunder test_rope.py -k bwd
Name (time in us) Mean Median
-----------------------------------------------------------------------------------------------------------------------
test_rope_bwd_benchmark[executor='thunder'-variation='llama_2_7b_hf_rope'] 871.5996 (5.23) 871.6050 (5.24)
test_rope_bwd_benchmark[executor='thunder'-variation='llama_3_8B_rope'] 1,443.0095 (8.66) 1,442.9955 (8.67)
test_rope_bwd_benchmark[executor='thunder'-variation='hf_mistral_nemo_rope'] 166.5515 (1.0) 166.4480 (1.0)
test_rope_bwd_benchmark[executor='thunder'-variation='hf_qwen2_rope'] 386.4463 (2.32) 386.5565 (2.32)
test_rope_bwd_benchmark[executor='thunder'-variation='hf_phi3_rope'] 452.3351 (2.72) 452.0685 (2.72)
-----------------------------------------------------------------------------------------------------------------------
In that example, if we comment out the other variants inside benchmarks/python/test_rope.py and run only hf_phi3, for example, we get numbers like:
test_rope_bwd_benchmark[executor='thunder'-variation='hf_phi3_rope'] 514.7512 514.4900
Further debugging narrowed it down to here:
Fuser/benchmarks/python/core.py, lines 156 to 157 at 8ea30c7:

prof_averages = self.prof.key_averages()
elapsed_cuda_time = self._get_kernel_time(prof_averages)
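To make the role of that measurement concrete: key_averages() aggregates the recorded profiler events, and the kernel time is then summed over the events that ran on the GPU. A minimal, self-contained sketch of that aggregation, using a hypothetical EventAvg dataclass standing in for torch's FunctionEventAvg entries (the real _get_kernel_time may filter and sum differently):

```python
from dataclasses import dataclass

# Hypothetical stand-in for torch.autograd.profiler's FunctionEventAvg;
# the real objects carry many more fields.
@dataclass
class EventAvg:
    key: str                   # event name, e.g. a kernel symbol
    device_time_total: float   # accumulated device (CUDA) time in us
    is_device_event: bool      # stand-in for "this event ran on the GPU"

def get_kernel_time(prof_averages):
    """Sum device time over GPU events (sketch of what a helper like
    _get_kernel_time would do with the key_averages() output)."""
    return sum(e.device_time_total for e in prof_averages if e.is_device_event)

averages = [
    EventAvg("nvfuser_kernel_0", 120.0, True),
    EventAvg("nvfuser_kernel_1", 80.5, True),
    EventAvg("aten::copy_", 15.0, False),   # host-side event, excluded
]
print(get_kernel_time(averages))  # 200.5
```

If one kernel's events are dropped by the profiler, its device_time_total never enters this sum, which is consistent with the lower numbers seen in the combined run.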
We noticed that the benchmark discrepancy comes from CUDA events being dropped in consecutive runs: when 6 kernels run in the backward pass, only 5 of them are recorded by the profiler.
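One way to confirm that events are being dropped (rather than kernels not launching) is to compare the recorded kernel names and their counts across repeated profiled runs. A hedged sketch of that check in plain Python; in the real benchmark the per-run name lists would come from prof.key_averages(), and the kernel names below are illustrative:

```python
from collections import Counter

def diff_runs(run_a, run_b):
    """Return kernel names whose recorded event counts differ between two
    profiled runs (each run is a list of recorded kernel names)."""
    ca, cb = Counter(run_a), Counter(run_b)
    return {name: (ca[name], cb[name])
            for name in ca.keys() | cb.keys()
            if ca[name] != cb[name]}

# Illustrative data: 6 kernels launch each run, but the second run's
# profile only records 5 of them.
run_a = [f"kernel_{i}" for i in range(6)]
run_b = [f"kernel_{i}" for i in range(6) if i != 3]
print(diff_runs(run_a, run_b))  # {'kernel_3': (1, 0)}
```

A non-empty diff across back-to-back runs of the same workload would point at the profiler dropping events rather than at a genuine change in the launched kernels.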