pytest benchmark reporting incorrect benchmark time #3753

@jjsjann123

This is a bizarre issue.

The observed behavior is that our pytest benchmark, which uses torch.profiler.profile, seems to non-deterministically drop events in consecutive benchmark runs.

On PR branch #3743, running the backward benchmarks as a whole on an H100 with

NVFUSER_DISABLE=kernel_reuse pytest --benchmark-thunder test_rope.py -k bwd

generates numbers like this:

Name (time in us)                                                                    Mean                Median
-----------------------------------------------------------------------------------------------------------------------
test_rope_bwd_benchmark[executor='thunder'-variation='llama_2_7b_hf_rope']       871.5996 (5.23)       871.6050 (5.24)
test_rope_bwd_benchmark[executor='thunder'-variation='llama_3_8B_rope']        1,443.0095 (8.66)     1,442.9955 (8.67)
test_rope_bwd_benchmark[executor='thunder'-variation='hf_mistral_nemo_rope']     166.5515 (1.0)        166.4480 (1.0)
test_rope_bwd_benchmark[executor='thunder'-variation='hf_qwen2_rope']            386.4463 (2.32)       386.5565 (2.32)
test_rope_bwd_benchmark[executor='thunder'-variation='hf_phi3_rope']             452.3351 (2.72)       452.0685 (2.72)
-----------------------------------------------------------------------------------------------------------------------

In that example, if we comment out the other variants inside benchmarks/python/test_rope.py and run only one of them, hf_phi3 for example, we get numbers like

test_rope_bwd_benchmark[executor='thunder'-variation='hf_phi3_rope']           514.7512  514.4900
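To put a number on the gap, here is a quick sketch of the arithmetic. It only uses the two hf_phi3_rope values quoted above, and it assumes the first number in each row is the mean. The variable names are made up for illustration.

```python
# Mean time (us) for hf_phi3_rope, copied from the two runs quoted above.
full_run_us = 452.3351   # reported when all bwd variants run together
isolated_us = 514.7512   # reported when only hf_phi3 is run

# Absolute gap, and the gap as a fraction of the isolated measurement.
gap_us = isolated_us - full_run_us
gap_fraction = gap_us / isolated_us

print(f"gap: {gap_us:.4f} us ({gap_fraction:.1%} of the isolated time)")
```

So the full run reports roughly 62 us less for the same benchmark, about 12% of the isolated time.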

Further debugging led here:

prof_averages = self.prof.key_averages()
elapsed_cuda_time = self._get_kernel_time(prof_averages)

I noticed that the benchmark discrepancy comes from CUDA events being dropped in consecutive runs,
i.e. when we have 6 kernels running in the backward path, only 5 of them are recorded by the profiler.
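A minimal sketch of how one might flag such a run, assuming the expected kernel names are known up front and the recorded names have already been pulled out of the profiler output (that extraction is not shown here). The helper `find_dropped_kernels` and the kernel names are hypothetical and not part of the benchmark code.

```python
from collections import Counter


def find_dropped_kernels(expected_kernels, recorded_kernels):
    """Return the kernel names that were expected but not recorded.

    Both arguments are lists of kernel names. A name that is launched
    several times appears several times, so multiplicity is respected.
    """
    missing = Counter(expected_kernels) - Counter(recorded_kernels)
    return sorted(missing.elements())


# Hypothetical names standing in for the 6 kernels of the backward path.
expected = [f"nvfuser_kernel_{i}" for i in range(6)]

# Simulated run in which the profiler recorded only 5 of them.
recorded = [name for name in expected if name != "nvfuser_kernel_3"]

dropped = find_dropped_kernels(expected, recorded)
if dropped:
    print(f"recorded {len(recorded)} of {len(expected)} kernels, missing: {dropped}")
```

A check like this, run per benchmark round, would turn a silently low timing into an explicit report of which kernel went missing.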
