feat: update to triton 2.0 backend #307

Merged (14 commits) on Mar 15, 2023

pommedeterresautee (Member) commented on Mar 4, 2023

Triton 2.0 requires some changes to the attention kernel.
This first PR only aims to make things work; a second PR should focus on execution time optimization.
In addition, an issue has been opened at triton-lang/triton#1273.
It follows from our findings in this PR: Triton is too sensitive to small code changes that have no semantic impact.

A number of lines of code are intentionally kept as comments, to make it easier to reintroduce some optimizations in the future.

pommedeterresautee self-assigned this on Mar 4, 2023
pommedeterresautee added the "dependencies" label (pull requests that update a dependency file) on Mar 4, 2023
github-actions (bot) added and removed "feature" labels on Mar 4, 2023
pommedeterresautee (Member, Author) commented on Mar 4, 2023

The e2e tests pass:

❯ pytest test/test_torchdynamo.py
==================================================================================================== test session starts =====================================================================================================
platform linux -- Python 3.9.16, pytest-7.2.2, pluggy-1.0.0
rootdir: /mnt/workspace/kernl
plugins: anyio-3.6.2
collected 270 items                                                                                                                                                                                                          

test/test_torchdynamo.py ............................................................................................................................................................................................. [ 70%]
.................................................................................                                                                                                                                      [100%]
test/test_torchdynamo.py::test_benchmark_implementations[baseline-16x128-bert-base-uncased]
Name                                                               Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)    Median        Mean           Min            Max
-----------------------------------------------------------------  ---------------  -------------  -------------  ------------  ------------  -------------  -------------  -------------
test_benchmark_implementations[baseline-16x128-bert-base-uncased]  13.0458 (1.0)    13.0466 (1.0)  13.0386 (1.0)  13.055 (1.0)  12.331 (1.0)  12.4391 (1.0)  12.1413 (1.0)  12.9633 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-16x128-t5-small]
Name                                                      Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
--------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[baseline-16x128-t5-small]  13.2157 (1.0)    13.2647 (1.0)  13.0824 (1.0)  13.6212 (1.0)  13.5948 (1.0)  13.8216 (1.0)  13.5034 (1.0)  15.0793 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-16x16-bert-base-uncased]
Name                                                              Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean           Min           Max
----------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  -------------  ------------  -------------
test_benchmark_implementations[baseline-16x16-bert-base-uncased]  9.557 (1.0)      9.5833 (1.0)   9.4814 (1.0)  9.7516 (1.0)  9.9677 (1.0)  10.0233 (1.0)  9.8495 (1.0)  10.5679 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-16x16-t5-small]
Name                                                     Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[baseline-16x16-t5-small]  13.9481 (1.0)    13.9915 (1.0)  13.3161 (1.0)  14.7417 (1.0)  14.1747 (1.0)  14.4041 (1.0)  13.8712 (1.0)  14.9938 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-16x256-bert-base-uncased]
Name                                                               Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-----------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[baseline-16x256-bert-base-uncased]  22.5712 (1.0)    22.8765 (1.0)  22.5464 (1.0)  23.4242 (1.0)  22.7896 (1.0)  23.0505 (1.0)  22.7823 (1.0)  23.8115 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-16x256-t5-small]
Name                                                      Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
--------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[baseline-16x256-t5-small]  22.2698 (1.0)    22.625 (1.0)   22.2607 (1.0)  23.2028 (1.0)  22.6186 (1.0)  22.6694 (1.0)  22.5213 (1.0)  22.8926 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-16x32-bert-base-uncased]
Name                                                              Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
----------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  -------------
test_benchmark_implementations[baseline-16x32-bert-base-uncased]  8.7081 (1.0)     8.7476 (1.0)   8.5883 (1.0)  8.9961 (1.0)  9.1279 (1.0)  9.2731 (1.0)  9.0018 (1.0)  10.2293 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-16x32-t5-small]
Name                                                     Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[baseline-16x32-t5-small]  13.5926 (1.0)    13.8477 (1.0)  13.5025 (1.0)  14.3749 (1.0)  14.0607 (1.0)  14.3911 (1.0)  13.5767 (1.0)  15.8427 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-16x33-bert-base-uncased]
Name                                                              Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
----------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  -------------
test_benchmark_implementations[baseline-16x33-bert-base-uncased]  9.2006 (1.0)     9.1891 (1.0)   8.9303 (1.0)  9.5631 (1.0)  9.2021 (1.0)  9.3719 (1.0)  9.0037 (1.0)  10.4695 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-16x33-t5-small]
Name                                                     Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[baseline-16x33-t5-small]  14.4865 (1.0)    14.9282 (1.0)  14.3585 (1.0)  16.7721 (1.0)  13.8655 (1.0)  14.0286 (1.0)  13.5898 (1.0)  15.2448 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-16x384-bert-base-uncased]
Name                                                               Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-----------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[baseline-16x384-bert-base-uncased]  37.0616 (1.0)    37.0831 (1.0)  37.0616 (1.0)  37.1046 (1.0)  37.3064 (1.0)  37.7519 (1.0)  37.3064 (1.0)  38.1974 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-16x384-t5-small]
Name                                                      Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
--------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[baseline-16x384-t5-small]  37.2429 (1.0)    37.2818 (1.0)  37.2429 (1.0)  37.3207 (1.0)  37.6701 (1.0)  39.1395 (1.0)  37.6701 (1.0)  40.6089 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-16x512-bert-base-uncased]
Name                                                               Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-----------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[baseline-16x512-bert-base-uncased]  54.1819 (1.0)    54.1819 (1.0)  54.1819 (1.0)  54.1819 (1.0)  56.9548 (1.0)  56.9548 (1.0)  56.9548 (1.0)  56.9548 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-16x512-t5-small]
Name                                                      Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
--------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[baseline-16x512-t5-small]  61.6796 (1.0)    61.6796 (1.0)  61.6796 (1.0)  61.6796 (1.0)  67.9079 (1.0)  67.9079 (1.0)  67.9079 (1.0)  67.9079 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-1x128-bert-base-uncased]
Name                                                              Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)     Median        Mean          Min           Max
----------------------------------------------------------------  ---------------  -------------  ------------  -------------  ------------  ------------  ------------  -------------
test_benchmark_implementations[baseline-1x128-bert-base-uncased]  9.773 (1.0)      10.4093 (1.0)  9.3677 (1.0)  12.8543 (1.0)  9.5273 (1.0)  9.6261 (1.0)  9.1325 (1.0)  10.1429 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-1x128-t5-small]
Name                                                     Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[baseline-1x128-t5-small]  14.7111 (1.0)    15.0332 (1.0)  13.6008 (1.0)  16.8233 (1.0)  14.6558 (1.0)  14.8295 (1.0)  14.5679 (1.0)  15.6316 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-1x16-bert-base-uncased]
Name                                                             Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
---------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[baseline-1x16-bert-base-uncased]  9.0554 (1.0)     9.0842 (1.0)   8.4582 (1.0)  9.7843 (1.0)  8.8758 (1.0)  9.0132 (1.0)  8.7556 (1.0)  9.6574 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-1x16-t5-small]
Name                                                    Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median       Mean          Min            Max
------------------------------------------------------  ---------------  -------------  -------------  -------------  -----------  ------------  -------------  ------------
test_benchmark_implementations[baseline-1x16-t5-small]  13.1092 (1.0)    13.05 (1.0)    12.3474 (1.0)  13.4801 (1.0)  13.68 (1.0)  13.601 (1.0)  13.0328 (1.0)  14.196 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-1x256-bert-base-uncased]
Name                                                              Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
----------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[baseline-1x256-bert-base-uncased]  8.6497 (1.0)     8.7165 (1.0)   8.3569 (1.0)  9.0163 (1.0)  9.0662 (1.0)  9.1143 (1.0)  8.8639 (1.0)  9.4783 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-1x256-t5-small]
Name                                                     Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[baseline-1x256-t5-small]  13.733 (1.0)     13.584 (1.0)   12.7846 (1.0)  13.9725 (1.0)  13.3921 (1.0)  13.4996 (1.0)  13.2098 (1.0)  14.2791 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-1x32-bert-base-uncased]
Name                                                             Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median       Mean          Min           Max
---------------------------------------------------------------  ---------------  -------------  ------------  ------------  -----------  ------------  ------------  ------------
test_benchmark_implementations[baseline-1x32-bert-base-uncased]  8.6172 (1.0)     8.7627 (1.0)   8.4612 (1.0)  9.2692 (1.0)  9.003 (1.0)  9.0773 (1.0)  8.7968 (1.0)  9.8768 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-1x32-t5-small]
Name                                                    Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[baseline-1x32-t5-small]  13.0859 (1.0)    13.1182 (1.0)  12.9413 (1.0)  13.3417 (1.0)  13.7977 (1.0)  13.6303 (1.0)  12.8526 (1.0)  14.2483 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-1x33-bert-base-uncased]
Name                                                             Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
---------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  -----------
test_benchmark_implementations[baseline-1x33-bert-base-uncased]  8.9631 (1.0)     8.9961 (1.0)   8.6508 (1.0)  9.3368 (1.0)  9.0925 (1.0)  9.2992 (1.0)  8.8863 (1.0)  9.804 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-1x33-t5-small]
Name                                                    Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[baseline-1x33-t5-small]  14.4927 (1.0)    14.6086 (1.0)  14.3768 (1.0)  14.8603 (1.0)  14.9939 (1.0)  15.8995 (1.0)  14.3601 (1.0)  18.5495 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-1x384-bert-base-uncased]
Name                                                              Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
----------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ----------
test_benchmark_implementations[baseline-1x384-bert-base-uncased]  8.7114 (1.0)     8.8297 (1.0)   8.5914 (1.0)  9.3972 (1.0)  9.0645 (1.0)  9.1505 (1.0)  8.8106 (1.0)  9.97 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-1x384-t5-small]
Name                                                     Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[baseline-1x384-t5-small]  14.252 (1.0)     14.2193 (1.0)  12.6566 (1.0)  16.0461 (1.0)  13.3462 (1.0)  13.6292 (1.0)  12.9942 (1.0)  14.5193 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-1x512-bert-base-uncased]
Name                                                              Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
----------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[baseline-1x512-bert-base-uncased]  8.49 (1.0)       8.6102 (1.0)   8.3323 (1.0)  9.3704 (1.0)  8.7184 (1.0)  8.8677 (1.0)  8.6373 (1.0)  9.9458 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-1x512-t5-small]
Name                                                     Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)    Median         Mean           Min            Max
-------------------------------------------------------  ---------------  -------------  -------------  ------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[baseline-1x512-t5-small]  13.7134 (1.0)    13.7128 (1.0)  12.7088 (1.0)  14.38 (1.0)   14.0017 (1.0)  14.0987 (1.0)  13.3253 (1.0)  15.7377 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-32x128-bert-base-uncased]
Name                                                               Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-----------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[baseline-32x128-bert-base-uncased]  20.2496 (1.0)    20.3011 (1.0)  20.0847 (1.0)  20.4401 (1.0)  19.8291 (1.0)  20.0253 (1.0)  19.7915 (1.0)  20.2759 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-32x128-t5-small]
Name                                                      Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
--------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[baseline-32x128-t5-small]  18.9778 (1.0)    19.0581 (1.0)  18.9614 (1.0)  19.3884 (1.0)  19.7272 (1.0)  20.0157 (1.0)  19.1129 (1.0)  21.0563 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-32x16-bert-base-uncased]
Name                                                              Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
----------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[baseline-32x16-bert-base-uncased]  9.4728 (1.0)     9.4582 (1.0)   8.8045 (1.0)  10.026 (1.0)  9.7987 (1.0)  9.9183 (1.0)  8.9803 (1.0)  12.393 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-32x16-t5-small]
Name                                                     Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[baseline-32x16-t5-small]  13.3152 (1.0)    13.3717 (1.0)  13.1338 (1.0)  13.9313 (1.0)  13.6671 (1.0)  14.0484 (1.0)  13.4868 (1.0)  15.2404 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-32x256-bert-base-uncased]
Name                                                               Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)     Median         Mean           Min            Max
-----------------------------------------------------------------  ---------------  -------------  ------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[baseline-32x256-bert-base-uncased]  44.201 (1.0)     44.9367 (1.0)  44.201 (1.0)  45.6724 (1.0)  44.6709 (1.0)  45.3158 (1.0)  44.6709 (1.0)  45.9606 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-32x256-t5-small]
Name                                                      Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean         Min            Max
--------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -----------  -------------  -------------
test_benchmark_implementations[baseline-32x256-t5-small]  41.7872 (1.0)    41.7894 (1.0)  41.7872 (1.0)  41.7915 (1.0)  42.2025 (1.0)  42.72 (1.0)  42.2025 (1.0)  43.2375 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-32x32-bert-base-uncased]
Name                                                              Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
----------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  -------------
test_benchmark_implementations[baseline-32x32-bert-base-uncased]  8.8842 (1.0)     8.9151 (1.0)   8.6405 (1.0)  9.2467 (1.0)  9.4406 (1.0)  9.5505 (1.0)  9.2105 (1.0)  10.1926 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-32x32-t5-small]
Name                                                     Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)     Median         Mean           Min            Max
-------------------------------------------------------  ---------------  -------------  ------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[baseline-32x32-t5-small]  12.9311 (1.0)    13.2227 (1.0)  12.673 (1.0)  14.4117 (1.0)  13.2069 (1.0)  13.6548 (1.0)  13.1213 (1.0)  15.5567 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-32x33-bert-base-uncased]
Name                                                              Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
----------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[baseline-32x33-bert-base-uncased]  8.746 (1.0)      8.8255 (1.0)   8.5299 (1.0)  9.3266 (1.0)  9.2535 (1.0)  9.2978 (1.0)  8.9536 (1.0)  9.8502 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-32x33-t5-small]
Name                                                     Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min          Max
-------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -----------  -------------
test_benchmark_implementations[baseline-32x33-t5-small]  14.1463 (1.0)    14.8623 (1.0)  14.0841 (1.0)  18.9422 (1.0)  13.1707 (1.0)  13.2962 (1.0)  13.06 (1.0)  14.1388 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-8x128-bert-base-uncased]
Name                                                              Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
----------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  -----------
test_benchmark_implementations[baseline-8x128-bert-base-uncased]  8.8361 (1.0)     8.9329 (1.0)   8.4531 (1.0)  9.6156 (1.0)  9.1264 (1.0)  9.1896 (1.0)  8.8243 (1.0)  9.753 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-8x128-t5-small]
Name                                                     Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[baseline-8x128-t5-small]  13.0314 (1.0)    13.2814 (1.0)  12.8278 (1.0)  14.6964 (1.0)  13.7784 (1.0)  14.2103 (1.0)  13.6838 (1.0)  16.2541 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-8x16-bert-base-uncased]
Name                                                             Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
---------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[baseline-8x16-bert-base-uncased]  9.0061 (1.0)     9.1769 (1.0)   8.7818 (1.0)  9.5304 (1.0)  9.5342 (1.0)  9.6431 (1.0)  9.2515 (1.0)  10.163 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-8x16-t5-small]
Name                                                    Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[baseline-8x16-t5-small]  14.4589 (1.0)    14.5857 (1.0)  14.2254 (1.0)  14.9617 (1.0)  14.0895 (1.0)  14.3263 (1.0)  14.0492 (1.0)  15.3025 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-8x256-bert-base-uncased]
Name                                                              Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)    Median         Mean           Min            Max
----------------------------------------------------------------  ---------------  -------------  -------------  ------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[baseline-8x256-bert-base-uncased]  13.4728 (1.0)    13.5218 (1.0)  13.3847 (1.0)  13.739 (1.0)  13.5372 (1.0)  13.6341 (1.0)  13.3639 (1.0)  14.1191 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-8x256-t5-small]
Name                                                     Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[baseline-8x256-t5-small]  13.7586 (1.0)    13.8865 (1.0)  13.3233 (1.0)  14.7854 (1.0)  14.0308 (1.0)  14.3584 (1.0)  13.6141 (1.0)  16.0368 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-8x32-bert-base-uncased]
Name                                                             Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median        Mean          Min           Max
---------------------------------------------------------------  ---------------  -------------  -------------  -------------  ------------  ------------  ------------  -------------
test_benchmark_implementations[baseline-8x32-bert-base-uncased]  10.8383 (1.0)    11.354 (1.0)   10.0897 (1.0)  15.1972 (1.0)  9.9455 (1.0)  9.9997 (1.0)  9.8209 (1.0)  10.4249 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-8x32-t5-small]
Name                                                    Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[baseline-8x32-t5-small]  13.9112 (1.0)    14.1325 (1.0)  13.8097 (1.0)  14.4794 (1.0)  14.7036 (1.0)  14.8959 (1.0)  14.4515 (1.0)  15.7845 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-8x33-bert-base-uncased]
Name                                                             Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
---------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  -------------
test_benchmark_implementations[baseline-8x33-bert-base-uncased]  9.182 (1.0)      9.1781 (1.0)   8.79 (1.0)    9.5581 (1.0)  9.1942 (1.0)  9.3916 (1.0)  8.9789 (1.0)  10.2219 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-8x33-t5-small]
Name                                                    Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min           Max
------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  ------------  -------------
test_benchmark_implementations[baseline-8x33-t5-small]  13.7769 (1.0)    13.9735 (1.0)  13.5373 (1.0)  14.6678 (1.0)  13.7219 (1.0)  13.9506 (1.0)  13.666 (1.0)  15.1304 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-8x384-bert-base-uncased]
Name                                                              Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)    Median         Mean           Min           Max
----------------------------------------------------------------  ---------------  -------------  -------------  ------------  -------------  -------------  ------------  -------------
test_benchmark_implementations[baseline-8x384-bert-base-uncased]  19.3595 (1.0)    19.6332 (1.0)  19.2656 (1.0)  19.96 (1.0)   19.3213 (1.0)  19.2561 (1.0)  18.976 (1.0)  19.3873 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-8x384-t5-small]
Name                                                     Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median        Mean           Min            Max
-------------------------------------------------------  ---------------  -------------  -------------  -------------  ------------  -------------  -------------  -------------
test_benchmark_implementations[baseline-8x384-t5-small]  20.5619 (1.0)    20.5862 (1.0)  20.5558 (1.0)  20.6377 (1.0)  20.937 (1.0)  21.0072 (1.0)  20.9004 (1.0)  21.1976 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-8x512-bert-base-uncased]
Name                                                              Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean          Min            Max
----------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  ------------  -------------  -------------
test_benchmark_implementations[baseline-8x512-bert-base-uncased]  28.0556 (1.0)    28.1975 (1.0)  27.8077 (1.0)  28.7292 (1.0)  27.9692 (1.0)  28.031 (1.0)  27.4732 (1.0)  28.6507 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[baseline-8x512-t5-small]
Name                                                     Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[baseline-8x512-t5-small]  32.1751 (1.0)    32.2171 (1.0)  32.1751 (1.0)  32.2591 (1.0)  33.0635 (1.0)  33.2596 (1.0)  33.0635 (1.0)  33.4557 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-16x128-bert-base-uncased]
Name                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
---------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_cuda_graphs-16x128-bert-base-uncased]  11.7944 (1.0)    11.8386 (1.0)  11.7432 (1.0)  12.1969 (1.0)  12.9573 (1.0)  12.9215 (1.0)  12.8323 (1.0)  12.9965 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-16x128-t5-small]
Name                                                                Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)     Median         Mean           Min            Max
------------------------------------------------------------------  ---------------  -------------  ------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_cuda_graphs-16x128-t5-small]  9.8273 (1.0)     9.9659 (1.0)   9.3809 (1.0)  11.5098 (1.0)  10.7282 (1.0)  10.8181 (1.0)  10.1671 (1.0)  11.5787 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-16x16-bert-base-uncased]
Name                                                                        Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
--------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_cuda_graphs-16x16-bert-base-uncased]  2.1668 (1.0)     2.211 (1.0)    2.1637 (1.0)  2.7955 (1.0)  3.2696 (1.0)  3.2992 (1.0)  3.1677 (1.0)  3.5459 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-16x16-t5-small]
Name                                                               Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
-----------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_cuda_graphs-16x16-t5-small]  2.3849 (1.0)     2.5092 (1.0)   2.0828 (1.0)  3.1017 (1.0)  3.6655 (1.0)  3.6982 (1.0)  3.5985 (1.0)  4.0614 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-16x256-bert-base-uncased]
Name                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)     Median         Mean           Min            Max
---------------------------------------------------------------------------  ---------------  -------------  ------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_cuda_graphs-16x256-bert-base-uncased]  22.4082 (1.0)    22.4734 (1.0)  22.402 (1.0)  22.6743 (1.0)  23.2754 (1.0)  23.3694 (1.0)  22.7829 (1.0)  23.7874 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-16x256-t5-small]
Name                                                                Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_cuda_graphs-16x256-t5-small]  21.7098 (1.0)    21.7211 (1.0)  21.6709 (1.0)  21.7825 (1.0)  23.5671 (1.0)  23.6356 (1.0)  22.6754 (1.0)  24.6643 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-16x32-bert-base-uncased]
Name                                                                        Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min          Max
--------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  -----------  -----------
test_benchmark_implementations[dynamo_cuda_graphs-16x32-bert-base-uncased]  3.7857 (1.0)     3.8212 (1.0)   3.4785 (1.0)  4.6776 (1.0)  4.4412 (1.0)  4.5418 (1.0)  4.326 (1.0)  5.307 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-16x32-t5-small]
Name                                                               Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
-----------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_cuda_graphs-16x32-t5-small]  3.0444 (1.0)     3.1022 (1.0)   3.0392 (1.0)  3.7243 (1.0)  4.2531 (1.0)  4.2625 (1.0)  3.9191 (1.0)  4.9246 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-16x33-bert-base-uncased]
Name                                                                        Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median       Mean          Min           Max
--------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  -----------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_cuda_graphs-16x33-bert-base-uncased]  3.6762 (1.0)     3.6646 (1.0)   3.369 (1.0)   3.6792 (1.0)  4.518 (1.0)  4.5788 (1.0)  4.5038 (1.0)  5.0986 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-16x33-t5-small]
Name                                                               Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
-----------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_cuda_graphs-16x33-t5-small]  3.5092 (1.0)     3.5624 (1.0)   3.1969 (1.0)  4.353 (1.0)   4.0736 (1.0)  4.0918 (1.0)  4.0241 (1.0)  4.3901 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-16x384-bert-base-uncased]
Name                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
---------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_cuda_graphs-16x384-bert-base-uncased]  40.0026 (1.0)    40.0245 (1.0)  40.0026 (1.0)  40.0465 (1.0)  40.0112 (1.0)  40.7506 (1.0)  40.0112 (1.0)  41.4901 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-16x384-t5-small]
Name                                                                Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean          Min            Max
------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  ------------  -------------  -------------
test_benchmark_implementations[dynamo_cuda_graphs-16x384-t5-small]  36.8435 (1.0)    36.8471 (1.0)  36.8435 (1.0)  36.8507 (1.0)  36.9535 (1.0)  37.415 (1.0)  36.9535 (1.0)  37.8766 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-16x512-bert-base-uncased]
Name                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
---------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_cuda_graphs-16x512-bert-base-uncased]  54.1993 (1.0)    54.1993 (1.0)  54.1993 (1.0)  54.1993 (1.0)  58.8893 (1.0)  58.8893 (1.0)  58.8893 (1.0)  58.8893 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-16x512-t5-small]
Name                                                                Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_cuda_graphs-16x512-t5-small]  66.1453 (1.0)    66.1453 (1.0)  66.1453 (1.0)  66.1453 (1.0)  67.4398 (1.0)  67.4398 (1.0)  67.4398 (1.0)  67.4398 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-1x128-bert-base-uncased]
Name                                                                        Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
--------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_cuda_graphs-1x128-bert-base-uncased]  1.7766 (1.0)     1.7771 (1.0)   1.7746 (1.0)  1.7818 (1.0)  2.9845 (1.0)  2.9911 (1.0)  2.9376 (1.0)  3.2232 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-1x128-t5-small]
Name                                                               Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
-----------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_cuda_graphs-1x128-t5-small]  2.3624 (1.0)     2.371 (1.0)    2.0746 (1.0)  2.6276 (1.0)  3.5201 (1.0)  3.5818 (1.0)  3.4362 (1.0)  4.8786 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-1x16-bert-base-uncased]
Name                                                                       Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
-------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_cuda_graphs-1x16-bert-base-uncased]  1.1233 (1.0)     1.1366 (1.0)   1.0025 (1.0)  1.6036 (1.0)  2.3768 (1.0)  2.3687 (1.0)  2.2759 (1.0)  2.6129 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-1x16-t5-small]
Name                                                              Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
----------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_cuda_graphs-1x16-t5-small]  2.263 (1.0)      2.2941 (1.0)   2.2159 (1.0)  2.4627 (1.0)  3.2469 (1.0)  3.3204 (1.0)  3.1871 (1.0)  4.3255 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-1x256-bert-base-uncased]
Name                                                                        Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
--------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_cuda_graphs-1x256-bert-base-uncased]  2.2794 (1.0)     2.2799 (1.0)   2.2774 (1.0)  2.2835 (1.0)  3.4291 (1.0)  3.4727 (1.0)  3.2924 (1.0)  3.9123 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-1x256-t5-small]
Name                                                               Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
-----------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_cuda_graphs-1x256-t5-small]  2.5805 (1.0)     2.6245 (1.0)   2.5774 (1.0)  4.0663 (1.0)  4.0455 (1.0)  4.1624 (1.0)  3.9139 (1.0)  5.0839 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-1x32-bert-base-uncased]
Name                                                                       Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
-------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_cuda_graphs-1x32-bert-base-uncased]  1.2698 (1.0)     1.3018 (1.0)   1.2677 (1.0)  1.7275 (1.0)  2.5645 (1.0)  2.5712 (1.0)  2.5045 (1.0)  2.9123 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-1x32-t5-small]
Name                                                              Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
----------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_cuda_graphs-1x32-t5-small]  2.2344 (1.0)     2.2804 (1.0)   2.2098 (1.0)  2.4566 (1.0)  3.2721 (1.0)  3.2938 (1.0)  3.1985 (1.0)  3.6614 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-1x33-bert-base-uncased]
Name                                                                       Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
-------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_cuda_graphs-1x33-bert-base-uncased]  1.3097 (1.0)     1.2893 (1.0)   1.1561 (1.0)  1.5985 (1.0)  2.4837 (1.0)  2.5032 (1.0)  2.4304 (1.0)  2.9725 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-1x33-t5-small]
Name                                                              Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
----------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_cuda_graphs-1x33-t5-small]  2.2743 (1.0)     2.3144 (1.0)   2.2139 (1.0)  2.8662 (1.0)  3.3531 (1.0)  3.3825 (1.0)  3.3379 (1.0)  3.7907 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-1x384-bert-base-uncased]
Name                                                                        Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
--------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_cuda_graphs-1x384-bert-base-uncased]  3.1048 (1.0)     3.3712 (1.0)   3.0976 (1.0)  4.1533 (1.0)  4.2385 (1.0)  4.2818 (1.0)  4.0632 (1.0)  4.7228 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-1x384-t5-small]
Name                                                               Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
-----------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_cuda_graphs-1x384-t5-small]  3.3485 (1.0)     3.349 (1.0)    3.3464 (1.0)  3.3526 (1.0)  4.3072 (1.0)  4.3357 (1.0)  4.1253 (1.0)  4.9517 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-1x512-bert-base-uncased]
Name                                                                        Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
--------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_cuda_graphs-1x512-bert-base-uncased]  4.9142 (1.0)     4.9736 (1.0)   4.3807 (1.0)  5.7528 (1.0)  5.6826 (1.0)  5.7722 (1.0)  5.6466 (1.0)  6.6524 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-1x512-t5-small]
Name                                                               Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean         Min           Max
-----------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  -----------  ------------  ------------
test_benchmark_implementations[dynamo_cuda_graphs-1x512-t5-small]  4.8568 (1.0)     4.7617 (1.0)   4.5834 (1.0)  4.8773 (1.0)  5.5254 (1.0)  5.549 (1.0)  5.4646 (1.0)  5.8179 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-32x128-bert-base-uncased]
Name                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
---------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_cuda_graphs-32x128-bert-base-uncased]  19.6987 (1.0)    19.7018 (1.0)  19.6844 (1.0)  19.7222 (1.0)  21.1323 (1.0)  21.0646 (1.0)  20.1643 (1.0)  21.7468 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-32x128-t5-small]
Name                                                                Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_cuda_graphs-32x128-t5-small]  18.388 (1.0)     18.4146 (1.0)  18.3675 (1.0)  18.4832 (1.0)  19.1271 (1.0)  19.1278 (1.0)  18.9735 (1.0)  19.3403 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-32x16-bert-base-uncased]
Name                                                                        Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean         Min           Max
--------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  -----------  ------------  ------------
test_benchmark_implementations[dynamo_cuda_graphs-32x16-bert-base-uncased]  3.7519 (1.0)     3.7373 (1.0)   3.4621 (1.0)  4.2271 (1.0)  4.5952 (1.0)  4.599 (1.0)  4.3223 (1.0)  5.1102 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-32x16-t5-small]
Name                                                               Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
-----------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  -----------
test_benchmark_implementations[dynamo_cuda_graphs-32x16-t5-small]  2.9932 (1.0)     2.9105 (1.0)   2.688 (1.0)   2.9983 (1.0)  4.0861 (1.0)  4.1283 (1.0)  3.9455 (1.0)  5.104 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-32x256-bert-base-uncased]
Name                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)    Median        Mean           Min           Max
---------------------------------------------------------------------------  ---------------  -------------  -------------  ------------  ------------  -------------  ------------  -------------
test_benchmark_implementations[dynamo_cuda_graphs-32x256-bert-base-uncased]  43.7985 (1.0)    43.8308 (1.0)  43.7985 (1.0)  43.863 (1.0)  43.921 (1.0)  44.6218 (1.0)  43.921 (1.0)  45.3227 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-32x256-t5-small]
Name                                                                Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_cuda_graphs-32x256-t5-small]  45.7554 (1.0)    46.1491 (1.0)  45.7554 (1.0)  46.5428 (1.0)  43.8758 (1.0)  44.6921 (1.0)  43.8758 (1.0)  45.5083 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-32x32-bert-base-uncased]
Name                                                                        Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
--------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_cuda_graphs-32x32-bert-base-uncased]  6.2812 (1.0)     6.2805 (1.0)   6.2771 (1.0)  6.2822 (1.0)  6.8505 (1.0)  6.9071 (1.0)  6.8274 (1.0)  7.1475 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-32x32-t5-small]
Name                                                               Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
-----------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_cuda_graphs-32x32-t5-small]  4.5773 (1.0)     4.5773 (1.0)   4.5742 (1.0)  4.5814 (1.0)  4.9817 (1.0)  4.9979 (1.0)  4.9529 (1.0)  5.1989 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-32x33-bert-base-uncased]
Name                                                                        Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
--------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_cuda_graphs-32x33-bert-base-uncased]  6.658 (1.0)      6.5818 (1.0)   6.1276 (1.0)  6.6621 (1.0)  7.2029 (1.0)  7.2318 (1.0)  7.1622 (1.0)  7.4013 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-32x33-t5-small]
Name                                                               Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
-----------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_cuda_graphs-32x33-t5-small]  5.3463 (1.0)     5.3377 (1.0)   4.9725 (1.0)  6.1123 (1.0)  5.8142 (1.0)  5.7338 (1.0)  5.3958 (1.0)  6.0101 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-8x128-bert-base-uncased]
Name                                                                        Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
--------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_cuda_graphs-8x128-bert-base-uncased]  7.1752 (1.0)     7.2581 (1.0)   6.8147 (1.0)  7.8838 (1.0)  7.7903 (1.0)  7.8848 (1.0)  7.3682 (1.0)  8.6318 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-8x128-t5-small]
Name                                                               Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
-----------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_cuda_graphs-8x128-t5-small]  5.1415 (1.0)     5.3741 (1.0)   5.1364 (1.0)  5.8419 (1.0)  5.5419 (1.0)  5.7665 (1.0)  5.4966 (1.0)  6.4326 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-8x16-bert-base-uncased]
Name                                                                       Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median       Mean          Min           Max
-------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  -----------  ------------  ------------  -----------
test_benchmark_implementations[dynamo_cuda_graphs-8x16-bert-base-uncased]  1.8115 (1.0)     1.8331 (1.0)   1.8084 (1.0)  2.2886 (1.0)  2.965 (1.0)  2.9613 (1.0)  2.8703 (1.0)  3.154 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-8x16-t5-small]
Name                                                              Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min          Max
----------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  -----------  ------------
test_benchmark_implementations[dynamo_cuda_graphs-8x16-t5-small]  2.3388 (1.0)     2.3733 (1.0)   2.091 (1.0)   2.817 (1.0)   3.5061 (1.0)  3.5425 (1.0)  3.375 (1.0)  3.9803 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-8x256-bert-base-uncased]
Name                                                                        Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min           Max
--------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  ------------  -------------
test_benchmark_implementations[dynamo_cuda_graphs-8x256-bert-base-uncased]  14.6166 (1.0)    14.7958 (1.0)  14.5828 (1.0)  15.0231 (1.0)  14.7475 (1.0)  14.8354 (1.0)  14.224 (1.0)  15.3724 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-8x256-t5-small]
Name                                                               Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-----------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_cuda_graphs-8x256-t5-small]  17.6189 (1.0)    17.8755 (1.0)  17.2759 (1.0)  18.9297 (1.0)  15.4227 (1.0)  16.4869 (1.0)  12.7024 (1.0)  19.6909 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-8x32-bert-base-uncased]
Name                                                                       Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
-------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_cuda_graphs-8x32-bert-base-uncased]  2.177 (1.0)      2.3693 (1.0)   1.9466 (1.0)  3.8574 (1.0)  3.3955 (1.0)  3.4334 (1.0)  3.2215 (1.0)  4.0768 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-8x32-t5-small]
Name                                                              Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
----------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_cuda_graphs-8x32-t5-small]  2.5252 (1.0)     2.784 (1.0)    2.4095 (1.0)  5.246 (1.0)   3.8511 (1.0)  3.9285 (1.0)  3.6216 (1.0)  4.9474 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-8x33-bert-base-uncased]
Name                                                                       Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
-------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_cuda_graphs-8x33-bert-base-uncased]  2.2723 (1.0)     2.3958 (1.0)   2.2702 (1.0)  3.1037 (1.0)  3.2951 (1.0)  3.3133 (1.0)  3.2589 (1.0)  3.5847 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-8x33-t5-small]
Name                                                              Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
----------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  -----------
test_benchmark_implementations[dynamo_cuda_graphs-8x33-t5-small]  2.6163 (1.0)     2.6988 (1.0)   2.4269 (1.0)  3.5082 (1.0)  3.8296 (1.0)  3.8831 (1.0)  3.7152 (1.0)  4.688 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-8x384-bert-base-uncased]
Name                                                                        Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
--------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_cuda_graphs-8x384-bert-base-uncased]  19.5779 (1.0)    19.6183 (1.0)  19.2492 (1.0)  20.0581 (1.0)  19.8876 (1.0)  23.2885 (1.0)  19.6796 (1.0)  32.7064 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-8x384-t5-small]
Name                                                               Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)    Median         Mean           Min            Max
-----------------------------------------------------------------  ---------------  -------------  -------------  ------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_cuda_graphs-8x384-t5-small]  20.8609 (1.0)    20.7186 (1.0)  20.2824 (1.0)  20.866 (1.0)  20.7692 (1.0)  20.7659 (1.0)  20.5318 (1.0)  20.9034 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-8x512-bert-base-uncased]
Name                                                                        Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median        Mean           Min           Max
--------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  ------------  -------------  ------------  -------------
test_benchmark_implementations[dynamo_cuda_graphs-8x512-bert-base-uncased]  44.5215 (1.0)    44.609 (1.0)   44.5215 (1.0)  44.6966 (1.0)  34.621 (1.0)  35.0703 (1.0)  34.621 (1.0)  35.5196 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_cuda_graphs-8x512-t5-small]
Name                                                               Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-----------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  ------------
test_benchmark_implementations[dynamo_cuda_graphs-8x512-t5-small]  32.0154 (1.0)    32.512 (1.0)   32.0154 (1.0)  33.0086 (1.0)  32.9243 (1.0)  33.8337 (1.0)  32.9243 (1.0)  34.743 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-16x128-bert-base-uncased]
Name                                                                       Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)     Median        Mean           Min            Max
-------------------------------------------------------------------------  ---------------  -------------  ------------  -------------  ------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-16x128-bert-base-uncased]  15.529 (1.0)     15.6979 (1.0)  15.438 (1.0)  16.1258 (1.0)  16.225 (1.0)  16.4561 (1.0)  16.1807 (1.0)  16.8411 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-16x128-t5-small]
Name                                                              Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)    Median         Mean           Min            Max
----------------------------------------------------------------  ---------------  -------------  -------------  ------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-16x128-t5-small]  26.4632 (1.0)    27.3261 (1.0)  25.8621 (1.0)  29.653 (1.0)  27.6292 (1.0)  27.7435 (1.0)  26.5987 (1.0)  29.0026 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-16x16-bert-base-uncased]
Name                                                                      Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean          Min            Max
------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  ------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-16x16-bert-base-uncased]  15.7409 (1.0)    15.7063 (1.0)  15.3221 (1.0)  15.9406 (1.0)  17.1301 (1.0)  17.382 (1.0)  17.0683 (1.0)  18.1772 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-16x16-t5-small]
Name                                                             Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min           Max
---------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  ------------  -------------
test_benchmark_implementations[dynamo_optimized-16x16-t5-small]  23.721 (1.0)     23.7519 (1.0)  23.5756 (1.0)  23.9278 (1.0)  23.6769 (1.0)  23.8569 (1.0)  23.602 (1.0)  24.4377 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-16x256-bert-base-uncased]
Name                                                                       Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean          Min            Max
-------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  ------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-16x256-bert-base-uncased]  15.7143 (1.0)    15.9845 (1.0)  15.5546 (1.0)  16.5663 (1.0)  16.6238 (1.0)  16.673 (1.0)  16.2893 (1.0)  17.1694 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-16x256-t5-small]
Name                                                              Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min           Max
----------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  ------------  -------------
test_benchmark_implementations[dynamo_optimized-16x256-t5-small]  23.9063 (1.0)    24.0725 (1.0)  23.6196 (1.0)  24.4328 (1.0)  25.6272 (1.0)  25.9329 (1.0)  25.606 (1.0)  26.5789 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-16x32-bert-base-uncased]
Name                                                                      Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-16x32-bert-base-uncased]  16.1167 (1.0)    16.1663 (1.0)  15.2975 (1.0)  17.0209 (1.0)  15.9318 (1.0)  16.1072 (1.0)  15.6598 (1.0)  16.8896 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-16x32-t5-small]
Name                                                             Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median        Mean           Min            Max
---------------------------------------------------------------  ---------------  -------------  -------------  -------------  ------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-16x32-t5-small]  23.8234 (1.0)    23.7718 (1.0)  23.5592 (1.0)  23.9329 (1.0)  24.332 (1.0)  24.3922 (1.0)  23.4967 (1.0)  25.3479 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-16x33-bert-base-uncased]
Name                                                                      Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-16x33-bert-base-uncased]  17.2329 (1.0)    17.3587 (1.0)  16.7537 (1.0)  18.0122 (1.0)  18.1626 (1.0)  18.3131 (1.0)  17.5168 (1.0)  19.8203 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-16x33-t5-small]
Name                                                             Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
---------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-16x33-t5-small]  25.6993 (1.0)    25.9232 (1.0)  25.6686 (1.0)  26.4018 (1.0)  25.6834 (1.0)  25.8013 (1.0)  25.2049 (1.0)  26.5156 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-16x384-bert-base-uncased]
Name                                                                       Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-16x384-bert-base-uncased]  24.329 (1.0)     24.3832 (1.0)  24.3241 (1.0)  24.5504 (1.0)  24.4903 (1.0)  24.5692 (1.0)  23.9155 (1.0)  25.0286 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-16x384-t5-small]
Name                                                              Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
----------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-16x384-t5-small]  29.0191 (1.0)    29.0758 (1.0)  28.9075 (1.0)  29.3007 (1.0)  29.0773 (1.0)  29.0093 (1.0)  28.7828 (1.0)  29.1679 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-16x512-bert-base-uncased]
Name                                                                       Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-16x512-bert-base-uncased]  33.1151 (1.0)    33.1028 (1.0)  33.0742 (1.0)  33.1192 (1.0)  33.2799 (1.0)  32.6101 (1.0)  31.2428 (1.0)  33.3076 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-16x512-t5-small]
Name                                                              Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
----------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-16x512-t5-small]  41.2908 (1.0)    41.3435 (1.0)  41.2908 (1.0)  41.3962 (1.0)  42.2976 (1.0)  42.4016 (1.0)  42.2976 (1.0)  42.5056 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-1x128-bert-base-uncased]
Name                                                                      Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-1x128-bert-base-uncased]  15.5269 (1.0)    15.7179 (1.0)  15.3795 (1.0)  16.3412 (1.0)  15.8146 (1.0)  15.9305 (1.0)  15.5723 (1.0)  16.3141 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-1x128-t5-small]
Name                                                             Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)     Median         Mean           Min            Max
---------------------------------------------------------------  ---------------  -------------  ------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-1x128-t5-small]  24.2033 (1.0)    24.5777 (1.0)  24.148 (1.0)  25.3819 (1.0)  27.0884 (1.0)  27.0684 (1.0)  26.3607 (1.0)  27.7561 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-1x16-bert-base-uncased]
Name                                                                     Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-----------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-1x16-bert-base-uncased]  16.6656 (1.0)    16.6716 (1.0)  16.3772 (1.0)  16.9892 (1.0)  17.5615 (1.0)  17.7133 (1.0)  17.4492 (1.0)  18.0874 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-1x16-t5-small]
Name                                                            Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min           Max
--------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  ------------  ------------
test_benchmark_implementations[dynamo_optimized-1x16-t5-small]  23.9698 (1.0)    24.3167 (1.0)  23.9454 (1.0)  24.7183 (1.0)  24.5186 (1.0)  25.1447 (1.0)  24.329 (1.0)  26.739 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-1x256-bert-base-uncased]
Name                                                                      Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-1x256-bert-base-uncased]  16.6977 (1.0)    16.6216 (1.0)  16.3543 (1.0)  16.8458 (1.0)  16.1359 (1.0)  16.2182 (1.0)  15.8811 (1.0)  16.6709 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-1x256-t5-small]
Name                                                             Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
---------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-1x256-t5-small]  23.5766 (1.0)    23.6293 (1.0)  23.4783 (1.0)  23.8121 (1.0)  24.2133 (1.0)  24.2318 (1.0)  23.7578 (1.0)  24.5241 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-1x32-bert-base-uncased]
Name                                                                     Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-----------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-1x32-bert-base-uncased]  15.9468 (1.0)    15.8807 (1.0)  15.5997 (1.0)  16.0594 (1.0)  16.0345 (1.0)  16.3466 (1.0)  15.7764 (1.0)  17.4041 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-1x32-t5-small]
Name                                                            Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
--------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-1x32-t5-small]  25.3676 (1.0)    25.3402 (1.0)  25.1351 (1.0)  25.4567 (1.0)  24.2463 (1.0)  24.5066 (1.0)  23.8067 (1.0)  25.5845 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-1x33-bert-base-uncased]
Name                                                                     Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-----------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -----------
test_benchmark_implementations[dynamo_optimized-1x33-bert-base-uncased]  16.7424 (1.0)    17.2884 (1.0)  15.7512 (1.0)  20.6316 (1.0)  16.5943 (1.0)  16.6481 (1.0)  15.6073 (1.0)  17.66 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-1x33-t5-small]
Name                                                            Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)    Median         Mean           Min            Max
--------------------------------------------------------------  ---------------  -------------  -------------  ------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-1x33-t5-small]  23.2101 (1.0)    23.4077 (1.0)  23.0103 (1.0)  23.894 (1.0)  23.1794 (1.0)  23.5759 (1.0)  23.1579 (1.0)  24.5326 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-1x384-bert-base-uncased]
Name                                                                      Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean          Min            Max
------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  ------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-1x384-bert-base-uncased]  15.5044 (1.0)    15.5597 (1.0)  15.3446 (1.0)  15.8996 (1.0)  16.6962 (1.0)  16.951 (1.0)  16.3983 (1.0)  17.8309 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-1x384-t5-small]
Name                                                             Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
---------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-1x384-t5-small]  23.9964 (1.0)    51.2294 (1.0)  23.5766 (1.0)  133.077 (1.0)  23.3846 (1.0)  23.7647 (1.0)  23.3775 (1.0)  24.7859 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-1x512-bert-base-uncased]
Name                                                                      Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean        Min            Max
------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  ----------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-1x512-bert-base-uncased]  28.7365 (1.0)    28.7374 (1.0)  26.8401 (1.0)  29.8246 (1.0)  26.1554 (1.0)  27.0 (1.0)  23.7694 (1.0)  30.6162 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-1x512-t5-small]
Name                                                             Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
---------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-1x512-t5-small]  23.713 (1.0)     23.8196 (1.0)  23.6041 (1.0)  24.1418 (1.0)  24.8215 (1.0)  24.6428 (1.0)  24.1424 (1.0)  24.9645 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-32x128-bert-base-uncased]
Name                                                                       Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-32x128-bert-base-uncased]  16.2375 (1.0)    16.0354 (1.0)  15.2883 (1.0)  16.3267 (1.0)  17.1923 (1.0)  17.2948 (1.0)  16.8946 (1.0)  18.0403 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-32x128-t5-small]
Name                                                              Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
----------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-32x128-t5-small]  23.8367 (1.0)    24.2241 (1.0)  23.8226 (1.0)  25.1844 (1.0)  23.8468 (1.0)  24.4026 (1.0)  23.7629 (1.0)  25.5069 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-32x16-bert-base-uncased]
Name                                                                      Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean          Min           Max
------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimized-32x16-bert-base-uncased]  15.606 (1.0)     15.6388 (1.0)  15.2741 (1.0)  16.0316 (1.0)  16.3883 (1.0)  16.374 (1.0)  15.828 (1.0)  16.723 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-32x16-t5-small]
Name                                                             Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)    Median         Mean           Min            Max
---------------------------------------------------------------  ---------------  -------------  -------------  ------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-32x16-t5-small]  23.9964 (1.0)    24.035 (1.0)   23.7865 (1.0)  24.322 (1.0)  26.0283 (1.0)  25.9788 (1.0)  25.8128 (1.0)  26.0953 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-32x256-bert-base-uncased]
Name                                                                       Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-32x256-bert-base-uncased]  32.0645 (1.0)    32.0843 (1.0)  32.0563 (1.0)  32.1321 (1.0)  33.8077 (1.0)  34.0796 (1.0)  33.7087 (1.0)  34.7225 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-32x256-t5-small]
Name                                                              Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median        Mean           Min            Max
----------------------------------------------------------------  ---------------  -------------  -------------  -------------  ------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-32x256-t5-small]  31.0989 (1.0)    31.261 (1.0)   30.8255 (1.0)  31.8587 (1.0)  31.402 (1.0)  31.5264 (1.0)  31.3423 (1.0)  31.8349 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-32x32-bert-base-uncased]
Name                                                                      Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-32x32-bert-base-uncased]  15.8147 (1.0)    15.8582 (1.0)  15.4296 (1.0)  16.2837 (1.0)  16.0405 (1.0)  16.1532 (1.0)  15.8502 (1.0)  16.4471 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-32x32-t5-small]
Name                                                             Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
---------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-32x32-t5-small]  24.4255 (1.0)    24.6106 (1.0)  23.8551 (1.0)  25.6051 (1.0)  24.2734 (1.0)  25.0007 (1.0)  24.2047 (1.0)  26.4076 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-32x33-bert-base-uncased]
Name                                                                      Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median         Mean           Min            Max
------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-32x33-bert-base-uncased]  15.8065 (1.0)    16.0351 (1.0)  15.703 (1.0)  16.43 (1.0)   16.7272 (1.0)  16.8523 (1.0)  16.5661 (1.0)  17.2594 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-32x33-t5-small]
Name                                                             Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
---------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-32x33-t5-small]  29.0642 (1.0)    29.1427 (1.0)  28.5164 (1.0)  29.8476 (1.0)  37.8893 (1.0)  38.0107 (1.0)  37.7777 (1.0)  38.3652 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-8x128-bert-base-uncased]
Name                                                                      Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median        Mean           Min            Max
------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  ------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-8x128-bert-base-uncased]  18.7959 (1.0)    20.8643 (1.0)  17.5032 (1.0)  25.2374 (1.0)  17.724 (1.0)  18.2251 (1.0)  17.2782 (1.0)  19.7773 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-8x128-t5-small]
Name                                                             Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
---------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-8x128-t5-small]  23.8479 (1.0)    24.2463 (1.0)  23.5612 (1.0)  25.3297 (1.0)  24.0852 (1.0)  24.6482 (1.0)  23.6088 (1.0)  26.2506 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-8x16-bert-base-uncased]
Name                                                                     Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-----------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-8x16-bert-base-uncased]  15.4501 (1.0)    15.4423 (1.0)  15.1419 (1.0)  15.8003 (1.0)  16.0263 (1.0)  16.1594 (1.0)  15.6526 (1.0)  16.8285 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-8x16-t5-small]
Name                                                            Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
--------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-8x16-t5-small]  23.7005 (1.0)    23.7949 (1.0)  23.3994 (1.0)  24.2442 (1.0)  23.8631 (1.0)  23.9429 (1.0)  23.3203 (1.0)  24.3189 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-8x256-bert-base-uncased]
Name                                                                      Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-8x256-bert-base-uncased]  16.213 (1.0)     16.265 (1.0)   15.8083 (1.0)  16.7579 (1.0)  16.2113 (1.0)  16.2997 (1.0)  15.8138 (1.0)  17.2422 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-8x256-t5-small]
Name                                                             Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
---------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-8x256-t5-small]  26.0355 (1.0)    26.1356 (1.0)  25.9348 (1.0)  26.4366 (1.0)  25.8355 (1.0)  26.1079 (1.0)  25.8069 (1.0)  26.6813 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-8x32-bert-base-uncased]
Name                                                                     Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)    Median         Mean           Min            Max
-----------------------------------------------------------------------  ---------------  -------------  -------------  ------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-8x32-bert-base-uncased]  15.6795 (1.0)    15.7617 (1.0)  15.4583 (1.0)  16.224 (1.0)  15.9633 (1.0)  16.0919 (1.0)  15.8914 (1.0)  16.5325 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-8x32-t5-small]
Name                                                            Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
--------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-8x32-t5-small]  25.9401 (1.0)    25.9404 (1.0)  25.1832 (1.0)  26.6977 (1.0)  26.9852 (1.0)  26.5265 (1.0)  25.4022 (1.0)  27.1921 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-8x33-bert-base-uncased]
Name                                                                     Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)    Median         Mean           Min            Max
-----------------------------------------------------------------------  ---------------  -------------  -------------  ------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-8x33-bert-base-uncased]  16.1691 (1.0)    16.1903 (1.0)  16.0369 (1.0)  16.428 (1.0)  17.0673 (1.0)  17.0381 (1.0)  16.1255 (1.0)  17.5792 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-8x33-t5-small]
Name                                                            Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
--------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-8x33-t5-small]  28.0924 (1.0)    28.2017 (1.0)  27.9777 (1.0)  28.5349 (1.0)  27.7067 (1.0)  27.6305 (1.0)  26.1777 (1.0)  29.0071 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-8x384-bert-base-uncased]
Name                                                                      Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)    Median         Mean           Min            Max
------------------------------------------------------------------------  ---------------  -------------  -------------  ------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-8x384-bert-base-uncased]  16.2949 (1.0)    16.3457 (1.0)  15.9232 (1.0)  16.726 (1.0)  17.2603 (1.0)  18.3867 (1.0)  17.1737 (1.0)  23.6282 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-8x384-t5-small]
Name                                                             Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean          Min            Max
---------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  ------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-8x384-t5-small]  23.8346 (1.0)    23.9993 (1.0)  23.7619 (1.0)  24.5208 (1.0)  24.0682 (1.0)  24.717 (1.0)  23.8535 (1.0)  25.9963 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-8x512-bert-base-uncased]
Name                                                                      Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-8x512-bert-base-uncased]  17.1407 (1.0)    17.1141 (1.0)  16.9728 (1.0)  17.2227 (1.0)  17.7703 (1.0)  17.8275 (1.0)  17.0472 (1.0)  18.5648 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized-8x512-t5-small]
Name                                                             Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
---------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized-8x512-t5-small]  23.6349 (1.0)    23.5837 (1.0)  23.4465 (1.0)  23.6698 (1.0)  25.7929 (1.0)  25.6475 (1.0)  25.0071 (1.0)  26.1426 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-16x128-bert-base-uncased]
Name                                                                                   Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
-------------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  -------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-16x128-bert-base-uncased]  8.7491 (1.0)     8.7506 (1.0)   8.7409 (1.0)  8.7654 (1.0)  9.3622 (1.0)  9.7079 (1.0)  9.0844 (1.0)  12.7443 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-16x128-t5-small]
Name                                                                          Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
----------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-16x128-t5-small]  7.8141 (1.0)     7.8142 (1.0)   7.8039 (1.0)  7.8203 (1.0)  8.1241 (1.0)  8.2992 (1.0)  7.9182 (1.0)  9.5537 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-16x16-bert-base-uncased]
Name                                                                                  Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
------------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-16x16-bert-base-uncased]  2.6368 (1.0)     2.7535 (1.0)   2.3951 (1.0)  5.1958 (1.0)  3.6353 (1.0)  3.6478 (1.0)  3.6158 (1.0)  3.8951 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-16x16-t5-small]
Name                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
---------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-16x16-t5-small]  4.8333 (1.0)     4.9 (1.0)      4.821 (1.0)   5.7539 (1.0)  5.4462 (1.0)  5.6183 (1.0)  5.3618 (1.0)  7.4077 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-16x256-bert-base-uncased]
Name                                                                                   Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-------------------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-16x256-bert-base-uncased]  15.3969 (1.0)    15.4245 (1.0)  15.2996 (1.0)  15.5556 (1.0)  16.6515 (1.0)  16.4521 (1.0)  15.7089 (1.0)  17.0655 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-16x256-t5-small]
Name                                                                          Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
----------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-16x256-t5-small]  17.3128 (1.0)    18.2313 (1.0)  17.2933 (1.0)  19.9639 (1.0)  17.4376 (1.0)  17.6459 (1.0)  16.9861 (1.0)  18.9684 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-16x32-bert-base-uncased]
Name                                                                                  Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
------------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-16x32-bert-base-uncased]  3.8011 (1.0)     3.78 (1.0)     3.4744 (1.0)  4.0294 (1.0)  4.9561 (1.0)  4.9683 (1.0)  4.6416 (1.0)  5.7431 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-16x32-t5-small]
Name                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
---------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-16x32-t5-small]  5.3873 (1.0)     5.3894 (1.0)   5.375 (1.0)   5.4026 (1.0)  5.9976 (1.0)  6.1153 (1.0)  5.8567 (1.0)  7.0555 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-16x33-bert-base-uncased]
Name                                                                                  Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
------------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-16x33-bert-base-uncased]  3.5994 (1.0)     3.6782 (1.0)   3.499 (1.0)   3.8318 (1.0)  4.6808 (1.0)  4.7164 (1.0)  4.6479 (1.0)  4.9636 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-16x33-t5-small]
Name                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
---------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  ------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-16x33-t5-small]  13.9551 (1.0)    14.1346 (1.0)  13.9254 (1.0)  14.5162 (1.0)  14.8485 (1.0)  15.0942 (1.0)  14.7619 (1.0)  16.201 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-16x384-bert-base-uncased]
Name                                                                                   Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-------------------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-16x384-bert-base-uncased]  23.6165 (1.0)    23.568 (1.0)   23.4301 (1.0)  23.6575 (1.0)  23.6695 (1.0)  23.9327 (1.0)  23.2845 (1.0)  24.8439 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-16x384-t5-small]
Name                                                                          Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
----------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-16x384-t5-small]  27.4985 (1.0)    27.4934 (1.0)  27.4811 (1.0)  27.5005 (1.0)  28.0581 (1.0)  28.1259 (1.0)  27.8485 (1.0)  28.4713 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-16x512-bert-base-uncased]
Name                                                                                   Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-------------------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-16x512-bert-base-uncased]  31.5648 (1.0)    31.5525 (1.0)  31.5146 (1.0)  31.5781 (1.0)  32.0504 (1.0)  33.3326 (1.0)  30.5893 (1.0)  37.3581 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-16x512-t5-small]
Name                                                                          Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)    Median         Mean           Min            Max
----------------------------------------------------------------------------  ---------------  -------------  -------------  ------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-16x512-t5-small]  40.6579 (1.0)    41.705 (1.0)   40.6579 (1.0)  42.752 (1.0)  41.4315 (1.0)  41.6304 (1.0)  41.4315 (1.0)  41.8293 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-1x128-bert-base-uncased]
Name                                                                                  Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
------------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-1x128-bert-base-uncased]  1.6732 (1.0)     1.6127 (1.0)   1.4899 (1.0)  1.6794 (1.0)  2.9271 (1.0)  2.9294 (1.0)  2.7838 (1.0)  3.2261 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-1x128-t5-small]
Name                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean         Min           Max
---------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  -----------  ------------  -----------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-1x128-t5-small]  2.2886 (1.0)     2.3023 (1.0)   2.175 (1.0)   2.5016 (1.0)  3.4154 (1.0)  3.449 (1.0)  3.3317 (1.0)  3.825 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-1x16-bert-base-uncased]
Name                                                                                 Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median       Mean          Min           Max
-----------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  -----------  ------------  ------------  -----------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-1x16-bert-base-uncased]  1.0568 (1.0)     1.0694 (1.0)   1.025 (1.0)   1.2442 (1.0)  2.248 (1.0)  2.3706 (1.0)  2.1688 (1.0)  4.177 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-1x16-t5-small]
Name                                                                        Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
--------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-1x16-t5-small]  2.3634 (1.0)     2.3308 (1.0)   2.2047 (1.0)  2.6051 (1.0)  3.6088 (1.0)  3.6252 (1.0)  3.5545 (1.0)  4.0145 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-1x256-bert-base-uncased]
Name                                                                                  Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median       Mean          Min           Max
------------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  -----------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-1x256-bert-base-uncased]  2.1084 (1.0)     2.1018 (1.0)   1.9046 (1.0)  2.4238 (1.0)  3.135 (1.0)  3.1665 (1.0)  3.1208 (1.0)  3.4383 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-1x256-t5-small]
Name                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
---------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-1x256-t5-small]  2.7566 (1.0)     2.7567 (1.0)   2.7546 (1.0)  2.7607 (1.0)  3.8914 (1.0)  3.9539 (1.0)  3.8232 (1.0)  4.9762 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-1x32-bert-base-uncased]
Name                                                                                 Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min          Max
-----------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  -----------  ------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-1x32-bert-base-uncased]  1.1991 (1.0)     1.223 (1.0)    0.9175 (1.0)  1.8452 (1.0)  2.2317 (1.0)  2.2384 (1.0)  2.096 (1.0)  2.6554 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-1x32-t5-small]
Name                                                                        Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean         Min           Max
--------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  -----------  ------------  ------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-1x32-t5-small]  2.3173 (1.0)     2.3312 (1.0)   2.2067 (1.0)  2.5846 (1.0)  3.5409 (1.0)  3.566 (1.0)  3.4507 (1.0)  4.0277 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-1x33-bert-base-uncased]
Name                                                                                 Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
-----------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  -----------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-1x33-bert-base-uncased]  1.0609 (1.0)     1.2429 (1.0)   1.0394 (1.0)  2.2856 (1.0)  2.3841 (1.0)  2.4031 (1.0)  2.3221 (1.0)  2.653 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-1x33-t5-small]
Name                                                                        Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
--------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-1x33-t5-small]  4.48 (1.0)       4.4813 (1.0)   4.4759 (1.0)  4.4882 (1.0)  6.1835 (1.0)  6.3432 (1.0)  6.1048 (1.0)  8.4229 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-1x384-bert-base-uncased]
Name                                                                                  Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
------------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-1x384-bert-base-uncased]  2.3204 (1.0)     2.3207 (1.0)   2.3173 (1.0)  2.3255 (1.0)  3.5498 (1.0)  3.7176 (1.0)  3.4913 (1.0)  5.9926 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-1x384-t5-small]
Name                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
---------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-1x384-t5-small]  3.8318 (1.0)     3.8328 (1.0)   3.8226 (1.0)  3.84 (1.0)    4.6151 (1.0)  4.6644 (1.0)  4.5904 (1.0)  5.0334 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-1x512-bert-base-uncased]
Name                                                                                  Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
------------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-1x512-bert-base-uncased]  3.4263 (1.0)     3.4282 (1.0)   3.4222 (1.0)  3.4386 (1.0)  4.3116 (1.0)  4.4197 (1.0)  4.2515 (1.0)  5.6136 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-1x512-t5-small]
Name                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
---------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-1x512-t5-small]  4.4984 (1.0)     4.4948 (1.0)   4.0602 (1.0)  5.4047 (1.0)  4.9184 (1.0)  5.0644 (1.0)  4.8824 (1.0)  7.4166 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-32x128-bert-base-uncased]
Name                                                                                   Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-------------------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-32x128-bert-base-uncased]  15.0958 (1.0)    16.02 (1.0)    15.0845 (1.0)  19.3321 (1.0)  15.9286 (1.0)  15.6944 (1.0)  15.1123 (1.0)  16.0707 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-32x128-t5-small]
Name                                                                          Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean          Min            Max
----------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  ------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-32x128-t5-small]  13.9008 (1.0)    13.9116 (1.0)  13.8967 (1.0)  13.9284 (1.0)  14.4178 (1.0)  14.374 (1.0)  14.0388 (1.0)  14.6666 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-32x16-bert-base-uncased]
Name                                                                                  Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
------------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-32x16-bert-base-uncased]  4.3233 (1.0)     4.5845 (1.0)   4.3151 (1.0)  7.0554 (1.0)  5.1338 (1.0)  5.2188 (1.0)  5.1134 (1.0)  5.9856 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-32x16-t5-small]
Name                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
---------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-32x16-t5-small]  7.9278 (1.0)     8.0653 (1.0)   7.8868 (1.0)  8.4019 (1.0)  8.8323 (1.0)  8.9414 (1.0)  8.7879 (1.0)  9.7291 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-32x256-bert-base-uncased]
Name                                                                                   Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median        Mean           Min            Max
-------------------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  ------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-32x256-bert-base-uncased]  30.0083 (1.0)    30.1182 (1.0)  30.0012 (1.0)  30.3452 (1.0)  31.094 (1.0)  30.5445 (1.0)  28.9245 (1.0)  31.6148 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-32x256-t5-small]
Name                                                                          Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median        Mean           Min            Max
----------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  ------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-32x256-t5-small]  30.1978 (1.0)    30.1981 (1.0)  30.1793 (1.0)  30.2172 (1.0)  30.911 (1.0)  32.3733 (1.0)  30.3638 (1.0)  35.8452 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-32x32-bert-base-uncased]
Name                                                                                  Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
------------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-32x32-bert-base-uncased]  6.5321 (1.0)     6.648 (1.0)    6.5249 (1.0)  8.1408 (1.0)  7.1324 (1.0)  7.3546 (1.0)  7.0795 (1.0)  9.7389 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-32x32-t5-small]
Name                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)     Median         Mean           Min            Max
---------------------------------------------------------------------------  ---------------  -------------  ------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-32x32-t5-small]  9.9359 (1.0)     10.4582 (1.0)  9.9174 (1.0)  14.6524 (1.0)  10.2481 (1.0)  10.3574 (1.0)  10.2172 (1.0)  10.9932 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-32x33-bert-base-uncased]
Name                                                                                  Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
------------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-32x33-bert-base-uncased]  6.5649 (1.0)     6.5682 (1.0)   6.5567 (1.0)  6.5853 (1.0)  7.2268 (1.0)  7.2579 (1.0)  7.1627 (1.0)  7.4454 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-32x33-t5-small]
Name                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean          Min            Max
---------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  ------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-32x33-t5-small]  28.1518 (1.0)    28.1269 (1.0)  28.0637 (1.0)  28.1651 (1.0)  28.2557 (1.0)  28.256 (1.0)  28.2241 (1.0)  28.2883 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-8x128-bert-base-uncased]
Name                                                                                  Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
------------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-8x128-bert-base-uncased]  5.4497 (1.0)     5.5963 (1.0)   5.4405 (1.0)  6.1788 (1.0)  6.1121 (1.0)  6.3158 (1.0)  6.0377 (1.0)  8.7535 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-8x128-t5-small]
Name                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
---------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-8x128-t5-small]  4.3715 (1.0)     4.3727 (1.0)   4.3653 (1.0)  4.3889 (1.0)  4.7905 (1.0)  4.8164 (1.0)  4.7768 (1.0)  5.1417 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-8x16-bert-base-uncased]
Name                                                                                 Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
-----------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-8x16-bert-base-uncased]  2.0122 (1.0)     2.0124 (1.0)   2.0091 (1.0)  2.0183 (1.0)  3.2205 (1.0)  3.3522 (1.0)  3.0683 (1.0)  4.5546 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-8x16-t5-small]
Name                                                                        Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
--------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-8x16-t5-small]  2.9051 (1.0)     3.0527 (1.0)   2.8979 (1.0)  4.6223 (1.0)  4.0484 (1.0)  4.0963 (1.0)  3.9948 (1.0)  5.0668 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-8x256-bert-base-uncased]
Name                                                                                  Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
------------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  -------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-8x256-bert-base-uncased]  9.0112 (1.0)     9.0134 (1.0)   9.001 (1.0)   9.0276 (1.0)  9.5772 (1.0)  9.8515 (1.0)  9.2813 (1.0)  12.2523 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-8x256-t5-small]
Name                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)     Median        Mean          Min           Max
---------------------------------------------------------------------------  ---------------  -------------  ------------  -------------  ------------  ------------  ------------  -------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-8x256-t5-small]  8.8463 (1.0)     9.5161 (1.0)   8.5402 (1.0)  15.2586 (1.0)  9.2504 (1.0)  9.3852 (1.0)  9.1572 (1.0)  10.4065 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-8x32-bert-base-uncased]
Name                                                                                 Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
-----------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-8x32-bert-base-uncased]  2.3654 (1.0)     2.4913 (1.0)   2.1514 (1.0)  3.7294 (1.0)  3.3994 (1.0)  3.4238 (1.0)  3.3824 (1.0)  3.6471 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-8x32-t5-small]
Name                                                                        Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean         Min           Max
--------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  -----------  ------------  ------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-8x32-t5-small]  3.3577 (1.0)     3.3581 (1.0)   3.3516 (1.0)  3.3669 (1.0)  4.2836 (1.0)  4.313 (1.0)  4.2255 (1.0)  4.6235 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-8x33-bert-base-uncased]
Name                                                                                 Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
-----------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-8x33-bert-base-uncased]  2.4494 (1.0)     2.5648 (1.0)   2.4402 (1.0)  3.7151 (1.0)  3.5016 (1.0)  3.5312 (1.0)  3.4402 (1.0)  4.3809 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-8x33-t5-small]
Name                                                                        Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min          Max
--------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  -----------  ------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-8x33-t5-small]  8.108 (1.0)      8.262 (1.0)    8.0988 (1.0)  9.217 (1.0)   8.8614 (1.0)  8.9111 (1.0)  8.777 (1.0)  9.2576 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-8x384-bert-base-uncased]
Name                                                                                  Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
------------------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-8x384-bert-base-uncased]  12.674 (1.0)     12.5699 (1.0)  12.2624 (1.0)  12.6812 (1.0)  13.1047 (1.0)  13.0645 (1.0)  12.6159 (1.0)  13.4931 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-8x384-t5-small]
Name                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
---------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-8x384-t5-small]  15.276 (1.0)     16.2091 (1.0)  15.1962 (1.0)  20.5885 (1.0)  15.8726 (1.0)  16.1358 (1.0)  15.7158 (1.0)  17.1339 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-8x512-bert-base-uncased]
Name                                                                                  Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)    Median        Mean           Min            Max
------------------------------------------------------------------------------------  ---------------  -------------  -------------  ------------  ------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-8x512-bert-base-uncased]  16.0768 (1.0)    16.0786 (1.0)  16.0748 (1.0)  16.087 (1.0)  17.129 (1.0)  16.8343 (1.0)  16.1482 (1.0)  17.3673 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimized_cuda_graphs-8x512-t5-small]
Name                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
---------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimized_cuda_graphs-8x512-t5-small]  23.2643 (1.0)    23.7468 (1.0)  21.9023 (1.0)  25.4024 (1.0)  22.5516 (1.0)  22.5673 (1.0)  22.4499 (1.0)  22.6799 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-16x128-bert-base-uncased]
Name                                                                                          Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
--------------------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-16x128-bert-base-uncased]  8.9252 (1.0)     9.0552 (1.0)   8.9119 (1.0)  9.9461 (1.0)  9.4913 (1.0)  9.4721 (1.0)  9.2338 (1.0)  9.6055 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-16x128-t5-small]
Name                                                                                 Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median       Mean          Min           Max
-----------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  -----------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-16x128-t5-small]  7.7906 (1.0)     7.7317 (1.0)   7.3062 (1.0)  8.5166 (1.0)  7.989 (1.0)  8.0024 (1.0)  7.8809 (1.0)  8.1638 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-16x16-bert-base-uncased]
Name                                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean         Min           Max
-------------------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  -----------  ------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-16x16-bert-base-uncased]  2.9829 (1.0)     3.0462 (1.0)   2.9768 (1.0)  4.0161 (1.0)  4.3549 (1.0)  4.851 (1.0)  4.1735 (1.0)  6.6773 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-16x16-t5-small]
Name                                                                                Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
----------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-16x16-t5-small]  3.7796 (1.0)     3.7804 (1.0)   3.7724 (1.0)  3.7919 (1.0)  4.4253 (1.0)  4.5404 (1.0)  4.3903 (1.0)  5.4138 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-16x256-bert-base-uncased]
Name                                                                                          Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)     Median         Mean           Min            Max
--------------------------------------------------------------------------------------------  ---------------  -------------  ------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-16x256-bert-base-uncased]  15.7921 (1.0)    15.7932 (1.0)  15.786 (1.0)  15.7983 (1.0)  16.5907 (1.0)  16.3374 (1.0)  15.7452 (1.0)  16.6566 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-16x256-t5-small]
Name                                                                                 Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min           Max
-----------------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  ------------  -------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-16x256-t5-small]  15.2494 (1.0)    15.3284 (1.0)  15.1828 (1.0)  15.6672 (1.0)  15.9234 (1.0)  15.9496 (1.0)  15.719 (1.0)  16.1839 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-16x32-bert-base-uncased]
Name                                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
-------------------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-16x32-bert-base-uncased]  4.1318 (1.0)     3.9837 (1.0)   3.7745 (1.0)  4.1472 (1.0)  5.0007 (1.0)  5.0303 (1.0)  4.9639 (1.0)  5.2368 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-16x32-t5-small]
Name                                                                                Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
----------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-16x32-t5-small]  4.3448 (1.0)     4.3452 (1.0)   4.3387 (1.0)  4.3561 (1.0)  4.8498 (1.0)  4.9738 (1.0)  4.8354 (1.0)  6.7184 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-16x33-bert-base-uncased]
Name                                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-------------------------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-16x33-bert-base-uncased]  31.8372 (1.0)    31.8307 (1.0)  31.8126 (1.0)  31.8423 (1.0)  33.4529 (1.0)  33.4243 (1.0)  33.3008 (1.0)  33.5191 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-16x33-t5-small]
Name                                                                                Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
----------------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-16x33-t5-small]  13.8998 (1.0)    13.9006 (1.0)  13.8824 (1.0)  13.9203 (1.0)  14.1524 (1.0)  14.1937 (1.0)  14.1133 (1.0)  14.4109 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-16x384-bert-base-uncased]
Name                                                                                          Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median        Mean           Min            Max
--------------------------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  ------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-16x384-bert-base-uncased]  23.6861 (1.0)    23.5916 (1.0)  23.3636 (1.0)  23.7251 (1.0)  25.051 (1.0)  24.7009 (1.0)  23.9402 (1.0)  25.1115 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-16x384-t5-small]
Name                                                                                 Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-----------------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-16x384-t5-small]  23.849 (1.0)     24.0448 (1.0)  23.8408 (1.0)  24.6313 (1.0)  24.4995 (1.0)  26.0343 (1.0)  24.1172 (1.0)  30.8623 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-16x512-bert-base-uncased]
Name                                                                                          Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)    Median         Mean           Min            Max
--------------------------------------------------------------------------------------------  ---------------  -------------  -------------  ------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-16x512-bert-base-uncased]  31.6334 (1.0)    31.6447 (1.0)  31.5986 (1.0)  31.702 (1.0)  32.5907 (1.0)  31.9904 (1.0)  30.6196 (1.0)  32.7607 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-16x512-t5-small]
Name                                                                                 Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-----------------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-16x512-t5-small]  34.7679 (1.0)    34.8831 (1.0)  34.7679 (1.0)  34.9983 (1.0)  34.9475 (1.0)  35.4316 (1.0)  34.9475 (1.0)  35.9157 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-1x128-bert-base-uncased]
Name                                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
-------------------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-1x128-bert-base-uncased]  1.7377 (1.0)     1.7382 (1.0)   1.7367 (1.0)  1.7439 (1.0)  2.8661 (1.0)  2.8713 (1.0)  2.8001 (1.0)  3.0735 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-1x128-t5-small]
Name                                                                                Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
----------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-1x128-t5-small]  2.2856 (1.0)     2.3026 (1.0)   2.2385 (1.0)  2.4525 (1.0)  3.3419 (1.0)  3.3722 (1.0)  3.2856 (1.0)  3.7258 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-1x16-bert-base-uncased]
Name                                                                                        Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
------------------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-1x16-bert-base-uncased]  1.0721 (1.0)     1.1009 (1.0)   1.0435 (1.0)  1.2104 (1.0)  2.2577 (1.0)  2.2935 (1.0)  2.2438 (1.0)  3.1464 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-1x16-t5-small]
Name                                                                               Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean         Min          Max
---------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  -----------  -----------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-1x16-t5-small]  2.6696 (1.0)     2.9134 (1.0)   2.5313 (1.0)  4.7872 (1.0)  3.5662 (1.0)  3.599 (1.0)  3.511 (1.0)  4.0105 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-1x256-bert-base-uncased]
Name                                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min          Max
-------------------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  -----------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-1x256-bert-base-uncased]  2.1914 (1.0)     2.1913 (1.0)   2.1862 (1.0)  2.1944 (1.0)  3.3345 (1.0)  3.3454 (1.0)  3.314 (1.0)  3.6159 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-1x256-t5-small]
Name                                                                                Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
----------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-1x256-t5-small]  2.6757 (1.0)     2.6665 (1.0)   2.3532 (1.0)  2.686 (1.0)   3.7422 (1.0)  3.7593 (1.0)  3.7164 (1.0)  4.1641 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-1x32-bert-base-uncased]
Name                                                                                        Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
------------------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-1x32-bert-base-uncased]  1.0383 (1.0)     1.0588 (1.0)   0.9912 (1.0)  2.0306 (1.0)  2.1762 (1.0)  2.2159 (1.0)  2.1542 (1.0)  2.5686 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-1x32-t5-small]
Name                                                                               Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
---------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-1x32-t5-small]  2.2866 (1.0)     2.317 (1.0)    1.8944 (1.0)  3.244 (1.0)   3.3785 (1.0)  3.4247 (1.0)  3.3294 (1.0)  3.7703 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-1x33-bert-base-uncased]
Name                                                                                        Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
------------------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-1x33-bert-base-uncased]  6.1532 (1.0)     6.2748 (1.0)   6.1471 (1.0)  7.7066 (1.0)  6.8899 (1.0)  6.8997 (1.0)  6.8489 (1.0)  7.1304 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-1x33-t5-small]
Name                                                                               Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean         Min           Max
---------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  -----------  ------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-1x33-t5-small]  4.4739 (1.0)     4.3039 (1.0)   4.0028 (1.0)  4.482 (1.0)   5.6681 (1.0)  5.707 (1.0)  5.6222 (1.0)  6.0533 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-1x384-bert-base-uncased]
Name                                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
-------------------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-1x384-bert-base-uncased]  2.432 (1.0)      2.432 (1.0)    2.4289 (1.0)  2.4381 (1.0)  3.4334 (1.0)  3.4609 (1.0)  3.4121 (1.0)  3.6898 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-1x384-t5-small]
Name                                                                                Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
----------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-1x384-t5-small]  3.6014 (1.0)     3.6018 (1.0)   3.5983 (1.0)  3.6055 (1.0)  4.2139 (1.0)  4.2406 (1.0)  4.1907 (1.0)  4.5774 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-1x512-bert-base-uncased]
Name                                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
-------------------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-1x512-bert-base-uncased]  3.5748 (1.0)     3.4817 (1.0)   3.2236 (1.0)  3.582 (1.0)   4.3998 (1.0)  4.4295 (1.0)  4.3905 (1.0)  4.8134 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-1x512-t5-small]
Name                                                                                Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean         Min           Max
----------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  -----------  ------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-1x512-t5-small]  4.2578 (1.0)     4.2581 (1.0)   4.2537 (1.0)  4.2639 (1.0)  4.6908 (1.0)  4.715 (1.0)  4.6599 (1.0)  4.9391 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-32x128-bert-base-uncased]
Name                                                                                          Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
--------------------------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-32x128-bert-base-uncased]  15.4163 (1.0)    15.3936 (1.0)  15.2689 (1.0)  15.4276 (1.0)  16.5065 (1.0)  16.3632 (1.0)  15.5062 (1.0)  17.2677 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-32x128-t5-small]
Name                                                                                 Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-----------------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-32x128-t5-small]  14.4179 (1.0)    15.3946 (1.0)  14.4036 (1.0)  19.9342 (1.0)  14.2325 (1.0)  14.2204 (1.0)  13.9542 (1.0)  14.4176 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-32x16-bert-base-uncased]
Name                                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
-------------------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-32x16-bert-base-uncased]  4.9203 (1.0)     4.9208 (1.0)   4.9101 (1.0)  4.9326 (1.0)  5.7266 (1.0)  5.7554 (1.0)  5.7071 (1.0)  6.0406 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-32x16-t5-small]
Name                                                                                Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
----------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-32x16-t5-small]  6.1071 (1.0)     6.1075 (1.0)   6.0969 (1.0)  6.1174 (1.0)  7.2146 (1.0)  7.3302 (1.0)  7.1186 (1.0)  8.3696 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-32x256-bert-base-uncased]
Name                                                                                          Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)    Median         Mean           Min            Max
--------------------------------------------------------------------------------------------  ---------------  -------------  -------------  ------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-32x256-bert-base-uncased]  30.1025 (1.0)    30.0363 (1.0)  29.8824 (1.0)  30.124 (1.0)  30.6866 (1.0)  30.8022 (1.0)  29.4177 (1.0)  32.3024 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-32x256-t5-small]
Name                                                                                 Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min           Max
-----------------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  ------------  -------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-32x256-t5-small]  28.1324 (1.0)    28.4962 (1.0)  28.1098 (1.0)  29.2465 (1.0)  28.6987 (1.0)  28.6258 (1.0)  28.163 (1.0)  29.0158 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-32x32-bert-base-uncased]
Name                                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
-------------------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-32x32-bert-base-uncased]  7.1731 (1.0)     7.1731 (1.0)   7.1629 (1.0)  7.1793 (1.0)  7.7877 (1.0)  7.7967 (1.0)  7.6979 (1.0)  7.9585 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-32x32-t5-small]
Name                                                                                Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)     Median        Mean          Min           Max
----------------------------------------------------------------------------------  ---------------  -------------  ------------  -------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-32x32-t5-small]  7.6626 (1.0)     8.2078 (1.0)   7.6544 (1.0)  13.3048 (1.0)  7.9129 (1.0)  8.0205 (1.0)  7.9027 (1.0)  8.3362 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-32x33-bert-base-uncased]
Name                                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-------------------------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-32x33-bert-base-uncased]  59.3889 (1.0)    59.3889 (1.0)  59.3889 (1.0)  59.3889 (1.0)  60.9336 (1.0)  60.9336 (1.0)  60.9336 (1.0)  60.9336 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-32x33-t5-small]
Name                                                                                Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
----------------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-32x33-t5-small]  28.6727 (1.0)    28.5285 (1.0)  26.7868 (1.0)  30.1261 (1.0)  26.8908 (1.0)  26.8007 (1.0)  26.6011 (1.0)  26.9103 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-8x128-bert-base-uncased]
Name                                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean         Min           Max
-------------------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  -----------  ------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-8x128-bert-base-uncased]  5.5378 (1.0)     5.4114 (1.0)   5.1446 (1.0)  5.545 (1.0)   6.2453 (1.0)  6.263 (1.0)  6.1443 (1.0)  6.4142 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-8x128-t5-small]
Name                                                                                Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
----------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-8x128-t5-small]  4.3592 (1.0)     4.3595 (1.0)   4.3551 (1.0)  4.3653 (1.0)  4.7888 (1.0)  4.8116 (1.0)  4.7808 (1.0)  5.0903 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-8x16-bert-base-uncased]
Name                                                                                        Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
------------------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-8x16-bert-base-uncased]  2.2252 (1.0)     2.2249 (1.0)   2.22 (1.0)    2.2303 (1.0)  3.2889 (1.0)  3.3122 (1.0)  3.2477 (1.0)  3.5779 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-8x16-t5-small]
Name                                                                               Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
---------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-8x16-t5-small]  2.3644 (1.0)     2.3642 (1.0)   2.3572 (1.0)  2.3757 (1.0)  3.7685 (1.0)  3.7983 (1.0)  3.7559 (1.0)  4.2363 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-8x256-bert-base-uncased]
Name                                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)     Median         Mean           Min           Max
-------------------------------------------------------------------------------------------  ---------------  -------------  ------------  -------------  -------------  -------------  ------------  -------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-8x256-bert-base-uncased]  9.1689 (1.0)     9.3082 (1.0)   9.1566 (1.0)  10.0731 (1.0)  10.8451 (1.0)  10.9171 (1.0)  9.9227 (1.0)  11.7464 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-8x256-t5-small]
Name                                                                                Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median       Mean          Min           Max
----------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  -----------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-8x256-t5-small]  9.0317 (1.0)     9.0329 (1.0)   9.0245 (1.0)  9.0388 (1.0)  9.113 (1.0)  9.1303 (1.0)  9.0586 (1.0)  9.3384 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-8x32-bert-base-uncased]
Name                                                                                        Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
------------------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-8x32-bert-base-uncased]  2.5866 (1.0)     2.5975 (1.0)   2.5795 (1.0)  2.9286 (1.0)  3.6102 (1.0)  3.6366 (1.0)  3.5959 (1.0)  3.9617 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-8x32-t5-small]
Name                                                                               Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
---------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-8x32-t5-small]  2.8273 (1.0)     3.0447 (1.0)   2.8201 (1.0)  3.6239 (1.0)  3.9768 (1.0)  3.9883 (1.0)  3.8425 (1.0)  4.4401 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-8x33-bert-base-uncased]
Name                                                                                        Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median       Mean           Min            Max
------------------------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -----------  -------------  -------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-8x33-bert-base-uncased]  20.2636 (1.0)    20.6755 (1.0)  20.2199 (1.0)  21.5624 (1.0)  19.52 (1.0)  19.6186 (1.0)  19.5095 (1.0)  19.859 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-8x33-t5-small]
Name                                                                               Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median        Mean          Min           Max
---------------------------------------------------------------------------------  ---------------  -------------  ------------  ------------  ------------  ------------  ------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-8x33-t5-small]  7.7619 (1.0)     7.7634 (1.0)   7.7445 (1.0)  7.7763 (1.0)  8.5961 (1.0)  8.6115 (1.0)  8.5324 (1.0)  8.9139 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-8x384-bert-base-uncased]
Name                                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
-------------------------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  ------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-8x384-bert-base-uncased]  12.417 (1.0)     12.4671 (1.0)  12.4037 (1.0)  12.7693 (1.0)  13.5907 (1.0)  13.3986 (1.0)  12.7419 (1.0)  14.141 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-8x384-t5-small]
Name                                                                                Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
----------------------------------------------------------------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-8x384-t5-small]  13.9284 (1.0)    13.9299 (1.0)  13.9213 (1.0)  13.9448 (1.0)  13.8078 (1.0)  13.7539 (1.0)  13.5392 (1.0)  13.8933 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-8x512-bert-base-uncased]
Name                                                                                         Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)     Median         Mean           Min            Max
-------------------------------------------------------------------------------------------  ---------------  -------------  ------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-8x512-bert-base-uncased]  16.6021 (1.0)    16.606 (1.0)   16.597 (1.0)  16.6164 (1.0)  17.5534 (1.0)  17.2489 (1.0)  16.5861 (1.0)  17.6671 (1.0)

test/test_torchdynamo.py::test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-8x512-t5-small]
Name                                                                                Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)     Median         Mean           Min            Max
----------------------------------------------------------------------------------  ---------------  -------------  ------------  -------------  -------------  -------------  -------------  -------------
test_benchmark_implementations[dynamo_optimizer_cuda_graphs_causal-8x512-t5-small]  19.755 (1.0)     20.3052 (1.0)  19.754 (1.0)  21.9208 (1.0)  19.5504 (1.0)  20.1294 (1.0)  19.4296 (1.0)  21.1606 (1.0)

test/test_torchdynamo.py::test_whisper_hf[optimized-1]
Name                          Median (CUDA)    Mean (CUDA)    Min (CUDA)    Max (CUDA)    Median         Mean           Min            Max
----------------------------  ---------------  -------------  ------------  ------------  -------------  -------------  -------------  -------------
test_whisper_hf[optimized-1]  78.62 (1.0)      78.62 (1.0)    78.62 (1.0)   78.62 (1.0)   79.5812 (1.0)  79.5812 (1.0)  79.5812 (1.0)  79.5812 (1.0)

test/test_torchdynamo.py::test_whisper_hf[optimized-5]
Name                          Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
----------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_whisper_hf[optimized-5]  92.3655 (1.0)    92.3655 (1.0)  92.3655 (1.0)  92.3655 (1.0)  99.6807 (1.0)  99.6807 (1.0)  99.6807 (1.0)  99.6807 (1.0)

test/test_torchdynamo.py::test_whisper_hf[reference-1]
Name                          Median (CUDA)    Mean (CUDA)    Min (CUDA)     Max (CUDA)     Median         Mean           Min            Max
----------------------------  ---------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
test_whisper_hf[reference-1]  96.3776 (1.0)    96.3776 (1.0)  96.3776 (1.0)  96.3776 (1.0)  98.0347 (1.0)  98.0347 (1.0)  98.0347 (1.0)  98.0347 (1.0)

test/test_torchdynamo.py::test_whisper_hf[reference-5]
Name                          Median (CUDA)    Mean (CUDA)     Min (CUDA)      Max (CUDA)      Median         Mean           Min            Max
----------------------------  ---------------  --------------  --------------  --------------  -------------  -------------  -------------  -------------
test_whisper_hf[reference-5]  121.0347 (1.0)   121.0347 (1.0)  121.0347 (1.0)  121.0347 (1.0)  139.759 (1.0)  139.759 (1.0)  139.759 (1.0)  139.759 (1.0)


====================================================================================================== warnings summary ======================================================================================================
test/test_torchdynamo.py: 79 warnings
  /home/geantvert/.local/share/virtualenvs/kernl/lib/python3.9/site-packages/torch/cuda/graphs.py:79: UserWarning: The CUDA Graph is empty. This ususally means that the graph was attempted to be captured on wrong device or stream. (Triggered internally at ../aten/src/ATen/cuda/CUDAGraph.cpp:191.)
    super().capture_end()

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

/mnt/workspace/kernl on feat/triton_2, took 1h 56m 50s, at 23:45:17
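
The `test_whisper_hf` rows above can be turned into a rough speedup figure by dividing the reference timing by the optimized one. The snippet below is only a sketch of that arithmetic: the values are copied from the Median (CUDA) column, and it assumes that the `optimized-N` and `reference-N` runs with the same `N` are directly comparable (the log does not say what `N` stands for). The names `median_cuda` and `speedup` are illustrative and not part of kernl.

```python
# Median (CUDA) values copied from the test_whisper_hf rows in the log above.
# Keys are (variant, N) where N is the test parameter shown in brackets.
median_cuda = {
    ("optimized", 1): 78.62,
    ("optimized", 5): 92.3655,
    ("reference", 1): 96.3776,
    ("reference", 5): 121.0347,
}


def speedup(param):
    """Ratio of reference to optimized median; > 1 means optimized is faster."""
    return median_cuda[("reference", param)] / median_cuda[("optimized", param)]


speedups = {param: speedup(param) for param in (1, 5)}
for param, ratio in speedups.items():
    print(f"test_whisper_hf[*-{param}]: reference / optimized = {ratio:.2f}x")
```

Each whisper row reports identical min, max, mean and median, which suggests a single measurement per case, so these ratios are indicative only.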


@pommedeterresautee pommedeterresautee merged commit 481a7b5 into main Mar 15, 2023
@pommedeterresautee pommedeterresautee deleted the feat/triton_2 branch March 15, 2023 20:12
@pommedeterresautee pommedeterresautee linked an issue Mar 16, 2023 that may be closed by this pull request
2 tasks
Linked issue (may be closed by this pull request): bug: Triton 2.0 makes attention kernel crash