TorchInductor CPU Performance Dashboard #93531

Open
blzheng opened this issue Oct 13, 2022 · 598 comments
Labels: oncall: cpu inductor, oncall: pt2, triaged

blzheng commented Oct 13, 2022

Dashboard to track the performance of torchinductor on CPU.

cc @ezyang @msaroufim @wconstab @bdhirsh @anijain2305 @zou3519 @soumith @ngimel @chauhang

blzheng changed the title from "[CPU] TorchInductor Performance Dashboard" to "TorchInductor CPU Performance Dashboard" on Oct 13, 2022

blzheng commented Oct 13, 2022

Performance Dashboard for float32 precision -- Single-Socket Multi-threads

Executive Summary

We evaluate torchinductor across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an ICX 8375C. Each experiment runs one iteration of the forward pass. For accuracy, we check the numerical correctness of the forward-pass outputs by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We also report the mean compilation latency and the peak memory footprint reduction ratio.
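
The speedup figures in this comment are ratios of native PyTorch latency to torchinductor latency. A minimal sketch of that normalization follows. It assumes latency is taken as the median of a few repeated wall-clock timings, which is an assumption rather than the dashboard's stated method. The helpers `median_latency` and `speedup` and the two stand-in workloads are hypothetical and are not part of the benchmark harness.

```python
import statistics
import time


def median_latency(fn, repeats=5):
    """Return the median wall-clock time (seconds) of calling fn()."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_latency, compiled_latency):
    """Normalize against the baseline: values above 1.0 mean faster."""
    if compiled_latency <= 0:
        raise ValueError("compiled latency must be positive")
    return baseline_latency / compiled_latency


# Stand-in workloads (assumptions): in a real run these would be the
# eager-mode and compiled forward passes of the same model.
def eager_forward():
    return sum(i * i for i in range(20000))


def compiled_forward():
    return sum(i * i for i in range(10000))


ratio = speedup(median_latency(eager_forward), median_latency(compiled_forward))
print(f"speedup: {ratio:.4f}x")
```

A value below 1.0, as in many rows of the per-model tables, means the compiled model ran slower than the baseline.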

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
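
The exclusion step can be sketched as a filter applied before any aggregate is computed. This is an illustration only: the record layout and the `passing_only` helper are hypothetical, and the three sample rows are copied from the torchbench tables in this comment.

```python
# Hypothetical record layout: one dict per model, joining the accuracy
# status with the measured speedup.
results = [
    {"name": "resnet18", "accuracy": "pass", "speedup": 1.1306},
    {"name": "BERT_pytorch", "accuracy": "pass", "speedup": 0.4027},
    {"name": "hf_Reformer", "accuracy": "fail_to_run", "speedup": 0.0},
]


def passing_only(rows):
    """Keep only the models whose accuracy status is exactly 'pass'."""
    return [row for row in rows if row["accuracy"] == "pass"]


kept = passing_only(results)
print([row["name"] for row in kept])  # ['resnet18', 'BERT_pytorch']
```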

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 84%, 43/51 | 98%, 43/44  | 87%, 53/61  |
+----------+------------+-------------+-------------+
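
Each cell pairs a rounded percentage with the raw pass count. A small sketch of that formatting follows, assuming the percentage is rounded to the nearest integer. The `passrate` helper is hypothetical. Under that assumption it reproduces the three cells in the table above.

```python
def passrate(passed, total):
    """Format a pass rate as 'NN%, passed/total'."""
    if total <= 0:
        raise ValueError("total must be positive")
    percent = round(100 * passed / total)
    return f"{percent}%, {passed}/{total}"


for suite, passed, total in [
    ("torchbench", 43, 51),
    ("huggingface", 43, 44),
    ("timm_models", 53, 61),
]:
    print(suite, passrate(passed, total))
```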

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.03x    |    1.08x    |    1.03x    |
+----------+------------+-------------+-------------+
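
Speedups are ratios, so they are summarized with a geometric mean rather than an arithmetic mean: a 2x speedup and a 0.5x slowdown average out to 1.0x. A minimal sketch follows, assuming failed models (recorded as 0.0 in the per-model tables) are skipped before aggregation. `geometric_mean` is a hypothetical helper, not the dashboard's own code.

```python
import math


def geometric_mean(values):
    """Geometric mean of the strictly positive entries; nan if there are none."""
    positive = [v for v in values if v > 0]
    if not positive:
        return float("nan")
    log_sum = sum(math.log(v) for v in positive)
    return math.exp(log_sum / len(positive))


# The 0.0 entry stands for a model that failed to run and is ignored.
print(geometric_mean([2.0, 0.5, 0.0]))
```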

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |    2.23    |    4.47     |    3.75     |
+----------+------------+-------------+-------------+
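
The per-model compilation latency tables contain `nan` for models that did not run and a few negative values, which cannot be real durations. Below is a hedged sketch of computing a mean under the assumption that both kinds of entries are excluded. Whether the dashboard excludes negative entries is not stated, and `mean_compile_time` is a hypothetical helper. The sample values come from the torchbench table.

```python
import math


def mean_compile_time(latencies):
    """Average the valid latencies; return (mean, number_of_skipped_entries)."""
    valid = [t for t in latencies if not math.isnan(t) and t >= 0]
    skipped = len(latencies) - len(valid)
    mean = sum(valid) / len(valid) if valid else float("nan")
    return mean, skipped


# vision_maskrcnn, dcgan, resnet50_quantized_qat, demucs
sample = [11.5683, 0.1053, -0.0596, float("nan")]
mean, skipped = mean_compile_time(sample)
print(f"mean={mean:.4f}s skipped={skipped}")
```

Returning the skipped count alongside the mean makes it visible when an aggregate silently drops entries.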

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |    0.0x    |    0.0x     |    0.0x     |
+----------+------------+-------------+-------------+
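
All three cells read 0.0x, and the per-model tables list `nan`, which suggests peak memory was not captured in this run. For reference, here is a sketch of how such a ratio could be computed. It assumes the ratio is defined as baseline peak memory divided by compiled peak memory, which is consistent with "higher is better" but is not confirmed by this comment. `compression_ratio` is a hypothetical helper.

```python
import math


def compression_ratio(baseline_peak_bytes, compiled_peak_bytes):
    """Return baseline/compiled peak memory, or nan if a measurement is missing."""
    if baseline_peak_bytes is None or compiled_peak_bytes is None:
        return float("nan")
    if compiled_peak_bytes <= 0:
        return float("nan")
    return baseline_peak_bytes / compiled_peak_bytes


print(compression_ratio(200, 100))   # compiled run uses half the memory
print(compression_ratio(None, 100))  # missing measurement
```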

torchbench suite with float32 precision

Performance speedup

+-----------------------------------+------+----------+
|               name                |  bs  | inductor |
+-----------------------------------+------+----------+
|           squeezenet1_1           |  16  |  1.4738  |
|             resnet18              |  8   |  1.1306  |
|              alexnet              | 128  |  1.1131  |
|          vision_maskrcnn          |  1   |   1.1    |
|         soft_actor_critic         | 256  |  1.0935  |
|          resnext50_32x4d          |  8   |  1.0873  |
|        shufflenet_v2_x1_0         |  64  |  1.0814  |
|             resnet50              |  32  |  1.0604  |
|        mobilenet_v3_large         |  32  |  1.053   |
|           timm_resnest            |  32  |  1.0515  |
|           mobilenet_v2            |  16  |  1.0505  |
|            densenet121            |  64  |  1.0445  |
|            mnasnet1_0             |  32  |  1.0392  |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  1.0286  |
|               vgg16               |  4   |  1.0269  |
|            Super_SloMo            |  6   |  1.0227  |
|           pytorch_unet            |  1   |  1.019   |
|               dcgan               | 256  |  1.0135  |
|            timm_vovnet            |  32  |  1.0131  |
|          LearningToPaint          |  96  |  1.0075  |
|       functorch_dp_cifar10        |  64  |  1.0023  |
|      resnet50_quantized_qat       |  32  |  0.9979  |
|    mobilenet_v2_quantized_qat     |  96  |  0.9979  |
|              yolov3               |  8   |  0.995   |
|     detectron2_fcos_r_50_fpn      |  1   |  0.9871  |
|           hf_Longformer           |  1   |  0.9642  |
|            timm_regnet            |  32  |  0.963   |
|        Background_Matting         |  1   |  0.9402  |
|                drq                |  1   |  0.9262  |
|               dlrm                | 2048 |  0.9218  |
|            timm_nfnet             | 128  |  0.8948  |
|           hf_DistilBert           |  1   |  0.8699  |
|               hf_T5               |  1   |  0.8467  |
|             hf_Albert             |  1   |  0.8211  |
|              hf_Bert              |  1   |  0.8087  |
|           fastNLP_Bert            |  1   |  0.7941  |
|              hf_Bart              |  1   |  0.7719  |
|      nvidia_deeprecommender       | 256  |  0.7583  |
|              hf_GPT2              |  1   |  0.7083  |
|         timm_efficientnet         |  64  |  0.6984  |
|      timm_vision_transformer      |  8   |  0.6756  |
| attention_is_all_you_need_pytorch |  32  |  0.5568  |
|           BERT_pytorch            |  2   |  0.4027  |
|           lennard_jones           | 1000 |  0.3608  |
|             tacotron2             |  0   |   0.0    |
|        speech_transformer         |  0   |   0.0    |
|            hf_BigBird             |  0   |   0.0    |
|          pytorch_stargan          |  0   |   0.0    |
|            tts_angular            |  0   |   0.0    |
|              demucs               |  0   |   0.0    |
|            hf_Reformer            |  0   |   0.0    |
+-----------------------------------+------+----------+

Accuracy

+-----------------------------------+-----+-------------+
|               name                | bs  |  inductor   |
+-----------------------------------+-----+-------------+
|           BERT_pytorch            |  2  |    pass     |
|          resnext50_32x4d          |  2  |    pass     |
|        Background_Matting         |  1  |    pass     |
|    mobilenet_v2_quantized_qat     |  2  |    pass     |
|        mobilenet_v3_large         |  2  |    pass     |
|      nvidia_deeprecommender       |  2  |    pass     |
|   pytorch_CycleGAN_and_pix2pix    |  1  |    pass     |
|           pytorch_unet            |  2  |    pass     |
|             resnet18              |  2  |    pass     |
|             resnet50              |  2  |    pass     |
|      resnet50_quantized_qat       |  2  |    pass     |
|        shufflenet_v2_x1_0         |  2  |    pass     |
|           lennard_jones           |  2  |    pass     |
|         soft_actor_critic         | 256 |    pass     |
|           squeezenet1_1           |  2  |    pass     |
|         timm_efficientnet         |  2  |    pass     |
|            timm_nfnet             |  2  |    pass     |
|            timm_regnet            |  2  |    pass     |
|           timm_resnest            |  2  |    pass     |
|      timm_vision_transformer      |  2  |    pass     |
|            timm_vovnet            |  2  |    pass     |
|               vgg16               |  2  |    pass     |
|            mnasnet1_0             |  2  |    pass     |
|           mobilenet_v2            |  2  |    pass     |
|               hf_T5               |  2  |    pass     |
|           fastNLP_Bert            |  2  |    pass     |
|          LearningToPaint          |  2  |    pass     |
|            Super_SloMo            |  2  |    pass     |
|              alexnet              |  2  |    pass     |
| attention_is_all_you_need_pytorch |  2  |    pass     |
|               dcgan               |  2  |    pass     |
|            densenet121            |  2  |    pass     |
|     detectron2_fcos_r_50_fpn      |  2  |    pass     |
|                drq                |  1  |    pass     |
|               dlrm                |  2  |    pass     |
|       functorch_dp_cifar10        |  2  |    pass     |
|             hf_Albert             |  2  |    pass     |
|              hf_Bart              |  2  |    pass     |
|              hf_Bert              |  2  |    pass     |
|           hf_DistilBert           |  2  |    pass     |
|              hf_GPT2              |  2  |    pass     |
|           hf_Longformer           |  2  |    pass     |
|              yolov3               |  2  |    pass     |
|            hf_Reformer            |  2  | fail_to_run |
|        speech_transformer         |  2  | fail_to_run |
|              demucs               |  1  | fail_to_run |
|          pytorch_stargan          | 16  | fail_to_run |
|            hf_BigBird             |  2  | fail_to_run |
|            tts_angular            |  2  | fail_to_run |
|             tacotron2             |  0  |   0.0000    |
|          vision_maskrcnn          |  0  |   0.0000    |
+-----------------------------------+-----+-------------+

Compilation latency (sec)

+-----------------------------------+------+----------+
|               name                |  bs  | inductor |
+-----------------------------------+------+----------+
|          vision_maskrcnn          |  1   | 11.5683  |
|           hf_Longformer           |  1   |  9.2219  |
|     detectron2_fcos_r_50_fpn      |  1   |  9.1967  |
|              yolov3               |  8   |  6.8279  |
|            timm_nfnet             | 128  |  5.4197  |
|            densenet121            |  64  |  4.9745  |
|            Super_SloMo            |  6   |  4.4554  |
| attention_is_all_you_need_pytorch |  32  |  3.6369  |
|              hf_Bart              |  1   |  3.5927  |
|           fastNLP_Bert            |  1   |  3.4844  |
|           BERT_pytorch            |  2   |  3.4361  |
|               hf_T5               |  1   |  3.388   |
|              hf_Bert              |  1   |  2.9421  |
|              hf_GPT2              |  1   |  2.7631  |
|            timm_regnet            |  32  |  2.6082  |
|      timm_vision_transformer      |  8   |  2.5672  |
|             hf_Albert             |  1   |  2.408   |
|         timm_efficientnet         |  64  |  2.0037  |
|        Background_Matting         |  1   |  1.8045  |
|        shufflenet_v2_x1_0         |  64  |  1.5502  |
|            timm_vovnet            |  32  |  1.5498  |
|       functorch_dp_cifar10        |  64  |  1.4493  |
|        mobilenet_v3_large         |  32  |  1.4454  |
|           timm_resnest            |  32  |  1.2875  |
|             resnet50              |  32  |  1.271   |
|          resnext50_32x4d          |  8   |  1.2652  |
|           mobilenet_v2            |  16  |  1.1627  |
|           hf_DistilBert           |  1   |  1.1205  |
|            mnasnet1_0             |  32  |  1.1095  |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  0.7235  |
|           pytorch_unet            |  1   |  0.6441  |
|               dlrm                | 2048 |  0.6379  |
|           squeezenet1_1           |  16  |  0.6199  |
|          LearningToPaint          |  96  |  0.5728  |
|             resnet18              |  8   |  0.505   |
|               vgg16               |  4   |  0.4815  |
|              alexnet              | 128  |  0.3835  |
|                drq                |  1   |  0.3087  |
|      nvidia_deeprecommender       | 256  |  0.2971  |
|           lennard_jones           | 1000 |  0.1949  |
|         soft_actor_critic         | 256  |  0.1944  |
|    mobilenet_v2_quantized_qat     |  96  |  0.1375  |
|               dcgan               | 256  |  0.1053  |
|      resnet50_quantized_qat       |  32  | -0.0596  |
|              demucs               |  0   |   nan    |
|            hf_BigBird             |  0   |   nan    |
|            hf_Reformer            |  0   |   nan    |
|          pytorch_stargan          |  0   |   nan    |
|        speech_transformer         |  0   |   nan    |
|             tacotron2             |  0   |   nan    |
|            tts_angular            |  0   |   nan    |
+-----------------------------------+------+----------+

Peak Memory Compression Ratio

+-----------------------------------+------+----------+
|               name                |  bs  | inductor |
+-----------------------------------+------+----------+
|           BERT_pytorch            |  2   |   nan    |
|        Background_Matting         |  1   |   nan    |
|          LearningToPaint          |  96  |   nan    |
|            Super_SloMo            |  6   |   nan    |
|              alexnet              | 128  |   nan    |
| attention_is_all_you_need_pytorch |  32  |   nan    |
|               dcgan               | 256  |   nan    |
|              demucs               |  0   |   nan    |
|            densenet121            |  64  |   nan    |
|     detectron2_fcos_r_50_fpn      |  1   |   nan    |
|               dlrm                | 2048 |   nan    |
|                drq                |  1   |   nan    |
|           fastNLP_Bert            |  1   |   nan    |
|       functorch_dp_cifar10        |  64  |   nan    |
|             hf_Albert             |  1   |   nan    |
|              hf_Bart              |  1   |   nan    |
|              hf_Bert              |  1   |   nan    |
|            hf_BigBird             |  0   |   nan    |
|           hf_DistilBert           |  1   |   nan    |
|              hf_GPT2              |  1   |   nan    |
|           hf_Longformer           |  1   |   nan    |
|            hf_Reformer            |  0   |   nan    |
|               hf_T5               |  1   |   nan    |
|           lennard_jones           | 1000 |   nan    |
|            mnasnet1_0             |  32  |   nan    |
|           mobilenet_v2            |  16  |   nan    |
|    mobilenet_v2_quantized_qat     |  96  |   nan    |
|        mobilenet_v3_large         |  32  |   nan    |
|      nvidia_deeprecommender       | 256  |   nan    |
|   pytorch_CycleGAN_and_pix2pix    |  1   |   nan    |
|          pytorch_stargan          |  0   |   nan    |
|           pytorch_unet            |  1   |   nan    |
|             resnet18              |  8   |   nan    |
|             resnet50              |  32  |   nan    |
|      resnet50_quantized_qat       |  32  |   nan    |
|          resnext50_32x4d          |  8   |   nan    |
|        shufflenet_v2_x1_0         |  64  |   nan    |
|         soft_actor_critic         | 256  |   nan    |
|        speech_transformer         |  0   |   nan    |
|           squeezenet1_1           |  16  |   nan    |
|             tacotron2             |  0   |   nan    |
|         timm_efficientnet         |  64  |   nan    |
|            timm_nfnet             | 128  |   nan    |
|            timm_regnet            |  32  |   nan    |
|           timm_resnest            |  32  |   nan    |
|      timm_vision_transformer      |  8   |   nan    |
|            timm_vovnet            |  32  |   nan    |
|            tts_angular            |  0   |   nan    |
|               vgg16               |  4   |   nan    |
|          vision_maskrcnn          |  1   |   nan    |
|              yolov3               |  8   |   nan    |
+-----------------------------------+------+----------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|            XLNetLMHeadModel             | 4  |  6.6101  |
|               GoogleFnet                | 1  |  1.2581  |
|     M2M100ForConditionalGeneration      | 2  |  1.2183  |
|             OPTForCausalLM              | 4  |  1.2159  |
|             XGLMForCausalLM             | 1  |  1.1916  |
|          MobileBertForMaskedLM          | 16 |  1.1915  |
|               DistillGPT2               | 1  |  1.1535  |
|           ElectraForCausalLM            | 1  |  1.1114  |
|          AllenaiLongformerBase          | 1  |  1.0992  |
|            YituTechConvBert             | 1  |  1.0688  |
|           RobertaForCausalLM            | 4  |  1.0421  |
|       MT5ForConditionalGeneration       | 2  |  1.0358  |
|     MobileBertForQuestionAnswering      | 32 |  1.0298  |
|         MegatronBertForCausalLM         | 2  |  1.0162  |
|           DebertaForMaskedLM            | 4  |  1.006   |
|            AlbertForMaskedLM            | 2  |  1.0018  |
|       AlbertForQuestionAnswering        | 2  |  0.9888  |
|                CamemBert                | 1  |  0.9838  |
|       DebertaForQuestionAnswering       | 4  |  0.9677  |
|            TrOCRForCausalLM             | 8  |  0.9655  |
|           PegasusForCausalLM            | 8  |  0.9586  |
|            PLBartForCausalLM            | 16 |  0.9538  |
|          DistilBertForMaskedLM          | 16 |  0.9396  |
|     PLBartForConditionalGeneration      | 8  |  0.9342  |
|     PegasusForConditionalGeneration     | 4  |  0.9337  |
|            MBartForCausalLM             | 16 |  0.908   |
|      MBartForConditionalGeneration      | 8  |  0.8992  |
|         Speech2Text2ForCausalLM         | 64 |  0.8708  |
|        BertForQuestionAnswering         | 64 |  0.8658  |
|       RobertaForQuestionAnswering       | 64 |  0.8653  |
|             BertForMaskedLM             | 64 |  0.8638  |
|     DistilBertForQuestionAnswering      | 32 |  0.8587  |
|    MegatronBertForQuestionAnswering     | 8  |  0.8549  |
|                 T5Small                 | 1  |   0.83   |
|    LayoutLMForSequenceClassification    | 16 |  0.8279  |
|           LayoutLMForMaskedLM           | 16 |  0.8273  |
| BlenderbotSmallForConditionalGeneration | 32 |  0.7867  |
|       BlenderbotSmallForCausalLM        | 64 |  0.7799  |
|             BartForCausalLM             | 2  |  0.7795  |
|       ElectraForQuestionAnswering       | 64 |  0.7742  |
|      GPT2ForSequenceClassification      | 4  |  0.7712  |
|       T5ForConditionalGeneration        | 4  |  0.7252  |
|      BartForConditionalGeneration       | 1  |  0.7104  |
|                 BigBird                 | 0  |   0.0    |
+-----------------------------------------+----+----------+

Accuracy

+-----------------------------------------+----+-------------+
|                  name                   | bs |  inductor   |
+-----------------------------------------+----+-------------+
|            AlbertForMaskedLM            | 1  |    pass     |
|       AlbertForQuestionAnswering        | 1  |    pass     |
|      MBartForConditionalGeneration      | 1  |    pass     |
|       MT5ForConditionalGeneration       | 1  |    pass     |
|         MegatronBertForCausalLM         | 1  |    pass     |
|    MegatronBertForQuestionAnswering     | 1  |    pass     |
|          MobileBertForMaskedLM          | 1  |    pass     |
|     MobileBertForQuestionAnswering      | 1  |    pass     |
|             OPTForCausalLM              | 1  |    pass     |
|            PLBartForCausalLM            | 1  |    pass     |
|     PLBartForConditionalGeneration      | 1  |    pass     |
|           PegasusForCausalLM            | 1  |    pass     |
|     PegasusForConditionalGeneration     | 1  |    pass     |
|           RobertaForCausalLM            | 1  |    pass     |
|       RobertaForQuestionAnswering       | 1  |    pass     |
|         Speech2Text2ForCausalLM         | 1  |    pass     |
|       T5ForConditionalGeneration        | 1  |    pass     |
|                 T5Small                 | 1  |    pass     |
|            TrOCRForCausalLM             | 1  |    pass     |
|             XGLMForCausalLM             | 1  |    pass     |
|            XLNetLMHeadModel             | 1  |    pass     |
|            MBartForCausalLM             | 1  |    pass     |
|     M2M100ForConditionalGeneration      | 1  |    pass     |
|    LayoutLMForSequenceClassification    | 1  |    pass     |
|           LayoutLMForMaskedLM           | 1  |    pass     |
|          AllenaiLongformerBase          | 1  |    pass     |
|             BartForCausalLM             | 1  |    pass     |
|      BartForConditionalGeneration       | 1  |    pass     |
|             BertForMaskedLM             | 1  |    pass     |
|        BertForQuestionAnswering         | 1  |    pass     |
|       BlenderbotSmallForCausalLM        | 1  |    pass     |
| BlenderbotSmallForConditionalGeneration | 1  |    pass     |
|                CamemBert                | 1  |    pass     |
|           DebertaForMaskedLM            | 1  |    pass     |
|       DebertaForQuestionAnswering       | 1  |    pass     |
|          DistilBertForMaskedLM          | 1  |    pass     |
|     DistilBertForQuestionAnswering      | 1  |    pass     |
|               DistillGPT2               | 1  |    pass     |
|           ElectraForCausalLM            | 1  |    pass     |
|       ElectraForQuestionAnswering       | 1  |    pass     |
|      GPT2ForSequenceClassification      | 1  |    pass     |
|               GoogleFnet                | 1  |    pass     |
|            YituTechConvBert             | 1  |    pass     |
|                 BigBird                 | 1  | fail_to_run |
+-----------------------------------------+----+-------------+

Compilation latency (sec)

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|          MobileBertForMaskedLM          | 16 | 10.4985  |
|     MobileBertForQuestionAnswering      | 32 | 10.4742  |
|          AllenaiLongformerBase          | 1  | 10.1465  |
|      BartForConditionalGeneration       | 1  |  8.0126  |
|      MBartForConditionalGeneration      | 8  |  7.766   |
|           DebertaForMaskedLM            | 4  |  7.4744  |
|       DebertaForQuestionAnswering       | 4  |  7.3231  |
|     PegasusForConditionalGeneration     | 4  |  7.2929  |
|    MegatronBertForQuestionAnswering     | 8  |  6.6289  |
|         MegatronBertForCausalLM         | 2  |  6.5296  |
|     M2M100ForConditionalGeneration      | 2  |  6.517   |
|             XGLMForCausalLM             | 1  |  5.7003  |
| BlenderbotSmallForConditionalGeneration | 32 |  5.2596  |
|            YituTechConvBert             | 1  |  4.5927  |
|       T5ForConditionalGeneration        | 4  |  4.5008  |
|           LayoutLMForMaskedLM           | 16 |  4.074   |
|       ElectraForQuestionAnswering       | 64 |  4.0555  |
|    LayoutLMForSequenceClassification    | 16 |  3.867   |
|     PLBartForConditionalGeneration      | 8  |  3.8398  |
|             BertForMaskedLM             | 64 |  3.6523  |
|       RobertaForQuestionAnswering       | 64 |  3.5445  |
|                 T5Small                 | 1  |  3.5072  |
|             BartForCausalLM             | 2  |  3.3817  |
|        BertForQuestionAnswering         | 64 |  3.3744  |
|           RobertaForCausalLM            | 4  |  3.3594  |
|      GPT2ForSequenceClassification      | 4  |  3.3011  |
|       MT5ForConditionalGeneration       | 2  |  3.277   |
|            MBartForCausalLM             | 16 |  3.1166  |
|           PegasusForCausalLM            | 8  |  3.0688  |
|                CamemBert                | 1  |  3.0637  |
|           ElectraForCausalLM            | 1  |  3.0246  |
|            TrOCRForCausalLM             | 8  |  2.9159  |
|             OPTForCausalLM              | 4  |  2.8233  |
|       AlbertForQuestionAnswering        | 2  |  2.7299  |
|            AlbertForMaskedLM            | 2  |  2.654   |
|       BlenderbotSmallForCausalLM        | 64 |  2.312   |
|               GoogleFnet                | 1  |  1.8488  |
|          DistilBertForMaskedLM          | 16 |  1.725   |
|     DistilBertForQuestionAnswering      | 32 |  1.6647  |
|            PLBartForCausalLM            | 16 |  1.6417  |
|         Speech2Text2ForCausalLM         | 64 |  1.5846  |
|               DistillGPT2               | 1  |  1.4312  |
|            XLNetLMHeadModel             | 4  | -17.1104 |
|                 BigBird                 | 0  |   nan    |
+-----------------------------------------+----+----------+

Peak Memory Compression Ratio

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|            AlbertForMaskedLM            | 2  |   nan    |
|       AlbertForQuestionAnswering        | 2  |   nan    |
|          AllenaiLongformerBase          | 1  |   nan    |
|             BartForCausalLM             | 2  |   nan    |
|      BartForConditionalGeneration       | 1  |   nan    |
|             BertForMaskedLM             | 64 |   nan    |
|        BertForQuestionAnswering         | 64 |   nan    |
|                 BigBird                 | 0  |   nan    |
|       BlenderbotSmallForCausalLM        | 64 |   nan    |
| BlenderbotSmallForConditionalGeneration | 32 |   nan    |
|                CamemBert                | 1  |   nan    |
|           DebertaForMaskedLM            | 4  |   nan    |
|       DebertaForQuestionAnswering       | 4  |   nan    |
|          DistilBertForMaskedLM          | 16 |   nan    |
|     DistilBertForQuestionAnswering      | 32 |   nan    |
|               DistillGPT2               | 1  |   nan    |
|           ElectraForCausalLM            | 1  |   nan    |
|       ElectraForQuestionAnswering       | 64 |   nan    |
|      GPT2ForSequenceClassification      | 4  |   nan    |
|               GoogleFnet                | 1  |   nan    |
|           LayoutLMForMaskedLM           | 16 |   nan    |
|    LayoutLMForSequenceClassification    | 16 |   nan    |
|     M2M100ForConditionalGeneration      | 2  |   nan    |
|            MBartForCausalLM             | 16 |   nan    |
|      MBartForConditionalGeneration      | 8  |   nan    |
|       MT5ForConditionalGeneration       | 2  |   nan    |
|         MegatronBertForCausalLM         | 2  |   nan    |
|    MegatronBertForQuestionAnswering     | 8  |   nan    |
|          MobileBertForMaskedLM          | 16 |   nan    |
|     MobileBertForQuestionAnswering      | 32 |   nan    |
|             OPTForCausalLM              | 4  |   nan    |
|            PLBartForCausalLM            | 16 |   nan    |
|     PLBartForConditionalGeneration      | 8  |   nan    |
|           PegasusForCausalLM            | 8  |   nan    |
|     PegasusForConditionalGeneration     | 4  |   nan    |
|           RobertaForCausalLM            | 4  |   nan    |
|       RobertaForQuestionAnswering       | 64 |   nan    |
|         Speech2Text2ForCausalLM         | 64 |   nan    |
|       T5ForConditionalGeneration        | 4  |   nan    |
|                 T5Small                 | 1  |   nan    |
|            TrOCRForCausalLM             | 8  |   nan    |
|             XGLMForCausalLM             | 1  |   nan    |
|            XLNetLMHeadModel             | 4  |   nan    |
|            YituTechConvBert             | 1  |   nan    |
+-----------------------------------------+----+----------+

timm_models suite with float32 precision

Performance speedup

+---------------------------------+-----+----------+
|              name               | bs  | inductor |
+---------------------------------+-----+----------+
|          pnasnet5large          | 16  |  1.4097  |
|          inception_v3           | 128 |  1.1948  |
|       gluon_inception_v3        | 128 |  1.1843  |
|        adv_inception_v3         | 128 |  1.1781  |
|        res2net101_26w_4s        | 64  |  1.0904  |
|             dpn107              | 32  |  1.0665  |
|             dla102              | 64  |  1.0544  |
|            repvgg_a2            | 128 |  1.0423  |
|            hrnet_w18            |  2  |  1.0366  |
|        res2net50_14w_8s         |  2  |  1.0365  |
|            gernet_l             | 128 |  1.036   |
|            lcnet_050            | 128 |  1.0349  |
|        ese_vovnet19b_dw         | 128 |  1.0294  |
|      mobilenetv3_large_100      | 128 |  1.0176  |
|         mobilenetv2_100         | 128 |  1.0172  |
|           fbnetc_100            | 128 |  1.0149  |
|           resnest101e           | 32  |  1.0148  |
|          gmixer_24_224          | 64  |  1.0147  |
|          cspdarknet53           | 64  |  1.0142  |
|           mnasnet_100           | 128 |  1.0138  |
|          ghostnet_100           | 128 |  1.0114  |
|          spnasnet_100           | 128 |  1.0066  |
|     swsl_resnext101_32x16d      | 32  |  1.0063  |
|        convmixer_768_32         | 32  |  1.0046  |
|            fbnetv3_b            | 128 |  1.0024  |
|        gluon_xception65         | 32  |  1.0008  |
|           selecsls42b           | 128 |  0.9956  |
|           regnety_002           | 128 |  0.9546  |
|           res2next50            |  2  |  0.9453  |
|         poolformer_m36          | 64  |  0.9104  |
|            nfnet_l0             | 64  |  0.9095  |
|           dm_nfnet_f0           | 128 |  0.9038  |
|      xcit_large_24_p8_224       |  5  |  0.8902  |
|           tf_mixnet_l           | 64  |  0.8771  |
|           volo_d1_224           | 64  |  0.8742  |
|  swin_base_patch4_window7_224   | 64  |  0.8543  |
|      beit_base_patch16_224      | 64  |  0.8432  |
|            mixnet_l             | 64  |  0.8371  |
|           rexnet_100            | 128 |  0.8289  |
|          mixer_b16_224          | 64  |  0.8269  |
|          cait_m36_384           |  2  |  0.8253  |
|          resmlp_12_224          | 128 |  0.8249  |
| deit_base_distilled_patch16_224 | 64  |  0.8189  |
|           convit_base           | 32  |  0.8189  |
|      vit_base_patch16_224       | 64  |  0.8177  |
|         visformer_small         | 128 |  0.8131  |
|        tnt_s_patch16_224        | 64  |  0.7996  |
|          convnext_base          | 32  |  0.7934  |
|         coat_lite_mini          | 128 |  0.7662  |
|       tf_efficientnet_b0        | 128 |  0.7497  |
|           mobilevit_s           | 32  |  0.7402  |
|          jx_nest_base           | 32  |  0.732   |
|        twins_pcpvt_base         | 32  |  0.7223  |
|            pit_b_224            | 64  |  0.7186  |
|            tinynet_a            | 128 |  0.701   |
|         crossvit_9_240          | 64  |  0.6647  |
|          gmlp_s16_224           | 64  |  0.6182  |
|          botnet26t_256          |  0  |   0.0    |
|        eca_halonext26ts         |  0  |   0.0    |
|       eca_botnext26ts_256       |  0  |   0.0    |
|        sebotnet33ts_256         |  0  |   0.0    |
+---------------------------------+-----+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 2  |     pass      |
|          resmlp_12_224          | 2  |     pass      |
|         mobilenetv2_100         | 2  |     pass      |
|      mobilenetv3_large_100      | 2  |     pass      |
|           mobilevit_s           | 2  |     pass      |
|            nfnet_l0             | 2  |     pass      |
|            pit_b_224            | 2  |     pass      |
|          pnasnet5large          | 2  |     pass      |
|         poolformer_m36          | 2  |     pass      |
|           regnety_002           | 2  |     pass      |
|            repvgg_a2            | 2  |     pass      |
|        res2net101_26w_4s        | 2  |     pass      |
|        res2net50_14w_8s         | 2  |     pass      |
|           res2next50            | 2  |     pass      |
|           resnest101e           | 2  |     pass      |
|      beit_base_patch16_224      | 2  |     pass      |
|           rexnet_100            | 2  |     pass      |
|           selecsls42b           | 2  |     pass      |
|          spnasnet_100           | 2  |     pass      |
|     swsl_resnext101_32x16d      | 2  |     pass      |
|       tf_efficientnet_b0        | 2  |     pass      |
|           tf_mixnet_l           | 2  |     pass      |
|            tinynet_a            | 2  |     pass      |
|        tnt_s_patch16_224        | 2  |     pass      |
|        twins_pcpvt_base         | 2  |     pass      |
|         visformer_small         | 2  |     pass      |
|      vit_base_patch16_224       | 2  |     pass      |
|           volo_d1_224           | 2  |     pass      |
|           mnasnet_100           | 2  |     pass      |
|            mixnet_l             | 2  |     pass      |
|          mixer_b16_224          | 2  |     pass      |
|        ese_vovnet19b_dw         | 2  |     pass      |
|         coat_lite_mini          | 2  |     pass      |
|           convit_base           | 2  |     pass      |
|        convmixer_768_32         | 2  |     pass      |
|          convnext_base          | 2  |     pass      |
|         crossvit_9_240          | 2  |     pass      |
|          cspdarknet53           | 2  |     pass      |
| deit_base_distilled_patch16_224 | 2  |     pass      |
|             dla102              | 2  |     pass      |
|           dm_nfnet_f0           | 2  |     pass      |
|            lcnet_050            | 2  |     pass      |
|      xcit_large_24_p8_224       | 2  |     pass      |
|           fbnetc_100            | 2  |     pass      |
|       gluon_inception_v3        | 2  |     pass      |
|          jx_nest_base           | 2  |     pass      |
|          inception_v3           | 2  |     pass      |
|            hrnet_w18            | 2  |     pass      |
|          gmlp_s16_224           | 2  |     pass      |
|          gmixer_24_224          | 2  |     pass      |
|        gluon_xception65         | 2  |     pass      |
|            gernet_l             | 2  |     pass      |
|            fbnetv3_b            | 2  |     pass      |
|        eca_halonext26ts         | 2  |  fail_to_run  |
|        sebotnet33ts_256         | 2  |  fail_to_run  |
|          botnet26t_256          | 2  |  fail_to_run  |
|       eca_botnext26ts_256       | 2  |  fail_to_run  |
|  swin_base_patch4_window7_224   | 2  | fail_accuracy |
|          ghostnet_100           | 2  | fail_accuracy |
|             dpn107              | 2  | fail_accuracy |
|          cait_m36_384           | 2  | fail_accuracy |
+---------------------------------+----+---------------+

Compilation latency (sec)

+---------------------------------+-----+----------+
|              name               | bs  | inductor |
+---------------------------------+-----+----------+
|  swin_base_patch4_window7_224   | 64  | 10.7024  |
|          cait_m36_384           |  2  | 10.2615  |
|      xcit_large_24_p8_224       |  5  |  9.9294  |
|         poolformer_m36          | 64  |  8.9775  |
|          jx_nest_base           | 32  |  8.5136  |
|            hrnet_w18            |  2  |  8.3372  |
|        twins_pcpvt_base         | 32  |  8.1557  |
|          pnasnet5large          | 16  |  6.6776  |
|        tnt_s_patch16_224        | 64  |  6.528   |
|          gmlp_s16_224           | 64  |  6.4682  |
|           tf_mixnet_l           | 64  |  6.137   |
|           volo_d1_224           | 64  |  6.1065  |
|           mobilevit_s           | 32  |  6.0721  |
|           dm_nfnet_f0           | 128 |  5.7608  |
|            mixnet_l             | 64  |  5.7029  |
|         crossvit_9_240          | 64  |  4.9915  |
|          convnext_base          | 32  |  4.9459  |
|        res2net101_26w_4s        | 64  |  4.7408  |
|            pit_b_224            | 64  |  4.6434  |
|             dpn107              | 32  |  4.4485  |
|         coat_lite_mini          | 128 |  4.308   |
|        res2net50_14w_8s         |  2  |  4.2332  |
|            fbnetv3_b            | 128 |  3.9204  |
|           convit_base           | 32  |  3.9035  |
|      beit_base_patch16_224      | 64  |  3.7148  |
|          gmixer_24_224          | 64  |  3.0949  |
| deit_base_distilled_patch16_224 | 64  |  3.0859  |
|         visformer_small         | 128 |  3.0531  |
|            nfnet_l0             | 64  |  2.9973  |
|            tinynet_a            | 128 |  2.9481  |
|      vit_base_patch16_224       | 64  |  2.888   |
|       tf_efficientnet_b0        | 128 |  2.8524  |
|           rexnet_100            | 128 |  2.8057  |
|          mixer_b16_224          | 64  |  2.6488  |
|           res2next50            |  2  |  2.4974  |
|          cspdarknet53           | 64  |  2.362   |
|          ghostnet_100           | 128 |  2.1989  |
|       gluon_inception_v3        | 128 |  2.1866  |
|        gluon_xception65         | 32  |  2.0955  |
|             dla102              | 64  |   2.09   |
|           fbnetc_100            | 128 |  1.8692  |
|          spnasnet_100           | 128 |  1.8467  |
|           regnety_002           | 128 |  1.8316  |
|          resmlp_12_224          | 128 |  1.8239  |
|        convmixer_768_32         | 32  |  1.8029  |
|            gernet_l             | 128 |  1.718   |
|        adv_inception_v3         | 128 |  1.704   |
|          inception_v3           | 128 |  1.6711  |
|      mobilenetv3_large_100      | 128 |  1.6126  |
|            repvgg_a2            | 128 |  1.5339  |
|           mnasnet_100           | 128 |  1.5139  |
|        ese_vovnet19b_dw         | 128 |  1.3457  |
|         mobilenetv2_100         | 128 |  1.2205  |
|     swsl_resnext101_32x16d      | 32  |  1.1662  |
|           selecsls42b           | 128 |  0.9562  |
|            lcnet_050            | 128 |  0.9299  |
|           resnest101e           | 32  | -1.6045  |
|          botnet26t_256          |  0  |   nan    |
|       eca_botnext26ts_256       |  0  |   nan    |
|        eca_halonext26ts         |  0  |   nan    |
|        sebotnet33ts_256         |  0  |   nan    |
+---------------------------------+-----+----------+

Peak Memory Compression Ratio

+---------------------------------+-----+----------+
|              name               | bs  | inductor |
+---------------------------------+-----+----------+
|        adv_inception_v3         | 128 |   nan    |
|      beit_base_patch16_224      | 64  |   nan    |
|          botnet26t_256          |  0  |   nan    |
|          cait_m36_384           |  2  |   nan    |
|         coat_lite_mini          | 128 |   nan    |
|           convit_base           | 32  |   nan    |
|        convmixer_768_32         | 32  |   nan    |
|          convnext_base          | 32  |   nan    |
|         crossvit_9_240          | 64  |   nan    |
|          cspdarknet53           | 64  |   nan    |
| deit_base_distilled_patch16_224 | 64  |   nan    |
|             dla102              | 64  |   nan    |
|           dm_nfnet_f0           | 128 |   nan    |
|             dpn107              | 32  |   nan    |
|       eca_botnext26ts_256       |  0  |   nan    |
|        eca_halonext26ts         |  0  |   nan    |
|        ese_vovnet19b_dw         | 128 |   nan    |
|           fbnetc_100            | 128 |   nan    |
|            fbnetv3_b            | 128 |   nan    |
|            gernet_l             | 128 |   nan    |
|          ghostnet_100           | 128 |   nan    |
|       gluon_inception_v3        | 128 |   nan    |
|        gluon_xception65         | 32  |   nan    |
|          gmixer_24_224          | 64  |   nan    |
|          gmlp_s16_224           | 64  |   nan    |
|            hrnet_w18            |  2  |   nan    |
|          inception_v3           | 128 |   nan    |
|          jx_nest_base           | 32  |   nan    |
|            lcnet_050            | 128 |   nan    |
|          mixer_b16_224          | 64  |   nan    |
|            mixnet_l             | 64  |   nan    |
|           mnasnet_100           | 128 |   nan    |
|         mobilenetv2_100         | 128 |   nan    |
|      mobilenetv3_large_100      | 128 |   nan    |
|           mobilevit_s           | 32  |   nan    |
|            nfnet_l0             | 64  |   nan    |
|            pit_b_224            | 64  |   nan    |
|          pnasnet5large          | 16  |   nan    |
|         poolformer_m36          | 64  |   nan    |
|           regnety_002           | 128 |   nan    |
|            repvgg_a2            | 128 |   nan    |
|        res2net101_26w_4s        | 64  |   nan    |
|        res2net50_14w_8s         |  2  |   nan    |
|           res2next50            |  2  |   nan    |
|          resmlp_12_224          | 128 |   nan    |
|           resnest101e           | 32  |   nan    |
|           rexnet_100            | 128 |   nan    |
|        sebotnet33ts_256         |  0  |   nan    |
|           selecsls42b           | 128 |   nan    |
|          spnasnet_100           | 128 |   nan    |
|  swin_base_patch4_window7_224   | 64  |   nan    |
|     swsl_resnext101_32x16d      | 32  |   nan    |
|       tf_efficientnet_b0        | 128 |   nan    |
|           tf_mixnet_l           | 64  |   nan    |
|            tinynet_a            | 128 |   nan    |
|        tnt_s_patch16_224        | 64  |   nan    |
|        twins_pcpvt_base         | 32  |   nan    |
|         visformer_small         | 128 |   nan    |
|      vit_base_patch16_224       | 64  |   nan    |
|           volo_d1_224           | 64  |   nan    |
|      xcit_large_24_p8_224       |  5  |   nan    |
+---------------------------------+-----+----------+

@blzheng
Collaborator Author

blzheng commented Oct 13, 2022

Performance Dashboard for float32 precision -- Single-core Single-thread

Executive Summary

We evaluate TorchInductor across three benchmark suites: torchbench, huggingface, and timm. The experiments run on an Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz. Each experiment runs one iteration of the forward pass. For accuracy, we check the numerical correctness of the forward-pass outputs by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
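
As a rough illustration of the speedup metric described in the summary above (baseline latency divided by compiled latency), here is a minimal sketch in plain Python. The helper names `time_call` and `speedup` are hypothetical and are not part of the benchmark scripts. Taking the median over several repeats is an assumption made for stability; the dashboard itself states that each experiment runs one iteration of the forward pass.

```python
import time
from statistics import median


def time_call(fn, repeats=5):
    """Return the median wall-clock time in seconds of calling fn() `repeats` times."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return median(samples)


def speedup(baseline_fn, candidate_fn, repeats=5):
    """Speedup of candidate over baseline: values above 1.0 mean the candidate is faster."""
    return time_call(baseline_fn, repeats) / time_call(candidate_fn, repeats)


if __name__ == "__main__":
    # Stand-ins for "native PyTorch forward" and "inductor-compiled forward".
    eager_forward = lambda: time.sleep(0.02)
    compiled_forward = lambda: time.sleep(0.01)
    print(f"{speedup(eager_forward, compiled_forward):.2f}x")
```

With this convention, a table entry such as 0.7655 means the compiled model took longer than the native baseline.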

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 87%, 47/54 | 93%, 41/44  | 87%, 53/61  |
+----------+------------+-------------+-------------+
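
The per-suite tables in this comment suggest how these figures might be derived; the sketch below is one reading, not the dashboard's confirmed formula. It counts a model as passing when its accuracy status is `pass` and its speedup entry is nonzero. That reading matches the counts here: the torchbench accuracy table has 47 `pass` rows out of 54, and the huggingface tables have 43 `pass` rows but only 41 models with a nonzero speedup out of 44. The function name `pass_rate` is hypothetical.

```python
def pass_rate(accuracy, speedup):
    """Return (passed, total, label) for one suite.

    accuracy: dict mapping model name -> status string such as "pass" or "fail_to_run".
    speedup:  dict mapping model name -> measured speedup (0.0 when the run failed).
    Assumption: a model counts as passing only if it passes accuracy AND ran for performance.
    """
    total = len(accuracy)
    if total == 0:
        return 0, 0, "n/a"
    passed = sum(
        1
        for name, status in accuracy.items()
        if status == "pass" and speedup.get(name, 0.0) > 0.0
    )
    return passed, total, f"{round(100 * passed / total)}%, {passed}/{total}"


if __name__ == "__main__":
    # Three rows copied from the huggingface tables in this comment.
    accuracy = {
        "XLNetLMHeadModel": "pass",
        "MT5ForConditionalGeneration": "pass",
        "BigBird": "fail_to_run",
    }
    speedup = {
        "XLNetLMHeadModel": 0.9522,
        "MT5ForConditionalGeneration": 0.0,
        "BigBird": 0.0,
    }
    print(pass_rate(accuracy, speedup))
```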

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.05x    |    1.00x    |    1.04x    |
+----------+------------+-------------+-------------+
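
For reference, a geometric mean over per-model speedups can be sketched as below. Models that failed to run (speedup 0.0 or nan) are skipped. The optional `floor` parameter is an assumption: the huggingface summary reports 1.00x even though every listed huggingface speedup is below 1, which would be consistent with slowdowns being clamped to 1.0 before averaging. The function name `geomean_speedup` is hypothetical.

```python
import math


def geomean_speedup(speedups, floor=None):
    """Geometric mean of the positive, non-nan speedups.

    floor: if given, each speedup is raised to at least this value first
           (assumed clamping; pass None for a plain geometric mean).
    Returns nan when no valid entries remain.
    """
    valid = [s for s in speedups if not math.isnan(s) and s > 0.0]
    if floor is not None:
        valid = [max(s, floor) for s in valid]
    if not valid:
        return float("nan")
    # exp(mean(log(x))) avoids overflow from multiplying many values together.
    return math.exp(sum(math.log(s) for s in valid) / len(valid))


if __name__ == "__main__":
    # A few huggingface rows from this comment; 0.0 marks a model that did not run.
    sample = [0.9522, 0.8971, 0.43, 0.0]
    print(f"plain:   {geomean_speedup(sample):.2f}x")
    print(f"clamped: {geomean_speedup(sample, floor=1.0):.2f}x")
```

The clamped variant treats a slower compiled model as if it fell back to the baseline, so it never drops below 1.0.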

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   17.67    |    21.57    |    37.83    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |    0.0x    |    0.0x     |    0.0x     |
+----------+------------+-------------+-------------+

torchbench suite with float32 precision

Performance speedup

+-----------------------------------+------+----------+
|               name                |  bs  | inductor |
+-----------------------------------+------+----------+
|           squeezenet1_1           |  16  |  2.0426  |
|       functorch_dp_cifar10        |  64  |  1.3211  |
|        shufflenet_v2_x1_0         |  64  |  1.2811  |
|           timm_resnest            |  32  |  1.2005  |
|              alexnet              | 128  |  1.1775  |
|             resnet18              |  8   |  1.1458  |
|          vision_maskrcnn          |  1   |  1.0896  |
|               vgg16               |  4   |  1.0833  |
|            densenet121            |  64  |  1.0796  |
|             resnet50              |  32  |  1.0712  |
|          resnext50_32x4d          |  8   |  1.0589  |
|            timm_vovnet            |  32  |  1.0577  |
|            Super_SloMo            |  6   |  1.053   |
|                drq                |  1   |  1.0514  |
|              yolov3               |  8   |  1.0338  |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  1.0336  |
|         soft_actor_critic         | 256  |  1.0149  |
|          LearningToPaint          |  96  |  1.0088  |
|            timm_regnet            |  32  |  1.004   |
|           pytorch_unet            |  1   |  1.0029  |
| attention_is_all_you_need_pytorch |  32  |  1.0016  |
|               dlrm                | 2048 |  1.0014  |
|      resnet50_quantized_qat       |  32  |  1.0002  |
|    mobilenet_v2_quantized_qat     |  96  |   1.0    |
|               dcgan               | 256  |  0.9906  |
|        mobilenet_v3_large         |  32  |  0.9839  |
|          pytorch_stargan          |  16  |  0.9812  |
|           mobilenet_v2            |  16  |  0.9542  |
|            mnasnet1_0             |  32  |  0.9527  |
|        Background_Matting         |  1   |  0.951   |
|   timm_vision_transformer_large   |  8   |  0.7949  |
|           BERT_pytorch            |  2   |  0.7655  |
|           hf_Longformer           |  1   |  0.7648  |
|              hf_GPT2              |  1   |  0.7537  |
|            timm_nfnet             | 128  |  0.7182  |
|      nvidia_deeprecommender       | 256  |  0.6899  |
|           hf_DistilBert           |  1   |  0.6658  |
|             hf_Albert             |  1   |  0.6541  |
|            hf_T5_large            |  1   |  0.6501  |
|              hf_Bert              |  1   |  0.6301  |
|           fastNLP_Bert            |  1   |  0.5986  |
|              hf_Bart              |  1   |  0.5983  |
|           hf_GPT2_large           |  1   |  0.5679  |
|      timm_vision_transformer      |  8   |  0.5591  |
|               hf_T5               |  1   |  0.5566  |
|            hf_T5_base             |  1   |  0.4761  |
|         timm_efficientnet         |  64  |  0.4639  |
|           lennard_jones           | 1000 |  0.284   |
|            hf_BigBird             |  0   |   0.0    |
|            hf_Reformer            |  0   |   0.0    |
|            tts_angular            |  0   |   0.0    |
|     detectron2_fcos_r_50_fpn      |  0   |   0.0    |
|              demucs               |  0   |   0.0    |
|        speech_transformer         |  0   |   0.0    |
+-----------------------------------+------+----------+

Accuracy

+-----------------------------------+-----+-------------+
|               name                | bs  |  inductor   |
+-----------------------------------+-----+-------------+
|           BERT_pytorch            |  2  |    pass     |
|          resnext50_32x4d          |  2  |    pass     |
|           mobilenet_v2            |  2  |    pass     |
|    mobilenet_v2_quantized_qat     |  2  |    pass     |
|        mobilenet_v3_large         |  2  |    pass     |
|      nvidia_deeprecommender       |  2  |    pass     |
|   pytorch_CycleGAN_and_pix2pix    |  1  |    pass     |
|          pytorch_stargan          | 16  |    pass     |
|           pytorch_unet            |  2  |    pass     |
|             resnet18              |  2  |    pass     |
|             resnet50              |  2  |    pass     |
|      resnet50_quantized_qat       |  2  |    pass     |
|        shufflenet_v2_x1_0         |  2  |    pass     |
|           lennard_jones           |  2  |    pass     |
|         soft_actor_critic         | 256 |    pass     |
|           squeezenet1_1           |  2  |    pass     |
|         timm_efficientnet         |  2  |    pass     |
|            timm_nfnet             |  2  |    pass     |
|            timm_regnet            |  2  |    pass     |
|           timm_resnest            |  2  |    pass     |
|      timm_vision_transformer      |  2  |    pass     |
|   timm_vision_transformer_large   |  2  |    pass     |
|            timm_vovnet            |  2  |    pass     |
|               vgg16               |  2  |    pass     |
|        Background_Matting         |  1  |    pass     |
|            mnasnet1_0             |  2  |    pass     |
|            hf_T5_large            |  2  |    pass     |
|       functorch_dp_cifar10        |  2  |    pass     |
|          LearningToPaint          |  2  |    pass     |
|            Super_SloMo            |  2  |    pass     |
|              alexnet              |  2  |    pass     |
| attention_is_all_you_need_pytorch |  2  |    pass     |
|               dcgan               |  2  |    pass     |
|            densenet121            |  2  |    pass     |
|               dlrm                |  2  |    pass     |
|            hf_T5_base             |  2  |    pass     |
|           fastNLP_Bert            |  2  |    pass     |
|                drq                |  1  |    pass     |
|             hf_Albert             |  2  |    pass     |
|              hf_Bart              |  2  |    pass     |
|              hf_Bert              |  2  |    pass     |
|           hf_DistilBert           |  2  |    pass     |
|              hf_GPT2              |  2  |    pass     |
|           hf_GPT2_large           |  2  |    pass     |
|           hf_Longformer           |  2  |    pass     |
|               hf_T5               |  2  |    pass     |
|              yolov3               |  2  |    pass     |
|            hf_BigBird             |  2  | fail_to_run |
|     detectron2_fcos_r_50_fpn      |  2  | fail_to_run |
|              demucs               |  1  | fail_to_run |
|            hf_Reformer            |  2  | fail_to_run |
|            tts_angular            |  2  | fail_to_run |
|        speech_transformer         |  0  |   0.0000    |
|          vision_maskrcnn          |  0  |   0.0000    |
+-----------------------------------+-----+-------------+

Compilation latency (sec)

+-----------------------------------+------+----------+
|               name                |  bs  | inductor |
+-----------------------------------+------+----------+
|            hf_T5_base             |  1   | 110.8172 |
|            timm_nfnet             | 128  | 89.3988  |
|            densenet121            |  64  | 83.6673  |
|           hf_GPT2_large           |  1   | 68.8979  |
|   timm_vision_transformer_large   |  8   | 54.2422  |
|            hf_T5_large            |  1   | 49.4562  |
|         timm_efficientnet         |  64  | 39.8036  |
|          vision_maskrcnn          |  1   | 39.3907  |
|            Super_SloMo            |  6   | 34.6477  |
|        Background_Matting         |  1   | 26.0698  |
|              yolov3               |  8   | 25.8651  |
|           pytorch_unet            |  1   | 20.5692  |
|        mobilenet_v3_large         |  32  |  18.621  |
|            timm_regnet            |  32  | 16.8904  |
|      timm_vision_transformer      |  8   | 13.7887  |
|           hf_Longformer           |  1   | 12.8186  |
|        shufflenet_v2_x1_0         |  64  | 11.4008  |
|              hf_Bart              |  1   | 10.0706  |
|           mobilenet_v2            |  16  |  9.776   |
|            timm_vovnet            |  32  |  9.7671  |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  9.2664  |
|          LearningToPaint          |  96  |  9.0346  |
|          pytorch_stargan          |  16  |  8.9297  |
|               hf_T5               |  1   |  8.3078  |
|           fastNLP_Bert            |  1   |  7.3867  |
|           timm_resnest            |  32  |  7.3225  |
|               vgg16               |  4   |  7.1152  |
|             resnet50              |  32  |  6.4957  |
|            mnasnet1_0             |  32  |  6.1132  |
| attention_is_all_you_need_pytorch |  32  |  5.6499  |
|           squeezenet1_1           |  16  |  5.4609  |
|             hf_Albert             |  1   |   5.4    |
|              hf_Bert              |  1   |  5.222   |
|              hf_GPT2              |  1   |  4.8789  |
|              alexnet              | 128  |  4.5239  |
|           BERT_pytorch            |  2   |  4.098   |
|           hf_DistilBert           |  1   |  3.285   |
|             resnet18              |  8   |  2.9366  |
|               dcgan               | 256  |  2.7928  |
|      nvidia_deeprecommender       | 256  |  2.6756  |
|       functorch_dp_cifar10        |  64  |  2.4487  |
|          resnext50_32x4d          |  8   |  1.7993  |
|               dlrm                | 2048 |  1.6011  |
|                drq                |  1   |  0.4278  |
|         soft_actor_critic         | 256  |  0.2726  |
|           lennard_jones           | 1000 |  0.2698  |
|    mobilenet_v2_quantized_qat     |  96  |  0.1607  |
|      resnet50_quantized_qat       |  32  |  0.1281  |
|              demucs               |  0   |   nan    |
|     detectron2_fcos_r_50_fpn      |  0   |   nan    |
|            hf_BigBird             |  0   |   nan    |
|            hf_Reformer            |  0   |   nan    |
|        speech_transformer         |  0   |   nan    |
|            tts_angular            |  0   |   nan    |
+-----------------------------------+------+----------+

Peak Memory Compression Ratio

+-----------------------------------+------+----------+
|               name                |  bs  | inductor |
+-----------------------------------+------+----------+
|           BERT_pytorch            |  2   |   nan    |
|        Background_Matting         |  1   |   nan    |
|          LearningToPaint          |  96  |   nan    |
|            Super_SloMo            |  6   |   nan    |
|              alexnet              | 128  |   nan    |
| attention_is_all_you_need_pytorch |  32  |   nan    |
|               dcgan               | 256  |   nan    |
|              demucs               |  0   |   nan    |
|            densenet121            |  64  |   nan    |
|     detectron2_fcos_r_50_fpn      |  0   |   nan    |
|               dlrm                | 2048 |   nan    |
|                drq                |  1   |   nan    |
|           fastNLP_Bert            |  1   |   nan    |
|       functorch_dp_cifar10        |  64  |   nan    |
|             hf_Albert             |  1   |   nan    |
|              hf_Bart              |  1   |   nan    |
|              hf_Bert              |  1   |   nan    |
|            hf_BigBird             |  0   |   nan    |
|           hf_DistilBert           |  1   |   nan    |
|              hf_GPT2              |  1   |   nan    |
|           hf_GPT2_large           |  1   |   nan    |
|           hf_Longformer           |  1   |   nan    |
|            hf_Reformer            |  0   |   nan    |
|               hf_T5               |  1   |   nan    |
|            hf_T5_base             |  1   |   nan    |
|            hf_T5_large            |  1   |   nan    |
|           lennard_jones           | 1000 |   nan    |
|            mnasnet1_0             |  32  |   nan    |
|           mobilenet_v2            |  16  |   nan    |
|    mobilenet_v2_quantized_qat     |  96  |   nan    |
|        mobilenet_v3_large         |  32  |   nan    |
|      nvidia_deeprecommender       | 256  |   nan    |
|   pytorch_CycleGAN_and_pix2pix    |  1   |   nan    |
|          pytorch_stargan          |  16  |   nan    |
|           pytorch_unet            |  1   |   nan    |
|             resnet18              |  8   |   nan    |
|             resnet50              |  32  |   nan    |
|      resnet50_quantized_qat       |  32  |   nan    |
|          resnext50_32x4d          |  8   |   nan    |
|        shufflenet_v2_x1_0         |  64  |   nan    |
|         soft_actor_critic         | 256  |   nan    |
|        speech_transformer         |  0   |   nan    |
|           squeezenet1_1           |  16  |   nan    |
|         timm_efficientnet         |  64  |   nan    |
|            timm_nfnet             | 128  |   nan    |
|            timm_regnet            |  32  |   nan    |
|           timm_resnest            |  32  |   nan    |
|      timm_vision_transformer      |  8   |   nan    |
|   timm_vision_transformer_large   |  8   |   nan    |
|            timm_vovnet            |  32  |   nan    |
|            tts_angular            |  0   |   nan    |
|               vgg16               |  4   |   nan    |
|          vision_maskrcnn          |  1   |   nan    |
|              yolov3               |  8   |   nan    |
+-----------------------------------+------+----------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|            XLNetLMHeadModel             | 4  |  0.9522  |
|       AlbertForQuestionAnswering        | 2  |  0.8971  |
|            AlbertForMaskedLM            | 2  |  0.8924  |
|       DebertaForQuestionAnswering       | 4  |  0.8817  |
|     MobileBertForQuestionAnswering      | 32 |  0.8509  |
|     M2M100ForConditionalGeneration      | 2  |  0.8381  |
|               GoogleFnet                | 1  |  0.8183  |
|           DebertaForMaskedLM            | 4  |  0.8167  |
|             OPTForCausalLM              | 4  |  0.8164  |
|            YituTechConvBert             | 1  |  0.7934  |
|             XGLMForCausalLM             | 1  |  0.7806  |
|    MegatronBertForQuestionAnswering     | 8  |  0.7733  |
|     PegasusForConditionalGeneration     | 4  |  0.7671  |
|      MBartForConditionalGeneration      | 8  |   0.76   |
|            TrOCRForCausalLM             | 8  |  0.7563  |
|           PegasusForCausalLM            | 8  |  0.7555  |
|            MBartForCausalLM             | 16 |  0.7508  |
|          MobileBertForMaskedLM          | 16 |  0.7499  |
|          AllenaiLongformerBase          | 1  |  0.7458  |
|       RobertaForQuestionAnswering       | 64 |  0.739   |
|     DistilBertForQuestionAnswering      | 32 |  0.7389  |
|        BertForQuestionAnswering         | 64 |  0.7387  |
|           RobertaForCausalLM            | 4  |  0.7317  |
|     PLBartForConditionalGeneration      | 8  |  0.7185  |
|             BertForMaskedLM             | 64 |  0.7183  |
|          DistilBertForMaskedLM          | 16 |  0.7094  |
|            PLBartForCausalLM            | 16 |  0.6941  |
|         Speech2Text2ForCausalLM         | 64 |  0.6843  |
|    LayoutLMForSequenceClassification    | 16 |  0.6161  |
|               DistillGPT2               | 1  |  0.6031  |
|                CamemBert                | 1  |  0.602   |
|           LayoutLMForMaskedLM           | 16 |  0.5932  |
| BlenderbotSmallForConditionalGeneration | 32 |  0.5907  |
|       BlenderbotSmallForCausalLM        | 64 |  0.5839  |
|             BartForCausalLM             | 2  |  0.5709  |
|      BartForConditionalGeneration       | 1  |  0.5091  |
|      GPT2ForSequenceClassification      | 4  |  0.493   |
|       ElectraForQuestionAnswering       | 64 |   0.47   |
|       T5ForConditionalGeneration        | 4  |  0.4668  |
|                 T5Small                 | 1  |  0.4642  |
|           ElectraForCausalLM            | 1  |   0.43   |
|         MegatronBertForCausalLM         | 0  |   0.0    |
|                 BigBird                 | 0  |   0.0    |
|       MT5ForConditionalGeneration       | 0  |   0.0    |
+-----------------------------------------+----+----------+

Accuracy

+-----------------------------------------+----+-------------+
|                  name                   | bs |  inductor   |
+-----------------------------------------+----+-------------+
|            AlbertForMaskedLM            | 1  |    pass     |
|       AlbertForQuestionAnswering        | 1  |    pass     |
|      MBartForConditionalGeneration      | 1  |    pass     |
|       MT5ForConditionalGeneration       | 1  |    pass     |
|         MegatronBertForCausalLM         | 1  |    pass     |
|    MegatronBertForQuestionAnswering     | 1  |    pass     |
|          MobileBertForMaskedLM          | 1  |    pass     |
|     MobileBertForQuestionAnswering      | 1  |    pass     |
|             OPTForCausalLM              | 1  |    pass     |
|            PLBartForCausalLM            | 1  |    pass     |
|     PLBartForConditionalGeneration      | 1  |    pass     |
|           PegasusForCausalLM            | 1  |    pass     |
|     PegasusForConditionalGeneration     | 1  |    pass     |
|           RobertaForCausalLM            | 1  |    pass     |
|       RobertaForQuestionAnswering       | 1  |    pass     |
|         Speech2Text2ForCausalLM         | 1  |    pass     |
|       T5ForConditionalGeneration        | 1  |    pass     |
|                 T5Small                 | 1  |    pass     |
|            TrOCRForCausalLM             | 1  |    pass     |
|             XGLMForCausalLM             | 1  |    pass     |
|            XLNetLMHeadModel             | 1  |    pass     |
|            MBartForCausalLM             | 1  |    pass     |
|     M2M100ForConditionalGeneration      | 1  |    pass     |
|    LayoutLMForSequenceClassification    | 1  |    pass     |
|           LayoutLMForMaskedLM           | 1  |    pass     |
|          AllenaiLongformerBase          | 1  |    pass     |
|             BartForCausalLM             | 1  |    pass     |
|      BartForConditionalGeneration       | 1  |    pass     |
|             BertForMaskedLM             | 1  |    pass     |
|        BertForQuestionAnswering         | 1  |    pass     |
|       BlenderbotSmallForCausalLM        | 1  |    pass     |
| BlenderbotSmallForConditionalGeneration | 1  |    pass     |
|                CamemBert                | 1  |    pass     |
|           DebertaForMaskedLM            | 1  |    pass     |
|       DebertaForQuestionAnswering       | 1  |    pass     |
|          DistilBertForMaskedLM          | 1  |    pass     |
|     DistilBertForQuestionAnswering      | 1  |    pass     |
|               DistillGPT2               | 1  |    pass     |
|           ElectraForCausalLM            | 1  |    pass     |
|       ElectraForQuestionAnswering       | 1  |    pass     |
|      GPT2ForSequenceClassification      | 1  |    pass     |
|               GoogleFnet                | 1  |    pass     |
|            YituTechConvBert             | 1  |    pass     |
|                 BigBird                 | 1  | fail_to_run |
+-----------------------------------------+----+-------------+

Compilation latency (sec)

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|       T5ForConditionalGeneration        | 4  | 55.5113  |
|       ElectraForQuestionAnswering       | 64 | 53.9419  |
|      GPT2ForSequenceClassification      | 4  |  51.708  |
|      BartForConditionalGeneration       | 1  | 48.6713  |
|           LayoutLMForMaskedLM           | 16 | 47.4153  |
|    LayoutLMForSequenceClassification    | 16 | 36.5231  |
|             BartForCausalLM             | 2  | 34.1458  |
|             BertForMaskedLM             | 64 | 33.9874  |
| BlenderbotSmallForConditionalGeneration | 32 | 30.9125  |
|       BlenderbotSmallForCausalLM        | 64 | 30.2863  |
|            AlbertForMaskedLM            | 2  | 26.5338  |
|           DebertaForMaskedLM            | 4  | 26.4973  |
|     PegasusForConditionalGeneration     | 4  | 26.3414  |
|                 T5Small                 | 1  | 25.6895  |
|      MBartForConditionalGeneration      | 8  | 24.7751  |
|            XLNetLMHeadModel             | 4  | 24.4838  |
|          AllenaiLongformerBase          | 1  | 22.8838  |
|     MobileBertForQuestionAnswering      | 32 |  20.116  |
|       AlbertForQuestionAnswering        | 2  | 19.5679  |
|       RobertaForQuestionAnswering       | 64 | 19.4393  |
|    MegatronBertForQuestionAnswering     | 8  | 18.6799  |
|        BertForQuestionAnswering         | 64 | 18.6026  |
|            MBartForCausalLM             | 16 |  18.417  |
|          MobileBertForMaskedLM          | 16 | 16.5618  |
|       DebertaForQuestionAnswering       | 4  | 15.1709  |
|     PLBartForConditionalGeneration      | 8  | 13.9216  |
|         Speech2Text2ForCausalLM         | 64 | 12.1095  |
|           PegasusForCausalLM            | 8  | 11.5466  |
|          DistilBertForMaskedLM          | 16 | 11.5357  |
|     DistilBertForQuestionAnswering      | 32 | 11.0201  |
|     M2M100ForConditionalGeneration      | 2  | 10.8537  |
|            PLBartForCausalLM            | 16 | 10.2818  |
|            YituTechConvBert             | 1  |  8.115   |
|             XGLMForCausalLM             | 1  |  8.0807  |
|            TrOCRForCausalLM             | 8  |  7.8229  |
|                CamemBert                | 1  |  6.4793  |
|           RobertaForCausalLM            | 4  |  6.2543  |
|           ElectraForCausalLM            | 1  |  5.9684  |
|             OPTForCausalLM              | 4  |  5.4036  |
|               DistillGPT2               | 1  |  5.024   |
|               GoogleFnet                | 1  |  3.2717  |
|                 BigBird                 | 0  |   nan    |
|       MT5ForConditionalGeneration       | 0  |   nan    |
|         MegatronBertForCausalLM         | 0  |   nan    |
+-----------------------------------------+----+----------+

Peak Memory Compression Ratio

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|            AlbertForMaskedLM            | 2  |   nan    |
|       AlbertForQuestionAnswering        | 2  |   nan    |
|          AllenaiLongformerBase          | 1  |   nan    |
|             BartForCausalLM             | 2  |   nan    |
|      BartForConditionalGeneration       | 1  |   nan    |
|             BertForMaskedLM             | 64 |   nan    |
|        BertForQuestionAnswering         | 64 |   nan    |
|                 BigBird                 | 0  |   nan    |
|       BlenderbotSmallForCausalLM        | 64 |   nan    |
| BlenderbotSmallForConditionalGeneration | 32 |   nan    |
|                CamemBert                | 1  |   nan    |
|           DebertaForMaskedLM            | 4  |   nan    |
|       DebertaForQuestionAnswering       | 4  |   nan    |
|          DistilBertForMaskedLM          | 16 |   nan    |
|     DistilBertForQuestionAnswering      | 32 |   nan    |
|               DistillGPT2               | 1  |   nan    |
|           ElectraForCausalLM            | 1  |   nan    |
|       ElectraForQuestionAnswering       | 64 |   nan    |
|      GPT2ForSequenceClassification      | 4  |   nan    |
|               GoogleFnet                | 1  |   nan    |
|           LayoutLMForMaskedLM           | 16 |   nan    |
|    LayoutLMForSequenceClassification    | 16 |   nan    |
|     M2M100ForConditionalGeneration      | 2  |   nan    |
|            MBartForCausalLM             | 16 |   nan    |
|      MBartForConditionalGeneration      | 8  |   nan    |
|       MT5ForConditionalGeneration       | 0  |   nan    |
|         MegatronBertForCausalLM         | 0  |   nan    |
|    MegatronBertForQuestionAnswering     | 8  |   nan    |
|          MobileBertForMaskedLM          | 16 |   nan    |
|     MobileBertForQuestionAnswering      | 32 |   nan    |
|             OPTForCausalLM              | 4  |   nan    |
|            PLBartForCausalLM            | 16 |   nan    |
|     PLBartForConditionalGeneration      | 8  |   nan    |
|           PegasusForCausalLM            | 8  |   nan    |
|     PegasusForConditionalGeneration     | 4  |   nan    |
|           RobertaForCausalLM            | 4  |   nan    |
|       RobertaForQuestionAnswering       | 64 |   nan    |
|         Speech2Text2ForCausalLM         | 64 |   nan    |
|       T5ForConditionalGeneration        | 4  |   nan    |
|                 T5Small                 | 1  |   nan    |
|            TrOCRForCausalLM             | 8  |   nan    |
|             XGLMForCausalLM             | 1  |   nan    |
|            XLNetLMHeadModel             | 4  |   nan    |
|            YituTechConvBert             | 1  |   nan    |
+-----------------------------------------+----+----------+

timm_models suite with float32 precision

Performance speedup

+---------------------------------+-----+----------+
|              name               | bs  | inductor |
+---------------------------------+-----+----------+
|          pnasnet5large          | 16  |  1.664   |
|          inception_v3           | 128 |  1.3505  |
|       gluon_inception_v3        | 128 |  1.3323  |
|        adv_inception_v3         | 128 |  1.3165  |
|           res2next50            |  2  |  1.1219  |
|        res2net50_14w_8s         |  2  |  1.0979  |
|        ese_vovnet19b_dw         | 128 |  1.0866  |
|             dla102              | 64  |  1.0829  |
|           resnest101e           | 32  |  1.0724  |
|        res2net101_26w_4s        | 64  |  1.0501  |
|            hrnet_w18            |  2  |  1.0403  |
|             dpn107              | 32  |  1.0259  |
|            repvgg_a2            | 128 |  1.0171  |
|        convmixer_768_32         | 32  |  1.0058  |
|            gernet_l             | 128 |  0.998   |
|     swsl_resnext101_32x16d      | 32  |  0.9949  |
|          ghostnet_100           | 128 |  0.9917  |
|          cspdarknet53           | 64  |  0.9902  |
|           selecsls42b           | 128 |  0.9854  |
|        gluon_xception65         | 32  |  0.9847  |
|            lcnet_050            | 128 |  0.983   |
|           regnety_002           | 128 |  0.979   |
|            fbnetv3_b            | 128 |  0.9709  |
|      mobilenetv3_large_100      | 128 |  0.9617  |
|          spnasnet_100           | 128 |  0.9466  |
|           mnasnet_100           | 128 |  0.9465  |
|           fbnetc_100            | 128 |  0.9433  |
|         mobilenetv2_100         | 128 |  0.942   |
|          gmixer_24_224          | 64  |  0.8434  |
|      xcit_large_24_p8_224       |  5  |  0.835   |
|            nfnet_l0             | 64  |  0.7418  |
|         poolformer_m36          | 64  |  0.7288  |
|          cait_m36_384           |  2  |  0.7162  |
|           dm_nfnet_f0           | 128 |  0.6992  |
| deit_base_distilled_patch16_224 | 64  |  0.6954  |
|          mixer_b16_224          | 64  |  0.6919  |
|      beit_base_patch16_224      | 64  |  0.6911  |
|      vit_base_patch16_224       | 64  |  0.6855  |
|  swin_base_patch4_window7_224   | 64  |  0.6759  |
|           convit_base           | 32  |  0.6559  |
|          resmlp_12_224          | 128 |  0.6513  |
|           tf_mixnet_l           | 64  |  0.633   |
|            mixnet_l             | 64  |  0.6284  |
|           volo_d1_224           | 64  |  0.6235  |
|         visformer_small         | 128 |  0.6217  |
|          convnext_base          | 32  |  0.6019  |
|           rexnet_100            | 128 |  0.5803  |
|        tnt_s_patch16_224        | 64  |  0.561   |
|            pit_b_224            | 64  |  0.5376  |
|        twins_pcpvt_base         | 32  |  0.5318  |
|          jx_nest_base           | 32  |  0.5297  |
|         coat_lite_mini          | 128 |  0.5291  |
|         crossvit_9_240          | 64  |  0.4988  |
|       tf_efficientnet_b0        | 128 |  0.4821  |
|           mobilevit_s           | 32  |   0.47   |
|            tinynet_a            | 128 |  0.4677  |
|          gmlp_s16_224           | 64  |  0.419   |
|          botnet26t_256          |  0  |   0.0    |
|        eca_halonext26ts         |  0  |   0.0    |
|        sebotnet33ts_256         |  0  |   0.0    |
|       eca_botnext26ts_256       |  0  |   0.0    |
+---------------------------------+-----+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 2  |     pass      |
|          resmlp_12_224          | 2  |     pass      |
|         mobilenetv2_100         | 2  |     pass      |
|      mobilenetv3_large_100      | 2  |     pass      |
|           mobilevit_s           | 2  |     pass      |
|            nfnet_l0             | 2  |     pass      |
|            pit_b_224            | 2  |     pass      |
|          pnasnet5large          | 2  |     pass      |
|         poolformer_m36          | 2  |     pass      |
|           regnety_002           | 2  |     pass      |
|            repvgg_a2            | 2  |     pass      |
|        res2net101_26w_4s        | 2  |     pass      |
|        res2net50_14w_8s         | 2  |     pass      |
|           res2next50            | 2  |     pass      |
|           resnest101e           | 2  |     pass      |
|      beit_base_patch16_224      | 2  |     pass      |
|           rexnet_100            | 2  |     pass      |
|           selecsls42b           | 2  |     pass      |
|          spnasnet_100           | 2  |     pass      |
|     swsl_resnext101_32x16d      | 2  |     pass      |
|       tf_efficientnet_b0        | 2  |     pass      |
|           tf_mixnet_l           | 2  |     pass      |
|            tinynet_a            | 2  |     pass      |
|        tnt_s_patch16_224        | 2  |     pass      |
|        twins_pcpvt_base         | 2  |     pass      |
|         visformer_small         | 2  |     pass      |
|      vit_base_patch16_224       | 2  |     pass      |
|           volo_d1_224           | 2  |     pass      |
|           mnasnet_100           | 2  |     pass      |
|            mixnet_l             | 2  |     pass      |
|          mixer_b16_224          | 2  |     pass      |
|        ese_vovnet19b_dw         | 2  |     pass      |
|         coat_lite_mini          | 2  |     pass      |
|           convit_base           | 2  |     pass      |
|        convmixer_768_32         | 2  |     pass      |
|          convnext_base          | 2  |     pass      |
|         crossvit_9_240          | 2  |     pass      |
|          cspdarknet53           | 2  |     pass      |
| deit_base_distilled_patch16_224 | 2  |     pass      |
|             dla102              | 2  |     pass      |
|           dm_nfnet_f0           | 2  |     pass      |
|            lcnet_050            | 2  |     pass      |
|      xcit_large_24_p8_224       | 2  |     pass      |
|           fbnetc_100            | 2  |     pass      |
|       gluon_inception_v3        | 2  |     pass      |
|          jx_nest_base           | 2  |     pass      |
|          inception_v3           | 2  |     pass      |
|            hrnet_w18            | 2  |     pass      |
|          gmlp_s16_224           | 2  |     pass      |
|          gmixer_24_224          | 2  |     pass      |
|        gluon_xception65         | 2  |     pass      |
|            gernet_l             | 2  |     pass      |
|            fbnetv3_b            | 2  |     pass      |
|        eca_halonext26ts         | 2  |  fail_to_run  |
|        sebotnet33ts_256         | 2  |  fail_to_run  |
|          botnet26t_256          | 2  |  fail_to_run  |
|       eca_botnext26ts_256       | 2  |  fail_to_run  |
|  swin_base_patch4_window7_224   | 2  | fail_accuracy |
|          ghostnet_100           | 2  | fail_accuracy |
|             dpn107              | 2  | fail_accuracy |
|          cait_m36_384           | 2  | fail_accuracy |
+---------------------------------+----+---------------+

Compilation latency (sec)

+---------------------------------+-----+-----------+
|              name               | bs  | inductor  |
+---------------------------------+-----+-----------+
|           dm_nfnet_f0           | 128 | 120.4526  |
|            pit_b_224            | 64  |  93.2877  |
|         coat_lite_mini          | 128 |  86.3203  |
|  swin_base_patch4_window7_224   | 64  |   85.21   |
|           rexnet_100            | 128 |  78.2162  |
|          jx_nest_base           | 32  |  73.2018  |
|        twins_pcpvt_base         | 32  |  73.0605  |
|             dpn107              | 32  |  68.3539  |
|            nfnet_l0             | 64  |  59.3309  |
|           volo_d1_224           | 64  |   59.28   |
|          gmlp_s16_224           | 64  |  54.9854  |
|         poolformer_m36          | 64  |  54.1717  |
|        tnt_s_patch16_224        | 64  |  53.9854  |
|          pnasnet5large          | 16  |  53.3107  |
|       tf_efficientnet_b0        | 128 |  52.2643  |
|            tinynet_a            | 128 |  51.7527  |
|         visformer_small         | 128 |  49.5635  |
|          convnext_base          | 32  |  48.0006  |
|           mobilevit_s           | 32  |  47.8097  |
|           convit_base           | 32  |  47.6997  |
|            mixnet_l             | 64  |  47.2449  |
| deit_base_distilled_patch16_224 | 64  |  47.1471  |
|      vit_base_patch16_224       | 64  |  46.3161  |
|         crossvit_9_240          | 64  |  45.886   |
|      beit_base_patch16_224      | 64  |  43.2621  |
|          cait_m36_384           |  2  |  40.3293  |
|      xcit_large_24_p8_224       |  5  |  39.4825  |
|            fbnetv3_b            | 128 |  39.2445  |
|          mixer_b16_224          | 64  |  36.5714  |
|             dla102              | 64  |  34.2857  |
|          ghostnet_100           | 128 |  32.7922  |
|           tf_mixnet_l           | 64  |  29.7102  |
|        res2net101_26w_4s        | 64  |  29.1843  |
|          cspdarknet53           | 64  |  27.721   |
|          resmlp_12_224          | 128 |  24.0163  |
|        gluon_xception65         | 32  |  22.7874  |
|        ese_vovnet19b_dw         | 128 |  21.0097  |
|          gmixer_24_224          | 64  |  19.8152  |
|        adv_inception_v3         | 128 |  18.1021  |
|           fbnetc_100            | 128 |  15.4501  |
|           selecsls42b           | 128 |  14.6314  |
|      mobilenetv3_large_100      | 128 |  13.3493  |
|         mobilenetv2_100         | 128 |  13.0842  |
|           regnety_002           | 128 |  12.2577  |
|            gernet_l             | 128 |  11.3181  |
|            repvgg_a2            | 128 |  8.4116   |
|            lcnet_050            | 128 |  7.8323   |
|            hrnet_w18            |  2  |  7.7945   |
|        convmixer_768_32         | 32  |  5.9951   |
|        res2net50_14w_8s         |  2  |  5.1672   |
|           mnasnet_100           | 128 |  4.6516   |
|          spnasnet_100           | 128 |  3.0418   |
|           res2next50            |  2  |  2.0724   |
|       gluon_inception_v3        | 128 | -17.0452  |
|          inception_v3           | 128 | -18.3518  |
|     swsl_resnext101_32x16d      | 32  | -26.2001  |
|           resnest101e           | 32  | -132.8819 |
|          botnet26t_256          |  0  |    nan    |
|       eca_botnext26ts_256       |  0  |    nan    |
|        eca_halonext26ts         |  0  |    nan    |
|        sebotnet33ts_256         |  0  |    nan    |
+---------------------------------+-----+-----------+

Peak Memory Compression Ratio

+---------------------------------+-----+----------+
|              name               | bs  | inductor |
+---------------------------------+-----+----------+
|        adv_inception_v3         | 128 |   nan    |
|      beit_base_patch16_224      | 64  |   nan    |
|          botnet26t_256          |  0  |   nan    |
|          cait_m36_384           |  2  |   nan    |
|         coat_lite_mini          | 128 |   nan    |
|           convit_base           | 32  |   nan    |
|        convmixer_768_32         | 32  |   nan    |
|          convnext_base          | 32  |   nan    |
|         crossvit_9_240          | 64  |   nan    |
|          cspdarknet53           | 64  |   nan    |
| deit_base_distilled_patch16_224 | 64  |   nan    |
|             dla102              | 64  |   nan    |
|           dm_nfnet_f0           | 128 |   nan    |
|             dpn107              | 32  |   nan    |
|       eca_botnext26ts_256       |  0  |   nan    |
|        eca_halonext26ts         |  0  |   nan    |
|        ese_vovnet19b_dw         | 128 |   nan    |
|           fbnetc_100            | 128 |   nan    |
|            fbnetv3_b            | 128 |   nan    |
|            gernet_l             | 128 |   nan    |
|          ghostnet_100           | 128 |   nan    |
|       gluon_inception_v3        | 128 |   nan    |
|        gluon_xception65         | 32  |   nan    |
|          gmixer_24_224          | 64  |   nan    |
|          gmlp_s16_224           | 64  |   nan    |
|            hrnet_w18            |  2  |   nan    |
|          inception_v3           | 128 |   nan    |
|          jx_nest_base           | 32  |   nan    |
|            lcnet_050            | 128 |   nan    |
|          mixer_b16_224          | 64  |   nan    |
|            mixnet_l             | 64  |   nan    |
|           mnasnet_100           | 128 |   nan    |
|         mobilenetv2_100         | 128 |   nan    |
|      mobilenetv3_large_100      | 128 |   nan    |
|           mobilevit_s           | 32  |   nan    |
|            nfnet_l0             | 64  |   nan    |
|            pit_b_224            | 64  |   nan    |
|          pnasnet5large          | 16  |   nan    |
|         poolformer_m36          | 64  |   nan    |
|           regnety_002           | 128 |   nan    |
|            repvgg_a2            | 128 |   nan    |
|        res2net101_26w_4s        | 64  |   nan    |
|        res2net50_14w_8s         |  2  |   nan    |
|           res2next50            |  2  |   nan    |
|          resmlp_12_224          | 128 |   nan    |
|           resnest101e           | 32  |   nan    |
|           rexnet_100            | 128 |   nan    |
|        sebotnet33ts_256         |  0  |   nan    |
|           selecsls42b           | 128 |   nan    |
|          spnasnet_100           | 128 |   nan    |
|  swin_base_patch4_window7_224   | 64  |   nan    |
|     swsl_resnext101_32x16d      | 32  |   nan    |
|       tf_efficientnet_b0        | 128 |   nan    |
|           tf_mixnet_l           | 64  |   nan    |
|            tinynet_a            | 128 |   nan    |
|        tnt_s_patch16_224        | 64  |   nan    |
|        twins_pcpvt_base         | 32  |   nan    |
|         visformer_small         | 128 |   nan    |
|      vit_base_patch16_224       | 64  |   nan    |
|           volo_d1_224           | 64  |   nan    |
|      xcit_large_24_p8_224       |  5  |   nan    |
+---------------------------------+-----+----------+

@jansel
Contributor

jansel commented Oct 13, 2022

This is inference? Or training?

@blzheng
Collaborator Author

blzheng commented Oct 13, 2022

Inference.

@blzheng
Collaborator Author

blzheng commented Oct 18, 2022

Performance Dashboard for float32 precision -- Single-Socket Multi-threads

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward pass. For accuracy, we check the numerical correctness of the forward-pass outputs by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
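
As a rough illustration of the methodology above, the sketch below computes a per-model speedup as native latency divided by compiled latency, drops models that did not pass accuracy, and then takes the geometric mean. The model names, timings, and helper names (`speedup`, `geomean`) are hypothetical and are not taken from the benchmark harness; treating "normalizing against native pytorch" as `eager / compiled` is an assumption.

```python
import math


def speedup(eager_seconds, compiled_seconds):
    """Speedup normalized against native PyTorch (assumed: eager / compiled)."""
    return eager_seconds / compiled_seconds


def geomean(values):
    """Geometric mean via the mean of logs; returns NaN for an empty input."""
    values = list(values)
    if not values:
        return float("nan")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical rows: (name, eager seconds, compiled seconds, accuracy status).
# These numbers are made up for illustration and do not come from the tables.
results = [
    ("model_a", 0.200, 0.100, "pass"),
    ("model_b", 0.100, 0.200, "pass"),
    ("model_c", 0.300, 0.100, "fail_accuracy"),
]

# Keep only models whose status starts with "pass" (covers "pass" and
# "pass_due_to_skip"); failing models are removed before aggregation.
kept = [speedup(e, c) for _, e, c, status in results if status.startswith("pass")]
summary = geomean(kept)
print(f"{summary:.2f}x")
```

With these made-up rows, one model is 2x faster and one is 2x slower, so the geometric mean comes out at 1.00x; an arithmetic mean would report 1.25x, which is one reason a geometric mean is the usual choice for ratios.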

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 93%, 51/55 | 100%, 44/44 | 90%, 55/61  |
+----------+------------+-------------+-------------+
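
The pass rate format above (`percent, passed/total`) can be reproduced with a few lines. The sketch below uses the status counts from the torchbench accuracy table in this comment (48 `pass`, 3 `pass_due_to_skip`, 1 `fail_accuracy`, 3 rows reported as `0.0000`). Counting `pass_due_to_skip` as passing is an assumption; it happens to match the reported 51/55. The helper name `pass_rate` is hypothetical.

```python
# Statuses assumed to count as passing; this is an assumption, not taken
# from the benchmark harness.
PASSING = {"pass", "pass_due_to_skip"}


def pass_rate(statuses):
    """Return (passed, total, rounded percent) for a list of status strings."""
    statuses = list(statuses)
    total = len(statuses)
    passed = sum(1 for s in statuses if s in PASSING)
    percent = round(100 * passed / total) if total else 0
    return passed, total, percent


# Status counts read off the torchbench accuracy table in this comment.
torchbench_statuses = (
    ["pass"] * 48
    + ["pass_due_to_skip"] * 3
    + ["fail_accuracy"] * 1
    + ["0.0000"] * 3
)

passed, total, percent = pass_rate(torchbench_statuses)
print(f"{percent}%, {passed}/{total}")
```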

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.03x    |    1.12x    |    1.03x    |
+----------+------------+-------------+-------------+

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |    3.70    |    5.17     |    5.74     |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.94x    |    0.98x    |    0.94x    |
+----------+------------+-------------+-------------+

torchbench suite with float32 precision

Performance speedup

+-----------------------------------+------+----------+
|               name                |  bs  | inductor |
+-----------------------------------+------+----------+
|           timm_resnest            |  32  |  1.2382  |
|        shufflenet_v2_x1_0         |  64  |  1.2261  |
|            densenet121            |  64  |  1.134   |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  1.1287  |
|           squeezenet1_1           |  16  |  1.1201  |
|             resnet50              |  32  |  1.1148  |
|         soft_actor_critic         | 256  |  1.1113  |
|              alexnet              | 128  |  1.1105  |
|            mnasnet1_0             |  32  |  1.0631  |
|               dlrm                | 2048 |  1.046   |
|                drq                |  1   |  1.0436  |
|             resnet18              |  8   |  1.036   |
|            Super_SloMo            |  6   |  1.0339  |
|            timm_regnet            |  32  |  1.0339  |
|            hf_T5_large            |  1   |  1.027   |
|            hf_BigBird             |  1   |  1.0227  |
|            hf_Reformer            |  1   |  1.0173  |
|               vgg16               |  4   |  1.0146  |
|          pytorch_stargan          |  16  |  1.013   |
|      resnet50_quantized_qat       |  32  |  1.0099  |
|    mobilenet_v2_quantized_qat     |  96  |  1.0007  |
|              demucs               |  1   |  0.9942  |
|            tts_angular            |  64  |  0.9912  |
|               hf_T5               |  1   |  0.9896  |
|       functorch_dp_cifar10        |  64  |  0.9846  |
|               dcgan               | 256  |  0.9822  |
|            timm_vovnet            |  32  |  0.9784  |
|           BERT_pytorch            |  2   |  0.9668  |
|          LearningToPaint          |  96  |  0.9624  |
|        Background_Matting         |  1   |  0.9616  |
|        mobilenet_v3_large         |  32  |  0.9279  |
|           pytorch_unet            |  1   |  0.9242  |
|           hf_Longformer           |  1   |  0.9203  |
|              hf_GPT2              |  1   |  0.9074  |
|           hf_GPT2_large           |  1   |  0.8962  |
|        speech_transformer         |  1   |  0.8892  |
|             hf_Albert             |  1   |  0.8793  |
|   timm_vision_transformer_large   |  8   |  0.8567  |
|            hf_T5_base             |  1   |  0.837   |
|           hf_DistilBert           |  1   |  0.8303  |
|      nvidia_deeprecommender       | 256  |  0.8219  |
|              hf_Bert              |  1   |  0.7961  |
|              yolov3               |  8   |  0.784   |
|              hf_Bart              |  1   |  0.7805  |
|           fastNLP_Bert            |  1   |  0.7779  |
|           mobilenet_v2            |  16  |  0.7575  |
|            timm_nfnet             | 128  |  0.7403  |
|          vision_maskrcnn          |  1   |  0.739   |
|          resnext50_32x4d          |  8   |  0.7271  |
|      timm_vision_transformer      |  8   |  0.7141  |
| attention_is_all_you_need_pytorch |  32  |  0.704   |
|         timm_efficientnet         |  64  |  0.6645  |
|           lennard_jones           | 1000 |  0.3621  |
|             tacotron2             |  0   |   0.0    |
|     detectron2_fcos_r_50_fpn      |  0   |   0.0    |
+-----------------------------------+------+----------+

Accuracy

+-----------------------------------+-----+------------------+
|               name                | bs  |     inductor     |
+-----------------------------------+-----+------------------+
|            hf_T5_large            |  2  | pass_due_to_skip |
|   timm_vision_transformer_large   |  2  | pass_due_to_skip |
|           hf_GPT2_large           |  2  | pass_due_to_skip |
|           BERT_pytorch            |  2  |       pass       |
|        shufflenet_v2_x1_0         |  2  |       pass       |
|        mobilenet_v3_large         |  2  |       pass       |
|      nvidia_deeprecommender       |  2  |       pass       |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |
|          pytorch_stargan          | 16  |       pass       |
|           pytorch_unet            |  2  |       pass       |
|             resnet18              |  2  |       pass       |
|             resnet50              |  2  |       pass       |
|      resnet50_quantized_qat       |  2  |       pass       |
|          resnext50_32x4d          |  2  |       pass       |
|         soft_actor_critic         | 256 |       pass       |
|           mobilenet_v2            |  2  |       pass       |
|        speech_transformer         |  2  |       pass       |
|           squeezenet1_1           |  2  |       pass       |
|         timm_efficientnet         |  2  |       pass       |
|            timm_nfnet             |  2  |       pass       |
|            timm_regnet            |  2  |       pass       |
|           timm_resnest            |  2  |       pass       |
|      timm_vision_transformer      |  2  |       pass       |
|            timm_vovnet            |  2  |       pass       |
|            tts_angular            |  2  |       pass       |
|               vgg16               |  2  |       pass       |
|    mobilenet_v2_quantized_qat     |  2  |       pass       |
|            mnasnet1_0             |  2  |       pass       |
|           fastNLP_Bert            |  2  |       pass       |
|       functorch_dp_cifar10        |  2  |       pass       |
|          LearningToPaint          |  2  |       pass       |
|            Super_SloMo            |  2  |       pass       |
|              alexnet              |  2  |       pass       |
| attention_is_all_you_need_pytorch |  2  |       pass       |
|               dcgan               |  2  |       pass       |
|              demucs               |  1  |       pass       |
|            densenet121            |  2  |       pass       |
|               dlrm                |  2  |       pass       |
|                drq                |  1  |       pass       |
|           lennard_jones           |  2  |       pass       |
|              yolov3               |  2  |       pass       |
|             hf_Albert             |  2  |       pass       |
|              hf_Bart              |  2  |       pass       |
|              hf_Bert              |  2  |       pass       |
|            hf_BigBird             |  2  |       pass       |
|           hf_DistilBert           |  2  |       pass       |
|              hf_GPT2              |  2  |       pass       |
|           hf_Longformer           |  2  |       pass       |
|            hf_Reformer            |  2  |       pass       |
|               hf_T5               |  2  |       pass       |
|            hf_T5_base             |  2  |       pass       |
|        Background_Matting         |  1  |  fail_accuracy   |
|             tacotron2             |  0  |      0.0000      |
|     detectron2_fcos_r_50_fpn      |  0  |      0.0000      |
|          vision_maskrcnn          |  0  |      0.0000      |
+-----------------------------------+-----+------------------+

Compilation latency (sec)

+-----------------------------------+------+----------+
|               name                |  bs  | inductor |
+-----------------------------------+------+----------+
|            hf_T5_large            |  1   | 19.5685  |
|          vision_maskrcnn          |  1   | 14.1261  |
|           hf_Longformer           |  1   |  10.801  |
|           hf_GPT2_large           |  1   | 10.5252  |
|   timm_vision_transformer_large   |  8   | 10.0167  |
|            hf_T5_base             |  1   |  9.5113  |
|            hf_BigBird             |  1   |  9.4953  |
|              yolov3               |  8   |  9.4608  |
|            timm_nfnet             | 128  |  8.7221  |
|            densenet121            |  64  |  8.1788  |
|            Super_SloMo            |  6   |  5.0844  |
|        speech_transformer         |  1   |  4.3803  |
|              hf_Bart              |  1   |  4.2628  |
|            timm_regnet            |  32  |  3.9241  |
| attention_is_all_you_need_pytorch |  32  |  3.9061  |
|               hf_T5               |  1   |  3.8853  |
|           fastNLP_Bert            |  1   |  3.8334  |
|         timm_efficientnet         |  64  |  3.6686  |
|           BERT_pytorch            |  2   |  3.6197  |
|              hf_Bert              |  1   |  3.3367  |
|              hf_GPT2              |  1   |  3.064   |
|        shufflenet_v2_x1_0         |  64  |  2.9784  |
|        Background_Matting         |  1   |  2.8748  |
|      timm_vision_transformer      |  8   |  2.8561  |
|        mobilenet_v3_large         |  32  |  2.7998  |
|           mobilenet_v2            |  16  |  2.7577  |
|            hf_Reformer            |  1   |  2.7114  |
|             hf_Albert             |  1   |  2.6651  |
|          resnext50_32x4d          |  8   |  2.6175  |
|            timm_vovnet            |  32  |  2.521   |
|            mnasnet1_0             |  32  |  2.4095  |
|             resnet50              |  32  |  2.3808  |
|           timm_resnest            |  32  |  1.8313  |
|           hf_DistilBert           |  1   |  1.7493  |
|       functorch_dp_cifar10        |  64  |  1.5456  |
|           pytorch_unet            |  1   |  1.4478  |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  1.3776  |
|          pytorch_stargan          |  16  |  1.1897  |
|          LearningToPaint          |  96  |  1.0779  |
|             resnet18              |  8   |  0.9023  |
|           squeezenet1_1           |  16  |  0.7238  |
|              demucs               |  1   |  0.7217  |
|               dlrm                | 2048 |  0.6254  |
|               vgg16               |  4   |  0.4764  |
|            tts_angular            |  64  |  0.4255  |
|              alexnet              | 128  |  0.3327  |
|                drq                |  1   |  0.3197  |
|      nvidia_deeprecommender       | 256  |  0.3053  |
|         soft_actor_critic         | 256  |  0.1629  |
|           lennard_jones           | 1000 |  0.1179  |
|               dcgan               | 256  |  0.1104  |
|    mobilenet_v2_quantized_qat     |  96  | -0.0207  |
|      resnet50_quantized_qat       |  32  | -0.1267  |
|     detectron2_fcos_r_50_fpn      |  0   |   nan    |
|             tacotron2             |  0   |   nan    |
+-----------------------------------+------+----------+

Peak Memory Compression Ratio

+-----------------------------------+------+----------+
|               name                |  bs  | inductor |
+-----------------------------------+------+----------+
|              demucs               |  1   |  0.9985  |
|      nvidia_deeprecommender       | 256  |  0.9969  |
|    mobilenet_v2_quantized_qat     |  96  |  0.9967  |
|            hf_T5_base             |  1   |  0.9947  |
|               vgg16               |  4   |  0.994   |
|           hf_GPT2_large           |  1   |  0.9939  |
|        Background_Matting         |  1   |  0.9934  |
|           pytorch_unet            |  1   |  0.9917  |
|            timm_nfnet             | 128  |  0.9916  |
|          LearningToPaint          |  96  |  0.9915  |
|              alexnet              | 128  |  0.991   |
|            hf_BigBird             |  1   |  0.9909  |
|            Super_SloMo            |  6   |  0.9886  |
|           lennard_jones           | 1000 |  0.9865  |
|         soft_actor_critic         | 256  |  0.9834  |
| attention_is_all_you_need_pytorch |  32  |  0.9832  |
|           hf_DistilBert           |  1   |  0.982   |
|            tts_angular            |  64  |  0.9816  |
|              hf_GPT2              |  1   |  0.9809  |
|           BERT_pytorch            |  2   |  0.9804  |
|                drq                |  1   |  0.9799  |
|              hf_Bart              |  1   |  0.9795  |
|           fastNLP_Bert            |  1   |  0.9793  |
|         timm_efficientnet         |  64  |  0.9791  |
|           mobilenet_v2            |  16  |  0.9776  |
|        speech_transformer         |  1   |  0.9734  |
|              hf_Bert              |  1   |  0.9733  |
|      resnet50_quantized_qat       |  32  |  0.972   |
|            timm_vovnet            |  32  |  0.972   |
|               hf_T5               |  1   |  0.9719  |
|   timm_vision_transformer_large   |  8   |  0.9702  |
|           hf_Longformer           |  1   |  0.9675  |
|        mobilenet_v3_large         |  32  |  0.9621  |
|          vision_maskrcnn          |  1   |  0.9553  |
|               dlrm                | 2048 |  0.9471  |
|             hf_Albert             |  1   |  0.9471  |
|             resnet50              |  32  |  0.9328  |
|           squeezenet1_1           |  16  |  0.931   |
|       functorch_dp_cifar10        |  64  |  0.9216  |
|      timm_vision_transformer      |  8   |  0.9087  |
|            hf_T5_large            |  1   |  0.9059  |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  0.8997  |
|            mnasnet1_0             |  32  |  0.8658  |
|          resnext50_32x4d          |  8   |  0.8647  |
|        shufflenet_v2_x1_0         |  64  |  0.8498  |
|             resnet18              |  8   |  0.8451  |
|          pytorch_stargan          |  16  |  0.845   |
|            hf_Reformer            |  1   |  0.812   |
|              yolov3               |  8   |  0.7951  |
|            densenet121            |  64  |  0.7917  |
|           timm_resnest            |  32  |  0.7725  |
|            timm_regnet            |  32  |  0.7187  |
|               dcgan               | 256  |  0.6787  |
|     detectron2_fcos_r_50_fpn      |  0   |   nan    |
|             tacotron2             |  0   |   nan    |
+-----------------------------------+------+----------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|            XLNetLMHeadModel             | 4  |  7.3631  |
|       MT5ForConditionalGeneration       | 2  |  1.8516  |
|             XGLMForCausalLM             | 1  |  1.3518  |
|     M2M100ForConditionalGeneration      | 2  |  1.3156  |
|               DistillGPT2               | 1  |  1.2913  |
|           ElectraForCausalLM            | 1  |  1.283   |
|          MobileBertForMaskedLM          | 16 |  1.2617  |
|               GoogleFnet                | 1  |  1.2474  |
|             OPTForCausalLM              | 4  |  1.2442  |
|            YituTechConvBert             | 1  |  1.1842  |
|     MobileBertForQuestionAnswering      | 32 |  1.1796  |
|                 BigBird                 | 1  |  1.0738  |
|          AllenaiLongformerBase          | 1  |  1.0663  |
|            AlbertForMaskedLM            | 2  |  1.0232  |
|       AlbertForQuestionAnswering        | 2  |  1.0177  |
|     PegasusForConditionalGeneration     | 4  |  0.9945  |
|         MegatronBertForCausalLM         | 2  |  0.9931  |
|                 T5Small                 | 1  |  0.9825  |
|           RobertaForCausalLM            | 4  |  0.9662  |
|           PegasusForCausalLM            | 8  |  0.9613  |
|            TrOCRForCausalLM             | 8  |  0.9557  |
|           DebertaForMaskedLM            | 4  |  0.9541  |
|     PLBartForConditionalGeneration      | 8  |  0.9521  |
|                CamemBert                | 1  |  0.947   |
|       DebertaForQuestionAnswering       | 4  |  0.9268  |
|            PLBartForCausalLM            | 16 |  0.9236  |
|      MBartForConditionalGeneration      | 8  |  0.9134  |
|      GPT2ForSequenceClassification      | 4  |  0.9105  |
|         Speech2Text2ForCausalLM         | 64 |  0.9003  |
|          DistilBertForMaskedLM          | 16 |  0.8957  |
|            MBartForCausalLM             | 16 |  0.8906  |
|    MegatronBertForQuestionAnswering     | 8  |  0.8662  |
|       RobertaForQuestionAnswering       | 64 |  0.8525  |
|    LayoutLMForSequenceClassification    | 16 |  0.8496  |
|           LayoutLMForMaskedLM           | 16 |  0.846   |
|       T5ForConditionalGeneration        | 4  |  0.8412  |
|             BartForCausalLM             | 2  |  0.828   |
|             BertForMaskedLM             | 64 |  0.8275  |
|        BertForQuestionAnswering         | 64 |  0.8246  |
|     DistilBertForQuestionAnswering      | 32 |  0.8142  |
|       ElectraForQuestionAnswering       | 64 |  0.7925  |
| BlenderbotSmallForConditionalGeneration | 32 |  0.788   |
|      BartForConditionalGeneration       | 1  |  0.7823  |
|       BlenderbotSmallForCausalLM        | 64 |  0.7666  |
+-----------------------------------------+----+----------+

Accuracy

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|            AlbertForMaskedLM            | 1  |   pass   |
|       AlbertForQuestionAnswering        | 1  |   pass   |
|      MBartForConditionalGeneration      | 1  |   pass   |
|       MT5ForConditionalGeneration       | 1  |   pass   |
|         MegatronBertForCausalLM         | 1  |   pass   |
|    MegatronBertForQuestionAnswering     | 1  |   pass   |
|          MobileBertForMaskedLM          | 1  |   pass   |
|     MobileBertForQuestionAnswering      | 1  |   pass   |
|             OPTForCausalLM              | 1  |   pass   |
|            PLBartForCausalLM            | 1  |   pass   |
|     PLBartForConditionalGeneration      | 1  |   pass   |
|           PegasusForCausalLM            | 1  |   pass   |
|     PegasusForConditionalGeneration     | 1  |   pass   |
|           RobertaForCausalLM            | 1  |   pass   |
|       RobertaForQuestionAnswering       | 1  |   pass   |
|         Speech2Text2ForCausalLM         | 1  |   pass   |
|       T5ForConditionalGeneration        | 1  |   pass   |
|                 T5Small                 | 1  |   pass   |
|            TrOCRForCausalLM             | 1  |   pass   |
|             XGLMForCausalLM             | 1  |   pass   |
|            XLNetLMHeadModel             | 1  |   pass   |
|            MBartForCausalLM             | 1  |   pass   |
|     M2M100ForConditionalGeneration      | 1  |   pass   |
|    LayoutLMForSequenceClassification    | 1  |   pass   |
|                CamemBert                | 1  |   pass   |
|          AllenaiLongformerBase          | 1  |   pass   |
|             BartForCausalLM             | 1  |   pass   |
|      BartForConditionalGeneration       | 1  |   pass   |
|             BertForMaskedLM             | 1  |   pass   |
|        BertForQuestionAnswering         | 1  |   pass   |
|                 BigBird                 | 1  |   pass   |
|       BlenderbotSmallForCausalLM        | 1  |   pass   |
| BlenderbotSmallForConditionalGeneration | 1  |   pass   |
|           DebertaForMaskedLM            | 1  |   pass   |
|           LayoutLMForMaskedLM           | 1  |   pass   |
|       DebertaForQuestionAnswering       | 1  |   pass   |
|          DistilBertForMaskedLM          | 1  |   pass   |
|     DistilBertForQuestionAnswering      | 1  |   pass   |
|               DistillGPT2               | 1  |   pass   |
|           ElectraForCausalLM            | 1  |   pass   |
|       ElectraForQuestionAnswering       | 1  |   pass   |
|      GPT2ForSequenceClassification      | 1  |   pass   |
|               GoogleFnet                | 1  |   pass   |
|            YituTechConvBert             | 1  |   pass   |
+-----------------------------------------+----+----------+

Compilation latency (sec)

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|          MobileBertForMaskedLM          | 16 | 12.2183  |
|     MobileBertForQuestionAnswering      | 32 | 11.9702  |
|          AllenaiLongformerBase          | 1  | 11.5861  |
|                 BigBird                 | 1  |  9.5059  |
|      BartForConditionalGeneration       | 1  |  8.9487  |
|      MBartForConditionalGeneration      | 8  |  8.8587  |
|     PegasusForConditionalGeneration     | 4  |  8.4913  |
|           DebertaForMaskedLM            | 4  |  8.4733  |
|       DebertaForQuestionAnswering       | 4  |  8.4294  |
|     M2M100ForConditionalGeneration      | 2  |  7.602   |
|    MegatronBertForQuestionAnswering     | 8  |  7.4598  |
|         MegatronBertForCausalLM         | 2  |  6.4697  |
|             XGLMForCausalLM             | 1  |  6.4641  |
| BlenderbotSmallForConditionalGeneration | 32 |   5.96   |
|            YituTechConvBert             | 1  |  5.2836  |
|       T5ForConditionalGeneration        | 4  |  4.8153  |
|           LayoutLMForMaskedLM           | 16 |  4.5577  |
|       ElectraForQuestionAnswering       | 64 |  4.5315  |
|     PLBartForConditionalGeneration      | 8  |  4.4101  |
|             BertForMaskedLM             | 64 |  4.3663  |
|    LayoutLMForSequenceClassification    | 16 |  4.2891  |
|        BertForQuestionAnswering         | 64 |  4.1585  |
|       RobertaForQuestionAnswering       | 64 |  4.1072  |
|                 T5Small                 | 1  |  3.8782  |
|       MT5ForConditionalGeneration       | 2  |  3.8328  |
|           RobertaForCausalLM            | 4  |  3.8004  |
|             BartForCausalLM             | 2  |  3.7592  |
|      GPT2ForSequenceClassification      | 4  |  3.7104  |
|            MBartForCausalLM             | 16 |  3.6174  |
|           PegasusForCausalLM            | 8  |  3.5072  |
|                CamemBert                | 1  |  3.3432  |
|           ElectraForCausalLM            | 1  |  3.3289  |
|             OPTForCausalLM              | 4  |  3.3105  |
|            TrOCRForCausalLM             | 8  |  3.234   |
|       AlbertForQuestionAnswering        | 2  |  2.935   |
|            AlbertForMaskedLM            | 2  |  2.8877  |
|       BlenderbotSmallForCausalLM        | 64 |  2.7639  |
|               GoogleFnet                | 1  |  2.0909  |
|          DistilBertForMaskedLM          | 16 |  1.9801  |
|     DistilBertForQuestionAnswering      | 32 |  1.9167  |
|            PLBartForCausalLM            | 16 |  1.9065  |
|         Speech2Text2ForCausalLM         | 64 |  1.8383  |
|               DistillGPT2               | 1  |  1.6408  |
|            XLNetLMHeadModel             | 4  | -16.2899 |
+-----------------------------------------+----+----------+

Peak Memory Compression Ratio

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|            AlbertForMaskedLM            | 2  |  0.9966  |
|       ElectraForQuestionAnswering       | 64 |  0.9966  |
|       AlbertForQuestionAnswering        | 2  |  0.9964  |
|           LayoutLMForMaskedLM           | 16 |  0.9961  |
|       BlenderbotSmallForCausalLM        | 64 |  0.996   |
|    LayoutLMForSequenceClassification    | 16 |  0.9956  |
|      GPT2ForSequenceClassification      | 4  |  0.9956  |
|             BertForMaskedLM             | 64 |  0.9955  |
|        BertForQuestionAnswering         | 64 |  0.9942  |
|       RobertaForQuestionAnswering       | 64 |  0.9942  |
|           DebertaForMaskedLM            | 4  |  0.9939  |
|       T5ForConditionalGeneration        | 4  |  0.9938  |
|             BartForCausalLM             | 2  |  0.9934  |
|       DebertaForQuestionAnswering       | 4  |  0.9927  |
|                 BigBird                 | 1  |  0.9924  |
|            XLNetLMHeadModel             | 4  |  0.9916  |
|            MBartForCausalLM             | 16 |  0.9915  |
|            PLBartForCausalLM            | 16 |  0.9902  |
| BlenderbotSmallForConditionalGeneration | 32 |  0.9902  |
|      BartForConditionalGeneration       | 1  |  0.9901  |
|           PegasusForCausalLM            | 8  |  0.9896  |
|     PegasusForConditionalGeneration     | 4  |  0.9887  |
|               GoogleFnet                | 1  |  0.9885  |
|         Speech2Text2ForCausalLM         | 64 |  0.988   |
|       MT5ForConditionalGeneration       | 2  |  0.9876  |
|          DistilBertForMaskedLM          | 16 |  0.9871  |
|      MBartForConditionalGeneration      | 8  |  0.9865  |
|            TrOCRForCausalLM             | 8  |  0.986   |
|     M2M100ForConditionalGeneration      | 2  |  0.9859  |
|               DistillGPT2               | 1  |  0.9851  |
|    MegatronBertForQuestionAnswering     | 8  |  0.985   |
|             XGLMForCausalLM             | 1  |  0.9834  |
|             OPTForCausalLM              | 4  |  0.9829  |
|     PLBartForConditionalGeneration      | 8  |  0.9812  |
|          AllenaiLongformerBase          | 1  |  0.9789  |
|                CamemBert                | 1  |  0.9726  |
|                 T5Small                 | 1  |  0.9719  |
|           RobertaForCausalLM            | 4  |  0.9716  |
|            YituTechConvBert             | 1  |  0.9622  |
|         MegatronBertForCausalLM         | 2  |  0.9588  |
|           ElectraForCausalLM            | 1  |  0.944   |
|          MobileBertForMaskedLM          | 16 |  0.9269  |
|     DistilBertForQuestionAnswering      | 32 |  0.9081  |
|     MobileBertForQuestionAnswering      | 32 |  0.8899  |
+-----------------------------------------+----+----------+

timm_models suite with float32 precision

Performance speedup

+---------------------------------+-----+----------+
|              name               | bs  | inductor |
+---------------------------------+-----+----------+
|          inception_v3           | 128 |  1.2067  |
|        res2net101_26w_4s        | 64  |  1.189   |
|        adv_inception_v3         | 128 |  1.1879  |
|       gluon_inception_v3        | 128 |  1.1847  |
|           mnasnet_100           | 128 |  1.1736  |
|          spnasnet_100           | 128 |  1.1705  |
|        convmixer_768_32         | 32  |  1.1569  |
|          ghostnet_100           | 128 |  1.1505  |
|           fbnetc_100            | 128 |  1.134   |
|           resnest101e           | 32  |  1.1296  |
|             dpn107              | 32  |  1.1117  |
|           volo_d1_224           | 64  |  1.1116  |
|            gernet_l             | 128 |  1.1023  |
|          pnasnet5large          | 16  |  1.0403  |
|           selecsls42b           | 128 |  1.0238  |
|      mobilenetv3_large_100      | 128 |  1.0237  |
|        res2net50_14w_8s         |  2  |  1.0053  |
|            repvgg_a2            | 128 |  1.0013  |
|          gmixer_24_224          | 64  |  0.9843  |
|         poolformer_m36          | 64  |  0.9821  |
|            hrnet_w18            |  2  |  0.9727  |
|         mobilenetv2_100         | 128 |  0.9628  |
|             dla102              | 64  |  0.951   |
|      xcit_large_24_p8_224       |  5  |  0.9447  |
|            fbnetv3_b            | 128 |   0.93   |
|           tf_mixnet_l           | 64  |  0.9279  |
|          cait_m36_384           |  2  |  0.9264  |
|        ese_vovnet19b_dw         | 128 |  0.9251  |
|           regnety_002           | 128 |  0.9222  |
|            nfnet_l0             | 64  |  0.9063  |
|            mixnet_l             | 64  |  0.8793  |
|     swsl_resnext101_32x16d      | 32  |  0.8755  |
|      beit_base_patch16_224      | 64  |  0.8658  |
|            lcnet_050            | 128 |  0.8585  |
|  swin_base_patch4_window7_224   | 64  |  0.8467  |
| deit_base_distilled_patch16_224 | 64  |  0.8454  |
|      vit_base_patch16_224       | 64  |  0.8451  |
|           convit_base           | 32  |  0.8401  |
|           dm_nfnet_f0           | 128 |  0.8236  |
|        tnt_s_patch16_224        | 64  |  0.8202  |
|          resmlp_12_224          | 128 |  0.8106  |
|          mixer_b16_224          | 64  |  0.809   |
|          jx_nest_base           | 32  |  0.8065  |
|           rexnet_100            | 128 |   0.8    |
|         visformer_small         | 128 |  0.7974  |
|          convnext_base          | 32  |  0.7851  |
|            pit_b_224            | 64  |  0.7825  |
|           res2next50            |  2  |  0.7694  |
|         coat_lite_mini          | 128 |  0.736   |
|        twins_pcpvt_base         | 32  |  0.7297  |
|            tinynet_a            | 128 |  0.7163  |
|       tf_efficientnet_b0        | 128 |  0.7025  |
|        gluon_xception65         | 32  |  0.6939  |
|         crossvit_9_240          | 64  |  0.6874  |
|          cspdarknet53           | 64  |  0.6632  |
|          gmlp_s16_224           | 64  |  0.553   |
|           mobilevit_s           | 32  |  0.4792  |
|        sebotnet33ts_256         |  0  |   0.0    |
|       eca_botnext26ts_256       |  0  |   0.0    |
|        eca_halonext26ts         |  0  |   0.0    |
|          botnet26t_256          |  0  |   0.0    |
+---------------------------------+-----+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 2  |     pass      |
|      beit_base_patch16_224      | 2  |     pass      |
|         mobilenetv2_100         | 2  |     pass      |
|      mobilenetv3_large_100      | 2  |     pass      |
|           mobilevit_s           | 2  |     pass      |
|            nfnet_l0             | 2  |     pass      |
|            pit_b_224            | 2  |     pass      |
|          pnasnet5large          | 2  |     pass      |
|         poolformer_m36          | 2  |     pass      |
|           regnety_002           | 2  |     pass      |
|            repvgg_a2            | 2  |     pass      |
|        res2net101_26w_4s        | 2  |     pass      |
|        res2net50_14w_8s         | 2  |     pass      |
|           res2next50            | 2  |     pass      |
|          resmlp_12_224          | 2  |     pass      |
|           resnest101e           | 2  |     pass      |
|           rexnet_100            | 2  |     pass      |
|           selecsls42b           | 2  |     pass      |
|          spnasnet_100           | 2  |     pass      |
|  swin_base_patch4_window7_224   | 2  |     pass      |
|     swsl_resnext101_32x16d      | 2  |     pass      |
|       tf_efficientnet_b0        | 2  |     pass      |
|           tf_mixnet_l           | 2  |     pass      |
|            tinynet_a            | 2  |     pass      |
|        tnt_s_patch16_224        | 2  |     pass      |
|        twins_pcpvt_base         | 2  |     pass      |
|         visformer_small         | 2  |     pass      |
|      vit_base_patch16_224       | 2  |     pass      |
|           volo_d1_224           | 2  |     pass      |
|           mnasnet_100           | 2  |     pass      |
|            mixnet_l             | 2  |     pass      |
|          mixer_b16_224          | 2  |     pass      |
|        ese_vovnet19b_dw         | 2  |     pass      |
|         coat_lite_mini          | 2  |     pass      |
|           convit_base           | 2  |     pass      |
|        convmixer_768_32         | 2  |     pass      |
|          convnext_base          | 2  |     pass      |
|         crossvit_9_240          | 2  |     pass      |
|          cspdarknet53           | 2  |     pass      |
| deit_base_distilled_patch16_224 | 2  |     pass      |
|             dla102              | 2  |     pass      |
|           dm_nfnet_f0           | 2  |     pass      |
|             dpn107              | 2  |     pass      |
|            lcnet_050            | 2  |     pass      |
|      xcit_large_24_p8_224       | 2  |     pass      |
|           fbnetc_100            | 2  |     pass      |
|            fbnetv3_b            | 2  |     pass      |
|            gernet_l             | 2  |     pass      |
|       gluon_inception_v3        | 2  |     pass      |
|        gluon_xception65         | 2  |     pass      |
|          gmixer_24_224          | 2  |     pass      |
|          gmlp_s16_224           | 2  |     pass      |
|            hrnet_w18            | 2  |     pass      |
|          inception_v3           | 2  |     pass      |
|          jx_nest_base           | 2  |     pass      |
|        eca_halonext26ts         | 2  |  fail_to_run  |
|        sebotnet33ts_256         | 2  |  fail_to_run  |
|          botnet26t_256          | 2  |  fail_to_run  |
|       eca_botnext26ts_256       | 2  |  fail_to_run  |
|          ghostnet_100           | 2  | fail_accuracy |
|          cait_m36_384           | 2  | fail_accuracy |
+---------------------------------+----+---------------+

Compilation latency (sec)

+---------------------------------+-----+----------+
|              name               | bs  | inductor |
+---------------------------------+-----+----------+
|            hrnet_w18            |  2  | 17.4435  |
|          pnasnet5large          | 16  | 15.8176  |
|           mobilevit_s           | 32  | 11.8116  |
|      xcit_large_24_p8_224       |  5  | 11.6298  |
|  swin_base_patch4_window7_224   | 64  | 11.2435  |
|          cait_m36_384           |  2  |  11.183  |
|         poolformer_m36          | 64  |  9.7591  |
|           dm_nfnet_f0           | 128 |  9.0595  |
|        twins_pcpvt_base         | 32  |  9.0463  |
|          jx_nest_base           | 32  |  8.8532  |
|        res2net101_26w_4s        | 64  |  8.7024  |
|           tf_mixnet_l           | 64  |  8.1885  |
|        res2net50_14w_8s         |  2  |  8.0084  |
|        gluon_xception65         | 32  |  7.7414  |
|            mixnet_l             | 64  |  7.7377  |
|        tnt_s_patch16_224        | 64  |  7.3083  |
|             dpn107              | 32  |  7.2266  |
|          gmlp_s16_224           | 64  |  7.1444  |
|            fbnetv3_b            | 128 |  6.8568  |
|           volo_d1_224           | 64  |  6.0621  |
|          cspdarknet53           | 64  |  5.9534  |
|             dla102              | 64  |  5.8048  |
|          convnext_base          | 32  |  5.558   |
|         crossvit_9_240          | 64  |  5.5577  |
|           rexnet_100            | 128 |  5.1342  |
|            nfnet_l0             | 64  |  5.0545  |
|       tf_efficientnet_b0        | 128 |  4.8415  |
|     swsl_resnext101_32x16d      | 32  |  4.7578  |
|            pit_b_224            | 64  |  4.7423  |
|           res2next50            |  2  |  4.6471  |
|          inception_v3           | 128 |  4.617   |
|            tinynet_a            | 128 |  4.4623  |
|          ghostnet_100           | 128 |  4.3717  |
|        adv_inception_v3         | 128 |  4.248   |
|       gluon_inception_v3        | 128 |  4.2008  |
|      beit_base_patch16_224      | 64  |  4.1799  |
|         coat_lite_mini          | 128 |  4.0925  |
|           convit_base           | 32  |  4.0284  |
|           fbnetc_100            | 128 |  3.6995  |
| deit_base_distilled_patch16_224 | 64  |  3.6688  |
|          gmixer_24_224          | 64  |  3.5157  |
|          spnasnet_100           | 128 |  3.5142  |
|         visformer_small         | 128 |  3.4201  |
|          mixer_b16_224          | 64  |  3.2746  |
|            repvgg_a2            | 128 |  3.2501  |
|      vit_base_patch16_224       | 64  |  3.0714  |
|           regnety_002           | 128 |  3.0702  |
|            gernet_l             | 128 |  3.0452  |
|      mobilenetv3_large_100      | 128 |  3.0294  |
|           mnasnet_100           | 128 |  2.8618  |
|         mobilenetv2_100         | 128 |  2.7368  |
|        ese_vovnet19b_dw         | 128 |  2.3969  |
|          resmlp_12_224          | 128 |  2.1207  |
|            lcnet_050            | 128 |  1.9732  |
|        convmixer_768_32         | 32  |  1.9472  |
|           selecsls42b           | 128 |  1.8057  |
|           resnest101e           | 32  |  1.7945  |
|          botnet26t_256          |  0  |   nan    |
|       eca_botnext26ts_256       |  0  |   nan    |
|        eca_halonext26ts         |  0  |   nan    |
|        sebotnet33ts_256         |  0  |   nan    |
+---------------------------------+-----+----------+

Peak Memory Compression Ratio

+---------------------------------+-----+----------+
|              name               | bs  | inductor |
+---------------------------------+-----+----------+
|        ese_vovnet19b_dw         | 128 |  0.9974  |
|           dm_nfnet_f0           | 128 |  0.9972  |
|      beit_base_patch16_224      | 64  |  0.9967  |
| deit_base_distilled_patch16_224 | 64  |  0.9965  |
|          mixer_b16_224          | 64  |  0.9963  |
|            gernet_l             | 128 |  0.9963  |
|            fbnetv3_b            | 128 |  0.9962  |
|            pit_b_224            | 64  |  0.9962  |
|            nfnet_l0             | 64  |  0.9962  |
|          cspdarknet53           | 64  |  0.9961  |
|        gluon_xception65         | 32  |  0.9955  |
|          resmlp_12_224          | 128 |  0.9951  |
|          inception_v3           | 128 |  0.9945  |
|            repvgg_a2            | 128 |  0.9943  |
|          convnext_base          | 32  |  0.9942  |
|           rexnet_100            | 128 |  0.9942  |
|         poolformer_m36          | 64  |  0.9937  |
|             dla102              | 64  |  0.9929  |
|           tf_mixnet_l           | 64  |  0.9928  |
|           mobilevit_s           | 32  |  0.9926  |
|          pnasnet5large          | 16  |  0.9926  |
|           fbnetc_100            | 128 |  0.992   |
|       tf_efficientnet_b0        | 128 |  0.9919  |
|           selecsls42b           | 128 |  0.9914  |
|           volo_d1_224           | 64  |  0.9913  |
|     swsl_resnext101_32x16d      | 32  |  0.9912  |
|            mixnet_l             | 64  |  0.9895  |
|          jx_nest_base           | 32  |  0.9894  |
|      xcit_large_24_p8_224       |  5  |  0.9892  |
|          cait_m36_384           |  2  |  0.9883  |
|        res2net101_26w_4s        | 64  |  0.9867  |
|        adv_inception_v3         | 128 |  0.9863  |
|       gluon_inception_v3        | 128 |  0.9855  |
|           mnasnet_100           | 128 |  0.9852  |
|          spnasnet_100           | 128 |  0.9852  |
|          ghostnet_100           | 128 |  0.9845  |
|            lcnet_050            | 128 |  0.9831  |
|           regnety_002           | 128 |  0.9808  |
|            tinynet_a            | 128 |  0.9741  |
|        convmixer_768_32         | 32  |  0.9715  |
|           res2next50            |  2  |  0.9548  |
|        res2net50_14w_8s         |  2  |  0.9372  |
|          gmlp_s16_224           | 64  |  0.9283  |
|          gmixer_24_224          | 64  |  0.9094  |
|        twins_pcpvt_base         | 32  |  0.9033  |
|  swin_base_patch4_window7_224   | 64  |  0.9009  |
|           resnest101e           | 32  |  0.8544  |
|      mobilenetv3_large_100      | 128 |  0.8465  |
|           convit_base           | 32  |  0.8249  |
|            hrnet_w18            |  2  |  0.8205  |
|         coat_lite_mini          | 128 |  0.8071  |
|         mobilenetv2_100         | 128 |  0.805   |
|      vit_base_patch16_224       | 64  |  0.8042  |
|         visformer_small         | 128 |  0.7898  |
|             dpn107              | 32  |  0.7697  |
|         crossvit_9_240          | 64  |  0.7352  |
|        tnt_s_patch16_224        | 64  |  0.6241  |
|          botnet26t_256          |  0  |   nan    |
|       eca_botnext26ts_256       |  0  |   nan    |
|        eca_halonext26ts         |  0  |   nan    |
|        sebotnet33ts_256         |  0  |   nan    |
+---------------------------------+-----+----------+

ESI-SYD commented Oct 27, 2022

Performance Dashboard for float32 precision -- Single-core Single-thread

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz. Each experiment runs one iteration of the forward pass. For accuracy, we check the numerical correctness of the forward-pass outputs by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
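
As a rough illustration of the filtering and normalization described above, the sketch below drops models that did not pass the accuracy check and computes speedup as native latency divided by inductor latency. The `Result` record, its field names, and the timings are hypothetical and are not taken from the benchmark scripts or from this dashboard.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    accuracy: str        # "pass", "fail_accuracy", or "fail_to_run"
    eager_sec: float     # native PyTorch latency (hypothetical field)
    compiled_sec: float  # inductor latency (hypothetical field)


def speedups(results):
    """Return {model: speedup} for models that passed the accuracy check."""
    out = {}
    for r in results:
        if r.accuracy != "pass":
            continue  # models failing accuracy are left out of the perf numbers
        # Speedup is normalized against native PyTorch: above 1.0 means faster.
        out[r.name] = r.eager_sec / r.compiled_sec
    return out


# Made-up timings purely for illustration; not measurements from this issue.
demo = [
    Result("model_a", "pass", 2.0, 1.0),
    Result("model_b", "fail_accuracy", 1.0, 0.5),
    Result("model_c", "pass", 1.0, 2.0),
]
print(speedups(demo))
```

With this convention a value below 1.0 means inductor was slower than native PyTorch for that model.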

Passrate

+----------+------------+--------------+-------------+
| Compiler | torchbench | huggingface  | timm_models |
+----------+------------+--------------+-------------+
| inductor | 91%, 49/54 |  95%, 42/44  |  90%, 55/61 |
+----------+------------+--------------+-------------+
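
The passrate cells pair a rounded percentage with the raw pass/total counts. A minimal sketch of that formatting follows; the function name is hypothetical, and the counts are the ones shown in the table above.

```python
def passrate(passed: int, total: int) -> str:
    """Format a passrate cell as '<percent>, <passed>/<total>'."""
    # The ".0%" format multiplies by 100 and rounds to a whole percent.
    return f"{passed / total:.0%}, {passed}/{total}"


for suite, (passed, total) in {
    "torchbench": (49, 54),
    "huggingface": (42, 44),
    "timm_models": (55, 61),
}.items():
    print(suite, passrate(passed, total))
```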

Geometric mean speedup

+----------+------------+--------------+-------------+
| Compiler | torchbench | huggingface  | timm_models |
+----------+------------+--------------+-------------+
| inductor |   1.03x    |    1.00x     |    1.01x    |
+----------+------------+--------------+-------------+
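
For readers unfamiliar with the metric, the geometric mean of per-model speedups can be sketched as the exponential of the mean log speedup. Skipping non-positive and NaN entries (for example, models that failed to run) is an assumption of this sketch; the thread does not say how the dashboard handles them.

```python
import math


def geomean(values):
    """Geometric mean of the positive entries in `values`."""
    # `v > 0` is False for NaN, so NaN and zero placeholders are skipped.
    positive = [v for v in values if v > 0]
    if not positive:
        raise ValueError("need at least one positive value")
    return math.exp(sum(math.log(v) for v in positive) / len(positive))


# Illustrative inputs only: a 2x speedup and a 2x slowdown cancel out.
print(f"{geomean([2.0, 0.5]):.2f}x")
```

Unlike the arithmetic mean, this average treats a 2x speedup and a 2x slowdown symmetrically, which is why it is commonly used for ratios.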

Mean compilation time (seconds)

+----------+------------+--------------+-------------+
| Compiler | torchbench | huggingface  | timm_models |
+----------+------------+--------------+-------------+
| inductor |    5.47    |     7.35     |    5.69     |
+----------+------------+--------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+--------------+-------------+
| Compiler | torchbench | huggingface  | timm_models |
+----------+------------+--------------+-------------+
| inductor |   0.97x    |    0.98x     |    0.95x    |
+----------+------------+--------------+-------------+
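
Since the heading states that higher is better, one plausible reading is that the ratio divides the native PyTorch peak memory by the inductor peak memory. The sketch below follows that reading; the definition, the function name, and the byte values are assumptions rather than something confirmed in this thread.

```python
def compression_ratio(eager_peak_bytes: float, compiled_peak_bytes: float) -> float:
    """Peak memory of native PyTorch divided by peak memory with inductor."""
    if compiled_peak_bytes <= 0:
        return float("nan")  # no measurement, e.g. the model failed to run
    return eager_peak_bytes / compiled_peak_bytes


# Illustrative values only: inductor using 25% more memory gives 0.80x.
print(f"{compression_ratio(100.0, 125.0):.2f}x")
```

Under this reading, a value below 1.0 means the compiled model used more peak memory than native PyTorch.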

torchbench suite with float32 precision

Performance speedup

+-----------------------------------+-----+----------+
|               name                | bs  | inductor |
+-----------------------------------+-----+----------+
|        shufflenet_v2_x1_0         |  1  |  1.3718  |
|           squeezenet1_1           |  1  |  1.3473  |
|   pytorch_CycleGAN_and_pix2pix    |  1  |  1.1701  |
|           timm_resnest            |  1  |  1.162   |
|             resnet18              |  1  |  1.121   |
| attention_is_all_you_need_pytorch |  1  |  1.0937  |
|              alexnet              |  1  |  1.0816  |
|               vgg16               |  1  |  1.0685  |
|            Super_SloMo            |  1  |  1.055   |
|                drq                |  1  |  1.0498  |
|         soft_actor_critic         | 256 |  1.0109  |
|               dcgan               |  1  |  1.007   |
|              demucs               |  1  |  1.0053  |
|               dlrm                |  1  |  1.0047  |
|          pytorch_stargan          | 16  |  0.9986  |
|            tts_angular            |  1  |  0.9985  |
|          LearningToPaint          |  1  |  0.9972  |
|      resnet50_quantized_qat       |  1  |  0.9955  |
|    mobilenet_v2_quantized_qat     |  1  |  0.9899  |
|             resnet50              |  1  |  0.9861  |
|            hf_BigBird             |  1  |  0.9834  |
|      nvidia_deeprecommender       |  1  |  0.9787  |
|            timm_nfnet             |  1  |  0.9738  |
|           pytorch_unet            |  1  |  0.9646  |
|        Background_Matting         |  1  |  0.951   |
|            timm_vovnet            |  1  |  0.9484  |
|            timm_regnet            |  1  |  0.9359  |
|        speech_transformer         |  1  |  0.9162  |
|          resnext50_32x4d          |  1  |  0.8816  |
|           hf_Longformer           |  1  |  0.8638  |
|              hf_GPT2              |  1  |  0.8593  |
|            mnasnet1_0             |  1  |  0.8526  |
|            hf_Reformer            |  1  |  0.8378  |
|          vision_maskrcnn          |  1  |  0.8344  |
|              yolov3               |  1  |  0.8325  |
|           BERT_pytorch            |  1  |  0.7956  |
|            hf_T5_large            |  1  |  0.7947  |
|   timm_vision_transformer_large   |  1  |  0.7892  |
|             hf_Albert             |  1  |  0.7764  |
|        mobilenet_v3_large         |  1  |  0.7747  |
|            densenet121            |  1  |  0.7624  |
|           lennard_jones           |  1  |  0.7559  |
|               hf_T5               |  1  |  0.7424  |
|           hf_GPT2_large           |  1  |  0.7353  |
|           hf_DistilBert           |  1  |  0.7248  |
|              hf_Bert              |  1  |  0.6891  |
|              hf_Bart              |  1  |  0.6826  |
|           fastNLP_Bert            |  1  |  0.6784  |
|            hf_T5_base             |  1  |  0.6702  |
|           mobilenet_v2            |  1  |  0.6445  |
|      timm_vision_transformer      |  1  |  0.5835  |
|       functorch_dp_cifar10        |  1  |  0.5038  |
|         timm_efficientnet         |  1  |  0.4167  |
|     detectron2_fcos_r_50_fpn      |  0  |   0.0    |
+-----------------------------------+-----+----------+

Accuracy

+-----------------------------------+-----+------------------+
|               name                | bs  |     inductor     |
+-----------------------------------+-----+------------------+
|            hf_T5_large            |  1  | pass_due_to_skip |
|   timm_vision_transformer_large   |  1  | pass_due_to_skip |
|           hf_GPT2_large           |  1  | pass_due_to_skip |
|           BERT_pytorch            |  1  |       pass       |
|        shufflenet_v2_x1_0         |  1  |       pass       |
|        mobilenet_v3_large         |  1  |       pass       |
|      nvidia_deeprecommender       |  1  |       pass       |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |
|          pytorch_stargan          | 16  |       pass       |
|           pytorch_unet            |  1  |       pass       |
|             resnet18              |  1  |       pass       |
|             resnet50              |  1  |       pass       |
|      resnet50_quantized_qat       |  1  |       pass       |
|          resnext50_32x4d          |  1  |       pass       |
|         soft_actor_critic         | 256 |       pass       |
|           mobilenet_v2            |  1  |       pass       |
|        speech_transformer         |  1  |       pass       |
|           squeezenet1_1           |  1  |       pass       |
|         timm_efficientnet         |  1  |       pass       |
|            timm_nfnet             |  1  |       pass       |
|            timm_regnet            |  1  |       pass       |
|           timm_resnest            |  1  |       pass       |
|      timm_vision_transformer      |  1  |       pass       |
|            timm_vovnet            |  1  |       pass       |
|            tts_angular            |  1  |       pass       |
|               vgg16               |  1  |       pass       |
|    mobilenet_v2_quantized_qat     |  1  |       pass       |
|            mnasnet1_0             |  1  |       pass       |
|           fastNLP_Bert            |  1  |       pass       |
|       functorch_dp_cifar10        |  1  |       pass       |
|          LearningToPaint          |  1  |       pass       |
|            Super_SloMo            |  1  |       pass       |
|              alexnet              |  1  |       pass       |
| attention_is_all_you_need_pytorch |  1  |       pass       |
|               dcgan               |  1  |       pass       |
|              demucs               |  1  |       pass       |
|            densenet121            |  1  |       pass       |
|               dlrm                |  1  |       pass       |
|                drq                |  1  |       pass       |
|           lennard_jones           |  1  |       pass       |
|              yolov3               |  1  |       pass       |
|             hf_Albert             |  1  |       pass       |
|              hf_Bart              |  1  |       pass       |
|              hf_Bert              |  1  |       pass       |
|            hf_BigBird             |  1  |       pass       |
|           hf_Longformer           |  1  |       pass       |
|            hf_Reformer            |  1  |       pass       |
|               hf_T5               |  1  |       pass       |
|            hf_T5_base             |  1  |       pass       |
|        Background_Matting         |  1  |  fail_accuracy   |
|     detectron2_fcos_r_50_fpn      |  0  |      0.0000      |
|           hf_DistilBert           |  0  |      0.0000      |
|              hf_GPT2              |  0  |      0.0000      |
|          vision_maskrcnn          |  0  |      0.0000      |
+-----------------------------------+-----+------------------+

Compilation latency (sec)

+-----------------------------------+-----+----------+
|               name                | bs  | inductor |
+-----------------------------------+-----+----------+
|            hf_T5_base             |  1  | 50.6758  |
|           hf_GPT2_large           |  1  |  34.583  |
|            hf_T5_large            |  1  | 28.4916  |
|          vision_maskrcnn          |  1  | 18.0325  |
|   timm_vision_transformer_large   |  1  | 13.3261  |
|           hf_Longformer           |  1  | 12.2457  |
|            hf_BigBird             |  1  | 10.1571  |
|              yolov3               |  1  |  9.6826  |
|            densenet121            |  1  |  7.9872  |
|              hf_Bart              |  1  |  6.8012  |
|            timm_nfnet             |  1  |  6.7634  |
|           fastNLP_Bert            |  1  |  6.5982  |
|              hf_Bert              |  1  |  5.2685  |
|               hf_T5               |  1  |  5.0497  |
|        speech_transformer         |  1  |  4.7929  |
|        Background_Matting         |  1  |  4.684   |
|            timm_regnet            |  1  |  4.2927  |
|            Super_SloMo            |  1  |  4.2702  |
|             hf_Albert             |  1  |  3.9343  |
|            hf_Reformer            |  1  |  3.7558  |
|           BERT_pytorch            |  1  |  3.7028  |
|        shufflenet_v2_x1_0         |  1  |  3.6925  |
|              hf_GPT2              |  1  |  3.5584  |
|         timm_efficientnet         |  1  |  3.4965  |
| attention_is_all_you_need_pytorch |  1  |  3.3817  |
|             resnet50              |  1  |  2.8523  |
|      timm_vision_transformer      |  1  |  2.8507  |
|          resnext50_32x4d          |  1  |  2.8399  |
|            mnasnet1_0             |  1  |  2.7662  |
|        mobilenet_v3_large         |  1  |  2.7274  |
|           pytorch_unet            |  1  |  2.7155  |
|           mobilenet_v2            |  1  |  2.6687  |
|            timm_vovnet            |  1  |  2.546   |
|           hf_DistilBert           |  1  |  2.4877  |
|           timm_resnest            |  1  |  2.2898  |
|       functorch_dp_cifar10        |  1  |  1.917   |
|          pytorch_stargan          | 16  |  1.3333  |
|          LearningToPaint          |  1  |  1.0428  |
|             resnet18              |  1  |  0.9963  |
|   pytorch_CycleGAN_and_pix2pix    |  1  |  0.8937  |
|               dlrm                |  1  |  0.8525  |
|              demucs               |  1  |  0.816   |
|           squeezenet1_1           |  1  |  0.6268  |
|            tts_angular            |  1  |  0.5151  |
|               vgg16               |  1  |  0.3717  |
|              alexnet              |  1  |  0.3331  |
|                drq                |  1  |  0.3052  |
|               dcgan               |  1  |  0.298   |
|      nvidia_deeprecommender       |  1  |  0.2661  |
|         soft_actor_critic         | 256 |  0.2076  |
|           lennard_jones           |  1  |  0.1296  |
|    mobilenet_v2_quantized_qat     |  1  |  0.0764  |
|      resnet50_quantized_qat       |  1  |  0.0556  |
|     detectron2_fcos_r_50_fpn      |  0  |   nan    |
+-----------------------------------+-----+----------+

Peak Memory Compression Ratio

+-----------------------------------+-----+----------+
|               name                | bs  | inductor |
+-----------------------------------+-----+----------+
|              demucs               |  1  |  0.999   |
|      nvidia_deeprecommender       |  1  |  0.9966  |
|      resnet50_quantized_qat       |  1  |  0.9953  |
|               vgg16               |  1  |  0.9952  |
|            hf_T5_base             |  1  |  0.9948  |
|        Background_Matting         |  1  |  0.9946  |
|    mobilenet_v2_quantized_qat     |  1  |  0.993   |
|           hf_GPT2_large           |  1  |  0.9927  |
|              alexnet              |  1  |  0.9917  |
|            hf_BigBird             |  1  |  0.9915  |
|           pytorch_unet            |  1  |  0.9913  |
|         soft_actor_critic         | 256 |  0.9909  |
|          pytorch_stargan          | 16  |  0.9906  |
|          LearningToPaint          |  1  |  0.9905  |
|   timm_vision_transformer_large   |  1  |  0.9901  |
|            tts_angular            |  1  |  0.9876  |
|           lennard_jones           |  1  |  0.9871  |
|                drq                |  1  |  0.9868  |
|               dcgan               |  1  |  0.9846  |
| attention_is_all_you_need_pytorch |  1  |  0.9835  |
|              hf_GPT2              |  1  |  0.9834  |
|           fastNLP_Bert            |  1  |  0.9824  |
|            hf_T5_large            |  1  |  0.9816  |
|           hf_DistilBert           |  1  |  0.9815  |
|              hf_Bert              |  1  |  0.9815  |
|           BERT_pytorch            |  1  |  0.9808  |
|        speech_transformer         |  1  |  0.9796  |
|   pytorch_CycleGAN_and_pix2pix    |  1  |  0.9788  |
|             hf_Albert             |  1  |  0.9783  |
|               hf_T5               |  1  |  0.9752  |
|             resnet18              |  1  |  0.9752  |
|          vision_maskrcnn          |  1  |  0.975   |
|              hf_Bart              |  1  |  0.973   |
|            Super_SloMo            |  1  |  0.973   |
|           hf_Longformer           |  1  |  0.9685  |
|           timm_resnest            |  1  |  0.968   |
|            timm_vovnet            |  1  |  0.9678  |
|            timm_regnet            |  1  |  0.9662  |
|          resnext50_32x4d          |  1  |  0.9637  |
|             resnet50              |  1  |  0.9636  |
|           squeezenet1_1           |  1  |  0.9626  |
|            timm_nfnet             |  1  |  0.9595  |
|      timm_vision_transformer      |  1  |  0.9577  |
|              yolov3               |  1  |  0.9573  |
|               dlrm                |  1  |  0.9548  |
|            mnasnet1_0             |  1  |  0.9511  |
|        mobilenet_v3_large         |  1  |  0.9489  |
|           mobilenet_v2            |  1  |  0.9488  |
|       functorch_dp_cifar10        |  1  |  0.9474  |
|            hf_Reformer            |  1  |  0.9454  |
|        shufflenet_v2_x1_0         |  1  |  0.9432  |
|         timm_efficientnet         |  1  |  0.943   |
|            densenet121            |  1  |  0.8713  |
|     detectron2_fcos_r_50_fpn      |  0  |   nan    |
+-----------------------------------+-----+----------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|     MobileBertForQuestionAnswering      | 1  |  1.0345  |
|            XLNetLMHeadModel             | 1  |  0.9686  |
|            AlbertForMaskedLM            | 1  |  0.9365  |
|       AlbertForQuestionAnswering        | 1  |  0.932   |
|          MobileBertForMaskedLM          | 1  |  0.8867  |
|                 BigBird                 | 1  |  0.8861  |
|     M2M100ForConditionalGeneration      | 1  |  0.8828  |
|             OPTForCausalLM              | 1  |  0.8575  |
|               GoogleFnet                | 1  |  0.8302  |
|       DebertaForQuestionAnswering       | 1  |  0.7953  |
|          AllenaiLongformerBase          | 1  |  0.792   |
|         MegatronBertForCausalLM         | 1  |  0.7858  |
|            YituTechConvBert             | 1  |  0.7855  |
|    MegatronBertForQuestionAnswering     | 1  |  0.7855  |
|     PegasusForConditionalGeneration     | 1  |  0.7852  |
|         Speech2Text2ForCausalLM         | 1  |  0.7839  |
|      MBartForConditionalGeneration      | 1  |  0.7785  |
|            TrOCRForCausalLM             | 1  |  0.7713  |
|           PegasusForCausalLM            | 1  |  0.7688  |
|            MBartForCausalLM             | 1  |  0.7678  |
|             XGLMForCausalLM             | 1  |  0.7664  |
|       RobertaForQuestionAnswering       | 1  |  0.7628  |
|           DebertaForMaskedLM            | 1  |  0.7606  |
|     DistilBertForQuestionAnswering      | 1  |  0.7468  |
|           RobertaForCausalLM            | 1  |  0.7448  |
|     PLBartForConditionalGeneration      | 1  |  0.7417  |
|          DistilBertForMaskedLM          | 1  |  0.7257  |
|            PLBartForCausalLM            | 1  |  0.7234  |
|               DistillGPT2               | 1  |  0.7063  |
|           LayoutLMForMaskedLM           | 1  |  0.6533  |
|    LayoutLMForSequenceClassification    | 1  |  0.6507  |
|             BartForCausalLM             | 1  |  0.6452  |
|      GPT2ForSequenceClassification      | 1  |  0.645   |
|                CamemBert                | 1  |  0.6426  |
| BlenderbotSmallForConditionalGeneration | 1  |  0.6426  |
|       BlenderbotSmallForCausalLM        | 1  |  0.6418  |
|       T5ForConditionalGeneration        | 1  |  0.6012  |
|                 T5Small                 | 1  |  0.6006  |
|      BartForConditionalGeneration       | 1  |  0.592   |
|       ElectraForQuestionAnswering       | 1  |  0.4796  |
|           ElectraForCausalLM            | 1  |  0.4554  |
|       MT5ForConditionalGeneration       | 1  |  0.4265  |
|             BertForMaskedLM             | 0  |   0.0    |
|        BertForQuestionAnswering         | 0  |   0.0    |
+-----------------------------------------+----+----------+

Accuracy

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|            AlbertForMaskedLM            | 1  |   pass   |
|       AlbertForQuestionAnswering        | 1  |   pass   |
|      MBartForConditionalGeneration      | 1  |   pass   |
|       MT5ForConditionalGeneration       | 1  |   pass   |
|         MegatronBertForCausalLM         | 1  |   pass   |
|    MegatronBertForQuestionAnswering     | 1  |   pass   |
|          MobileBertForMaskedLM          | 1  |   pass   |
|     MobileBertForQuestionAnswering      | 1  |   pass   |
|             OPTForCausalLM              | 1  |   pass   |
|            PLBartForCausalLM            | 1  |   pass   |
|     PLBartForConditionalGeneration      | 1  |   pass   |
|           PegasusForCausalLM            | 1  |   pass   |
|     PegasusForConditionalGeneration     | 1  |   pass   |
|           RobertaForCausalLM            | 1  |   pass   |
|       RobertaForQuestionAnswering       | 1  |   pass   |
|         Speech2Text2ForCausalLM         | 1  |   pass   |
|       T5ForConditionalGeneration        | 1  |   pass   |
|                 T5Small                 | 1  |   pass   |
|            TrOCRForCausalLM             | 1  |   pass   |
|             XGLMForCausalLM             | 1  |   pass   |
|            XLNetLMHeadModel             | 1  |   pass   |
|            MBartForCausalLM             | 1  |   pass   |
|     M2M100ForConditionalGeneration      | 1  |   pass   |
|    LayoutLMForSequenceClassification    | 1  |   pass   |
|                CamemBert                | 1  |   pass   |
|          AllenaiLongformerBase          | 1  |   pass   |
|             BartForCausalLM             | 1  |   pass   |
|      BartForConditionalGeneration       | 1  |   pass   |
|             BertForMaskedLM             | 1  |   pass   |
|        BertForQuestionAnswering         | 1  |   pass   |
|                 BigBird                 | 1  |   pass   |
|       BlenderbotSmallForCausalLM        | 1  |   pass   |
| BlenderbotSmallForConditionalGeneration | 1  |   pass   |
|           DebertaForMaskedLM            | 1  |   pass   |
|           LayoutLMForMaskedLM           | 1  |   pass   |
|       DebertaForQuestionAnswering       | 1  |   pass   |
|          DistilBertForMaskedLM          | 1  |   pass   |
|     DistilBertForQuestionAnswering      | 1  |   pass   |
|               DistillGPT2               | 1  |   pass   |
|           ElectraForCausalLM            | 1  |   pass   |
|       ElectraForQuestionAnswering       | 1  |   pass   |
|      GPT2ForSequenceClassification      | 1  |   pass   |
|               GoogleFnet                | 1  |   pass   |
|            YituTechConvBert             | 1  |   pass   |
+-----------------------------------------+----+----------+

Compilation latency (sec)

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|      BartForConditionalGeneration       | 1  | 31.7606  |
|          AllenaiLongformerBase          | 1  | 15.2198  |
|             BartForCausalLM             | 1  | 12.4113  |
|     MobileBertForQuestionAnswering      | 1  | 11.9979  |
|          MobileBertForMaskedLM          | 1  | 11.9653  |
|                 BigBird                 | 1  | 11.1688  |
|            XLNetLMHeadModel             | 1  | 10.6341  |
|           DebertaForMaskedLM            | 1  | 10.3189  |
|      MBartForConditionalGeneration      | 1  |  9.6144  |
|       DebertaForQuestionAnswering       | 1  |  9.5465  |
|     PegasusForConditionalGeneration     | 1  |  9.3128  |
|                 T5Small                 | 1  |  9.1569  |
|       T5ForConditionalGeneration        | 1  |  9.1361  |
|             XGLMForCausalLM             | 1  |  8.3606  |
|     M2M100ForConditionalGeneration      | 1  |  8.2767  |
|            AlbertForMaskedLM            | 1  |  8.2582  |
|      GPT2ForSequenceClassification      | 1  |  8.1982  |
|       AlbertForQuestionAnswering        | 1  |  8.1664  |
|         MegatronBertForCausalLM         | 1  |  7.8404  |
|    MegatronBertForQuestionAnswering     | 1  |  7.6892  |
|       MT5ForConditionalGeneration       | 1  |  7.571   |
|            YituTechConvBert             | 1  |   6.62   |
|                CamemBert                | 1  |  6.0292  |
| BlenderbotSmallForConditionalGeneration | 1  |  5.8922  |
|           LayoutLMForMaskedLM           | 1  |  5.7682  |
|    LayoutLMForSequenceClassification    | 1  |  5.3124  |
|     PLBartForConditionalGeneration      | 1  |  4.6582  |
|           ElectraForCausalLM            | 1  |  4.6399  |
|       ElectraForQuestionAnswering       | 1  |  4.2141  |
|           PegasusForCausalLM            | 1  |  4.0619  |
|            MBartForCausalLM             | 1  |  4.0019  |
|            TrOCRForCausalLM             | 1  |  3.8745  |
|           RobertaForCausalLM            | 1  |  3.7961  |
|       RobertaForQuestionAnswering       | 1  |  3.6619  |
|             OPTForCausalLM              | 1  |  3.2093  |
|               DistillGPT2               | 1  |  3.0738  |
|               GoogleFnet                | 1  |  2.7978  |
|       BlenderbotSmallForCausalLM        | 1  |  2.5215  |
|         Speech2Text2ForCausalLM         | 1  |   2.15   |
|            PLBartForCausalLM            | 1  |  1.9917  |
|          DistilBertForMaskedLM          | 1  |  1.9543  |
|     DistilBertForQuestionAnswering      | 1  |  1.9133  |
|             BertForMaskedLM             | 0  |   nan    |
|        BertForQuestionAnswering         | 0  |   nan    |
+-----------------------------------------+----+----------+

Peak Memory Compression Ratio

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|            AlbertForMaskedLM            | 1  |  0.9949  |
|       AlbertForQuestionAnswering        | 1  |  0.9948  |
|                 BigBird                 | 1  |  0.9932  |
|               GoogleFnet                | 1  |  0.9915  |
|      BartForConditionalGeneration       | 1  |  0.9905  |
|      GPT2ForSequenceClassification      | 1  |  0.9904  |
|           DebertaForMaskedLM            | 1  |  0.9886  |
|     PegasusForConditionalGeneration     | 1  |  0.9885  |
|       DebertaForQuestionAnswering       | 1  |  0.9879  |
|            TrOCRForCausalLM             | 1  |  0.9876  |
|     M2M100ForConditionalGeneration      | 1  |  0.9874  |
|            MBartForCausalLM             | 1  |  0.9858  |
|       MT5ForConditionalGeneration       | 1  |  0.985   |
|               DistillGPT2               | 1  |  0.9846  |
|                 T5Small                 | 1  |  0.9841  |
|       T5ForConditionalGeneration        | 1  |  0.9841  |
|             XGLMForCausalLM             | 1  |  0.9831  |
|           PegasusForCausalLM            | 1  |  0.9827  |
|            XLNetLMHeadModel             | 1  |  0.9824  |
|    LayoutLMForSequenceClassification    | 1  |  0.9821  |
|           LayoutLMForMaskedLM           | 1  |  0.9818  |
|     DistilBertForQuestionAnswering      | 1  |  0.9817  |
|          DistilBertForMaskedLM          | 1  |  0.9817  |
|         MegatronBertForCausalLM         | 1  |  0.9815  |
|                CamemBert                | 1  |  0.9815  |
|    MegatronBertForQuestionAnswering     | 1  |  0.9811  |
|           RobertaForCausalLM            | 1  |  0.9794  |
|       RobertaForQuestionAnswering       | 1  |  0.9792  |
|          AllenaiLongformerBase          | 1  |  0.979   |
|      MBartForConditionalGeneration      | 1  |  0.9778  |
|             OPTForCausalLM              | 1  |  0.9769  |
|            YituTechConvBert             | 1  |  0.976   |
|     PLBartForConditionalGeneration      | 1  |  0.974   |
|       BlenderbotSmallForCausalLM        | 1  |  0.9736  |
|         Speech2Text2ForCausalLM         | 1  |  0.9733  |
|            PLBartForCausalLM            | 1  |  0.9681  |
|           ElectraForCausalLM            | 1  |  0.9627  |
|             BartForCausalLM             | 1  |  0.9607  |
|       ElectraForQuestionAnswering       | 1  |  0.958   |
| BlenderbotSmallForConditionalGeneration | 1  |  0.9569  |
|          MobileBertForMaskedLM          | 1  |  0.9308  |
|     MobileBertForQuestionAnswering      | 1  |  0.9129  |
|             BertForMaskedLM             | 0  |   nan    |
|        BertForQuestionAnswering         | 0  |   nan    |
+-----------------------------------------+----+----------+

timm_models suite with float32 precision

Performance speedup

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          pnasnet5large          | 1  |  1.1504  |
|          inception_v3           | 1  |  1.1191  |
|       gluon_inception_v3        | 1  |  1.0991  |
|        adv_inception_v3         | 1  |  1.098   |
|        res2net101_26w_4s        | 1  |  1.0485  |
|          ghostnet_100           | 1  |  1.039   |
|        res2net50_14w_8s         | 1  |  1.0334  |
|           resnest101e           | 1  |  1.0113  |
|            nfnet_l0             | 1  |  1.0066  |
|            hrnet_w18            | 1  |  1.0041  |
|           regnety_002           | 1  |  0.9546  |
|            repvgg_a2            | 1  |  0.9453  |
|           selecsls42b           | 1  |  0.9205  |
|           dm_nfnet_f0           | 1  |  0.9179  |
|             dpn107              | 1  |  0.9178  |
|          spnasnet_100           | 1  |  0.9163  |
|           mnasnet_100           | 1  |  0.9118  |
|           res2next50            | 1  |  0.8947  |
|            gernet_l             | 1  |  0.8872  |
|     swsl_resnext101_32x16d      | 1  |  0.8771  |
|          gmixer_24_224          | 1  |  0.8391  |
|      xcit_large_24_p8_224       | 1  |  0.8366  |
|        convmixer_768_32         | 1  |  0.8288  |
|      beit_base_patch16_224      | 1  |  0.8284  |
|             dla102              | 1  |  0.8054  |
|      mobilenetv3_large_100      | 1  |  0.8004  |
|        gluon_xception65         | 1  |  0.7888  |
|           fbnetc_100            | 1  |  0.7678  |
| deit_base_distilled_patch16_224 | 1  |  0.7665  |
|      vit_base_patch16_224       | 1  |  0.7505  |
|            fbnetv3_b            | 1  |  0.7415  |
|           volo_d1_224           | 1  |  0.7371  |
|           convit_base           | 1  |  0.7311  |
|  swin_base_patch4_window7_224   | 1  |  0.7262  |
|          cait_m36_384           | 1  |  0.7251  |
|          mixer_b16_224          | 1  |  0.7156  |
|            lcnet_050            | 1  |  0.7139  |
|         poolformer_m36          | 1  |  0.6751  |
|         mobilenetv2_100         | 1  |  0.6626  |
|          cspdarknet53           | 1  |  0.6616  |
|          resmlp_12_224          | 1  |   0.66   |
|            pit_b_224            | 1  |  0.6482  |
|          convnext_base          | 1  |  0.6364  |
|        tnt_s_patch16_224        | 1  |  0.6352  |
|        twins_pcpvt_base         | 1  |  0.6059  |
|          jx_nest_base           | 1  |  0.6047  |
|         visformer_small         | 1  |  0.6011  |
|         crossvit_9_240          | 1  |  0.5865  |
|           tf_mixnet_l           | 1  |  0.5648  |
|        ese_vovnet19b_dw         | 1  |  0.546   |
|            mixnet_l             | 1  |  0.5282  |
|         coat_lite_mini          | 1  |  0.5237  |
|           rexnet_100            | 1  |  0.4773  |
|            tinynet_a            | 1  |  0.4549  |
|       tf_efficientnet_b0        | 1  |  0.4062  |
|           mobilevit_s           | 1  |  0.3664  |
|          gmlp_s16_224           | 1  |  0.364   |
|        sebotnet33ts_256         | 0  |   0.0    |
|          botnet26t_256          | 0  |   0.0    |
|       eca_botnext26ts_256       | 0  |   0.0    |
|        eca_halonext26ts         | 0  |   0.0    |
+---------------------------------+----+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 1  |     pass      |
|      beit_base_patch16_224      | 1  |     pass      |
|         mobilenetv2_100         | 1  |     pass      |
|      mobilenetv3_large_100      | 1  |     pass      |
|           mobilevit_s           | 1  |     pass      |
|            nfnet_l0             | 1  |     pass      |
|            pit_b_224            | 1  |     pass      |
|          pnasnet5large          | 1  |     pass      |
|         poolformer_m36          | 1  |     pass      |
|           regnety_002           | 1  |     pass      |
|            repvgg_a2            | 1  |     pass      |
|        res2net101_26w_4s        | 1  |     pass      |
|        res2net50_14w_8s         | 1  |     pass      |
|           res2next50            | 1  |     pass      |
|          resmlp_12_224          | 1  |     pass      |
|           resnest101e           | 1  |     pass      |
|           rexnet_100            | 1  |     pass      |
|           selecsls42b           | 1  |     pass      |
|          spnasnet_100           | 1  |     pass      |
|  swin_base_patch4_window7_224   | 1  |     pass      |
|     swsl_resnext101_32x16d      | 1  |     pass      |
|       tf_efficientnet_b0        | 1  |     pass      |
|           tf_mixnet_l           | 1  |     pass      |
|            tinynet_a            | 1  |     pass      |
|        tnt_s_patch16_224        | 1  |     pass      |
|        twins_pcpvt_base         | 1  |     pass      |
|         visformer_small         | 1  |     pass      |
|      vit_base_patch16_224       | 1  |     pass      |
|           volo_d1_224           | 1  |     pass      |
|           mnasnet_100           | 1  |     pass      |
|            mixnet_l             | 1  |     pass      |
|          mixer_b16_224          | 1  |     pass      |
|        ese_vovnet19b_dw         | 1  |     pass      |
|         coat_lite_mini          | 1  |     pass      |
|           convit_base           | 1  |     pass      |
|        convmixer_768_32         | 1  |     pass      |
|          convnext_base          | 1  |     pass      |
|         crossvit_9_240          | 1  |     pass      |
|          cspdarknet53           | 1  |     pass      |
| deit_base_distilled_patch16_224 | 1  |     pass      |
|             dla102              | 1  |     pass      |
|           dm_nfnet_f0           | 1  |     pass      |
|             dpn107              | 1  |     pass      |
|            lcnet_050            | 1  |     pass      |
|      xcit_large_24_p8_224       | 1  |     pass      |
|           fbnetc_100            | 1  |     pass      |
|            fbnetv3_b            | 1  |     pass      |
|            gernet_l             | 1  |     pass      |
|       gluon_inception_v3        | 1  |     pass      |
|        gluon_xception65         | 1  |     pass      |
|          gmixer_24_224          | 1  |     pass      |
|          gmlp_s16_224           | 1  |     pass      |
|            hrnet_w18            | 1  |     pass      |
|          inception_v3           | 1  |     pass      |
|          jx_nest_base           | 1  |     pass      |
|        eca_halonext26ts         | 1  |  fail_to_run  |
|        sebotnet33ts_256         | 1  |  fail_to_run  |
|          botnet26t_256          | 1  |  fail_to_run  |
|       eca_botnext26ts_256       | 1  |  fail_to_run  |
|          ghostnet_100           | 1  | fail_accuracy |
|          cait_m36_384           | 1  | fail_accuracy |
+---------------------------------+----+---------------+

Compilation latency (sec)

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|            hrnet_w18            | 1  |   18.0   |
|          cait_m36_384           | 1  | 17.8966  |
|          pnasnet5large          | 1  | 15.4624  |
|      xcit_large_24_p8_224       | 1  | 12.9297  |
|  swin_base_patch4_window7_224   | 1  | 11.0748  |
|         poolformer_m36          | 1  |  9.9394  |
|        res2net101_26w_4s        | 1  |  9.2803  |
|        twins_pcpvt_base         | 1  |  9.1837  |
|           mobilevit_s           | 1  |  8.716   |
|          jx_nest_base           | 1  |  8.5922  |
|        res2net50_14w_8s         | 1  |  8.3796  |
|            mixnet_l             | 1  |  8.0371  |
|           tf_mixnet_l           | 1  |  7.9076  |
|             dpn107              | 1  |  7.8182  |
|        tnt_s_patch16_224        | 1  |  7.6116  |
|          gmlp_s16_224           | 1  |  7.3036  |
|           dm_nfnet_f0           | 1  |  6.8258  |
|            fbnetv3_b            | 1  |  6.7063  |
|        gluon_xception65         | 1  |  6.5942  |
|           volo_d1_224           | 1  |  6.0258  |
|             dla102              | 1  |  5.8349  |
|          convnext_base          | 1  |  5.6419  |
|         crossvit_9_240          | 1  |  5.3923  |
|           resnest101e           | 1  |  5.2057  |
|        adv_inception_v3         | 1  |  5.1079  |
|       gluon_inception_v3        | 1  |  5.0266  |
|          inception_v3           | 1  |  4.9859  |
|           res2next50            | 1  |  4.9741  |
|          cspdarknet53           | 1  |  4.8769  |
|            nfnet_l0             | 1  |  4.5565  |
|          ghostnet_100           | 1  |  4.4579  |
|            tinynet_a            | 1  |  4.2661  |
|           rexnet_100            | 1  |  4.2428  |
|     swsl_resnext101_32x16d      | 1  |  3.9205  |
|          gmixer_24_224          | 1  |  3.8914  |
|         coat_lite_mini          | 1  |  3.8132  |
|       tf_efficientnet_b0        | 1  |  3.7351  |
|           convit_base           | 1  |  3.7254  |
|        convmixer_768_32         | 1  |  3.6167  |
|         visformer_small         | 1  |  3.6126  |
|           fbnetc_100            | 1  |  3.601   |
|          spnasnet_100           | 1  |  3.5018  |
| deit_base_distilled_patch16_224 | 1  |  3.4593  |
|            pit_b_224            | 1  |  3.4397  |
|      beit_base_patch16_224      | 1  |  3.3555  |
|            gernet_l             | 1  |  3.2903  |
|            repvgg_a2            | 1  |  3.1485  |
|      mobilenetv3_large_100      | 1  |  3.0992  |
|         mobilenetv2_100         | 1  |  3.067   |
|      vit_base_patch16_224       | 1  |  2.9249  |
|           regnety_002           | 1  |  2.8927  |
|          mixer_b16_224          | 1  |  2.8533  |
|           mnasnet_100           | 1  |  2.8398  |
|        ese_vovnet19b_dw         | 1  |  2.4235  |
|            lcnet_050            | 1  |  2.4123  |
|           selecsls42b           | 1  |  2.2144  |
|          resmlp_12_224          | 1  |  1.7754  |
|          botnet26t_256          | 0  |   nan    |
|       eca_botnext26ts_256       | 0  |   nan    |
|        eca_halonext26ts         | 0  |   nan    |
|        sebotnet33ts_256         | 0  |   nan    |
+---------------------------------+----+----------+

Peak Memory Compression Ratio

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          cait_m36_384           | 1  |  0.983   |
|      vit_base_patch16_224       | 1  |  0.9803  |
|     swsl_resnext101_32x16d      | 1  |  0.9802  |
| deit_base_distilled_patch16_224 | 1  |  0.9785  |
|      xcit_large_24_p8_224       | 1  |  0.9776  |
|            pit_b_224            | 1  |  0.9772  |
|           selecsls42b           | 1  |  0.9742  |
|           convit_base           | 1  |  0.9734  |
|          mixer_b16_224          | 1  |  0.9731  |
|      beit_base_patch16_224      | 1  |  0.9731  |
|        convmixer_768_32         | 1  |  0.9718  |
|        ese_vovnet19b_dw         | 1  |  0.9685  |
|          convnext_base          | 1  |  0.9653  |
|            repvgg_a2            | 1  |  0.965   |
|          resmlp_12_224          | 1  |  0.9649  |
|            gernet_l             | 1  |  0.9642  |
|           dm_nfnet_f0           | 1  |  0.9628  |
|            nfnet_l0             | 1  |  0.9625  |
|            lcnet_050            | 1  |  0.9612  |
|             dpn107              | 1  |  0.9606  |
|         visformer_small         | 1  |  0.9583  |
|          cspdarknet53           | 1  |  0.9557  |
|        gluon_xception65         | 1  |  0.9537  |
|         mobilenetv2_100         | 1  |  0.9518  |
|          gmixer_24_224          | 1  |  0.9517  |
|      mobilenetv3_large_100      | 1  |  0.9515  |
|           regnety_002           | 1  |  0.9499  |
|          jx_nest_base           | 1  |  0.9497  |
|  swin_base_patch4_window7_224   | 1  |  0.9491  |
|           mnasnet_100           | 1  |  0.949   |
|         coat_lite_mini          | 1  |  0.949   |
|           res2next50            | 1  |  0.9488  |
|       tf_efficientnet_b0        | 1  |  0.9463  |
|       gluon_inception_v3        | 1  |  0.9458  |
|            tinynet_a            | 1  |  0.9453  |
|             dla102              | 1  |  0.9445  |
|        adv_inception_v3         | 1  |  0.9433  |
|          inception_v3           | 1  |  0.943   |
|          pnasnet5large          | 1  |  0.9425  |
|           fbnetc_100            | 1  |  0.9409  |
|          spnasnet_100           | 1  |  0.9397  |
|           rexnet_100            | 1  |  0.9397  |
|          ghostnet_100           | 1  |  0.9363  |
|           mobilevit_s           | 1  |  0.9355  |
|           volo_d1_224           | 1  |  0.9336  |
|           resnest101e           | 1  |  0.9329  |
|        res2net101_26w_4s        | 1  |  0.9305  |
|        res2net50_14w_8s         | 1  |  0.9252  |
|           tf_mixnet_l           | 1  |  0.9241  |
|            mixnet_l             | 1  |  0.924   |
|        tnt_s_patch16_224        | 1  |  0.9211  |
|        twins_pcpvt_base         | 1  |  0.9207  |
|         crossvit_9_240          | 1  |  0.9185  |
|            fbnetv3_b            | 1  |  0.9182  |
|         poolformer_m36          | 1  |  0.9156  |
|          gmlp_s16_224           | 1  |  0.9146  |
|            hrnet_w18            | 1  |  0.8641  |
|          botnet26t_256          | 0  |   nan    |
|       eca_botnext26ts_256       | 0  |   nan    |
|        eca_halonext26ts         | 0  |   nan    |
|        sebotnet33ts_256         | 0  |   nan    |
+---------------------------------+----+----------+

@blzheng
Collaborator Author

blzheng commented Oct 28, 2022

Performance Dashboard for float32 precision -- Single-Socket Multi-threads

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward pass. For accuracy, we check the numerical correctness of the forward-pass outputs by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 94%, 51/54 | 100%, 44/44 | 90%, 55/61  |
+----------+------------+-------------+-------------+
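For readers who want to reproduce the "NN%, passed/total" cells, here is a minimal sketch. The helper name `format_passrate` and the rounding rule (round to the nearest whole percent) are assumptions, not the dashboard's actual code. The counts are the ones shown in the Passrate table.

```python
def format_passrate(passed: int, total: int) -> str:
    """Render a pass rate as 'NN%, passed/total' (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Assumption: the percentage is rounded to the nearest integer.
    percent = round(100 * passed / total)
    return f"{percent}%, {passed}/{total}"


# Counts taken from the Passrate table.
suites = [("torchbench", 51, 54), ("huggingface", 44, 44), ("timm_models", 55, 61)]
for suite, passed, total in suites:
    print(f"{suite}: {format_passrate(passed, total)}")
```

With these counts the sketch reproduces the three cells in the table, although a truncating rule would give the same strings here.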

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.05x    |    1.16x    |    1.05x    |
+----------+------------+-------------+-------------+
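The geometric mean above can be sketched as follows. This is an illustration only: it assumes that per-model speedup is eager latency divided by inductor latency, and that models failing the accuracy check are dropped before aggregation, as the summary states. The function name `geomean_speedup` and the sample latencies are hypothetical and are not taken from the benchmark run.

```python
import math


def geomean_speedup(results):
    """Geometric mean of eager/compiled latency ratios.

    results: iterable of (eager_seconds, compiled_seconds, passed_accuracy).
    Entries that failed accuracy or have non-positive timings are skipped.
    """
    speedups = [
        eager / compiled
        for eager, compiled, passed in results
        if passed and eager > 0 and compiled > 0
    ]
    if not speedups:
        return float("nan")
    # Average in log space, then exponentiate.
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


# Hypothetical latencies, chosen only to illustrate the calculation.
sample = [
    (2.0, 1.0, True),   # compiled model is 2.0x faster
    (1.0, 2.0, True),   # compiled model is 0.5x (slower)
    (1.0, 0.1, False),  # excluded: failed the accuracy check
]
print(f"{geomean_speedup(sample):.2f}x")
```

A geometric mean is the usual choice for ratios because a 2x speedup and a 0.5x slowdown cancel out, which an arithmetic mean would not reflect.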

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |    4.21    |    6.31     |    7.17     |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.94x    |    0.94x    |    0.98x    |
+----------+------------+-------------+-------------+
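The report does not spell out how the compression ratio is computed. A minimal sketch, assuming it is the eager-mode peak memory divided by the inductor peak memory (which is consistent with "higher is better"), could look like this. The function name and the byte counts are hypothetical.

```python
def compression_ratio(eager_peak_bytes: float, compiled_peak_bytes: float) -> float:
    """Assumed definition: eager peak memory / compiled peak memory.

    A value above 1.0 means the compiled model used less peak memory than
    eager mode; a value below 1.0 means it used more.
    """
    if compiled_peak_bytes <= 0:
        return float("nan")
    return eager_peak_bytes / compiled_peak_bytes


# Hypothetical measurement: compiled run peaks 25% higher than eager.
ratio = compression_ratio(100.0, 125.0)
print(f"{ratio:.2f}x")
```

Under this assumed definition, the sub-1.0 values in the table would indicate that inductor's peak memory was slightly higher than eager mode's for those suites.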

torchbench suite with float32 precision

Performance speedup

+-----------------------------------+------+----------+
|               name                |  bs  | inductor |
+-----------------------------------+------+----------+
|         soft_actor_critic         | 256  |  1.6285  |
|           timm_resnest            |  32  |  1.3656  |
|             resnet50              |  32  |  1.301   |
|        shufflenet_v2_x1_0         |  64  |  1.2455  |
|            mnasnet1_0             |  32  |  1.1844  |
|                drq                |  1   |  1.146   |
|            densenet121            |  64  |  1.1441  |
|              alexnet              | 128  |  1.1201  |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  1.1175  |
|           squeezenet1_1           |  16  |  1.0813  |
|               vgg16               |  4   |  1.0698  |
|               dlrm                | 2048 |  1.0623  |
|             resnet18              |  8   |  1.0495  |
|            hf_T5_large            |  1   |  1.045   |
|            Super_SloMo            |  6   |  1.0397  |
|       functorch_dp_cifar10        |  64  |  1.0378  |
|            timm_regnet            |  32  |  1.0326  |
|          pytorch_stargan          |  16  |  1.0309  |
|        mobilenet_v3_large         |  32  |  1.0179  |
|               dcgan               | 256  |  1.0161  |
|               hf_T5               |  1   |  1.0094  |
|          LearningToPaint          |  96  |  1.0057  |
|            hf_Reformer            |  1   |  1.0056  |
|      resnet50_quantized_qat       |  32  |  1.0033  |
|    mobilenet_v2_quantized_qat     |  96  |  1.001   |
|            hf_BigBird             |  1   |  1.0007  |
|              demucs               |  1   |  0.9973  |
|            tts_angular            |  64  |  0.996   |
|            timm_vovnet            |  32  |  0.9907  |
|        Background_Matting         |  1   |  0.9842  |
|           BERT_pytorch            |  2   |  0.9805  |
|           hf_GPT2_large           |  1   |  0.9283  |
|        speech_transformer         |  1   |  0.9266  |
|           mobilenet_v2            |  16  |  0.9249  |
|              hf_GPT2              |  1   |  0.9178  |
|           pytorch_unet            |  1   |  0.9163  |
|           hf_Longformer           |  1   |  0.9125  |
|             hf_Albert             |  1   |  0.8922  |
|   timm_vision_transformer_large   |  8   |  0.8638  |
|            hf_T5_base             |  1   |  0.8383  |
|      nvidia_deeprecommender       | 256  |  0.832   |
|           hf_DistilBert           |  1   |  0.8276  |
|              yolov3               |  8   |  0.8185  |
|          resnext50_32x4d          |  8   |  0.8175  |
|              hf_Bert              |  1   |  0.7978  |
|            timm_nfnet             | 128  |  0.7914  |
|              hf_Bart              |  1   |  0.7894  |
|          vision_maskrcnn          |  1   |  0.7852  |
|           fastNLP_Bert            |  1   |  0.7733  |
| attention_is_all_you_need_pytorch |  32  |  0.7196  |
|         timm_efficientnet         |  64  |  0.6824  |
|      timm_vision_transformer      |  8   |  0.681   |
|           lennard_jones           | 1000 |  0.3501  |
|     detectron2_fcos_r_50_fpn      |  0   |   0.0    |
+-----------------------------------+------+----------+

Accuracy

+-----------------------------------+-----+------------------+
|               name                | bs  |     inductor     |
+-----------------------------------+-----+------------------+
|            hf_T5_large            |  2  | pass_due_to_skip |
|   timm_vision_transformer_large   |  2  | pass_due_to_skip |
|           hf_GPT2_large           |  2  | pass_due_to_skip |
|           BERT_pytorch            |  2  |       pass       |
|        shufflenet_v2_x1_0         |  2  |       pass       |
|        mobilenet_v3_large         |  2  |       pass       |
|      nvidia_deeprecommender       |  2  |       pass       |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |
|          pytorch_stargan          | 16  |       pass       |
|           pytorch_unet            |  2  |       pass       |
|             resnet18              |  2  |       pass       |
|             resnet50              |  2  |       pass       |
|      resnet50_quantized_qat       |  2  |       pass       |
|          resnext50_32x4d          |  2  |       pass       |
|         soft_actor_critic         | 256 |       pass       |
|           mobilenet_v2            |  2  |       pass       |
|        speech_transformer         |  2  |       pass       |
|           squeezenet1_1           |  2  |       pass       |
|         timm_efficientnet         |  2  |       pass       |
|            timm_nfnet             |  2  |       pass       |
|            timm_regnet            |  2  |       pass       |
|           timm_resnest            |  2  |       pass       |
|      timm_vision_transformer      |  2  |       pass       |
|            timm_vovnet            |  2  |       pass       |
|            tts_angular            |  2  |       pass       |
|               vgg16               |  2  |       pass       |
|    mobilenet_v2_quantized_qat     |  2  |       pass       |
|            mnasnet1_0             |  2  |       pass       |
|           fastNLP_Bert            |  2  |       pass       |
|       functorch_dp_cifar10        |  2  |       pass       |
|          LearningToPaint          |  2  |       pass       |
|            Super_SloMo            |  2  |       pass       |
|              alexnet              |  2  |       pass       |
| attention_is_all_you_need_pytorch |  2  |       pass       |
|               dcgan               |  2  |       pass       |
|              demucs               |  1  |       pass       |
|            densenet121            |  2  |       pass       |
|               dlrm                |  2  |       pass       |
|                drq                |  1  |       pass       |
|           lennard_jones           |  2  |       pass       |
|              yolov3               |  2  |       pass       |
|             hf_Albert             |  2  |       pass       |
|              hf_Bart              |  2  |       pass       |
|              hf_Bert              |  2  |       pass       |
|            hf_BigBird             |  2  |       pass       |
|           hf_DistilBert           |  2  |       pass       |
|              hf_GPT2              |  2  |       pass       |
|           hf_Longformer           |  2  |       pass       |
|            hf_Reformer            |  2  |       pass       |
|               hf_T5               |  2  |       pass       |
|            hf_T5_base             |  2  |       pass       |
|        Background_Matting         |  1  |  fail_accuracy   |
|     detectron2_fcos_r_50_fpn      |  0  |      0.0000      |
|          vision_maskrcnn          |  0  |      0.0000      |
+-----------------------------------+-----+------------------+

Compilation latency (sec)

+-----------------------------------+------+----------+
|               name                |  bs  | inductor |
+-----------------------------------+------+----------+
|            hf_T5_large            |  1   |  20.652  |
|          vision_maskrcnn          |  1   | 16.9233  |
|   timm_vision_transformer_large   |  8   | 12.9825  |
|           hf_GPT2_large           |  1   | 12.2959  |
|           hf_Longformer           |  1   | 11.5857  |
|              yolov3               |  8   | 11.1044  |
|            timm_nfnet             | 128  | 10.2945  |
|            hf_BigBird             |  1   | 10.2459  |
|            hf_T5_base             |  1   | 10.0108  |
|            densenet121            |  64  |  9.4044  |
|            Super_SloMo            |  6   |  6.6149  |
|            timm_regnet            |  32  |  5.5325  |
|        speech_transformer         |  1   |  5.0215  |
|              hf_Bart              |  1   |  4.9445  |
|           fastNLP_Bert            |  1   |  4.4597  |
| attention_is_all_you_need_pytorch |  32  |  4.2971  |
|               hf_T5               |  1   |  4.245   |
|           BERT_pytorch            |  2   |  4.2274  |
|         timm_efficientnet         |  64  |  4.1576  |
|              hf_Bert              |  1   |  3.9968  |
|      timm_vision_transformer      |  8   |  3.8982  |
|        mobilenet_v3_large         |  32  |  3.8922  |
|        shufflenet_v2_x1_0         |  64  |  3.7045  |
|           mobilenet_v2            |  16  |  3.6952  |
|            hf_Reformer            |  1   |  3.5863  |
|              hf_GPT2              |  1   |  3.5126  |
|          resnext50_32x4d          |  8   |  3.4358  |
|            mnasnet1_0             |  32  |  3.329   |
|        Background_Matting         |  1   |  3.2703  |
|            timm_vovnet            |  32  |  3.0503  |
|             hf_Albert             |  1   |  3.0439  |
|             resnet50              |  32  |  2.6625  |
|           timm_resnest            |  32  |  2.5714  |
|       functorch_dp_cifar10        |  64  |  2.4244  |
|           hf_DistilBert           |  1   |  2.098   |
|          LearningToPaint          |  96  |  1.7859  |
|           pytorch_unet            |  1   |  1.5454  |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  1.5254  |
|          pytorch_stargan          |  16  |  1.3323  |
|             resnet18              |  8   |  1.2597  |
|              demucs               |  1   |  1.1289  |
|           squeezenet1_1           |  16  |  0.8397  |
|      nvidia_deeprecommender       | 256  |  0.7496  |
|               dlrm                | 2048 |  0.706   |
|               vgg16               |  4   |  0.5992  |
|            tts_angular            |  64  |  0.5919  |
|              alexnet              | 128  |  0.4548  |
|                drq                |  1   |  0.3724  |
|         soft_actor_critic         | 256  |  0.2193  |
|           lennard_jones           | 1000 |  0.1881  |
|               dcgan               | 256  |  0.1669  |
|    mobilenet_v2_quantized_qat     |  96  |  0.0845  |
|      resnet50_quantized_qat       |  32  |  0.0364  |
|     detectron2_fcos_r_50_fpn      |  0   |   nan    |
+-----------------------------------+------+----------+

Peak Memory Compression Ratio

+-----------------------------------+------+----------+
|               name                |  bs  | inductor |
+-----------------------------------+------+----------+
|              demucs               |  1   |  0.9989  |
|      resnet50_quantized_qat       |  32  |  0.9956  |
|               vgg16               |  4   |  0.9947  |
|            timm_regnet            |  32  |  0.9938  |
|   timm_vision_transformer_large   |  8   |  0.9936  |
|        Background_Matting         |  1   |  0.9933  |
|          LearningToPaint          |  96  |  0.9918  |
|              alexnet              | 128  |  0.9915  |
|            timm_nfnet             | 128  |  0.9913  |
|           pytorch_unet            |  1   |  0.9904  |
|            hf_BigBird             |  1   |  0.9901  |
|            timm_vovnet            |  32  |  0.9891  |
|            Super_SloMo            |  6   |  0.9878  |
|            densenet121            |  64  |  0.9866  |
|            mnasnet1_0             |  32  |  0.9851  |
|           lennard_jones           | 1000 |  0.9851  |
| attention_is_all_you_need_pytorch |  32  |  0.9834  |
|        mobilenet_v3_large         |  32  |  0.9826  |
|         soft_actor_critic         | 256  |  0.9822  |
|            tts_angular            |  64  |  0.9821  |
|        shufflenet_v2_x1_0         |  64  |  0.9818  |
|           mobilenet_v2            |  16  |  0.9808  |
|           hf_DistilBert           |  1   |  0.9805  |
|                drq                |  1   |  0.9801  |
|           hf_GPT2_large           |  1   |  0.9801  |
|              hf_Bart              |  1   |  0.9781  |
|           BERT_pytorch            |  2   |  0.9753  |
|            hf_T5_base             |  1   |  0.9752  |
|    mobilenet_v2_quantized_qat     |  96  |  0.9742  |
|         timm_efficientnet         |  64  |  0.9734  |
|             resnet50              |  32  |  0.9717  |
|          vision_maskrcnn          |  1   |  0.9709  |
|          pytorch_stargan          |  16  |  0.9709  |
|        speech_transformer         |  1   |  0.9708  |
|             resnet18              |  8   |   0.97   |
|           squeezenet1_1           |  16  |  0.9679  |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  0.9673  |
|              hf_GPT2              |  1   |  0.964   |
|          resnext50_32x4d          |  8   |  0.9555  |
|             hf_Albert             |  1   |  0.9551  |
|              yolov3               |  8   |  0.9545  |
|               dlrm                | 2048 |  0.9473  |
|      timm_vision_transformer      |  8   |  0.9411  |
|           fastNLP_Bert            |  1   |  0.9254  |
|           hf_Longformer           |  1   |  0.9221  |
|       functorch_dp_cifar10        |  64  |  0.9091  |
|            hf_Reformer            |  1   |  0.822   |
|              hf_Bert              |  1   |  0.8084  |
|           timm_resnest            |  32  |  0.8031  |
|               dcgan               | 256  |  0.718   |
|            hf_T5_large            |  1   |  0.6768  |
|      nvidia_deeprecommender       | 256  |  0.587   |
|               hf_T5               |  1   |  0.5784  |
|     detectron2_fcos_r_50_fpn      |  0   |   nan    |
+-----------------------------------+------+----------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|            XLNetLMHeadModel             | 4  |  7.481   |
|       MT5ForConditionalGeneration       | 2  |  6.932   |
|             XGLMForCausalLM             | 1  |  1.3679  |
|     M2M100ForConditionalGeneration      | 2  |  1.3395  |
|               DistillGPT2               | 1  |  1.3298  |
|             OPTForCausalLM              | 4  |  1.291   |
|           ElectraForCausalLM            | 1  |  1.2877  |
|          MobileBertForMaskedLM          | 16 |  1.2679  |
|               GoogleFnet                | 1  |  1.2672  |
|     MobileBertForQuestionAnswering      | 32 |  1.2178  |
|            YituTechConvBert             | 1  |  1.2023  |
|                 BigBird                 | 1  |  1.1067  |
|         MegatronBertForCausalLM         | 2  |  1.0607  |
|            AlbertForMaskedLM            | 2  |  1.0285  |
|                 T5Small                 | 1  |  1.0266  |
|       AlbertForQuestionAnswering        | 2  |  1.022   |
|          AllenaiLongformerBase          | 1  |  1.018   |
|     PegasusForConditionalGeneration     | 4  |  1.0109  |
|     PLBartForConditionalGeneration      | 8  |  0.9846  |
|            TrOCRForCausalLM             | 8  |  0.9816  |
|           RobertaForCausalLM            | 4  |  0.9761  |
|           PegasusForCausalLM            | 8  |  0.9758  |
|                CamemBert                | 1  |  0.9677  |
|           DebertaForMaskedLM            | 4  |  0.9663  |
|         Speech2Text2ForCausalLM         | 64 |  0.9482  |
|       DebertaForQuestionAnswering       | 4  |  0.9393  |
|      MBartForConditionalGeneration      | 8  |  0.9377  |
|            PLBartForCausalLM            | 16 |  0.9336  |
|      GPT2ForSequenceClassification      | 4  |  0.9188  |
|          DistilBertForMaskedLM          | 16 |  0.9062  |
|            MBartForCausalLM             | 16 |  0.8997  |
|    MegatronBertForQuestionAnswering     | 8  |  0.8808  |
|       RobertaForQuestionAnswering       | 64 |  0.8567  |
|    LayoutLMForSequenceClassification    | 16 |  0.854   |
|       T5ForConditionalGeneration        | 4  |  0.8515  |
|           LayoutLMForMaskedLM           | 16 |  0.8487  |
|             BartForCausalLM             | 2  |  0.8387  |
|     DistilBertForQuestionAnswering      | 32 |  0.8373  |
|        BertForQuestionAnswering         | 64 |  0.8344  |
|             BertForMaskedLM             | 64 |  0.8302  |
| BlenderbotSmallForConditionalGeneration | 32 |  0.8262  |
|      BartForConditionalGeneration       | 1  |  0.8234  |
|       ElectraForQuestionAnswering       | 64 |  0.7907  |
|       BlenderbotSmallForCausalLM        | 64 |  0.7829  |
+-----------------------------------------+----+----------+

Accuracy

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|            AlbertForMaskedLM            | 1  |   pass   |
|       AlbertForQuestionAnswering        | 1  |   pass   |
|      MBartForConditionalGeneration      | 1  |   pass   |
|       MT5ForConditionalGeneration       | 1  |   pass   |
|         MegatronBertForCausalLM         | 1  |   pass   |
|    MegatronBertForQuestionAnswering     | 1  |   pass   |
|          MobileBertForMaskedLM          | 1  |   pass   |
|     MobileBertForQuestionAnswering      | 1  |   pass   |
|             OPTForCausalLM              | 1  |   pass   |
|            PLBartForCausalLM            | 1  |   pass   |
|     PLBartForConditionalGeneration      | 1  |   pass   |
|           PegasusForCausalLM            | 1  |   pass   |
|     PegasusForConditionalGeneration     | 1  |   pass   |
|           RobertaForCausalLM            | 1  |   pass   |
|       RobertaForQuestionAnswering       | 1  |   pass   |
|         Speech2Text2ForCausalLM         | 1  |   pass   |
|       T5ForConditionalGeneration        | 1  |   pass   |
|                 T5Small                 | 1  |   pass   |
|            TrOCRForCausalLM             | 1  |   pass   |
|             XGLMForCausalLM             | 1  |   pass   |
|            XLNetLMHeadModel             | 1  |   pass   |
|            MBartForCausalLM             | 1  |   pass   |
|     M2M100ForConditionalGeneration      | 1  |   pass   |
|    LayoutLMForSequenceClassification    | 1  |   pass   |
|                CamemBert                | 1  |   pass   |
|          AllenaiLongformerBase          | 1  |   pass   |
|             BartForCausalLM             | 1  |   pass   |
|      BartForConditionalGeneration       | 1  |   pass   |
|             BertForMaskedLM             | 1  |   pass   |
|        BertForQuestionAnswering         | 1  |   pass   |
|                 BigBird                 | 1  |   pass   |
|       BlenderbotSmallForCausalLM        | 1  |   pass   |
| BlenderbotSmallForConditionalGeneration | 1  |   pass   |
|           DebertaForMaskedLM            | 1  |   pass   |
|           LayoutLMForMaskedLM           | 1  |   pass   |
|       DebertaForQuestionAnswering       | 1  |   pass   |
|          DistilBertForMaskedLM          | 1  |   pass   |
|     DistilBertForQuestionAnswering      | 1  |   pass   |
|               DistillGPT2               | 1  |   pass   |
|           ElectraForCausalLM            | 1  |   pass   |
|       ElectraForQuestionAnswering       | 1  |   pass   |
|      GPT2ForSequenceClassification      | 1  |   pass   |
|               GoogleFnet                | 1  |   pass   |
|            YituTechConvBert             | 1  |   pass   |
+-----------------------------------------+----+----------+

Compilation latency (sec)

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|     MobileBertForQuestionAnswering      | 32 | 14.3407  |
|          MobileBertForMaskedLM          | 16 | 14.3272  |
|          AllenaiLongformerBase          | 1  | 12.8257  |
|      MBartForConditionalGeneration      | 8  | 11.0707  |
|     PegasusForConditionalGeneration     | 4  | 10.7008  |
|                 BigBird                 | 1  | 10.3945  |
|       DebertaForQuestionAnswering       | 4  | 10.3599  |
|      BartForConditionalGeneration       | 1  | 10.1184  |
|           DebertaForMaskedLM            | 4  |  9.705   |
|    MegatronBertForQuestionAnswering     | 8  |  9.4832  |
|     M2M100ForConditionalGeneration      | 2  |  9.4558  |
|         MegatronBertForCausalLM         | 2  |  9.3521  |
|             XGLMForCausalLM             | 1  |  7.5338  |
| BlenderbotSmallForConditionalGeneration | 32 |  6.9944  |
|            YituTechConvBert             | 1  |  6.4043  |
|       ElectraForQuestionAnswering       | 64 |  6.0649  |
|       T5ForConditionalGeneration        | 4  |  5.9274  |
|        BertForQuestionAnswering         | 64 |  5.5988  |
|           LayoutLMForMaskedLM           | 16 |   5.4    |
|       MT5ForConditionalGeneration       | 2  |  5.2979  |
|     PLBartForConditionalGeneration      | 8  |  5.2489  |
|             BertForMaskedLM             | 64 |  5.2009  |
|            MBartForCausalLM             | 16 |  5.0805  |
|    LayoutLMForSequenceClassification    | 16 |  5.0601  |
|      GPT2ForSequenceClassification      | 4  |  4.9866  |
|           RobertaForCausalLM            | 4  |  4.6449  |
|       RobertaForQuestionAnswering       | 64 |  4.5597  |
|             BartForCausalLM             | 2  |  4.5368  |
|             OPTForCausalLM              | 4  |  4.379   |
|                 T5Small                 | 1  |  4.3357  |
|           PegasusForCausalLM            | 8  |  4.3134  |
|                CamemBert                | 1  |  4.1945  |
|            TrOCRForCausalLM             | 8  |  4.1408  |
|           ElectraForCausalLM            | 1  |  4.0948  |
|            AlbertForMaskedLM            | 2  |  3.5141  |
|       AlbertForQuestionAnswering        | 2  |  3.5065  |
|       BlenderbotSmallForCausalLM        | 64 |  3.2783  |
|     DistilBertForQuestionAnswering      | 32 |  2.9662  |
|               GoogleFnet                | 1  |  2.6108  |
|          DistilBertForMaskedLM          | 16 |  2.4159  |
|               DistillGPT2               | 1  |  2.3806  |
|            PLBartForCausalLM            | 16 |  2.2758  |
|         Speech2Text2ForCausalLM         | 64 |  2.1607  |
|            XLNetLMHeadModel             | 4  | -14.658  |
+-----------------------------------------+----+----------+

Peak Memory Compression Ratio

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|            AlbertForMaskedLM            | 2  |  0.9964  |
|       AlbertForQuestionAnswering        | 2  |  0.9962  |
|             BertForMaskedLM             | 64 |  0.9954  |
|       BlenderbotSmallForCausalLM        | 64 |  0.9954  |
|      GPT2ForSequenceClassification      | 4  |  0.9954  |
|       ElectraForQuestionAnswering       | 64 |  0.9953  |
|       T5ForConditionalGeneration        | 4  |  0.9939  |
|             BartForCausalLM             | 2  |  0.9936  |
|           DebertaForMaskedLM            | 4  |  0.9922  |
|                 BigBird                 | 1  |  0.9914  |
|            MBartForCausalLM             | 16 |  0.9912  |
|       DebertaForQuestionAnswering       | 4  |  0.9908  |
|      BartForConditionalGeneration       | 1  |  0.9906  |
|            PLBartForCausalLM            | 16 |  0.9902  |
| BlenderbotSmallForConditionalGeneration | 32 |  0.9897  |
|            TrOCRForCausalLM             | 8  |  0.9896  |
|           PegasusForCausalLM            | 8  |  0.9894  |
|               GoogleFnet                | 1  |  0.9891  |
|     DistilBertForQuestionAnswering      | 32 |  0.9889  |
|     PegasusForConditionalGeneration     | 4  |  0.9886  |
|         Speech2Text2ForCausalLM         | 64 |  0.9885  |
|           LayoutLMForMaskedLM           | 16 |  0.988   |
|          DistilBertForMaskedLM          | 16 |  0.9874  |
|      MBartForConditionalGeneration      | 8  |  0.9856  |
|    LayoutLMForSequenceClassification    | 16 |  0.9853  |
|        BertForQuestionAnswering         | 64 |  0.9826  |
|     PLBartForConditionalGeneration      | 8  |  0.9815  |
|             OPTForCausalLM              | 4  |  0.9807  |
|               DistillGPT2               | 1  |  0.9755  |
|          AllenaiLongformerBase          | 1  |  0.974   |
|     M2M100ForConditionalGeneration      | 2  |  0.9724  |
|          MobileBertForMaskedLM          | 16 |  0.9715  |
|           RobertaForCausalLM            | 4  |  0.9678  |
|             XGLMForCausalLM             | 1  |  0.9657  |
|       RobertaForQuestionAnswering       | 64 |  0.9552  |
|           ElectraForCausalLM            | 1  |  0.9379  |
|            XLNetLMHeadModel             | 4  |  0.9116  |
|                CamemBert                | 1  |  0.8447  |
|                 T5Small                 | 1  |  0.8396  |
|     MobileBertForQuestionAnswering      | 32 |  0.8228  |
|            YituTechConvBert             | 1  |  0.8046  |
|         MegatronBertForCausalLM         | 2  |  0.6934  |
|    MegatronBertForQuestionAnswering     | 8  |  0.6877  |
|       MT5ForConditionalGeneration       | 2  |  0.5174  |
+-----------------------------------------+----+----------+

timm_models suite with float32 precision

Performance speedup

+---------------------------------+-----+----------+
|              name               | bs  | inductor |
+---------------------------------+-----+----------+
|        res2net101_26w_4s        | 64  |  1.286   |
|           mnasnet_100           | 128 |  1.2825  |
|          spnasnet_100           | 128 |  1.2711  |
|          inception_v3           | 128 |  1.2154  |
|           fbnetc_100            | 128 |  1.2065  |
|       gluon_inception_v3        | 128 |  1.1991  |
|        adv_inception_v3         | 128 |  1.1929  |
|          ghostnet_100           | 128 |  1.1589  |
|        convmixer_768_32         | 32  |  1.1486  |
|      mobilenetv3_large_100      | 128 |  1.1401  |
|            gernet_l             | 128 |  1.1394  |
|             dpn107              | 32  |  1.1393  |
|         mobilenetv2_100         | 128 |  1.1004  |
|           volo_d1_224           | 64  |  1.0784  |
|        res2net50_14w_8s         |  2  |  1.0773  |
|            repvgg_a2            | 128 |  1.072   |
|            hrnet_w18            |  2  |  1.0713  |
|            fbnetv3_b            | 128 |  1.0666  |
|           resnest101e           | 32  |  1.0534  |
|           selecsls42b           | 128 |  1.0523  |
|           regnety_002           | 128 |  1.0394  |
|            lcnet_050            | 128 |  1.0392  |
|     swsl_resnext101_32x16d      | 32  |  1.0289  |
|          pnasnet5large          | 16  |  1.0235  |
|             dla102              | 64  |  0.9799  |
|           tf_mixnet_l           | 64  |  0.9699  |
|          cait_m36_384           |  2  |  0.9604  |
|         poolformer_m36          | 64  |  0.9602  |
|        ese_vovnet19b_dw         | 128 |  0.9537  |
|          gmixer_24_224          | 64  |  0.9481  |
|      xcit_large_24_p8_224       |  5  |  0.9399  |
|            mixnet_l             | 64  |  0.9373  |
|  swin_base_patch4_window7_224   | 64  |  0.8761  |
|      beit_base_patch16_224      | 64  |  0.8741  |
|            nfnet_l0             | 64  |  0.8696  |
|           dm_nfnet_f0           | 128 |  0.859   |
| deit_base_distilled_patch16_224 | 64  |  0.8468  |
|           convit_base           | 32  |  0.843   |
|      vit_base_patch16_224       | 64  |  0.8349  |
|          resmlp_12_224          | 128 |  0.8168  |
|          jx_nest_base           | 32  |  0.8133  |
|        tnt_s_patch16_224        | 64  |  0.8119  |
|           rexnet_100            | 128 |  0.8068  |
|         visformer_small         | 128 |  0.8049  |
|           res2next50            |  2  |  0.8005  |
|          mixer_b16_224          | 64  |  0.7952  |
|          convnext_base          | 32  |  0.7919  |
|            pit_b_224            | 64  |  0.7642  |
|            tinynet_a            | 128 |  0.7436  |
|       tf_efficientnet_b0        | 128 |  0.7389  |
|         coat_lite_mini          | 128 |  0.7325  |
|        gluon_xception65         | 32  |  0.7183  |
|        twins_pcpvt_base         | 32  |  0.7145  |
|         crossvit_9_240          | 64  |  0.6909  |
|          cspdarknet53           | 64  |  0.6884  |
|          gmlp_s16_224           | 64  |  0.5267  |
|           mobilevit_s           | 32  |  0.502   |
|       eca_botnext26ts_256       |  0  |   0.0    |
|        eca_halonext26ts         |  0  |   0.0    |
|          botnet26t_256          |  0  |   0.0    |
|        sebotnet33ts_256         |  0  |   0.0    |
+---------------------------------+-----+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 2  |     pass      |
|      beit_base_patch16_224      | 2  |     pass      |
|         mobilenetv2_100         | 2  |     pass      |
|      mobilenetv3_large_100      | 2  |     pass      |
|           mobilevit_s           | 2  |     pass      |
|            nfnet_l0             | 2  |     pass      |
|            pit_b_224            | 2  |     pass      |
|          pnasnet5large          | 2  |     pass      |
|         poolformer_m36          | 2  |     pass      |
|           regnety_002           | 2  |     pass      |
|            repvgg_a2            | 2  |     pass      |
|        res2net101_26w_4s        | 2  |     pass      |
|        res2net50_14w_8s         | 2  |     pass      |
|           res2next50            | 2  |     pass      |
|          resmlp_12_224          | 2  |     pass      |
|           resnest101e           | 2  |     pass      |
|           rexnet_100            | 2  |     pass      |
|           selecsls42b           | 2  |     pass      |
|          spnasnet_100           | 2  |     pass      |
|  swin_base_patch4_window7_224   | 2  |     pass      |
|     swsl_resnext101_32x16d      | 2  |     pass      |
|       tf_efficientnet_b0        | 2  |     pass      |
|           tf_mixnet_l           | 2  |     pass      |
|            tinynet_a            | 2  |     pass      |
|        tnt_s_patch16_224        | 2  |     pass      |
|        twins_pcpvt_base         | 2  |     pass      |
|         visformer_small         | 2  |     pass      |
|      vit_base_patch16_224       | 2  |     pass      |
|           volo_d1_224           | 2  |     pass      |
|           mnasnet_100           | 2  |     pass      |
|            mixnet_l             | 2  |     pass      |
|          mixer_b16_224          | 2  |     pass      |
|        ese_vovnet19b_dw         | 2  |     pass      |
|         coat_lite_mini          | 2  |     pass      |
|           convit_base           | 2  |     pass      |
|        convmixer_768_32         | 2  |     pass      |
|          convnext_base          | 2  |     pass      |
|         crossvit_9_240          | 2  |     pass      |
|          cspdarknet53           | 2  |     pass      |
| deit_base_distilled_patch16_224 | 2  |     pass      |
|             dla102              | 2  |     pass      |
|           dm_nfnet_f0           | 2  |     pass      |
|             dpn107              | 2  |     pass      |
|            lcnet_050            | 2  |     pass      |
|      xcit_large_24_p8_224       | 2  |     pass      |
|           fbnetc_100            | 2  |     pass      |
|            fbnetv3_b            | 2  |     pass      |
|            gernet_l             | 2  |     pass      |
|       gluon_inception_v3        | 2  |     pass      |
|        gluon_xception65         | 2  |     pass      |
|          gmixer_24_224          | 2  |     pass      |
|          gmlp_s16_224           | 2  |     pass      |
|            hrnet_w18            | 2  |     pass      |
|          inception_v3           | 2  |     pass      |
|          jx_nest_base           | 2  |     pass      |
|        eca_halonext26ts         | 2  |  fail_to_run  |
|        sebotnet33ts_256         | 2  |  fail_to_run  |
|          botnet26t_256          | 2  |  fail_to_run  |
|       eca_botnext26ts_256       | 2  |  fail_to_run  |
|          ghostnet_100           | 2  | fail_accuracy |
|          cait_m36_384           | 2  | fail_accuracy |
+---------------------------------+----+---------------+

Compilation latency (sec)

+---------------------------------+-----+----------+
|              name               | bs  | inductor |
+---------------------------------+-----+----------+
|          pnasnet5large          | 16  | 19.2248  |
|            hrnet_w18            |  2  | 18.9544  |
|  swin_base_patch4_window7_224   | 64  | 14.8893  |
|      xcit_large_24_p8_224       |  5  | 14.0629  |
|           mobilevit_s           | 32  | 13.4383  |
|          cait_m36_384           |  2  | 13.0711  |
|         poolformer_m36          | 64  |  12.376  |
|        twins_pcpvt_base         | 32  | 11.2295  |
|           dm_nfnet_f0           | 128 | 10.6087  |
|           resnest101e           | 32  | 10.3014  |
|           tf_mixnet_l           | 64  |  9.6655  |
|        res2net101_26w_4s        | 64  |  9.6545  |
|          jx_nest_base           | 32  |  9.5788  |
|        tnt_s_patch16_224        | 64  |  9.3977  |
|            mixnet_l             | 64  |  9.3847  |
|        gluon_xception65         | 32  |  9.3583  |
|        res2net50_14w_8s         |  2  |  8.9778  |
|          gmlp_s16_224           | 64  |  8.6621  |
|             dpn107              | 32  |  8.6491  |
|            fbnetv3_b            | 128 |  8.4694  |
|           volo_d1_224           | 64  |  8.0304  |
|          convnext_base          | 32  |  7.4253  |
|         crossvit_9_240          | 64  |  7.0807  |
|            nfnet_l0             | 64  |  7.0103  |
|            pit_b_224            | 64  |  6.5919  |
|             dla102              | 64  |  6.5173  |
|          cspdarknet53           | 64  |  6.4078  |
|         coat_lite_mini          | 128 |  6.2778  |
|       tf_efficientnet_b0        | 128 |  6.0339  |
|     swsl_resnext101_32x16d      | 32  |  5.8507  |
|          ghostnet_100           | 128 |   5.82   |
|           convit_base           | 32  |  5.7277  |
|            tinynet_a            | 128 |  5.679   |
|           rexnet_100            | 128 |  5.6751  |
|          gmixer_24_224          | 64  |  5.6566  |
|           res2next50            |  2  |  5.3908  |
|      vit_base_patch16_224       | 64  |  5.0519  |
|         visformer_small         | 128 |  4.8499  |
|       gluon_inception_v3        | 128 |   4.78   |
|          mixer_b16_224          | 64  |  4.7777  |
|           fbnetc_100            | 128 |  4.7003  |
|      beit_base_patch16_224      | 64  |  4.6882  |
|          inception_v3           | 128 |  4.5974  |
|         mobilenetv2_100         | 128 |  4.5727  |
|        adv_inception_v3         | 128 |  4.5165  |
|            gernet_l             | 128 |  4.3313  |
| deit_base_distilled_patch16_224 | 64  |  4.3017  |
|           regnety_002           | 128 |  4.1075  |
|            repvgg_a2            | 128 |  4.0545  |
|          spnasnet_100           | 128 |  3.9151  |
|      mobilenetv3_large_100      | 128 |  3.5092  |
|           mnasnet_100           | 128 |  3.2211  |
|        convmixer_768_32         | 32  |  3.063   |
|        ese_vovnet19b_dw         | 128 |  2.7839  |
|            lcnet_050            | 128 |  2.1599  |
|           selecsls42b           | 128 |  2.1297  |
|          resmlp_12_224          | 128 |  2.0739  |
|          botnet26t_256          |  0  |   nan    |
|       eca_botnext26ts_256       |  0  |   nan    |
|        eca_halonext26ts         |  0  |   nan    |
|        sebotnet33ts_256         |  0  |   nan    |
+---------------------------------+-----+----------+

Peak Memory Compression Ratio

+---------------------------------+-----+----------+
|              name               | bs  | inductor |
+---------------------------------+-----+----------+
|        ese_vovnet19b_dw         | 128 |  0.9973  |
|     swsl_resnext101_32x16d      | 32  |  0.9967  |
|      vit_base_patch16_224       | 64  |  0.9965  |
|           dm_nfnet_f0           | 128 |  0.9965  |
|         visformer_small         | 128 |  0.9964  |
|      beit_base_patch16_224      | 64  |  0.9962  |
|          mixer_b16_224          | 64  |  0.9958  |
|            gernet_l             | 128 |  0.9956  |
|         coat_lite_mini          | 128 |  0.9955  |
|        gluon_xception65         | 32  |  0.9955  |
|          cspdarknet53           | 64  |  0.9955  |
|            pit_b_224            | 64  |  0.9954  |
|            fbnetv3_b            | 128 |  0.995   |
|           convit_base           | 32  |  0.9949  |
|           fbnetc_100            | 128 |  0.9947  |
|       tf_efficientnet_b0        | 128 |  0.9944  |
|           rexnet_100            | 128 |  0.9943  |
|          spnasnet_100           | 128 |  0.9943  |
|          gmixer_24_224          | 64  |  0.9943  |
|  swin_base_patch4_window7_224   | 64  |  0.9938  |
|         mobilenetv2_100         | 128 |  0.9938  |
|            repvgg_a2            | 128 |  0.9935  |
|          jx_nest_base           | 32  |  0.9933  |
|            nfnet_l0             | 64  |  0.9925  |
|          convnext_base          | 32  |  0.9922  |
|           mobilevit_s           | 32  |  0.9921  |
|           tf_mixnet_l           | 64  |  0.992   |
|         poolformer_m36          | 64  |  0.9916  |
|             dpn107              | 32  |  0.9916  |
|            mixnet_l             | 64  |  0.9916  |
|          ghostnet_100           | 128 |  0.9914  |
|          pnasnet5large          | 16  |  0.9914  |
|           selecsls42b           | 128 |  0.9907  |
|           mnasnet_100           | 128 |  0.9907  |
|           volo_d1_224           | 64  |  0.9898  |
|      xcit_large_24_p8_224       |  5  |  0.9887  |
|           regnety_002           | 128 |  0.9886  |
|          cait_m36_384           |  2  |  0.9885  |
|       gluon_inception_v3        | 128 |  0.985   |
|        adv_inception_v3         | 128 |  0.9849  |
|          inception_v3           | 128 |  0.9845  |
|          resmlp_12_224          | 128 |  0.9842  |
|      mobilenetv3_large_100      | 128 |  0.9839  |
| deit_base_distilled_patch16_224 | 64  |  0.9824  |
|            tinynet_a            | 128 |  0.9789  |
|            lcnet_050            | 128 |  0.9726  |
|             dla102              | 64  |  0.9696  |
|        convmixer_768_32         | 32  |  0.9682  |
|        res2net101_26w_4s        | 64  |  0.9647  |
|           res2next50            |  2  |  0.9514  |
|           resnest101e           | 32  |  0.951   |
|         crossvit_9_240          | 64  |  0.9487  |
|        twins_pcpvt_base         | 32  |  0.9332  |
|        res2net50_14w_8s         |  2  |  0.9328  |
|          gmlp_s16_224           | 64  |  0.9267  |
|            hrnet_w18            |  2  |  0.8875  |
|        tnt_s_patch16_224        | 64  |  0.7729  |
|          botnet26t_256          |  0  |   nan    |
|       eca_botnext26ts_256       |  0  |   nan    |
|        eca_halonext26ts         |  0  |   nan    |
|        sebotnet33ts_256         |  0  |   nan    |
+---------------------------------+-----+----------+


ESI-SYD commented Nov 3, 2022

Performance Dashboard for float32 precision -- Single-Socket Multi-threads

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward pass. For accuracy, we check the numerical correctness of the forward pass outputs by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 89%, 49/55 | 100%, 44/44 | 89%, 54/61  |
+----------+------------+-------------+-------------+
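
The pass rate figures can be reproduced from the per-model Accuracy tables. The sketch below is an assumption about how the counting works, not the dashboard's actual script: it treats `pass` and `pass_due_to_skip` as passing and every other entry (`fail_to_run`, `fail_accuracy`, and placeholder values such as `0.0000`) as failing, which is consistent with the 49/55 torchbench figure in this comment. The function name `pass_rate` is hypothetical.

```python
from collections import Counter

# Statuses counted as passing. This set is an assumption inferred from the
# tables in this thread, not taken from the benchmark scripts.
PASSING = {"pass", "pass_due_to_skip"}


def pass_rate(statuses):
    """Return (passed, total, percent) for a list of accuracy statuses."""
    counts = Counter(s.strip() for s in statuses)
    total = sum(counts.values())
    passed = sum(n for status, n in counts.items() if status in PASSING)
    # Percent is rounded to the nearest integer, as in the summary table.
    percent = round(100 * passed / total) if total else 0
    return passed, total, percent


if __name__ == "__main__":
    # Status counts read off the torchbench Accuracy table in this comment.
    statuses = (
        ["pass_due_to_skip"] * 3
        + ["pass"] * 46
        + ["fail_to_run"] * 1
        + ["0.0000"] * 5
    )
    passed, total, percent = pass_rate(statuses)
    print(f"{percent}%, {passed}/{total}")  # prints "89%, 49/55"
```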

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.06x    |    1.07x    |    1.08x    |
+----------+------------+-------------+-------------+
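
For reference, a geometric mean speedup of this kind can be computed from a per-model speedup column roughly as follows. This is a minimal sketch: the helper name `geomean_speedup` and the choice to skip non-positive entries (the `0.0` rows for models that did not run) are assumptions, and the dashboard's own script may filter models differently.

```python
import math


def geomean_speedup(speedups):
    """Geometric mean of the positive entries in `speedups`.

    Entries that are not greater than zero (including NaN) are skipped.
    Assumption: such entries mark models without a usable timing.
    """
    valid = [s for s in speedups if s > 0]
    if not valid:
        return float("nan")
    # Average in log space, then map back with exp.
    return math.exp(sum(math.log(s) for s in valid) / len(valid))


if __name__ == "__main__":
    # Illustrative values only, not a copy of any table in this thread.
    sample = [1.25, 0.8, 2.0, 0.5, 0.0]
    print(f"{geomean_speedup(sample):.2f}x")  # prints "1.00x"
```

The geometric mean is used rather than the arithmetic mean because speedups are ratios: a 2x speedup and a 0.5x slowdown cancel out to 1.0x.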

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   10.95    |    14.59    |    16.49    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.93x    |    0.97x    |    0.99x    |
+----------+------------+-------------+-------------+
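
To make the "higher is better" reading concrete, here is a small sketch. It assumes the ratio is defined as the native PyTorch peak memory divided by the inductor peak memory, so a value below 1.0 means the compiled run used more memory. That definition is not stated in this comment, and the function names are hypothetical.

```python
def compression_ratio(eager_peak_bytes, compiled_peak_bytes):
    """Peak memory of the eager run divided by that of the compiled run.

    Assumed definition: values above 1.0 mean the compiled run used less
    memory, values below 1.0 mean it used more.
    """
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled_peak_bytes must be positive")
    return eager_peak_bytes / compiled_peak_bytes


def relative_footprint(ratio):
    """Compiled peak memory as a multiple of the eager peak memory."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / ratio


if __name__ == "__main__":
    # Under the assumed definition, a 0.93x ratio corresponds to a
    # compiled footprint of roughly 1.075 times the eager footprint.
    print(f"{relative_footprint(0.93):.3f}")
```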

torchbench suite with float32 precision

Performance speedup

+-----------------------------------+------+----------+
|               name                |  bs  | inductor |
+-----------------------------------+------+----------+
|           timm_resnest            |  32  |  1.2876  |
|            densenet121            |  64  |  1.2032  |
|        shufflenet_v2_x1_0         |  64  |  1.1822  |
|             resnet18              |  8   |  1.1751  |
|       functorch_dp_cifar10        |  64  |  1.1673  |
|              alexnet              | 128  |   1.15   |
|           squeezenet1_1           |  16  |  1.1478  |
|            mnasnet1_0             |  32  |  1.1333  |
|           pytorch_unet            |  1   |  1.1325  |
|        mobilenet_v3_large         |  32  |  1.1291  |
|            timm_vovnet            |  32  |  1.1207  |
|          resnext50_32x4d          |  8   |  1.1188  |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  1.1158  |
|             resnet50              |  32  |  1.1079  |
|           mobilenet_v2            |  16  |  1.0973  |
|          vision_maskrcnn          |  1   |  1.0919  |
|               vgg16               |  4   |  1.087   |
|                drq                |  1   |  1.0844  |
|               dcgan               | 256  |  1.0739  |
|            hf_T5_large            |  1   |  1.0637  |
|            timm_regnet            |  32  |  1.0604  |
|               dlrm                | 2048 |  1.0598  |
|          LearningToPaint          |  96  |  1.0565  |
|              yolov3               |  8   |  1.0482  |
|            Super_SloMo            |  6   |  1.0463  |
|            hf_BigBird             |  1   |  1.0328  |
|          pytorch_stargan          |  16  |  1.0299  |
|               hf_T5               |  1   |  1.0255  |
|        Background_Matting         |  1   |  1.0247  |
|     detectron2_fcos_r_50_fpn      |  1   |  1.0231  |
|            hf_Reformer            |  1   |  1.0204  |
|              demucs               |  1   |  1.0015  |
|            tts_angular            |  64  |  1.001   |
|    mobilenet_v2_quantized_qat     |  96  |  1.0008  |
|      resnet50_quantized_qat       |  32  |  0.9899  |
|           BERT_pytorch            |  2   |  0.9855  |
|           hf_GPT2_large           |  1   |  0.9577  |
|              hf_GPT2              |  1   |  0.9225  |
|           hf_Longformer           |  1   |  0.9222  |
|             hf_Albert             |  1   |  0.8953  |
|            hf_T5_base             |  1   |  0.8771  |
|   timm_vision_transformer_large   |  8   |  0.8726  |
|      nvidia_deeprecommender       | 256  |  0.8412  |
|           hf_DistilBert           |  1   |  0.8293  |
|            timm_nfnet             | 128  |  0.8142  |
|         timm_efficientnet         |  64  |  0.8089  |
|              hf_Bart              |  1   |  0.7965  |
|              hf_Bert              |  1   |  0.796   |
|      timm_vision_transformer      |  8   |  0.7337  |
| attention_is_all_you_need_pytorch |  32  |  0.7245  |
|           lennard_jones           | 1000 |  0.3547  |
|        speech_transformer         |  0   |   0.0    |
|         soft_actor_critic         |  0   |   0.0    |
|             tacotron2             |  0   |   0.0    |
|           fastNLP_Bert            |  0   |   0.0    |
+-----------------------------------+------+----------+
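
The tables in this thread use a plain ASCII grid layout, so they are easy to load for further analysis. The following is a minimal sketch of one way to parse such a table into records; `parse_ascii_table` is a hypothetical helper, not part of the benchmark tooling, and it assumes the first `|` row is the header and that cells do not contain `|`.

```python
def parse_ascii_table(text):
    """Parse a '+---+' grid table into a list of dicts keyed by header."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        rows.append(cells)
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


if __name__ == "__main__":
    # Three rows copied from the torchbench speedup table above.
    table = """
    +--------------------+----+----------+
    |        name        | bs | inductor |
    +--------------------+----+----------+
    |    timm_resnest    | 32 |  1.2876  |
    |    densenet121     | 64 |  1.2032  |
    | speech_transformer | 0  |   0.0    |
    +--------------------+----+----------+
    """
    for record in parse_ascii_table(table):
        print(record["name"], float(record["inductor"]))
```

Values are kept as strings so that entries such as `nan`, `pass` or `fail_to_run` survive unchanged; callers convert the columns they need.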

Accuracy

+-----------------------------------+----+------------------+
|               name                | bs |     inductor     |
+-----------------------------------+----+------------------+
|            hf_T5_large            | 2  | pass_due_to_skip |
|           hf_GPT2_large           | 2  | pass_due_to_skip |
|   timm_vision_transformer_large   | 2  | pass_due_to_skip |
|               hf_T5               | 2  |       pass       |
|            hf_Reformer            | 2  |       pass       |
|            hf_T5_base             | 2  |       pass       |
|          LearningToPaint          | 2  |       pass       |
|            Super_SloMo            | 2  |       pass       |
|              alexnet              | 2  |       pass       |
| attention_is_all_you_need_pytorch | 2  |       pass       |
|               dcgan               | 2  |       pass       |
|              demucs               | 1  |       pass       |
|            densenet121            | 2  |       pass       |
|               dlrm                | 2  |       pass       |
|                drq                | 1  |       pass       |
|              yolov3               | 2  |       pass       |
|            mnasnet1_0             | 2  |       pass       |
|             hf_Albert             | 2  |       pass       |
|              hf_Bart              | 2  |       pass       |
|              hf_Bert              | 2  |       pass       |
|            hf_BigBird             | 2  |       pass       |
|           hf_DistilBert           | 2  |       pass       |
|              hf_GPT2              | 2  |       pass       |
|           hf_Longformer           | 2  |       pass       |
|       functorch_dp_cifar10        | 2  |       pass       |
|           lennard_jones           | 2  |       pass       |
|          resnext50_32x4d          | 2  |       pass       |
|           BERT_pytorch            | 2  |       pass       |
|      resnet50_quantized_qat       | 2  |       pass       |
|           mobilenet_v2            | 2  |       pass       |
|    mobilenet_v2_quantized_qat     | 2  |       pass       |
|        mobilenet_v3_large         | 2  |       pass       |
|      nvidia_deeprecommender       | 2  |       pass       |
|   pytorch_CycleGAN_and_pix2pix    | 1  |       pass       |
|          pytorch_stargan          | 16 |       pass       |
|           pytorch_unet            | 2  |       pass       |
|               vgg16               | 2  |       pass       |
|             resnet50              | 2  |       pass       |
|             resnet18              | 2  |       pass       |
|        Background_Matting         | 1  |       pass       |
|        shufflenet_v2_x1_0         | 2  |       pass       |
|           squeezenet1_1           | 2  |       pass       |
|         timm_efficientnet         | 2  |       pass       |
|            timm_nfnet             | 2  |       pass       |
|            timm_regnet            | 2  |       pass       |
|           timm_resnest            | 2  |       pass       |
|      timm_vision_transformer      | 2  |       pass       |
|            timm_vovnet            | 2  |       pass       |
|            tts_angular            | 2  |       pass       |
|     detectron2_fcos_r_50_fpn      | 2  |   fail_to_run    |
|           fastNLP_Bert            | 0  |      0.0000      |
|         soft_actor_critic         | 0  |      0.0000      |
|             tacotron2             | 0  |      0.0000      |
|          vision_maskrcnn          | 0  |      0.0000      |
|        speech_transformer         | 0  |      0.0000      |
+-----------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------+------+----------+
|               name                |  bs  | inductor |
+-----------------------------------+------+----------+
|     detectron2_fcos_r_50_fpn      |  1   | 60.6662  |
|          vision_maskrcnn          |  1   | 57.9893  |
|            hf_T5_large            |  1   | 30.2159  |
|            hf_BigBird             |  1   | 20.3531  |
|           hf_Longformer           |  1   | 19.4456  |
|   timm_vision_transformer_large   |  8   | 19.3107  |
|           BERT_pytorch            |  2   | 19.2475  |
|            timm_nfnet             | 128  | 19.1044  |
|           hf_GPT2_large           |  1   | 19.0724  |
|              yolov3               |  8   | 18.4934  |
|            hf_T5_base             |  1   |  18.095  |
|            densenet121            |  64  | 16.2855  |
|            Super_SloMo            |  6   |  14.675  |
|         timm_efficientnet         |  64  | 13.0065  |
|              hf_Bart              |  1   | 12.0408  |
|            timm_regnet            |  32  | 11.9215  |
|            hf_Reformer            |  1   | 11.5293  |
| attention_is_all_you_need_pytorch |  32  |  11.421  |
|               hf_T5               |  1   | 11.3392  |
|              hf_Bert              |  1   | 10.6351  |
|        mobilenet_v3_large         |  32  | 10.5692  |
|        shufflenet_v2_x1_0         |  64  | 10.4591  |
|        Background_Matting         |  1   |  10.313  |
|              hf_GPT2              |  1   | 10.1996  |
|           pytorch_unet            |  1   | 10.1988  |
|      timm_vision_transformer      |  8   | 10.1087  |
|           mobilenet_v2            |  16  | 10.0596  |
|          resnext50_32x4d          |  8   |  9.8076  |
|            mnasnet1_0             |  32  |  9.7355  |
|             hf_Albert             |  1   |  9.6999  |
|             resnet50              |  32  |  9.6626  |
|            timm_vovnet            |  32  |  9.639   |
|           timm_resnest            |  32  |  8.6942  |
|           hf_DistilBert           |  1   |  8.6865  |
|       functorch_dp_cifar10        |  64  |  8.1404  |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  7.8394  |
|          pytorch_stargan          |  16  |  7.7223  |
|          LearningToPaint          |  96  |  7.7173  |
|             resnet18              |  8   |  7.4809  |
|           squeezenet1_1           |  16  |  7.2604  |
|               vgg16               |  4   |  7.1666  |
|               dlrm                | 2048 |  7.0587  |
|            tts_angular            |  64  |  7.012   |
|      nvidia_deeprecommender       | 256  |  6.9814  |
|              alexnet              | 128  |  6.8342  |
|                drq                |  1   |  6.5838  |
|               dcgan               | 256  |  6.5091  |
|           lennard_jones           | 1000 |  6.4467  |
|              demucs               |  1   |  1.4501  |
|      resnet50_quantized_qat       |  32  |  0.1674  |
|    mobilenet_v2_quantized_qat     |  96  |  0.1381  |
|           fastNLP_Bert            |  0   |   nan    |
|         soft_actor_critic         |  0   |   nan    |
|        speech_transformer         |  0   |   nan    |
|             tacotron2             |  0   |   nan    |
+-----------------------------------+------+----------+

Peak Memory Compression Ratio

+-----------------------------------+------+----------+
|               name                |  bs  | inductor |
+-----------------------------------+------+----------+
|              demucs               |  1   |  0.9988  |
|      resnet50_quantized_qat       |  32  |  0.9962  |
|            hf_T5_base             |  1   |  0.9954  |
|            timm_nfnet             | 128  |  0.9928  |
|        Background_Matting         |  1   |  0.9925  |
|               vgg16               |  4   |  0.9925  |
|          LearningToPaint          |  96  |  0.9903  |
|            Super_SloMo            |  6   |  0.9903  |
|         timm_efficientnet         |  64  |  0.9902  |
|            hf_BigBird             |  1   |  0.9899  |
|           pytorch_unet            |  1   |  0.9891  |
|   timm_vision_transformer_large   |  8   |  0.9872  |
|            mnasnet1_0             |  32  |  0.9841  |
|           lennard_jones           | 1000 |  0.9824  |
| attention_is_all_you_need_pytorch |  32  |  0.982   |
|           mobilenet_v2            |  16  |  0.9801  |
|           hf_GPT2_large           |  1   |  0.9796  |
|        shufflenet_v2_x1_0         |  64  |  0.9793  |
|           hf_DistilBert           |  1   |  0.9792  |
|              alexnet              | 128  |  0.9791  |
|        mobilenet_v3_large         |  32  |  0.979   |
|          resnext50_32x4d          |  8   |  0.9787  |
|            timm_vovnet            |  32  |  0.9786  |
|     detectron2_fcos_r_50_fpn      |  1   |  0.9784  |
|                drq                |  1   |  0.9769  |
|              hf_Bart              |  1   |  0.9763  |
|    mobilenet_v2_quantized_qat     |  96  |  0.9732  |
|           BERT_pytorch            |  2   |  0.9725  |
|           timm_resnest            |  32  |  0.9704  |
|             resnet50              |  32  |  0.9696  |
|          pytorch_stargan          |  16  |  0.9637  |
|              yolov3               |  8   |  0.9567  |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  0.9535  |
|              hf_GPT2              |  1   |  0.9527  |
|            tts_angular            |  64  |  0.9512  |
|               dlrm                | 2048 |  0.9468  |
|             hf_Albert             |  1   |  0.941   |
|      timm_vision_transformer      |  8   |  0.9341  |
|           hf_Longformer           |  1   |  0.9206  |
|       functorch_dp_cifar10        |  64  |  0.8988  |
|             resnet18              |  8   |  0.8541  |
|          vision_maskrcnn          |  1   |  0.853   |
|            hf_Reformer            |  1   |  0.8337  |
|            timm_regnet            |  32  |  0.8202  |
|           squeezenet1_1           |  16  |  0.7868  |
|            densenet121            |  64  |  0.7861  |
|              hf_Bert              |  1   |  0.7728  |
|               hf_T5               |  1   |  0.7683  |
|               dcgan               | 256  |  0.6899  |
|            hf_T5_large            |  1   |  0.6732  |
|      nvidia_deeprecommender       | 256  |  0.583   |
|           fastNLP_Bert            |  0   |   nan    |
|         soft_actor_critic         |  0   |   nan    |
|        speech_transformer         |  0   |   nan    |
|             tacotron2             |  0   |   nan    |
+-----------------------------------+------+----------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|       MT5ForConditionalGeneration       |  8  |  2.5426  |
|            XLNetLMHeadModel             | 32  |  1.9834  |
|     MobileBertForQuestionAnswering      | 64  |  1.3245  |
|               DistillGPT2               |  1  |  1.3185  |
|               GoogleFnet                |  1  |  1.2467  |
|            YituTechConvBert             |  1  |  1.1918  |
|          MobileBertForMaskedLM          | 32  |  1.1428  |
|                 BigBird                 |  1  |  1.0946  |
|     M2M100ForConditionalGeneration      |  8  |  1.0897  |
|             XGLMForCausalLM             |  8  |  1.0627  |
|          AllenaiLongformerBase          |  1  |  1.032   |
|                 T5Small                 |  1  |  1.0308  |
|            AlbertForMaskedLM            |  4  |  1.0299  |
|       AlbertForQuestionAnswering        |  4  |  1.0296  |
|             OPTForCausalLM              | 32  |  1.0154  |
|           DebertaForMaskedLM            |  4  |  0.9783  |
|      GPT2ForSequenceClassification      |  4  |  0.9597  |
|         Speech2Text2ForCausalLM         | 128 |  0.9463  |
|                CamemBert                |  1  |  0.9439  |
|       DebertaForQuestionAnswering       |  8  |  0.9296  |
|           RobertaForCausalLM            | 64  |  0.9035  |
|           ElectraForCausalLM            | 32  |  0.9019  |
|     PLBartForConditionalGeneration      | 16  |  0.8948  |
|      MBartForConditionalGeneration      | 16  |  0.8879  |
|            PLBartForCausalLM            | 32  |  0.8864  |
|    LayoutLMForSequenceClassification    | 16  |  0.8858  |
|         MegatronBertForCausalLM         | 16  |  0.8855  |
|     PegasusForConditionalGeneration     | 16  |  0.8843  |
|       T5ForConditionalGeneration        |  4  |  0.878   |
|            TrOCRForCausalLM             | 32  |  0.8737  |
|       RobertaForQuestionAnswering       | 128 |  0.8721  |
|           LayoutLMForMaskedLM           | 16  |  0.8682  |
|            MBartForCausalLM             | 32  |  0.8672  |
|           PegasusForCausalLM            | 32  |  0.8666  |
|    MegatronBertForQuestionAnswering     | 16  |  0.8616  |
|             BertForMaskedLM             | 64  |  0.8443  |
|     DistilBertForQuestionAnswering      | 64  |  0.8442  |
|        BertForQuestionAnswering         | 128 |  0.8413  |
|          DistilBertForMaskedLM          | 64  |  0.8356  |
|             BartForCausalLM             |  4  |  0.8227  |
|      BartForConditionalGeneration       |  2  |  0.8191  |
|       ElectraForQuestionAnswering       | 64  |  0.8126  |
| BlenderbotSmallForConditionalGeneration | 64  |  0.8103  |
|       BlenderbotSmallForCausalLM        | 64  |  0.7972  |
+-----------------------------------------+-----+----------+

Accuracy

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|            AlbertForMaskedLM            | 1  |   pass   |
|       AlbertForQuestionAnswering        | 1  |   pass   |
|                CamemBert                | 1  |   pass   |
|          AllenaiLongformerBase          | 1  |   pass   |
|             BartForCausalLM             | 1  |   pass   |
|      BartForConditionalGeneration       | 1  |   pass   |
|             BertForMaskedLM             | 1  |   pass   |
|        BertForQuestionAnswering         | 1  |   pass   |
|                 BigBird                 | 1  |   pass   |
|       BlenderbotSmallForCausalLM        | 1  |   pass   |
| BlenderbotSmallForConditionalGeneration | 1  |   pass   |
|           DebertaForMaskedLM            | 1  |   pass   |
|           LayoutLMForMaskedLM           | 1  |   pass   |
|       DebertaForQuestionAnswering       | 1  |   pass   |
|          DistilBertForMaskedLM          | 1  |   pass   |
|     DistilBertForQuestionAnswering      | 1  |   pass   |
|               DistillGPT2               | 1  |   pass   |
|           ElectraForCausalLM            | 1  |   pass   |
|       ElectraForQuestionAnswering       | 1  |   pass   |
|      GPT2ForSequenceClassification      | 1  |   pass   |
|               GoogleFnet                | 1  |   pass   |
|    LayoutLMForSequenceClassification    | 1  |   pass   |
|     M2M100ForConditionalGeneration      | 1  |   pass   |
|            MBartForCausalLM             | 1  |   pass   |
|     PLBartForConditionalGeneration      | 1  |   pass   |
|      MBartForConditionalGeneration      | 1  |   pass   |
|       MT5ForConditionalGeneration       | 1  |   pass   |
|         MegatronBertForCausalLM         | 1  |   pass   |
|    MegatronBertForQuestionAnswering     | 1  |   pass   |
|          MobileBertForMaskedLM          | 1  |   pass   |
|     MobileBertForQuestionAnswering      | 1  |   pass   |
|             OPTForCausalLM              | 1  |   pass   |
|            PLBartForCausalLM            | 1  |   pass   |
|           PegasusForCausalLM            | 1  |   pass   |
|            XLNetLMHeadModel             | 1  |   pass   |
|     PegasusForConditionalGeneration     | 1  |   pass   |
|           RobertaForCausalLM            | 1  |   pass   |
|       RobertaForQuestionAnswering       | 1  |   pass   |
|         Speech2Text2ForCausalLM         | 1  |   pass   |
|       T5ForConditionalGeneration        | 1  |   pass   |
|                 T5Small                 | 1  |   pass   |
|            TrOCRForCausalLM             | 1  |   pass   |
|             XGLMForCausalLM             | 1  |   pass   |
|            YituTechConvBert             | 1  |   pass   |
+-----------------------------------------+----+----------+

Compilation latency (sec)

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|          MobileBertForMaskedLM          | 32  | 21.5827  |
|            YituTechConvBert             |  1  | 20.8297  |
|     MobileBertForQuestionAnswering      | 64  | 20.8148  |
|          AllenaiLongformerBase          |  1  | 20.7919  |
|           PegasusForCausalLM            | 32  | 20.7313  |
|     PegasusForConditionalGeneration     | 16  | 20.7223  |
|       DebertaForQuestionAnswering       |  8  | 20.5132  |
|           DebertaForMaskedLM            |  4  | 19.8944  |
|                 BigBird                 |  1  | 19.2729  |
|     M2M100ForConditionalGeneration      |  8  | 19.2597  |
|       AlbertForQuestionAnswering        |  4  | 18.5805  |
|      BartForConditionalGeneration       |  2  | 18.2648  |
|      MBartForConditionalGeneration      | 16  |  17.935  |
|             XGLMForCausalLM             |  8  | 17.3405  |
|     DistilBertForQuestionAnswering      | 64  | 16.6965  |
|         MegatronBertForCausalLM         | 16  | 16.4154  |
|    MegatronBertForQuestionAnswering     | 16  | 16.1948  |
|     PLBartForConditionalGeneration      | 16  | 14.9079  |
| BlenderbotSmallForConditionalGeneration | 64  | 14.6041  |
|       MT5ForConditionalGeneration       |  8  | 12.8268  |
|           RobertaForCausalLM            | 64  | 12.7428  |
|       T5ForConditionalGeneration        |  4  | 12.6852  |
|       RobertaForQuestionAnswering       | 128 | 12.6526  |
|           LayoutLMForMaskedLM           | 16  | 12.4495  |
|       ElectraForQuestionAnswering       | 64  | 12.3243  |
|           ElectraForCausalLM            | 32  | 12.2365  |
|             BertForMaskedLM             | 64  | 12.1568  |
|    LayoutLMForSequenceClassification    | 16  | 12.0709  |
|             BartForCausalLM             |  4  | 11.8716  |
|            MBartForCausalLM             | 32  | 11.7556  |
|        BertForQuestionAnswering         | 128 | 11.5338  |
|            TrOCRForCausalLM             | 32  | 11.3487  |
|                 T5Small                 |  1  | 11.3371  |
|      GPT2ForSequenceClassification      |  4  | 11.2746  |
|            AlbertForMaskedLM            |  4  | 11.0849  |
|             OPTForCausalLM              | 32  | 11.0221  |
|               GoogleFnet                |  1  | 10.7448  |
|                CamemBert                |  1  |  10.619  |
|         Speech2Text2ForCausalLM         | 128 | 10.3587  |
|       BlenderbotSmallForCausalLM        | 64  | 10.1789  |
|          DistilBertForMaskedLM          | 64  |  9.5233  |
|            PLBartForCausalLM            | 32  |  9.1554  |
|               DistillGPT2               |  1  |  8.2035  |
|            XLNetLMHeadModel             | 32  | -8.5389  |
+-----------------------------------------+-----+----------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            AlbertForMaskedLM            |  4  |  0.9977  |
|       AlbertForQuestionAnswering        |  4  |  0.9977  |
|             BartForCausalLM             |  4  |  0.9962  |
|       ElectraForQuestionAnswering       | 64  |  0.9962  |
|           RobertaForCausalLM            | 64  |  0.9956  |
|           ElectraForCausalLM            | 32  |  0.9954  |
|       BlenderbotSmallForCausalLM        | 64  |  0.9953  |
|             BertForMaskedLM             | 64  |  0.9952  |
|          DistilBertForMaskedLM          | 64  |  0.9946  |
|            TrOCRForCausalLM             | 32  |  0.9941  |
|      GPT2ForSequenceClassification      |  4  |  0.9941  |
| BlenderbotSmallForConditionalGeneration | 64  |  0.9939  |
|         Speech2Text2ForCausalLM         | 128 |  0.9938  |
|       DebertaForQuestionAnswering       |  8  |  0.9938  |
|           PegasusForCausalLM            | 32  |  0.9935  |
|            MBartForCausalLM             | 32  |  0.9933  |
|       T5ForConditionalGeneration        |  4  |  0.9932  |
|             OPTForCausalLM              | 32  |  0.993   |
|           LayoutLMForMaskedLM           | 16  |  0.993   |
|    LayoutLMForSequenceClassification    | 16  |  0.9927  |
|            PLBartForCausalLM            | 32  |  0.9925  |
|     PegasusForConditionalGeneration     | 16  |  0.9921  |
|      BartForConditionalGeneration       |  2  |  0.9919  |
|                 BigBird                 |  1  |  0.9915  |
|           DebertaForMaskedLM            |  4  |  0.9913  |
|       RobertaForQuestionAnswering       | 128 |  0.9912  |
|      MBartForConditionalGeneration      | 16  |  0.9876  |
|         MegatronBertForCausalLM         | 16  |  0.9867  |
|               GoogleFnet                |  1  |  0.9838  |
|     PLBartForConditionalGeneration      | 16  |  0.9834  |
|               DistillGPT2               |  1  |  0.9828  |
|        BertForQuestionAnswering         | 128 |  0.9817  |
|          MobileBertForMaskedLM          | 32  |  0.9802  |
|     DistilBertForQuestionAnswering      | 64  |  0.9799  |
|            XLNetLMHeadModel             | 32  |  0.9733  |
|             XGLMForCausalLM             |  8  |  0.964   |
|    MegatronBertForQuestionAnswering     | 16  |  0.9603  |
|            YituTechConvBert             |  1  |  0.9454  |
|                 T5Small                 |  1  |  0.9432  |
|     M2M100ForConditionalGeneration      |  8  |  0.9419  |
|          AllenaiLongformerBase          |  1  |  0.9138  |
|     MobileBertForQuestionAnswering      | 64  |  0.8877  |
|                CamemBert                |  1  |  0.8388  |
|       MT5ForConditionalGeneration       |  8  |  0.7901  |
+-----------------------------------------+-----+----------+

timm_models suite with float32 precision

Performance speedup

+---------------------------------+-----+----------+
|              name               | bs  | inductor |
+---------------------------------+-----+----------+
|          pnasnet5large          | 16  |  1.4327  |
|          inception_v3           | 128 |  1.2815  |
|       gluon_inception_v3        | 128 |  1.2746  |
|        adv_inception_v3         | 128 |   1.26   |
|           fbnetc_100            | 128 |  1.1964  |
|           res2next50            | 128 |  1.1947  |
|           mnasnet_100           | 128 |  1.1861  |
|          spnasnet_100           | 128 |  1.1838  |
|        res2net50_14w_8s         | 128 |  1.1719  |
|        ese_vovnet19b_dw         | 128 |  1.1705  |
|      mobilenetv3_large_100      | 128 |  1.1667  |
|             dpn107              | 32  |  1.1605  |
|             dla102              | 128 |  1.1539  |
|          ghostnet_100           | 128 |  1.1521  |
|            lcnet_050            | 128 |  1.1492  |
|            hrnet_w18            | 128 |  1.1482  |
|        res2net101_26w_4s        | 64  |  1.1464  |
|           volo_d1_224           | 64  |  1.1299  |
|            gernet_l             | 128 |  1.122   |
|            repvgg_a2            | 128 |  1.1182  |
|        gluon_xception65         | 32  |  1.1142  |
|         visformer_small         | 128 |  1.093   |
|           selecsls42b           | 128 |  1.0929  |
|            fbnetv3_b            | 128 |  1.0894  |
|     swsl_resnext101_32x16d      | 32  |  1.0821  |
|         mobilenetv2_100         | 128 |  1.0725  |
|           regnety_002           | 128 |  1.0644  |
|           tf_mixnet_l           | 128 |  1.0574  |
|          cspdarknet53           | 64  |  1.0558  |
|           resnest101e           | 64  |  1.0425  |
|      xcit_large_24_p8_224       |  5  |  1.0173  |
|          gmixer_24_224          | 128 |  1.0126  |
|            mixnet_l             | 128 |  1.0071  |
|          cait_m36_384           |  4  |  0.9831  |
|            nfnet_l0             | 128 |  0.9106  |
|  swin_base_patch4_window7_224   | 64  |  0.8936  |
|      beit_base_patch16_224      | 64  |  0.8895  |
|           rexnet_100            | 128 |  0.8856  |
|          convnext_base          | 64  |  0.8827  |
|           dm_nfnet_f0           | 128 |  0.879   |
|           convit_base           | 64  |  0.8711  |
|         poolformer_m36          | 64  |  0.8597  |
|       tf_efficientnet_b0        | 128 |  0.8576  |
|      vit_base_patch16_224       | 64  |  0.8576  |
| deit_base_distilled_patch16_224 | 64  |  0.8571  |
|           mobilevit_s           | 64  |  0.8447  |
|          jx_nest_base           | 32  |  0.8343  |
|          resmlp_12_224          | 128 |  0.8309  |
|            tinynet_a            | 128 |  0.8178  |
|        tnt_s_patch16_224        | 128 |  0.8126  |
|            pit_b_224            | 64  |  0.8018  |
|          mixer_b16_224          | 128 |  0.7926  |
|         coat_lite_mini          | 128 |  0.7568  |
|        twins_pcpvt_base         | 64  |  0.7359  |
|         crossvit_9_240          | 128 |  0.7294  |
|          gmlp_s16_224           | 128 |  0.5807  |
|        sebotnet33ts_256         |  0  |   0.0    |
|       eca_botnext26ts_256       |  0  |   0.0    |
|        eca_halonext26ts         |  0  |   0.0    |
|          botnet26t_256          |  0  |   0.0    |
|        convmixer_768_32         |  0  |   0.0    |
+---------------------------------+-----+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 2  |     pass      |
|             dpn107              | 2  |     pass      |
|      beit_base_patch16_224      | 2  |     pass      |
|          mixer_b16_224          | 2  |     pass      |
|        ese_vovnet19b_dw         | 2  |     pass      |
|         coat_lite_mini          | 2  |     pass      |
|           convit_base           | 2  |     pass      |
|          convnext_base          | 2  |     pass      |
|         crossvit_9_240          | 2  |     pass      |
|          cspdarknet53           | 2  |     pass      |
| deit_base_distilled_patch16_224 | 2  |     pass      |
|             dla102              | 2  |     pass      |
|           dm_nfnet_f0           | 2  |     pass      |
|            lcnet_050            | 2  |     pass      |
|           volo_d1_224           | 2  |     pass      |
|      xcit_large_24_p8_224       | 2  |     pass      |
|           fbnetc_100            | 2  |     pass      |
|        gluon_xception65         | 2  |     pass      |
|          jx_nest_base           | 2  |     pass      |
|          inception_v3           | 2  |     pass      |
|            hrnet_w18            | 2  |     pass      |
|          gmlp_s16_224           | 2  |     pass      |
|          gmixer_24_224          | 2  |     pass      |
|       gluon_inception_v3        | 2  |     pass      |
|            gernet_l             | 2  |     pass      |
|            fbnetv3_b            | 2  |     pass      |
|           mnasnet_100           | 2  |     pass      |
|            mixnet_l             | 2  |     pass      |
|      vit_base_patch16_224       | 2  |     pass      |
|           res2next50            | 2  |     pass      |
|         mobilenetv2_100         | 2  |     pass      |
|      mobilenetv3_large_100      | 2  |     pass      |
|           mobilevit_s           | 2  |     pass      |
|            nfnet_l0             | 2  |     pass      |
|            pit_b_224            | 2  |     pass      |
|          pnasnet5large          | 2  |     pass      |
|         poolformer_m36          | 2  |     pass      |
|           regnety_002           | 2  |     pass      |
|            repvgg_a2            | 2  |     pass      |
|         visformer_small         | 2  |     pass      |
|        res2net50_14w_8s         | 2  |     pass      |
|        res2net101_26w_4s        | 2  |     pass      |
|          resmlp_12_224          | 2  |     pass      |
|       tf_efficientnet_b0        | 2  |     pass      |
|        twins_pcpvt_base         | 2  |     pass      |
|        tnt_s_patch16_224        | 2  |     pass      |
|           resnest101e           | 2  |     pass      |
|           tf_mixnet_l           | 2  |     pass      |
|            tinynet_a            | 2  |     pass      |
|     swsl_resnext101_32x16d      | 2  |     pass      |
|  swin_base_patch4_window7_224   | 2  |     pass      |
|          spnasnet_100           | 2  |     pass      |
|           selecsls42b           | 2  |     pass      |
|           rexnet_100            | 2  |     pass      |
|        eca_halonext26ts         | 2  |  fail_to_run  |
|        convmixer_768_32         | 2  |  fail_to_run  |
|        sebotnet33ts_256         | 2  |  fail_to_run  |
|          botnet26t_256          | 2  |  fail_to_run  |
|       eca_botnext26ts_256       | 2  |  fail_to_run  |
|          ghostnet_100           | 2  | fail_accuracy |
|          cait_m36_384           | 2  | fail_accuracy |
+---------------------------------+----+---------------+

Compilation latency (sec)

+---------------------------------+-----+----------+
|              name               | bs  | inductor |
+---------------------------------+-----+----------+
|            hrnet_w18            | 128 | 38.6761  |
|          pnasnet5large          | 16  | 30.2884  |
|  swin_base_patch4_window7_224   | 64  | 24.3665  |
|     swsl_resnext101_32x16d      | 32  | 22.5364  |
|          cait_m36_384           |  4  | 22.3963  |
|        twins_pcpvt_base         | 64  | 21.5813  |
|           tf_mixnet_l           | 128 | 21.4847  |
|           dm_nfnet_f0           | 128 | 21.4823  |
|           resnest101e           | 64  | 21.2826  |
|        res2net50_14w_8s         | 128 | 21.0787  |
|             dla102              | 128 | 20.9507  |
|         poolformer_m36          | 64  | 20.5791  |
|           mobilevit_s           | 64  | 20.5489  |
|      xcit_large_24_p8_224       |  5  | 20.2446  |
|        res2net101_26w_4s        | 64  | 20.0055  |
|            mixnet_l             | 128 | 19.8109  |
|        tnt_s_patch16_224        | 128 | 19.3401  |
|            fbnetv3_b            | 128 | 18.1484  |
|          gmlp_s16_224           | 128 |  18.061  |
|           regnety_002           | 128 | 18.0097  |
|             dpn107              | 32  | 17.9336  |
|           rexnet_100            | 128 | 17.8279  |
|            nfnet_l0             | 128 | 17.8076  |
|          jx_nest_base           | 32  | 17.2234  |
|        gluon_xception65         | 32  |  16.187  |
|           res2next50            | 128 | 15.6389  |
|          convnext_base          | 64  | 15.6367  |
|         crossvit_9_240          | 128 | 15.5356  |
|           volo_d1_224           | 64  | 15.4737  |
|       tf_efficientnet_b0        | 128 | 15.2815  |
|            tinynet_a            | 128 | 15.0834  |
|        adv_inception_v3         | 128 | 14.7912  |
|          inception_v3           | 128 | 14.7196  |
|       gluon_inception_v3        | 128 | 14.6679  |
|          ghostnet_100           | 128 | 14.5108  |
|         coat_lite_mini          | 128 | 14.4649  |
|           convit_base           | 64  | 14.0784  |
|            pit_b_224            | 64  |  14.03   |
|          cspdarknet53           | 64  | 13.4574  |
|          gmixer_24_224          | 128 | 12.8121  |
|          mixer_b16_224          | 128 | 12.6285  |
|      mobilenetv3_large_100      | 128 | 12.2343  |
|      beit_base_patch16_224      | 64  | 12.1593  |
|         mobilenetv2_100         | 128 | 12.1048  |
|           fbnetc_100            | 128 | 11.9571  |
|          spnasnet_100           | 128 | 11.8834  |
| deit_base_distilled_patch16_224 | 64  |  11.866  |
|         visformer_small         | 128 | 11.8494  |
|      vit_base_patch16_224       | 64  | 11.7031  |
|            gernet_l             | 128 | 11.1115  |
|           mnasnet_100           | 128 | 10.9839  |
|            repvgg_a2            | 128 |  10.406  |
|        ese_vovnet19b_dw         | 128 |  9.8925  |
|           selecsls42b           | 128 |  9.7574  |
|          resmlp_12_224          | 128 |  9.6532  |
|            lcnet_050            | 128 |  9.0859  |
|          botnet26t_256          |  0  |   nan    |
|        convmixer_768_32         |  0  |   nan    |
|       eca_botnext26ts_256       |  0  |   nan    |
|        eca_halonext26ts         |  0  |   nan    |
|        sebotnet33ts_256         |  0  |   nan    |
+---------------------------------+-----+----------+

Peak Memory Compression Ratio

+---------------------------------+-----+----------+
|              name               | bs  | inductor |
+---------------------------------+-----+----------+
|          mixer_b16_224          | 128 |  0.9979  |
|        ese_vovnet19b_dw         | 128 |  0.9974  |
|           res2next50            | 128 |  0.9972  |
|         visformer_small         | 128 |  0.9971  |
|     swsl_resnext101_32x16d      | 32  |  0.997   |
|         coat_lite_mini          | 128 |  0.9969  |
|           dm_nfnet_f0           | 128 |  0.9968  |
|           tf_mixnet_l           | 128 |  0.9966  |
|            mixnet_l             | 128 |  0.9965  |
| deit_base_distilled_patch16_224 | 64  |  0.9965  |
|           convit_base           | 64  |  0.9963  |
|      vit_base_patch16_224       | 64  |  0.9963  |
|            gernet_l             | 128 |  0.9962  |
|        adv_inception_v3         | 128 |  0.9962  |
|      beit_base_patch16_224      | 64  |  0.9961  |
|          resmlp_12_224          | 128 |  0.996   |
|        gluon_xception65         | 32  |  0.996   |
|          gmixer_24_224          | 128 |  0.996   |
|            fbnetv3_b            | 128 |  0.9959  |
|       tf_efficientnet_b0        | 128 |  0.9959  |
|          convnext_base          | 64  |  0.9958  |
|          cspdarknet53           | 64  |  0.9958  |
|           resnest101e           | 64  |  0.9958  |
|        res2net50_14w_8s         | 128 |  0.9957  |
|            pit_b_224            | 64  |  0.9957  |
|           selecsls42b           | 128 |  0.9957  |
|           mobilevit_s           | 64  |  0.9956  |
|          gmlp_s16_224           | 128 |  0.9955  |
|            nfnet_l0             | 128 |  0.9948  |
|           fbnetc_100            | 128 |  0.9948  |
|          spnasnet_100           | 128 |  0.9945  |
|           rexnet_100            | 128 |  0.9945  |
|           mnasnet_100           | 128 |  0.9944  |
|        tnt_s_patch16_224        | 128 |  0.9942  |
|             dpn107              | 32  |  0.994   |
|            repvgg_a2            | 128 |  0.9939  |
|      mobilenetv3_large_100      | 128 |  0.9938  |
|         mobilenetv2_100         | 128 |  0.9936  |
|       gluon_inception_v3        | 128 |  0.9935  |
|  swin_base_patch4_window7_224   | 64  |  0.9935  |
|          inception_v3           | 128 |  0.9934  |
|         poolformer_m36          | 64  |  0.9933  |
|            tinynet_a            | 128 |  0.9928  |
|          cait_m36_384           |  4  |  0.9927  |
|          jx_nest_base           | 32  |  0.9926  |
|          ghostnet_100           | 128 |  0.9921  |
|        twins_pcpvt_base         | 64  |  0.992   |
|        res2net101_26w_4s        | 64  |  0.992   |
|          pnasnet5large          | 16  |  0.9919  |
|            hrnet_w18            | 128 |  0.9917  |
|         crossvit_9_240          | 128 |  0.9915  |
|           volo_d1_224           | 64  |  0.9911  |
|      xcit_large_24_p8_224       |  5  |  0.9894  |
|           regnety_002           | 128 |  0.9887  |
|            lcnet_050            | 128 |  0.9881  |
|             dla102              | 128 |  0.9829  |
|          botnet26t_256          |  0  |   nan    |
|        convmixer_768_32         |  0  |   nan    |
|       eca_botnext26ts_256       |  0  |   nan    |
|        eca_halonext26ts         |  0  |   nan    |
|        sebotnet33ts_256         |  0  |   nan    |
+---------------------------------+-----+----------+

ESI-SYD commented Nov 3, 2022

Performance Dashboard for float32 precision -- Single-core Single-thread

Executive Summary

We evaluate the inductor backend across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward pass. For accuracy, we check the numerical correctness of the forward-pass outputs by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
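
As a rough illustration of the methodology above, the sketch below derives a per-model speedup and peak memory ratio from raw measurements after dropping models that fail the accuracy check. The `Measurement` record, its field names, and the sample numbers are hypothetical and are not taken from the benchmark harness. Treating `pass_due_to_skip` as a pass and defining the memory ratio as native divided by inductor (so that higher is better) are assumptions as well.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    # Hypothetical record layout, for illustration only.
    name: str
    accuracy: str             # e.g. "pass", "pass_due_to_skip", "fail_to_run", "fail_accuracy"
    eager_latency: float      # native PyTorch latency, seconds
    inductor_latency: float   # inductor latency, seconds
    eager_peak_mem: float     # native PyTorch peak memory, any consistent unit
    inductor_peak_mem: float  # inductor peak memory, same unit


def summarize(measurements):
    """Return {model name: metrics}, skipping models that fail the accuracy check."""
    rows = {}
    for m in measurements:
        # Assumption: any status starting with "pass" counts as passing.
        if not m.accuracy.startswith("pass"):
            continue
        rows[m.name] = {
            # Speedup is normalized against native PyTorch: above 1.0 means inductor is faster.
            "speedup": m.eager_latency / m.inductor_latency,
            # Assumed definition: above 1.0 means inductor uses less peak memory.
            "compression_ratio": m.eager_peak_mem / m.inductor_peak_mem,
        }
    return rows


# Made-up sample data, not measurements from this dashboard.
samples = [
    Measurement("model_a", "pass", 2.0, 1.0, 100.0, 80.0),
    Measurement("model_b", "fail_accuracy", 1.0, 1.0, 100.0, 100.0),
]
rows = summarize(samples)
print(rows)
```

With the sample data, only `model_a` is kept, with a speedup of 2.0 and a memory ratio of 1.25.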

Pass rate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 91%, 50/55 | 100%, 44/44 | 89%, 54/61  |
+----------+------------+-------------+-------------+
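
Each cell in the table above pairs a rounded percentage with the raw counts. A minimal sketch of that formatting follows. The helper name `pass_rate` is hypothetical, and rounding to the nearest whole percent is an assumption that happens to match the values shown.

```python
def pass_rate(passed: int, total: int) -> str:
    """Format a pass rate as '<percent>%, <passed>/<total>'."""
    return f"{round(100 * passed / total)}%, {passed}/{total}"


# Counts copied from the table above.
suites = {"torchbench": (50, 55), "huggingface": (44, 44), "timm_models": (54, 61)}
for suite, (passed, total) in suites.items():
    print(suite, pass_rate(passed, total))
# torchbench 91%, 50/55
# huggingface 100%, 44/44
# timm_models 89%, 54/61
```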

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.07x    |    1.00x    |    1.10x    |
+----------+------------+-------------+-------------+
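
For reference, a geometric mean over per-model speedups can be computed as the exponential of the mean of the logarithms. The sketch below is a generic implementation, and it is an assumption that the dashboard aggregates its numbers in exactly this way. Models that failed to run (reported with a speedup of 0.0) have to be excluded first, because the logarithm of zero is undefined.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values: exp(mean(log(v)))."""
    if not values:
        raise ValueError("geometric_mean requires at least one value")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Illustrative input: the five highest torchbench speedups from the per-model
# table in this comment, plus one failed model (0.0) that must be filtered out.
speedups = [1.6303, 1.3047, 1.2927, 1.2252, 1.2142, 0.0]
valid = [s for s in speedups if s > 0.0]
print(f"{geometric_mean(valid):.2f}x")
```

Unlike an arithmetic mean, a geometric mean treats a 2x speedup and a 0.5x slowdown as cancelling out, which suits ratios such as these.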

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |    18.23   |    18.39    |    21.87    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.18x    |    1.32x    |    1.11x    |
+----------+------------+-------------+-------------+

torchbench suite with float32 precision

Performance speedup

+-----------------------------------+----+----------+
|               name                | bs | inductor |
+-----------------------------------+----+----------+
|        shufflenet_v2_x1_0         | 1  |  1.6303  |
|           timm_resnest            | 1  |  1.3047  |
| attention_is_all_you_need_pytorch | 1  |  1.2927  |
|           mobilenet_v2            | 1  |  1.2252  |
|           squeezenet1_1           | 1  |  1.2142  |
|   pytorch_CycleGAN_and_pix2pix    | 1  |  1.1865  |
|            densenet121            | 1  |  1.1803  |
|            mnasnet1_0             | 1  |  1.1663  |
|        mobilenet_v3_large         | 1  |  1.1543  |
|       functorch_dp_cifar10        | 1  |  1.1534  |
|          resnext50_32x4d          | 1  |  1.1496  |
|             resnet18              | 1  |  1.1423  |
|             resnet50              | 1  |  1.136   |
|          vision_maskrcnn          | 1  |  1.1325  |
|            timm_vovnet            | 1  |  1.1164  |
|     detectron2_fcos_r_50_fpn      | 1  |  1.0986  |
|               vgg16               | 1  |  1.0902  |
|              alexnet              | 1  |  1.0841  |
|                drq                | 1  |  1.0802  |
|           pytorch_unet            | 1  |  1.0801  |
|               dlrm                | 1  |   1.07   |
|            timm_regnet            | 1  |  1.0692  |
|          LearningToPaint          | 1  |  1.0669  |
|              yolov3               | 1  |  1.0582  |
|            Super_SloMo            | 1  |  1.0568  |
|               dcgan               | 1  |   1.03   |
|          pytorch_stargan          | 16 |  1.0153  |
|        Background_Matting         | 1  |  1.015   |
|              demucs               | 1  |  1.0007  |
|      resnet50_quantized_qat       | 1  |  0.9971  |
|            tts_angular            | 1  |  0.9956  |
|            timm_nfnet             | 1  |  0.994   |
|            hf_BigBird             | 1  |  0.9933  |
|    mobilenet_v2_quantized_qat     | 1  |  0.9929  |
|      nvidia_deeprecommender       | 1  |  0.9801  |
|              hf_GPT2              | 1  |  0.8869  |
|             hf_Albert             | 1  |  0.8366  |
|           hf_Longformer           | 1  |  0.8335  |
|            hf_Reformer            | 1  |  0.824   |
|           lennard_jones           | 1  |  0.8225  |
|           BERT_pytorch            | 1  |  0.8156  |
|            hf_T5_large            | 1  |  0.8146  |
|   timm_vision_transformer_large   | 1  |  0.8083  |
|           hf_GPT2_large           | 1  |  0.7638  |
|               hf_T5               | 1  |  0.7614  |
|           hf_DistilBert           | 1  |  0.7511  |
|              hf_Bert              | 1  |  0.719   |
|              hf_Bart              | 1  |  0.7189  |
|            hf_T5_base             | 1  |  0.6985  |
|      timm_vision_transformer      | 1  |  0.6076  |
|         timm_efficientnet         | 1  |  0.5181  |
|           fastNLP_Bert            | 0  |   0.0    |
|             tacotron2             | 0  |   0.0    |
|        speech_transformer         | 0  |   0.0    |
|         soft_actor_critic         | 0  |   0.0    |
+-----------------------------------+----+----------+

Accuracy

+-----------------------------------+----+------------------+
|               name                | bs |     inductor     |
+-----------------------------------+----+------------------+
|            hf_T5_large            | 1  | pass_due_to_skip |
|           hf_GPT2_large           | 1  | pass_due_to_skip |
|   timm_vision_transformer_large   | 1  | pass_due_to_skip |
|               hf_T5               | 1  |       pass       |
|               dlrm                | 1  |       pass       |
|            hf_T5_base             | 1  |       pass       |
|          LearningToPaint          | 1  |       pass       |
|            Super_SloMo            | 1  |       pass       |
|              alexnet              | 1  |       pass       |
| attention_is_all_you_need_pytorch | 1  |       pass       |
|               dcgan               | 1  |       pass       |
|              demucs               | 1  |       pass       |
|            densenet121            | 1  |       pass       |
|     detectron2_fcos_r_50_fpn      | 1  |       pass       |
|                drq                | 1  |       pass       |
|            hf_Reformer            | 1  |       pass       |
|            mnasnet1_0             | 1  |       pass       |
|       functorch_dp_cifar10        | 1  |       pass       |
|             hf_Albert             | 1  |       pass       |
|              hf_Bart              | 1  |       pass       |
|              hf_Bert              | 1  |       pass       |
|            hf_BigBird             | 1  |       pass       |
|           hf_DistilBert           | 1  |       pass       |
|              hf_GPT2              | 1  |       pass       |
|           hf_Longformer           | 1  |       pass       |
|              yolov3               | 1  |       pass       |
|           lennard_jones           | 1  |       pass       |
|          resnext50_32x4d          | 1  |       pass       |
|           BERT_pytorch            | 1  |       pass       |
|      resnet50_quantized_qat       | 1  |       pass       |
|           mobilenet_v2            | 1  |       pass       |
|    mobilenet_v2_quantized_qat     | 1  |       pass       |
|        mobilenet_v3_large         | 1  |       pass       |
|      nvidia_deeprecommender       | 1  |       pass       |
|   pytorch_CycleGAN_and_pix2pix    | 1  |       pass       |
|          pytorch_stargan          | 16 |       pass       |
|           pytorch_unet            | 1  |       pass       |
|               vgg16               | 1  |       pass       |
|             resnet50              | 1  |       pass       |
|             resnet18              | 1  |       pass       |
|        Background_Matting         | 1  |       pass       |
|        shufflenet_v2_x1_0         | 1  |       pass       |
|           squeezenet1_1           | 1  |       pass       |
|         timm_efficientnet         | 1  |       pass       |
|            timm_nfnet             | 1  |       pass       |
|            timm_regnet            | 1  |       pass       |
|           timm_resnest            | 1  |       pass       |
|      timm_vision_transformer      | 1  |       pass       |
|            timm_vovnet            | 1  |       pass       |
|            tts_angular            | 1  |       pass       |
|         soft_actor_critic         | 0  |      0.0000      |
|           fastNLP_Bert            | 0  |      0.0000      |
|             tacotron2             | 0  |      0.0000      |
|          vision_maskrcnn          | 0  |      0.0000      |
|        speech_transformer         | 0  |      0.0000      |
+-----------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------+----+----------+
|               name                | bs | inductor |
+-----------------------------------+----+----------+
|          vision_maskrcnn          | 1  | 67.9055  |
|     detectron2_fcos_r_50_fpn      | 1  | 65.0179  |
|            hf_T5_base             | 1  | 61.0279  |
|            hf_T5_large            | 1  | 55.3729  |
|           hf_GPT2_large           | 1  | 44.9441  |
|            densenet121            | 1  | 32.9563  |
|           hf_Longformer           | 1  | 31.9838  |
|            timm_nfnet             | 1  | 26.7587  |
|            hf_BigBird             | 1  | 26.0525  |
|              yolov3               | 1  | 25.8128  |
|   timm_vision_transformer_large   | 1  | 25.5415  |
|         timm_efficientnet         | 1  | 24.3729  |
|            Super_SloMo            | 1  | 23.1083  |
|        mobilenet_v3_large         | 1  | 22.8879  |
|            timm_vovnet            | 1  | 22.6841  |
|            timm_regnet            | 1  | 19.7768  |
|              hf_Bart              | 1  | 17.1645  |
|            mnasnet1_0             | 1  |  17.093  |
|        shufflenet_v2_x1_0         | 1  | 17.0703  |
|               hf_T5               | 1  | 16.8451  |
|           mobilenet_v2            | 1  | 16.7573  |
|            hf_Reformer            | 1  | 16.7426  |
|           timm_resnest            | 1  |  15.893  |
|        Background_Matting         | 1  | 15.1108  |
|             resnet50              | 1  | 15.0244  |
|          resnext50_32x4d          | 1  | 15.0111  |
|              hf_GPT2              | 1  | 14.6155  |
|           BERT_pytorch            | 1  | 13.7351  |
|              hf_Bert              | 1  | 13.1208  |
|      timm_vision_transformer      | 1  | 13.1204  |
| attention_is_all_you_need_pytorch | 1  | 12.8987  |
|       functorch_dp_cifar10        | 1  | 12.6869  |
|             hf_Albert             | 1  | 12.3245  |
|           pytorch_unet            | 1  |  12.05   |
|             resnet18              | 1  | 11.8429  |
|           hf_DistilBert           | 1  |  11.227  |
|          LearningToPaint          | 1  | 11.1157  |
|           squeezenet1_1           | 1  | 11.0953  |
|   pytorch_CycleGAN_and_pix2pix    | 1  | 10.7537  |
|          pytorch_stargan          | 16 |  10.425  |
|               vgg16               | 1  |  9.3719  |
|               dlrm                | 1  |  9.1913  |
|      nvidia_deeprecommender       | 1  |  9.0608  |
|                drq                | 1  |  8.9684  |
|            tts_angular            | 1  |  8.1898  |
|              alexnet              | 1  |  7.8231  |
|           lennard_jones           | 1  |  7.7809  |
|               dcgan               | 1  |  7.5431  |
|              demucs               | 1  |  0.9926  |
|    mobilenet_v2_quantized_qat     | 1  |  0.1719  |
|      resnet50_quantized_qat       | 1  |  0.1575  |
|           fastNLP_Bert            | 0  |   nan    |
|         soft_actor_critic         | 0  |   nan    |
|        speech_transformer         | 0  |   nan    |
|             tacotron2             | 0  |   nan    |
+-----------------------------------+----+----------+

Peak Memory Compression Ratio

+-----------------------------------+----+----------+
|               name                | bs | inductor |
+-----------------------------------+----+----------+
|            hf_T5_base             | 1  |  4.0697  |
|           pytorch_unet            | 1  |  2.4777  |
|            hf_BigBird             | 1  |  2.1386  |
|           hf_GPT2_large           | 1  |  1.9364  |
|            Super_SloMo            | 1  |  1.6529  |
|            hf_T5_large            | 1  |  1.5674  |
|           hf_Longformer           | 1  |  1.4812  |
|          vision_maskrcnn          | 1  |  1.4019  |
|              yolov3               | 1  |  1.2458  |
|          LearningToPaint          | 1  |  1.2383  |
|           hf_DistilBert           | 1  |  1.2054  |
|            timm_nfnet             | 1  |  1.1525  |
|             hf_Albert             | 1  |  1.1471  |
|            timm_regnet            | 1  |  1.144   |
|   timm_vision_transformer_large   | 1  |  1.1144  |
|          resnext50_32x4d          | 1  |  1.1056  |
|             resnet50              | 1  |  1.0811  |
|                drq                | 1  |  1.0667  |
|           mobilenet_v2            | 1  |  1.0624  |
|            timm_vovnet            | 1  |  1.0523  |
|         timm_efficientnet         | 1  |  1.0511  |
|           timm_resnest            | 1  |  1.0487  |
|      timm_vision_transformer      | 1  |  1.0453  |
|              hf_Bart              | 1  |  1.0411  |
|            mnasnet1_0             | 1  |  1.0159  |
|              demucs               | 1  |  0.9985  |
|               dlrm                | 1  |  0.9976  |
|             resnet18              | 1  |  0.9969  |
|      nvidia_deeprecommender       | 1  |  0.9965  |
|            tts_angular            | 1  |  0.9958  |
|      resnet50_quantized_qat       | 1  |  0.9956  |
|        Background_Matting         | 1  |  0.9937  |
|    mobilenet_v2_quantized_qat     | 1  |  0.9934  |
|          pytorch_stargan          | 16 |  0.9883  |
| attention_is_all_you_need_pytorch | 1  |  0.9837  |
|        mobilenet_v3_large         | 1  |  0.9822  |
|   pytorch_CycleGAN_and_pix2pix    | 1  |  0.9813  |
|              alexnet              | 1  |  0.9807  |
|           lennard_jones           | 1  |  0.9804  |
|     detectron2_fcos_r_50_fpn      | 1  |  0.9786  |
|               vgg16               | 1  |  0.9737  |
|       functorch_dp_cifar10        | 1  |  0.9703  |
|           squeezenet1_1           | 1  |  0.9689  |
|              hf_GPT2              | 1  |  0.9674  |
|        shufflenet_v2_x1_0         | 1  |  0.9536  |
|               dcgan               | 1  |  0.9478  |
|              hf_Bert              | 1  |  0.9125  |
|           BERT_pytorch            | 1  |  0.8995  |
|               hf_T5               | 1  |  0.808   |
|            densenet121            | 1  |  0.7885  |
|            hf_Reformer            | 1  |  0.7624  |
|           fastNLP_Bert            | 0  |   nan    |
|         soft_actor_critic         | 0  |   nan    |
|        speech_transformer         | 0  |   nan    |
|             tacotron2             | 0  |   nan    |
+-----------------------------------+----+----------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|     MobileBertForQuestionAnswering      | 1  |  1.0915  |
|            XLNetLMHeadModel             | 1  |  1.0461  |
|       AlbertForQuestionAnswering        | 1  |  0.9584  |
|          MobileBertForMaskedLM          | 1  |  0.9445  |
|            AlbertForMaskedLM            | 1  |  0.9374  |
|                 BigBird                 | 1  |  0.9061  |
|     M2M100ForConditionalGeneration      | 1  |  0.9058  |
|             OPTForCausalLM              | 1  |  0.881   |
|               GoogleFnet                | 1  |  0.8584  |
|            YituTechConvBert             | 1  |  0.824   |
|         Speech2Text2ForCausalLM         | 1  |  0.823   |
|       DebertaForQuestionAnswering       | 1  |  0.814   |
|      MBartForConditionalGeneration      | 1  |  0.8049  |
|    MegatronBertForQuestionAnswering     | 1  |  0.8045  |
|     PegasusForConditionalGeneration     | 1  |  0.8037  |
|         MegatronBertForCausalLM         | 1  |  0.7996  |
|          AllenaiLongformerBase          | 1  |  0.7925  |
|       RobertaForQuestionAnswering       | 1  |  0.7884  |
|            MBartForCausalLM             | 1  |  0.7875  |
|            TrOCRForCausalLM             | 1  |  0.7868  |
|           PegasusForCausalLM            | 1  |  0.7839  |
|             XGLMForCausalLM             | 1  |  0.7793  |
|           DebertaForMaskedLM            | 1  |  0.7789  |
|           RobertaForCausalLM            | 1  |  0.7719  |
|     DistilBertForQuestionAnswering      | 1  |  0.7685  |
|     PLBartForConditionalGeneration      | 1  |  0.768   |
|        BertForQuestionAnswering         | 1  |  0.7573  |
|          DistilBertForMaskedLM          | 1  |  0.7509  |
|             BertForMaskedLM             | 1  |  0.7464  |
|               DistillGPT2               | 1  |  0.7368  |
|            PLBartForCausalLM            | 1  |  0.7231  |
|       MT5ForConditionalGeneration       | 1  |  0.7122  |
|    LayoutLMForSequenceClassification    | 1  |  0.6877  |
| BlenderbotSmallForConditionalGeneration | 1  |  0.6839  |
|           LayoutLMForMaskedLM           | 1  |  0.6829  |
|      GPT2ForSequenceClassification      | 1  |  0.6814  |
|                CamemBert                | 1  |  0.675   |
|             BartForCausalLM             | 1  |  0.6741  |
|       BlenderbotSmallForCausalLM        | 1  |  0.6694  |
|       T5ForConditionalGeneration        | 1  |  0.6436  |
|                 T5Small                 | 1  |  0.6422  |
|      BartForConditionalGeneration       | 1  |  0.6387  |
|       ElectraForQuestionAnswering       | 1  |  0.5124  |
|           ElectraForCausalLM            | 1  |  0.494   |
+-----------------------------------------+----+----------+
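
A minimal sketch of how a speedup table like the one above could be summarized into a single number. It assumes a geometric mean is the aggregate of interest and that rows with `bs` equal to 0 (models that did not run) should be excluded. `parse_table` and `geomean_speedup` are hypothetical helper names, not part of the benchmark scripts, and the sample holds only three rows copied from the table.

```python
import math

# Three rows copied from the huggingface speedup table, for illustration only.
SAMPLE = """
+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|     MobileBertForQuestionAnswering      | 1  |  1.0915  |
|            XLNetLMHeadModel             | 1  |  1.0461  |
|           ElectraForCausalLM            | 1  |  0.494   |
+-----------------------------------------+----+----------+
"""


def parse_table(text):
    """Parse a '+---+' bordered numeric table into (name, bs, value) tuples."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 3 or cells[0] == "name":
            continue  # skip the header row and anything malformed
        name, bs, value = cells
        rows.append((name, int(bs), float(value)))  # float("nan") parses fine
    return rows


def geomean_speedup(rows):
    """Geometric mean of the speedups, ignoring models that did not run.

    Assumption: a row with bs == 0, a 0.0 speedup, or nan marks a failed run.
    """
    valid = [v for _, bs, v in rows if bs > 0 and v > 0 and not math.isnan(v)]
    if not valid:
        return float("nan")
    return math.exp(sum(math.log(v) for v in valid) / len(valid))


rows = parse_table(SAMPLE)
print(f"{len(rows)} models, geomean speedup {geomean_speedup(rows):.4f}")
```

The parser only handles the numeric tables (speedup, compilation latency, memory ratio). The accuracy tables carry string statuses and would need the `float` conversion dropped.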

Accuracy

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|            AlbertForMaskedLM            | 1  |   pass   |
|       AlbertForQuestionAnswering        | 1  |   pass   |
|                CamemBert                | 1  |   pass   |
|          AllenaiLongformerBase          | 1  |   pass   |
|             BartForCausalLM             | 1  |   pass   |
|      BartForConditionalGeneration       | 1  |   pass   |
|             BertForMaskedLM             | 1  |   pass   |
|        BertForQuestionAnswering         | 1  |   pass   |
|                 BigBird                 | 1  |   pass   |
|       BlenderbotSmallForCausalLM        | 1  |   pass   |
| BlenderbotSmallForConditionalGeneration | 1  |   pass   |
|           DebertaForMaskedLM            | 1  |   pass   |
|           LayoutLMForMaskedLM           | 1  |   pass   |
|       DebertaForQuestionAnswering       | 1  |   pass   |
|          DistilBertForMaskedLM          | 1  |   pass   |
|     DistilBertForQuestionAnswering      | 1  |   pass   |
|               DistillGPT2               | 1  |   pass   |
|           ElectraForCausalLM            | 1  |   pass   |
|       ElectraForQuestionAnswering       | 1  |   pass   |
|      GPT2ForSequenceClassification      | 1  |   pass   |
|               GoogleFnet                | 1  |   pass   |
|    LayoutLMForSequenceClassification    | 1  |   pass   |
|     M2M100ForConditionalGeneration      | 1  |   pass   |
|            MBartForCausalLM             | 1  |   pass   |
|     PLBartForConditionalGeneration      | 1  |   pass   |
|      MBartForConditionalGeneration      | 1  |   pass   |
|       MT5ForConditionalGeneration       | 1  |   pass   |
|         MegatronBertForCausalLM         | 1  |   pass   |
|    MegatronBertForQuestionAnswering     | 1  |   pass   |
|          MobileBertForMaskedLM          | 1  |   pass   |
|     MobileBertForQuestionAnswering      | 1  |   pass   |
|             OPTForCausalLM              | 1  |   pass   |
|            PLBartForCausalLM            | 1  |   pass   |
|           PegasusForCausalLM            | 1  |   pass   |
|            XLNetLMHeadModel             | 1  |   pass   |
|     PegasusForConditionalGeneration     | 1  |   pass   |
|           RobertaForCausalLM            | 1  |   pass   |
|       RobertaForQuestionAnswering       | 1  |   pass   |
|         Speech2Text2ForCausalLM         | 1  |   pass   |
|       T5ForConditionalGeneration        | 1  |   pass   |
|                 T5Small                 | 1  |   pass   |
|            TrOCRForCausalLM             | 1  |   pass   |
|             XGLMForCausalLM             | 1  |   pass   |
|            YituTechConvBert             | 1  |   pass   |
+-----------------------------------------+----+----------+

Compilation latency (sec)

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|      BartForConditionalGeneration       | 1  | 41.6146  |
|          AllenaiLongformerBase          | 1  | 27.9724  |
|          MobileBertForMaskedLM          | 1  | 24.7466  |
|     MobileBertForQuestionAnswering      | 1  | 24.6238  |
|     PegasusForConditionalGeneration     | 1  | 23.6489  |
|     M2M100ForConditionalGeneration      | 1  | 23.5255  |
|      MBartForConditionalGeneration      | 1  | 22.7092  |
|           DebertaForMaskedLM            | 1  | 22.4907  |
|                 BigBird                 | 1  | 22.4712  |
|       MT5ForConditionalGeneration       | 1  | 22.4523  |
|             XGLMForCausalLM             | 1  | 22.1615  |
|       T5ForConditionalGeneration        | 1  | 22.1289  |
|       DebertaForQuestionAnswering       | 1  | 21.9579  |
|                 T5Small                 | 1  |  21.842  |
|             BartForCausalLM             | 1  | 21.1736  |
|       BlenderbotSmallForCausalLM        | 1  | 20.9169  |
|         MegatronBertForCausalLM         | 1  | 20.7428  |
|    LayoutLMForSequenceClassification    | 1  | 20.6166  |
|            XLNetLMHeadModel             | 1  | 20.5761  |
|    MegatronBertForQuestionAnswering     | 1  | 20.2754  |
|      GPT2ForSequenceClassification      | 1  | 19.0535  |
|            YituTechConvBert             | 1  | 17.6785  |
|           PegasusForCausalLM            | 1  | 16.5059  |
| BlenderbotSmallForConditionalGeneration | 1  | 16.2353  |
|     PLBartForConditionalGeneration      | 1  | 15.9034  |
|                CamemBert                | 1  | 15.6886  |
|            MBartForCausalLM             | 1  | 15.5201  |
|           LayoutLMForMaskedLM           | 1  | 15.3705  |
|       AlbertForQuestionAnswering        | 1  | 14.7732  |
|             OPTForCausalLM              | 1  |  14.347  |
|           ElectraForCausalLM            | 1  | 13.7837  |
|             BertForMaskedLM             | 1  | 13.7694  |
|           RobertaForCausalLM            | 1  | 13.7615  |
|            TrOCRForCausalLM             | 1  | 13.5365  |
|       RobertaForQuestionAnswering       | 1  | 13.5005  |
|       ElectraForQuestionAnswering       | 1  | 13.4956  |
|        BertForQuestionAnswering         | 1  | 13.3995  |
|               DistillGPT2               | 1  | 13.3664  |
|               GoogleFnet                | 1  | 12.4917  |
|         Speech2Text2ForCausalLM         | 1  | 12.3276  |
|            AlbertForMaskedLM            | 1  | 11.6542  |
|            PLBartForCausalLM            | 1  | 11.6237  |
|          DistilBertForMaskedLM          | 1  |  11.446  |
|     DistilBertForQuestionAnswering      | 1  | 11.2391  |
+-----------------------------------------+----+----------+

Peak Memory Compression Ratio

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|            AlbertForMaskedLM            | 1  |  2.802   |
|       AlbertForQuestionAnswering        | 1  |  2.583   |
|      BartForConditionalGeneration       | 1  |  2.1312  |
|       T5ForConditionalGeneration        | 1  |  2.1166  |
|                 BigBird                 | 1  |  2.0647  |
|                 T5Small                 | 1  |  1.9002  |
|      GPT2ForSequenceClassification      | 1  |  1.846   |
|          AllenaiLongformerBase          | 1  |  1.7207  |
|             BartForCausalLM             | 1  |  1.4786  |
|           DebertaForMaskedLM            | 1  |  1.4048  |
|               GoogleFnet                | 1  |  1.3821  |
|            XLNetLMHeadModel             | 1  |  1.3705  |
|       DebertaForQuestionAnswering       | 1  |  1.3687  |
|                CamemBert                | 1  |  1.3494  |
|            YituTechConvBert             | 1  |  1.3208  |
|           LayoutLMForMaskedLM           | 1  |  1.3173  |
|    LayoutLMForSequenceClassification    | 1  |  1.291   |
|           ElectraForCausalLM            | 1  |  1.2799  |
|               DistillGPT2               | 1  |  1.223   |
|       ElectraForQuestionAnswering       | 1  |  1.1931  |
|       MT5ForConditionalGeneration       | 1  |  1.1079  |
| BlenderbotSmallForConditionalGeneration | 1  |  1.1071  |
|     PegasusForConditionalGeneration     | 1  |  1.0873  |
|     M2M100ForConditionalGeneration      | 1  |  1.086   |
|      MBartForConditionalGeneration      | 1  |  1.0757  |
|         MegatronBertForCausalLM         | 1  |  1.0552  |
|            TrOCRForCausalLM             | 1  |  1.0532  |
|     PLBartForConditionalGeneration      | 1  |  1.051   |
|    MegatronBertForQuestionAnswering     | 1  |  1.0509  |
|             BertForMaskedLM             | 1  |  1.0492  |
|           RobertaForCausalLM            | 1  |  1.0466  |
|             XGLMForCausalLM             | 1  |  1.0427  |
|            PLBartForCausalLM            | 1  |  1.0404  |
|          DistilBertForMaskedLM          | 1  |  1.0374  |
|       BlenderbotSmallForCausalLM        | 1  |  1.037   |
|       RobertaForQuestionAnswering       | 1  |  1.0349  |
|        BertForQuestionAnswering         | 1  |  1.0348  |
|             OPTForCausalLM              | 1  |  1.0214  |
|     DistilBertForQuestionAnswering      | 1  |  1.0185  |
|            MBartForCausalLM             | 1  |  1.0161  |
|           PegasusForCausalLM            | 1  |  1.0117  |
|          MobileBertForMaskedLM          | 1  |  0.9992  |
|         Speech2Text2ForCausalLM         | 1  |  0.9802  |
|     MobileBertForQuestionAnswering      | 1  |  0.9763  |
+-----------------------------------------+----+----------+

timm_models suite with float32 precision

Performance speedup

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          pnasnet5large          | 1  |  1.7341  |
|        ese_vovnet19b_dw         | 1  |  1.3548  |
|           regnety_002           | 1  |  1.3477  |
|            lcnet_050            | 1  |  1.3356  |
|          inception_v3           | 1  |  1.3331  |
|          spnasnet_100           | 1  |  1.3173  |
|       gluon_inception_v3        | 1  |  1.313   |
|        adv_inception_v3         | 1  |  1.3073  |
|           mnasnet_100           | 1  |  1.2884  |
|          gmixer_24_224          | 1  |  1.2732  |
|           fbnetc_100            | 1  |  1.2512  |
|        gluon_xception65         | 1  |  1.2378  |
|         mobilenetv2_100         | 1  |  1.2232  |
|           res2next50            | 1  |  1.2148  |
|        res2net50_14w_8s         | 1  |  1.195   |
|          ghostnet_100           | 1  |  1.1809  |
|      mobilenetv3_large_100      | 1  |  1.169   |
|            fbnetv3_b            | 1  |  1.162   |
|             dla102              | 1  |  1.1449  |
|        res2net101_26w_4s        | 1  |  1.1322  |
|            hrnet_w18            | 1  |  1.1296  |
|            gernet_l             | 1  |  1.0825  |
|           selecsls42b           | 1  |  1.0775  |
|             dpn107              | 1  |  1.067   |
|          cspdarknet53           | 1  |  1.0659  |
|            repvgg_a2            | 1  |  1.0495  |
|           resnest101e           | 1  |  1.0373  |
|     swsl_resnext101_32x16d      | 1  |  1.0226  |
|            nfnet_l0             | 1  |  0.9919  |
|           dm_nfnet_f0           | 1  |  0.9164  |
|         visformer_small         | 1  |  0.9081  |
|      xcit_large_24_p8_224       | 1  |  0.8928  |
|      beit_base_patch16_224      | 1  |  0.8315  |
|  swin_base_patch4_window7_224   | 1  |  0.7798  |
| deit_base_distilled_patch16_224 | 1  |  0.7748  |
|           convit_base           | 1  |  0.7729  |
|      vit_base_patch16_224       | 1  |  0.7706  |
|          cait_m36_384           | 1  |  0.7586  |
|           volo_d1_224           | 1  |  0.7516  |
|         poolformer_m36          | 1  |  0.7361  |
|           tf_mixnet_l           | 1  |  0.7334  |
|            mixnet_l             | 1  |  0.7243  |
|          mixer_b16_224          | 1  |  0.718   |
|           rexnet_100            | 1  |  0.6964  |
|          resmlp_12_224          | 1  |  0.6721  |
|            pit_b_224            | 1  |  0.6687  |
|          convnext_base          | 1  |  0.6642  |
|        tnt_s_patch16_224        | 1  |  0.6411  |
|          jx_nest_base           | 1  |  0.6265  |
|        twins_pcpvt_base         | 1  |  0.6247  |
|         crossvit_9_240          | 1  |  0.6108  |
|           mobilevit_s           | 1  |  0.5634  |
|         coat_lite_mini          | 1  |  0.5461  |
|            tinynet_a            | 1  |  0.5407  |
|       tf_efficientnet_b0        | 1  |  0.4227  |
|          gmlp_s16_224           | 1  |  0.3766  |
|        eca_halonext26ts         | 0  |   0.0    |
|       eca_botnext26ts_256       | 0  |   0.0    |
|        sebotnet33ts_256         | 0  |   0.0    |
|          botnet26t_256          | 0  |   0.0    |
|        convmixer_768_32         | 0  |   0.0    |
+---------------------------------+----+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 1  |     pass      |
|             dpn107              | 1  |     pass      |
|      beit_base_patch16_224      | 1  |     pass      |
|          mixer_b16_224          | 1  |     pass      |
|        ese_vovnet19b_dw         | 1  |     pass      |
|         coat_lite_mini          | 1  |     pass      |
|           convit_base           | 1  |     pass      |
|          convnext_base          | 1  |     pass      |
|         crossvit_9_240          | 1  |     pass      |
|          cspdarknet53           | 1  |     pass      |
| deit_base_distilled_patch16_224 | 1  |     pass      |
|             dla102              | 1  |     pass      |
|           dm_nfnet_f0           | 1  |     pass      |
|            lcnet_050            | 1  |     pass      |
|           volo_d1_224           | 1  |     pass      |
|      xcit_large_24_p8_224       | 1  |     pass      |
|           fbnetc_100            | 1  |     pass      |
|        gluon_xception65         | 1  |     pass      |
|          jx_nest_base           | 1  |     pass      |
|          inception_v3           | 1  |     pass      |
|            hrnet_w18            | 1  |     pass      |
|          gmlp_s16_224           | 1  |     pass      |
|          gmixer_24_224          | 1  |     pass      |
|       gluon_inception_v3        | 1  |     pass      |
|            gernet_l             | 1  |     pass      |
|            fbnetv3_b            | 1  |     pass      |
|           mnasnet_100           | 1  |     pass      |
|            mixnet_l             | 1  |     pass      |
|      vit_base_patch16_224       | 1  |     pass      |
|           res2next50            | 1  |     pass      |
|         mobilenetv2_100         | 1  |     pass      |
|      mobilenetv3_large_100      | 1  |     pass      |
|           mobilevit_s           | 1  |     pass      |
|            nfnet_l0             | 1  |     pass      |
|            pit_b_224            | 1  |     pass      |
|          pnasnet5large          | 1  |     pass      |
|         poolformer_m36          | 1  |     pass      |
|           regnety_002           | 1  |     pass      |
|            repvgg_a2            | 1  |     pass      |
|         visformer_small         | 1  |     pass      |
|        res2net50_14w_8s         | 1  |     pass      |
|        res2net101_26w_4s        | 1  |     pass      |
|          resmlp_12_224          | 1  |     pass      |
|       tf_efficientnet_b0        | 1  |     pass      |
|        twins_pcpvt_base         | 1  |     pass      |
|        tnt_s_patch16_224        | 1  |     pass      |
|           resnest101e           | 1  |     pass      |
|           tf_mixnet_l           | 1  |     pass      |
|            tinynet_a            | 1  |     pass      |
|     swsl_resnext101_32x16d      | 1  |     pass      |
|  swin_base_patch4_window7_224   | 1  |     pass      |
|          spnasnet_100           | 1  |     pass      |
|           selecsls42b           | 1  |     pass      |
|           rexnet_100            | 1  |     pass      |
|        eca_halonext26ts         | 1  |  fail_to_run  |
|        convmixer_768_32         | 1  |  fail_to_run  |
|        sebotnet33ts_256         | 1  |  fail_to_run  |
|          botnet26t_256          | 1  |  fail_to_run  |
|       eca_botnext26ts_256       | 1  |  fail_to_run  |
|          ghostnet_100           | 1  | fail_accuracy |
|          cait_m36_384           | 1  | fail_accuracy |
+---------------------------------+----+---------------+
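
The summary at the top of this comment says that models failing accuracy checks are removed before performance, compilation latency and memory numbers are aggregated. A minimal sketch of that filtering step follows, using four entries copied from the timm_models tables. `keep_passing` is a hypothetical helper, and treating `pass_due_to_skip` as passing is an assumption.

```python
# Statuses and speedups copied from the timm_models tables in this comment.
accuracy = {
    "pnasnet5large": "pass",
    "ghostnet_100": "fail_accuracy",
    "cait_m36_384": "fail_accuracy",
    "botnet26t_256": "fail_to_run",
}
speedup = {
    "pnasnet5large": 1.7341,
    "ghostnet_100": 1.1809,
    "cait_m36_384": 0.7586,
    "botnet26t_256": 0.0,
}

# Assumption: a skipped accuracy check counts as passing.
PASSING = {"pass", "pass_due_to_skip"}


def keep_passing(values, statuses):
    """Keep only the models whose accuracy status is in PASSING."""
    return {name: v for name, v in values.items() if statuses.get(name) in PASSING}


filtered = keep_passing(speedup, accuracy)
print(filtered)
```

Models missing from the accuracy mapping are dropped as well, which is the conservative choice when the two tables list different sets of models.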

Compilation latency (sec)

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          pnasnet5large          | 1  | 41.7089  |
|            hrnet_w18            | 1  | 39.2931  |
|           tf_mixnet_l           | 1  | 33.1455  |
|           rexnet_100            | 1  | 31.1758  |
|        twins_pcpvt_base         | 1  |  31.058  |
|          ghostnet_100           | 1  | 30.5595  |
|        res2net50_14w_8s         | 1  | 30.4734  |
|  swin_base_patch4_window7_224   | 1  |  30.162  |
|           mobilevit_s           | 1  | 29.9955  |
|         coat_lite_mini          | 1  | 29.8229  |
|            nfnet_l0             | 1  | 29.7563  |
|            mixnet_l             | 1  | 29.7319  |
|          cait_m36_384           | 1  | 29.3847  |
|           resnest101e           | 1  | 28.1108  |
|        res2net101_26w_4s        | 1  | 27.3083  |
|            fbnetv3_b            | 1  | 26.8071  |
|      xcit_large_24_p8_224       | 1  | 26.5514  |
|             dpn107              | 1  | 26.4295  |
|           dm_nfnet_f0           | 1  | 25.5659  |
|       tf_efficientnet_b0        | 1  | 24.8257  |
|         poolformer_m36          | 1  | 24.7907  |
|            tinynet_a            | 1  | 24.6314  |
|          jx_nest_base           | 1  | 22.7314  |
|           res2next50            | 1  | 22.3748  |
|      mobilenetv3_large_100      | 1  | 21.9447  |
|         crossvit_9_240          | 1  | 21.3376  |
|        gluon_xception65         | 1  | 21.1032  |
|        tnt_s_patch16_224        | 1  | 21.0291  |
|           volo_d1_224           | 1  | 20.9827  |
|       gluon_inception_v3        | 1  | 20.4352  |
|             dla102              | 1  | 19.9123  |
|        adv_inception_v3         | 1  | 19.8067  |
|          inception_v3           | 1  | 19.6737  |
|          convnext_base          | 1  | 19.1797  |
|          cspdarknet53           | 1  | 19.1125  |
|           fbnetc_100            | 1  | 19.0881  |
|          spnasnet_100           | 1  | 18.6126  |
|            pit_b_224            | 1  | 18.0546  |
|           regnety_002           | 1  | 17.5177  |
|     swsl_resnext101_32x16d      | 1  | 17.4402  |
|          gmlp_s16_224           | 1  | 17.4338  |
|         mobilenetv2_100         | 1  | 16.8956  |
|           mnasnet_100           | 1  | 16.8593  |
|           convit_base           | 1  | 16.6751  |
|         visformer_small         | 1  | 16.2995  |
|            gernet_l             | 1  | 14.7961  |
|          gmixer_24_224          | 1  | 14.7167  |
|           selecsls42b           | 1  | 13.9762  |
|        ese_vovnet19b_dw         | 1  | 13.5153  |
|          mixer_b16_224          | 1  | 13.4293  |
|            repvgg_a2            | 1  | 13.3048  |
| deit_base_distilled_patch16_224 | 1  | 12.8625  |
|            lcnet_050            | 1  |  12.858  |
|      beit_base_patch16_224      | 1  |  12.767  |
|      vit_base_patch16_224       | 1  | 12.5794  |
|          resmlp_12_224          | 1  | 10.5512  |
|          botnet26t_256          | 0  |   nan    |
|        convmixer_768_32         | 0  |   nan    |
|       eca_botnext26ts_256       | 0  |   nan    |
|        eca_halonext26ts         | 0  |   nan    |
|        sebotnet33ts_256         | 0  |   nan    |
+---------------------------------+----+----------+

Peak Memory Compression Ratio

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|      xcit_large_24_p8_224       | 1  |  2.3088  |
|          cait_m36_384           | 1  |  2.1772  |
|          pnasnet5large          | 1  |  1.6668  |
|        gluon_xception65         | 1  |  1.4668  |
|           mobilevit_s           | 1  |  1.2066  |
|           dm_nfnet_f0           | 1  |  1.205   |
|          jx_nest_base           | 1  |  1.1979  |
|     swsl_resnext101_32x16d      | 1  |  1.1881  |
|            nfnet_l0             | 1  |  1.1865  |
|           resnest101e           | 1  |  1.1833  |
|             dpn107              | 1  |  1.1777  |
|           convit_base           | 1  |  1.1426  |
|          cspdarknet53           | 1  |  1.1421  |
|             dla102              | 1  |  1.1357  |
|          convnext_base          | 1  |  1.1312  |
|         poolformer_m36          | 1  |  1.1191  |
|          gmixer_24_224          | 1  |  1.1105  |
|           selecsls42b           | 1  |  1.1089  |
|  swin_base_patch4_window7_224   | 1  |  1.1085  |
|            pit_b_224            | 1  |  1.1047  |
|           volo_d1_224           | 1  |  1.0932  |
|           res2next50            | 1  |  1.0922  |
|        res2net101_26w_4s        | 1  |   1.09   |
|        tnt_s_patch16_224        | 1  |  1.0877  |
| deit_base_distilled_patch16_224 | 1  |  1.0821  |
|      vit_base_patch16_224       | 1  |  1.0819  |
|          mixer_b16_224          | 1  |  1.0805  |
|        twins_pcpvt_base         | 1  |  1.0702  |
|           rexnet_100            | 1  |  1.0687  |
|       tf_efficientnet_b0        | 1  |  1.0654  |
|         mobilenetv2_100         | 1  |  1.0622  |
|           tf_mixnet_l           | 1  |  1.0591  |
|          inception_v3           | 1  |  1.0579  |
|        ese_vovnet19b_dw         | 1  |  1.0563  |
|       gluon_inception_v3        | 1  |  1.054   |
|        res2net50_14w_8s         | 1  |  1.0513  |
|            mixnet_l             | 1  |  1.0512  |
|          resmlp_12_224          | 1  |  1.0472  |
|        adv_inception_v3         | 1  |  1.0458  |
|         coat_lite_mini          | 1  |  1.0436  |
|            fbnetv3_b            | 1  |  1.0302  |
|            tinynet_a            | 1  |  1.0243  |
|            gernet_l             | 1  |  1.0241  |
|            repvgg_a2            | 1  |   1.02   |
|          spnasnet_100           | 1  |  1.0174  |
|           mnasnet_100           | 1  |  1.0144  |
|           fbnetc_100            | 1  |  1.0034  |
|            hrnet_w18            | 1  |  0.984   |
|         visformer_small         | 1  |  0.9648  |
|            lcnet_050            | 1  |  0.9643  |
|           regnety_002           | 1  |  0.9594  |
|      beit_base_patch16_224      | 1  |  0.9559  |
|          ghostnet_100           | 1  |  0.9436  |
|      mobilenetv3_large_100      | 1  |  0.8822  |
|         crossvit_9_240          | 1  |  0.8758  |
|          gmlp_s16_224           | 1  |  0.8741  |
|          botnet26t_256          | 0  |   nan    |
|        convmixer_768_32         | 0  |   nan    |
|       eca_botnext26ts_256       | 0  |   nan    |
|        eca_halonext26ts         | 0  |   nan    |
|        sebotnet33ts_256         | 0  |   nan    |
+---------------------------------+----+----------+

ESI-SYD commented Nov 8, 2022

Performance Dashboard for float32 precision -- Single-Socket Multi-threads

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward pass. For accuracy, we check the numerical correctness of the forward-pass outputs by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint compression ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 91%, 49/54 | 100%, 44/44 | 89%, 54/61  |
+----------+------------+-------------+-------------+
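Each Passrate cell pairs a rounded percentage with the raw passed/total counts. A minimal sketch of that formatting follows; the helper name `passrate` is hypothetical and is not taken from the benchmark scripts, and only the counts come from the table above.

```python
def passrate(passed: int, total: int) -> str:
    """Format a pass rate as 'NN%, passed/total' (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    # The ':.0%' format multiplies by 100 and rounds to a whole percent.
    return f"{passed / total:.0%}, {passed}/{total}"


# Counts copied from the Passrate table above.
counts = {
    "torchbench": (49, 54),
    "huggingface": (44, 44),
    "timm_models": (54, 61),
}

for suite, (passed, total) in counts.items():
    print(f"{suite}: {passrate(passed, total)}")
```

The denominator is the number of models attempted in the suite, so models that fail still count against the rate even though they are dropped from the other metrics.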

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.07x    |    1.07x    |    1.09x    |
+----------+------------+-------------+-------------+
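Speedups are ratios, so they are summarized with a geometric mean: a 2x speedup and a 0.5x slowdown average to 1.0x rather than 1.25x. The sketch below shows one way to compute it while skipping failed models, which appear as `0.0` or `nan` in the per-model tables. The function name `geomean_speedup` is hypothetical, and the sample values are only a handful of rows from the torchbench table, so the result is not expected to match the suite-level figure.

```python
import math


def geomean_speedup(speedups):
    """Geometric mean of per-model speedups, ignoring failed rows.

    Failed models are assumed to be recorded as 0.0 or nan, as in the
    per-model tables; log(0) is undefined, so they must be dropped first.
    """
    valid = [s for s in speedups if not math.isnan(s) and s > 0]
    if not valid:
        raise ValueError("no valid speedups to aggregate")
    # exp(mean(log(x))) is numerically safer than multiplying all values.
    return math.exp(math.fsum(math.log(s) for s in valid) / len(valid))


# A few rows from the torchbench speedup table (illustrative subset only).
sample = {
    "shufflenet_v2_x1_0": 1.8229,
    "resnet50": 1.093,
    "hf_Bert": 0.8049,
    "lennard_jones": 0.3592,
    "tacotron2": 0.0,  # failed model, excluded from the mean
}

print(f"{geomean_speedup(sample.values()):.2f}x")
```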

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   11.68    |    14.90    |    18.76    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.94x    |    0.96x    |    0.99x    |
+----------+------------+-------------+-------------+
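To read these numbers, it helps to convert a compression ratio into a relative footprint. The sketch below assumes the ratio is defined as native PyTorch peak memory divided by inductor peak memory. That definition is consistent with "higher is better" but is not spelled out in this comment, so treat it as an assumption. Both function names are hypothetical.

```python
def compression_ratio(eager_peak_bytes: float, compiled_peak_bytes: float) -> float:
    """Assumed definition: eager peak memory / compiled peak memory."""
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled peak memory must be positive")
    return eager_peak_bytes / compiled_peak_bytes


def relative_footprint(ratio: float) -> float:
    """Compiled peak memory as a multiple of eager peak memory."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / ratio


# Suite-level ratios copied from the table above.
for suite, ratio in {"torchbench": 0.94, "huggingface": 0.96, "timm_models": 0.99}.items():
    print(f"{suite}: inductor uses about {relative_footprint(ratio):.3f}x the eager peak memory")
```

Under this assumption, a ratio below 1.0 means the compiled model peaks higher than native PyTorch, and a ratio above 1.0 means it peaks lower.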

torchbench suite with float32 precision

Performance speedup

+-----------------------------------+------+----------+
|               name                |  bs  | inductor |
+-----------------------------------+------+----------+
|        shufflenet_v2_x1_0         |  64  |  1.8229  |
|            densenet121            |  64  |  1.4481  |
|            mnasnet1_0             |  32  |  1.2714  |
|             resnet18              |  8   |  1.2227  |
|           pytorch_unet            |  1   |  1.187   |
|           timm_resnest            |  32  |  1.1784  |
|           squeezenet1_1           |  16  |  1.1622  |
|       functorch_dp_cifar10        |  64  |  1.1606  |
|              alexnet              | 128  |  1.1507  |
|           mobilenet_v2            |  16  |  1.145   |
|        mobilenet_v3_large         |  32  |  1.1357  |
|            timm_vovnet            |  32  |  1.1106  |
|                drq                |  1   |  1.1049  |
|             resnet50              |  32  |  1.093   |
|               dcgan               | 256  |  1.0908  |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  1.0891  |
|               vgg16               |  4   |  1.0725  |
|            hf_Reformer            |  1   |  1.0698  |
|            Super_SloMo            |  6   |  1.0689  |
|            hf_T5_large            |  1   |  1.0617  |
|          LearningToPaint          |  96  |  1.0579  |
|               dlrm                | 2048 |  1.0565  |
|        Background_Matting         |  1   |  1.0554  |
|            timm_regnet            |  32  |  1.0524  |
|            hf_BigBird             |  1   |  1.0321  |
|          pytorch_stargan          |  16  |  1.0292  |
|               hf_T5               |  1   |  1.0188  |
|    mobilenet_v2_quantized_qat     |  96  |  0.9988  |
|              demucs               |  1   |  0.9975  |
|            tts_angular            |  64  |  0.9962  |
|      resnet50_quantized_qat       |  32  |  0.9959  |
|           BERT_pytorch            |  2   |  0.9875  |
|           hf_GPT2_large           |  1   |  0.9401  |
|              hf_GPT2              |  1   |  0.9269  |
|           hf_Longformer           |  1   |  0.921   |
|             hf_Albert             |  1   |  0.8995  |
|              yolov3               |  8   |  0.8824  |
|            hf_T5_base             |  1   |  0.8824  |
|   timm_vision_transformer_large   |  8   |  0.8753  |
|          resnext50_32x4d          |  8   |  0.8594  |
|      nvidia_deeprecommender       | 256  |  0.8387  |
|           hf_DistilBert           |  1   |  0.8317  |
|            timm_nfnet             | 128  |  0.8173  |
|              hf_Bert              |  1   |  0.8049  |
|         timm_efficientnet         |  64  |  0.7993  |
|              hf_Bart              |  1   |  0.7981  |
| attention_is_all_you_need_pytorch |  32  |  0.728   |
|      timm_vision_transformer      |  8   |  0.637   |
|           lennard_jones           | 1000 |  0.3592  |
|             tacotron2             |  0   |   0.0    |
|        speech_transformer         |  0   |   0.0    |
|         soft_actor_critic         |  0   |   0.0    |
|           fastNLP_Bert            |  0   |   0.0    |
|         timm_efficientdet         |  0   |   0.0    |
+-----------------------------------+------+----------+

Accuracy

+-----------------------------------+----+------------------+
|               name                | bs |     inductor     |
+-----------------------------------+----+------------------+
|            hf_T5_large            | 2  | pass_due_to_skip |
|           hf_GPT2_large           | 2  | pass_due_to_skip |
|   timm_vision_transformer_large   | 2  | pass_due_to_skip |
|               hf_T5               | 2  |       pass       |
|            hf_Reformer            | 2  |       pass       |
|            hf_T5_base             | 2  |       pass       |
|          LearningToPaint          | 2  |       pass       |
|            Super_SloMo            | 2  |       pass       |
|              alexnet              | 2  |       pass       |
| attention_is_all_you_need_pytorch | 2  |       pass       |
|               dcgan               | 2  |       pass       |
|              demucs               | 1  |       pass       |
|            densenet121            | 2  |       pass       |
|               dlrm                | 2  |       pass       |
|                drq                | 1  |       pass       |
|              yolov3               | 2  |       pass       |
|           mobilenet_v2            | 2  |       pass       |
|             hf_Albert             | 2  |       pass       |
|              hf_Bart              | 2  |       pass       |
|              hf_Bert              | 2  |       pass       |
|            hf_BigBird             | 2  |       pass       |
|           hf_DistilBert           | 2  |       pass       |
|              hf_GPT2              | 2  |       pass       |
|           hf_Longformer           | 2  |       pass       |
|       functorch_dp_cifar10        | 2  |       pass       |
|           lennard_jones           | 2  |       pass       |
|        Background_Matting         | 1  |       pass       |
|          resnext50_32x4d          | 2  |       pass       |
|           BERT_pytorch            | 2  |       pass       |
|      resnet50_quantized_qat       | 2  |       pass       |
|    mobilenet_v2_quantized_qat     | 2  |       pass       |
|        mobilenet_v3_large         | 2  |       pass       |
|      nvidia_deeprecommender       | 2  |       pass       |
|   pytorch_CycleGAN_and_pix2pix    | 1  |       pass       |
|          pytorch_stargan          | 16 |       pass       |
|           pytorch_unet            | 2  |       pass       |
|               vgg16               | 2  |       pass       |
|             resnet50              | 2  |       pass       |
|             resnet18              | 2  |       pass       |
|            mnasnet1_0             | 2  |       pass       |
|        shufflenet_v2_x1_0         | 2  |       pass       |
|           squeezenet1_1           | 2  |       pass       |
|         timm_efficientnet         | 2  |       pass       |
|            timm_nfnet             | 2  |       pass       |
|            timm_regnet            | 2  |       pass       |
|           timm_resnest            | 2  |       pass       |
|      timm_vision_transformer      | 2  |       pass       |
|            timm_vovnet            | 2  |       pass       |
|            tts_angular            | 2  |       pass       |
|         soft_actor_critic         | 0  |      0.0000      |
|           fastNLP_Bert            | 0  |      0.0000      |
|             tacotron2             | 0  |      0.0000      |
|         timm_efficientdet         | 0  |      0.0000      |
|        speech_transformer         | 0  |      0.0000      |
+-----------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------+------+----------+
|               name                |  bs  | inductor |
+-----------------------------------+------+----------+
|            hf_T5_large            |  1   | 33.7788  |
|   timm_vision_transformer_large   |  8   |  32.45   |
|            timm_nfnet             | 128  | 23.7158  |
|           hf_GPT2_large           |  1   | 22.4235  |
|            densenet121            |  64  | 21.8843  |
|            hf_T5_base             |  1   | 21.4543  |
|            hf_BigBird             |  1   | 21.3205  |
|           hf_Longformer           |  1   | 20.3187  |
|              yolov3               |  8   | 18.3641  |
|            Super_SloMo            |  6   | 16.8824  |
|         timm_efficientnet         |  64  | 14.9878  |
|            timm_regnet            |  32  | 14.5997  |
|              hf_Bert              |  1   | 13.6024  |
|              hf_Bart              |  1   | 12.6615  |
|            hf_Reformer            |  1   | 12.1682  |
| attention_is_all_you_need_pytorch |  32  | 11.8647  |
|               hf_T5               |  1   | 11.8495  |
|           BERT_pytorch            |  2   | 11.4679  |
|              hf_GPT2              |  1   | 10.6672  |
|      timm_vision_transformer      |  8   | 10.5754  |
|        Background_Matting         |  1   | 10.4842  |
|             resnet50              |  32  | 10.3343  |
|             hf_Albert             |  1   | 10.2016  |
|            timm_vovnet            |  32  | 10.1441  |
|        mobilenet_v3_large         |  32  | 10.0769  |
|        shufflenet_v2_x1_0         |  64  |  9.7873  |
|          resnext50_32x4d          |  8   |  9.7049  |
|            mnasnet1_0             |  32  |  9.2318  |
|           mobilenet_v2            |  16  |  9.2068  |
|           hf_DistilBert           |  1   |  8.9652  |
|           timm_resnest            |  32  |  8.827   |
|           pytorch_unet            |  1   |  8.7658  |
|       functorch_dp_cifar10        |  64  |  8.4817  |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  8.1832  |
|          pytorch_stargan          |  16  |  8.176   |
|          LearningToPaint          |  96  |  7.7343  |
|           squeezenet1_1           |  16  |  7.5185  |
|             resnet18              |  8   |  7.5156  |
|      nvidia_deeprecommender       | 256  |  7.5014  |
|               vgg16               |  4   |  7.4072  |
|               dlrm                | 2048 |  7.3599  |
|              alexnet              | 128  |  7.1185  |
|            tts_angular            |  64  |  7.072   |
|                drq                |  1   |  6.7346  |
|           lennard_jones           | 1000 |  6.6819  |
|              demucs               |  1   |  1.4613  |
|               dcgan               | 256  |  0.2598  |
|    mobilenet_v2_quantized_qat     |  96  |  0.2521  |
|      resnet50_quantized_qat       |  32  |  0.2215  |
|           fastNLP_Bert            |  0   |   nan    |
|         soft_actor_critic         |  0   |   nan    |
|        speech_transformer         |  0   |   nan    |
|             tacotron2             |  0   |   nan    |
|         timm_efficientdet         |  0   |   nan    |
+-----------------------------------+------+----------+

Peak Memory Compression Ratio

+-----------------------------------+------+----------+
|               name                |  bs  | inductor |
+-----------------------------------+------+----------+
|              demucs               |  1   |  0.9988  |
|               dlrm                | 2048 |  0.9968  |
|      resnet50_quantized_qat       |  32  |  0.9959  |
|        Background_Matting         |  1   |  0.994   |
|   timm_vision_transformer_large   |  8   |  0.9935  |
|         timm_efficientnet         |  64  |  0.9925  |
|               vgg16               |  4   |  0.9924  |
|            timm_nfnet             | 128  |  0.9918  |
|          LearningToPaint          |  96  |  0.9915  |
|           pytorch_unet            |  1   |  0.9909  |
|            timm_regnet            |  32  |  0.9908  |
|            hf_BigBird             |  1   |  0.9908  |
|            Super_SloMo            |  6   |  0.9902  |
|            densenet121            |  64  |  0.9894  |
|             resnet50              |  32  |  0.9881  |
|            mnasnet1_0             |  32  |  0.9859  |
|                drq                |  1   |  0.9854  |
|            hf_T5_base             |  1   |  0.9838  |
|           lennard_jones           | 1000 |  0.9822  |
|        mobilenet_v3_large         |  32  |  0.9817  |
| attention_is_all_you_need_pytorch |  32  |  0.9815  |
|           mobilenet_v2            |  16  |  0.9805  |
|           hf_GPT2_large           |  1   |  0.9805  |
|        shufflenet_v2_x1_0         |  64  |  0.9804  |
|            timm_vovnet            |  32  |  0.9794  |
|            tts_angular            |  64  |  0.9786  |
|              alexnet              | 128  |  0.9784  |
|           hf_DistilBert           |  1   |  0.9774  |
|          resnext50_32x4d          |  8   |  0.9757  |
|    mobilenet_v2_quantized_qat     |  96  |  0.9735  |
|             hf_Albert             |  1   |  0.9723  |
|              hf_Bert              |  1   |  0.9689  |
|              hf_Bart              |  1   |  0.9658  |
|          pytorch_stargan          |  16  |  0.9632  |
|              hf_GPT2              |  1   |  0.9616  |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  0.949   |
|      timm_vision_transformer      |  8   |  0.9335  |
|             resnet18              |  8   |  0.9325  |
|               dcgan               | 256  |  0.9308  |
|           hf_Longformer           |  1   |  0.9176  |
|           BERT_pytorch            |  2   |  0.9168  |
|            hf_Reformer            |  1   |  0.9164  |
|       functorch_dp_cifar10        |  64  |  0.9003  |
|           timm_resnest            |  32  |  0.8899  |
|              yolov3               |  8   |  0.8676  |
|               hf_T5               |  1   |  0.7678  |
|           squeezenet1_1           |  16  |  0.7248  |
|            hf_T5_large            |  1   |  0.6029  |
|      nvidia_deeprecommender       | 256  |  0.5832  |
|           fastNLP_Bert            |  0   |   nan    |
|         soft_actor_critic         |  0   |   nan    |
|        speech_transformer         |  0   |   nan    |
|             tacotron2             |  0   |   nan    |
|         timm_efficientdet         |  0   |   nan    |
+-----------------------------------+------+----------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|       MT5ForConditionalGeneration       |  8  |   2.52   |
|            XLNetLMHeadModel             | 32  |  1.9735  |
|     MobileBertForQuestionAnswering      | 64  |  1.3506  |
|               DistillGPT2               |  1  |  1.3222  |
|               GoogleFnet                |  1  |  1.2726  |
|            YituTechConvBert             |  1  |  1.1963  |
|          MobileBertForMaskedLM          | 32  |  1.1388  |
|                 BigBird                 |  1  |  1.0872  |
|     M2M100ForConditionalGeneration      |  8  |  1.0839  |
|             XGLMForCausalLM             |  8  |  1.0554  |
|                 T5Small                 |  1  |  1.0447  |
|          AllenaiLongformerBase          |  1  |  1.0342  |
|            AlbertForMaskedLM            |  4  |  1.0302  |
|       AlbertForQuestionAnswering        |  4  |  1.0284  |
|             OPTForCausalLM              | 32  |  1.0192  |
|           DebertaForMaskedLM            |  4  |  0.9714  |
|                CamemBert                |  1  |  0.9494  |
|      GPT2ForSequenceClassification      |  4  |  0.9432  |
|         Speech2Text2ForCausalLM         | 128 |  0.9414  |
|       DebertaForQuestionAnswering       |  8  |  0.9174  |
|           RobertaForCausalLM            | 64  |  0.8998  |
|     PLBartForConditionalGeneration      | 16  |  0.8968  |
|           ElectraForCausalLM            | 32  |  0.8924  |
|         MegatronBertForCausalLM         | 16  |   0.89   |
|      MBartForConditionalGeneration      | 16  |  0.8895  |
|            PLBartForCausalLM            | 32  |  0.8875  |
|     PegasusForConditionalGeneration     | 16  |  0.8874  |
|    LayoutLMForSequenceClassification    | 16  |  0.8763  |
|            TrOCRForCausalLM             | 32  |  0.8712  |
|       RobertaForQuestionAnswering       | 128 |  0.8694  |
|    MegatronBertForQuestionAnswering     | 16  |  0.8664  |
|           PegasusForCausalLM            | 32  |  0.8664  |
|           LayoutLMForMaskedLM           | 16  |  0.8655  |
|            MBartForCausalLM             | 32  |  0.8629  |
|       T5ForConditionalGeneration        |  4  |  0.8604  |
|             BertForMaskedLM             | 64  |  0.8416  |
|     DistilBertForQuestionAnswering      | 64  |  0.8389  |
|        BertForQuestionAnswering         | 128 |  0.836   |
|          DistilBertForMaskedLM          | 64  |  0.8349  |
|             BartForCausalLM             |  4  |  0.8196  |
|       ElectraForQuestionAnswering       | 64  |  0.8134  |
|      BartForConditionalGeneration       |  2  |  0.8084  |
| BlenderbotSmallForConditionalGeneration | 64  |  0.8081  |
|       BlenderbotSmallForCausalLM        | 64  |  0.793   |
+-----------------------------------------+-----+----------+

Accuracy

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|            AlbertForMaskedLM            | 1  |   pass   |
|       AlbertForQuestionAnswering        | 1  |   pass   |
|                CamemBert                | 1  |   pass   |
|          AllenaiLongformerBase          | 1  |   pass   |
|             BartForCausalLM             | 1  |   pass   |
|      BartForConditionalGeneration       | 1  |   pass   |
|             BertForMaskedLM             | 1  |   pass   |
|        BertForQuestionAnswering         | 1  |   pass   |
|                 BigBird                 | 1  |   pass   |
|       BlenderbotSmallForCausalLM        | 1  |   pass   |
| BlenderbotSmallForConditionalGeneration | 1  |   pass   |
|           DebertaForMaskedLM            | 1  |   pass   |
|           LayoutLMForMaskedLM           | 1  |   pass   |
|       DebertaForQuestionAnswering       | 1  |   pass   |
|          DistilBertForMaskedLM          | 1  |   pass   |
|     DistilBertForQuestionAnswering      | 1  |   pass   |
|               DistillGPT2               | 1  |   pass   |
|           ElectraForCausalLM            | 1  |   pass   |
|       ElectraForQuestionAnswering       | 1  |   pass   |
|      GPT2ForSequenceClassification      | 1  |   pass   |
|               GoogleFnet                | 1  |   pass   |
|    LayoutLMForSequenceClassification    | 1  |   pass   |
|     M2M100ForConditionalGeneration      | 1  |   pass   |
|            MBartForCausalLM             | 1  |   pass   |
|     PLBartForConditionalGeneration      | 1  |   pass   |
|      MBartForConditionalGeneration      | 1  |   pass   |
|       MT5ForConditionalGeneration       | 1  |   pass   |
|         MegatronBertForCausalLM         | 1  |   pass   |
|    MegatronBertForQuestionAnswering     | 1  |   pass   |
|          MobileBertForMaskedLM          | 1  |   pass   |
|     MobileBertForQuestionAnswering      | 1  |   pass   |
|             OPTForCausalLM              | 1  |   pass   |
|            PLBartForCausalLM            | 1  |   pass   |
|           PegasusForCausalLM            | 1  |   pass   |
|            XLNetLMHeadModel             | 1  |   pass   |
|     PegasusForConditionalGeneration     | 1  |   pass   |
|           RobertaForCausalLM            | 1  |   pass   |
|       RobertaForQuestionAnswering       | 1  |   pass   |
|         Speech2Text2ForCausalLM         | 1  |   pass   |
|       T5ForConditionalGeneration        | 1  |   pass   |
|                 T5Small                 | 1  |   pass   |
|            TrOCRForCausalLM             | 1  |   pass   |
|             XGLMForCausalLM             | 1  |   pass   |
|            YituTechConvBert             | 1  |   pass   |
+-----------------------------------------+----+----------+

Compilation latency (sec)

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|     PegasusForConditionalGeneration     | 16  | 22.3478  |
|          MobileBertForMaskedLM          | 32  | 21.9429  |
|       DebertaForQuestionAnswering       |  8  |  21.77   |
|            YituTechConvBert             |  1  | 21.6894  |
|     MobileBertForQuestionAnswering      | 64  | 21.4578  |
|          AllenaiLongformerBase          |  1  | 21.2457  |
|           DebertaForMaskedLM            |  4  | 20.9841  |
|     M2M100ForConditionalGeneration      |  8  | 20.5287  |
|                 BigBird                 |  1  | 20.2965  |
|      BartForConditionalGeneration       |  2  | 19.7537  |
|      MBartForConditionalGeneration      | 16  | 19.0224  |
|             XGLMForCausalLM             |  8  | 18.5344  |
|         MegatronBertForCausalLM         | 16  | 17.2338  |
|    MegatronBertForQuestionAnswering     | 16  | 16.8028  |
|       RobertaForQuestionAnswering       | 128 | 16.3171  |
| BlenderbotSmallForConditionalGeneration | 64  | 15.7945  |
|           PegasusForCausalLM            | 32  |  14.928  |
|            AlbertForMaskedLM            |  4  | 14.4438  |
|       AlbertForQuestionAnswering        |  4  | 14.3569  |
|        BertForQuestionAnswering         | 128 | 14.2543  |
|       MT5ForConditionalGeneration       |  8  | 14.1973  |
|           LayoutLMForMaskedLM           | 16  | 13.6621  |
|       T5ForConditionalGeneration        |  4  | 13.6308  |
|             BartForCausalLM             |  4  | 13.3232  |
|       ElectraForQuestionAnswering       | 64  | 13.2929  |
|           RobertaForCausalLM            | 64  | 13.2914  |
|     PLBartForConditionalGeneration      | 16  | 13.2436  |
|           ElectraForCausalLM            | 32  | 13.2227  |
|             BertForMaskedLM             | 64  |  13.182  |
|    LayoutLMForSequenceClassification    | 16  | 13.1042  |
|            MBartForCausalLM             | 32  | 12.5802  |
|      GPT2ForSequenceClassification      |  4  | 12.4793  |
|            TrOCRForCausalLM             | 32  | 12.4106  |
|             OPTForCausalLM              | 32  | 12.0205  |
|                 T5Small                 |  1  | 11.7247  |
|                CamemBert                |  1  | 11.1804  |
|               GoogleFnet                |  1  | 11.1298  |
|         Speech2Text2ForCausalLM         | 128 | 10.7654  |
|       BlenderbotSmallForCausalLM        | 64  | 10.7171  |
|          DistilBertForMaskedLM          | 64  | 10.1725  |
|     DistilBertForQuestionAnswering      | 64  |  9.6337  |
|            PLBartForCausalLM            | 32  |  9.5557  |
|               DistillGPT2               |  1  |  8.4453  |
|            XLNetLMHeadModel             | 32  |  4.8687  |
+-----------------------------------------+-----+----------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            AlbertForMaskedLM            |  4  |  0.9978  |
|       AlbertForQuestionAnswering        |  4  |  0.9976  |
|       ElectraForQuestionAnswering       | 64  |  0.9962  |
|             BartForCausalLM             |  4  |  0.9959  |
|           RobertaForCausalLM            | 64  |  0.9956  |
|           ElectraForCausalLM            | 32  |  0.9954  |
|       BlenderbotSmallForCausalLM        | 64  |  0.9952  |
|             BertForMaskedLM             | 64  |  0.9951  |
| BlenderbotSmallForConditionalGeneration | 64  |  0.9947  |
|          DistilBertForMaskedLM          | 64  |  0.9944  |
|           PegasusForCausalLM            | 32  |  0.9944  |
|            TrOCRForCausalLM             | 32  |  0.9943  |
|      GPT2ForSequenceClassification      |  4  |  0.9943  |
|       DebertaForQuestionAnswering       |  8  |  0.9937  |
|            MBartForCausalLM             | 32  |  0.9935  |
|       T5ForConditionalGeneration        |  4  |  0.9931  |
|             OPTForCausalLM              | 32  |  0.993   |
|         Speech2Text2ForCausalLM         | 128 |  0.9929  |
|            PLBartForCausalLM            | 32  |  0.9924  |
|     DistilBertForQuestionAnswering      | 64  |  0.9916  |
|     PegasusForConditionalGeneration     | 16  |  0.9915  |
|                 BigBird                 |  1  |  0.9914  |
|        BertForQuestionAnswering         | 128 |  0.9913  |
|      BartForConditionalGeneration       |  2  |  0.9913  |
|       RobertaForQuestionAnswering       | 128 |  0.9912  |
|           DebertaForMaskedLM            |  4  |  0.9893  |
|           LayoutLMForMaskedLM           | 16  |  0.9876  |
|      MBartForConditionalGeneration      | 16  |  0.9874  |
|     PLBartForConditionalGeneration      | 16  |  0.987   |
|               GoogleFnet                |  1  |  0.987   |
|    LayoutLMForSequenceClassification    | 16  |  0.9853  |
|               DistillGPT2               |  1  |  0.9838  |
|            XLNetLMHeadModel             | 32  |  0.9773  |
|          MobileBertForMaskedLM          | 32  |  0.9771  |
|         MegatronBertForCausalLM         | 16  |  0.9645  |
|    MegatronBertForQuestionAnswering     | 16  |  0.9602  |
|     M2M100ForConditionalGeneration      |  8  |  0.9538  |
|             XGLMForCausalLM             |  8  |  0.9248  |
|          AllenaiLongformerBase          |  1  |  0.8733  |
|     MobileBertForQuestionAnswering      | 64  |  0.848   |
|                CamemBert                |  1  |  0.8381  |
|            YituTechConvBert             |  1  |  0.8042  |
|       MT5ForConditionalGeneration       |  8  |  0.7392  |
|                 T5Small                 |  1  |  0.7249  |
+-----------------------------------------+-----+----------+

timm_models suite with float32 precision

Performance speedup

+---------------------------------+-----+----------+
|              name               | bs  | inductor |
+---------------------------------+-----+----------+
|          pnasnet5large          | 16  |  1.5267  |
|        gluon_xception65         | 32  |  1.4004  |
|           resnest101e           | 64  |  1.3192  |
|          inception_v3           | 128 |  1.3004  |
|        adv_inception_v3         | 128 |  1.2898  |
|       gluon_inception_v3        | 128 |  1.2825  |
|           fbnetc_100            | 128 |  1.2105  |
|           res2next50            | 128 |  1.2083  |
|        res2net50_14w_8s         | 128 |  1.2004  |
|        res2net101_26w_4s        | 64  |  1.1989  |
|           volo_d1_224           | 64  |   1.19   |
|          spnasnet_100           | 128 |  1.1888  |
|           mnasnet_100           | 128 |  1.1874  |
|      mobilenetv3_large_100      | 128 |  1.1814  |
|        ese_vovnet19b_dw         | 128 |  1.1771  |
|            hrnet_w18            | 128 |  1.1738  |
|          ghostnet_100           | 128 |  1.1673  |
|            lcnet_050            | 128 |  1.1595  |
|             dpn107              | 32  |  1.1524  |
|            gernet_l             | 128 |  1.1266  |
|            repvgg_a2            | 128 |  1.1169  |
|         visformer_small         | 128 |  1.1012  |
|            fbnetv3_b            | 128 |  1.1005  |
|           selecsls42b           | 128 |  1.0886  |
|         mobilenetv2_100         | 128 |  1.0687  |
|           regnety_002           | 128 |  1.0555  |
|           tf_mixnet_l           | 128 |  1.0453  |
|          cspdarknet53           | 64  |  1.0445  |
|      xcit_large_24_p8_224       |  5  |  1.0317  |
|          gmixer_24_224          | 128 |   1.03   |
|            mixnet_l             | 128 |   1.0    |
|          cait_m36_384           |  4  |  0.9731  |
|             dla102              | 128 |  0.9443  |
|            nfnet_l0             | 128 |  0.9023  |
|  swin_base_patch4_window7_224   | 64  |  0.8897  |
|           dm_nfnet_f0           | 128 |  0.885   |
|      beit_base_patch16_224      | 64  |  0.8841  |
|           rexnet_100            | 128 |  0.882   |
|           convit_base           | 64  |  0.8643  |
|          convnext_base          | 64  |  0.8637  |
| deit_base_distilled_patch16_224 | 64  |  0.8615  |
|       tf_efficientnet_b0        | 128 |  0.8598  |
|      vit_base_patch16_224       | 64  |  0.8571  |
|         poolformer_m36          | 64  |  0.8554  |
|          jx_nest_base           | 32  |  0.8342  |
|          resmlp_12_224          | 128 |  0.8261  |
|            tinynet_a            | 128 |  0.8205  |
|           mobilevit_s           | 64  |  0.8145  |
|        tnt_s_patch16_224        | 128 |  0.8144  |
|            pit_b_224            | 64  |  0.7913  |
|          mixer_b16_224          | 128 |  0.7906  |
|         coat_lite_mini          | 128 |  0.7535  |
|        twins_pcpvt_base         | 64  |  0.7371  |
|         crossvit_9_240          | 128 |  0.7225  |
|          gmlp_s16_224           | 128 |  0.5764  |
|     swsl_resnext101_32x16d      | 32  |  0.0717  |
|        convmixer_768_32         |  0  |   0.0    |
|        sebotnet33ts_256         |  0  |   0.0    |
|        eca_halonext26ts         |  0  |   0.0    |
|          botnet26t_256          |  0  |   0.0    |
|       eca_botnext26ts_256       |  0  |   0.0    |
+---------------------------------+-----+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 2  |     pass      |
|             dpn107              | 2  |     pass      |
|      beit_base_patch16_224      | 2  |     pass      |
|          mixer_b16_224          | 2  |     pass      |
|        ese_vovnet19b_dw         | 2  |     pass      |
|         coat_lite_mini          | 2  |     pass      |
|           convit_base           | 2  |     pass      |
|          convnext_base          | 2  |     pass      |
|         crossvit_9_240          | 2  |     pass      |
|          cspdarknet53           | 2  |     pass      |
| deit_base_distilled_patch16_224 | 2  |     pass      |
|             dla102              | 2  |     pass      |
|           dm_nfnet_f0           | 2  |     pass      |
|            lcnet_050            | 2  |     pass      |
|           volo_d1_224           | 2  |     pass      |
|      xcit_large_24_p8_224       | 2  |     pass      |
|           fbnetc_100            | 2  |     pass      |
|        gluon_xception65         | 2  |     pass      |
|          jx_nest_base           | 2  |     pass      |
|          inception_v3           | 2  |     pass      |
|            hrnet_w18            | 2  |     pass      |
|          gmlp_s16_224           | 2  |     pass      |
|          gmixer_24_224          | 2  |     pass      |
|       gluon_inception_v3        | 2  |     pass      |
|            gernet_l             | 2  |     pass      |
|            fbnetv3_b            | 2  |     pass      |
|           mnasnet_100           | 2  |     pass      |
|            mixnet_l             | 2  |     pass      |
|      vit_base_patch16_224       | 2  |     pass      |
|           res2next50            | 2  |     pass      |
|         mobilenetv2_100         | 2  |     pass      |
|      mobilenetv3_large_100      | 2  |     pass      |
|           mobilevit_s           | 2  |     pass      |
|            nfnet_l0             | 2  |     pass      |
|            pit_b_224            | 2  |     pass      |
|          pnasnet5large          | 2  |     pass      |
|         poolformer_m36          | 2  |     pass      |
|           regnety_002           | 2  |     pass      |
|            repvgg_a2            | 2  |     pass      |
|         visformer_small         | 2  |     pass      |
|        res2net50_14w_8s         | 2  |     pass      |
|        res2net101_26w_4s        | 2  |     pass      |
|          resmlp_12_224          | 2  |     pass      |
|       tf_efficientnet_b0        | 2  |     pass      |
|        twins_pcpvt_base         | 2  |     pass      |
|        tnt_s_patch16_224        | 2  |     pass      |
|           resnest101e           | 2  |     pass      |
|           tf_mixnet_l           | 2  |     pass      |
|            tinynet_a            | 2  |     pass      |
|     swsl_resnext101_32x16d      | 2  |     pass      |
|  swin_base_patch4_window7_224   | 2  |     pass      |
|          spnasnet_100           | 2  |     pass      |
|           selecsls42b           | 2  |     pass      |
|           rexnet_100            | 2  |     pass      |
|        eca_halonext26ts         | 2  |  fail_to_run  |
|        convmixer_768_32         | 2  |  fail_to_run  |
|        sebotnet33ts_256         | 2  |  fail_to_run  |
|          botnet26t_256          | 2  |  fail_to_run  |
|       eca_botnext26ts_256       | 2  |  fail_to_run  |
|          ghostnet_100           | 2  | fail_accuracy |
|          cait_m36_384           | 2  | fail_accuracy |
+---------------------------------+----+---------------+

Compilation latency (sec)

+---------------------------------+-----+----------+
|              name               | bs  | inductor |
+---------------------------------+-----+----------+
|     swsl_resnext101_32x16d      | 32  | 116.6718 |
|          pnasnet5large          | 16  |  32.273  |
|           dm_nfnet_f0           | 128 | 30.8393  |
|            hrnet_w18            | 128 | 28.1686  |
|        tnt_s_patch16_224        | 128 | 27.0522  |
|  swin_base_patch4_window7_224   | 64  | 26.1303  |
|          cait_m36_384           |  4  | 24.0501  |
|           tf_mixnet_l           | 128 |  23.045  |
|        twins_pcpvt_base         | 64  | 22.4356  |
|           mobilevit_s           | 64  | 21.9736  |
|         poolformer_m36          | 64  | 21.9105  |
|      xcit_large_24_p8_224       |  5  | 21.1167  |
|          inception_v3           | 128 | 21.0264  |
|            mixnet_l             | 128 | 20.9835  |
|        res2net50_14w_8s         | 128 | 20.5074  |
|           rexnet_100            | 128 | 19.8709  |
|             dla102              | 128 | 19.7566  |
|            fbnetv3_b            | 128 | 19.6552  |
|           resnest101e           | 64  | 19.5838  |
|            nfnet_l0             | 128 | 19.4525  |
|        res2net101_26w_4s        | 64  | 19.4293  |
|             dpn107              | 32  | 19.1965  |
|          jx_nest_base           | 32  |  18.167  |
|          gmlp_s16_224           | 128 | 16.9396  |
|          convnext_base          | 64  | 16.9235  |
|           volo_d1_224           | 64  | 16.5552  |
|         crossvit_9_240          | 128 | 16.5376  |
|       tf_efficientnet_b0        | 128 | 16.2456  |
|           convit_base           | 64  |  16.061  |
|           res2next50            | 128 | 15.5874  |
|         coat_lite_mini          | 128 | 15.5764  |
|            pit_b_224            | 64  | 15.2213  |
|        adv_inception_v3         | 128 | 14.4561  |
|       gluon_inception_v3        | 128 | 14.2199  |
|        gluon_xception65         | 32  | 14.1294  |
|          mixer_b16_224          | 128 | 14.0365  |
|          gmixer_24_224          | 128 | 13.8209  |
|            tinynet_a            | 128 |  13.23   |
|      beit_base_patch16_224      | 64  | 13.1765  |
|         visformer_small         | 128 | 13.0997  |
|      mobilenetv3_large_100      | 128 | 12.8485  |
| deit_base_distilled_patch16_224 | 64  |  12.723  |
|          cspdarknet53           | 64  | 12.7215  |
|      vit_base_patch16_224       | 64  |  12.632  |
|          ghostnet_100           | 128 | 11.9344  |
|            gernet_l             | 128 | 11.7819  |
|           fbnetc_100            | 128 |  11.327  |
|          spnasnet_100           | 128 | 11.2226  |
|           regnety_002           | 128 |  11.123  |
|            repvgg_a2            | 128 | 10.9643  |
|         mobilenetv2_100         | 128 | 10.8973  |
|        ese_vovnet19b_dw         | 128 | 10.5822  |
|           mnasnet_100           | 128 | 10.2721  |
|          resmlp_12_224          | 128 | 10.0726  |
|           selecsls42b           | 128 |  9.447   |
|            lcnet_050            | 128 |  9.1307  |
|          botnet26t_256          |  0  |   nan    |
|        convmixer_768_32         |  0  |   nan    |
|       eca_botnext26ts_256       |  0  |   nan    |
|        eca_halonext26ts         |  0  |   nan    |
|        sebotnet33ts_256         |  0  |   nan    |
+---------------------------------+-----+----------+

Peak Memory Compression Ratio

+---------------------------------+-----+----------+
|              name               | bs  | inductor |
+---------------------------------+-----+----------+
|        ese_vovnet19b_dw         | 128 |  0.9975  |
|          mixer_b16_224          | 128 |  0.9972  |
|        adv_inception_v3         | 128 |  0.9971  |
|           dm_nfnet_f0           | 128 |  0.9968  |
|           convit_base           | 64  |  0.9967  |
| deit_base_distilled_patch16_224 | 64  |  0.9963  |
|      vit_base_patch16_224       | 64  |  0.9963  |
|             dla102              | 128 |  0.9963  |
|          gmixer_24_224          | 128 |  0.9962  |
|          resmlp_12_224          | 128 |  0.9962  |
|       gluon_inception_v3        | 128 |  0.9962  |
|        gluon_xception65         | 32  |  0.9962  |
|          inception_v3           | 128 |  0.9961  |
|            fbnetv3_b            | 128 |  0.996   |
|        res2net50_14w_8s         | 128 |  0.996   |
|      beit_base_patch16_224      | 64  |  0.9959  |
|           mobilevit_s           | 64  |  0.9957  |
|          convnext_base          | 64  |  0.9957  |
|            gernet_l             | 128 |  0.9957  |
|           rexnet_100            | 128 |  0.9956  |
|         visformer_small         | 128 |  0.9954  |
|            pit_b_224            | 64  |  0.9953  |
|         coat_lite_mini          | 128 |  0.9952  |
|            hrnet_w18            | 128 |  0.9949  |
|        tnt_s_patch16_224        | 128 |  0.9949  |
|           res2next50            | 128 |  0.9947  |
|            nfnet_l0             | 128 |  0.9946  |
|            mixnet_l             | 128 |  0.9946  |
|           tf_mixnet_l           | 128 |  0.9943  |
|            repvgg_a2            | 128 |  0.994   |
|        res2net101_26w_4s        | 64  |  0.994   |
|       tf_efficientnet_b0        | 128 |  0.9938  |
|          cait_m36_384           |  4  |  0.9936  |
|           resnest101e           | 64  |  0.9936  |
|             dpn107              | 32  |  0.9935  |
|  swin_base_patch4_window7_224   | 64  |  0.9934  |
|         poolformer_m36          | 64  |  0.9933  |
|         crossvit_9_240          | 128 |  0.9919  |
|          pnasnet5large          | 16  |  0.9919  |
|          jx_nest_base           | 32  |  0.9918  |
|      mobilenetv3_large_100      | 128 |  0.9914  |
|        twins_pcpvt_base         | 64  |  0.9914  |
|           volo_d1_224           | 64  |  0.9907  |
|      xcit_large_24_p8_224       |  5  |  0.9902  |
|     swsl_resnext101_32x16d      | 32  |  0.989   |
|            lcnet_050            | 128 |  0.9886  |
|           regnety_002           | 128 |  0.9871  |
|           fbnetc_100            | 128 |  0.9865  |
|           selecsls42b           | 128 |  0.9864  |
|          cspdarknet53           | 64  |  0.9863  |
|           mnasnet_100           | 128 |  0.9839  |
|          spnasnet_100           | 128 |  0.9833  |
|          ghostnet_100           | 128 |  0.9796  |
|            tinynet_a            | 128 |  0.9731  |
|          gmlp_s16_224           | 128 |  0.9425  |
|         mobilenetv2_100         | 128 |  0.8072  |
|          botnet26t_256          |  0  |   nan    |
|        convmixer_768_32         |  0  |   nan    |
|       eca_botnext26ts_256       |  0  |   nan    |
|        eca_halonext26ts         |  0  |   nan    |
|        sebotnet33ts_256         |  0  |   nan    |
+---------------------------------+-----+----------+

@ESI-SYD
ESI-SYD commented Nov 8, 2022

Performance Dashboard for float32 precision -- Single-core Single-thread

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward pass. For accuracy, we check the numerical correctness of the forward-pass outputs by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
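
As a rough illustration of the measurement described above, the sketch below times a single call of a baseline and a compiled callable, reports speedup as baseline time over compiled time, and drops models whose accuracy status is not a pass. The names `time_once`, `speedup` and `keep_passing` are hypothetical, and treating `pass_due_to_skip` as a pass is an assumption; this is not the dashboard's actual harness.

```python
import time
from typing import Callable, Dict


def time_once(fn: Callable[[], object]) -> float:
    """Wall-clock seconds for a single call (one forward-pass iteration)."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def speedup(eager_seconds: float, compiled_seconds: float) -> float:
    """Speedup normalized against the eager baseline; > 1 means compiled is faster."""
    return eager_seconds / compiled_seconds


def keep_passing(speedups: Dict[str, float], accuracy: Dict[str, str]) -> Dict[str, float]:
    """Keep only models whose accuracy status counts as a pass.

    Assumption: "pass_due_to_skip" is treated like "pass".
    """
    ok = {"pass", "pass_due_to_skip"}
    return {name: s for name, s in speedups.items() if accuracy.get(name) in ok}
```

With this shape, a model that failed to run never contributes a speedup figure to the summary statistics.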

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 91%, 49/54 | 100%, 44/44 | 89%, 54/61  |
+----------+------------+-------------+-------------+
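
The pass-rate cells can be reproduced from a suite's accuracy column with a few lines. This is a minimal sketch: `pass_rate` is a hypothetical helper, and counting `pass_due_to_skip` as a pass is an assumption that happens to match the torchbench figure (46 `pass` + 3 `pass_due_to_skip` out of 54 rows).

```python
from typing import Iterable


def pass_rate(statuses: Iterable[str]) -> str:
    """Format a pass rate as "<percent>%, <passed>/<total>"."""
    statuses = list(statuses)
    if not statuses:
        raise ValueError("no models to summarize")
    # Assumption: skipped accuracy checks are counted as passes.
    ok = {"pass", "pass_due_to_skip"}
    passed = sum(1 for s in statuses if s in ok)
    total = len(statuses)
    return f"{round(100 * passed / total)}%, {passed}/{total}"


# torchbench accuracy column: 46 pass, 3 pass_due_to_skip, 5 failed entries
torchbench = ["pass"] * 46 + ["pass_due_to_skip"] * 3 + ["0.0000"] * 5
print(pass_rate(torchbench))  # 91%, 49/54
```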

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |    1.04x   |    1.00x    |    1.07x    |
+----------+------------+-------------+-------------+
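
For readers who want to recompute a geometric mean from the per-model tables, a minimal sketch follows. The table does not say how per-model speedups are aggregated beyond "geometric mean", so the `floor` argument is a hypothetical option for clamping slowdowns and is not a documented dashboard setting. Models that failed to run (speedup 0.0) must be excluded first, because the logarithm is undefined at zero.

```python
import math
from typing import Iterable, Optional


def geomean(values: Iterable[float], floor: Optional[float] = None) -> float:
    """Geometric mean of positive values.

    floor: hypothetical option; if given, each value is clamped to at least
    this number before averaging.
    """
    vals = list(values)
    if floor is not None:
        vals = [max(v, floor) for v in vals]
    if not vals:
        raise ValueError("geomean of an empty sequence")
    if any(v <= 0 for v in vals):
        raise ValueError("geomean requires strictly positive values")
    # exp(mean(log(v))) avoids overflow from multiplying many values.
    return math.exp(sum(math.log(v) for v in vals) / len(vals))
```

The geometric mean is the usual choice for ratios such as speedups because a 2x speedup and a 0.5x slowdown average to 1.0 rather than 1.25.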

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |    17.21   |    19.50    |    22.93    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |    1.23x   |    1.30x    |    1.05x    |
+----------+------------+-------------+-------------+
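
One way to read "compression ratio (higher is better)" is as the baseline's peak memory divided by the compiled run's peak memory. The sketch below encodes that reading; the definition and the name `compression_ratio` are assumptions, since the comment does not spell out the formula.

```python
def compression_ratio(eager_peak_bytes: float, compiled_peak_bytes: float) -> float:
    """Peak-memory compression ratio under the assumed definition.

    Values above 1 mean the compiled run used less peak memory than eager.
    """
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled peak memory must be positive")
    return eager_peak_bytes / compiled_peak_bytes
```

Under this reading, a ratio of 1.23x means the eager run peaked at 1.23 times the memory of the compiled run.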

torchbench suite with float32 precision

Performance speedup

+-----------------------------------+----+----------+
|               name                | bs | inductor |
+-----------------------------------+----+----------+
|        shufflenet_v2_x1_0         | 1  |  1.4924  |
|           squeezenet1_1           | 1  |  1.2456  |
|   pytorch_CycleGAN_and_pix2pix    | 1  |  1.1829  |
|             resnet18              | 1  |  1.1615  |
|       functorch_dp_cifar10        | 1  |  1.1532  |
|           pytorch_unet            | 1  |  1.1414  |
| attention_is_all_you_need_pytorch | 1  |  1.1106  |
|            timm_vovnet            | 1  |  1.1089  |
|            Super_SloMo            | 1  |  1.0927  |
|               vgg16               | 1  |  1.0908  |
|              alexnet              | 1  |  1.0802  |
|                drq                | 1  |  1.0768  |
|            timm_regnet            | 1  |  1.0688  |
|        Background_Matting         | 1  |  1.0576  |
|          LearningToPaint          | 1  |  1.0514  |
|               dlrm                | 1  |  1.0468  |
|            densenet121            | 1  |  1.0435  |
|           timm_resnest            | 1  |  1.042   |
|               dcgan               | 1  |  1.0316  |
|          pytorch_stargan          | 16 |  1.0133  |
|              demucs               | 1  |  0.9988  |
|            tts_angular            | 1  |  0.9982  |
|      resnet50_quantized_qat       | 1  |  0.9973  |
|            hf_BigBird             | 1  |  0.9936  |
|    mobilenet_v2_quantized_qat     | 1  |  0.9924  |
|            timm_nfnet             | 1  |  0.9911  |
|      nvidia_deeprecommender       | 1  |  0.9847  |
|           mobilenet_v2            | 1  |  0.9292  |
|            mnasnet1_0             | 1  |  0.9092  |
|              yolov3               | 1  |  0.8915  |
|              hf_GPT2              | 1  |  0.8666  |
|             resnet50              | 1  |  0.8611  |
|           hf_Longformer           | 1  |  0.8127  |
|           BERT_pytorch            | 1  |  0.8125  |
|            hf_T5_large            | 1  |  0.8077  |
|             hf_Albert             | 1  |  0.8057  |
|   timm_vision_transformer_large   | 1  |  0.8052  |
|            hf_Reformer            | 1  |  0.7879  |
|        mobilenet_v3_large         | 1  |  0.7861  |
|           lennard_jones           | 1  |  0.7822  |
|          resnext50_32x4d          | 1  |  0.7543  |
|           hf_GPT2_large           | 1  |  0.7516  |
|               hf_T5               | 1  |  0.7441  |
|           hf_DistilBert           | 1  |  0.7326  |
|              hf_Bart              | 1  |  0.7097  |
|            hf_T5_base             | 1  |  0.6893  |
|              hf_Bert              | 1  |  0.6852  |
|      timm_vision_transformer      | 1  |  0.5967  |
|         timm_efficientnet         | 1  |  0.5114  |
|         soft_actor_critic         | 0  |   0.0    |
|        speech_transformer         | 0  |   0.0    |
|         timm_efficientdet         | 0  |   0.0    |
|           fastNLP_Bert            | 0  |   0.0    |
|             tacotron2             | 0  |   0.0    |
+-----------------------------------+----+----------+

Accuracy

+-----------------------------------+----+------------------+
|               name                | bs |     inductor     |
+-----------------------------------+----+------------------+
|            hf_T5_large            | 1  | pass_due_to_skip |
|           hf_GPT2_large           | 1  | pass_due_to_skip |
|   timm_vision_transformer_large   | 1  | pass_due_to_skip |
|               hf_T5               | 1  |       pass       |
|            hf_Reformer            | 1  |       pass       |
|            hf_T5_base             | 1  |       pass       |
|          LearningToPaint          | 1  |       pass       |
|            Super_SloMo            | 1  |       pass       |
|              alexnet              | 1  |       pass       |
| attention_is_all_you_need_pytorch | 1  |       pass       |
|               dcgan               | 1  |       pass       |
|              demucs               | 1  |       pass       |
|            densenet121            | 1  |       pass       |
|               dlrm                | 1  |       pass       |
|                drq                | 1  |       pass       |
|              yolov3               | 1  |       pass       |
|           mobilenet_v2            | 1  |       pass       |
|             hf_Albert             | 1  |       pass       |
|              hf_Bart              | 1  |       pass       |
|              hf_Bert              | 1  |       pass       |
|            hf_BigBird             | 1  |       pass       |
|           hf_DistilBert           | 1  |       pass       |
|              hf_GPT2              | 1  |       pass       |
|           hf_Longformer           | 1  |       pass       |
|       functorch_dp_cifar10        | 1  |       pass       |
|           lennard_jones           | 1  |       pass       |
|        Background_Matting         | 1  |       pass       |
|          resnext50_32x4d          | 1  |       pass       |
|           BERT_pytorch            | 1  |       pass       |
|      resnet50_quantized_qat       | 1  |       pass       |
|    mobilenet_v2_quantized_qat     | 1  |       pass       |
|        mobilenet_v3_large         | 1  |       pass       |
|      nvidia_deeprecommender       | 1  |       pass       |
|   pytorch_CycleGAN_and_pix2pix    | 1  |       pass       |
|          pytorch_stargan          | 16 |       pass       |
|           pytorch_unet            | 1  |       pass       |
|               vgg16               | 1  |       pass       |
|             resnet50              | 1  |       pass       |
|             resnet18              | 1  |       pass       |
|            mnasnet1_0             | 1  |       pass       |
|        shufflenet_v2_x1_0         | 1  |       pass       |
|           squeezenet1_1           | 1  |       pass       |
|         timm_efficientnet         | 1  |       pass       |
|            timm_nfnet             | 1  |       pass       |
|            timm_regnet            | 1  |       pass       |
|           timm_resnest            | 1  |       pass       |
|      timm_vision_transformer      | 1  |       pass       |
|            timm_vovnet            | 1  |       pass       |
|            tts_angular            | 1  |       pass       |
|         soft_actor_critic         | 0  |      0.0000      |
|           fastNLP_Bert            | 0  |      0.0000      |
|             tacotron2             | 0  |      0.0000      |
|         timm_efficientdet         | 0  |      0.0000      |
|        speech_transformer         | 0  |      0.0000      |
+-----------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------+----+----------+
|               name                | bs | inductor |
+-----------------------------------+----+----------+
|            hf_T5_base             | 1  | 76.7783  |
|            hf_T5_large            | 1  | 60.5122  |
|           hf_GPT2_large           | 1  | 59.4694  |
|            densenet121            | 1  | 30.8545  |
|   timm_vision_transformer_large   | 1  | 28.9246  |
|            timm_nfnet             | 1  | 25.4024  |
|              yolov3               | 1  | 24.6705  |
|           hf_Longformer           | 1  | 24.4957  |
|            Super_SloMo            | 1  | 24.3262  |
|         timm_efficientnet         | 1  | 23.1212  |
|        Background_Matting         | 1  | 22.8918  |
|            hf_BigBird             | 1  | 22.3391  |
|           pytorch_unet            | 1  | 18.9934  |
|            timm_regnet            | 1  | 18.4639  |
|            hf_Reformer            | 1  | 18.0788  |
|        mobilenet_v3_large         | 1  | 17.9186  |
|              hf_Bart              | 1  | 17.7819  |
|               hf_T5               | 1  | 17.7063  |
|            timm_vovnet            | 1  | 15.5172  |
|          resnext50_32x4d          | 1  | 14.7342  |
|             resnet50              | 1  | 14.6658  |
|              hf_Bert              | 1  | 14.5967  |
|           mobilenet_v2            | 1  |  14.549  |
|              hf_GPT2              | 1  | 13.9911  |
|           BERT_pytorch            | 1  | 13.8472  |
|           hf_DistilBert           | 1  | 13.8324  |
|           timm_resnest            | 1  |  13.25   |
|            mnasnet1_0             | 1  | 13.1723  |
|        shufflenet_v2_x1_0         | 1  | 12.8739  |
|             hf_Albert             | 1  | 12.6541  |
|      timm_vision_transformer      | 1  | 12.2517  |
| attention_is_all_you_need_pytorch | 1  | 11.9372  |
|       functorch_dp_cifar10        | 1  | 11.8487  |
|          pytorch_stargan          | 16 | 11.5192  |
|          LearningToPaint          | 1  | 10.9144  |
|   pytorch_CycleGAN_and_pix2pix    | 1  | 10.1494  |
|           squeezenet1_1           | 1  |  9.7543  |
|               vgg16               | 1  |  8.5459  |
|             resnet18              | 1  |  8.449   |
|      nvidia_deeprecommender       | 1  |  8.3243  |
|                drq                | 1  |  8.1052  |
|               dlrm                | 1  |  7.9679  |
|              alexnet              | 1  |  7.3684  |
|            tts_angular            | 1  |  6.9832  |
|           lennard_jones           | 1  |  6.5252  |
|              demucs               | 1  |  1.4428  |
|               dcgan               | 1  |  0.2565  |
|    mobilenet_v2_quantized_qat     | 1  |  0.174   |
|      resnet50_quantized_qat       | 1  |  0.1546  |
|           fastNLP_Bert            | 0  |   nan    |
|         soft_actor_critic         | 0  |   nan    |
|        speech_transformer         | 0  |   nan    |
|             tacotron2             | 0  |   nan    |
|         timm_efficientdet         | 0  |   nan    |
+-----------------------------------+----+----------+

Peak Memory Compression Ratio

+-----------------------------------+----+----------+
|               name                | bs | inductor |
+-----------------------------------+----+----------+
|            hf_T5_base             | 1  |  4.3494  |
|           pytorch_unet            | 1  |  2.1317  |
|             hf_Albert             | 1  |  2.1001  |
|           hf_GPT2_large           | 1  |  1.9307  |
|            hf_BigBird             | 1  |  1.7971  |
|            Super_SloMo            | 1  |  1.7855  |
|            hf_T5_large            | 1  |  1.5672  |
|   pytorch_CycleGAN_and_pix2pix    | 1  |  1.4777  |
|               hf_T5               | 1  |  1.3382  |
|              hf_Bart              | 1  |  1.3275  |
|              hf_Bert              | 1  |  1.2964  |
|           hf_Longformer           | 1  |  1.2845  |
|               dlrm                | 1  |  1.2155  |
|          LearningToPaint          | 1  |  1.1866  |
|           hf_DistilBert           | 1  |  1.1789  |
|            timm_nfnet             | 1  |  1.1521  |
|            timm_regnet            | 1  |  1.1463  |
|              hf_GPT2              | 1  |  1.1158  |
|   timm_vision_transformer_large   | 1  |  1.1142  |
|                drq                | 1  |  1.0669  |
|            timm_vovnet            | 1  |  1.0587  |
|           mobilenet_v2            | 1  |  1.0514  |
|         timm_efficientnet         | 1  |  1.0509  |
|      timm_vision_transformer      | 1  |  1.0436  |
|              yolov3               | 1  |  1.0023  |
|              demucs               | 1  |  0.9985  |
|      nvidia_deeprecommender       | 1  |  0.9962  |
|      resnet50_quantized_qat       | 1  |  0.9955  |
|        Background_Matting         | 1  |  0.9953  |
|    mobilenet_v2_quantized_qat     | 1  |  0.9928  |
|            tts_angular            | 1  |  0.9921  |
|          pytorch_stargan          | 16 |  0.9883  |
|            mnasnet1_0             | 1  |  0.9844  |
|            densenet121            | 1  |  0.9831  |
| attention_is_all_you_need_pytorch | 1  |  0.983   |
|           lennard_jones           | 1  |  0.9818  |
|              alexnet              | 1  |  0.9803  |
|        mobilenet_v3_large         | 1  |  0.9771  |
|               vgg16               | 1  |  0.9758  |
|          resnext50_32x4d          | 1  |  0.975   |
|           squeezenet1_1           | 1  |  0.966   |
|               dcgan               | 1  |  0.9639  |
|        shufflenet_v2_x1_0         | 1  |  0.9582  |
|           timm_resnest            | 1  |  0.9582  |
|       functorch_dp_cifar10        | 1  |  0.9542  |
|             resnet50              | 1  |  0.9529  |
|            hf_Reformer            | 1  |  0.9437  |
|           BERT_pytorch            | 1  |  0.9242  |
|             resnet18              | 1  |  0.917   |
|           fastNLP_Bert            | 0  |   nan    |
|         soft_actor_critic         | 0  |   nan    |
|        speech_transformer         | 0  |   nan    |
|             tacotron2             | 0  |   nan    |
|         timm_efficientdet         | 0  |   nan    |
+-----------------------------------+----+----------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|     MobileBertForQuestionAnswering      | 1  |  1.0761  |
|            XLNetLMHeadModel             | 1  |  1.049   |
|       AlbertForQuestionAnswering        | 1  |  0.9522  |
|            AlbertForMaskedLM            | 1  |  0.9483  |
|          MobileBertForMaskedLM          | 1  |  0.9464  |
|                 BigBird                 | 1  |  0.9053  |
|     M2M100ForConditionalGeneration      | 1  |  0.9032  |
|             OPTForCausalLM              | 1  |  0.8764  |
|               GoogleFnet                | 1  |  0.8565  |
|            YituTechConvBert             | 1  |  0.8172  |
|         Speech2Text2ForCausalLM         | 1  |  0.8158  |
|    MegatronBertForQuestionAnswering     | 1  |  0.8047  |
|     PegasusForConditionalGeneration     | 1  |  0.8046  |
|       DebertaForQuestionAnswering       | 1  |  0.8035  |
|      MBartForConditionalGeneration      | 1  |  0.8029  |
|         MegatronBertForCausalLM         | 1  |  0.7995  |
|       RobertaForQuestionAnswering       | 1  |  0.7913  |
|          AllenaiLongformerBase          | 1  |  0.791   |
|            TrOCRForCausalLM             | 1  |  0.7903  |
|           PegasusForCausalLM            | 1  |  0.7878  |
|             XGLMForCausalLM             | 1  |  0.7832  |
|            MBartForCausalLM             | 1  |  0.7824  |
|           RobertaForCausalLM            | 1  |  0.7728  |
|     PLBartForConditionalGeneration      | 1  |  0.7699  |
|     DistilBertForQuestionAnswering      | 1  |  0.7654  |
|           DebertaForMaskedLM            | 1  |  0.7648  |
|        BertForQuestionAnswering         | 1  |  0.756   |
|             BertForMaskedLM             | 1  |  0.7524  |
|            PLBartForCausalLM            | 1  |  0.7459  |
|          DistilBertForMaskedLM          | 1  |  0.7431  |
|               DistillGPT2               | 1  |  0.726   |
|       MT5ForConditionalGeneration       | 1  |  0.7145  |
|      GPT2ForSequenceClassification      | 1  |  0.6909  |
|           LayoutLMForMaskedLM           | 1  |  0.6807  |
| BlenderbotSmallForConditionalGeneration | 1  |   0.68   |
|    LayoutLMForSequenceClassification    | 1  |  0.6773  |
|             BartForCausalLM             | 1  |  0.6707  |
|                CamemBert                | 1  |   0.67   |
|       BlenderbotSmallForCausalLM        | 1  |  0.6692  |
|      BartForConditionalGeneration       | 1  |  0.6425  |
|       T5ForConditionalGeneration        | 1  |  0.6322  |
|                 T5Small                 | 1  |  0.6311  |
|       ElectraForQuestionAnswering       | 1  |  0.5028  |
|           ElectraForCausalLM            | 1  |  0.4886  |
+-----------------------------------------+----+----------+

Accuracy

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|            AlbertForMaskedLM            | 1  |   pass   |
|       AlbertForQuestionAnswering        | 1  |   pass   |
|                CamemBert                | 1  |   pass   |
|          AllenaiLongformerBase          | 1  |   pass   |
|             BartForCausalLM             | 1  |   pass   |
|      BartForConditionalGeneration       | 1  |   pass   |
|             BertForMaskedLM             | 1  |   pass   |
|        BertForQuestionAnswering         | 1  |   pass   |
|                 BigBird                 | 1  |   pass   |
|       BlenderbotSmallForCausalLM        | 1  |   pass   |
| BlenderbotSmallForConditionalGeneration | 1  |   pass   |
|           DebertaForMaskedLM            | 1  |   pass   |
|           LayoutLMForMaskedLM           | 1  |   pass   |
|       DebertaForQuestionAnswering       | 1  |   pass   |
|          DistilBertForMaskedLM          | 1  |   pass   |
|     DistilBertForQuestionAnswering      | 1  |   pass   |
|               DistillGPT2               | 1  |   pass   |
|           ElectraForCausalLM            | 1  |   pass   |
|       ElectraForQuestionAnswering       | 1  |   pass   |
|      GPT2ForSequenceClassification      | 1  |   pass   |
|               GoogleFnet                | 1  |   pass   |
|    LayoutLMForSequenceClassification    | 1  |   pass   |
|     M2M100ForConditionalGeneration      | 1  |   pass   |
|            MBartForCausalLM             | 1  |   pass   |
|     PLBartForConditionalGeneration      | 1  |   pass   |
|      MBartForConditionalGeneration      | 1  |   pass   |
|       MT5ForConditionalGeneration       | 1  |   pass   |
|         MegatronBertForCausalLM         | 1  |   pass   |
|    MegatronBertForQuestionAnswering     | 1  |   pass   |
|          MobileBertForMaskedLM          | 1  |   pass   |
|     MobileBertForQuestionAnswering      | 1  |   pass   |
|             OPTForCausalLM              | 1  |   pass   |
|            PLBartForCausalLM            | 1  |   pass   |
|           PegasusForCausalLM            | 1  |   pass   |
|            XLNetLMHeadModel             | 1  |   pass   |
|     PegasusForConditionalGeneration     | 1  |   pass   |
|           RobertaForCausalLM            | 1  |   pass   |
|       RobertaForQuestionAnswering       | 1  |   pass   |
|         Speech2Text2ForCausalLM         | 1  |   pass   |
|       T5ForConditionalGeneration        | 1  |   pass   |
|                 T5Small                 | 1  |   pass   |
|            TrOCRForCausalLM             | 1  |   pass   |
|             XGLMForCausalLM             | 1  |   pass   |
|            YituTechConvBert             | 1  |   pass   |
+-----------------------------------------+----+----------+

Compilation latency (sec)

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|      BartForConditionalGeneration       | 1  | 49.6995  |
|            AlbertForMaskedLM            | 1  | 32.9211  |
|          AllenaiLongformerBase          | 1  |  31.457  |
|       AlbertForQuestionAnswering        | 1  | 30.8554  |
|            YituTechConvBert             | 1  | 25.3315  |
|             BartForCausalLM             | 1  | 24.3902  |
|           PegasusForCausalLM            | 1  |  23.914  |
|          MobileBertForMaskedLM          | 1  | 23.6263  |
|            XLNetLMHeadModel             | 1  |  23.513  |
|     PegasusForConditionalGeneration     | 1  | 23.4144  |
|     M2M100ForConditionalGeneration      | 1  | 23.2889  |
|     MobileBertForQuestionAnswering      | 1  | 23.2762  |
|                 BigBird                 | 1  | 22.9858  |
|      MBartForConditionalGeneration      | 1  | 22.3898  |
|           DebertaForMaskedLM            | 1  |  22.357  |
|       T5ForConditionalGeneration        | 1  |  22.172  |
|       MT5ForConditionalGeneration       | 1  | 22.1254  |
|             XGLMForCausalLM             | 1  | 22.0928  |
|                 T5Small                 | 1  | 21.8409  |
|       DebertaForQuestionAnswering       | 1  | 21.3565  |
|         MegatronBertForCausalLM         | 1  | 20.1642  |
|             BertForMaskedLM             | 1  | 20.0772  |
|    MegatronBertForQuestionAnswering     | 1  | 19.6571  |
|      GPT2ForSequenceClassification      | 1  | 19.1087  |
|               GoogleFnet                | 1  | 18.4626  |
| BlenderbotSmallForConditionalGeneration | 1  | 15.2419  |
|                CamemBert                | 1  | 15.1637  |
|     PLBartForConditionalGeneration      | 1  | 14.9998  |
|           LayoutLMForMaskedLM           | 1  | 14.9718  |
|            MBartForCausalLM             | 1  |  14.451  |
|    LayoutLMForSequenceClassification    | 1  | 14.2165  |
|        BertForQuestionAnswering         | 1  | 13.9057  |
|             OPTForCausalLM              | 1  | 13.7452  |
|               DistillGPT2               | 1  |  13.034  |
|            TrOCRForCausalLM             | 1  | 12.9223  |
|           ElectraForCausalLM            | 1  | 12.9222  |
|           RobertaForCausalLM            | 1  | 12.9136  |
|       RobertaForQuestionAnswering       | 1  | 12.6522  |
|       ElectraForQuestionAnswering       | 1  | 12.3606  |
|       BlenderbotSmallForCausalLM        | 1  | 11.2526  |
|         Speech2Text2ForCausalLM         | 1  | 11.1537  |
|            PLBartForCausalLM            | 1  | 10.7109  |
|          DistilBertForMaskedLM          | 1  | 10.5215  |
|     DistilBertForQuestionAnswering      | 1  | 10.2008  |
+-----------------------------------------+----+----------+

Peak Memory Compression Ratio

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|            AlbertForMaskedLM            | 1  |  2.7784  |
|       AlbertForQuestionAnswering        | 1  |  2.726   |
|      BartForConditionalGeneration       | 1  |  2.1303  |
|       T5ForConditionalGeneration        | 1  |  1.8725  |
|                 T5Small                 | 1  |  1.8719  |
|                 BigBird                 | 1  |  1.8559  |
|      GPT2ForSequenceClassification      | 1  |  1.8462  |
|             BartForCausalLM             | 1  |  1.515   |
|          AllenaiLongformerBase          | 1  |  1.486   |
|            XLNetLMHeadModel             | 1  |  1.3611  |
|       DebertaForQuestionAnswering       | 1  |  1.3577  |
|                CamemBert                | 1  |  1.3398  |
|           LayoutLMForMaskedLM           | 1  |  1.3339  |
|            YituTechConvBert             | 1  |  1.3279  |
|           DebertaForMaskedLM            | 1  |  1.3117  |
|               GoogleFnet                | 1  |  1.2932  |
|    LayoutLMForSequenceClassification    | 1  |  1.2861  |
|           ElectraForCausalLM            | 1  |  1.2708  |
|               DistillGPT2               | 1  |  1.1948  |
|       ElectraForQuestionAnswering       | 1  |  1.1867  |
|     PegasusForConditionalGeneration     | 1  |  1.1209  |
|      MBartForConditionalGeneration      | 1  |  1.0693  |
|       MT5ForConditionalGeneration       | 1  |  1.0645  |
|             BertForMaskedLM             | 1  |  1.0616  |
|     M2M100ForConditionalGeneration      | 1  |  1.0565  |
|         MegatronBertForCausalLM         | 1  |  1.056   |
|           RobertaForCausalLM            | 1  |  1.0498  |
|            TrOCRForCausalLM             | 1  |  1.0498  |
|    MegatronBertForQuestionAnswering     | 1  |  1.0494  |
|       BlenderbotSmallForCausalLM        | 1  |  1.0494  |
|             OPTForCausalLM              | 1  |  1.0467  |
|        BertForQuestionAnswering         | 1  |  1.038   |
|          DistilBertForMaskedLM          | 1  |  1.0359  |
|       RobertaForQuestionAnswering       | 1  |  1.0347  |
|             XGLMForCausalLM             | 1  |  1.0325  |
|     PLBartForConditionalGeneration      | 1  |  1.0286  |
| BlenderbotSmallForConditionalGeneration | 1  |  1.0284  |
|            MBartForCausalLM             | 1  |  1.026   |
|            PLBartForCausalLM            | 1  |  1.0254  |
|     DistilBertForQuestionAnswering      | 1  |  1.017   |
|           PegasusForCausalLM            | 1  |  1.0034  |
|          MobileBertForMaskedLM          | 1  |  0.9964  |
|         Speech2Text2ForCausalLM         | 1  |  0.9816  |
|     MobileBertForQuestionAnswering      | 1  |  0.9741  |
+-----------------------------------------+----+----------+

timm_models suite with float32 precision

Performance speedup

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          pnasnet5large          | 1  |  1.5744  |
|           regnety_002           | 1  |  1.3378  |
|        ese_vovnet19b_dw         | 1  |  1.3321  |
|          inception_v3           | 1  |  1.332   |
|       gluon_inception_v3        | 1  |  1.3149  |
|           mnasnet_100           | 1  |  1.3143  |
|        adv_inception_v3         | 1  |  1.3085  |
|            lcnet_050            | 1  |  1.2876  |
|          spnasnet_100           | 1  |  1.2836  |
|           fbnetc_100            | 1  |  1.2526  |
|         mobilenetv2_100         | 1  |  1.2212  |
|            fbnetv3_b            | 1  |   1.16   |
|      mobilenetv3_large_100      | 1  |  1.1506  |
|            gernet_l             | 1  |  1.0846  |
|             dpn107              | 1  |  1.0695  |
|        gluon_xception65         | 1  |  1.0594  |
|          cspdarknet53           | 1  |  1.0591  |
|            repvgg_a2            | 1  |  1.0524  |
|            hrnet_w18            | 1  |  1.0432  |
|          ghostnet_100           | 1  |  1.0058  |
|            nfnet_l0             | 1  |  1.0031  |
|           resnest101e           | 1  |  0.9801  |
|         crossvit_9_240          | 1  |  0.9629  |
|        twins_pcpvt_base         | 1  |  0.9453  |
|           selecsls42b           | 1  |  0.9185  |
|           dm_nfnet_f0           | 1  |  0.9119  |
|         visformer_small         | 1  |  0.9093  |
|        res2net101_26w_4s        | 1  |  0.9033  |
|      xcit_large_24_p8_224       | 1  |  0.9028  |
|          convnext_base          | 1  |  0.8551  |
|        res2net50_14w_8s         | 1  |  0.8483  |
|      beit_base_patch16_224      | 1  |  0.8427  |
|          gmixer_24_224          | 1  |  0.8267  |
| deit_base_distilled_patch16_224 | 1  |   0.79   |
|           res2next50            | 1  |  0.7897  |
|  swin_base_patch4_window7_224   | 1  |  0.7765  |
|           convit_base           | 1  |  0.7703  |
|          cait_m36_384           | 1  |  0.7632  |
|      vit_base_patch16_224       | 1  |  0.761   |
|           volo_d1_224           | 1  |  0.7515  |
|         poolformer_m36          | 1  |  0.736   |
|            mixnet_l             | 1  |  0.729   |
|             dla102              | 1  |  0.7196  |
|           tf_mixnet_l           | 1  |  0.7186  |
|          mixer_b16_224          | 1  |  0.7082  |
|          resmlp_12_224          | 1  |  0.666   |
|            pit_b_224            | 1  |  0.6644  |
|           rexnet_100            | 1  |  0.6578  |
|        tnt_s_patch16_224        | 1  |  0.6416  |
|          gmlp_s16_224           | 1  |  0.6327  |
|          jx_nest_base           | 1  |  0.6187  |
|           mobilevit_s           | 1  |  0.5702  |
|            tinynet_a            | 1  |  0.5466  |
|         coat_lite_mini          | 1  |  0.5378  |
|       tf_efficientnet_b0        | 1  |  0.5215  |
|     swsl_resnext101_32x16d      | 1  |  0.0669  |
|        sebotnet33ts_256         | 0  |   0.0    |
|       eca_botnext26ts_256       | 0  |   0.0    |
|        eca_halonext26ts         | 0  |   0.0    |
|          botnet26t_256          | 0  |   0.0    |
|        convmixer_768_32         | 0  |   0.0    |
+---------------------------------+----+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 1  |     pass      |
|             dpn107              | 1  |     pass      |
|      beit_base_patch16_224      | 1  |     pass      |
|          mixer_b16_224          | 1  |     pass      |
|        ese_vovnet19b_dw         | 1  |     pass      |
|         coat_lite_mini          | 1  |     pass      |
|           convit_base           | 1  |     pass      |
|          convnext_base          | 1  |     pass      |
|         crossvit_9_240          | 1  |     pass      |
|          cspdarknet53           | 1  |     pass      |
| deit_base_distilled_patch16_224 | 1  |     pass      |
|             dla102              | 1  |     pass      |
|           dm_nfnet_f0           | 1  |     pass      |
|            lcnet_050            | 1  |     pass      |
|           volo_d1_224           | 1  |     pass      |
|      xcit_large_24_p8_224       | 1  |     pass      |
|           fbnetc_100            | 1  |     pass      |
|        gluon_xception65         | 1  |     pass      |
|          jx_nest_base           | 1  |     pass      |
|          inception_v3           | 1  |     pass      |
|            hrnet_w18            | 1  |     pass      |
|          gmlp_s16_224           | 1  |     pass      |
|          gmixer_24_224          | 1  |     pass      |
|       gluon_inception_v3        | 1  |     pass      |
|            gernet_l             | 1  |     pass      |
|            fbnetv3_b            | 1  |     pass      |
|           mnasnet_100           | 1  |     pass      |
|            mixnet_l             | 1  |     pass      |
|      vit_base_patch16_224       | 1  |     pass      |
|           res2next50            | 1  |     pass      |
|         mobilenetv2_100         | 1  |     pass      |
|      mobilenetv3_large_100      | 1  |     pass      |
|           mobilevit_s           | 1  |     pass      |
|            nfnet_l0             | 1  |     pass      |
|            pit_b_224            | 1  |     pass      |
|          pnasnet5large          | 1  |     pass      |
|         poolformer_m36          | 1  |     pass      |
|           regnety_002           | 1  |     pass      |
|            repvgg_a2            | 1  |     pass      |
|         visformer_small         | 1  |     pass      |
|        res2net50_14w_8s         | 1  |     pass      |
|        res2net101_26w_4s        | 1  |     pass      |
|          resmlp_12_224          | 1  |     pass      |
|       tf_efficientnet_b0        | 1  |     pass      |
|        twins_pcpvt_base         | 1  |     pass      |
|        tnt_s_patch16_224        | 1  |     pass      |
|           resnest101e           | 1  |     pass      |
|           tf_mixnet_l           | 1  |     pass      |
|            tinynet_a            | 1  |     pass      |
|     swsl_resnext101_32x16d      | 1  |     pass      |
|  swin_base_patch4_window7_224   | 1  |     pass      |
|          spnasnet_100           | 1  |     pass      |
|           selecsls42b           | 1  |     pass      |
|           rexnet_100            | 1  |     pass      |
|        eca_halonext26ts         | 1  |  fail_to_run  |
|        convmixer_768_32         | 1  |  fail_to_run  |
|        sebotnet33ts_256         | 1  |  fail_to_run  |
|          botnet26t_256          | 1  |  fail_to_run  |
|       eca_botnext26ts_256       | 1  |  fail_to_run  |
|          ghostnet_100           | 1  | fail_accuracy |
|          cait_m36_384           | 1  | fail_accuracy |
+---------------------------------+----+---------------+

Compilation latency (sec)

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|     swsl_resnext101_32x16d      | 1  | 85.6242  |
|          pnasnet5large          | 1  | 43.6874  |
|            fbnetv3_b            | 1  | 34.4074  |
|          cait_m36_384           | 1  | 33.7081  |
|           tf_mixnet_l           | 1  | 33.6983  |
|        twins_pcpvt_base         | 1  | 31.5227  |
|            hrnet_w18            | 1  | 31.1337  |
|           rexnet_100            | 1  | 30.7918  |
|  swin_base_patch4_window7_224   | 1  | 30.3129  |
|           mobilevit_s           | 1  | 30.2527  |
|            mixnet_l             | 1  |  29.901  |
|      xcit_large_24_p8_224       | 1  | 28.4048  |
|             dpn107              | 1  | 26.7877  |
|        adv_inception_v3         | 1  | 26.4651  |
|        res2net50_14w_8s         | 1  | 26.3563  |
|           dm_nfnet_f0           | 1  | 26.1631  |
|          ghostnet_100           | 1  | 25.6518  |
|         poolformer_m36          | 1  | 24.9019  |
|       tf_efficientnet_b0        | 1  | 24.8772  |
|           resnest101e           | 1  | 24.5232  |
|         visformer_small         | 1  | 24.3597  |
|            tinynet_a            | 1  | 24.2595  |
|        res2net101_26w_4s        | 1  | 23.5626  |
|          jx_nest_base           | 1  | 23.1414  |
|         coat_lite_mini          | 1  | 22.4646  |
|            nfnet_l0             | 1  | 22.4105  |
|      mobilenetv3_large_100      | 1  | 22.3182  |
|             dla102              | 1  | 21.2783  |
|           volo_d1_224           | 1  | 21.2219  |
|         crossvit_9_240          | 1  | 21.2091  |
|        tnt_s_patch16_224        | 1  | 21.0931  |
|          convnext_base          | 1  | 19.4932  |
|          cspdarknet53           | 1  | 19.3065  |
|           fbnetc_100            | 1  | 18.9707  |
|       gluon_inception_v3        | 1  | 18.6493  |
|           res2next50            | 1  | 18.5211  |
|            pit_b_224            | 1  | 18.1465  |
|          spnasnet_100           | 1  | 18.1066  |
|          inception_v3           | 1  | 17.9952  |
|          gmlp_s16_224           | 1  | 17.9558  |
|      beit_base_patch16_224      | 1  |  17.513  |
|           regnety_002           | 1  | 17.0633  |
|        gluon_xception65         | 1  | 16.8521  |
|           mnasnet_100           | 1  | 16.5367  |
|         mobilenetv2_100         | 1  | 16.3189  |
|           convit_base           | 1  | 16.2035  |
|            gernet_l             | 1  | 14.7975  |
|          gmixer_24_224          | 1  |  14.685  |
|           selecsls42b           | 1  | 14.2609  |
|        ese_vovnet19b_dw         | 1  | 13.4973  |
|          mixer_b16_224          | 1  |  13.462  |
|      vit_base_patch16_224       | 1  | 13.3318  |
| deit_base_distilled_patch16_224 | 1  | 13.0111  |
|            repvgg_a2            | 1  | 12.9665  |
|            lcnet_050            | 1  | 12.8088  |
|          resmlp_12_224          | 1  | 10.4233  |
|          botnet26t_256          | 0  |   nan    |
|        convmixer_768_32         | 0  |   nan    |
|       eca_botnext26ts_256       | 0  |   nan    |
|        eca_halonext26ts         | 0  |   nan    |
|        sebotnet33ts_256         | 0  |   nan    |
+---------------------------------+----+----------+

Peak Memory Compression Ratio

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|      xcit_large_24_p8_224       | 1  |  2.2814  |
|          cait_m36_384           | 1  |  2.1693  |
|           dm_nfnet_f0           | 1  |  1.2079  |
|           mobilevit_s           | 1  |  1.2047  |
|          jx_nest_base           | 1  |  1.1988  |
|            nfnet_l0             | 1  |  1.1854  |
|             dpn107              | 1  |  1.1766  |
|        gluon_xception65         | 1  |  1.1551  |
|          convnext_base          | 1  |  1.155   |
|           convit_base           | 1  |  1.1399  |
|          cspdarknet53           | 1  |  1.1261  |
|         poolformer_m36          | 1  |  1.1142  |
|  swin_base_patch4_window7_224   | 1  |  1.106   |
|            pit_b_224            | 1  |  1.1013  |
|      beit_base_patch16_224      | 1  |  1.0972  |
|        twins_pcpvt_base         | 1  |  1.0907  |
|        tnt_s_patch16_224        | 1  |  1.0873  |
|           volo_d1_224           | 1  |  1.087   |
| deit_base_distilled_patch16_224 | 1  |  1.082   |
|      vit_base_patch16_224       | 1  |  1.0789  |
|          mixer_b16_224          | 1  |  1.0786  |
|         mobilenetv2_100         | 1  |  1.0611  |
|        ese_vovnet19b_dw         | 1  |  1.0596  |
|       tf_efficientnet_b0        | 1  |  1.057   |
|          resmlp_12_224          | 1  |  1.0457  |
|         coat_lite_mini          | 1  |  1.0422  |
|           rexnet_100            | 1  |  1.0408  |
|            mixnet_l             | 1  |  1.0377  |
|            gernet_l             | 1  |  1.0281  |
|            fbnetv3_b            | 1  |  1.0236  |
|            tinynet_a            | 1  |  1.0189  |
|           fbnetc_100            | 1  |  1.0121  |
|          spnasnet_100           | 1  |  1.0098  |
|           mnasnet_100           | 1  |  1.0097  |
|            repvgg_a2            | 1  |  1.0028  |
|         crossvit_9_240          | 1  |  0.9982  |
|           resnest101e           | 1  |  0.9928  |
|      mobilenetv3_large_100      | 1  |  0.9677  |
|            lcnet_050            | 1  |  0.9643  |
|           regnety_002           | 1  |  0.9604  |
|         visformer_small         | 1  |  0.9598  |
|          inception_v3           | 1  |  0.9554  |
|           res2next50            | 1  |  0.9538  |
|             dla102              | 1  |  0.9508  |
|        adv_inception_v3         | 1  |  0.9508  |
|            hrnet_w18            | 1  |  0.9504  |
|          ghostnet_100           | 1  |  0.9456  |
|        res2net50_14w_8s         | 1  |  0.9426  |
|       gluon_inception_v3        | 1  |  0.9393  |
|          gmixer_24_224          | 1  |  0.9237  |
|        res2net101_26w_4s        | 1  |  0.9135  |
|          gmlp_s16_224           | 1  |  0.8851  |
|     swsl_resnext101_32x16d      | 1  |  0.8458  |
|           tf_mixnet_l           | 1  |  0.8379  |
|           selecsls42b           | 1  |  0.8177  |
|          pnasnet5large          | 1  |  0.7496  |
|          botnet26t_256          | 0  |   nan    |
|        convmixer_768_32         | 0  |   nan    |
|       eca_botnext26ts_256       | 0  |   nan    |
|        eca_halonext26ts         | 0  |   nan    |
|        sebotnet33ts_256         | 0  |   nan    |
+---------------------------------+----+----------+

chuanqi129 commented Nov 11, 2022

Performance Dashboard for float32 precision -- Single-Socket Multi-threads (2022-11-09 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward pass. For accuracy, we check the numerical correctness of the forward pass outputs by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint reduction ratio.
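As a rough illustration of the normalization described above, speedup can be read as native (eager) latency divided by compiled latency. The sketch below is an assumption about how such a figure could be computed, not the benchmark harness itself: `median_latency` and `speedup` are hypothetical helper names, and the repeat count and use of the median are illustrative choices.

```python
import statistics
import time
from typing import Callable


def median_latency(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock time (seconds) of calling fn() `repeats` times."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()  # one iteration, e.g. a single forward pass
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_seconds: float, compiled_seconds: float) -> float:
    """Normalize against the baseline: values above 1.0 mean the compiled run is faster."""
    if compiled_seconds <= 0:
        raise ValueError("compiled_seconds must be positive")
    return baseline_seconds / compiled_seconds


if __name__ == "__main__":
    # Stand-in workloads (hypothetical); in practice these would be the
    # eager model and its compiled counterpart.
    eager = lambda: sum(i * i for i in range(200_000))
    compiled = lambda: sum(i * i for i in range(100_000))
    ratio = speedup(median_latency(eager), median_latency(compiled))
    print(f"speedup: {ratio:.2f}x")
```

Under this reading, an entry such as 0.85 in a speedup table means the compiled model was slower than native PyTorch for that workload.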

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

SW information

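+-------------+----------------+--------------------+
|     SW      | Nightly commit | Master/Main commit |
+-------------+----------------+--------------------+
|   Pytorch   |    3b29687     |      a7420d2       |
| Torchbench  |       /        |      022dfe3       |
| torchaudio  |    4b10b6a     |      74f9a89       |
|  torchtext  |    71e4561     |      c047efe       |
| torchvision |    797e1ac     |      ffd5a56       |
+-------------+----------------+--------------------+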

HW information

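+------------------+-----------------------------------------------+
|       Item       |                     Value                     |
+------------------+-----------------------------------------------+
|   Manufacturer   |                  Amazon EC2                   |
|   Product Name   |                 c6i.16xlarge                  |
|    CPU Model     | Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz |
| Installed Memory |   128GB (1x128GB DDR4 3200 MT/s [Unknown])    |
|        OS        |              Ubuntu 20.04.5 LTS               |
|      Kernel      |                5.15.0-1022-aws                |
|    Microcode     |                   0xd000331                   |
|       GCC        |   gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0   |
|      GLIBC       |    ldd (Ubuntu GLIBC 2.31-0ubuntu9.9) 2.31    |
|     Binutils     |     GNU ld (GNU Binutils for Ubuntu) 2.34     |
|      Python      |                 Python 3.8.13                 |
|     OpenSSL      |           OpenSSL 1.1.1s 1 Nov 2022           |
+------------------+-----------------------------------------------+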

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate

+----------+-------------+-------------+-------------+
| Compiler | torchbench  | huggingface | timm_models |
+----------+-------------+-------------+-------------+
| inductor | 100%, 55/55 | 100%, 44/44 | 92%, 56/61  |
+----------+-------------+-------------+-------------+
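The pass-rate cells above pair a rounded percentage with a passed/total count. A minimal sketch of how such a cell could be derived from per-model accuracy statuses follows. The function name `pass_rate` and the set of statuses treated as passing are assumptions made for illustration; the dashboard's own scripts may classify statuses differently.

```python
from typing import Iterable

# Assumption: these statuses count as passing; everything else is a failure.
PASSING = {"pass", "pass_due_to_skip"}


def pass_rate(statuses: Iterable[str]) -> str:
    """Format a pass rate as 'NN%, passed/total', matching the table cells above."""
    statuses = list(statuses)
    total = len(statuses)
    if total == 0:
        return "n/a"
    passed = sum(status in PASSING for status in statuses)
    return f"{100 * passed / total:.0f}%, {passed}/{total}"


if __name__ == "__main__":
    # Illustrative input: 56 passing models out of 61; the failure label is made up.
    print(pass_rate(["pass"] * 56 + ["fail_to_run"] * 5))
```

With 56 passing entries out of 61, the helper yields `92%, 56/61`, which has the same shape as the timm_models cell.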

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.08x    |    1.07x    |    1.09x    |
+----------+------------+-------------+-------------+
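For reference, a geometric mean over per-model speedups can be computed as the exponential of the mean of the logs. The sketch below assumes that models with no valid measurement (for example, rows reporting a speedup of 0.0) have already been filtered out; `geometric_mean` is a hypothetical helper name, not a function from the benchmark scripts.

```python
import math
from typing import Sequence


def geometric_mean(speedups: Sequence[float]) -> float:
    """Geometric mean via the mean of logs; requires strictly positive inputs."""
    if not speedups:
        raise ValueError("need at least one speedup")
    if any(s <= 0 for s in speedups):
        raise ValueError("speedups must be positive; filter failed models first")
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


if __name__ == "__main__":
    # A 2x speedup and a 0.5x slowdown cancel out to roughly 1.0.
    print(geometric_mean([2.0, 0.5]))
```

The geometric mean is the usual choice for ratios because a 2x speedup on one model and a 0.5x slowdown on another average to about 1.0, whereas an arithmetic mean would report 1.25.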

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   14.89    |    16.47    |    20.62    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.95x    |    0.97x    |    0.99x    |
+----------+------------+-------------+-------------+
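One way to read these ratios, assuming the compression ratio is defined as native (eager) peak memory divided by compiled peak memory, is sketched below. Both helper names are hypothetical, and the definition is an assumption consistent with "higher is better" rather than something stated in this report.

```python
def compression_ratio(eager_peak_bytes: float, compiled_peak_bytes: float) -> float:
    """Assumed definition: eager peak memory divided by compiled peak memory."""
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled_peak_bytes must be positive")
    return eager_peak_bytes / compiled_peak_bytes


def compiled_memory_change(ratio: float) -> float:
    """Relative change in compiled peak memory versus eager for a given ratio.

    Positive values mean the compiled run used more memory than eager.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / ratio - 1.0


if __name__ == "__main__":
    # Ratios taken from the summary table above.
    for ratio in (0.95, 0.97, 0.99):
        print(f"{ratio:.2f}x -> {compiled_memory_change(ratio):+.1%} compiled peak memory")
```

Under that assumption, a ratio below 1.0, such as 0.95x, would mean the compiled run peaked at roughly 5% more memory than native PyTorch.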

torchbench suite with float32 precision

Performance speedup

+-----------------------------------+------+----------+
|               name                |  bs  | inductor |
+-----------------------------------+------+----------+
|        shufflenet_v2_x1_0         |  64  |  1.8287  |
|         soft_actor_critic         | 256  |  1.6029  |
|            densenet121            |  64  |   1.46   |
|            mnasnet1_0             |  32  |  1.2959  |
|             resnet18              |  8   |  1.2223  |
|           pytorch_unet            |  1   |  1.1886  |
|       functorch_dp_cifar10        |  64  |  1.1859  |
|              alexnet              | 128  |  1.1594  |
|           squeezenet1_1           |  16  |  1.1573  |
|             resnet152             |  32  |  1.1529  |
|           mobilenet_v2            |  16  |  1.1485  |
|           timm_resnest            |  32  |  1.1455  |
|        mobilenet_v3_large         |  32  |  1.1413  |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  1.1121  |
|            timm_vovnet            |  32  |  1.1105  |
|             resnet50              |  32  |  1.0939  |
|               dcgan               | 256  |  1.0811  |
|               dlrm                | 2048 |  1.081   |
|            Super_SloMo            |  6   |  1.0735  |
|          LearningToPaint          |  96  |  1.0641  |
|        Background_Matting         |  1   |  1.0627  |
|               vgg16               |  4   |  1.0589  |
|           BERT_pytorch            |  2   |  1.0586  |
|            timm_regnet            |  32  |  1.0365  |
|            hf_T5_large            |  1   |  1.036   |
|          pytorch_stargan          |  16  |  1.0281  |
|     detectron2_fcos_r_50_fpn      |  1   |  1.0179  |
|            hf_Reformer            |  1   |  1.0089  |
|            tts_angular            |  64  |  1.0026  |
|            hf_BigBird             |  1   |  0.9993  |
|              demucs               |  1   |  0.999   |
|    mobilenet_v2_quantized_qat     |  96  |  0.9988  |
|                drq                |  1   |  0.9979  |
|      resnet50_quantized_qat       |  32  |  0.9952  |
|           hf_Longformer           |  1   |  0.9891  |
|               hf_T5               |  1   |  0.978   |
|          vision_maskrcnn          |  1   |  0.9622  |
|        speech_transformer         |  1   |  0.9319  |
|           hf_GPT2_large           |  1   |  0.9271  |
|              hf_GPT2              |  1   |  0.9202  |
|   timm_vision_transformer_large   |  8   |  0.9158  |
|            timm_nfnet             | 128  |  0.9154  |
|           hf_DistilBert           |  1   |  0.8986  |
|             hf_Albert             |  1   |  0.8814  |
|              hf_Bert              |  1   |  0.8779  |
|              yolov3               |  8   |  0.8727  |
|           fastNLP_Bert            |  1   |  0.8721  |
|              hf_Bart              |  1   |  0.8498  |
|          resnext50_32x4d          |  8   |  0.8362  |
|      nvidia_deeprecommender       | 256  |  0.8304  |
|            hf_T5_base             |  1   |  0.8286  |
|      timm_vision_transformer      |  8   |  0.764   |
|         timm_efficientnet         |  64  |  0.7439  |
| attention_is_all_you_need_pytorch |  32  |  0.7132  |
|           lennard_jones           | 1000 |  0.3489  |
+-----------------------------------+------+----------+

Accuracy

+-----------------------------------+-----+------------------+
|               name                | bs  |     inductor     |
+-----------------------------------+-----+------------------+
|            hf_T5_large            |  2  | pass_due_to_skip |
|           hf_GPT2_large           |  2  | pass_due_to_skip |
|   timm_vision_transformer_large   |  2  | pass_due_to_skip |
|              yolov3               |  2  |       pass       |
|                drq                |  1  |       pass       |
|        Background_Matting         |  1  |       pass       |
|       functorch_dp_cifar10        |  2  |       pass       |
|          LearningToPaint          |  2  |       pass       |
|            Super_SloMo            |  2  |       pass       |
|              alexnet              |  2  |       pass       |
| attention_is_all_you_need_pytorch |  2  |       pass       |
|               dcgan               |  2  |       pass       |
|              demucs               |  1  |       pass       |
|            densenet121            |  2  |       pass       |
|               dlrm                |  2  |       pass       |
|           lennard_jones           |  2  |       pass       |
|            hf_T5_base             |  2  |       pass       |
|            mnasnet1_0             |  2  |       pass       |
|             hf_Albert             |  2  |       pass       |
|              hf_Bart              |  2  |       pass       |
|              hf_Bert              |  2  |       pass       |
|            hf_BigBird             |  2  |       pass       |
|           hf_DistilBert           |  2  |       pass       |
|              hf_GPT2              |  2  |       pass       |
|           hf_Longformer           |  2  |       pass       |
|            hf_Reformer            |  2  |       pass       |
|               hf_T5               |  2  |       pass       |
|           fastNLP_Bert            |  2  |       pass       |
|    mobilenet_v2_quantized_qat     |  2  |       pass       |
|               vgg16               |  2  |       pass       |
|            tts_angular            |  2  |       pass       |
|           BERT_pytorch            |  2  |       pass       |
|          resnext50_32x4d          |  2  |       pass       |
|        mobilenet_v3_large         |  2  |       pass       |
|      nvidia_deeprecommender       |  2  |       pass       |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |
|          pytorch_stargan          | 16  |       pass       |
|           pytorch_unet            |  2  |       pass       |
|             resnet152             |  2  |       pass       |
|             resnet18              |  2  |       pass       |
|             resnet50              |  2  |       pass       |
|      resnet50_quantized_qat       |  2  |       pass       |
|        shufflenet_v2_x1_0         |  2  |       pass       |
|           mobilenet_v2            |  2  |       pass       |
|         soft_actor_critic         | 256 |       pass       |
|        speech_transformer         |  2  |       pass       |
|           squeezenet1_1           |  2  |       pass       |
|         timm_efficientnet         |  2  |       pass       |
|            timm_nfnet             |  2  |       pass       |
|            timm_regnet            |  2  |       pass       |
|           timm_resnest            |  2  |       pass       |
|      timm_vision_transformer      |  2  |       pass       |
|            timm_vovnet            |  2  |       pass       |
|     detectron2_fcos_r_50_fpn      |  2  |   fail_to_run    |
|          vision_maskrcnn          |  0  |      0.0000      |
+-----------------------------------+-----+------------------+

Compilation latency (sec)

+-----------------------------------+------+----------+
|               name                |  bs  | inductor |
+-----------------------------------+------+----------+
|     detectron2_fcos_r_50_fpn      |  1   | 64.5524  |
|          vision_maskrcnn          |  1   | 63.9086  |
|           BERT_pytorch            |  2   | 45.7082  |
|            hf_T5_large            |  1   | 33.6462  |
|           hf_GPT2_large           |  1   | 22.5423  |
|            hf_BigBird             |  1   | 22.4972  |
|            hf_T5_base             |  1   | 22.2742  |
|            timm_nfnet             | 128  | 22.1908  |
|   timm_vision_transformer_large   |  8   | 21.7555  |
|           hf_Longformer           |  1   | 21.6172  |
|            densenet121            |  64  | 21.4901  |
|              yolov3               |  8   |  19.69   |
|           mobilenet_v2            |  16  | 18.9421  |
|            Super_SloMo            |  6   | 17.2204  |
|             resnet152             |  32  | 17.0653  |
|        speech_transformer         |  1   | 15.5597  |
|            timm_regnet            |  32  |  15.255  |
|              hf_Bart              |  1   | 13.7245  |
|         timm_efficientnet         |  64  | 13.6015  |
|            hf_Reformer            |  1   | 13.5318  |
|           fastNLP_Bert            |  1   | 13.1859  |
| attention_is_all_you_need_pytorch |  32  | 13.0332  |
|               hf_T5               |  1   | 12.8551  |
|              hf_Bert              |  1   | 12.2325  |
|              hf_GPT2              |  1   | 11.7985  |
|      timm_vision_transformer      |  8   | 11.7079  |
|        Background_Matting         |  1   | 11.3904  |
|             hf_Albert             |  1   | 11.3643  |
|            timm_vovnet            |  32  | 11.1937  |
|             resnet50              |  32  | 11.0994  |
|        mobilenet_v3_large         |  32  | 10.9908  |
|          resnext50_32x4d          |  8   | 10.7624  |
|        shufflenet_v2_x1_0         |  64  | 10.6591  |
|            mnasnet1_0             |  32  | 10.2952  |
|           hf_DistilBert           |  1   | 10.1395  |
|           timm_resnest            |  32  |  9.9293  |
|       functorch_dp_cifar10        |  64  |  9.5011  |
|           pytorch_unet            |  1   |  9.3573  |
|          pytorch_stargan          |  16  |  9.3224  |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  9.2365  |
|          LearningToPaint          |  96  |  8.6886  |
|           squeezenet1_1           |  16  |  8.6509  |
|             resnet18              |  8   |  8.522   |
|      nvidia_deeprecommender       | 256  |  8.3779  |
|               vgg16               |  4   |  8.2719  |
|               dlrm                | 2048 |  8.2115  |
|              alexnet              | 128  |  8.1818  |
|            tts_angular            |  64  |  8.0338  |
|                drq                |  1   |  7.9702  |
|         soft_actor_critic         | 256  |  7.7336  |
|           lennard_jones           | 1000 |  7.7074  |
|              demucs               |  1   |  1.2856  |
|               dcgan               | 256  |  0.3822  |
|    mobilenet_v2_quantized_qat     |  96  |  0.0985  |
|      resnet50_quantized_qat       |  32  |  0.0088  |
+-----------------------------------+------+----------+

Peak Memory Compression Ratio

+-----------------------------------+------+----------+
|               name                |  bs  | inductor |
+-----------------------------------+------+----------+
|              demucs               |  1   |  0.9988  |
|               dlrm                | 2048 |  0.9971  |
|      resnet50_quantized_qat       |  32  |  0.996   |
|        Background_Matting         |  1   |  0.9938  |
|            timm_regnet            |  32  |  0.9936  |
|               vgg16               |  4   |  0.9927  |
|            timm_nfnet             | 128  |  0.9917  |
|          LearningToPaint          |  96  |  0.9912  |
|            hf_BigBird             |  1   |  0.9895  |
|            densenet121            |  64  |  0.9893  |
|             resnet152             |  32  |  0.9889  |
|             resnet50              |  32  |  0.9866  |
|           pytorch_unet            |  1   |  0.986   |
|   timm_vision_transformer_large   |  8   |  0.9859  |
|            mnasnet1_0             |  32  |  0.9852  |
|           lennard_jones           | 1000 |  0.9837  |
|           mobilenet_v2            |  16  |  0.9828  |
|         soft_actor_critic         | 256  |  0.9828  |
|                drq                |  1   |  0.9819  |
| attention_is_all_you_need_pytorch |  32  |  0.9818  |
|        mobilenet_v3_large         |  32  |  0.9814  |
|        shufflenet_v2_x1_0         |  64  |  0.9813  |
|           hf_GPT2_large           |  1   |  0.9793  |
|            Super_SloMo            |  6   |  0.9789  |
|              hf_Bart              |  1   |  0.9772  |
|     detectron2_fcos_r_50_fpn      |  1   |  0.9771  |
|            timm_vovnet            |  32  |  0.9767  |
|           hf_DistilBert           |  1   |  0.9761  |
|    mobilenet_v2_quantized_qat     |  96  |  0.9748  |
|             hf_Albert             |  1   |  0.9741  |
|           squeezenet1_1           |  16  |  0.9733  |
|          resnext50_32x4d          |  8   |  0.9732  |
|            tts_angular            |  64  |  0.9722  |
|           BERT_pytorch            |  2   |  0.9715  |
|         timm_efficientnet         |  64  |  0.9707  |
|        speech_transformer         |  1   |  0.9686  |
|          vision_maskrcnn          |  1   |  0.9681  |
|              hf_GPT2              |  1   |  0.9645  |
|          pytorch_stargan          |  16  |  0.9623  |
|               dcgan               | 256  |  0.957   |
|             resnet18              |  8   |  0.9524  |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  0.951   |
|            hf_T5_base             |  1   |  0.9448  |
|      timm_vision_transformer      |  8   |  0.9343  |
|            hf_Reformer            |  1   |  0.9338  |
|              alexnet              | 128  |  0.9295  |
|           fastNLP_Bert            |  1   |  0.9217  |
|              yolov3               |  8   |  0.9142  |
|       functorch_dp_cifar10        |  64  |  0.9129  |
|           timm_resnest            |  32  |  0.8811  |
|           hf_Longformer           |  1   |  0.877   |
|              hf_Bert              |  1   |  0.7856  |
|            hf_T5_large            |  1   |  0.6783  |
|               hf_T5               |  1   |  0.6081  |
|      nvidia_deeprecommender       | 256  |  0.5844  |
+-----------------------------------+------+----------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|       MT5ForConditionalGeneration       |  8  |  2.4476  |
|            XLNetLMHeadModel             | 32  |  1.9966  |
|            YituTechConvBert             |  1  |  1.3041  |
|               DistillGPT2               |  1  |  1.2765  |
|     MobileBertForQuestionAnswering      | 64  |  1.2614  |
|               GoogleFnet                |  1  |  1.2128  |
|          MobileBertForMaskedLM          | 32  |  1.0902  |
|                 BigBird                 |  1  |  1.0684  |
|             XGLMForCausalLM             |  8  |  1.0601  |
|     M2M100ForConditionalGeneration      |  8  |  1.0597  |
|          AllenaiLongformerBase          |  1  |  1.0388  |
|                CamemBert                |  1  |  1.0358  |
|            AlbertForMaskedLM            |  4  |  1.0232  |
|       AlbertForQuestionAnswering        |  4  |  1.0218  |
|             OPTForCausalLM              | 32  |  0.9935  |
|           DebertaForMaskedLM            |  4  |  0.9902  |
|                 T5Small                 |  1  |  0.9826  |
|       DebertaForQuestionAnswering       |  8  |  0.9465  |
|     PLBartForConditionalGeneration      | 16  |  0.9233  |
|      MBartForConditionalGeneration      | 16  |  0.9222  |
|         MegatronBertForCausalLM         | 16  |  0.9217  |
|     PegasusForConditionalGeneration     | 16  |  0.9195  |
|      GPT2ForSequenceClassification      |  4  |  0.9192  |
|    MegatronBertForQuestionAnswering     | 16  |  0.9112  |
|    LayoutLMForSequenceClassification    | 16  |  0.9082  |
|           RobertaForCausalLM            | 64  |  0.9069  |
|         Speech2Text2ForCausalLM         | 128 |  0.9055  |
|            TrOCRForCausalLM             | 32  |  0.9011  |
|           ElectraForCausalLM            | 32  |  0.8992  |
|     DistilBertForQuestionAnswering      | 64  |  0.8984  |
|            PLBartForCausalLM            | 32  |  0.8943  |
|           PegasusForCausalLM            | 32  |  0.8924  |
|       RobertaForQuestionAnswering       | 128 |  0.8892  |
|           LayoutLMForMaskedLM           | 16  |  0.8876  |
|             BertForMaskedLM             | 64  |  0.8794  |
|        BertForQuestionAnswering         | 128 |  0.8784  |
|       ElectraForQuestionAnswering       | 64  |  0.8612  |
|          DistilBertForMaskedLM          | 64  |  0.8588  |
|       T5ForConditionalGeneration        |  4  |  0.8556  |
| BlenderbotSmallForConditionalGeneration | 64  |  0.8261  |
|             BartForCausalLM             |  4  |  0.8217  |
|      BartForConditionalGeneration       |  2  |  0.8188  |
|       BlenderbotSmallForCausalLM        | 64  |  0.8048  |
|            MBartForCausalLM             | 32  |  0.7923  |
+-----------------------------------------+-----+----------+

Accuracy

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|            AlbertForMaskedLM            | 1  |   pass   |
|       AlbertForQuestionAnswering        | 1  |   pass   |
|                CamemBert                | 1  |   pass   |
|          AllenaiLongformerBase          | 1  |   pass   |
|             BartForCausalLM             | 1  |   pass   |
|      BartForConditionalGeneration       | 1  |   pass   |
|             BertForMaskedLM             | 1  |   pass   |
|        BertForQuestionAnswering         | 1  |   pass   |
|                 BigBird                 | 1  |   pass   |
|       BlenderbotSmallForCausalLM        | 1  |   pass   |
| BlenderbotSmallForConditionalGeneration | 1  |   pass   |
|           DebertaForMaskedLM            | 1  |   pass   |
|           LayoutLMForMaskedLM           | 1  |   pass   |
|       DebertaForQuestionAnswering       | 1  |   pass   |
|          DistilBertForMaskedLM          | 1  |   pass   |
|     DistilBertForQuestionAnswering      | 1  |   pass   |
|               DistillGPT2               | 1  |   pass   |
|           ElectraForCausalLM            | 1  |   pass   |
|       ElectraForQuestionAnswering       | 1  |   pass   |
|      GPT2ForSequenceClassification      | 1  |   pass   |
|               GoogleFnet                | 1  |   pass   |
|    LayoutLMForSequenceClassification    | 1  |   pass   |
|     M2M100ForConditionalGeneration      | 1  |   pass   |
|            MBartForCausalLM             | 1  |   pass   |
|     PLBartForConditionalGeneration      | 1  |   pass   |
|      MBartForConditionalGeneration      | 1  |   pass   |
|       MT5ForConditionalGeneration       | 1  |   pass   |
|         MegatronBertForCausalLM         | 1  |   pass   |
|    MegatronBertForQuestionAnswering     | 1  |   pass   |
|          MobileBertForMaskedLM          | 1  |   pass   |
|     MobileBertForQuestionAnswering      | 1  |   pass   |
|             OPTForCausalLM              | 1  |   pass   |
|            PLBartForCausalLM            | 1  |   pass   |
|           PegasusForCausalLM            | 1  |   pass   |
|            XLNetLMHeadModel             | 1  |   pass   |
|     PegasusForConditionalGeneration     | 1  |   pass   |
|           RobertaForCausalLM            | 1  |   pass   |
|       RobertaForQuestionAnswering       | 1  |   pass   |
|         Speech2Text2ForCausalLM         | 1  |   pass   |
|       T5ForConditionalGeneration        | 1  |   pass   |
|                 T5Small                 | 1  |   pass   |
|            TrOCRForCausalLM             | 1  |   pass   |
|             XGLMForCausalLM             | 1  |   pass   |
|            YituTechConvBert             | 1  |   pass   |
+-----------------------------------------+----+----------+

Compilation latency (sec)

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|     PegasusForConditionalGeneration     | 16  | 23.9196  |
|          MobileBertForMaskedLM          | 32  | 23.8928  |
|          AllenaiLongformerBase          |  1  | 23.1661  |
|     MobileBertForQuestionAnswering      | 64  | 22.9028  |
|           DebertaForMaskedLM            |  4  | 22.8131  |
|       DebertaForQuestionAnswering       |  8  | 22.6847  |
|     M2M100ForConditionalGeneration      |  8  | 22.5781  |
|      BartForConditionalGeneration       |  2  | 21.6736  |
|                 BigBird                 |  1  | 21.5731  |
|      MBartForConditionalGeneration      | 16  | 21.0786  |
|             XGLMForCausalLM             |  8  | 20.0701  |
|          DistilBertForMaskedLM          | 64  |  19.039  |
|         MegatronBertForCausalLM         | 16  | 18.8882  |
|       AlbertForQuestionAnswering        |  4  | 18.6163  |
|    MegatronBertForQuestionAnswering     | 16  | 18.5201  |
|            YituTechConvBert             |  1  | 17.9706  |
| BlenderbotSmallForConditionalGeneration | 64  | 17.3304  |
|           PegasusForCausalLM            | 32  | 16.4097  |
|            AlbertForMaskedLM            |  4  | 16.1552  |
|       MT5ForConditionalGeneration       |  8  | 15.8319  |
|       RobertaForQuestionAnswering       | 128 | 15.5256  |
|        BertForQuestionAnswering         | 128 | 15.4197  |
|           LayoutLMForMaskedLM           | 16  | 15.2302  |
|       T5ForConditionalGeneration        |  4  | 14.9184  |
|           RobertaForCausalLM            | 64  | 14.8398  |
|             BartForCausalLM             |  4  |  14.813  |
|       ElectraForQuestionAnswering       | 64  | 14.6766  |
|             BertForMaskedLM             | 64  |  14.621  |
|    LayoutLMForSequenceClassification    | 16  | 14.5713  |
|           ElectraForCausalLM            | 32  | 14.4872  |
|     PLBartForConditionalGeneration      | 16  | 14.4639  |
|            MBartForCausalLM             | 32  | 14.2826  |
|      GPT2ForSequenceClassification      |  4  | 14.1216  |
|            TrOCRForCausalLM             | 32  | 13.7212  |
|             OPTForCausalLM              | 32  | 13.2485  |
|                 T5Small                 |  1  |  13.094  |
|                CamemBert                |  1  | 12.6413  |
|               GoogleFnet                |  1  | 12.4936  |
|         Speech2Text2ForCausalLM         | 128 | 12.3975  |
|       BlenderbotSmallForCausalLM        | 64  | 12.2115  |
|     DistilBertForQuestionAnswering      | 64  | 11.0523  |
|            PLBartForCausalLM            | 32  |  10.871  |
|               DistillGPT2               |  1  |  9.8446  |
|            XLNetLMHeadModel             | 32  |  5.9197  |
+-----------------------------------------+-----+----------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            AlbertForMaskedLM            |  4  |  0.9977  |
|       AlbertForQuestionAnswering        |  4  |  0.9976  |
|           ElectraForCausalLM            | 32  |  0.9961  |
|             BartForCausalLM             |  4  |  0.9959  |
|           RobertaForCausalLM            | 64  |  0.9955  |
|       BlenderbotSmallForCausalLM        | 64  |  0.9954  |
|             BertForMaskedLM             | 64  |  0.9952  |
|       ElectraForQuestionAnswering       | 64  |  0.995   |
| BlenderbotSmallForConditionalGeneration | 64  |  0.9948  |
|      GPT2ForSequenceClassification      |  4  |  0.9945  |
|          DistilBertForMaskedLM          | 64  |  0.9944  |
|            TrOCRForCausalLM             | 32  |  0.9941  |
|           PegasusForCausalLM            | 32  |  0.9939  |
|       T5ForConditionalGeneration        |  4  |  0.9936  |
|            MBartForCausalLM             | 32  |  0.9932  |
|             OPTForCausalLM              | 32  |  0.9931  |
|            PLBartForCausalLM            | 32  |  0.9925  |
|         Speech2Text2ForCausalLM         | 128 |  0.9925  |
|      BartForConditionalGeneration       |  2  |  0.9919  |
|     DistilBertForQuestionAnswering      | 64  |  0.9915  |
|           DebertaForMaskedLM            |  4  |  0.9915  |
|     PegasusForConditionalGeneration     | 16  |  0.9914  |
|    LayoutLMForSequenceClassification    | 16  |  0.9913  |
|        BertForQuestionAnswering         | 128 |  0.9912  |
|       RobertaForQuestionAnswering       | 128 |  0.9911  |
|                 BigBird                 |  1  |  0.9911  |
|         MegatronBertForCausalLM         | 16  |  0.9903  |
|      MBartForConditionalGeneration      | 16  |  0.9879  |
|           LayoutLMForMaskedLM           | 16  |  0.9876  |
|               GoogleFnet                |  1  |  0.9849  |
|     PLBartForConditionalGeneration      | 16  |  0.9832  |
|               DistillGPT2               |  1  |  0.9832  |
|          MobileBertForMaskedLM          | 32  |  0.9793  |
|            XLNetLMHeadModel             | 32  |  0.9764  |
|          AllenaiLongformerBase          |  1  |  0.9739  |
|             XGLMForCausalLM             |  8  |  0.9707  |
|    MegatronBertForQuestionAnswering     | 16  |   0.96   |
|     M2M100ForConditionalGeneration      |  8  |  0.9463  |
|            YituTechConvBert             |  1  |  0.9448  |
|                CamemBert                |  1  |  0.9319  |
|     MobileBertForQuestionAnswering      | 64  |  0.8914  |
|       DebertaForQuestionAnswering       |  8  |  0.805   |
|                 T5Small                 |  1  |  0.7752  |
|       MT5ForConditionalGeneration       |  8  |  0.7396  |
+-----------------------------------------+-----+----------+

timm_models suite with float32 precision

Performance speedup

+---------------------------------+-----+----------+
|              name               | bs  | inductor |
+---------------------------------+-----+----------+
|          pnasnet5large          | 16  |  1.5208  |
|        gluon_xception65         | 32  |  1.4163  |
|           resnest101e           | 64  |  1.3082  |
|          inception_v3           | 128 |  1.3007  |
|        adv_inception_v3         | 128 |  1.286   |
|       gluon_inception_v3        | 128 |  1.2831  |
|        res2net101_26w_4s        | 64  |  1.2057  |
|           volo_d1_224           | 64  |  1.2056  |
|           fbnetc_100            | 128 |  1.2011  |
|        res2net50_14w_8s         | 128 |  1.1988  |
|          ghostnet_100           | 128 |  1.1984  |
|           res2next50            | 128 |  1.198   |
|           mnasnet_100           | 128 |  1.1891  |
|          spnasnet_100           | 128 |  1.1889  |
|      mobilenetv3_large_100      | 128 |  1.1738  |
|        ese_vovnet19b_dw         | 128 |  1.1707  |
|            hrnet_w18            | 128 |  1.1678  |
|             dpn107              | 32  |  1.1573  |
|            lcnet_050            | 128 |  1.1561  |
|            repvgg_a2            | 128 |  1.1224  |
|            gernet_l             | 128 |  1.1206  |
|            fbnetv3_b            | 128 |  1.1005  |
|         visformer_small         | 128 |  1.0995  |
|           selecsls42b           | 128 |  1.0811  |
|         mobilenetv2_100         | 128 |  1.0765  |
|           regnety_002           | 128 |  1.0605  |
|          cspdarknet53           | 64  |  1.0587  |
|           tf_mixnet_l           | 128 |  1.012   |
|      xcit_large_24_p8_224       |  5  |  1.0046  |
|          gmixer_24_224          | 128 |  0.9926  |
|            mixnet_l             | 128 |  0.9656  |
|          cait_m36_384           |  4  |  0.9542  |
|             dla102              | 128 |  0.9532  |
|           dm_nfnet_f0           | 128 |  0.9113  |
|  swin_base_patch4_window7_224   | 64  |  0.8918  |
|      beit_base_patch16_224      | 64  |  0.8875  |
|          convnext_base          | 64  |  0.8669  |
|         poolformer_m36          | 64  |  0.8643  |
| deit_base_distilled_patch16_224 | 64  |  0.8496  |
|      vit_base_patch16_224       | 64  |  0.8432  |
|            nfnet_l0             | 128 |  0.8416  |
|           convit_base           | 64  |  0.8362  |
|          resmlp_12_224          | 128 |  0.8293  |
|          jx_nest_base           | 32  |  0.8262  |
|         coat_lite_mini          | 128 |  0.822   |
|           rexnet_100            | 128 |  0.8213  |
|          mixer_b16_224          | 128 |  0.8027  |
|        tnt_s_patch16_224        | 128 |  0.8001  |
|            tinynet_a            | 128 |   0.8    |
|       tf_efficientnet_b0        | 128 |  0.7965  |
|            pit_b_224            | 64  |  0.7856  |
|           mobilevit_s           | 64  |  0.7611  |
|        twins_pcpvt_base         | 64  |  0.7438  |
|         crossvit_9_240          | 128 |  0.7234  |
|          gmlp_s16_224           | 128 |  0.6081  |
|     swsl_resnext101_32x16d      | 32  |  0.0721  |
|        sebotnet33ts_256         |  0  |   0.0    |
|        convmixer_768_32         |  0  |   0.0    |
|       eca_botnext26ts_256       |  0  |   0.0    |
|          botnet26t_256          |  0  |   0.0    |
|        eca_halonext26ts         |  0  |   0.0    |
+---------------------------------+-----+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 2  |     pass      |
|             dpn107              | 2  |     pass      |
|      beit_base_patch16_224      | 2  |     pass      |
|          mixer_b16_224          | 2  |     pass      |
|        ese_vovnet19b_dw         | 2  |     pass      |
|         coat_lite_mini          | 2  |     pass      |
|           convit_base           | 2  |     pass      |
|          convnext_base          | 2  |     pass      |
|         crossvit_9_240          | 2  |     pass      |
|          cspdarknet53           | 2  |     pass      |
| deit_base_distilled_patch16_224 | 2  |     pass      |
|             dla102              | 2  |     pass      |
|           dm_nfnet_f0           | 2  |     pass      |
|            lcnet_050            | 2  |     pass      |
|           volo_d1_224           | 2  |     pass      |
|      xcit_large_24_p8_224       | 2  |     pass      |
|           fbnetc_100            | 2  |     pass      |
|        gluon_xception65         | 2  |     pass      |
|          jx_nest_base           | 2  |     pass      |
|          inception_v3           | 2  |     pass      |
|            hrnet_w18            | 2  |     pass      |
|          gmlp_s16_224           | 2  |     pass      |
|          gmixer_24_224          | 2  |     pass      |
|       gluon_inception_v3        | 2  |     pass      |
|            gernet_l             | 2  |     pass      |
|            fbnetv3_b            | 2  |     pass      |
|           mnasnet_100           | 2  |     pass      |
|            mixnet_l             | 2  |     pass      |
|      vit_base_patch16_224       | 2  |     pass      |
|           res2next50            | 2  |     pass      |
|         mobilenetv2_100         | 2  |     pass      |
|      mobilenetv3_large_100      | 2  |     pass      |
|           mobilevit_s           | 2  |     pass      |
|            nfnet_l0             | 2  |     pass      |
|            pit_b_224            | 2  |     pass      |
|          pnasnet5large          | 2  |     pass      |
|         poolformer_m36          | 2  |     pass      |
|           regnety_002           | 2  |     pass      |
|            repvgg_a2            | 2  |     pass      |
|         visformer_small         | 2  |     pass      |
|        res2net50_14w_8s         | 2  |     pass      |
|        res2net101_26w_4s        | 2  |     pass      |
|          resmlp_12_224          | 2  |     pass      |
|       tf_efficientnet_b0        | 2  |     pass      |
|        twins_pcpvt_base         | 2  |     pass      |
|        tnt_s_patch16_224        | 2  |     pass      |
|           resnest101e           | 2  |     pass      |
|           tf_mixnet_l           | 2  |     pass      |
|            tinynet_a            | 2  |     pass      |
|     swsl_resnext101_32x16d      | 2  |     pass      |
|  swin_base_patch4_window7_224   | 2  |     pass      |
|          spnasnet_100           | 2  |     pass      |
|           selecsls42b           | 2  |     pass      |
|           rexnet_100            | 2  |     pass      |
|        eca_halonext26ts         | 2  |  fail_to_run  |
|        convmixer_768_32         | 2  |  fail_to_run  |
|        sebotnet33ts_256         | 2  |  fail_to_run  |
|          botnet26t_256          | 2  |  fail_to_run  |
|       eca_botnext26ts_256       | 2  |  fail_to_run  |
|          ghostnet_100           | 2  | fail_accuracy |
|          cait_m36_384           | 2  | fail_accuracy |
+---------------------------------+----+---------------+

Compilation latency (sec)

+---------------------------------+-----+----------+
|              name               | bs  | inductor |
+---------------------------------+-----+----------+
|     swsl_resnext101_32x16d      | 32  | 116.2647 |
|          pnasnet5large          | 16  | 34.3568  |
|            hrnet_w18            | 128 | 33.1729  |
|  swin_base_patch4_window7_224   | 64  | 30.3657  |
|          cait_m36_384           |  4  |  26.113  |
|       tf_efficientnet_b0        | 128 | 26.0213  |
|           dm_nfnet_f0           | 128 | 25.8731  |
|           tf_mixnet_l           | 128 | 25.1199  |
|        twins_pcpvt_base         | 64  | 24.4181  |
|             dla102              | 128 | 24.3209  |
|           mobilevit_s           | 64  | 24.2889  |
|         poolformer_m36          | 64  | 23.7559  |
|            mixnet_l             | 128 | 22.9678  |
|        res2net50_14w_8s         | 128 | 22.6444  |
|      xcit_large_24_p8_224       |  5  | 22.5081  |
|        tnt_s_patch16_224        | 128 |  22.374  |
|           resnest101e           | 64  | 22.0767  |
|            nfnet_l0             | 128 | 21.8411  |
|            fbnetv3_b            | 128 | 21.3069  |
|        res2net101_26w_4s        | 64  |  21.205  |
|             dpn107              | 32  | 21.1366  |
|           rexnet_100            | 128 | 20.9301  |
|          gmlp_s16_224           | 128 | 20.5459  |
|          jx_nest_base           | 32  | 19.9281  |
|          convnext_base          | 64  |  18.721  |
|           volo_d1_224           | 64  | 18.2821  |
|         crossvit_9_240          | 128 | 18.1307  |
|           convit_base           | 64  | 17.8616  |
|            tinynet_a            | 128 | 17.5486  |
|           res2next50            | 128 | 17.3809  |
|            pit_b_224            | 64  | 17.1044  |
|         coat_lite_mini          | 128 | 17.0934  |
|          cspdarknet53           | 64  |  16.315  |
|       gluon_inception_v3        | 128 |  16.027  |
|          inception_v3           | 128 | 15.9994  |
|        adv_inception_v3         | 128 | 15.9628  |
|        gluon_xception65         | 32  | 15.8657  |
|          gmixer_24_224          | 128 | 15.5962  |
|          mixer_b16_224          | 128 | 15.3517  |
|          ghostnet_100           | 128 | 15.1605  |
|         visformer_small         | 128 | 14.9027  |
|      beit_base_patch16_224      | 64  | 14.7057  |
| deit_base_distilled_patch16_224 | 64  | 14.5395  |
|         mobilenetv2_100         | 128 | 14.4137  |
|      mobilenetv3_large_100      | 128 | 14.2885  |
|      vit_base_patch16_224       | 64  | 14.2558  |
|           fbnetc_100            | 128 | 14.1446  |
|          spnasnet_100           | 128 | 13.8994  |
|            gernet_l             | 128 | 13.4121  |
|           mnasnet_100           | 128 | 12.7666  |
|           regnety_002           | 128 | 12.5997  |
|            repvgg_a2            | 128 | 12.5894  |
|        ese_vovnet19b_dw         | 128 | 12.1089  |
|           selecsls42b           | 128 | 12.0156  |
|          resmlp_12_224          | 128 | 11.5499  |
|            lcnet_050            | 128 | 10.6792  |
|          botnet26t_256          |  0  |   nan    |
|        convmixer_768_32         |  0  |   nan    |
|       eca_botnext26ts_256       |  0  |   nan    |
|        eca_halonext26ts         |  0  |   nan    |
|        sebotnet33ts_256         |  0  |   nan    |
+---------------------------------+-----+----------+

Peak Memory Compression Ratio

+---------------------------------+-----+----------+
|              name               | bs  | inductor |
+---------------------------------+-----+----------+
|        adv_inception_v3         | 128 |  0.9973  |
|          mixer_b16_224          | 128 |  0.9973  |
|           resnest101e           | 64  |  0.997   |
|        ese_vovnet19b_dw         | 128 |  0.997   |
|          inception_v3           | 128 |  0.9969  |
|             dla102              | 128 |  0.9969  |
|           dm_nfnet_f0           | 128 |  0.9968  |
|           tf_mixnet_l           | 128 |  0.9967  |
|         coat_lite_mini          | 128 |  0.9966  |
|            mixnet_l             | 128 |  0.9964  |
| deit_base_distilled_patch16_224 | 64  |  0.9964  |
|           convit_base           | 64  |  0.9964  |
|          cspdarknet53           | 64  |  0.9963  |
|      vit_base_patch16_224       | 64  |  0.9963  |
|          gmixer_24_224          | 128 |  0.9962  |
|      beit_base_patch16_224      | 64  |  0.9962  |
|           selecsls42b           | 128 |  0.9962  |
|        gluon_xception65         | 32  |  0.9961  |
|            gernet_l             | 128 |  0.9961  |
|          resmlp_12_224          | 128 |  0.996   |
|            fbnetv3_b            | 128 |  0.996   |
|          convnext_base          | 64  |  0.9956  |
|           mobilevit_s           | 64  |  0.9954  |
|          gmlp_s16_224           | 128 |  0.9954  |
|            pit_b_224            | 64  |  0.9954  |
|       tf_efficientnet_b0        | 128 |  0.9952  |
|            tinynet_a            | 128 |  0.9949  |
|           res2next50            | 128 |  0.9947  |
|           fbnetc_100            | 128 |  0.9946  |
|           mnasnet_100           | 128 |  0.9944  |
|          spnasnet_100           | 128 |  0.9943  |
|            hrnet_w18            | 128 |  0.9943  |
|           rexnet_100            | 128 |  0.9942  |
|         visformer_small         | 128 |  0.9941  |
|        tnt_s_patch16_224        | 128 |  0.9941  |
|            repvgg_a2            | 128 |  0.9941  |
|         mobilenetv2_100         | 128 |  0.9941  |
|      mobilenetv3_large_100      | 128 |  0.994   |
|       gluon_inception_v3        | 128 |  0.9939  |
|             dpn107              | 32  |  0.9937  |
|          cait_m36_384           |  4  |  0.9935  |
|            nfnet_l0             | 128 |  0.9934  |
|        res2net50_14w_8s         | 128 |  0.9933  |
|         poolformer_m36          | 64  |  0.9931  |
|  swin_base_patch4_window7_224   | 64  |  0.9929  |
|          jx_nest_base           | 32  |  0.9925  |
|        res2net101_26w_4s        | 64  |  0.9924  |
|          ghostnet_100           | 128 |  0.9923  |
|        twins_pcpvt_base         | 64  |  0.9915  |
|         crossvit_9_240          | 128 |  0.991   |
|          pnasnet5large          | 16  |  0.9904  |
|           volo_d1_224           | 64  |  0.9903  |
|      xcit_large_24_p8_224       |  5  |  0.9901  |
|     swsl_resnext101_32x16d      | 32  |  0.9898  |
|            lcnet_050            | 128 |  0.9883  |
|           regnety_002           | 128 |  0.9879  |
|          botnet26t_256          |  0  |   nan    |
|        convmixer_768_32         |  0  |   nan    |
|       eca_botnext26ts_256       |  0  |   nan    |
|        eca_halonext26ts         |  0  |   nan    |
|        sebotnet33ts_256         |  0  |   nan    |
+---------------------------------+-----+----------+

@chuanqi129
Collaborator

chuanqi129 commented Nov 11, 2022

Performance Dashboard for float32 precision -- Single-core Single-thread (2022-11-09 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward pass. For accuracy, we check the numerical correctness of the forward-pass outputs by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint reduction ratio.
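As a rough illustration of what "normalizing against native PyTorch" means, the sketch below divides a baseline latency by a compiled latency. The helper names `time_once` and `speedup` are hypothetical, and the latencies are made-up values rather than measurements from this dashboard; the actual benchmark harness may time runs differently.

```python
import time

def time_once(fn, *args):
    """Return wall-clock seconds for a single call of fn(*args)."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start

def speedup(baseline_seconds, compiled_seconds):
    """Baseline latency divided by compiled latency.

    Values above 1.0 mean the compiled run is faster than the baseline.
    """
    if compiled_seconds <= 0:
        raise ValueError("compiled_seconds must be positive")
    return baseline_seconds / compiled_seconds

# Illustrative latencies only (assumption), not taken from the tables.
print(round(speedup(0.150, 0.100), 4))
```

Under this convention, a table entry such as 0.5005 would mean the compiled run took roughly twice as long as the baseline.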

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

SW information

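+-------------+----------------+--------------------+
|     SW      | Nightly commit | Master/Main commit |
+-------------+----------------+--------------------+
|   PyTorch   |    3b29687     |      a7420d2       |
| Torchbench  |       /        |      022dfe3       |
| torchaudio  |    4b10b6a     |      74f9a89       |
|  torchtext  |    71e4561     |      c047efe       |
| torchvision |    797e1ac     |      ffd5a56       |
+-------------+----------------+--------------------+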

HW information

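+------------------+-----------------------------------------------+
|       Item       |                     Value                     |
+------------------+-----------------------------------------------+
|   Manufacturer   |                  Amazon EC2                   |
|   Product Name   |                 c6i.16xlarge                  |
|    CPU Model     | Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz |
| Installed Memory |   128GB (1x128GB DDR4 3200 MT/s [Unknown])    |
|        OS        |              Ubuntu 20.04.5 LTS               |
|      Kernel      |                5.15.0-1022-aws                |
|    Microcode     |                   0xd000331                   |
|       GCC        |   gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0   |
|      GLIBC       |    ldd (Ubuntu GLIBC 2.31-0ubuntu9.9) 2.31    |
|     Binutils     |     GNU ld (GNU Binutils for Ubuntu) 2.34     |
|      Python      |                 Python 3.8.13                 |
|     OpenSSL      |           OpenSSL 1.1.1s 1 Nov 2022           |
+------------------+-----------------------------------------------+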

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
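A minimal sketch of that filtering step, followed by a geometric mean over the remaining speedups. The model names, statuses, and values are hypothetical, and treating any status that starts with "pass" (including `pass_due_to_skip`) as passing is an assumption, not something the dashboard states.

```python
import math

def passing_speedups(speedups, accuracy):
    """Keep only models whose accuracy status starts with 'pass' (assumed rule)."""
    return {
        name: value
        for name, value in speedups.items()
        if accuracy.get(name, "").startswith("pass")
    }

def geometric_mean(values):
    """Geometric mean of positive numbers, computed via the mean of logs."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical inputs for illustration only.
speedups = {"model_a": 2.0, "model_b": 0.5, "model_c": 0.0}
accuracy = {"model_a": "pass", "model_b": "pass_due_to_skip", "model_c": "fail"}

kept = passing_speedups(speedups, accuracy)
print(sorted(kept))
print(round(geometric_mean(kept.values()), 4))
```

Dropping failed models first also avoids taking the logarithm of a 0.0 speedup entry, which would otherwise make the geometric mean undefined.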

Passrate

+----------+-------------+-------------+-------------+
| Compiler | torchbench  | huggingface | timm_models |
+----------+-------------+-------------+-------------+
| inductor | 100%, 55/55 | 100%, 44/44 | 92%, 56/61  |
+----------+-------------+-------------+-------------+
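The pass-rate entries read as a rounded percentage followed by passed/total counts. A small sketch of that formatting follows; the helper name `passrate` is hypothetical, and rounding to the nearest whole percent is an assumption that happens to match the entries above.

```python
def passrate(passed, total):
    """Format a pass rate as '<percent>%, <passed>/<total>'."""
    if total <= 0:
        raise ValueError("total must be positive")
    percent = round(100 * passed / total)
    return f"{percent}%, {passed}/{total}"

# Counts taken from the pass-rate table above.
print(passrate(55, 55))
print(passrate(56, 61))
```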

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.05x    |    1.00x    |    1.08x    |
+----------+------------+-------------+-------------+

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   20.31    |    21.37    |    24.75    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.10x    |    1.29x    |    1.07x    |
+----------+------------+-------------+-------------+
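Since higher is better, the compression ratio is presumably the baseline peak memory divided by the compiled peak memory. The sketch below encodes that reading as an assumption; the function name and the byte counts are hypothetical.

```python
def compression_ratio(baseline_peak_bytes, compiled_peak_bytes):
    """Baseline peak memory divided by compiled peak memory (assumed definition).

    Values above 1.0 mean the compiled run used less peak memory.
    """
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled_peak_bytes must be positive")
    return baseline_peak_bytes / compiled_peak_bytes

# Illustrative byte counts only, not measurements from this dashboard.
print(compression_ratio(2 * 1024**3, 1 * 1024**3))
```

Under this reading, entries below 1.0 in the per-model tables would indicate that the compiled run used more peak memory than the baseline.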

torchbench suite with float32 precision

Performance speedup

+-----------------------------------+-----+----------+
|               name                | bs  | inductor |
+-----------------------------------+-----+----------+
|        shufflenet_v2_x1_0         |  1  |  1.4852  |
|           squeezenet1_1           |  1  |  1.2136  |
|           hf_Longformer           |  1  |  1.1919  |
|   pytorch_CycleGAN_and_pix2pix    |  1  |  1.1811  |
|            timm_nfnet             |  1  |  1.1635  |
|       functorch_dp_cifar10        |  1  |  1.1517  |
|             resnet18              |  1  |  1.1458  |
|           pytorch_unet            |  1  |  1.1379  |
|            timm_vovnet            |  1  |  1.1126  |
| attention_is_all_you_need_pytorch |  1  |  1.1067  |
|     detectron2_fcos_r_50_fpn      |  1  |  1.094   |
|            Super_SloMo            |  1  |  1.0921  |
|               vgg16               |  1  |  1.0901  |
|              alexnet              |  1  |  1.0825  |
|                drq                |  1  |  1.0759  |
|            timm_regnet            |  1  |  1.0654  |
|        Background_Matting         |  1  |  1.0627  |
|          LearningToPaint          |  1  |  1.0595  |
|         soft_actor_critic         | 256 |  1.0328  |
|               dcgan               |  1  |  1.032   |
|               dlrm                |  1  |  1.0286  |
|           timm_resnest            |  1  |  1.0247  |
|          pytorch_stargan          | 16  |  1.0146  |
|            densenet121            |  1  |  1.001   |
|              demucs               |  1  |  0.9994  |
|            tts_angular            |  1  |  0.998   |
|      resnet50_quantized_qat       |  1  |  0.9967  |
|            hf_BigBird             |  1  |  0.9922  |
|    mobilenet_v2_quantized_qat     |  1  |  0.9921  |
|      nvidia_deeprecommender       |  1  |  0.984   |
|             resnet152             |  1  |  0.9384  |
|        speech_transformer         |  1  |  0.9359  |
|           mobilenet_v2            |  1  |  0.9322  |
|            mnasnet1_0             |  1  |  0.9098  |
|          vision_maskrcnn          |  1  |  0.8935  |
|           BERT_pytorch            |  1  |  0.891   |
|              yolov3               |  1  |  0.8903  |
|              hf_GPT2              |  1  |  0.8772  |
|             resnet50              |  1  |  0.8629  |
|   timm_vision_transformer_large   |  1  |  0.8603  |
|             hf_Albert             |  1  |  0.8272  |
|            hf_Reformer            |  1  |  0.8229  |
|        mobilenet_v3_large         |  1  |  0.8028  |
|            hf_T5_large            |  1  |  0.7941  |
|           hf_DistilBert           |  1  |  0.7891  |
|           lennard_jones           |  1  |  0.7829  |
|           fastNLP_Bert            |  1  |  0.7693  |
|              hf_Bert              |  1  |  0.7587  |
|               hf_T5               |  1  |  0.7556  |
|           hf_GPT2_large           |  1  |  0.7493  |
|          resnext50_32x4d          |  1  |  0.7484  |
|              hf_Bart              |  1  |  0.742   |
|      timm_vision_transformer      |  1  |  0.6809  |
|            hf_T5_base             |  1  |   0.67   |
|         timm_efficientnet         |  1  |  0.5005  |
+-----------------------------------+-----+----------+

Accuracy

+-----------------------------------+-----+------------------+
|               name                | bs  |     inductor     |
+-----------------------------------+-----+------------------+
|            hf_T5_large            |  1  | pass_due_to_skip |
|           hf_GPT2_large           |  1  | pass_due_to_skip |
|   timm_vision_transformer_large   |  1  | pass_due_to_skip |
|              yolov3               |  1  |       pass       |
|                drq                |  1  |       pass       |
|           lennard_jones           |  1  |       pass       |
|          LearningToPaint          |  1  |       pass       |
|            Super_SloMo            |  1  |       pass       |
|              alexnet              |  1  |       pass       |
| attention_is_all_you_need_pytorch |  1  |       pass       |
|               dcgan               |  1  |       pass       |
|              demucs               |  1  |       pass       |
|            densenet121            |  1  |       pass       |
|     detectron2_fcos_r_50_fpn      |  1  |       pass       |
|               dlrm                |  1  |       pass       |
|           fastNLP_Bert            |  1  |       pass       |
|            hf_T5_base             |  1  |       pass       |
|        Background_Matting         |  1  |       pass       |
|             hf_Albert             |  1  |       pass       |
|              hf_Bart              |  1  |       pass       |
|              hf_Bert              |  1  |       pass       |
|            hf_BigBird             |  1  |       pass       |
|           hf_DistilBert           |  1  |       pass       |
|              hf_GPT2              |  1  |       pass       |
|           hf_Longformer           |  1  |       pass       |
|            hf_Reformer            |  1  |       pass       |
|               hf_T5               |  1  |       pass       |
|       functorch_dp_cifar10        |  1  |       pass       |
|    mobilenet_v2_quantized_qat     |  1  |       pass       |
|            mnasnet1_0             |  1  |       pass       |
|               vgg16               |  1  |       pass       |
|           BERT_pytorch            |  1  |       pass       |
|          resnext50_32x4d          |  1  |       pass       |
|        mobilenet_v3_large         |  1  |       pass       |
|      nvidia_deeprecommender       |  1  |       pass       |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |
|          pytorch_stargan          | 16  |       pass       |
|           pytorch_unet            |  1  |       pass       |
|             resnet152             |  1  |       pass       |
|             resnet18              |  1  |       pass       |
|             resnet50              |  1  |       pass       |
|      resnet50_quantized_qat       |  1  |       pass       |
|        shufflenet_v2_x1_0         |  1  |       pass       |
|           mobilenet_v2            |  1  |       pass       |
|         soft_actor_critic         | 256 |       pass       |
|        speech_transformer         |  1  |       pass       |
|           squeezenet1_1           |  1  |       pass       |
|         timm_efficientnet         |  1  |       pass       |
|            timm_nfnet             |  1  |       pass       |
|            timm_regnet            |  1  |       pass       |
|           timm_resnest            |  1  |       pass       |
|      timm_vision_transformer      |  1  |       pass       |
|            timm_vovnet            |  1  |       pass       |
|            tts_angular            |  1  |       pass       |
|          vision_maskrcnn          |  0  |      0.0000      |
+-----------------------------------+-----+------------------+

Compilation latency (sec)

+-----------------------------------+-----+----------+
|               name                | bs  | inductor |
+-----------------------------------+-----+----------+
|            hf_T5_base             |  1  | 81.7092  |
|          vision_maskrcnn          |  1  | 72.4651  |
|     detectron2_fcos_r_50_fpn      |  1  |  69.401  |
|           hf_GPT2_large           |  1  | 59.9626  |
|            hf_T5_large            |  1  | 54.6313  |
|            densenet121            |  1  | 33.7722  |
|              yolov3               |  1  |  28.69   |
|   timm_vision_transformer_large   |  1  | 28.2311  |
|            timm_nfnet             |  1  | 26.8436  |
|           hf_Longformer           |  1  |  26.637  |
|            Super_SloMo            |  1  | 26.4081  |
|            hf_BigBird             |  1  | 24.8659  |
|         timm_efficientnet         |  1  | 24.7895  |
|        Background_Matting         |  1  | 24.5048  |
|           BERT_pytorch            |  1  | 24.3098  |
|             resnet152             |  1  | 21.2911  |
|            timm_regnet            |  1  |  20.07   |
|        speech_transformer         |  1  | 19.8299  |
|               hf_T5               |  1  | 19.5154  |
|        mobilenet_v3_large         |  1  | 19.4807  |
|            hf_Reformer            |  1  | 18.6868  |
|              hf_Bart              |  1  | 18.0463  |
|           pytorch_unet            |  1  | 17.9894  |
|           fastNLP_Bert            |  1  | 17.0308  |
|          resnext50_32x4d          |  1  | 16.6334  |
|           mobilenet_v2            |  1  | 16.5489  |
|             resnet50              |  1  | 16.2093  |
|              hf_GPT2              |  1  | 16.1715  |
|              hf_Bert              |  1  |  15.522  |
|            mnasnet1_0             |  1  | 14.8517  |
|           timm_resnest            |  1  | 14.7064  |
|            timm_vovnet            |  1  | 14.6828  |
|             hf_Albert             |  1  |  14.343  |
|        shufflenet_v2_x1_0         |  1  | 14.1955  |
| attention_is_all_you_need_pytorch |  1  | 14.1544  |
|       functorch_dp_cifar10        |  1  | 13.9881  |
|      timm_vision_transformer      |  1  | 13.7393  |
|           hf_DistilBert           |  1  | 13.1191  |
|          pytorch_stargan          | 16  | 12.5482  |
|   pytorch_CycleGAN_and_pix2pix    |  1  | 11.5582  |
|           squeezenet1_1           |  1  | 11.2071  |
|          LearningToPaint          |  1  | 10.6182  |
|                drq                |  1  |  9.9081  |
|               vgg16               |  1  |  9.7425  |
|             resnet18              |  1  |  9.7095  |
|               dlrm                |  1  |  9.5449  |
|      nvidia_deeprecommender       |  1  |  9.441   |
|              alexnet              |  1  |   9.03   |
|            tts_angular            |  1  |  8.247   |
|         soft_actor_critic         | 256 |  8.2421  |
|           lennard_jones           |  1  |  7.8457  |
|              demucs               |  1  |  1.1898  |
|               dcgan               |  1  |  0.1695  |
|    mobilenet_v2_quantized_qat     |  1  |  0.0748  |
|      resnet50_quantized_qat       |  1  |  0.0579  |
+-----------------------------------+-----+----------+

Peak Memory Compression Ratio

+-----------------------------------+-----+----------+
|               name                | bs  | inductor |
+-----------------------------------+-----+----------+
|           pytorch_unet            |  1  |  2.1274  |
|             hf_Albert             |  1  |  2.0308  |
|            hf_BigBird             |  1  |  1.7822  |
|            Super_SloMo            |  1  |  1.7473  |
|            hf_T5_large            |  1  |  1.4919  |
|          vision_maskrcnn          |  1  |  1.4585  |
|        speech_transformer         |  1  |  1.4474  |
|   pytorch_CycleGAN_and_pix2pix    |  1  |  1.4467  |
|           fastNLP_Bert            |  1  |  1.359   |
|            hf_T5_base             |  1  |  1.3165  |
|               hf_T5               |  1  |  1.2507  |
|            timm_nfnet             |  1  |  1.1479  |
|            timm_regnet            |  1  |  1.1406  |
|   timm_vision_transformer_large   |  1  |  1.1151  |
|         soft_actor_critic         | 256 |  1.1148  |
|              hf_GPT2              |  1  |  1.1111  |
|         timm_efficientnet         |  1  |  1.0556  |
|            timm_vovnet            |  1  |  1.0511  |
|           mobilenet_v2            |  1  |  1.0508  |
|      timm_vision_transformer      |  1  |  1.0422  |
|              yolov3               |  1  |  1.0148  |
|           hf_GPT2_large           |  1  |  1.0104  |
|               dlrm                |  1  |  1.0008  |
|              demucs               |  1  |  0.9986  |
|      nvidia_deeprecommender       |  1  |  0.9973  |
|      resnet50_quantized_qat       |  1  |  0.9953  |
|        Background_Matting         |  1  |  0.9952  |
|            tts_angular            |  1  |  0.9931  |
|    mobilenet_v2_quantized_qat     |  1  |  0.9927  |
|          pytorch_stargan          | 16  |  0.9886  |
|           lennard_jones           |  1  |  0.9833  |
|        mobilenet_v3_large         |  1  |  0.9811  |
|            mnasnet1_0             |  1  |  0.9796  |
|     detectron2_fcos_r_50_fpn      |  1  |  0.9783  |
|                drq                |  1  |  0.9753  |
|               vgg16               |  1  |  0.9738  |
| attention_is_all_you_need_pytorch |  1  |  0.9696  |
|           squeezenet1_1           |  1  |  0.9683  |
|              alexnet              |  1  |  0.9671  |
|           timm_resnest            |  1  |  0.9604  |
|        shufflenet_v2_x1_0         |  1  |  0.9602  |
|           hf_Longformer           |  1  |  0.9518  |
|       functorch_dp_cifar10        |  1  |  0.9511  |
|               dcgan               |  1  |  0.9463  |
|          LearningToPaint          |  1  |  0.9427  |
|             resnet50              |  1  |  0.9418  |
|              hf_Bart              |  1  |  0.9408  |
|           BERT_pytorch            |  1  |  0.9357  |
|              hf_Bert              |  1  |  0.9295  |
|             resnet18              |  1  |  0.9199  |
|             resnet152             |  1  |  0.9161  |
|           hf_DistilBert           |  1  |  0.8963  |
|          resnext50_32x4d          |  1  |  0.8052  |
|            hf_Reformer            |  1  |  0.7742  |
|            densenet121            |  1  |  0.7664  |
+-----------------------------------+-----+----------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|            XLNetLMHeadModel             | 1  |  1.1042  |
|     MobileBertForQuestionAnswering      | 1  |  1.0831  |
|       AlbertForQuestionAnswering        | 1  |  0.9533  |
|            AlbertForMaskedLM            | 1  |  0.9478  |
|          MobileBertForMaskedLM          | 1  |  0.9301  |
|     M2M100ForConditionalGeneration      | 1  |  0.8985  |
|                 BigBird                 | 1  |  0.8978  |
|            YituTechConvBert             | 1  |  0.8934  |
|             OPTForCausalLM              | 1  |  0.8727  |
|       DebertaForQuestionAnswering       | 1  |  0.8613  |
|               GoogleFnet                | 1  |  0.8548  |
|    MegatronBertForQuestionAnswering     | 1  |  0.8545  |
|         MegatronBertForCausalLM         | 1  |  0.8449  |
|      MBartForConditionalGeneration      | 1  |  0.8443  |
|     PegasusForConditionalGeneration     | 1  |  0.843   |
|     DistilBertForQuestionAnswering      | 1  |  0.8402  |
|        BertForQuestionAnswering         | 1  |  0.8328  |
|            TrOCRForCausalLM             | 1  |  0.8311  |
|            MBartForCausalLM             | 1  |  0.8303  |
|           PegasusForCausalLM            | 1  |  0.8296  |
|       RobertaForQuestionAnswering       | 1  |  0.8235  |
|          AllenaiLongformerBase          | 1  |  0.8232  |
|           DebertaForMaskedLM            | 1  |  0.8209  |
|             XGLMForCausalLM             | 1  |  0.8092  |
|         Speech2Text2ForCausalLM         | 1  |  0.8035  |
|     PLBartForConditionalGeneration      | 1  |  0.8028  |
|             BertForMaskedLM             | 1  |  0.7952  |
|           RobertaForCausalLM            | 1  |  0.7938  |
|          DistilBertForMaskedLM          | 1  |  0.7847  |
|            PLBartForCausalLM            | 1  |  0.7766  |
|               DistillGPT2               | 1  |  0.7334  |
|           LayoutLMForMaskedLM           | 1  |  0.7261  |
|    LayoutLMForSequenceClassification    | 1  |  0.7245  |
| BlenderbotSmallForConditionalGeneration | 1  |  0.7184  |
|                CamemBert                | 1  |  0.7103  |
|       BlenderbotSmallForCausalLM        | 1  |  0.6951  |
|       MT5ForConditionalGeneration       | 1  |  0.6884  |
|             BartForCausalLM             | 1  |  0.6786  |
|      GPT2ForSequenceClassification      | 1  |  0.6739  |
|      BartForConditionalGeneration       | 1  |  0.6451  |
|       T5ForConditionalGeneration        | 1  |  0.632   |
|                 T5Small                 | 1  |  0.6313  |
|       ElectraForQuestionAnswering       | 1  |  0.5732  |
|           ElectraForCausalLM            | 1  |  0.5332  |
+-----------------------------------------+----+----------+

Accuracy

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|            AlbertForMaskedLM            | 1  |   pass   |
|       AlbertForQuestionAnswering        | 1  |   pass   |
|                CamemBert                | 1  |   pass   |
|          AllenaiLongformerBase          | 1  |   pass   |
|             BartForCausalLM             | 1  |   pass   |
|      BartForConditionalGeneration       | 1  |   pass   |
|             BertForMaskedLM             | 1  |   pass   |
|        BertForQuestionAnswering         | 1  |   pass   |
|                 BigBird                 | 1  |   pass   |
|       BlenderbotSmallForCausalLM        | 1  |   pass   |
| BlenderbotSmallForConditionalGeneration | 1  |   pass   |
|           DebertaForMaskedLM            | 1  |   pass   |
|           LayoutLMForMaskedLM           | 1  |   pass   |
|       DebertaForQuestionAnswering       | 1  |   pass   |
|          DistilBertForMaskedLM          | 1  |   pass   |
|     DistilBertForQuestionAnswering      | 1  |   pass   |
|               DistillGPT2               | 1  |   pass   |
|           ElectraForCausalLM            | 1  |   pass   |
|       ElectraForQuestionAnswering       | 1  |   pass   |
|      GPT2ForSequenceClassification      | 1  |   pass   |
|               GoogleFnet                | 1  |   pass   |
|    LayoutLMForSequenceClassification    | 1  |   pass   |
|     M2M100ForConditionalGeneration      | 1  |   pass   |
|            MBartForCausalLM             | 1  |   pass   |
|     PLBartForConditionalGeneration      | 1  |   pass   |
|      MBartForConditionalGeneration      | 1  |   pass   |
|       MT5ForConditionalGeneration       | 1  |   pass   |
|         MegatronBertForCausalLM         | 1  |   pass   |
|    MegatronBertForQuestionAnswering     | 1  |   pass   |
|          MobileBertForMaskedLM          | 1  |   pass   |
|     MobileBertForQuestionAnswering      | 1  |   pass   |
|             OPTForCausalLM              | 1  |   pass   |
|            PLBartForCausalLM            | 1  |   pass   |
|           PegasusForCausalLM            | 1  |   pass   |
|            XLNetLMHeadModel             | 1  |   pass   |
|     PegasusForConditionalGeneration     | 1  |   pass   |
|           RobertaForCausalLM            | 1  |   pass   |
|       RobertaForQuestionAnswering       | 1  |   pass   |
|         Speech2Text2ForCausalLM         | 1  |   pass   |
|       T5ForConditionalGeneration        | 1  |   pass   |
|                 T5Small                 | 1  |   pass   |
|            TrOCRForCausalLM             | 1  |   pass   |
|             XGLMForCausalLM             | 1  |   pass   |
|            YituTechConvBert             | 1  |   pass   |
+-----------------------------------------+----+----------+

Compilation latency (sec)

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|      BartForConditionalGeneration       | 1  | 48.1056  |
|       AlbertForQuestionAnswering        | 1  | 34.3286  |
|            AlbertForMaskedLM            | 1  | 33.2098  |
|          AllenaiLongformerBase          | 1  | 32.9485  |
|       MT5ForConditionalGeneration       | 1  | 32.3667  |
|     M2M100ForConditionalGeneration      | 1  | 26.2587  |
|          MobileBertForMaskedLM          | 1  | 26.1128  |
|                 BigBird                 | 1  | 25.7282  |
|             BartForCausalLM             | 1  | 25.6498  |
|     MobileBertForQuestionAnswering      | 1  | 25.6126  |
|     PegasusForConditionalGeneration     | 1  | 25.5287  |
|       T5ForConditionalGeneration        | 1  | 24.7543  |
|           DebertaForMaskedLM            | 1  | 24.7295  |
|            XLNetLMHeadModel             | 1  |  24.657  |
|      MBartForConditionalGeneration      | 1  | 24.5472  |
|                 T5Small                 | 1  | 24.3507  |
|             XGLMForCausalLM             | 1  | 23.8144  |
|            TrOCRForCausalLM             | 1  |  23.758  |
|       DebertaForQuestionAnswering       | 1  | 23.4078  |
|          DistilBertForMaskedLM          | 1  | 22.5849  |
|      GPT2ForSequenceClassification      | 1  | 22.0168  |
|         MegatronBertForCausalLM         | 1  | 21.8418  |
|    MegatronBertForQuestionAnswering     | 1  | 21.4816  |
|            YituTechConvBert             | 1  | 19.0803  |
|           PegasusForCausalLM            | 1  | 17.7898  |
| BlenderbotSmallForConditionalGeneration | 1  | 17.4371  |
|             BertForMaskedLM             | 1  | 17.1292  |
|                CamemBert                | 1  | 16.7218  |
|     PLBartForConditionalGeneration      | 1  | 16.6943  |
|            MBartForCausalLM             | 1  | 16.6502  |
|           LayoutLMForMaskedLM           | 1  | 16.5857  |
|        BertForQuestionAnswering         | 1  | 16.3166  |
|    LayoutLMForSequenceClassification    | 1  | 16.0688  |
|             OPTForCausalLM              | 1  | 15.9214  |
|               DistillGPT2               | 1  | 15.0388  |
|           RobertaForCausalLM            | 1  | 14.5262  |
|           ElectraForCausalLM            | 1  | 14.4963  |
|       RobertaForQuestionAnswering       | 1  | 14.2469  |
|       ElectraForQuestionAnswering       | 1  | 13.9396  |
|               GoogleFnet                | 1  | 13.6451  |
|       BlenderbotSmallForCausalLM        | 1  | 12.9301  |
|         Speech2Text2ForCausalLM         | 1  | 12.7357  |
|            PLBartForCausalLM            | 1  | 12.5302  |
|     DistilBertForQuestionAnswering      | 1  | 11.8878  |
+-----------------------------------------+----+----------+

Peak Memory Compression Ratio

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|            AlbertForMaskedLM            | 1  |  2.6371  |
|       AlbertForQuestionAnswering        | 1  |  2.5568  |
|      BartForConditionalGeneration       | 1  |  2.1208  |
|                 T5Small                 | 1  |  1.9472  |
|       T5ForConditionalGeneration        | 1  |  1.8685  |
|                 BigBird                 | 1  |  1.8542  |
|      GPT2ForSequenceClassification      | 1  |  1.8363  |
|             BartForCausalLM             | 1  |  1.5123  |
|          AllenaiLongformerBase          | 1  |  1.4905  |
|            XLNetLMHeadModel             | 1  |  1.3591  |
|       DebertaForQuestionAnswering       | 1  |  1.3508  |
|                CamemBert                | 1  |  1.3349  |
|           LayoutLMForMaskedLM           | 1  |  1.3255  |
|            YituTechConvBert             | 1  |  1.3106  |
|           DebertaForMaskedLM            | 1  |  1.3088  |
|               GoogleFnet                | 1  |  1.2897  |
|    LayoutLMForSequenceClassification    | 1  |  1.2841  |
|           ElectraForCausalLM            | 1  |  1.264   |
|               DistillGPT2               | 1  |   1.19   |
|       ElectraForQuestionAnswering       | 1  |  1.1781  |
|     M2M100ForConditionalGeneration      | 1  |  1.0959  |
| BlenderbotSmallForConditionalGeneration | 1  |  1.0836  |
|     PegasusForConditionalGeneration     | 1  |  1.0813  |
|      MBartForConditionalGeneration      | 1  |  1.0707  |
|       MT5ForConditionalGeneration       | 1  |  1.0639  |
|             BertForMaskedLM             | 1  |  1.0574  |
|         MegatronBertForCausalLM         | 1  |  1.0557  |
|    MegatronBertForQuestionAnswering     | 1  |  1.0497  |
|           RobertaForCausalLM            | 1  |  1.0444  |
|       BlenderbotSmallForCausalLM        | 1  |  1.0425  |
|            PLBartForCausalLM            | 1  |  1.0422  |
|            TrOCRForCausalLM             | 1  |  1.042   |
|             XGLMForCausalLM             | 1  |  1.0411  |
|             OPTForCausalLM              | 1  |  1.0408  |
|        BertForQuestionAnswering         | 1  |  1.038   |
|     PLBartForConditionalGeneration      | 1  |  1.0358  |
|          DistilBertForMaskedLM          | 1  |  1.0342  |
|       RobertaForQuestionAnswering       | 1  |  1.0332  |
|            MBartForCausalLM             | 1  |  1.0272  |
|     DistilBertForQuestionAnswering      | 1  |  1.0157  |
|           PegasusForCausalLM            | 1  |  1.0102  |
|          MobileBertForMaskedLM          | 1  |  0.9923  |
|         Speech2Text2ForCausalLM         | 1  |  0.9793  |
|     MobileBertForQuestionAnswering      | 1  |  0.9747  |
+-----------------------------------------+----+----------+

timm_models suite with float32 precision

Performance speedup

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          pnasnet5large          | 1  |  1.5448  |
|           regnety_002           | 1  |  1.3632  |
|            lcnet_050            | 1  |  1.3565  |
|        ese_vovnet19b_dw         | 1  |  1.3444  |
|          spnasnet_100           | 1  |  1.3401  |
|          inception_v3           | 1  |  1.3239  |
|       gluon_inception_v3        | 1  |  1.3096  |
|        adv_inception_v3         | 1  |  1.3006  |
|           mnasnet_100           | 1  |  1.2928  |
|           fbnetc_100            | 1  |  1.2682  |
|          gmixer_24_224          | 1  |  1.2677  |
|         mobilenetv2_100         | 1  |  1.2517  |
|            fbnetv3_b            | 1  |  1.1728  |
|      mobilenetv3_large_100      | 1  |  1.1453  |
|             dpn107              | 1  |  1.0835  |
|            gernet_l             | 1  |  1.0821  |
|          cspdarknet53           | 1  |  1.0688  |
|        gluon_xception65         | 1  |  1.0532  |
|            repvgg_a2            | 1  |  1.0493  |
|            hrnet_w18            | 1  |  1.0471  |
|          ghostnet_100           | 1  |  0.9966  |
|            nfnet_l0             | 1  |  0.9933  |
|           dm_nfnet_f0           | 1  |  0.9885  |
|           resnest101e           | 1  |  0.9787  |
|           selecsls42b           | 1  |  0.9235  |
|         visformer_small         | 1  |  0.9056  |
|        res2net101_26w_4s        | 1  |  0.8988  |
|      xcit_large_24_p8_224       | 1  |  0.8934  |
|          convnext_base          | 1  |  0.8669  |
|        res2net50_14w_8s         | 1  |  0.8607  |
|      beit_base_patch16_224      | 1  |  0.8547  |
|  swin_base_patch4_window7_224   | 1  |  0.8027  |
|           res2next50            | 1  |   0.79   |
| deit_base_distilled_patch16_224 | 1  |  0.7887  |
|           volo_d1_224           | 1  |  0.7649  |
|         poolformer_m36          | 1  |  0.7614  |
|           convit_base           | 1  |  0.7574  |
|      vit_base_patch16_224       | 1  |  0.755   |
|          cait_m36_384           | 1  |  0.7391  |
|           tf_mixnet_l           | 1  |  0.7347  |
|          mixer_b16_224          | 1  |  0.7273  |
|             dla102              | 1  |  0.7187  |
|            mixnet_l             | 1  |  0.7127  |
|          resmlp_12_224          | 1  |  0.6882  |
|            pit_b_224            | 1  |  0.6741  |
|        twins_pcpvt_base         | 1  |  0.6635  |
|        tnt_s_patch16_224        | 1  |  0.6476  |
|          jx_nest_base           | 1  |  0.6457  |
|         coat_lite_mini          | 1  |  0.6293  |
|           rexnet_100            | 1  |  0.6235  |
|         crossvit_9_240          | 1  |  0.6016  |
|           mobilevit_s           | 1  |  0.5561  |
|            tinynet_a            | 1  |   0.5    |
|       tf_efficientnet_b0        | 1  |  0.4917  |
|          gmlp_s16_224           | 1  |  0.4346  |
|     swsl_resnext101_32x16d      | 1  |  0.0673  |
|       eca_botnext26ts_256       | 0  |   0.0    |
|        sebotnet33ts_256         | 0  |   0.0    |
|          botnet26t_256          | 0  |   0.0    |
|        eca_halonext26ts         | 0  |   0.0    |
|        convmixer_768_32         | 0  |   0.0    |
+---------------------------------+----+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 1  |     pass      |
|             dpn107              | 1  |     pass      |
|      beit_base_patch16_224      | 1  |     pass      |
|          mixer_b16_224          | 1  |     pass      |
|        ese_vovnet19b_dw         | 1  |     pass      |
|         coat_lite_mini          | 1  |     pass      |
|           convit_base           | 1  |     pass      |
|          convnext_base          | 1  |     pass      |
|         crossvit_9_240          | 1  |     pass      |
|          cspdarknet53           | 1  |     pass      |
| deit_base_distilled_patch16_224 | 1  |     pass      |
|             dla102              | 1  |     pass      |
|           dm_nfnet_f0           | 1  |     pass      |
|            lcnet_050            | 1  |     pass      |
|           volo_d1_224           | 1  |     pass      |
|      xcit_large_24_p8_224       | 1  |     pass      |
|           fbnetc_100            | 1  |     pass      |
|        gluon_xception65         | 1  |     pass      |
|          jx_nest_base           | 1  |     pass      |
|          inception_v3           | 1  |     pass      |
|            hrnet_w18            | 1  |     pass      |
|          gmlp_s16_224           | 1  |     pass      |
|          gmixer_24_224          | 1  |     pass      |
|       gluon_inception_v3        | 1  |     pass      |
|            gernet_l             | 1  |     pass      |
|            fbnetv3_b            | 1  |     pass      |
|           mnasnet_100           | 1  |     pass      |
|            mixnet_l             | 1  |     pass      |
|      vit_base_patch16_224       | 1  |     pass      |
|           res2next50            | 1  |     pass      |
|         mobilenetv2_100         | 1  |     pass      |
|      mobilenetv3_large_100      | 1  |     pass      |
|           mobilevit_s           | 1  |     pass      |
|            nfnet_l0             | 1  |     pass      |
|            pit_b_224            | 1  |     pass      |
|          pnasnet5large          | 1  |     pass      |
|         poolformer_m36          | 1  |     pass      |
|           regnety_002           | 1  |     pass      |
|            repvgg_a2            | 1  |     pass      |
|         visformer_small         | 1  |     pass      |
|        res2net50_14w_8s         | 1  |     pass      |
|        res2net101_26w_4s        | 1  |     pass      |
|          resmlp_12_224          | 1  |     pass      |
|       tf_efficientnet_b0        | 1  |     pass      |
|        twins_pcpvt_base         | 1  |     pass      |
|        tnt_s_patch16_224        | 1  |     pass      |
|           resnest101e           | 1  |     pass      |
|           tf_mixnet_l           | 1  |     pass      |
|            tinynet_a            | 1  |     pass      |
|     swsl_resnext101_32x16d      | 1  |     pass      |
|  swin_base_patch4_window7_224   | 1  |     pass      |
|          spnasnet_100           | 1  |     pass      |
|           selecsls42b           | 1  |     pass      |
|           rexnet_100            | 1  |     pass      |
|        eca_halonext26ts         | 1  |  fail_to_run  |
|        convmixer_768_32         | 1  |  fail_to_run  |
|        sebotnet33ts_256         | 1  |  fail_to_run  |
|          botnet26t_256          | 1  |  fail_to_run  |
|       eca_botnext26ts_256       | 1  |  fail_to_run  |
|          ghostnet_100           | 1  | fail_accuracy |
|          cait_m36_384           | 1  | fail_accuracy |
+---------------------------------+----+---------------+

Compilation latency (sec)

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|     swsl_resnext101_32x16d      | 1  | 87.5279  |
|          pnasnet5large          | 1  | 46.3015  |
|          cait_m36_384           | 1  | 36.1308  |
|           tf_mixnet_l           | 1  |  35.475  |
|            hrnet_w18            | 1  | 33.5026  |
|  swin_base_patch4_window7_224   | 1  | 33.1088  |
|        twins_pcpvt_base         | 1  | 33.0708  |
|           rexnet_100            | 1  | 32.9664  |
|           mobilevit_s           | 1  | 32.2259  |
|            mixnet_l             | 1  | 31.5505  |
|      xcit_large_24_p8_224       | 1  |  30.044  |
|           resnest101e           | 1  | 29.2334  |
|            fbnetv3_b            | 1  | 29.1483  |
|        adv_inception_v3         | 1  | 29.0574  |
|             dpn107              | 1  | 28.7993  |
|        res2net50_14w_8s         | 1  | 28.4829  |
|           dm_nfnet_f0           | 1  | 27.8804  |
|          ghostnet_100           | 1  |  27.737  |
|         poolformer_m36          | 1  | 26.9426  |
|       tf_efficientnet_b0        | 1  | 26.5395  |
|            tinynet_a            | 1  | 26.4855  |
|        res2net101_26w_4s        | 1  | 25.9801  |
|          jx_nest_base           | 1  | 24.9393  |
|         coat_lite_mini          | 1  | 24.2566  |
|            nfnet_l0             | 1  | 24.0727  |
|      mobilenetv3_large_100      | 1  |  23.634  |
|        tnt_s_patch16_224        | 1  | 23.2213  |
|           fbnetc_100            | 1  | 23.1681  |
|             dla102              | 1  | 23.0499  |
|           volo_d1_224           | 1  | 22.8963  |
|         crossvit_9_240          | 1  |  22.406  |
|          convnext_base          | 1  | 21.0074  |
|          cspdarknet53           | 1  | 20.9592  |
|           res2next50            | 1  | 20.4985  |
|       gluon_inception_v3        | 1  | 20.1694  |
|            pit_b_224            | 1  | 20.0287  |
|          spnasnet_100           | 1  | 19.9673  |
|          inception_v3           | 1  | 19.8218  |
|          gmlp_s16_224           | 1  | 19.3607  |
|           regnety_002           | 1  | 18.9754  |
|         mobilenetv2_100         | 1  | 18.4704  |
|        gluon_xception65         | 1  | 18.4543  |
|           mnasnet_100           | 1  | 18.4216  |
|           convit_base           | 1  | 18.2287  |
|         visformer_small         | 1  | 17.5961  |
|            gernet_l             | 1  | 16.5629  |
|          gmixer_24_224          | 1  | 16.3941  |
|           selecsls42b           | 1  | 16.1093  |
|        ese_vovnet19b_dw         | 1  | 15.1091  |
|          mixer_b16_224          | 1  | 15.0007  |
|            repvgg_a2            | 1  | 14.8364  |
| deit_base_distilled_patch16_224 | 1  | 14.7708  |
|      vit_base_patch16_224       | 1  | 14.6892  |
|      beit_base_patch16_224      | 1  | 14.5156  |
|            lcnet_050            | 1  | 14.3073  |
|          resmlp_12_224          | 1  | 11.9188  |
|          botnet26t_256          | 0  |   nan    |
|        convmixer_768_32         | 0  |   nan    |
|       eca_botnext26ts_256       | 0  |   nan    |
|        eca_halonext26ts         | 0  |   nan    |
|        sebotnet33ts_256         | 0  |   nan    |
+---------------------------------+----+----------+

Peak Memory Compression Ratio

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|      xcit_large_24_p8_224       | 1  |  2.2401  |
|          cait_m36_384           | 1  |  2.165   |
|          pnasnet5large          | 1  |  1.3583  |
|           dm_nfnet_f0           | 1  |  1.2227  |
|            nfnet_l0             | 1  |  1.2121  |
|           mobilevit_s           | 1  |  1.2001  |
|          jx_nest_base           | 1  |  1.1911  |
|             dpn107              | 1  |  1.1863  |
|          convnext_base          | 1  |  1.1528  |
|           convit_base           | 1  |  1.1385  |
|          cspdarknet53           | 1  |  1.1357  |
|         poolformer_m36          | 1  |  1.1085  |
|          gmixer_24_224          | 1  |  1.1051  |
|  swin_base_patch4_window7_224   | 1  |  1.1047  |
|      beit_base_patch16_224      | 1  |  1.0966  |
|           volo_d1_224           | 1  |  1.0917  |
| deit_base_distilled_patch16_224 | 1  |  1.0801  |
|      vit_base_patch16_224       | 1  |   1.08   |
|        tnt_s_patch16_224        | 1  |  1.0799  |
|          mixer_b16_224          | 1  |  1.0769  |
|         mobilenetv2_100         | 1  |  1.0659  |
|        twins_pcpvt_base         | 1  |  1.0643  |
|           tf_mixnet_l           | 1  |  1.0592  |
|        ese_vovnet19b_dw         | 1  |  1.0515  |
|       tf_efficientnet_b0        | 1  |  1.0506  |
|           rexnet_100            | 1  |  1.0491  |
|         coat_lite_mini          | 1  |  1.0484  |
|          resmlp_12_224          | 1  |  1.0453  |
|            mixnet_l             | 1  |  1.0396  |
|            fbnetv3_b            | 1  |  1.0266  |
|            gernet_l             | 1  |  1.0235  |
|            tinynet_a            | 1  |  1.0175  |
|          spnasnet_100           | 1  |  1.0163  |
|           fbnetc_100            | 1  |  1.011   |
|           mnasnet_100           | 1  |  1.0069  |
|            repvgg_a2            | 1  |  1.0065  |
|           resnest101e           | 1  |  0.9914  |
|            pit_b_224            | 1  |  0.9784  |
|      mobilenetv3_large_100      | 1  |  0.9722  |
|            lcnet_050            | 1  |  0.9651  |
|            hrnet_w18            | 1  |  0.9589  |
|           res2next50            | 1  |  0.9575  |
|           regnety_002           | 1  |  0.9549  |
|         visformer_small         | 1  |  0.9525  |
|       gluon_inception_v3        | 1  |  0.9516  |
|        adv_inception_v3         | 1  |  0.9507  |
|          inception_v3           | 1  |  0.9499  |
|             dla102              | 1  |  0.9464  |
|          ghostnet_100           | 1  |  0.9441  |
|        res2net50_14w_8s         | 1  |  0.941   |
|         crossvit_9_240          | 1  |  0.8815  |
|          gmlp_s16_224           | 1  |   0.88   |
|     swsl_resnext101_32x16d      | 1  |  0.8433  |
|           selecsls42b           | 1  |  0.8197  |
|        gluon_xception65         | 1  |  0.8117  |
|        res2net101_26w_4s        | 1  |  0.7678  |
|          botnet26t_256          | 0  |   nan    |
|        convmixer_768_32         | 0  |   nan    |
|       eca_botnext26ts_256       | 0  |   nan    |
|        eca_halonext26ts         | 0  |   nan    |
|        sebotnet33ts_256         | 0  |   nan    |
+---------------------------------+----+----------+

@ESI-SYD
ESI-SYD commented Nov 16, 2022

Performance Dashboard for float32 precision -- Single-Socket Multi-threads (2022-11-13 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward pass. For accuracy, we check the numerical correctness of the forward-pass outputs by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

SW information

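+-------------+----------------+--------------------+
|     SW      | Nightly commit | Master/Main commit |
+-------------+----------------+--------------------+
|   Pytorch   |    637228b     |      46796fe       |
| Torchbench  |       /        |      022dfe3       |
| torchaudio  |    4b10b6a     |      74f9a89       |
|  torchtext  |    71e4561     |      c047efe       |
| torchvision |    797e1ac     |      ffd5a56       |
+-------------+----------------+--------------------+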

HW information

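+------------------+-----------------------------------------------+
|       Item       |                     Value                     |
+------------------+-----------------------------------------------+
|   Manufacturer   |                  Amazon EC2                   |
|   Product Name   |                 c6i.16xlarge                  |
|    CPU Model     | Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz |
| Installed Memory |   128GB (1x128GB DDR4 3200 MT/s [Unknown])    |
|        OS        |              Ubuntu 20.04.5 LTS               |
|      Kernel      |                5.15.0-1022-aws                |
|    Microcode     |                   0xd000331                   |
|       GCC        |   gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0   |
|      GLIBC       |    ldd (Ubuntu GLIBC 2.31-0ubuntu9.9) 2.31    |
|     Binutils     |     GNU ld (GNU Binutils for Ubuntu) 2.34     |
|      Python      |                 Python 3.8.13                 |
|     OpenSSL      |           OpenSSL 1.1.1s 1 Nov 2022           |
+------------------+-----------------------------------------------+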

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 89%, 49/55 | 100%, 44/44 | 89%, 54/61  |
+----------+------------+-------------+-------------+
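
For readers who want to reproduce a pass-rate cell such as `89%, 49/55` from a per-model accuracy column, here is a minimal sketch. It assumes that both `pass` and `pass_due_to_skip` count as passes, which is consistent with the torchbench accuracy table in this comment but is still an assumption. The function name `pass_rate` and the status list are hypothetical and not taken from the benchmark scripts.

```python
def pass_rate(statuses):
    """Format a pass-rate cell like '89%, 49/55' from per-model accuracy statuses."""
    # Assumption: a skipped accuracy check is counted as a pass.
    passing = {"pass", "pass_due_to_skip"}
    passed = sum(1 for s in statuses if s in passing)
    total = len(statuses)
    return f"{round(100 * passed / total)}%, {passed}/{total}"


# Hypothetical status list with the same shape as the torchbench accuracy table:
# 46 pass, 3 pass_due_to_skip, 5 fail_to_run, and one row without a pass status.
statuses = ["pass"] * 46 + ["pass_due_to_skip"] * 3 + ["fail_to_run"] * 5 + ["0.0000"]
print(pass_rate(statuses))
```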

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.04x    |    1.07x    |    1.07x    |
+----------+------------+-------------+-------------+
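
The summary above says speedup is normalized against native PyTorch and aggregated as a geometric mean. A minimal sketch of that calculation follows, assuming speedup is eager latency divided by inductor latency and that rows reported as `0.0` or `nan` (models that failed to run) are excluded from the mean. The helper names and the sample list are hypothetical, so the printed value is not expected to match the table.

```python
import math


def speedup(eager_ms, inductor_ms):
    """Speedup over native PyTorch: values above 1.0 mean inductor is faster."""
    return eager_ms / inductor_ms


def geomean_speedup(speedups):
    """Geometric mean of per-model speedups, skipping failed models."""
    # Assumption: 0.0 and nan mark models that did not run and are dropped.
    valid = [s for s in speedups if s > 0 and not math.isnan(s)]
    return math.exp(sum(math.log(s) for s in valid) / len(valid))


# A few per-model values of the kind shown in the speedup tables, plus one failed model.
sample = [1.3553, 1.2508, 0.9240, 0.3455, 0.0]
print(f"{geomean_speedup(sample):.2f}x")
```

The geometric mean is the usual choice for ratios because a 2x speedup and a 0.5x slowdown cancel out to 1.0x, which an arithmetic mean would not do.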

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   13.34    |    17.64    |    20.14    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.92x    |    0.96x    |    0.98x    |
+----------+------------+-------------+-------------+
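
Since the heading marks higher as better, one plausible reading is that the ratio divides the peak memory of native PyTorch by the peak memory of the compiled run. The sketch below works under that assumption only; the function name and the memory figures are hypothetical and not taken from the benchmark scripts.

```python
def compression_ratio(eager_peak_mb, inductor_peak_mb):
    """Peak memory compression ratio under the assumed definition.

    Above 1.0: the compiled run used less peak memory than native PyTorch.
    Below 1.0: the compiled run used more.
    """
    return eager_peak_mb / inductor_peak_mb


# Hypothetical figures: the compiled run peaks at 125 MB against 100 MB in eager mode.
print(f"{compression_ratio(100.0, 125.0):.2f}x")
```

Under this reading, a suite-level value such as `0.92x` would mean the compiled runs used somewhat more peak memory than native PyTorch on average.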

torchbench suite with float32 precision

Performance speedup

+-----------------------------------+------+----------+
|               name                |  bs  | inductor |
+-----------------------------------+------+----------+
|        shufflenet_v2_x1_0         |  64  |  1.3553  |
|            densenet121            |  64  |  1.2508  |
|           mobilenet_v2            |  16  |  1.2493  |
|             resnet18              |  8   |  1.1779  |
|           squeezenet1_1           |  16  |  1.1674  |
|              alexnet              | 128  |  1.1589  |
|           pytorch_unet            |  1   |  1.1328  |
|            timm_vovnet            |  32  |  1.1134  |
|         soft_actor_critic         | 256  |  1.1091  |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  1.0987  |
|                drq                |  1   |  1.0977  |
|          vision_maskrcnn          |  1   |  1.0803  |
|               vgg16               |  4   |  1.0726  |
|            Super_SloMo            |  6   |  1.0633  |
|               dlrm                | 2048 |  1.0587  |
|            hf_T5_large            |  1   |  1.0521  |
|           BERT_pytorch            |  2   |  1.0512  |
|            timm_regnet            |  32  |  1.0416  |
|          pytorch_stargan          |  16  |  1.0272  |
|            hf_Reformer            |  1   |  1.0193  |
|        Background_Matting         |  1   |  1.008   |
|     detectron2_fcos_r_50_fpn      |  1   |  1.0036  |
|            hf_BigBird             |  1   |  1.0027  |
|              demucs               |  1   |  1.0002  |
|            tts_angular            |  64  |  0.997   |
|    mobilenet_v2_quantized_qat     |  96  |  0.9965  |
|               dcgan               | 256  |  0.9957  |
|          LearningToPaint          |  96  |  0.9906  |
|      resnet50_quantized_qat       |  32  |  0.9897  |
|               hf_T5               |  1   |  0.9732  |
|             resnet152             |  32  |  0.9557  |
|           hf_Longformer           |  1   |  0.9471  |
|           hf_GPT2_large           |  1   |  0.926   |
|             resnet50              |  32  |  0.924   |
|        speech_transformer         |  1   |  0.9203  |
|   timm_vision_transformer_large   |  8   |  0.9179  |
|            timm_nfnet             | 128  |  0.913   |
|              hf_GPT2              |  1   |  0.9043  |
|           hf_DistilBert           |  1   |  0.8967  |
|             hf_Albert             |  1   |  0.8903  |
|              hf_Bert              |  1   |  0.8775  |
|           fastNLP_Bert            |  1   |  0.8732  |
|              hf_Bart              |  1   |  0.8687  |
|      nvidia_deeprecommender       | 256  |  0.8346  |
|            hf_T5_base             |  1   |  0.8332  |
|      timm_vision_transformer      |  8   |  0.7683  |
|         timm_efficientnet         |  64  |  0.7536  |
| attention_is_all_you_need_pytorch |  32  |  0.7283  |
|          resnext50_32x4d          |  8   |  0.6504  |
|       functorch_dp_cifar10        |  64  |  0.5466  |
|           lennard_jones           | 1000 |  0.3455  |
|            mnasnet1_0             |  0   |   0.0    |
|           timm_resnest            |  0   |   0.0    |
|        mobilenet_v3_large         |  0   |   0.0    |
|              yolov3               |  0   |   0.0    |
+-----------------------------------+------+----------+

Accuracy

+-----------------------------------+-----+------------------+
|               name                | bs  |     inductor     |
+-----------------------------------+-----+------------------+
|            hf_T5_large            |  2  | pass_due_to_skip |
|           hf_GPT2_large           |  2  | pass_due_to_skip |
|   timm_vision_transformer_large   |  2  | pass_due_to_skip |
|      timm_vision_transformer      |  2  |       pass       |
|           hf_Longformer           |  2  |       pass       |
|            hf_Reformer            |  2  |       pass       |
|          LearningToPaint          |  2  |       pass       |
|            Super_SloMo            |  2  |       pass       |
|              alexnet              |  2  |       pass       |
| attention_is_all_you_need_pytorch |  2  |       pass       |
|               dcgan               |  2  |       pass       |
|              demucs               |  1  |       pass       |
|            densenet121            |  2  |       pass       |
|               vgg16               |  2  |       pass       |
|               dlrm                |  2  |       pass       |
|                drq                |  1  |       pass       |
|            hf_T5_base             |  2  |       pass       |
|       functorch_dp_cifar10        |  2  |       pass       |
|             hf_Albert             |  2  |       pass       |
|              hf_Bart              |  2  |       pass       |
|              hf_Bert              |  2  |       pass       |
|            hf_BigBird             |  2  |       pass       |
|           hf_DistilBert           |  2  |       pass       |
|              hf_GPT2              |  2  |       pass       |
|           fastNLP_Bert            |  2  |       pass       |
|               hf_T5               |  2  |       pass       |
|             resnet18              |  2  |       pass       |
|           BERT_pytorch            |  2  |       pass       |
|           lennard_jones           |  2  |       pass       |
|        Background_Matting         |  1  |       pass       |
|           mobilenet_v2            |  2  |       pass       |
|    mobilenet_v2_quantized_qat     |  2  |       pass       |
|            tts_angular            |  2  |       pass       |
|      nvidia_deeprecommender       |  2  |       pass       |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |
|          pytorch_stargan          | 16  |       pass       |
|            timm_vovnet            |  2  |       pass       |
|             resnet152             |  2  |       pass       |
|           pytorch_unet            |  2  |       pass       |
|             resnet50              |  2  |       pass       |
|      resnet50_quantized_qat       |  2  |       pass       |
|          resnext50_32x4d          |  2  |       pass       |
|        shufflenet_v2_x1_0         |  2  |       pass       |
|         soft_actor_critic         | 256 |       pass       |
|        speech_transformer         |  2  |       pass       |
|           squeezenet1_1           |  2  |       pass       |
|         timm_efficientnet         |  2  |       pass       |
|            timm_nfnet             |  2  |       pass       |
|            timm_regnet            |  2  |       pass       |
|            mnasnet1_0             |  2  |   fail_to_run    |
|           timm_resnest            |  2  |   fail_to_run    |
|        mobilenet_v3_large         |  2  |   fail_to_run    |
|     detectron2_fcos_r_50_fpn      |  2  |   fail_to_run    |
|              yolov3               |  2  |   fail_to_run    |
|          vision_maskrcnn          |  0  |      0.0000      |
+-----------------------------------+-----+------------------+

Compilation latency (sec)

+-----------------------------------+------+----------+
|               name                |  bs  | inductor |
+-----------------------------------+------+----------+
|     detectron2_fcos_r_50_fpn      |  1   |  68.917  |
|          vision_maskrcnn          |  1   | 39.5334  |
|            hf_T5_large            |  1   | 37.3845  |
|   timm_vision_transformer_large   |  8   | 26.3515  |
|           hf_GPT2_large           |  1   | 25.2843  |
|            timm_nfnet             | 128  | 24.0902  |
|            hf_T5_base             |  1   | 23.5965  |
|           BERT_pytorch            |  2   | 22.9939  |
|            hf_BigBird             |  1   | 22.8551  |
|            densenet121            |  64  | 22.5694  |
|           hf_Longformer           |  1   | 22.2104  |
|             resnet152             |  32  | 16.5343  |
|            Super_SloMo            |  6   | 16.2681  |
|         timm_efficientnet         |  64  | 15.9977  |
|            timm_regnet            |  32  | 15.9502  |
|        speech_transformer         |  1   | 15.9089  |
|              hf_Bart              |  1   | 14.1855  |
|            hf_Reformer            |  1   | 13.6939  |
|           fastNLP_Bert            |  1   | 13.5781  |
| attention_is_all_you_need_pytorch |  32  | 13.3312  |
|               hf_T5               |  1   | 13.1629  |
|              hf_Bert              |  1   | 12.5994  |
|              hf_GPT2              |  1   |  12.262  |
|        Background_Matting         |  1   | 11.7714  |
|      timm_vision_transformer      |  8   | 11.7588  |
|            timm_vovnet            |  32  |  11.651  |
|             resnet50              |  32  | 11.5667  |
|             hf_Albert             |  1   | 11.4757  |
|        shufflenet_v2_x1_0         |  64  | 11.2385  |
|          resnext50_32x4d          |  8   | 10.9301  |
|       functorch_dp_cifar10        |  64  | 10.3097  |
|           hf_DistilBert           |  1   | 10.1671  |
|           mobilenet_v2            |  16  | 10.1208  |
|          pytorch_stargan          |  16  |  9.2953  |
|          LearningToPaint          |  96  |  9.1694  |
|           pytorch_unet            |  1   |  9.1086  |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  8.9421  |
|             resnet18              |  8   |  8.7708  |
|           squeezenet1_1           |  16  |  8.6944  |
|               vgg16               |  4   |  8.6095  |
|      nvidia_deeprecommender       | 256  |  8.5305  |
|               dlrm                | 2048 |  8.3915  |
|              alexnet              | 128  |  8.2392  |
|            tts_angular            |  64  |  8.1306  |
|                drq                |  1   |   7.92   |
|         soft_actor_critic         | 256  |  7.7621  |
|               dcgan               | 256  |  7.7376  |
|           lennard_jones           | 1000 |  7.7336  |
|              demucs               |  1   |  1.2338  |
|    mobilenet_v2_quantized_qat     |  96  |  0.1009  |
|      resnet50_quantized_qat       |  32  | -0.1167  |
|            mnasnet1_0             |  0   |   nan    |
|        mobilenet_v3_large         |  0   |   nan    |
|           timm_resnest            |  0   |   nan    |
|              yolov3               |  0   |   nan    |
+-----------------------------------+------+----------+

Peak Memory Compression Ratio

+-----------------------------------+------+----------+
|               name                |  bs  | inductor |
+-----------------------------------+------+----------+
|              demucs               |  1   |  0.9985  |
|               dlrm                | 2048 |  0.9969  |
|      resnet50_quantized_qat       |  32  |  0.9959  |
|        Background_Matting         |  1   |  0.9941  |
|               vgg16               |  4   |  0.9921  |
|            Super_SloMo            |  6   |  0.9918  |
|          LearningToPaint          |  96  |  0.9907  |
|              alexnet              | 128  |  0.9893  |
|            hf_BigBird             |  1   |  0.9883  |
|            densenet121            |  64  |  0.9875  |
|         timm_efficientnet         |  64  |  0.987   |
|            timm_vovnet            |  32  |  0.9867  |
|             resnet50              |  32  |  0.9863  |
|           mobilenet_v2            |  16  |  0.985   |
|           lennard_jones           | 1000 |  0.9844  |
|           pytorch_unet            |  1   |  0.9842  |
|            timm_regnet            |  32  |  0.9836  |
|            timm_nfnet             | 128  |  0.9827  |
|         soft_actor_critic         | 256  |  0.9812  |
|        shufflenet_v2_x1_0         |  64  |  0.9801  |
|            tts_angular            |  64  |  0.9785  |
|                drq                |  1   |  0.9777  |
|     detectron2_fcos_r_50_fpn      |  1   |  0.9769  |
|          resnext50_32x4d          |  8   |  0.976   |
|    mobilenet_v2_quantized_qat     |  96  |  0.9748  |
|          vision_maskrcnn          |  1   |  0.973   |
|   timm_vision_transformer_large   |  8   |  0.968   |
|           squeezenet1_1           |  16  |  0.9649  |
|          pytorch_stargan          |  16  |  0.9585  |
|           hf_GPT2_large           |  1   |  0.9569  |
|           hf_DistilBert           |  1   |  0.9565  |
|        speech_transformer         |  1   |  0.9477  |
|              hf_GPT2              |  1   |  0.9454  |
|              hf_Bart              |  1   |  0.9411  |
|           BERT_pytorch            |  2   |  0.935   |
|               dcgan               | 256  |  0.9314  |
|            hf_T5_base             |  1   |  0.9311  |
| attention_is_all_you_need_pytorch |  32  |  0.9264  |
|            hf_Reformer            |  1   |  0.9182  |
|             hf_Albert             |  1   |  0.8971  |
|           fastNLP_Bert            |  1   |  0.8945  |
|      timm_vision_transformer      |  8   |  0.8796  |
|       functorch_dp_cifar10        |  64  |  0.8791  |
|             resnet18              |  8   |  0.8248  |
|           hf_Longformer           |  1   |  0.7863  |
|              hf_Bert              |  1   |  0.7829  |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  0.777   |
|             resnet152             |  32  |  0.7492  |
|            hf_T5_large            |  1   |  0.645   |
|      nvidia_deeprecommender       | 256  |  0.584   |
|               hf_T5               |  1   |  0.5705  |
|            mnasnet1_0             |  0   |   nan    |
|        mobilenet_v3_large         |  0   |   nan    |
|           timm_resnest            |  0   |   nan    |
|              yolov3               |  0   |   nan    |
+-----------------------------------+------+----------+

Absolute latency (ms)

+-----------------------------------+------+-----------+
|               name                |  bs  | inductor  |
+-----------------------------------+------+-----------+
|   timm_vision_transformer_large   |  8   | 1398.7424 |
|            hf_T5_base             |  1   | 1208.9221 |
|            timm_nfnet             | 128  | 1062.2625 |
|           hf_GPT2_large           |  1   | 885.0781  |
|            Super_SloMo            |  6   | 641.5188  |
|            hf_T5_large            |  1   | 448.5634  |
|          vision_maskrcnn          |  1   | 390.5735  |
|            timm_regnet            |  32  | 379.4222  |
|             resnet152             |  32  | 378.8831  |
|        Background_Matting         |  1   | 287.9932  |
|            densenet121            |  64  | 283.8995  |
|          pytorch_stargan          |  16  | 245.9641  |
|     detectron2_fcos_r_50_fpn      |  1   |  241.546  |
|           pytorch_unet            |  1   | 234.4081  |
|            hf_BigBird             |  1   | 201.2967  |
|              demucs               |  1   | 188.0801  |
|         timm_efficientnet         |  64  | 186.9033  |
|             resnet50              |  32  | 172.0353  |
|            timm_vovnet            |  32  | 166.4536  |
|           hf_Longformer           |  1   |  92.4033  |
|              hf_Bart              |  1   |  84.0521  |
|          resnext50_32x4d          |  8   |  64.7025  |
|              hf_Bert              |  1   |  64.284   |
|    mobilenet_v2_quantized_qat     |  96  |  61.6329  |
|            tts_angular            |  64  |  60.5634  |
|        speech_transformer         |  1   |  59.7795  |
|             hf_Albert             |  1   |  57.749   |
|           fastNLP_Bert            |  1   |  56.6793  |
|              alexnet              | 128  |  56.6686  |
| attention_is_all_you_need_pytorch |  32  |  54.8065  |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  53.4079  |
|          LearningToPaint          |  96  |  51.6065  |
|               hf_T5               |  1   |  48.896   |
|      timm_vision_transformer      |  8   |  48.6617  |
|              hf_GPT2              |  1   |  44.8374  |
|      nvidia_deeprecommender       | 256  |  43.0138  |
|               vgg16               |  4   |  41.0296  |
|      resnet50_quantized_qat       |  32  |  40.4163  |
|           hf_DistilBert           |  1   |  36.6378  |
|            hf_Reformer            |  1   |  34.5032  |
|           BERT_pytorch            |  2   |  30.8088  |
|        shufflenet_v2_x1_0         |  64  |  27.2268  |
|               dcgan               | 256  |  23.9807  |
|           mobilenet_v2            |  16  |  18.8449  |
|             resnet18              |  8   |  13.4283  |
|           squeezenet1_1           |  16  |  13.2435  |
|       functorch_dp_cifar10        |  64  |  10.4651  |
|               dlrm                | 2048 |   6.805   |
|                drq                |  1   |  1.1352   |
|           lennard_jones           | 1000 |  0.5759   |
|         soft_actor_critic         | 256  |  0.5045   |
|            mnasnet1_0             |  0   |    nan    |
|        mobilenet_v3_large         |  0   |    nan    |
|           timm_resnest            |  0   |    nan    |
|              yolov3               |  0   |    nan    |
+-----------------------------------+------+-----------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|       MT5ForConditionalGeneration       |  8  |  2.5012  |
|            XLNetLMHeadModel             | 32  |  1.9757  |
|     MobileBertForQuestionAnswering      | 64  |  1.3192  |
|            YituTechConvBert             |  1  |  1.2841  |
|               DistillGPT2               |  1  |  1.2666  |
|               GoogleFnet                |  1  |  1.2042  |
|          MobileBertForMaskedLM          | 32  |  1.1072  |
|             XGLMForCausalLM             |  8  |  1.0744  |
|                 BigBird                 |  1  |  1.0687  |
|     M2M100ForConditionalGeneration      |  8  |  1.0412  |
|                CamemBert                |  1  |  1.0292  |
|            AlbertForMaskedLM            |  4  |  1.0258  |
|       AlbertForQuestionAnswering        |  4  |  1.0251  |
|          AllenaiLongformerBase          |  1  |  1.0123  |
|             OPTForCausalLM              | 32  |  1.0061  |
|           DebertaForMaskedLM            |  4  |  1.003   |
|                 T5Small                 |  1  |  0.9955  |
|       DebertaForQuestionAnswering       |  8  |  0.9724  |
|      GPT2ForSequenceClassification      |  4  |  0.9399  |
|         MegatronBertForCausalLM         | 16  |  0.9384  |
|    LayoutLMForSequenceClassification    | 16  |  0.9268  |
|           RobertaForCausalLM            | 64  |  0.9214  |
|     PLBartForConditionalGeneration      | 16  |  0.9206  |
|           ElectraForCausalLM            | 32  |  0.9203  |
|      MBartForConditionalGeneration      | 16  |  0.9191  |
|         Speech2Text2ForCausalLM         | 128 |  0.9166  |
|     PegasusForConditionalGeneration     | 16  |  0.9151  |
|     DistilBertForQuestionAnswering      | 64  |  0.9048  |
|    MegatronBertForQuestionAnswering     | 16  |  0.9042  |
|            TrOCRForCausalLM             | 32  |  0.9037  |
|            PLBartForCausalLM            | 32  |  0.9025  |
|           LayoutLMForMaskedLM           | 16  |  0.9001  |
|       RobertaForQuestionAnswering       | 128 |  0.8975  |
|            MBartForCausalLM             | 32  |  0.8971  |
|        BertForQuestionAnswering         | 128 |  0.8951  |
|           PegasusForCausalLM            | 32  |  0.8918  |
|             BertForMaskedLM             | 64  |  0.8895  |
|       ElectraForQuestionAnswering       | 64  |  0.8739  |
|          DistilBertForMaskedLM          | 64  |  0.8656  |
|       T5ForConditionalGeneration        |  4  |  0.861   |
|             BartForCausalLM             |  4  |  0.839   |
|      BartForConditionalGeneration       |  2  |  0.8299  |
| BlenderbotSmallForConditionalGeneration | 64  |  0.8292  |
|       BlenderbotSmallForCausalLM        | 64  |  0.8097  |
+-----------------------------------------+-----+----------+

Accuracy

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|            AlbertForMaskedLM            | 1  |   pass   |
|       AlbertForQuestionAnswering        | 1  |   pass   |
|                CamemBert                | 1  |   pass   |
|          AllenaiLongformerBase          | 1  |   pass   |
|             BartForCausalLM             | 1  |   pass   |
|      BartForConditionalGeneration       | 1  |   pass   |
|             BertForMaskedLM             | 1  |   pass   |
|        BertForQuestionAnswering         | 1  |   pass   |
|                 BigBird                 | 1  |   pass   |
|       BlenderbotSmallForCausalLM        | 1  |   pass   |
| BlenderbotSmallForConditionalGeneration | 1  |   pass   |
|           DebertaForMaskedLM            | 1  |   pass   |
|           LayoutLMForMaskedLM           | 1  |   pass   |
|       DebertaForQuestionAnswering       | 1  |   pass   |
|          DistilBertForMaskedLM          | 1  |   pass   |
|     DistilBertForQuestionAnswering      | 1  |   pass   |
|               DistillGPT2               | 1  |   pass   |
|           ElectraForCausalLM            | 1  |   pass   |
|       ElectraForQuestionAnswering       | 1  |   pass   |
|      GPT2ForSequenceClassification      | 1  |   pass   |
|               GoogleFnet                | 1  |   pass   |
|    LayoutLMForSequenceClassification    | 1  |   pass   |
|     M2M100ForConditionalGeneration      | 1  |   pass   |
|            MBartForCausalLM             | 1  |   pass   |
|     PLBartForConditionalGeneration      | 1  |   pass   |
|      MBartForConditionalGeneration      | 1  |   pass   |
|       MT5ForConditionalGeneration       | 1  |   pass   |
|         MegatronBertForCausalLM         | 1  |   pass   |
|    MegatronBertForQuestionAnswering     | 1  |   pass   |
|          MobileBertForMaskedLM          | 1  |   pass   |
|     MobileBertForQuestionAnswering      | 1  |   pass   |
|             OPTForCausalLM              | 1  |   pass   |
|            PLBartForCausalLM            | 1  |   pass   |
|           PegasusForCausalLM            | 1  |   pass   |
|            XLNetLMHeadModel             | 1  |   pass   |
|     PegasusForConditionalGeneration     | 1  |   pass   |
|           RobertaForCausalLM            | 1  |   pass   |
|       RobertaForQuestionAnswering       | 1  |   pass   |
|         Speech2Text2ForCausalLM         | 1  |   pass   |
|       T5ForConditionalGeneration        | 1  |   pass   |
|                 T5Small                 | 1  |   pass   |
|            TrOCRForCausalLM             | 1  |   pass   |
|             XGLMForCausalLM             | 1  |   pass   |
|            YituTechConvBert             | 1  |   pass   |
+-----------------------------------------+----+----------+

Compilation latency (sec)

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|       AlbertForQuestionAnswering        |  4  | 26.3394  |
|          MobileBertForMaskedLM          | 32  | 25.7779  |
|     MobileBertForQuestionAnswering      | 64  |  25.68   |
|     PegasusForConditionalGeneration     | 16  |  25.467  |
|     M2M100ForConditionalGeneration      |  8  | 23.9695  |
|       DebertaForQuestionAnswering       |  8  | 23.7444  |
|          AllenaiLongformerBase          |  1  | 23.6976  |
|      BartForConditionalGeneration       |  2  | 23.3764  |
|           DebertaForMaskedLM            |  4  | 22.8322  |
|      MBartForConditionalGeneration      | 16  | 22.6826  |
|             OPTForCausalLM              | 32  | 22.5619  |
|                 BigBird                 |  1  | 21.8533  |
|             XGLMForCausalLM             |  8  | 21.3831  |
|          DistilBertForMaskedLM          | 64  | 20.8333  |
|         MegatronBertForCausalLM         | 16  | 20.0415  |
|    MegatronBertForQuestionAnswering     | 16  | 19.6688  |
| BlenderbotSmallForConditionalGeneration | 64  | 18.4866  |
|            YituTechConvBert             |  1  | 17.7681  |
|           PegasusForCausalLM            | 32  | 17.0373  |
|            AlbertForMaskedLM            |  4  | 17.0334  |
|       MT5ForConditionalGeneration       |  8  | 16.5386  |
|        BertForQuestionAnswering         | 128 | 16.3445  |
|       RobertaForQuestionAnswering       | 128 | 16.0391  |
|           LayoutLMForMaskedLM           | 16  | 15.5177  |
|       T5ForConditionalGeneration        |  4  | 15.4862  |
|           RobertaForCausalLM            | 64  | 15.2302  |
|             BartForCausalLM             |  4  | 15.1982  |
|           ElectraForCausalLM            | 32  | 15.1563  |
|       ElectraForQuestionAnswering       | 64  | 15.0421  |
|             BertForMaskedLM             | 64  | 15.0283  |
|     PLBartForConditionalGeneration      | 16  | 15.0188  |
|    LayoutLMForSequenceClassification    | 16  | 14.9871  |
|            MBartForCausalLM             | 32  | 14.5625  |
|      GPT2ForSequenceClassification      |  4  | 14.2372  |
|            TrOCRForCausalLM             | 32  | 14.1298  |
|                 T5Small                 |  1  |  13.542  |
|                CamemBert                |  1  | 12.7371  |
|       BlenderbotSmallForCausalLM        | 64  | 12.6528  |
|         Speech2Text2ForCausalLM         | 128 | 12.5373  |
|               GoogleFnet                |  1  | 12.2514  |
|     DistilBertForQuestionAnswering      | 64  | 11.3804  |
|            PLBartForCausalLM            | 32  | 11.1566  |
|            XLNetLMHeadModel             | 32  | 10.6467  |
|               DistillGPT2               |  1  | 10.4047  |
+-----------------------------------------+-----+----------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            AlbertForMaskedLM            |  4  |  0.995   |
|       AlbertForQuestionAnswering        |  4  |  0.9949  |
|       DebertaForQuestionAnswering       |  8  |  0.9942  |
|           DebertaForMaskedLM            |  4  |  0.9922  |
|       BlenderbotSmallForCausalLM        | 64  |  0.9915  |
|          DistilBertForMaskedLM          | 64  |  0.9913  |
|                 BigBird                 |  1  |  0.9908  |
|             BartForCausalLM             |  4  |  0.9905  |
|       ElectraForQuestionAnswering       | 64  |  0.9899  |
|           ElectraForCausalLM            | 32  |  0.9898  |
|         Speech2Text2ForCausalLM         | 128 |  0.9892  |
|      GPT2ForSequenceClassification      |  4  |  0.9887  |
|            PLBartForCausalLM            | 32  |  0.988   |
|           PegasusForCausalLM            | 32  |  0.9872  |
|        BertForQuestionAnswering         | 128 |  0.9863  |
|       RobertaForQuestionAnswering       | 128 |  0.986   |
|     DistilBertForQuestionAnswering      | 64  |  0.9859  |
|            TrOCRForCausalLM             | 32  |  0.9859  |
|            MBartForCausalLM             | 32  |  0.9858  |
|             OPTForCausalLM              | 32  |  0.9858  |
|           LayoutLMForMaskedLM           | 16  |  0.9854  |
|               GoogleFnet                |  1  |  0.9826  |
|           RobertaForCausalLM            | 64  |  0.9805  |
| BlenderbotSmallForConditionalGeneration | 64  |  0.9797  |
|             BertForMaskedLM             | 64  |  0.9787  |
|    LayoutLMForSequenceClassification    | 16  |  0.9781  |
|     PegasusForConditionalGeneration     | 16  |  0.9779  |
|               DistillGPT2               |  1  |  0.9743  |
|       T5ForConditionalGeneration        |  4  |  0.9696  |
|            XLNetLMHeadModel             | 32  |  0.9662  |
|      BartForConditionalGeneration       |  2  |  0.9656  |
|     PLBartForConditionalGeneration      | 16  |  0.9655  |
|          AllenaiLongformerBase          |  1  |  0.9531  |
|      MBartForConditionalGeneration      | 16  |  0.9515  |
|                CamemBert                |  1  |  0.9487  |
|             XGLMForCausalLM             |  8  |  0.9433  |
|     M2M100ForConditionalGeneration      |  8  |  0.9399  |
|         MegatronBertForCausalLM         | 16  |  0.936   |
|    MegatronBertForQuestionAnswering     | 16  |  0.9295  |
|                 T5Small                 |  1  |  0.9231  |
|            YituTechConvBert             |  1  |  0.9204  |
|     MobileBertForQuestionAnswering      | 64  |  0.8881  |
|          MobileBertForMaskedLM          | 32  |  0.881   |
|       MT5ForConditionalGeneration       |  8  |  0.7482  |
+-----------------------------------------+-----+----------+

Absolute latency (ms)

+-----------------------------------------+-----+-----------+
|                  name                   | bs  | inductor  |
+-----------------------------------------+-----+-----------+
|            XLNetLMHeadModel             | 32  | 6135.7544 |
|            AlbertForMaskedLM            |  4  | 2913.5788 |
|       AlbertForQuestionAnswering        |  4  | 2902.9108 |
|        BertForQuestionAnswering         | 128 | 1081.2987 |
|       RobertaForQuestionAnswering       | 128 | 1059.8627 |
|      BartForConditionalGeneration       |  2  | 984.6583  |
|             BartForCausalLM             |  4  | 837.3208  |
|           LayoutLMForMaskedLM           | 16  | 832.1489  |
| BlenderbotSmallForConditionalGeneration | 64  | 683.8456  |
|             BertForMaskedLM             | 64  | 671.6693  |
|    LayoutLMForSequenceClassification    | 16  | 670.1861  |
|           RobertaForCausalLM            | 64  | 652.6031  |
|       ElectraForQuestionAnswering       | 64  | 643.3807  |
|      MBartForConditionalGeneration      | 16  | 610.1322  |
|     PegasusForConditionalGeneration     | 16  | 609.0805  |
|            MBartForCausalLM             | 32  | 572.2292  |
|           PegasusForCausalLM            | 32  | 571.9501  |
|            TrOCRForCausalLM             | 32  | 561.4435  |
|      GPT2ForSequenceClassification      |  4  | 519.2097  |
|         MegatronBertForCausalLM         | 16  | 497.6218  |
|           ElectraForCausalLM            | 32  |  487.991  |
|       T5ForConditionalGeneration        |  4  | 467.7136  |
|             XGLMForCausalLM             |  8  | 453.1366  |
|    MegatronBertForQuestionAnswering     | 16  | 449.5991  |
|          DistilBertForMaskedLM          | 64  | 416.0586  |
|       BlenderbotSmallForCausalLM        | 64  | 384.0055  |
|     M2M100ForConditionalGeneration      |  8  | 382.1028  |
|             OPTForCausalLM              | 32  | 344.7281  |
|       DebertaForQuestionAnswering       |  8  | 328.4248  |
|     DistilBertForQuestionAnswering      | 64  | 255.9421  |
|       MT5ForConditionalGeneration       |  8  |  252.552  |
|            PLBartForCausalLM            | 32  | 246.1751  |
|           DebertaForMaskedLM            |  4  |  242.225  |
|     PLBartForConditionalGeneration      | 16  | 223.0679  |
|                 BigBird                 |  1  | 212.8135  |
|          AllenaiLongformerBase          |  1  | 174.5816  |
|         Speech2Text2ForCausalLM         | 128 | 167.4415  |
|          MobileBertForMaskedLM          | 32  | 148.5585  |
|     MobileBertForQuestionAnswering      | 64  |  134.642  |
|                 T5Small                 |  1  | 124.1493  |
|            YituTechConvBert             |  1  |  69.3295  |
|                CamemBert                |  1  |  68.0215  |
|               DistillGPT2               |  1  |  46.8582  |
|               GoogleFnet                |  1  |  42.4637  |
+-----------------------------------------+-----+-----------+

timm_models suite with float32 precision

Performance speedup

+---------------------------------+-----+----------+
|              name               | bs  | inductor |
+---------------------------------+-----+----------+
|          pnasnet5large          | 16  |  1.3859  |
|          inception_v3           | 128 |  1.3183  |
|       gluon_inception_v3        | 128 |  1.3049  |
|        adv_inception_v3         | 128 |  1.2813  |
|           fbnetc_100            | 128 |  1.2028  |
|           volo_d1_224           | 64  |  1.2006  |
|          botnet26t_256          | 128 |  1.1893  |
|           mnasnet_100           | 128 |  1.1789  |
|          spnasnet_100           | 128 |  1.1788  |
|          ghostnet_100           | 128 |  1.174   |
|        ese_vovnet19b_dw         | 128 |  1.1691  |
|      mobilenetv3_large_100      | 128 |  1.1658  |
|             dpn107              | 32  |  1.1617  |
|        gluon_xception65         | 32  |  1.1576  |
|           res2next50            | 128 |  1.1218  |
|            gernet_l             | 128 |   1.12   |
|            repvgg_a2            | 128 |  1.118   |
|            lcnet_050            | 128 |  1.1163  |
|        res2net101_26w_4s        | 64  |  1.1121  |
|        res2net50_14w_8s         | 128 |  1.1096  |
|            fbnetv3_b            | 128 |  1.0893  |
|         mobilenetv2_100         | 128 |  1.0708  |
|           selecsls42b           | 128 |  1.0687  |
|          cspdarknet53           | 64  |  1.0549  |
|           regnety_002           | 128 |  1.042   |
|           tf_mixnet_l           | 128 |  1.0342  |
|      xcit_large_24_p8_224       |  5  |  1.0162  |
|          gmixer_24_224          | 128 |  1.0108  |
|          cait_m36_384           |  4  |  0.9912  |
|             dla102              | 128 |  0.9896  |
|            mixnet_l             | 128 |  0.9869  |
|           dm_nfnet_f0           | 128 |  0.9194  |
|        eca_halonext26ts         | 128 |  0.9112  |
|       eca_botnext26ts_256       | 128 |  0.9086  |
|  swin_base_patch4_window7_224   | 64  |  0.8985  |
|        sebotnet33ts_256         | 64  |  0.8906  |
|      beit_base_patch16_224      | 64  |  0.8877  |
|         poolformer_m36          | 64  |  0.8625  |
| deit_base_distilled_patch16_224 | 64  |  0.8574  |
|            nfnet_l0             | 128 |  0.8569  |
|      vit_base_patch16_224       | 64  |  0.8495  |
|           convit_base           | 64  |  0.8484  |
|          resmlp_12_224          | 128 |  0.8357  |
|           rexnet_100            | 128 |  0.8341  |
|          convnext_base          | 64  |  0.8285  |
|          jx_nest_base           | 32  |  0.8268  |
|       tf_efficientnet_b0        | 128 |  0.8073  |
|        tnt_s_patch16_224        | 128 |  0.8049  |
|          mixer_b16_224          | 128 |  0.8014  |
|            pit_b_224            | 64  |  0.7813  |
|         coat_lite_mini          | 128 |  0.7712  |
|            tinynet_a            | 128 |  0.7709  |
|           mobilevit_s           | 64  |  0.7699  |
|         crossvit_9_240          | 128 |  0.768   |
|        twins_pcpvt_base         | 64  |  0.7268  |
|          gmlp_s16_224           | 128 |  0.6241  |
|           resnest101e           |  0  |   0.0    |
|        convmixer_768_32         |  0  |   0.0    |
|     swsl_resnext101_32x16d      |  0  |   0.0    |
|            hrnet_w18            |  0  |   0.0    |
|         visformer_small         |  0  |   0.0    |
+---------------------------------+-----+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 2  |     pass      |
|             dpn107              | 2  |     pass      |
|           res2next50            | 2  |     pass      |
|       eca_botnext26ts_256       | 2  |     pass      |
|          botnet26t_256          | 2  |     pass      |
|         coat_lite_mini          | 2  |     pass      |
|           convit_base           | 2  |     pass      |
|          convnext_base          | 2  |     pass      |
|         crossvit_9_240          | 2  |     pass      |
|          cspdarknet53           | 2  |     pass      |
| deit_base_distilled_patch16_224 | 2  |     pass      |
|             dla102              | 2  |     pass      |
|           dm_nfnet_f0           | 2  |     pass      |
|          jx_nest_base           | 2  |     pass      |
|      beit_base_patch16_224      | 2  |     pass      |
|      xcit_large_24_p8_224       | 2  |     pass      |
|        eca_halonext26ts         | 2  |     pass      |
|       gluon_inception_v3        | 2  |     pass      |
|          inception_v3           | 2  |     pass      |
|          gmlp_s16_224           | 2  |     pass      |
|          gmixer_24_224          | 2  |     pass      |
|        ese_vovnet19b_dw         | 2  |     pass      |
|        gluon_xception65         | 2  |     pass      |
|            gernet_l             | 2  |     pass      |
|            fbnetv3_b            | 2  |     pass      |
|           fbnetc_100            | 2  |     pass      |
|            mixnet_l             | 2  |     pass      |
|            lcnet_050            | 2  |     pass      |
|           volo_d1_224           | 2  |     pass      |
|        res2net50_14w_8s         | 2  |     pass      |
|           mnasnet_100           | 2  |     pass      |
|         mobilenetv2_100         | 2  |     pass      |
|      mobilenetv3_large_100      | 2  |     pass      |
|           mobilevit_s           | 2  |     pass      |
|            nfnet_l0             | 2  |     pass      |
|            pit_b_224            | 2  |     pass      |
|          pnasnet5large          | 2  |     pass      |
|         poolformer_m36          | 2  |     pass      |
|           regnety_002           | 2  |     pass      |
|      vit_base_patch16_224       | 2  |     pass      |
|        res2net101_26w_4s        | 2  |     pass      |
|            repvgg_a2            | 2  |     pass      |
|          resmlp_12_224          | 2  |     pass      |
|       tf_efficientnet_b0        | 2  |     pass      |
|        twins_pcpvt_base         | 2  |     pass      |
|        tnt_s_patch16_224        | 2  |     pass      |
|          mixer_b16_224          | 2  |     pass      |
|           tf_mixnet_l           | 2  |     pass      |
|            tinynet_a            | 2  |     pass      |
|  swin_base_patch4_window7_224   | 2  |     pass      |
|          spnasnet_100           | 2  |     pass      |
|           selecsls42b           | 2  |     pass      |
|        sebotnet33ts_256         | 2  |     pass      |
|           rexnet_100            | 2  |     pass      |
|     swsl_resnext101_32x16d      | 2  |  fail_to_run  |
|        convmixer_768_32         | 2  |  fail_to_run  |
|            hrnet_w18            | 2  |  fail_to_run  |
|         visformer_small         | 2  |  fail_to_run  |
|           resnest101e           | 2  |  fail_to_run  |
|          ghostnet_100           | 2  | fail_accuracy |
|          cait_m36_384           | 2  | fail_accuracy |
+---------------------------------+----+---------------+

Compilation latency (sec)

+---------------------------------+-----+----------+
|              name               | bs  | inductor |
+---------------------------------+-----+----------+
|          pnasnet5large          | 16  | 38.6446  |
|           dm_nfnet_f0           | 128 | 38.1373  |
|  swin_base_patch4_window7_224   | 64  | 30.8531  |
|         poolformer_m36          | 64  | 30.7346  |
|          cait_m36_384           |  4  | 29.7031  |
|        twins_pcpvt_base         | 64  | 29.3807  |
|           tf_mixnet_l           | 128 | 29.2054  |
|            tinynet_a            | 128 | 27.9677  |
|           mobilevit_s           | 64  | 26.8933  |
|            mixnet_l             | 128 | 26.4851  |
|        res2net50_14w_8s         | 128 | 25.4154  |
|      xcit_large_24_p8_224       |  5  | 25.1994  |
|        tnt_s_patch16_224        | 128 | 24.5253  |
|             dla102              | 128 | 24.3247  |
|            nfnet_l0             | 128 | 24.1006  |
|        eca_halonext26ts         | 128 | 23.1717  |
|            fbnetv3_b            | 128 | 23.0866  |
|             dpn107              | 32  | 22.6769  |
|        res2net101_26w_4s        | 64  | 22.5152  |
|          jx_nest_base           | 32  | 22.0539  |
|           rexnet_100            | 128 | 21.7039  |
|        sebotnet33ts_256         | 64  |  21.513  |
|       eca_botnext26ts_256       | 128 | 21.2468  |
|         coat_lite_mini          | 128 | 20.3097  |
|          convnext_base          | 64  | 20.2474  |
|       tf_efficientnet_b0        | 128 | 19.9948  |
|           volo_d1_224           | 64  | 19.7657  |
|           convit_base           | 64  | 19.3706  |
|         crossvit_9_240          | 128 | 19.3676  |
|           res2next50            | 128 | 19.1922  |
|            pit_b_224            | 64  | 18.7909  |
|          gmlp_s16_224           | 128 | 18.5539  |
|          botnet26t_256          | 128 | 17.8705  |
|        adv_inception_v3         | 128 | 17.3002  |
|          cspdarknet53           | 64  | 17.0967  |
|          mixer_b16_224          | 128 | 16.5458  |
|        gluon_xception65         | 32  | 15.9474  |
|          ghostnet_100           | 128 | 15.6503  |
| deit_base_distilled_patch16_224 | 64  | 15.4246  |
|           regnety_002           | 128 | 15.4038  |
|         mobilenetv2_100         | 128 | 15.1876  |
|      vit_base_patch16_224       | 64  | 15.1275  |
|      mobilenetv3_large_100      | 128 | 14.9626  |
|           fbnetc_100            | 128 | 14.9414  |
|          spnasnet_100           | 128 | 14.6729  |
|            gernet_l             | 128 |  13.935  |
|      beit_base_patch16_224      | 64  | 13.7499  |
|       gluon_inception_v3        | 128 | 13.5428  |
|           mnasnet_100           | 128 | 13.4203  |
|          gmixer_24_224          | 128 | 13.4183  |
|          inception_v3           | 128 | 13.3092  |
|            repvgg_a2            | 128 |  13.011  |
|           selecsls42b           | 128 | 12.8237  |
|        ese_vovnet19b_dw         | 128 | 12.7159  |
|          resmlp_12_224          | 128 | 11.7585  |
|            lcnet_050            | 128 | 10.2061  |
|        convmixer_768_32         |  0  |   nan    |
|            hrnet_w18            |  0  |   nan    |
|           resnest101e           |  0  |   nan    |
|     swsl_resnext101_32x16d      |  0  |   nan    |
|         visformer_small         |  0  |   nan    |
+---------------------------------+-----+----------+

Peak Memory Compression Ratio

+---------------------------------+-----+----------+
|              name               | bs  | inductor |
+---------------------------------+-----+----------+
|        ese_vovnet19b_dw         | 128 |  0.9971  |
|             dla102              | 128 |  0.9963  |
|           selecsls42b           | 128 |  0.9963  |
|           tf_mixnet_l           | 128 |  0.9962  |
|       eca_botnext26ts_256       | 128 |  0.9962  |
|            mixnet_l             | 128 |  0.9961  |
|        eca_halonext26ts         | 128 |  0.9961  |
|        gluon_xception65         | 32  |  0.9959  |
|        adv_inception_v3         | 128 |  0.9957  |
|          botnet26t_256          | 128 |  0.9956  |
|          mixer_b16_224          | 128 |  0.9955  |
|        res2net50_14w_8s         | 128 |  0.9946  |
|           res2next50            | 128 |  0.9946  |
|          cspdarknet53           | 64  |  0.9945  |
|            pit_b_224            | 64  |  0.9944  |
|          resmlp_12_224          | 128 |  0.9943  |
|           rexnet_100            | 128 |  0.9943  |
|         coat_lite_mini          | 128 |  0.9938  |
|           mobilevit_s           | 64  |  0.9936  |
|           convit_base           | 64  |  0.9935  |
|          ghostnet_100           | 128 |  0.9935  |
|             dpn107              | 32  |  0.9933  |
|            nfnet_l0             | 128 |  0.9932  |
|       tf_efficientnet_b0        | 128 |  0.9931  |
|           dm_nfnet_f0           | 128 |  0.9928  |
|        sebotnet33ts_256         | 64  |  0.9925  |
| deit_base_distilled_patch16_224 | 64  |  0.9919  |
|            gernet_l             | 128 |  0.9919  |
|      vit_base_patch16_224       | 64  |  0.9918  |
|        res2net101_26w_4s        | 64  |  0.9917  |
|         mobilenetv2_100         | 128 |  0.9915  |
|        tnt_s_patch16_224        | 128 |  0.9913  |
|           mnasnet_100           | 128 |  0.9905  |
|          pnasnet5large          | 16  |  0.9904  |
|         crossvit_9_240          | 128 |  0.9897  |
|            tinynet_a            | 128 |  0.9897  |
|           fbnetc_100            | 128 |  0.9896  |
|      mobilenetv3_large_100      | 128 |  0.9894  |
|            repvgg_a2            | 128 |  0.9894  |
|            fbnetv3_b            | 128 |  0.9892  |
|          spnasnet_100           | 128 |  0.9886  |
|          convnext_base          | 64  |  0.9883  |
|  swin_base_patch4_window7_224   | 64  |  0.9846  |
|           regnety_002           | 128 |  0.9796  |
|           volo_d1_224           | 64  |  0.9787  |
|         poolformer_m36          | 64  |  0.9764  |
|          jx_nest_base           | 32  |  0.9731  |
|            lcnet_050            | 128 |  0.967   |
|       gluon_inception_v3        | 128 |  0.9649  |
|          inception_v3           | 128 |  0.9643  |
|          cait_m36_384           |  4  |  0.9626  |
|        twins_pcpvt_base         | 64  |  0.9612  |
|      xcit_large_24_p8_224       |  5  |  0.9537  |
|      beit_base_patch16_224      | 64  |  0.9503  |
|          gmixer_24_224          | 128 |  0.919   |
|          gmlp_s16_224           | 128 |  0.8688  |
|        convmixer_768_32         |  0  |   nan    |
|            hrnet_w18            |  0  |   nan    |
|           resnest101e           |  0  |   nan    |
|     swsl_resnext101_32x16d      |  0  |   nan    |
|         visformer_small         |  0  |   nan    |
+---------------------------------+-----+----------+

Absolute latency (ms)

+---------------------------------+-----+-----------+
|              name               | bs  | inductor  |
+---------------------------------+-----+-----------+
|           dm_nfnet_f0           | 128 | 1893.3513 |
|            nfnet_l0             | 128 | 1400.5184 |
|          mixer_b16_224          | 128 | 1343.6957 |
|           convit_base           | 64  | 1274.3139 |
|          cait_m36_384           |  4  | 1257.5395 |
|        tnt_s_patch16_224        | 128 | 1143.7164 |
|             dla102              | 128 | 1138.292  |
|  swin_base_patch4_window7_224   | 64  | 1120.7129 |
|          gmlp_s16_224           | 128 | 1067.7405 |
|          convnext_base          | 64  | 963.5042  |
|            pit_b_224            | 64  | 960.3873  |
|      vit_base_patch16_224       | 64  | 928.8819  |
| deit_base_distilled_patch16_224 | 64  | 917.4262  |
|         poolformer_m36          | 64  | 905.9793  |
|      beit_base_patch16_224      | 64  | 899.0354  |
|        eca_halonext26ts         | 128 | 753.4683  |
|       eca_botnext26ts_256       | 128 | 743.8374  |
|           res2next50            | 128 | 743.5604  |
|        res2net50_14w_8s         | 128 | 737.2616  |
|        adv_inception_v3         | 128 |  673.563  |
|       gluon_inception_v3        | 128 | 665.0792  |
|          inception_v3           | 128 | 664.3113  |
|          jx_nest_base           | 32  | 660.1735  |
|          gmixer_24_224          | 128 | 658.4195  |
|        twins_pcpvt_base         | 64  | 652.0627  |
|           tf_mixnet_l           | 128 | 622.3522  |
|            repvgg_a2            | 128 |  616.996  |
|            mixnet_l             | 128 | 607.9118  |
|          pnasnet5large          | 16  | 606.0259  |
|         coat_lite_mini          | 128 | 600.8722  |
|          botnet26t_256          | 128 | 585.0667  |
|        res2net101_26w_4s        | 64  | 584.4436  |
|           volo_d1_224           | 64  | 576.0334  |
|             dpn107              | 32  | 567.6784  |
|        sebotnet33ts_256         | 64  | 551.4591  |
|      xcit_large_24_p8_224       |  5  | 541.7088  |
|           mobilevit_s           | 64  | 535.4861  |
|        gluon_xception65         | 32  | 506.9852  |
|          cspdarknet53           | 64  | 493.8644  |
|            gernet_l             | 128 | 485.0978  |
|         crossvit_9_240          | 128 | 449.3476  |
|       tf_efficientnet_b0        | 128 | 430.5557  |
|           rexnet_100            | 128 | 400.9165  |
|           selecsls42b           | 128 | 378.0065  |
|          resmlp_12_224          | 128 | 370.8926  |
|        ese_vovnet19b_dw         | 128 | 351.5002  |
|            fbnetv3_b            | 128 | 345.3958  |
|            tinynet_a            | 128 | 306.0124  |
|         mobilenetv2_100         | 128 | 250.1286  |
|           fbnetc_100            | 128 | 217.9854  |
|          spnasnet_100           | 128 | 193.8245  |
|           mnasnet_100           | 128 | 182.0802  |
|          ghostnet_100           | 128 | 156.2125  |
|      mobilenetv3_large_100      | 128 | 152.9724  |
|           regnety_002           | 128 |  89.4823  |
|            lcnet_050            | 128 |  38.3547  |
|        convmixer_768_32         |  0  |    nan    |
|            hrnet_w18            |  0  |    nan    |
|           resnest101e           |  0  |    nan    |
|     swsl_resnext101_32x16d      |  0  |    nan    |
|         visformer_small         |  0  |    nan    |
+---------------------------------+-----+-----------+

ESI-SYD commented Nov 16, 2022

Performance Dashboard for float32 precision -- Single-core Single-thread (2022-11-13 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward pass. For accuracy, we check the numerical correctness of the forward-pass outputs by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint reduction ratio.
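
The speedup figures in this report are ratios of native PyTorch latency to TorchInductor latency. A minimal sketch of that normalization, using only the standard library, might look like the following. `time_once_ms` and `speedup` are hypothetical helper names, and the two lambdas are stand-ins for the eager and compiled forward passes; this is not the benchmark harness itself.

```python
import time


def time_once_ms(fn):
    """Wall-clock time of a single call to fn(), in milliseconds."""
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000.0


def speedup(eager_ms, compiled_ms):
    """Baseline latency divided by compiled latency; > 1.0 means faster."""
    if compiled_ms <= 0:
        raise ValueError("compiled latency must be positive")
    return eager_ms / compiled_ms


# Stand-in workloads (assumptions): replace with the eager and compiled models.
eager_ms = time_once_ms(lambda: sum(i * i for i in range(200_000)))
compiled_ms = time_once_ms(lambda: sum(i * i for i in range(100_000)))
print(f"speedup: {speedup(eager_ms, compiled_ms):.4f}x")
```

Under this convention a value such as 0.81 means the compiled model took longer than eager, and a value such as 1.39 means it was faster.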

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

SW information

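+-------------+----------------+--------------------+
|     SW      | Nightly commit | Master/Main commit |
+-------------+----------------+--------------------+
|   Pytorch   |    637228b     |      46796fe       |
| Torchbench  |       /        |      022dfe3       |
| torchaudio  |    4b10b6a     |      74f9a89       |
|  torchtext  |    71e4561     |      c047efe       |
| torchvision |    797e1ac     |      ffd5a56       |
+-------------+----------------+--------------------+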

HW information

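+------------------+-----------------------------------------------+
|       Item       |                     Value                     |
+------------------+-----------------------------------------------+
|   Manufacturer   |                  Amazon EC2                   |
|   Product Name   |                 c6i.16xlarge                  |
|    CPU Model     | Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz |
| Installed Memory |   128GB (1x128GB DDR4 3200 MT/s [Unknown])    |
|        OS        |              Ubuntu 20.04.5 LTS               |
|      Kernel      |                5.15.0-1022-aws                |
|    Microcode     |                   0xd000331                   |
|       GCC        |   gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0   |
|      GLIBC       |    ldd (Ubuntu GLIBC 2.31-0ubuntu9.9) 2.31    |
|     Binutils     |     GNU ld (GNU Binutils for Ubuntu) 2.34     |
|      Python      |                 Python 3.8.13                 |
|     OpenSSL      |           OpenSSL 1.1.1s 1 Nov 2022           |
+------------------+-----------------------------------------------+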

Update: We use single-instance mode in this round.
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Pass rate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 91%, 50/55 | 100%, 44/44 | 89%, 54/61  |
+----------+------------+-------------+-------------+
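The cells in the table above pair a percentage with a passed/total count. A minimal sketch of that formatting follows; the helper name `pass_rate` is hypothetical, and rounding to the nearest integer percent is an assumption that happens to reproduce the three cells shown.

```python
def pass_rate(passed, total):
    """Format a pass rate as '<percent>%, <passed>/<total>'.

    Rounding to the nearest whole percent is an assumption for illustration.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    return f"{round(100 * passed / total)}%, {passed}/{total}"


if __name__ == "__main__":
    # Counts taken from the table above.
    for suite, passed, total in [
        ("torchbench", 50, 55),
        ("huggingface", 44, 44),
        ("timm_models", 54, 61),
    ]:
        print(suite, pass_rate(passed, total))
```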

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.07x    |    1.03x    |    1.11x    |
+----------+------------+-------------+-------------+
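For reference, a geometric mean over per-model speedups can be computed as in the sketch below. It assumes that entries of 0.0 or NaN (models that failed to run) are excluded before averaging; the dashboard's exact filtering is not shown in this post, so treat that as an assumption. The sample list holds only a handful of illustrative ratios, not a full suite.

```python
import math


def geometric_mean(speedups):
    """Geometric mean of positive speedup ratios.

    Non-positive and NaN entries are skipped (an assumed convention for
    models that failed to run). Returns NaN if nothing valid remains.
    """
    valid = [s for s in speedups if s > 0 and not math.isnan(s)]
    if not valid:
        return float("nan")
    return math.exp(sum(math.log(s) for s in valid) / len(valid))


if __name__ == "__main__":
    # A few per-model ratios for illustration; 0.0 marks a failed model.
    sample = [1.5011, 1.3734, 0.9994, 0.7223, 0.0]
    print(f"{geometric_mean(sample):.2f}x")
```

The geometric mean is the usual choice for ratios because a 2x speedup and a 0.5x slowdown cancel to 1.0x, which an arithmetic mean would not show.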

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   24.94    |    24.00    |    31.63    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.89x    |    0.85x    |    0.86x    |
+----------+------------+-------------+-------------+
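One way to read these numbers, assuming the ratio is defined as eager peak memory divided by compiled peak memory (which is consistent with "higher is better" but is not stated in this post), is sketched below. The helper name `compression_ratio` is hypothetical.

```python
def compression_ratio(eager_peak_bytes, compiled_peak_bytes):
    """Peak memory compression ratio under the assumed definition.

    ratio = eager peak / compiled peak, so values below 1.0 mean the
    compiled run had a larger peak footprint than eager.
    """
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled peak memory must be positive")
    return eager_peak_bytes / compiled_peak_bytes


if __name__ == "__main__":
    # Hypothetical peaks in bytes, not measurements from this run.
    print(f"{compression_ratio(890_000_000, 1_000_000_000):.2f}x")
```

Under that assumption, a ratio of 0.89x would mean the compiled run peaked at roughly 1/0.89, or about 1.12 times, the eager footprint.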

torchbench suite with float32 precision

Performance speedup

+-----------------------------------+-----+----------+
|               name                | bs  | inductor |
+-----------------------------------+-----+----------+
|            hf_Reformer            |  1  |  1.5011  |
|           mobilenet_v2            |  1  |  1.3734  |
|        speech_transformer         |  1  |  1.3015  |
| attention_is_all_you_need_pytorch |  1  |  1.2772  |
|   pytorch_CycleGAN_and_pix2pix    |  1  |  1.2309  |
|            timm_nfnet             |  1  |  1.1797  |
|              hf_GPT2              |  1  |  1.1606  |
|         soft_actor_critic         | 256 |  1.1579  |
|             resnet18              |  1  |  1.1408  |
|          vision_maskrcnn          |  1  |  1.1278  |
|           pytorch_unet            |  1  |  1.1258  |
|        shufflenet_v2_x1_0         |  1  |  1.1245  |
|            timm_vovnet            |  1  |  1.1127  |
|           squeezenet1_1           |  1  |  1.0981  |
|               vgg16               |  1  |  1.0909  |
|     detectron2_fcos_r_50_fpn      |  1  |  1.0901  |
|           BERT_pytorch            |  1  |  1.0853  |
|              alexnet              |  1  |  1.0752  |
|          pytorch_stargan          | 16  |  1.0681  |
|            densenet121            |  1  |  1.0661  |
|            Super_SloMo            |  1  |  1.0624  |
|            timm_regnet            |  1  |  1.0616  |
|                drq                |  1  |  1.0598  |
|               dlrm                |  1  |  1.0443  |
|        Background_Matting         |  1  |  1.0345  |
|               dcgan               |  1  |  1.0263  |
|          LearningToPaint          |  1  |  1.0231  |
|    mobilenet_v2_quantized_qat     |  1  |  1.0035  |
|            tts_angular            |  1  |  1.0009  |
|      resnet50_quantized_qat       |  1  |  0.9994  |
|              demucs               |  1  |  0.9986  |
|            hf_BigBird             |  1  |  0.9947  |
|      nvidia_deeprecommender       |  1  |  0.9779  |
|      timm_vision_transformer      |  1  |  0.972   |
|           hf_DistilBert           |  1  |  0.9675  |
|             resnet50              |  1  |   0.9    |
|             resnet152             |  1  |  0.8906  |
|   timm_vision_transformer_large   |  1  |  0.8427  |
|           hf_Longformer           |  1  |  0.8362  |
|            hf_T5_large            |  1  |  0.7989  |
|          resnext50_32x4d          |  1  |  0.794   |
|         timm_efficientnet         |  1  |  0.7916  |
|             hf_Albert             |  1  |  0.7731  |
|           lennard_jones           |  1  |  0.7587  |
|              hf_Bert              |  1  |  0.7504  |
|              hf_Bart              |  1  |  0.7361  |
|           fastNLP_Bert            |  1  |  0.7308  |
|               hf_T5               |  1  |  0.724   |
|           hf_GPT2_large           |  1  |  0.7223  |
|            hf_T5_base             |  1  |  0.6573  |
|       functorch_dp_cifar10        |  1  |  0.1853  |
|            mnasnet1_0             |  0  |   0.0    |
|           timm_resnest            |  0  |   0.0    |
|        mobilenet_v3_large         |  0  |   0.0    |
|              yolov3               |  0  |   0.0    |
+-----------------------------------+-----+----------+

Accuracy

+-----------------------------------+-----+------------------+
|               name                | bs  |     inductor     |
+-----------------------------------+-----+------------------+
|            hf_T5_large            |  1  | pass_due_to_skip |
|   timm_vision_transformer_large   |  1  | pass_due_to_skip |
|           hf_GPT2_large           |  1  | pass_due_to_skip |
|           BERT_pytorch            |  1  |       pass       |
|       functorch_dp_cifar10        |  1  |       pass       |
|             hf_Albert             |  1  |       pass       |
| attention_is_all_you_need_pytorch |  1  |       pass       |
|               dcgan               |  1  |       pass       |
|              demucs               |  1  |       pass       |
|            densenet121            |  1  |       pass       |
|     detectron2_fcos_r_50_fpn      |  1  |       pass       |
|               dlrm                |  1  |       pass       |
|                drq                |  1  |       pass       |
|           fastNLP_Bert            |  1  |       pass       |
|              hf_Bart              |  1  |       pass       |
|            hf_T5_base             |  1  |       pass       |
|           lennard_jones           |  1  |       pass       |
|              hf_Bert              |  1  |       pass       |
|            hf_BigBird             |  1  |       pass       |
|           hf_DistilBert           |  1  |       pass       |
|              hf_GPT2              |  1  |       pass       |
|          LearningToPaint          |  1  |       pass       |
|           hf_Longformer           |  1  |       pass       |
|            hf_Reformer            |  1  |       pass       |
|               hf_T5               |  1  |       pass       |
|            timm_vovnet            |  1  |       pass       |
|              alexnet              |  1  |       pass       |
|           mobilenet_v2            |  1  |       pass       |
|             resnet50              |  1  |       pass       |
|            Super_SloMo            |  1  |       pass       |
|      resnet50_quantized_qat       |  1  |       pass       |
|    mobilenet_v2_quantized_qat     |  1  |       pass       |
|               vgg16               |  1  |       pass       |
|      nvidia_deeprecommender       |  1  |       pass       |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |
|          pytorch_stargan          | 16  |       pass       |
|           pytorch_unet            |  1  |       pass       |
|      timm_vision_transformer      |  1  |       pass       |
|             resnet18              |  1  |       pass       |
|             resnet152             |  1  |       pass       |
|          resnext50_32x4d          |  1  |       pass       |
|        Background_Matting         |  1  |       pass       |
|        shufflenet_v2_x1_0         |  1  |       pass       |
|         soft_actor_critic         | 256 |       pass       |
|        speech_transformer         |  1  |       pass       |
|           squeezenet1_1           |  1  |       pass       |
|         timm_efficientnet         |  1  |       pass       |
|            timm_nfnet             |  1  |       pass       |
|            timm_regnet            |  1  |       pass       |
|            tts_angular            |  1  |       pass       |
|            mnasnet1_0             |  1  |   fail_to_run    |
|           timm_resnest            |  1  |   fail_to_run    |
|        mobilenet_v3_large         |  1  |   fail_to_run    |
|              yolov3               |  1  |   fail_to_run    |
|          vision_maskrcnn          |  0  |      0.0000      |
+-----------------------------------+-----+------------------+

Compilation latency (sec)

+-----------------------------------+-----+----------+
|               name                | bs  | inductor |
+-----------------------------------+-----+----------+
|            hf_T5_base             |  1  | 92.8005  |
|            hf_T5_large            |  1  | 77.7114  |
|     detectron2_fcos_r_50_fpn      |  1  | 74.3171  |
|           hf_GPT2_large           |  1  | 70.7082  |
|            densenet121            |  1  | 58.1948  |
|          vision_maskrcnn          |  1  |  55.998  |
|            timm_nfnet             |  1  | 38.8452  |
|           hf_Longformer           |  1  | 38.6295  |
|         timm_efficientnet         |  1  | 37.2077  |
|   timm_vision_transformer_large   |  1  | 35.6581  |
|        Background_Matting         |  1  | 33.8217  |
|            Super_SloMo            |  1  | 33.7243  |
|            timm_regnet            |  1  | 28.8207  |
|            timm_vovnet            |  1  | 28.5165  |
|           BERT_pytorch            |  1  | 27.9394  |
|             resnet152             |  1  | 27.6069  |
|            hf_Reformer            |  1  | 25.1953  |
|           pytorch_unet            |  1  | 24.1219  |
|            hf_BigBird             |  1  | 23.9335  |
|           hf_DistilBert           |  1  | 23.5613  |
|        speech_transformer         |  1  | 23.1587  |
|          resnext50_32x4d          |  1  | 22.8071  |
|               hf_T5               |  1  | 22.7984  |
|             resnet50              |  1  | 22.5974  |
|              hf_Bart              |  1  | 22.2426  |
|       functorch_dp_cifar10        |  1  |  21.438  |
|              hf_GPT2              |  1  | 19.4325  |
|        shufflenet_v2_x1_0         |  1  | 18.9821  |
|           fastNLP_Bert            |  1  | 18.3992  |
|              hf_Bert              |  1  | 17.7705  |
|          LearningToPaint          |  1  | 17.7093  |
|      timm_vision_transformer      |  1  |  17.112  |
|             resnet18              |  1  | 16.6731  |
|           squeezenet1_1           |  1  | 16.4526  |
| attention_is_all_you_need_pytorch |  1  | 16.2151  |
|             hf_Albert             |  1  | 16.0676  |
|           mobilenet_v2            |  1  | 14.9278  |
|          pytorch_stargan          | 16  | 14.2514  |
|               vgg16               |  1  | 14.1918  |
|   pytorch_CycleGAN_and_pix2pix    |  1  | 13.2499  |
|                drq                |  1  | 11.6338  |
|              alexnet              |  1  | 10.7819  |
|               dlrm                |  1  | 10.6928  |
|      nvidia_deeprecommender       |  1  | 10.3391  |
|               dcgan               |  1  |  9.4681  |
|            tts_angular            |  1  |  8.4271  |
|         soft_actor_critic         | 256 |  8.4067  |
|           lennard_jones           |  1  |  7.9306  |
|              demucs               |  1  |  1.2121  |
|    mobilenet_v2_quantized_qat     |  1  |  0.0774  |
|      resnet50_quantized_qat       |  1  |  0.0578  |
|            mnasnet1_0             |  0  |   nan    |
|        mobilenet_v3_large         |  0  |   nan    |
|           timm_resnest            |  0  |   nan    |
|              yolov3               |  0  |   nan    |
+-----------------------------------+-----+----------+

Peak Memory Compression Ratio

+-----------------------------------+-----+----------+
|               name                | bs  | inductor |
+-----------------------------------+-----+----------+
|      resnet50_quantized_qat       |  1  |  1.0555  |
|              demucs               |  1  |  0.9986  |
|               dlrm                |  1  |  0.9981  |
|        Background_Matting         |  1  |  0.9955  |
|    mobilenet_v2_quantized_qat     |  1  |  0.9918  |
|           pytorch_unet            |  1  |  0.9891  |
|         soft_actor_critic         | 256 |  0.9871  |
|          pytorch_stargan          | 16  |  0.9864  |
|            tts_angular            |  1  |  0.9839  |
|           lennard_jones           |  1  |  0.9835  |
|                drq                |  1  |  0.9819  |
|            hf_T5_base             |  1  |  0.9803  |
|            hf_BigBird             |  1  |  0.9766  |
|            Super_SloMo            |  1  |  0.9756  |
|              alexnet              |  1  |  0.9745  |
|               vgg16               |  1  |  0.9675  |
|           hf_DistilBert           |  1  |  0.9581  |
|            timm_vovnet            |  1  |  0.9498  |
|               dcgan               |  1  |  0.9484  |
|              hf_GPT2              |  1  |  0.947   |
|   pytorch_CycleGAN_and_pix2pix    |  1  |  0.9415  |
|   timm_vision_transformer_large   |  1  |  0.9377  |
|           squeezenet1_1           |  1  |  0.937   |
|       functorch_dp_cifar10        |  1  |  0.9323  |
|           mobilenet_v2            |  1  |  0.9289  |
|           BERT_pytorch            |  1  |  0.9276  |
|              hf_Bart              |  1  |  0.9274  |
|          LearningToPaint          |  1  |  0.9272  |
|        shufflenet_v2_x1_0         |  1  |  0.921   |
|        speech_transformer         |  1  |  0.9188  |
|          vision_maskrcnn          |  1  |  0.9154  |
|            hf_Reformer            |  1  |  0.9086  |
| attention_is_all_you_need_pytorch |  1  |   0.9    |
|             hf_Albert             |  1  |  0.8985  |
|            timm_regnet            |  1  |  0.8979  |
|             resnet18              |  1  |  0.8822  |
|      timm_vision_transformer      |  1  |  0.8794  |
|         timm_efficientnet         |  1  |  0.8656  |
|          resnext50_32x4d          |  1  |  0.8288  |
|             resnet50              |  1  |  0.8273  |
|            densenet121            |  1  |  0.8186  |
|     detectron2_fcos_r_50_fpn      |  1  |  0.8031  |
|               hf_T5               |  1  |  0.8024  |
|              hf_Bert              |  1  |  0.779   |
|           hf_GPT2_large           |  1  |  0.7665  |
|             resnet152             |  1  |  0.7302  |
|           fastNLP_Bert            |  1  |  0.7229  |
|           hf_Longformer           |  1  |  0.7169  |
|            timm_nfnet             |  1  |  0.6614  |
|      nvidia_deeprecommender       |  1  |  0.4974  |
|            hf_T5_large            |  1  |  0.4545  |
|            mnasnet1_0             |  0  |   nan    |
|        mobilenet_v3_large         |  0  |   nan    |
|           timm_resnest            |  0  |   nan    |
|              yolov3               |  0  |   nan    |
+-----------------------------------+-----+----------+

Absolute latency (ms)

+-----------------------------------+-----+------------+
|               name                | bs  |  inductor  |
+-----------------------------------+-----+------------+
|            hf_T5_base             |  1  | 24338.544  |
|           hf_GPT2_large           |  1  | 17436.8148 |
|            hf_T5_large            |  1  | 7392.2429  |
|        Background_Matting         |  1  | 5839.6769  |
|           pytorch_unet            |  1  | 4485.8313  |
|   timm_vision_transformer_large   |  1  | 3919.5872  |
|          vision_maskrcnn          |  1  | 2901.5894  |
|          pytorch_stargan          | 16  | 2755.3495  |
|     detectron2_fcos_r_50_fpn      |  1  | 2393.4072  |
|            Super_SloMo            |  1  | 2220.5123  |
|              demucs               |  1  | 1954.1595  |
|            hf_BigBird             |  1  | 1786.2235  |
|              hf_Bart              |  1  | 1387.0216  |
|           hf_Longformer           |  1  | 1346.5585  |
|              hf_Bert              |  1  | 1062.5523  |
|             hf_Albert             |  1  |  938.716   |
|           fastNLP_Bert            |  1  |  878.4237  |
|        speech_transformer         |  1  |  791.4288  |
|   pytorch_CycleGAN_and_pix2pix    |  1  |  777.1568  |
|               hf_T5               |  1  |  743.4815  |
|           hf_DistilBert           |  1  |  626.4406  |
|              hf_GPT2              |  1  |  542.0892  |
|            hf_Reformer            |  1  |  496.1613  |
|             resnet152             |  1  |  249.2874  |
|               vgg16               |  1  |  241.9715  |
|            timm_nfnet             |  1  |  225.9756  |
|           BERT_pytorch            |  1  |  211.919   |
|            timm_regnet            |  1  |  204.7786  |
|          resnext50_32x4d          |  1  |  164.5818  |
|            timm_vovnet            |  1  |  152.549   |
|             resnet50              |  1  |  151.8386  |
|            densenet121            |  1  |  116.0932  |
|      timm_vision_transformer      |  1  |  112.181   |
|             resnet18              |  1  |  62.0212   |
|         timm_efficientnet         |  1  |  61.6157   |
|            tts_angular            |  1  |  55.7978   |
|      nvidia_deeprecommender       |  1  |  55.1794   |
|      resnet50_quantized_qat       |  1  |  38.2829   |
| attention_is_all_you_need_pytorch |  1  |  35.3115   |
|       functorch_dp_cifar10        |  1  |  33.3901   |
|              alexnet              |  1  |  31.0984   |
|           mobilenet_v2            |  1  |   27.257   |
|           squeezenet1_1           |  1  |  22.6721   |
|        shufflenet_v2_x1_0         |  1  |   18.763   |
|    mobilenet_v2_quantized_qat     |  1  |  17.8269   |
|          LearningToPaint          |  1  |   14.713   |
|               dcgan               |  1  |   7.3128   |
|         soft_actor_critic         | 256 |   3.9637   |
|                drq                |  1  |   3.5036   |
|               dlrm                |  1  |   0.7033   |
|           lennard_jones           |  1  |   0.0784   |
|            mnasnet1_0             |  0  |    nan     |
|        mobilenet_v3_large         |  0  |    nan     |
|           timm_resnest            |  0  |    nan     |
|              yolov3               |  0  |    nan     |
+-----------------------------------+-----+------------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|     MobileBertForQuestionAnswering      | 1  |  2.0157  |
|               GoogleFnet                | 1  |  1.1464  |
|         Speech2Text2ForCausalLM         | 1  |  1.1193  |
|            XLNetLMHeadModel             | 1  |  1.1035  |
|             OPTForCausalLM              | 1  |  1.0493  |
|     DistilBertForQuestionAnswering      | 1  |  1.0272  |
|      MBartForConditionalGeneration      | 1  |  1.0185  |
|       DebertaForQuestionAnswering       | 1  |  1.0141  |
|     PegasusForConditionalGeneration     | 1  |  1.0138  |
| BlenderbotSmallForConditionalGeneration | 1  |  0.9942  |
|     PLBartForConditionalGeneration      | 1  |  0.9911  |
|       ElectraForQuestionAnswering       | 1  |  0.9823  |
|            TrOCRForCausalLM             | 1  |  0.9578  |
|            MBartForCausalLM             | 1  |  0.9551  |
|       AlbertForQuestionAnswering        | 1  |  0.9503  |
|           PegasusForCausalLM            | 1  |  0.9483  |
|            AlbertForMaskedLM            | 1  |  0.9433  |
|          MobileBertForMaskedLM          | 1  |  0.9305  |
|          DistilBertForMaskedLM          | 1  |  0.9283  |
|            PLBartForCausalLM            | 1  |  0.9071  |
|                 BigBird                 | 1  |  0.9054  |
|     M2M100ForConditionalGeneration      | 1  |  0.8989  |
|       BlenderbotSmallForCausalLM        | 1  |  0.8872  |
|            YituTechConvBert             | 1  |  0.8676  |
|    MegatronBertForQuestionAnswering     | 1  |  0.8559  |
|         MegatronBertForCausalLM         | 1  |  0.8358  |
|       RobertaForQuestionAnswering       | 1  |  0.831   |
|           ElectraForCausalLM            | 1  |  0.8181  |
|        BertForQuestionAnswering         | 1  |  0.8142  |
|           DebertaForMaskedLM            | 1  |  0.8102  |
|             XGLMForCausalLM             | 1  |  0.8052  |
|           RobertaForCausalLM            | 1  |  0.7969  |
|             BertForMaskedLM             | 1  |  0.795   |
|          AllenaiLongformerBase          | 1  |  0.7873  |
|               DistillGPT2               | 1  |  0.7127  |
|                CamemBert                | 1  |  0.7103  |
|    LayoutLMForSequenceClassification    | 1  |  0.7102  |
|           LayoutLMForMaskedLM           | 1  |  0.7085  |
|             BartForCausalLM             | 1  |  0.6974  |
|       MT5ForConditionalGeneration       | 1  |  0.6928  |
|      GPT2ForSequenceClassification      | 1  |  0.6634  |
|      BartForConditionalGeneration       | 1  |  0.6572  |
|                 T5Small                 | 1  |  0.6369  |
|       T5ForConditionalGeneration        | 1  |  0.615   |
+-----------------------------------------+----+----------+

Accuracy

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|            AlbertForMaskedLM            | 1  |   pass   |
|       AlbertForQuestionAnswering        | 1  |   pass   |
|                CamemBert                | 1  |   pass   |
|          AllenaiLongformerBase          | 1  |   pass   |
|             BartForCausalLM             | 1  |   pass   |
|      BartForConditionalGeneration       | 1  |   pass   |
|             BertForMaskedLM             | 1  |   pass   |
|        BertForQuestionAnswering         | 1  |   pass   |
|                 BigBird                 | 1  |   pass   |
|       BlenderbotSmallForCausalLM        | 1  |   pass   |
| BlenderbotSmallForConditionalGeneration | 1  |   pass   |
|           DebertaForMaskedLM            | 1  |   pass   |
|           LayoutLMForMaskedLM           | 1  |   pass   |
|       DebertaForQuestionAnswering       | 1  |   pass   |
|          DistilBertForMaskedLM          | 1  |   pass   |
|     DistilBertForQuestionAnswering      | 1  |   pass   |
|               DistillGPT2               | 1  |   pass   |
|           ElectraForCausalLM            | 1  |   pass   |
|       ElectraForQuestionAnswering       | 1  |   pass   |
|      GPT2ForSequenceClassification      | 1  |   pass   |
|               GoogleFnet                | 1  |   pass   |
|    LayoutLMForSequenceClassification    | 1  |   pass   |
|     M2M100ForConditionalGeneration      | 1  |   pass   |
|            MBartForCausalLM             | 1  |   pass   |
|     PLBartForConditionalGeneration      | 1  |   pass   |
|      MBartForConditionalGeneration      | 1  |   pass   |
|       MT5ForConditionalGeneration       | 1  |   pass   |
|         MegatronBertForCausalLM         | 1  |   pass   |
|    MegatronBertForQuestionAnswering     | 1  |   pass   |
|          MobileBertForMaskedLM          | 1  |   pass   |
|     MobileBertForQuestionAnswering      | 1  |   pass   |
|             OPTForCausalLM              | 1  |   pass   |
|            PLBartForCausalLM            | 1  |   pass   |
|           PegasusForCausalLM            | 1  |   pass   |
|            XLNetLMHeadModel             | 1  |   pass   |
|     PegasusForConditionalGeneration     | 1  |   pass   |
|           RobertaForCausalLM            | 1  |   pass   |
|       RobertaForQuestionAnswering       | 1  |   pass   |
|         Speech2Text2ForCausalLM         | 1  |   pass   |
|       T5ForConditionalGeneration        | 1  |   pass   |
|                 T5Small                 | 1  |   pass   |
|            TrOCRForCausalLM             | 1  |   pass   |
|             XGLMForCausalLM             | 1  |   pass   |
|            YituTechConvBert             | 1  |   pass   |
+-----------------------------------------+----+----------+

Compilation latency (sec)

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|      BartForConditionalGeneration       | 1  | 47.1175  |
|       AlbertForQuestionAnswering        | 1  | 44.5295  |
|          AllenaiLongformerBase          | 1  | 37.0788  |
|    MegatronBertForQuestionAnswering     | 1  | 36.2235  |
|            AlbertForMaskedLM            | 1  | 35.9871  |
|     M2M100ForConditionalGeneration      | 1  | 31.9347  |
|       MT5ForConditionalGeneration       | 1  | 31.5822  |
|          MobileBertForMaskedLM          | 1  | 31.4211  |
|      MBartForConditionalGeneration      | 1  | 30.9842  |
|     MobileBertForQuestionAnswering      | 1  | 30.8973  |
|                 T5Small                 | 1  | 30.2116  |
|     PegasusForConditionalGeneration     | 1  | 30.0553  |
|       T5ForConditionalGeneration        | 1  | 29.3266  |
|             XGLMForCausalLM             | 1  | 29.1562  |
|         MegatronBertForCausalLM         | 1  | 27.7735  |
|             BartForCausalLM             | 1  | 25.5648  |
|           DebertaForMaskedLM            | 1  | 25.1827  |
|                 BigBird                 | 1  | 25.0161  |
|       DebertaForQuestionAnswering       | 1  | 24.3473  |
|            XLNetLMHeadModel             | 1  | 23.5854  |
|            YituTechConvBert             | 1  | 22.6812  |
|           PegasusForCausalLM            | 1  | 21.3936  |
| BlenderbotSmallForConditionalGeneration | 1  | 20.6178  |
|      GPT2ForSequenceClassification      | 1  | 20.3253  |
|     PLBartForConditionalGeneration      | 1  |  20.325  |
|            MBartForCausalLM             | 1  | 20.3167  |
|             OPTForCausalLM              | 1  | 20.0142  |
|                CamemBert                | 1  |  19.289  |
|            TrOCRForCausalLM             | 1  | 19.0708  |
|             BertForMaskedLM             | 1  | 18.9873  |
|           LayoutLMForMaskedLM           | 1  |  18.862  |
|    LayoutLMForSequenceClassification    | 1  | 18.8137  |
|               DistillGPT2               | 1  | 18.3318  |
|           RobertaForCausalLM            | 1  |  17.153  |
|           ElectraForCausalLM            | 1  | 16.6059  |
|       RobertaForQuestionAnswering       | 1  | 16.4143  |
|        BertForQuestionAnswering         | 1  | 15.9136  |
|       ElectraForQuestionAnswering       | 1  | 15.6335  |
|         Speech2Text2ForCausalLM         | 1  | 15.4609  |
|       BlenderbotSmallForCausalLM        | 1  | 15.1055  |
|            PLBartForCausalLM            | 1  | 14.5719  |
|          DistilBertForMaskedLM          | 1  | 14.5081  |
|               GoogleFnet                | 1  | 13.8878  |
|     DistilBertForQuestionAnswering      | 1  | 13.7853  |
+-----------------------------------------+----+----------+

Peak Memory Compression Ratio

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|               GoogleFnet                | 1  |  0.9833  |
|                 BigBird                 | 1  |  0.982   |
|       DebertaForQuestionAnswering       | 1  |  0.9742  |
|               DistillGPT2               | 1  |  0.9711  |
|             BartForCausalLM             | 1  |  0.9692  |
|           DebertaForMaskedLM            | 1  |  0.9677  |
|            PLBartForCausalLM            | 1  |  0.9642  |
|           PegasusForCausalLM            | 1  |  0.9594  |
|            TrOCRForCausalLM             | 1  |  0.9588  |
|     DistilBertForQuestionAnswering      | 1  |  0.9588  |
|            MBartForCausalLM             | 1  |  0.9578  |
|          DistilBertForMaskedLM          | 1  |  0.9566  |
|     M2M100ForConditionalGeneration      | 1  |  0.9529  |
|     PegasusForConditionalGeneration     | 1  |  0.9484  |
|             OPTForCausalLM              | 1  |  0.9472  |
|         Speech2Text2ForCausalLM         | 1  |  0.9429  |
|       BlenderbotSmallForCausalLM        | 1  |  0.9421  |
|          AllenaiLongformerBase          | 1  |  0.9393  |
|             XGLMForCausalLM             | 1  |  0.9383  |
|        BertForQuestionAnswering         | 1  |  0.9267  |
|       RobertaForQuestionAnswering       | 1  |  0.9263  |
|     PLBartForConditionalGeneration      | 1  |  0.9139  |
|           RobertaForCausalLM            | 1  |  0.9058  |
|             BertForMaskedLM             | 1  |  0.9053  |
|      BartForConditionalGeneration       | 1  |  0.9017  |
|      MBartForConditionalGeneration      | 1  |  0.8851  |
|       ElectraForQuestionAnswering       | 1  |  0.8503  |
|           ElectraForCausalLM            | 1  |  0.8476  |
| BlenderbotSmallForConditionalGeneration | 1  |  0.8268  |
|      GPT2ForSequenceClassification      | 1  |  0.8072  |
|            XLNetLMHeadModel             | 1  |  0.792   |
|                CamemBert                | 1  |  0.7851  |
|           LayoutLMForMaskedLM           | 1  |  0.7813  |
|            YituTechConvBert             | 1  |  0.7812  |
|       T5ForConditionalGeneration        | 1  |  0.7482  |
|                 T5Small                 | 1  |  0.742   |
|    LayoutLMForSequenceClassification    | 1  |  0.7225  |
|       AlbertForQuestionAnswering        | 1  |  0.6756  |
|            AlbertForMaskedLM            | 1  |  0.6756  |
|         MegatronBertForCausalLM         | 1  |  0.6464  |
|    MegatronBertForQuestionAnswering     | 1  |  0.641   |
|          MobileBertForMaskedLM          | 1  |  0.5617  |
|     MobileBertForQuestionAnswering      | 1  |  0.5454  |
|       MT5ForConditionalGeneration       | 1  |  0.4533  |
+-----------------------------------------+----+----------+

Absolute latency (ms)

+-----------------------------------------+----+------------+
|                  name                   | bs |  inductor  |
+-----------------------------------------+----+------------+
|            AlbertForMaskedLM            | 1  | 15979.3074 |
|       AlbertForQuestionAnswering        | 1  | 15761.2033 |
|      BartForConditionalGeneration       | 1  | 10401.2985 |
|             BartForCausalLM             | 1  | 4551.3694  |
|            XLNetLMHeadModel             | 1  | 3923.4382  |
|          AllenaiLongformerBase          | 1  |  3044.825  |
|      GPT2ForSequenceClassification      | 1  | 2708.5162  |
|       T5ForConditionalGeneration        | 1  | 2494.2517  |
|                 BigBird                 | 1  | 2485.3698  |
|                 T5Small                 | 1  | 2461.8846  |
|             XGLMForCausalLM             | 1  | 1297.3975  |
|           DebertaForMaskedLM            | 1  | 1184.1494  |
|           LayoutLMForMaskedLM           | 1  | 1175.5668  |
|                CamemBert                | 1  | 1169.6708  |
|            YituTechConvBert             | 1  | 1045.6575  |
|     M2M100ForConditionalGeneration      | 1  |  1002.25   |
|       DebertaForQuestionAnswering       | 1  |  972.4197  |
|    LayoutLMForSequenceClassification    | 1  |  939.5538  |
|     PegasusForConditionalGeneration     | 1  |  914.3748  |
|      MBartForConditionalGeneration      | 1  |  909.1633  |
|               DistillGPT2               | 1  |  866.172   |
|       MT5ForConditionalGeneration       | 1  |  837.5624  |
|               GoogleFnet                | 1  |  762.3751  |
|         MegatronBertForCausalLM         | 1  |  741.3603  |
|    MegatronBertForQuestionAnswering     | 1  |  659.098   |
|           PegasusForCausalLM            | 1  |  469.6912  |
|            MBartForCausalLM             | 1  |  466.6514  |
|            TrOCRForCausalLM             | 1  |  464.9123  |
|     PLBartForConditionalGeneration      | 1  |  355.0224  |
|           ElectraForCausalLM            | 1  |  353.6413  |
|             OPTForCausalLM              | 1  |  285.7109  |
| BlenderbotSmallForConditionalGeneration | 1  |  276.1258  |
|             BertForMaskedLM             | 1  |  270.0311  |
|           RobertaForCausalLM            | 1  |  269.9671  |
|            PLBartForCausalLM            | 1  |  216.2976  |
|       ElectraForQuestionAnswering       | 1  |  205.9566  |
|        BertForQuestionAnswering         | 1  |  203.536   |
|       RobertaForQuestionAnswering       | 1  |  199.4976  |
|          DistilBertForMaskedLM          | 1  |  173.1242  |
|       BlenderbotSmallForCausalLM        | 1  |  170.3158  |
|          MobileBertForMaskedLM          | 1  |  142.1039  |
|     DistilBertForQuestionAnswering      | 1  |  101.307   |
|     MobileBertForQuestionAnswering      | 1  |  58.3095   |
|         Speech2Text2ForCausalLM         | 1  |  33.4075   |
+-----------------------------------------+----+------------+

timm_models suite with float32 precision

Performance speedup

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          pnasnet5large          | 1  |  1.5261  |
|           regnety_002           | 1  |  1.4805  |
|           fbnetc_100            | 1  |  1.4311  |
|          spnasnet_100           | 1  |  1.4109  |
|           mnasnet_100           | 1  |  1.4031  |
|         mobilenetv2_100         | 1  |  1.3833  |
|            fbnetv3_b            | 1  |  1.3708  |
|          inception_v3           | 1  |  1.3517  |
|        ese_vovnet19b_dw         | 1  |  1.3445  |
|       gluon_inception_v3        | 1  |  1.3418  |
|        adv_inception_v3         | 1  |  1.333   |
|          botnet26t_256          | 1  |  1.3111  |
|      mobilenetv3_large_100      | 1  |  1.2825  |
|            lcnet_050            | 1  |  1.2651  |
|          gmixer_24_224          | 1  |  1.258   |
|            gernet_l             | 1  |  1.193   |
|          cspdarknet53           | 1  |  1.1808  |
|            nfnet_l0             | 1  |  1.1806  |
|           dm_nfnet_f0           | 1  |  1.166   |
|           volo_d1_224           | 1  |  1.1242  |
|            repvgg_a2            | 1  |  1.121   |
|          ghostnet_100           | 1  |  1.0803  |
|      beit_base_patch16_224      | 1  |  1.0729  |
|             dpn107              | 1  |  1.067   |
|        tnt_s_patch16_224        | 1  |  1.0514  |
|         crossvit_9_240          | 1  |  1.0269  |
|        gluon_xception65         | 1  |  1.0139  |
|           tf_mixnet_l           | 1  |  0.9924  |
|           convit_base           | 1  |  0.9908  |
|            mixnet_l             | 1  |  0.9894  |
|           rexnet_100            | 1  |  0.988   |
|          resmlp_12_224          | 1  |  0.9627  |
|           selecsls42b           | 1  |  0.9512  |
|         coat_lite_mini          | 1  |  0.9506  |
|        sebotnet33ts_256         | 1  |  0.9414  |
|        eca_halonext26ts         | 1  |  0.9367  |
|      vit_base_patch16_224       | 1  |  0.9362  |
|       eca_botnext26ts_256       | 1  |  0.9286  |
|        res2net50_14w_8s         | 1  |  0.922   |
|          jx_nest_base           | 1  |  0.9137  |
|            pit_b_224            | 1  |  0.9052  |
|        res2net101_26w_4s        | 1  |  0.902   |
|          mixer_b16_224          | 1  |  0.8879  |
|           mobilevit_s           | 1  |  0.8722  |
|      xcit_large_24_p8_224       | 1  |  0.8715  |
|           res2next50            | 1  |  0.8654  |
|            tinynet_a            | 1  |  0.8091  |
|             dla102              | 1  |  0.8081  |
|  swin_base_patch4_window7_224   | 1  |  0.7887  |
| deit_base_distilled_patch16_224 | 1  |  0.7847  |
|       tf_efficientnet_b0        | 1  |  0.7563  |
|          cait_m36_384           | 1  |  0.7369  |
|          gmlp_s16_224           | 1  |  0.7355  |
|         poolformer_m36          | 1  |  0.7316  |
|          convnext_base          | 1  |  0.6633  |
|        twins_pcpvt_base         | 1  |  0.6567  |
|     swsl_resnext101_32x16d      | 0  |   0.0    |
|        convmixer_768_32         | 0  |   0.0    |
|         visformer_small         | 0  |   0.0    |
|            hrnet_w18            | 0  |   0.0    |
|           resnest101e           | 0  |   0.0    |
+---------------------------------+----+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 1  |     pass      |
|             dpn107              | 1  |     pass      |
|           res2next50            | 1  |     pass      |
|       eca_botnext26ts_256       | 1  |     pass      |
|          botnet26t_256          | 1  |     pass      |
|         coat_lite_mini          | 1  |     pass      |
|           convit_base           | 1  |     pass      |
|          convnext_base          | 1  |     pass      |
|         crossvit_9_240          | 1  |     pass      |
|          cspdarknet53           | 1  |     pass      |
| deit_base_distilled_patch16_224 | 1  |     pass      |
|             dla102              | 1  |     pass      |
|           dm_nfnet_f0           | 1  |     pass      |
|          jx_nest_base           | 1  |     pass      |
|      beit_base_patch16_224      | 1  |     pass      |
|      xcit_large_24_p8_224       | 1  |     pass      |
|        eca_halonext26ts         | 1  |     pass      |
|       gluon_inception_v3        | 1  |     pass      |
|          inception_v3           | 1  |     pass      |
|          gmlp_s16_224           | 1  |     pass      |
|          gmixer_24_224          | 1  |     pass      |
|        ese_vovnet19b_dw         | 1  |     pass      |
|        gluon_xception65         | 1  |     pass      |
|            gernet_l             | 1  |     pass      |
|            fbnetv3_b            | 1  |     pass      |
|           fbnetc_100            | 1  |     pass      |
|            mixnet_l             | 1  |     pass      |
|            lcnet_050            | 1  |     pass      |
|           volo_d1_224           | 1  |     pass      |
|        res2net50_14w_8s         | 1  |     pass      |
|           mnasnet_100           | 1  |     pass      |
|         mobilenetv2_100         | 1  |     pass      |
|      mobilenetv3_large_100      | 1  |     pass      |
|           mobilevit_s           | 1  |     pass      |
|            nfnet_l0             | 1  |     pass      |
|            pit_b_224            | 1  |     pass      |
|          pnasnet5large          | 1  |     pass      |
|         poolformer_m36          | 1  |     pass      |
|           regnety_002           | 1  |     pass      |
|      vit_base_patch16_224       | 1  |     pass      |
|        res2net101_26w_4s        | 1  |     pass      |
|            repvgg_a2            | 1  |     pass      |
|          resmlp_12_224          | 1  |     pass      |
|       tf_efficientnet_b0        | 1  |     pass      |
|        twins_pcpvt_base         | 1  |     pass      |
|        tnt_s_patch16_224        | 1  |     pass      |
|          mixer_b16_224          | 1  |     pass      |
|           tf_mixnet_l           | 1  |     pass      |
|            tinynet_a            | 1  |     pass      |
|  swin_base_patch4_window7_224   | 1  |     pass      |
|          spnasnet_100           | 1  |     pass      |
|           selecsls42b           | 1  |     pass      |
|        sebotnet33ts_256         | 1  |     pass      |
|           rexnet_100            | 1  |     pass      |
|     swsl_resnext101_32x16d      | 1  |  fail_to_run  |
|        convmixer_768_32         | 1  |  fail_to_run  |
|            hrnet_w18            | 1  |  fail_to_run  |
|         visformer_small         | 1  |  fail_to_run  |
|           resnest101e           | 1  |  fail_to_run  |
|          ghostnet_100           | 1  | fail_accuracy |
|          cait_m36_384           | 1  | fail_accuracy |
+---------------------------------+----+---------------+

Compilation latency (sec)

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          pnasnet5large          | 1  | 65.8488  |
|           tf_mixnet_l           | 1  | 52.3016  |
|        twins_pcpvt_base         | 1  | 51.7848  |
|           mobilevit_s           | 1  | 46.7904  |
|            mixnet_l             | 1  | 45.6539  |
|           rexnet_100            | 1  | 45.2664  |
|          cait_m36_384           | 1  | 45.1329  |
|  swin_base_patch4_window7_224   | 1  | 44.4902  |
|         poolformer_m36          | 1  | 43.0483  |
|          jx_nest_base           | 1  | 42.9664  |
|        res2net50_14w_8s         | 1  | 40.3109  |
|             dpn107              | 1  | 40.1958  |
|      xcit_large_24_p8_224       | 1  | 40.0007  |
|       tf_efficientnet_b0        | 1  | 39.7331  |
|         coat_lite_mini          | 1  | 39.6945  |
|           dm_nfnet_f0           | 1  |  39.247  |
|          ghostnet_100           | 1  | 38.4214  |
|            tinynet_a            | 1  | 38.1868  |
|          spnasnet_100           | 1  | 38.1579  |
|            fbnetv3_b            | 1  | 37.8081  |
|        sebotnet33ts_256         | 1  | 36.1111  |
|             dla102              | 1  | 34.4196  |
|            nfnet_l0             | 1  |  34.343  |
|        eca_halonext26ts         | 1  | 34.0299  |
|      mobilenetv3_large_100      | 1  |  32.944  |
|        res2net101_26w_4s        | 1  | 32.7057  |
|         crossvit_9_240          | 1  | 31.8976  |
|           volo_d1_224           | 1  | 30.7166  |
|       eca_botnext26ts_256       | 1  | 29.5281  |
|        tnt_s_patch16_224        | 1  | 29.2087  |
|            pit_b_224            | 1  |  28.442  |
|          cspdarknet53           | 1  | 28.4177  |
|           res2next50            | 1  |  27.934  |
|          botnet26t_256          | 1  | 27.4151  |
|        adv_inception_v3         | 1  | 27.2005  |
|           regnety_002           | 1  | 27.1994  |
|          inception_v3           | 1  | 27.0732  |
|       gluon_inception_v3        | 1  | 27.0532  |
|           fbnetc_100            | 1  | 26.9271  |
|           mnasnet_100           | 1  | 24.7638  |
|           convit_base           | 1  | 24.6206  |
|         mobilenetv2_100         | 1  | 24.6136  |
|          convnext_base          | 1  | 24.1569  |
|          gmlp_s16_224           | 1  | 23.7206  |
|           selecsls42b           | 1  | 23.4718  |
|            gernet_l             | 1  | 20.3854  |
|          gmixer_24_224          | 1  |  19.723  |
|        ese_vovnet19b_dw         | 1  | 19.0525  |
| deit_base_distilled_patch16_224 | 1  | 18.8981  |
|        gluon_xception65         | 1  | 18.4347  |
|          mixer_b16_224          | 1  | 17.9788  |
|            repvgg_a2            | 1  | 17.8001  |
|            lcnet_050            | 1  | 17.4862  |
|      beit_base_patch16_224      | 1  | 17.1874  |
|      vit_base_patch16_224       | 1  | 17.1663  |
|          resmlp_12_224          | 1  | 13.6514  |
|        convmixer_768_32         | 0  |   nan    |
|            hrnet_w18            | 0  |   nan    |
|           resnest101e           | 0  |   nan    |
|     swsl_resnext101_32x16d      | 0  |   nan    |
|         visformer_small         | 0  |   nan    |
+---------------------------------+----+----------+

Peak Memory Compression Ratio

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|             dpn107              | 1  |  0.9625  |
|        ese_vovnet19b_dw         | 1  |  0.9465  |
|            pit_b_224            | 1  |  0.9402  |
|          resmlp_12_224          | 1  |  0.9361  |
| deit_base_distilled_patch16_224 | 1  |  0.9315  |
|      vit_base_patch16_224       | 1  |  0.9313  |
|            gernet_l             | 1  |  0.9282  |
|            repvgg_a2            | 1  |  0.9267  |
|          cspdarknet53           | 1  |  0.9266  |
|            lcnet_050            | 1  |  0.924   |
|          mixer_b16_224          | 1  |  0.9236  |
|           convit_base           | 1  |  0.9232  |
|      beit_base_patch16_224      | 1  |  0.9217  |
|          botnet26t_256          | 1  |  0.9201  |
|         coat_lite_mini          | 1  |   0.92   |
|       eca_botnext26ts_256       | 1  |  0.9086  |
|            nfnet_l0             | 1  |  0.9076  |
|           rexnet_100            | 1  |  0.9064  |
|           mnasnet_100           | 1  |  0.9061  |
|          convnext_base          | 1  |  0.9027  |
|        eca_halonext26ts         | 1  |  0.8989  |
|         mobilenetv2_100         | 1  |  0.8979  |
|          ghostnet_100           | 1  |  0.8957  |
|          cait_m36_384           | 1  |  0.8954  |
|           regnety_002           | 1  |  0.8937  |
|        sebotnet33ts_256         | 1  |  0.8895  |
|           fbnetc_100            | 1  |  0.884   |
|          spnasnet_100           | 1  |  0.8834  |
|            mixnet_l             | 1  |  0.8826  |
|           tf_mixnet_l           | 1  |  0.8784  |
|      mobilenetv3_large_100      | 1  |  0.8757  |
|      xcit_large_24_p8_224       | 1  |  0.8743  |
|         crossvit_9_240          | 1  |  0.8711  |
|           mobilevit_s           | 1  |  0.8643  |
|       tf_efficientnet_b0        | 1  |  0.8635  |
|           dm_nfnet_f0           | 1  |  0.8582  |
|          gmixer_24_224          | 1  |  0.8422  |
|            tinynet_a            | 1  |  0.842   |
|       gluon_inception_v3        | 1  |  0.8373  |
|        adv_inception_v3         | 1  |  0.8373  |
|          inception_v3           | 1  |  0.837   |
|  swin_base_patch4_window7_224   | 1  |  0.8339  |
|           res2next50            | 1  |  0.8255  |
|           selecsls42b           | 1  |  0.8204  |
|        res2net50_14w_8s         | 1  |  0.8045  |
|        tnt_s_patch16_224        | 1  |  0.803   |
|             dla102              | 1  |  0.8019  |
|        gluon_xception65         | 1  |  0.7932  |
|           volo_d1_224           | 1  |  0.7754  |
|            fbnetv3_b            | 1  |  0.7697  |
|          jx_nest_base           | 1  |  0.7662  |
|        res2net101_26w_4s        | 1  |  0.7615  |
|          pnasnet5large          | 1  |  0.7516  |
|          gmlp_s16_224           | 1  |  0.7425  |
|         poolformer_m36          | 1  |  0.6931  |
|        twins_pcpvt_base         | 1  |  0.6234  |
|        convmixer_768_32         | 0  |   nan    |
|            hrnet_w18            | 0  |   nan    |
|           resnest101e           | 0  |   nan    |
|     swsl_resnext101_32x16d      | 0  |   nan    |
|         visformer_small         | 0  |   nan    |
+---------------------------------+----+----------+

Absolute latency (ms)

+---------------------------------+----+-----------+
|              name               | bs | inductor  |
+---------------------------------+----+-----------+
|          cait_m36_384           | 1  | 5000.4837 |
|      xcit_large_24_p8_224       | 1  | 2106.4201 |
|          pnasnet5large          | 1  | 519.3834  |
|          jx_nest_base           | 1  | 493.5201  |
|           dm_nfnet_f0           | 1  | 428.9768  |
|           convit_base           | 1  | 396.2937  |
|  swin_base_patch4_window7_224   | 1  | 373.1167  |
|          convnext_base          | 1  | 363.4278  |
|            pit_b_224            | 1  | 340.8231  |
|      vit_base_patch16_224       | 1  | 325.3479  |
|      beit_base_patch16_224      | 1  | 317.7918  |
| deit_base_distilled_patch16_224 | 1  | 312.6354  |
|         poolformer_m36          | 1  | 298.2161  |
|             dpn107              | 1  | 287.9234  |
|             dla102              | 1  | 270.2644  |
|            nfnet_l0             | 1  | 258.4369  |
|          mixer_b16_224          | 1  | 247.0671  |
|        gluon_xception65         | 1  | 241.2717  |
|        twins_pcpvt_base         | 1  | 224.3239  |
|           volo_d1_224           | 1  | 201.5605  |
|          gmlp_s16_224           | 1  |  200.212  |
|        tnt_s_patch16_224        | 1  | 185.3776  |
|        sebotnet33ts_256         | 1  | 184.1834  |
|          cspdarknet53           | 1  | 181.7837  |
|        res2net101_26w_4s        | 1  | 178.8889  |
|        res2net50_14w_8s         | 1  | 165.3656  |
|           res2next50            | 1  | 161.0902  |
|            repvgg_a2            | 1  | 161.0307  |
|       gluon_inception_v3        | 1  | 160.1412  |
|          inception_v3           | 1  | 159.7522  |
|        adv_inception_v3         | 1  | 159.5323  |
|           mobilevit_s           | 1  | 149.5235  |
|           selecsls42b           | 1  | 135.0448  |
|        eca_halonext26ts         | 1  | 122.3759  |
|       eca_botnext26ts_256       | 1  | 119.8172  |
|          gmixer_24_224          | 1  | 109.4654  |
|         coat_lite_mini          | 1  | 103.9586  |
|           tf_mixnet_l           | 1  |  97.0654  |
|            gernet_l             | 1  |  93.7525  |
|            mixnet_l             | 1  |  90.0455  |
|          botnet26t_256          | 1  |  89.0058  |
|         crossvit_9_240          | 1  |  71.6891  |
|       tf_efficientnet_b0        | 1  |  70.8574  |
|          resmlp_12_224          | 1  |  67.9699  |
|           rexnet_100            | 1  |  54.5953  |
|            tinynet_a            | 1  |  52.4606  |
|            fbnetv3_b            | 1  |  46.5291  |
|        ese_vovnet19b_dw         | 1  |  44.5556  |
|          ghostnet_100           | 1  |  34.2241  |
|           fbnetc_100            | 1  |  28.5216  |
|         mobilenetv2_100         | 1  |  27.3889  |
|          spnasnet_100           | 1  |  26.9365  |
|           mnasnet_100           | 1  |  24.4007  |
|      mobilenetv3_large_100      | 1  |  23.7384  |
|           regnety_002           | 1  |  15.3487  |
|            lcnet_050            | 1  |  8.0231   |
|        convmixer_768_32         | 0  |    nan    |
|            hrnet_w18            | 0  |    nan    |
|           resnest101e           | 0  |    nan    |
|     swsl_resnext101_32x16d      | 0  |    nan    |
|         visformer_small         | 0  |    nan    |
+---------------------------------+----+-----------+

ESI-SYD commented Nov 18, 2022

Performance Dashboard for float32 precision -- Single-Socket Multi-threads (2022-11-16 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward pass. For accuracy, we check the numerical correctness of the forward pass outputs by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.
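
The speedup metric described in the summary (latency normalized against native PyTorch) can be sketched as follows. This is a minimal illustration and not the benchmark harness that produced these numbers: `time_one_iteration` and `speedup` are hypothetical helper names, and the callable passed in stands in for one forward pass of either the eager or the compiled model.

```python
import time
from typing import Callable


def time_one_iteration(fn: Callable[[], object]) -> float:
    """Return the wall-clock latency of a single call to fn, in milliseconds."""
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000.0


def speedup(eager_ms: float, compiled_ms: float) -> float:
    """Speedup normalized against the eager baseline.

    A value above 1.0 means the compiled model is faster than native PyTorch;
    a value below 1.0 means it is slower.
    """
    if compiled_ms <= 0:
        raise ValueError("compiled latency must be positive")
    return eager_ms / compiled_ms


# Hypothetical latencies, not taken from the tables in this thread.
print(speedup(eager_ms=200.0, compiled_ms=100.0))
```

Under this definition, multiplying a model's reported speedup by its inductor latency gives an estimate of its native PyTorch latency.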

SW information

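+-------------+----------------+--------------------+
|     SW      | Nightly commit | Master/Main commit |
+-------------+----------------+--------------------+
|   Pytorch   |    0662e90     |      e2f0648       |
| Torchbench  |       /        |      022dfe3       |
| torchaudio  |    4b10b6a     |      74f9a89       |
|  torchtext  |    71e4561     |      c047efe       |
| torchvision |    797e1ac     |      ffd5a56       |
+-------------+----------------+--------------------+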

HW information

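+------------------+-----------------------------------------------+
|       Item       |                     Value                     |
+------------------+-----------------------------------------------+
|   Manufacturer   |                  Amazon EC2                   |
|   Product Name   |                 c6i.16xlarge                  |
|    CPU Model     | Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz |
| Installed Memory |   128GB (1x128GB DDR4 3200 MT/s [Unknown])    |
|        OS        |              Ubuntu 20.04.5 LTS               |
|      Kernel      |                5.15.0-1022-aws                |
|    Microcode     |                   0xd000331                   |
|       GCC        |   gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0   |
|      GLIBC       |    ldd (Ubuntu GLIBC 2.31-0ubuntu9.9) 2.31    |
|     Binutils     |     GNU ld (GNU Binutils for Ubuntu) 2.34     |
|      Python      |                 Python 3.8.13                 |
|     OpenSSL      |           OpenSSL 1.1.1s 1 Nov 2022           |
+------------------+-----------------------------------------------+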

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 93%, 51/55 | 100%, 44/44 | 95%, 58/61  |
+----------+------------+-------------+-------------+
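
For readers who want to reproduce the passrate cells from the per-model accuracy tables, here is a minimal sketch. The helper `passrate` is a hypothetical name, and counting `pass_due_to_skip` as a pass is an assumption; it is consistent with the 51/55 torchbench figure, since the torchbench accuracy table in this comment lists 48 `pass`, 3 `pass_due_to_skip`, 3 `fail_to_run` and one `0.0000` entry.

```python
from collections import Counter

# Statuses counted as passing. Including "pass_due_to_skip" is an assumption
# that happens to match the 51/55 torchbench figure reported above.
PASSING = {"pass", "pass_due_to_skip"}


def passrate(statuses):
    """Return (percent, passed, total) for a list of accuracy statuses."""
    counts = Counter(statuses)
    passed = sum(counts[s] for s in PASSING)
    total = len(statuses)
    percent = round(100 * passed / total) if total else 0
    return percent, passed, total


# Status counts read off the torchbench accuracy table in this comment.
torchbench = (
    ["pass"] * 48
    + ["pass_due_to_skip"] * 3
    + ["fail_to_run"] * 3
    + ["0.0000"]
)
percent, passed, total = passrate(torchbench)
print(f"{percent}%, {passed}/{total}")  # prints "93%, 51/55"
```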

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.09x    |    1.06x    |    1.09x    |
+----------+------------+-------------+-------------+
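
A geometric mean over per-model speedups can be sketched as below. This is an illustration only: `geometric_mean_speedup` is a hypothetical helper, and dropping the `0.0` placeholder rows (models that failed to run) before averaging is an assumption about how the summary figure is aggregated.

```python
import math


def geometric_mean_speedup(speedups):
    """Geometric mean of per-model speedups, computed in log space."""
    # Drop non-positive placeholders such as the 0.0 rows for failed models
    # (assumption: these rows are excluded from the aggregate).
    values = [s for s in speedups if s > 0]
    if not values:
        raise ValueError("no positive speedups to aggregate")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical values: a 2x speedup and a 2x slowdown cancel out,
# whereas an arithmetic mean of the same values would report 1.25.
print(round(geometric_mean_speedup([2.0, 0.5, 0.0]), 4))
```

The geometric mean is the usual choice for ratios because it treats a 2x speedup and a 0.5x slowdown symmetrically.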

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   13.75    |    16.98    |    21.17    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.94x    |    0.95x    |    0.99x    |
+----------+------------+-------------+-------------+

torchbench suite with float32 precision

Performance speedup

+-----------------------------------+------+----------+
|               name                |  bs  | inductor |
+-----------------------------------+------+----------+
|         soft_actor_critic         | 256  |  1.6252  |
|           squeezenet1_1           |  16  |  1.5821  |
|           timm_resnest            |  32  |  1.3733  |
|        mobilenet_v3_large         |  32  |  1.3718  |
|        shufflenet_v2_x1_0         |  64  |  1.3643  |
|            mnasnet1_0             |  32  |  1.3223  |
|           mobilenet_v2            |  16  |  1.2357  |
|             resnet18              |  8   |  1.2326  |
|            densenet121            |  64  |  1.2309  |
|           pytorch_unet            |  1   |  1.186   |
|             resnet50              |  32  |  1.1807  |
|          resnext50_32x4d          |  8   |  1.1735  |
|             resnet152             |  32  |  1.1562  |
|              alexnet              | 128  |  1.1477  |
|            timm_vovnet            |  32  |  1.1097  |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  1.1049  |
|        Background_Matting         |  1   |  1.0849  |
|          vision_maskrcnn          |  1   |  1.0788  |
|               vgg16               |  4   |  1.0784  |
|            Super_SloMo            |  6   |  1.0687  |
|              yolov3               |  8   |  1.0684  |
|               dlrm                | 2048 |  1.0652  |
|            hf_T5_large            |  1   |  1.0443  |
|            timm_regnet            |  32  |  1.0366  |
|           BERT_pytorch            |  2   |  1.0277  |
|          pytorch_stargan          |  16  |  1.026   |
|               dcgan               | 256  |  1.0183  |
|            tts_angular            |  64  |  1.0072  |
|              demucs               |  1   |  1.0036  |
|          LearningToPaint          |  96  |  0.9895  |
|            hf_BigBird             |  1   |  0.9858  |
|            hf_Reformer            |  1   |  0.9762  |
|           hf_Longformer           |  1   |  0.9692  |
|                drq                |  1   |  0.9596  |
|               hf_T5               |  1   |  0.9589  |
|           hf_GPT2_large           |  1   |  0.9334  |
|            timm_nfnet             | 128  |  0.9175  |
|        speech_transformer         |  1   |  0.9138  |
|              hf_GPT2              |  1   |  0.9076  |
|   timm_vision_transformer_large   |  8   |  0.9039  |
|       functorch_dp_cifar10        |  64  |  0.8978  |
|             hf_Albert             |  1   |  0.884   |
|           hf_DistilBert           |  1   |  0.8839  |
|           fastNLP_Bert            |  1   |  0.8578  |
|              hf_Bart              |  1   |  0.8356  |
|      nvidia_deeprecommender       | 256  |  0.8324  |
|            hf_T5_base             |  1   |  0.8256  |
|         timm_efficientnet         |  64  |  0.7539  |
|      timm_vision_transformer      |  8   |  0.7304  |
| attention_is_all_you_need_pytorch |  32  |  0.7104  |
|              hf_Bert              |  1   |  0.7102  |
|           lennard_jones           | 1000 |  0.3472  |
|     detectron2_fcos_r_50_fpn      |  0   |   0.0    |
|    mobilenet_v2_quantized_qat     |  0   |   0.0    |
|      resnet50_quantized_qat       |  0   |   0.0    |
+-----------------------------------+------+----------+

Accuracy

+-----------------------------------+-----+------------------+
|               name                | bs  |     inductor     |
+-----------------------------------+-----+------------------+
|            hf_T5_large            |  2  | pass_due_to_skip |
|           hf_GPT2_large           |  2  | pass_due_to_skip |
|   timm_vision_transformer_large   |  2  | pass_due_to_skip |
|           fastNLP_Bert            |  2  |       pass       |
|             hf_Albert             |  2  |       pass       |
|          LearningToPaint          |  2  |       pass       |
|            Super_SloMo            |  2  |       pass       |
|              alexnet              |  2  |       pass       |
| attention_is_all_you_need_pytorch |  2  |       pass       |
|               dcgan               |  2  |       pass       |
|              demucs               |  1  |       pass       |
|            densenet121            |  2  |       pass       |
|               dlrm                |  2  |       pass       |
|                drq                |  1  |       pass       |
|       functorch_dp_cifar10        |  2  |       pass       |
|              hf_Bart              |  2  |       pass       |
|              hf_Bert              |  2  |       pass       |
|            hf_BigBird             |  2  |       pass       |
|           hf_DistilBert           |  2  |       pass       |
|              hf_GPT2              |  2  |       pass       |
|           hf_Longformer           |  2  |       pass       |
|            hf_Reformer            |  2  |       pass       |
|               hf_T5               |  2  |       pass       |
|            hf_T5_base             |  2  |       pass       |
|              yolov3               |  2  |       pass       |
|           lennard_jones           |  2  |       pass       |
|            mnasnet1_0             |  2  |       pass       |
|           mobilenet_v2            |  2  |       pass       |
|          resnext50_32x4d          |  2  |       pass       |
|           BERT_pytorch            |  2  |       pass       |
|        shufflenet_v2_x1_0         |  2  |       pass       |
|        mobilenet_v3_large         |  2  |       pass       |
|      nvidia_deeprecommender       |  2  |       pass       |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |
|          pytorch_stargan          | 16  |       pass       |
|           pytorch_unet            |  2  |       pass       |
|             resnet152             |  2  |       pass       |
|             resnet18              |  2  |       pass       |
|               vgg16               |  2  |       pass       |
|             resnet50              |  2  |       pass       |
|         soft_actor_critic         | 256 |       pass       |
|        Background_Matting         |  1  |       pass       |
|        speech_transformer         |  2  |       pass       |
|           squeezenet1_1           |  2  |       pass       |
|         timm_efficientnet         |  2  |       pass       |
|            timm_nfnet             |  2  |       pass       |
|            timm_regnet            |  2  |       pass       |
|           timm_resnest            |  2  |       pass       |
|      timm_vision_transformer      |  2  |       pass       |
|            timm_vovnet            |  2  |       pass       |
|            tts_angular            |  2  |       pass       |
|    mobilenet_v2_quantized_qat     |  2  |   fail_to_run    |
|      resnet50_quantized_qat       |  2  |   fail_to_run    |
|     detectron2_fcos_r_50_fpn      |  2  |   fail_to_run    |
|          vision_maskrcnn          |  0  |      0.0000      |
+-----------------------------------+-----+------------------+

Compilation latency (sec)

+-----------------------------------+------+----------+
|               name                |  bs  | inductor |
+-----------------------------------+------+----------+
|           BERT_pytorch            |  2   | 45.3345  |
|          vision_maskrcnn          |  1   | 37.3994  |
|            hf_T5_large            |  1   | 35.8439  |
|           hf_GPT2_large           |  1   | 23.8375  |
|   timm_vision_transformer_large   |  8   | 23.6431  |
|            timm_nfnet             | 128  | 22.9685  |
|            hf_T5_base             |  1   | 22.7697  |
|            hf_BigBird             |  1   | 22.2036  |
|           hf_Longformer           |  1   | 22.1697  |
|            densenet121            |  64  | 21.2738  |
|              yolov3               |  8   | 20.0725  |
|          pytorch_stargan          |  16  | 18.7711  |
|        speech_transformer         |  1   |  17.199  |
|            Super_SloMo            |  6   | 16.1839  |
|             resnet152             |  32  | 15.1095  |
|              hf_Bart              |  1   | 14.1614  |
|            timm_regnet            |  32  | 14.1071  |
|            hf_Reformer            |  1   | 13.6092  |
|         timm_efficientnet         |  64  | 13.5555  |
|           fastNLP_Bert            |  1   | 13.4595  |
| attention_is_all_you_need_pytorch |  32  | 13.2475  |
|               hf_T5               |  1   |  13.152  |
|              hf_Bert              |  1   | 12.6097  |
|              hf_GPT2              |  1   | 11.9871  |
|      timm_vision_transformer      |  8   | 11.7748  |
|             hf_Albert             |  1   | 11.6249  |
|            timm_vovnet            |  32  | 11.2489  |
|        Background_Matting         |  1   | 11.1389  |
|        shufflenet_v2_x1_0         |  64  | 10.6221  |
|             resnet50              |  32  | 10.1995  |
|           mobilenet_v2            |  16  | 10.1357  |
|           hf_DistilBert           |  1   | 10.1328  |
|        mobilenet_v3_large         |  32  | 10.0824  |
|          resnext50_32x4d          |  8   |  9.8654  |
|       functorch_dp_cifar10        |  64  |  9.5765  |
|           timm_resnest            |  32  |  9.5476  |
|           pytorch_unet            |  1   |  9.4959  |
|            mnasnet1_0             |  32  |  9.4026  |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  9.2776  |
|          LearningToPaint          |  96  |  8.8359  |
|             resnet18              |  8   |  8.5035  |
|           squeezenet1_1           |  16  |  8.4414  |
|      nvidia_deeprecommender       | 256  |  8.3935  |
|               vgg16               |  4   |  8.3262  |
|               dlrm                | 2048 |  8.2111  |
|              alexnet              | 128  |  8.2034  |
|            tts_angular            |  64  |  8.1786  |
|                drq                |  1   |  7.9311  |
|               dcgan               | 256  |  7.8473  |
|         soft_actor_critic         | 256  |  7.7683  |
|           lennard_jones           | 1000 |  7.7577  |
|              demucs               |  1   |  1.4649  |
|     detectron2_fcos_r_50_fpn      |  0   |   nan    |
|    mobilenet_v2_quantized_qat     |  0   |   nan    |
|      resnet50_quantized_qat       |  0   |   nan    |
+-----------------------------------+------+----------+

Peak Memory Compression Ratio

+-----------------------------------+------+----------+
|               name                |  bs  | inductor |
+-----------------------------------+------+----------+
|              demucs               |  1   |  0.9986  |
|               dlrm                | 2048 |  0.9976  |
|        Background_Matting         |  1   |  0.9949  |
|               vgg16               |  4   |  0.9928  |
|            Super_SloMo            |  6   |  0.9917  |
|          LearningToPaint          |  96  |  0.9907  |
|            densenet121            |  64  |  0.9902  |
|             resnet152             |  32  |   0.99   |
|            hf_BigBird             |  1   |  0.9898  |
|            mnasnet1_0             |  32  |  0.9897  |
|            timm_nfnet             | 128  |  0.9886  |
|             resnet50              |  32  |  0.9885  |
|           pytorch_unet            |  1   |  0.9877  |
|        mobilenet_v3_large         |  32  |  0.9873  |
|                drq                |  1   |  0.9863  |
|              yolov3               |  8   |  0.9854  |
|           mobilenet_v2            |  16  |  0.9851  |
|           lennard_jones           | 1000 |  0.984   |
|        shufflenet_v2_x1_0         |  64  |  0.9827  |
|            timm_vovnet            |  32  |  0.9817  |
|         soft_actor_critic         | 256  |  0.9806  |
|            tts_angular            |  64  |  0.9791  |
|          vision_maskrcnn          |  1   |  0.9777  |
|          resnext50_32x4d          |  8   |  0.9776  |
|           squeezenet1_1           |  16  |  0.9776  |
|        speech_transformer         |  1   |  0.9731  |
|           hf_GPT2_large           |  1   |  0.9702  |
|           hf_DistilBert           |  1   |  0.9689  |
|         timm_efficientnet         |  64  |  0.9623  |
|             hf_Albert             |  1   |  0.9621  |
|   timm_vision_transformer_large   |  8   |  0.9611  |
|          pytorch_stargan          |  16  |  0.959   |
|              hf_Bart              |  1   |  0.9583  |
|             resnet18              |  8   |  0.9572  |
|            timm_regnet            |  32  |  0.956   |
|              hf_GPT2              |  1   |  0.9545  |
| attention_is_all_you_need_pytorch |  32  |  0.9534  |
|           BERT_pytorch            |  2   |  0.9524  |
|            hf_T5_base             |  1   |  0.939   |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  0.9369  |
|            hf_Reformer            |  1   |  0.933   |
|              alexnet              | 128  |  0.9299  |
|      timm_vision_transformer      |  8   |  0.9081  |
|           fastNLP_Bert            |  1   |  0.9071  |
|       functorch_dp_cifar10        |  64  |  0.8914  |
|           timm_resnest            |  32  |  0.8874  |
|               dcgan               | 256  |  0.8806  |
|           hf_Longformer           |  1   |  0.8593  |
|              hf_Bert              |  1   |  0.7947  |
|               hf_T5               |  1   |  0.765   |
|            hf_T5_large            |  1   |  0.6635  |
|      nvidia_deeprecommender       | 256  |  0.5843  |
|     detectron2_fcos_r_50_fpn      |  0   |   nan    |
|    mobilenet_v2_quantized_qat     |  0   |   nan    |
|      resnet50_quantized_qat       |  0   |   nan    |
+-----------------------------------+------+----------+

Absolute latency (ms)

+-----------------------------------+------+-----------+
|               name                |  bs  | inductor  |
+-----------------------------------+------+-----------+
|   timm_vision_transformer_large   |  8   | 1422.3969 |
|            hf_T5_base             |  1   | 1206.0799 |
|            timm_nfnet             | 128  | 1060.8077 |
|           hf_GPT2_large           |  1   | 877.9491  |
|            Super_SloMo            |  6   | 638.0393  |
|            hf_T5_large            |  1   | 448.5525  |
|          vision_maskrcnn          |  1   | 383.4892  |
|            timm_regnet            |  32  | 382.0405  |
|             resnet152             |  32  | 309.6735  |
|            densenet121            |  64  | 288.5369  |
|        Background_Matting         |  1   | 265.7255  |
|          pytorch_stargan          |  16  | 246.0795  |
|              yolov3               |  8   | 233.6774  |
|           pytorch_unet            |  1   | 223.4771  |
|            hf_BigBird             |  1   | 199.5483  |
|              demucs               |  1   | 192.0452  |
|         timm_efficientnet         |  64  | 184.3108  |
|            timm_vovnet            |  32  | 165.5849  |
|             resnet50              |  32  | 133.4325  |
|           hf_Longformer           |  1   |  89.3846  |
|           timm_resnest            |  32  |  85.8821  |
|              hf_Bart              |  1   |  84.5563  |
|              hf_Bert              |  1   |  79.6488  |
|            tts_angular            |  64  |  59.9619  |
|        speech_transformer         |  1   |  59.4359  |
|             hf_Albert             |  1   |  57.6235  |
|           fastNLP_Bert            |  1   |  57.2893  |
|              alexnet              | 128  |  56.4508  |
| attention_is_all_you_need_pytorch |  32  |  55.1287  |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  53.7583  |
|          LearningToPaint          |  96  |  51.7449  |
|               hf_T5               |  1   |  49.0007  |
|      timm_vision_transformer      |  8   |  48.6971  |
|              hf_GPT2              |  1   |  44.2553  |
|      nvidia_deeprecommender       | 256  |  42.8203  |
|               vgg16               |  4   |  40.8964  |
|           hf_DistilBert           |  1   |  37.3271  |
|          resnext50_32x4d          |  8   |  35.4202  |
|            hf_Reformer            |  1   |  34.1554  |
|           BERT_pytorch            |  2   |  31.4222  |
|            mnasnet1_0             |  32  |  28.8491  |
|        shufflenet_v2_x1_0         |  64  |  26.5383  |
|        mobilenet_v3_large         |  32  |  25.7928  |
|               dcgan               | 256  |  24.0592  |
|           mobilenet_v2            |  16  |  18.8528  |
|             resnet18              |  8   |  12.4553  |
|           squeezenet1_1           |  16  |  9.9098   |
|               dlrm                | 2048 |  6.7473   |
|       functorch_dp_cifar10        |  64  |  6.3668   |
|         soft_actor_critic         | 256  |  1.2018   |
|                drq                |  1   |  0.9108   |
|           lennard_jones           | 1000 |  0.5708   |
|     detectron2_fcos_r_50_fpn      |  0   |    nan    |
|    mobilenet_v2_quantized_qat     |  0   |    nan    |
|      resnet50_quantized_qat       |  0   |    nan    |
+-----------------------------------+------+-----------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|       MT5ForConditionalGeneration       |  8  |  2.5028  |
|            XLNetLMHeadModel             | 32  |  2.0138  |
|     MobileBertForQuestionAnswering      | 64  |  1.3572  |
|            YituTechConvBert             |  1  |  1.2789  |
|               DistillGPT2               |  1  |  1.2715  |
|          MobileBertForMaskedLM          | 32  |  1.125   |
|     M2M100ForConditionalGeneration      |  8  |  1.0679  |
|             XGLMForCausalLM             |  8  |  1.0595  |
|          AllenaiLongformerBase          |  1  |  1.0311  |
|       AlbertForQuestionAnswering        |  4  |  1.0261  |
|            AlbertForMaskedLM            |  4  |  1.0253  |
|                CamemBert                |  1  |  1.0154  |
|             OPTForCausalLM              | 32  |  0.9986  |
|                 T5Small                 |  1  |  0.9883  |
|           DebertaForMaskedLM            |  4  |  0.9875  |
|                 BigBird                 |  1  |  0.9837  |
|               GoogleFnet                |  1  |  0.9808  |
|      GPT2ForSequenceClassification      |  4  |  0.9491  |
|       DebertaForQuestionAnswering       |  8  |  0.9396  |
|         Speech2Text2ForCausalLM         | 128 |  0.9159  |
|     PLBartForConditionalGeneration      | 16  |  0.9126  |
|           RobertaForCausalLM            | 64  |  0.9119  |
|           ElectraForCausalLM            | 32  |  0.9106  |
|    LayoutLMForSequenceClassification    | 16  |  0.9102  |
|    MegatronBertForQuestionAnswering     | 16  |  0.9098  |
|     PegasusForConditionalGeneration     | 16  |  0.9068  |
|         MegatronBertForCausalLM         | 16  |  0.9059  |
|      MBartForConditionalGeneration      | 16  |  0.9038  |
|           LayoutLMForMaskedLM           | 16  |  0.8947  |
|     DistilBertForQuestionAnswering      | 64  |  0.8897  |
|       RobertaForQuestionAnswering       | 128 |  0.8895  |
|            MBartForCausalLM             | 32  |  0.8871  |
|        BertForQuestionAnswering         | 128 |  0.8853  |
|           PegasusForCausalLM            | 32  |  0.8853  |
|            PLBartForCausalLM            | 32  |  0.8813  |
|             BertForMaskedLM             | 64  |  0.8746  |
|       ElectraForQuestionAnswering       | 64  |  0.8578  |
|          DistilBertForMaskedLM          | 64  |  0.8569  |
|       T5ForConditionalGeneration        |  4  |  0.8568  |
| BlenderbotSmallForConditionalGeneration | 64  |  0.8282  |
|             BartForCausalLM             |  4  |  0.8269  |
|      BartForConditionalGeneration       |  2  |  0.8117  |
|       BlenderbotSmallForCausalLM        | 64  |  0.7943  |
|            TrOCRForCausalLM             | 32  |  0.789   |
+-----------------------------------------+-----+----------+
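
For anyone who wants a single number per suite out of these tables, below is a minimal sketch that parses a bordered table like the one above and computes a geometric mean of the `inductor` column. The helper names `parse_table` and `geomean` are hypothetical (they are not part of the benchmark scripts), and the sample rows are copied from the table above. `nan` entries are skipped, which is an assumption about how failed models should be treated.

```python
import math

def parse_table(text):
    """Parse a '+---+' bordered ASCII table into a list of dicts keyed by header.

    Hypothetical helper for illustration; not part of the benchmark tooling.
    """
    rows = []
    header = None
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the '+-----+' border lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows

def geomean(values):
    """Geometric mean over positive, non-nan values; nan if nothing is left."""
    vals = [v for v in values if not math.isnan(v) and v > 0]
    if not vals:
        return float("nan")
    return math.exp(sum(math.log(v) for v in vals) / len(vals))

# Three rows copied from the huggingface speedup table above.
sample = """
+-----------------------------+----+----------+
|            name             | bs | inductor |
+-----------------------------+----+----------+
| MT5ForConditionalGeneration | 8  |  2.5028  |
|      XLNetLMHeadModel       | 32 |  2.0138  |
|      TrOCRForCausalLM       | 32 |  0.789   |
+-----------------------------+----+----------+
"""

rows = parse_table(sample)
speedups = [float(r["inductor"]) for r in rows]
print(f"{len(rows)} models, geomean speedup = {geomean(speedups):.4f}")
```

A geometric mean is used here because the values are ratios against native pytorch; an arithmetic mean would overweight the few large speedups.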

Accuracy

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|            AlbertForMaskedLM            | 1  |   pass   |
|       AlbertForQuestionAnswering        | 1  |   pass   |
|                CamemBert                | 1  |   pass   |
|          AllenaiLongformerBase          | 1  |   pass   |
|             BartForCausalLM             | 1  |   pass   |
|      BartForConditionalGeneration       | 1  |   pass   |
|             BertForMaskedLM             | 1  |   pass   |
|        BertForQuestionAnswering         | 1  |   pass   |
|                 BigBird                 | 1  |   pass   |
|       BlenderbotSmallForCausalLM        | 1  |   pass   |
| BlenderbotSmallForConditionalGeneration | 1  |   pass   |
|           DebertaForMaskedLM            | 1  |   pass   |
|           LayoutLMForMaskedLM           | 1  |   pass   |
|       DebertaForQuestionAnswering       | 1  |   pass   |
|          DistilBertForMaskedLM          | 1  |   pass   |
|     DistilBertForQuestionAnswering      | 1  |   pass   |
|               DistillGPT2               | 1  |   pass   |
|           ElectraForCausalLM            | 1  |   pass   |
|       ElectraForQuestionAnswering       | 1  |   pass   |
|      GPT2ForSequenceClassification      | 1  |   pass   |
|               GoogleFnet                | 1  |   pass   |
|    LayoutLMForSequenceClassification    | 1  |   pass   |
|     M2M100ForConditionalGeneration      | 1  |   pass   |
|            MBartForCausalLM             | 1  |   pass   |
|     PLBartForConditionalGeneration      | 1  |   pass   |
|      MBartForConditionalGeneration      | 1  |   pass   |
|       MT5ForConditionalGeneration       | 1  |   pass   |
|         MegatronBertForCausalLM         | 1  |   pass   |
|    MegatronBertForQuestionAnswering     | 1  |   pass   |
|          MobileBertForMaskedLM          | 1  |   pass   |
|     MobileBertForQuestionAnswering      | 1  |   pass   |
|             OPTForCausalLM              | 1  |   pass   |
|            PLBartForCausalLM            | 1  |   pass   |
|           PegasusForCausalLM            | 1  |   pass   |
|            XLNetLMHeadModel             | 1  |   pass   |
|     PegasusForConditionalGeneration     | 1  |   pass   |
|           RobertaForCausalLM            | 1  |   pass   |
|       RobertaForQuestionAnswering       | 1  |   pass   |
|         Speech2Text2ForCausalLM         | 1  |   pass   |
|       T5ForConditionalGeneration        | 1  |   pass   |
|                 T5Small                 | 1  |   pass   |
|            TrOCRForCausalLM             | 1  |   pass   |
|             XGLMForCausalLM             | 1  |   pass   |
|            YituTechConvBert             | 1  |   pass   |
+-----------------------------------------+----+----------+

Compilation latency (sec)

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|          MobileBertForMaskedLM          | 32  | 25.2372  |
|     PegasusForConditionalGeneration     | 16  | 24.4352  |
|     MobileBertForQuestionAnswering      | 64  | 24.4324  |
|          AllenaiLongformerBase          |  1  | 23.6894  |
|     M2M100ForConditionalGeneration      |  8  | 23.5087  |
|             OPTForCausalLM              | 32  | 22.7762  |
|      BartForConditionalGeneration       |  2  | 22.7697  |
|           DebertaForMaskedLM            |  4  |  22.595  |
|      MBartForConditionalGeneration      | 16  | 21.9287  |
|       DebertaForQuestionAnswering       |  8  | 21.5273  |
|             XGLMForCausalLM             |  8  | 20.8673  |
|                 BigBird                 |  1  | 19.9053  |
|         MegatronBertForCausalLM         | 16  | 19.6555  |
|          DistilBertForMaskedLM          | 64  | 19.2973  |
|    MegatronBertForQuestionAnswering     | 16  | 18.7873  |
| BlenderbotSmallForConditionalGeneration | 64  | 18.2002  |
|            YituTechConvBert             |  1  | 18.0638  |
|           PegasusForCausalLM            | 32  | 16.5992  |
|            AlbertForMaskedLM            |  4  | 16.4087  |
|       AlbertForQuestionAnswering        |  4  | 16.0927  |
|       MT5ForConditionalGeneration       |  8  | 15.9537  |
|       RobertaForQuestionAnswering       | 128 |  15.901  |
|        BertForQuestionAnswering         | 128 | 15.7772  |
|           LayoutLMForMaskedLM           | 16  | 15.4624  |
|       T5ForConditionalGeneration        |  4  | 15.4344  |
|             BartForCausalLM             |  4  | 15.2338  |
|           RobertaForCausalLM            | 64  | 15.2191  |
|           ElectraForCausalLM            | 32  | 15.1027  |
|       ElectraForQuestionAnswering       | 64  | 15.0604  |
|     PLBartForConditionalGeneration      | 16  | 15.0183  |
|             BertForMaskedLM             | 64  | 14.8992  |
|    LayoutLMForSequenceClassification    | 16  | 14.8256  |
|            TrOCRForCausalLM             | 32  | 14.5027  |
|            MBartForCausalLM             | 32  | 14.3014  |
|      GPT2ForSequenceClassification      |  4  | 14.2973  |
|                 T5Small                 |  1  | 13.4727  |
|                CamemBert                |  1  | 12.9114  |
|               GoogleFnet                |  1  |  12.517  |
|       BlenderbotSmallForCausalLM        | 64  | 12.3886  |
|         Speech2Text2ForCausalLM         | 128 |  12.212  |
|            PLBartForCausalLM            | 32  | 11.0534  |
|     DistilBertForQuestionAnswering      | 64  | 10.9706  |
|               DistillGPT2               |  1  | 10.0647  |
|            XLNetLMHeadModel             | 32  |  7.6572  |
+-----------------------------------------+-----+----------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            AlbertForMaskedLM            |  4  |  0.9964  |
|       AlbertForQuestionAnswering        |  4  |  0.9962  |
|             BartForCausalLM             |  4  |  0.9933  |
|       BlenderbotSmallForCausalLM        | 64  |  0.9931  |
|          DistilBertForMaskedLM          | 64  |  0.9928  |
|           ElectraForCausalLM            | 32  |  0.9925  |
|           RobertaForCausalLM            | 64  |  0.9923  |
|           DebertaForMaskedLM            |  4  |  0.9918  |
|       ElectraForQuestionAnswering       | 64  |  0.9915  |
|      GPT2ForSequenceClassification      |  4  |  0.9913  |
|             BertForMaskedLM             | 64  |  0.9912  |
|         Speech2Text2ForCausalLM         | 128 |  0.9908  |
|            PLBartForCausalLM            | 32  |  0.9901  |
|            TrOCRForCausalLM             | 32  |   0.99   |
|           PegasusForCausalLM            | 32  |  0.9898  |
|     DistilBertForQuestionAnswering      | 64  |  0.9896  |
|                 BigBird                 |  1  |  0.9892  |
|            MBartForCausalLM             | 32  |  0.9891  |
|             OPTForCausalLM              | 32  |  0.9891  |
|               GoogleFnet                |  1  |  0.9887  |
|        BertForQuestionAnswering         | 128 |  0.9886  |
|       RobertaForQuestionAnswering       | 128 |  0.9886  |
| BlenderbotSmallForConditionalGeneration | 64  |  0.9868  |
|     PegasusForConditionalGeneration     | 16  |  0.9851  |
|           LayoutLMForMaskedLM           | 16  |  0.9841  |
|       DebertaForQuestionAnswering       |  8  |  0.9836  |
|    LayoutLMForSequenceClassification    | 16  |  0.9811  |
|      BartForConditionalGeneration       |  2  |  0.978   |
|     PLBartForConditionalGeneration      | 16  |  0.9755  |
|       T5ForConditionalGeneration        |  4  |  0.9745  |
|               DistillGPT2               |  1  |  0.973   |
|            XLNetLMHeadModel             | 32  |  0.9711  |
|      MBartForConditionalGeneration      | 16  |  0.9687  |
|                CamemBert                |  1  |  0.9644  |
|             XGLMForCausalLM             |  8  |  0.9545  |
|         MegatronBertForCausalLM         | 16  |  0.9496  |
|     M2M100ForConditionalGeneration      |  8  |  0.9458  |
|          MobileBertForMaskedLM          | 32  |  0.9435  |
|          AllenaiLongformerBase          |  1  |  0.9049  |
|                 T5Small                 |  1  |  0.824   |
|            YituTechConvBert             |  1  |  0.7912  |
|     MobileBertForQuestionAnswering      | 64  |  0.7786  |
|    MegatronBertForQuestionAnswering     | 16  |  0.6873  |
|       MT5ForConditionalGeneration       |  8  |  0.6623  |
+-----------------------------------------+-----+----------+

Absolute latency (ms)

+-----------------------------------------+-----+-----------+
|                  name                   | bs  | inductor  |
+-----------------------------------------+-----+-----------+
|            XLNetLMHeadModel             | 32  | 5994.3722 |
|            AlbertForMaskedLM            |  4  | 2922.6433 |
|       AlbertForQuestionAnswering        |  4  | 2902.1749 |
|        BertForQuestionAnswering         | 128 | 1092.4142 |
|       RobertaForQuestionAnswering       | 128 | 1066.0875 |
|      BartForConditionalGeneration       |  2  | 1003.2364 |
|             BartForCausalLM             |  4  | 848.5397  |
|           LayoutLMForMaskedLM           | 16  | 836.0798  |
| BlenderbotSmallForConditionalGeneration | 64  | 685.6462  |
|    LayoutLMForSequenceClassification    | 16  |  680.853  |
|             BertForMaskedLM             | 64  |  678.306  |
|           RobertaForCausalLM            | 64  | 659.2635  |
|       ElectraForQuestionAnswering       | 64  | 653.3479  |
|            TrOCRForCausalLM             | 32  | 637.5153  |
|      MBartForConditionalGeneration      | 16  | 617.9418  |
|     PegasusForConditionalGeneration     | 16  | 613.0034  |
|            MBartForCausalLM             | 32  | 578.3349  |
|           PegasusForCausalLM            | 32  | 577.1994  |
|      GPT2ForSequenceClassification      |  4  | 512.4605  |
|         MegatronBertForCausalLM         | 16  | 504.3404  |
|           ElectraForCausalLM            | 32  | 493.0583  |
|       T5ForConditionalGeneration        |  4  |  466.512  |
|             XGLMForCausalLM             |  8  | 459.3132  |
|    MegatronBertForQuestionAnswering     | 16  | 455.0356  |
|          DistilBertForMaskedLM          | 64  | 418.8048  |
|       BlenderbotSmallForCausalLM        | 64  |  390.33   |
|     M2M100ForConditionalGeneration      |  8  | 371.0493  |
|             OPTForCausalLM              | 32  | 343.9418  |
|       DebertaForQuestionAnswering       |  8  | 333.8932  |
|     DistilBertForQuestionAnswering      | 64  | 259.0878  |
|       MT5ForConditionalGeneration       |  8  |  252.337  |
|            PLBartForCausalLM            | 32  | 252.0114  |
|           DebertaForMaskedLM            |  4  | 246.1688  |
|                 BigBird                 |  1  | 230.7581  |
|     PLBartForConditionalGeneration      | 16  | 225.1841  |
|          AllenaiLongformerBase          |  1  |  169.836  |
|         Speech2Text2ForCausalLM         | 128 | 166.7482  |
|          MobileBertForMaskedLM          | 32  | 147.1068  |
|     MobileBertForQuestionAnswering      | 64  | 133.1138  |
|                 T5Small                 |  1  | 125.0146  |
|            YituTechConvBert             |  1  |  69.9418  |
|                CamemBert                |  1  |  68.9437  |
|               GoogleFnet                |  1  |  51.2721  |
|               DistillGPT2               |  1  |  46.0628  |
+-----------------------------------------+-----+-----------+

timm_models suite with float32 precision

Performance speedup

+---------------------------------+-----+----------+
|              name               | bs  | inductor |
+---------------------------------+-----+----------+
|          pnasnet5large          | 16  |  1.428   |
|           res2next50            | 128 |  1.3149  |
|          inception_v3           | 128 |  1.2957  |
|        res2net50_14w_8s         | 128 |  1.2881  |
|       gluon_inception_v3        | 128 |  1.2857  |
|        adv_inception_v3         | 128 |  1.2816  |
|          ghostnet_100           | 128 |  1.2418  |
|        res2net101_26w_4s        | 64  |  1.2204  |
|             dla102              | 128 |  1.2104  |
|            hrnet_w18            | 128 |  1.2092  |
|        gluon_xception65         | 32  |  1.2061  |
|           volo_d1_224           | 64  |  1.2018  |
|           fbnetc_100            | 128 |  1.1962  |
|          botnet26t_256          | 128 |   1.19   |
|        convmixer_768_32         | 32  |  1.185   |
|           mnasnet_100           | 128 |  1.177   |
|          spnasnet_100           | 128 |  1.169   |
|      mobilenetv3_large_100      | 128 |  1.1684  |
|        ese_vovnet19b_dw         | 128 |  1.1673  |
|           selecsls42b           | 128 |  1.1524  |
|             dpn107              | 32  |  1.1516  |
|            lcnet_050            | 128 |  1.1462  |
|            repvgg_a2            | 128 |  1.1177  |
|            gernet_l             | 128 |  1.1166  |
|           resnest101e           | 64  |  1.098   |
|            fbnetv3_b            | 128 |  1.0851  |
|         mobilenetv2_100         | 128 |  1.0691  |
|         visformer_small         | 128 |  1.0611  |
|          cspdarknet53           | 64  |  1.0599  |
|           regnety_002           | 128 |  1.0506  |
|           tf_mixnet_l           | 128 |  1.0345  |
|          gmixer_24_224          | 128 |  1.0061  |
|      xcit_large_24_p8_224       |  5  |  1.0038  |
|            mixnet_l             | 128 |  0.988   |
|          cait_m36_384           |  4  |  0.9747  |
|           dm_nfnet_f0           | 128 |  0.9249  |
|        eca_halonext26ts         | 128 |  0.9112  |
|       eca_botnext26ts_256       | 128 |  0.9037  |
|        sebotnet33ts_256         | 64  |   0.89   |
|  swin_base_patch4_window7_224   | 64  |  0.8812  |
|      beit_base_patch16_224      | 64  |  0.8688  |
|            nfnet_l0             | 128 |  0.8651  |
| deit_base_distilled_patch16_224 | 64  |  0.8525  |
|      vit_base_patch16_224       | 64  |  0.8456  |
|           convit_base           | 64  |  0.8455  |
|         poolformer_m36          | 64  |  0.8436  |
|          resmlp_12_224          | 128 |  0.8339  |
|          jx_nest_base           | 32  |  0.8315  |
|          convnext_base          | 64  |  0.8298  |
|           rexnet_100            | 128 |  0.8239  |
|       tf_efficientnet_b0        | 128 |  0.8048  |
|        tnt_s_patch16_224        | 128 |  0.7939  |
|          mixer_b16_224          | 128 |  0.7852  |
|            pit_b_224            | 64  |  0.7817  |
|            tinynet_a            | 128 |  0.7681  |
|           mobilevit_s           | 64  |  0.7643  |
|         coat_lite_mini          | 128 |  0.7525  |
|         crossvit_9_240          | 128 |  0.7215  |
|        twins_pcpvt_base         | 64  |  0.6998  |
|          gmlp_s16_224           | 128 |  0.6209  |
|     swsl_resnext101_32x16d      | 32  |  0.0628  |
+---------------------------------+-----+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 2  |     pass      |
|          mixer_b16_224          | 2  |     pass      |
|        ese_vovnet19b_dw         | 2  |     pass      |
|          botnet26t_256          | 2  |     pass      |
|         coat_lite_mini          | 2  |     pass      |
|        convmixer_768_32         | 2  |     pass      |
|          convnext_base          | 2  |     pass      |
|         crossvit_9_240          | 2  |     pass      |
|          cspdarknet53           | 2  |     pass      |
| deit_base_distilled_patch16_224 | 2  |     pass      |
|             dla102              | 2  |     pass      |
|           dm_nfnet_f0           | 2  |     pass      |
|             dpn107              | 2  |     pass      |
|       eca_botnext26ts_256       | 2  |     pass      |
|        eca_halonext26ts         | 2  |     pass      |
|           mnasnet_100           | 2  |     pass      |
|           fbnetc_100            | 2  |     pass      |
|            fbnetv3_b            | 2  |     pass      |
|            gernet_l             | 2  |     pass      |
|       gluon_inception_v3        | 2  |     pass      |
|        gluon_xception65         | 2  |     pass      |
|          gmixer_24_224          | 2  |     pass      |
|          gmlp_s16_224           | 2  |     pass      |
|            hrnet_w18            | 2  |     pass      |
|          inception_v3           | 2  |     pass      |
|          jx_nest_base           | 2  |     pass      |
|            lcnet_050            | 2  |     pass      |
|      xcit_large_24_p8_224       | 2  |     pass      |
|      beit_base_patch16_224      | 2  |     pass      |
|            mixnet_l             | 2  |     pass      |
|         mobilenetv2_100         | 2  |     pass      |
|           volo_d1_224           | 2  |     pass      |
|      mobilenetv3_large_100      | 2  |     pass      |
|           mobilevit_s           | 2  |     pass      |
|            nfnet_l0             | 2  |     pass      |
|            pit_b_224            | 2  |     pass      |
|          pnasnet5large          | 2  |     pass      |
|         poolformer_m36          | 2  |     pass      |
|           regnety_002           | 2  |     pass      |
|            repvgg_a2            | 2  |     pass      |
|        res2net101_26w_4s        | 2  |     pass      |
|        res2net50_14w_8s         | 2  |     pass      |
|           res2next50            | 2  |     pass      |
|          resmlp_12_224          | 2  |     pass      |
|           resnest101e           | 2  |     pass      |
|           rexnet_100            | 2  |     pass      |
|        sebotnet33ts_256         | 2  |     pass      |
|           selecsls42b           | 2  |     pass      |
|          spnasnet_100           | 2  |     pass      |
|  swin_base_patch4_window7_224   | 2  |     pass      |
|     swsl_resnext101_32x16d      | 2  |     pass      |
|       tf_efficientnet_b0        | 2  |     pass      |
|           tf_mixnet_l           | 2  |     pass      |
|            tinynet_a            | 2  |     pass      |
|        tnt_s_patch16_224        | 2  |     pass      |
|        twins_pcpvt_base         | 2  |     pass      |
|         visformer_small         | 2  |     pass      |
|      vit_base_patch16_224       | 2  |     pass      |
|           convit_base           | 2  |  fail_to_run  |
|          ghostnet_100           | 2  | fail_accuracy |
|          cait_m36_384           | 2  | fail_accuracy |
+---------------------------------+----+---------------+
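
Note that the three non-passing models at the bottom of this table (`convit_base`, `ghostnet_100`, `cait_m36_384`) still have rows in the other timm_models tables. Below is a minimal sketch of how one might drop them before aggregating. The function name `passing_only` is hypothetical, and the values are copied from the accuracy and speedup tables of this suite.

```python
def passing_only(metric, accuracy):
    """Keep only models whose accuracy status is exactly 'pass'.

    Hypothetical helper for illustration; models missing from the
    accuracy table are dropped as well.
    """
    return {name: v for name, v in metric.items() if accuracy.get(name) == "pass"}

# Status values copied from the accuracy table above.
accuracy = {
    "adv_inception_v3": "pass",
    "convit_base": "fail_to_run",
    "ghostnet_100": "fail_accuracy",
    "cait_m36_384": "fail_accuracy",
}

# Speedups copied from the timm_models speedup table.
speedup = {
    "adv_inception_v3": 1.2816,
    "convit_base": 0.8455,
    "ghostnet_100": 1.2418,
    "cait_m36_384": 0.9747,
}

filtered = passing_only(speedup, accuracy)
print(sorted(filtered))
```

Matching on the exact string `pass` keeps the filter conservative: any other status, including ones not seen in this table, is treated as a failure.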

Compilation latency (sec)

+---------------------------------+-----+----------+
|              name               | bs  | inductor |
+---------------------------------+-----+----------+
|     swsl_resnext101_32x16d      | 32  | 125.8047 |
|          pnasnet5large          | 16  |  43.565  |
|            hrnet_w18            | 128 | 29.4082  |
|  swin_base_patch4_window7_224   | 64  | 28.9151  |
|          cait_m36_384           |  4  |  28.216  |
|       tf_efficientnet_b0        | 128 | 28.1704  |
|        twins_pcpvt_base         | 64  | 27.1524  |
|           tf_mixnet_l           | 128 | 26.7534  |
|           dm_nfnet_f0           | 128 |  26.659  |
|             dla102              | 128 | 25.8052  |
|           mobilevit_s           | 64  | 25.3669  |
|         poolformer_m36          | 64  |  25.07   |
|      xcit_large_24_p8_224       |  5  | 24.4275  |
|            mixnet_l             | 128 | 24.1128  |
|           resnest101e           | 64  | 23.5432  |
|        tnt_s_patch16_224        | 128 | 23.3802  |
|        res2net50_14w_8s         | 128 | 22.9929  |
|        sebotnet33ts_256         | 64  | 22.8585  |
|            nfnet_l0             | 128 |  21.852  |
|            fbnetv3_b            | 128 |  21.561  |
|             dpn107              | 32  | 21.4473  |
|        eca_halonext26ts         | 128 | 21.2382  |
|          gmlp_s16_224           | 128 | 20.9871  |
|          jx_nest_base           | 32  | 20.8849  |
|           rexnet_100            | 128 | 20.4116  |
|        res2net101_26w_4s        | 64  | 20.3678  |
|       eca_botnext26ts_256       | 128 | 19.5379  |
|          convnext_base          | 64  | 19.2767  |
|         coat_lite_mini          | 128 | 18.7639  |
|           volo_d1_224           | 64  | 18.7463  |
|         crossvit_9_240          | 128 | 18.5428  |
|           convit_base           | 64  | 18.2128  |
|            pit_b_224            | 64  | 18.0961  |
|            tinynet_a            | 128 | 17.9042  |
| deit_base_distilled_patch16_224 | 64  | 17.3892  |
|          botnet26t_256          | 128 | 16.9784  |
|          cspdarknet53           | 64  | 16.4149  |
|        adv_inception_v3         | 128 | 16.2581  |
|          inception_v3           | 128 | 16.1081  |
|       gluon_inception_v3        | 128 | 16.0881  |
|           res2next50            | 128 | 15.8701  |
|          mixer_b16_224          | 128 | 15.8593  |
|          gmixer_24_224          | 128 | 15.7285  |
|         visformer_small         | 128 | 15.2266  |
|      beit_base_patch16_224      | 64  | 15.0171  |
|      vit_base_patch16_224       | 64  | 14.6177  |
|         mobilenetv2_100         | 128 | 14.4065  |
|           fbnetc_100            | 128 | 14.3854  |
|          spnasnet_100           | 128 | 14.2649  |
|      mobilenetv3_large_100      | 128 | 14.1979  |
|        convmixer_768_32         | 32  | 14.1311  |
|        gluon_xception65         | 32  |  14.13   |
|          ghostnet_100           | 128 | 13.8001  |
|            gernet_l             | 128 | 13.5367  |
|           mnasnet_100           | 128 | 12.9553  |
|            repvgg_a2            | 128 | 12.9382  |
|           regnety_002           | 128 | 12.7798  |
|        ese_vovnet19b_dw         | 128 | 12.2247  |
|          resmlp_12_224          | 128 | 11.5833  |
|           selecsls42b           | 128 | 10.7187  |
|            lcnet_050            | 128 | 10.4701  |
+---------------------------------+-----+----------+

Peak Memory Compression Ratio

+---------------------------------+-----+----------+
|              name               | bs  | inductor |
+---------------------------------+-----+----------+
|        ese_vovnet19b_dw         | 128 |  0.9973  |
|           selecsls42b           | 128 |  0.9972  |
|           resnest101e           | 64  |  0.9971  |
|        gluon_xception65         | 32  |  0.9971  |
|           res2next50            | 128 |  0.9968  |
|        eca_halonext26ts         | 128 |  0.9968  |
|       eca_botnext26ts_256       | 128 |  0.9967  |
|       gluon_inception_v3        | 128 |  0.9966  |
|        adv_inception_v3         | 128 |  0.9965  |
|           tf_mixnet_l           | 128 |  0.9965  |
|        res2net50_14w_8s         | 128 |  0.9963  |
|            mixnet_l             | 128 |  0.9963  |
|          botnet26t_256          | 128 |  0.9962  |
|          mixer_b16_224          | 128 |  0.9959  |
|          cspdarknet53           | 64  |  0.9956  |
|            pit_b_224            | 64  |  0.9953  |
|             dla102              | 128 |  0.9951  |
|            hrnet_w18            | 128 |  0.995   |
|          resmlp_12_224          | 128 |  0.995   |
|         coat_lite_mini          | 128 |  0.9949  |
|           convit_base           | 64  |  0.9948  |
|            gernet_l             | 128 |  0.9946  |
|        res2net101_26w_4s        | 64  |  0.9946  |
|           mobilevit_s           | 64  |  0.9945  |
|           dm_nfnet_f0           | 128 |  0.9945  |
|       tf_efficientnet_b0        | 128 |  0.9944  |
|           rexnet_100            | 128 |  0.9942  |
| deit_base_distilled_patch16_224 | 64  |  0.9941  |
|      vit_base_patch16_224       | 64  |  0.9939  |
|          gmixer_24_224          | 128 |  0.9939  |
|        sebotnet33ts_256         | 64  |  0.9938  |
|      beit_base_patch16_224      | 64  |  0.9936  |
|             dpn107              | 32  |  0.9935  |
|         visformer_small         | 128 |  0.9933  |
|        tnt_s_patch16_224        | 128 |  0.9931  |
|            nfnet_l0             | 128 |  0.993   |
|          ghostnet_100           | 128 |  0.9928  |
|         mobilenetv2_100         | 128 |  0.9927  |
|          inception_v3           | 128 |  0.9925  |
|          convnext_base          | 64  |  0.9924  |
|           mnasnet_100           | 128 |  0.9923  |
|            tinynet_a            | 128 |  0.9921  |
|            fbnetv3_b            | 128 |  0.9919  |
|            repvgg_a2            | 128 |  0.9916  |
|      mobilenetv3_large_100      | 128 |  0.9915  |
|           fbnetc_100            | 128 |  0.9914  |
|        convmixer_768_32         | 32  |  0.9912  |
|          spnasnet_100           | 128 |  0.9912  |
|          gmlp_s16_224           | 128 |  0.991   |
|          pnasnet5large          | 16  |  0.991   |
|         crossvit_9_240          | 128 |  0.9904  |
|  swin_base_patch4_window7_224   | 64  |  0.9889  |
|     swsl_resnext101_32x16d      | 32  |  0.9877  |
|            lcnet_050            | 128 |  0.9861  |
|           volo_d1_224           | 64  |  0.9845  |
|           regnety_002           | 128 |  0.9837  |
|         poolformer_m36          | 64  |  0.9817  |
|          jx_nest_base           | 32  |  0.981   |
|          cait_m36_384           |  4  |  0.9783  |
|        twins_pcpvt_base         | 64  |  0.9755  |
|      xcit_large_24_p8_224       |  5  |  0.9709  |
+---------------------------------+-----+----------+

Absolute latency (ms)

+---------------------------------+-----+------------+
|              name               | bs  |  inductor  |
+---------------------------------+-----+------------+
|     swsl_resnext101_32x16d      | 32  | 19530.1615 |
|           resnest101e           | 64  | 3623.9337  |
|           dm_nfnet_f0           | 128 | 1893.5537  |
|            nfnet_l0             | 128 | 1387.5817  |
|          mixer_b16_224          | 128 | 1372.1228  |
|          cait_m36_384           |  4  | 1300.7156  |
|           convit_base           | 64  | 1275.8401  |
|        tnt_s_patch16_224        | 128 |  1158.66   |
|  swin_base_patch4_window7_224   | 64  | 1133.0562  |
|          gmlp_s16_224           | 128 | 1070.7234  |
|          convnext_base          | 64  |  960.0547  |
|            pit_b_224            | 64  |  954.8482  |
|      vit_base_patch16_224       | 64  |  931.6392  |
|             dla102              | 128 |  929.1243  |
|         poolformer_m36          | 64  |  925.6955  |
| deit_base_distilled_patch16_224 | 64  |  918.7619  |
|      beit_base_patch16_224      | 64  |  915.5773  |
|            hrnet_w18            | 128 |  898.8086  |
|        eca_halonext26ts         | 128 |  750.3109  |
|       eca_botnext26ts_256       | 128 |  742.1284  |
|        convmixer_768_32         | 32  |  717.7093  |
|        twins_pcpvt_base         | 64  |  674.5591  |
|          inception_v3           | 128 |  671.6297  |
|       gluon_inception_v3        | 128 |  671.5278  |
|        adv_inception_v3         | 128 |  669.7301  |
|          gmixer_24_224          | 128 |  655.8327  |
|          jx_nest_base           | 32  |  652.3996  |
|        res2net50_14w_8s         | 128 |  636.1703  |
|           res2next50            | 128 |  634.4357  |
|           tf_mixnet_l           | 128 |  622.4799  |
|            repvgg_a2            | 128 |  616.9888  |
|         coat_lite_mini          | 128 |  611.4089  |
|            mixnet_l             | 128 |  607.6028  |
|         visformer_small         | 128 |  601.7692  |
|          botnet26t_256          | 128 |  582.6177  |
|          pnasnet5large          | 16  |  576.5913  |
|           volo_d1_224           | 64  |  574.1977  |
|             dpn107              | 32  |  572.7044  |
|        sebotnet33ts_256         | 64  |  549.5431  |
|      xcit_large_24_p8_224       |  5  |  547.8695  |
|           mobilevit_s           | 64  |  538.4283  |
|        res2net101_26w_4s        | 64  |  529.9529  |
|          cspdarknet53           | 64  |  493.7609  |
|        gluon_xception65         | 32  |  487.5553  |
|            gernet_l             | 128 |  484.6508  |
|         crossvit_9_240          | 128 |  475.0907  |
|       tf_efficientnet_b0        | 128 |  432.611   |
|           rexnet_100            | 128 |  404.8396  |
|          resmlp_12_224          | 128 |  370.2756  |
|        ese_vovnet19b_dw         | 128 |  352.5807  |
|           selecsls42b           | 128 |  351.4123  |
|            fbnetv3_b            | 128 |  346.4762  |
|            tinynet_a            | 128 |  306.0505  |
|         mobilenetv2_100         | 128 |  250.3617  |
|           fbnetc_100            | 128 |  217.3378  |
|          spnasnet_100           | 128 |  195.4492  |
|           mnasnet_100           | 128 |  182.8092  |
|      mobilenetv3_large_100      | 128 |  152.9359  |
|          ghostnet_100           | 128 |  149.1116  |
|           regnety_002           | 128 |  89.2034   |
|            lcnet_050            | 128 |  38.4039   |
+---------------------------------+-----+------------+

ESI-SYD commented Nov 18, 2022

Performance Dashboard for float32 precision -- Single-core Single-thread (2022-11-16 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward pass. For accuracy, we check the numerical correctness of the forward-pass outputs by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint reduction ratio.
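The speedup definition above can be sketched as follows. This is a minimal illustration, not the dashboard's harness: `median_latency_ms` and `speedup` are hypothetical helpers, the two callables stand in for one forward pass of the native and the compiled model, and taking the median over several repeats is an assumption.

```python
import statistics
import time


def median_latency_ms(fn, repeats=5, timer=time.perf_counter):
    """Median wall-clock latency of fn() in milliseconds over `repeats` calls."""
    samples = []
    for _ in range(repeats):
        start = timer()
        fn()  # stands in for one forward pass
        samples.append((timer() - start) * 1000.0)
    return statistics.median(samples)


def speedup(baseline_fn, candidate_fn, repeats=5, timer=time.perf_counter):
    """Baseline latency divided by candidate latency; above 1.0 means faster."""
    baseline_ms = median_latency_ms(baseline_fn, repeats, timer)
    candidate_ms = median_latency_ms(candidate_fn, repeats, timer)
    return baseline_ms / candidate_ms
```

The `timer` argument exists only so the clock can be swapped out, for example with a fake clock in a unit test.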

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

SW information

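+-------------+----------------+--------------------+
|     SW      | Nightly commit | Master/Main commit |
+-------------+----------------+--------------------+
|   Pytorch   |    0662e90     |      e2f0648       |
| Torchbench  |       /        |      022dfe3       |
| torchaudio  |    4b10b6a     |      74f9a89       |
|  torchtext  |    71e4561     |      c047efe       |
| torchvision |    797e1ac     |      ffd5a56       |
+-------------+----------------+--------------------+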

HW information

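+------------------+-----------------------------------------------+
|       Item       |                     Value                     |
+------------------+-----------------------------------------------+
|   Manufacturer   |                  Amazon EC2                   |
|   Product Name   |                 c6i.16xlarge                  |
|    CPU Model     | Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz |
| Installed Memory |   128GB (1x128GB DDR4 3200 MT/s [Unknown])    |
|        OS        |              Ubuntu 20.04.5 LTS               |
|      Kernel      |                5.15.0-1022-aws                |
|    Microcode     |                   0xd000331                   |
|       GCC        |   gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0   |
|      GLIBC       |    ldd (Ubuntu GLIBC 2.31-0ubuntu9.9) 2.31    |
|     Binutils     |     GNU ld (GNU Binutils for Ubuntu) 2.34     |
|      Python      |                 Python 3.8.13                 |
|     OpenSSL      |           OpenSSL 1.1.1s 1 Nov 2022           |
+------------------+-----------------------------------------------+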

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 93%, 51/55 | 100%, 44/44 | 95%, 58/61  |
+----------+------------+-------------+-------------+
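
The pass-rate cells above can be reproduced from per-model accuracy statuses. A minimal sketch, assuming that both `pass` and `pass_due_to_skip` count as passing (this matches the torchbench figure of 51/55 in this report, but the dashboard's actual script is not shown here); `format_passrate` is a hypothetical helper name.

```python
from collections import Counter

# Assumption: these two statuses count as passing; anything else is a failure.
PASSING = {"pass", "pass_due_to_skip"}


def format_passrate(statuses):
    """Return a 'NN%, passed/total' cell from a list of accuracy statuses."""
    total = len(statuses)
    passed = sum(1 for s in statuses if s in PASSING)
    return f"{round(100 * passed / total)}%, {passed}/{total}"


# Status counts taken from the torchbench Accuracy table in this report.
counts = Counter({"pass": 48, "pass_due_to_skip": 3, "fail_to_run": 3, "0.0000": 1})
torchbench_statuses = list(counts.elements())
print(format_passrate(torchbench_statuses))
```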

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.07x    |    1.03x    |    1.11x    |
+----------+------------+-------------+-------------+
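
A geometric mean is a natural summary for ratios such as speedups, because a 2x gain and a 0.5x loss average out to 1x. A minimal sketch, assuming that failed models are recorded as 0.0 (as in the per-model tables of this report) and are dropped before averaging; `geomean_speedup` is a hypothetical helper, and the values are a small illustrative subset of the torchbench table, so the printed result is not the suite-level figure.

```python
from statistics import geometric_mean


def geomean_speedup(speedups):
    """Geometric mean over models that ran; 0.0 marks a failed model."""
    valid = [s for s in speedups if s > 0.0]
    return geometric_mean(valid)


# Illustrative subset of per-model torchbench speedups (not the full suite).
subset = [1.4271, 1.0561, 0.9998, 0.7250, 0.0]
print(f"{geomean_speedup(subset):.2f}x")
```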

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   22.72    |    24.36    |    31.14    |
+----------+------------+-------------+-------------+
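
The suite-level figure is presumably an arithmetic mean of the per-model compilation latencies. A minimal sketch, assuming that models which failed to run appear as `nan` (as in the per-model tables of this report) and are skipped; `mean_compile_time` is a hypothetical helper, and the values are an illustrative subset of the torchbench table.

```python
import math
from statistics import mean


def mean_compile_time(latencies_sec):
    """Arithmetic mean of compilation latencies, ignoring nan entries."""
    valid = [t for t in latencies_sec if not math.isnan(t)]
    return mean(valid)


# Illustrative subset of per-model latencies in seconds (not the full suite).
subset = [88.253, 58.6129, 9.4164, 1.496, float("nan")]
print(f"{mean_compile_time(subset):.2f} s")
```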

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.89x    |    0.88x    |    0.88x    |
+----------+------------+-------------+-------------+
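
Since higher is better, the ratio presumably divides the peak memory of native PyTorch by the peak memory of the compiled model, so a value below 1.0 means the compiled run used more memory. A minimal sketch under that assumption; the function name and the byte counts are hypothetical.

```python
def compression_ratio(eager_peak_bytes, compiled_peak_bytes):
    """Eager peak memory divided by compiled peak memory (higher is better)."""
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled peak memory must be positive")
    return eager_peak_bytes / compiled_peak_bytes


# Hypothetical measurements: the compiled run peaks higher than eager.
ratio = compression_ratio(eager_peak_bytes=880_000_000, compiled_peak_bytes=1_000_000_000)
print(f"{ratio:.2f}x")
```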

torchbench suite with float32 precision

Performance speedup

+-----------------------------------+-----+----------+
|               name                | bs  | inductor |
+-----------------------------------+-----+----------+
|            hf_Reformer            |  1  |  1.4271  |
|           mobilenet_v2            |  1  |  1.3724  |
|           timm_resnest            |  1  |  1.3189  |
| attention_is_all_you_need_pytorch |  1  |  1.2846  |
|        speech_transformer         |  1  |  1.251   |
|           squeezenet1_1           |  1  |  1.2136  |
|   pytorch_CycleGAN_and_pix2pix    |  1  |  1.1842  |
|            timm_nfnet             |  1  |  1.1784  |
|             resnet18              |  1  |  1.1612  |
|         soft_actor_critic         | 256 |  1.1594  |
|          vision_maskrcnn          |  1  |  1.1251  |
|           pytorch_unet            |  1  |  1.1247  |
|              hf_GPT2              |  1  |  1.1184  |
|            timm_vovnet            |  1  |  1.1149  |
|               vgg16               |  1  |  1.0887  |
|        Background_Matting         |  1  |  1.0855  |
|        shufflenet_v2_x1_0         |  1  |  1.0746  |
|              alexnet              |  1  |  1.0736  |
|            mnasnet1_0             |  1  |  1.0654  |
|            densenet121            |  1  |  1.065   |
|            timm_regnet            |  1  |  1.0611  |
|            Super_SloMo            |  1  |  1.0605  |
|              yolov3               |  1  |  1.0592  |
|             resnet50              |  1  |  1.0561  |
|          resnext50_32x4d          |  1  |  1.0498  |
|        mobilenet_v3_large         |  1  |  1.0495  |
|                drq                |  1  |  1.0488  |
|               dlrm                |  1  |  1.0433  |
|          LearningToPaint          |  1  |  1.0197  |
|               dcgan               |  1  |  1.0196  |
|          pytorch_stargan          | 16  |  1.0116  |
|      timm_vision_transformer      |  1  |  1.0066  |
|            tts_angular            |  1  |  1.0008  |
|             resnet152             |  1  |  1.0006  |
|              demucs               |  1  |  0.9998  |
|            hf_BigBird             |  1  |  0.9854  |
|      nvidia_deeprecommender       |  1  |  0.9782  |
|              hf_Bert              |  1  |  0.9751  |
|           hf_DistilBert           |  1  |  0.9618  |
|           BERT_pytorch            |  1  |  0.8705  |
|   timm_vision_transformer_large   |  1  |  0.8504  |
|           hf_Longformer           |  1  |  0.8393  |
|            hf_T5_large            |  1  |  0.7976  |
|         timm_efficientnet         |  1  |  0.7935  |
|           lennard_jones           |  1  |  0.7858  |
|             hf_Albert             |  1  |  0.7802  |
|              hf_Bart              |  1  |  0.747   |
|           hf_GPT2_large           |  1  |  0.7421  |
|           fastNLP_Bert            |  1  |  0.7328  |
|               hf_T5               |  1  |  0.725   |
|            hf_T5_base             |  1  |  0.6576  |
|       functorch_dp_cifar10        |  1  |  0.2274  |
|      resnet50_quantized_qat       |  0  |   0.0    |
|     detectron2_fcos_r_50_fpn      |  0  |   0.0    |
|    mobilenet_v2_quantized_qat     |  0  |   0.0    |
+-----------------------------------+-----+----------+

Accuracy

+-----------------------------------+-----+------------------+
|               name                | bs  |     inductor     |
+-----------------------------------+-----+------------------+
|            hf_T5_large            |  1  | pass_due_to_skip |
|           hf_GPT2_large           |  1  | pass_due_to_skip |
|   timm_vision_transformer_large   |  1  | pass_due_to_skip |
|           fastNLP_Bert            |  1  |       pass       |
|             hf_Albert             |  1  |       pass       |
|          LearningToPaint          |  1  |       pass       |
|            Super_SloMo            |  1  |       pass       |
|              alexnet              |  1  |       pass       |
| attention_is_all_you_need_pytorch |  1  |       pass       |
|               dcgan               |  1  |       pass       |
|              demucs               |  1  |       pass       |
|            densenet121            |  1  |       pass       |
|               dlrm                |  1  |       pass       |
|                drq                |  1  |       pass       |
|       functorch_dp_cifar10        |  1  |       pass       |
|              hf_Bart              |  1  |       pass       |
|              hf_Bert              |  1  |       pass       |
|            hf_BigBird             |  1  |       pass       |
|           hf_DistilBert           |  1  |       pass       |
|              hf_GPT2              |  1  |       pass       |
|           hf_Longformer           |  1  |       pass       |
|            hf_Reformer            |  1  |       pass       |
|               hf_T5               |  1  |       pass       |
|            hf_T5_base             |  1  |       pass       |
|              yolov3               |  1  |       pass       |
|           lennard_jones           |  1  |       pass       |
|            mnasnet1_0             |  1  |       pass       |
|           mobilenet_v2            |  1  |       pass       |
|          resnext50_32x4d          |  1  |       pass       |
|           BERT_pytorch            |  1  |       pass       |
|        shufflenet_v2_x1_0         |  1  |       pass       |
|        mobilenet_v3_large         |  1  |       pass       |
|      nvidia_deeprecommender       |  1  |       pass       |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |
|          pytorch_stargan          | 16  |       pass       |
|           pytorch_unet            |  1  |       pass       |
|             resnet152             |  1  |       pass       |
|             resnet18              |  1  |       pass       |
|               vgg16               |  1  |       pass       |
|             resnet50              |  1  |       pass       |
|         soft_actor_critic         | 256 |       pass       |
|        Background_Matting         |  1  |       pass       |
|        speech_transformer         |  1  |       pass       |
|           squeezenet1_1           |  1  |       pass       |
|         timm_efficientnet         |  1  |       pass       |
|            timm_nfnet             |  1  |       pass       |
|            timm_regnet            |  1  |       pass       |
|           timm_resnest            |  1  |       pass       |
|      timm_vision_transformer      |  1  |       pass       |
|            timm_vovnet            |  1  |       pass       |
|            tts_angular            |  1  |       pass       |
|    mobilenet_v2_quantized_qat     |  1  |   fail_to_run    |
|      resnet50_quantized_qat       |  1  |   fail_to_run    |
|     detectron2_fcos_r_50_fpn      |  1  |   fail_to_run    |
|          vision_maskrcnn          |  0  |      0.0000      |
+-----------------------------------+-----+------------------+

Compilation latency (sec)

+-----------------------------------+-----+----------+
|               name                | bs  | inductor |
+-----------------------------------+-----+----------+
|            hf_T5_base             |  1  |  88.253  |
|           hf_GPT2_large           |  1  | 75.2972  |
|            hf_T5_large            |  1  | 73.7311  |
|            densenet121            |  1  | 58.6129  |
|          vision_maskrcnn          |  1  | 51.5459  |
|            timm_nfnet             |  1  | 38.6469  |
|         timm_efficientnet         |  1  |  37.736  |
|           hf_Longformer           |  1  |  36.419  |
|   timm_vision_transformer_large   |  1  | 34.0175  |
|            Super_SloMo            |  1  | 31.6949  |
|              yolov3               |  1  | 31.2274  |
|            timm_regnet            |  1  | 28.7742  |
|            timm_vovnet            |  1  | 28.6898  |
|           pytorch_unet            |  1  | 28.5344  |
|           BERT_pytorch            |  1  | 27.3763  |
|            hf_Reformer            |  1  | 24.8126  |
|        Background_Matting         |  1  | 24.2711  |
|               hf_T5               |  1  | 23.8201  |
|            hf_BigBird             |  1  | 23.4584  |
|        speech_transformer         |  1  |  23.414  |
|              hf_Bart              |  1  | 22.4221  |
|              hf_GPT2              |  1  | 19.2972  |
|           fastNLP_Bert            |  1  | 18.3493  |
|       functorch_dp_cifar10        |  1  | 17.6411  |
|              hf_Bert              |  1  |  17.076  |
|      timm_vision_transformer      |  1  |  16.911  |
|             resnet152             |  1  | 16.3772  |
|        mobilenet_v3_large         |  1  |  16.242  |
|             hf_Albert             |  1  | 16.1589  |
| attention_is_all_you_need_pytorch |  1  | 16.1319  |
|           timm_resnest            |  1  | 15.8468  |
|        shufflenet_v2_x1_0         |  1  | 15.7815  |
|           mobilenet_v2            |  1  | 14.9252  |
|          pytorch_stargan          | 16  | 14.3669  |
|           hf_DistilBert           |  1  | 14.1932  |
|          LearningToPaint          |  1  | 13.8432  |
|   pytorch_CycleGAN_and_pix2pix    |  1  | 13.7945  |
|           squeezenet1_1           |  1  |   12.1   |
|                drq                |  1  | 11.5885  |
|               vgg16               |  1  | 11.4219  |
|             resnet50              |  1  | 10.9111  |
|          resnext50_32x4d          |  1  | 10.9055  |
|               dlrm                |  1  | 10.5436  |
|      nvidia_deeprecommender       |  1  | 10.1483  |
|              alexnet              |  1  |  9.7466  |
|            mnasnet1_0             |  1  |  9.7384  |
|             resnet18              |  1  |  9.4164  |
|            tts_angular            |  1  |  8.4291  |
|         soft_actor_critic         | 256 |  8.4037  |
|               dcgan               |  1  |  8.0043  |
|           lennard_jones           |  1  |  7.9502  |
|              demucs               |  1  |  1.496   |
|     detectron2_fcos_r_50_fpn      |  0  |   nan    |
|    mobilenet_v2_quantized_qat     |  0  |   nan    |
|      resnet50_quantized_qat       |  0  |   nan    |
+-----------------------------------+-----+----------+

Peak Memory Compression Ratio

+-----------------------------------+-----+----------+
|               name                | bs  | inductor |
+-----------------------------------+-----+----------+
|              demucs               |  1  |  0.9982  |
|               dlrm                |  1  |  0.998   |
|        Background_Matting         |  1  |  0.9966  |
|           pytorch_unet            |  1  |  0.9904  |
|         soft_actor_critic         | 256 |  0.9865  |
|            tts_angular            |  1  |  0.9841  |
|           lennard_jones           |  1  |  0.9837  |
|                drq                |  1  |  0.9828  |
|            hf_BigBird             |  1  |  0.9805  |
|            Super_SloMo            |  1  |  0.9766  |
|              alexnet              |  1  |  0.9754  |
|               vgg16               |  1  |  0.9684  |
|           hf_DistilBert           |  1  |  0.9654  |
|          pytorch_stargan          | 16  |  0.9648  |
|   timm_vision_transformer_large   |  1  |  0.9642  |
|              hf_GPT2              |  1  |  0.9634  |
|        speech_transformer         |  1  |  0.9584  |
|              hf_Bert              |  1  |  0.9517  |
|              hf_Bart              |  1  |  0.9513  |
|            timm_vovnet            |  1  |  0.9511  |
|           squeezenet1_1           |  1  |  0.9499  |
|               dcgan               |  1  |  0.9496  |
|   pytorch_CycleGAN_and_pix2pix    |  1  |  0.9469  |
|       functorch_dp_cifar10        |  1  |  0.9449  |
| attention_is_all_you_need_pytorch |  1  |  0.9386  |
|             hf_Albert             |  1  |  0.933   |
|            mnasnet1_0             |  1  |  0.9325  |
|        shufflenet_v2_x1_0         |  1  |  0.9319  |
|           mobilenet_v2            |  1  |  0.9293  |
|        mobilenet_v3_large         |  1  |  0.924   |
|          LearningToPaint          |  1  |  0.9204  |
|            timm_regnet            |  1  |  0.9195  |
|            hf_Reformer            |  1  |  0.9193  |
|      timm_vision_transformer      |  1  |  0.9137  |
|           timm_resnest            |  1  |  0.8976  |
|         timm_efficientnet         |  1  |  0.8943  |
|             resnet18              |  1  |  0.8891  |
|          vision_maskrcnn          |  1  |  0.8685  |
|          resnext50_32x4d          |  1  |  0.8412  |
|             resnet50              |  1  |  0.8406  |
|            densenet121            |  1  |  0.8205  |
|           BERT_pytorch            |  1  |  0.8189  |
|               hf_T5               |  1  |  0.8059  |
|           hf_GPT2_large           |  1  |  0.7913  |
|             resnet152             |  1  |  0.7678  |
|              yolov3               |  1  |  0.7543  |
|            hf_T5_base             |  1  |  0.7463  |
|           fastNLP_Bert            |  1  |  0.7452  |
|           hf_Longformer           |  1  |  0.6739  |
|            timm_nfnet             |  1  |  0.6547  |
|      nvidia_deeprecommender       |  1  |  0.4974  |
|            hf_T5_large            |  1  |  0.4698  |
|     detectron2_fcos_r_50_fpn      |  0  |   nan    |
|    mobilenet_v2_quantized_qat     |  0  |   nan    |
|      resnet50_quantized_qat       |  0  |   nan    |
+-----------------------------------+-----+----------+

Absolute latency (ms)

+-----------------------------------+-----+------------+
|               name                | bs  |  inductor  |
+-----------------------------------+-----+------------+
|            hf_T5_base             |  1  | 24444.0013 |
|           hf_GPT2_large           |  1  | 16924.0607 |
|            hf_T5_large            |  1  | 7376.7352  |
|        Background_Matting         |  1  | 5595.8396  |
|           pytorch_unet            |  1  | 4490.7341  |
|   timm_vision_transformer_large   |  1  | 3861.5341  |
|          vision_maskrcnn          |  1  |  2905.452  |
|          pytorch_stargan          | 16  |  2367.288  |
|              demucs               |  1  | 2262.7917  |
|            Super_SloMo            |  1  | 2218.7874  |
|            hf_BigBird             |  1  | 1796.9369  |
|              hf_Bart              |  1  | 1362.8665  |
|           hf_Longformer           |  1  | 1329.0983  |
|              hf_Bert              |  1  | 1099.2459  |
|             hf_Albert             |  1  |  928.676   |
|           fastNLP_Bert            |  1  |  877.165   |
|        speech_transformer         |  1  |  814.6247  |
|               hf_T5               |  1  |  748.1866  |
|   pytorch_CycleGAN_and_pix2pix    |  1  |  639.1924  |
|           hf_DistilBert           |  1  |  623.1241  |
|              yolov3               |  1  |  547.2033  |
|              hf_GPT2              |  1  |  538.6555  |
|            hf_Reformer            |  1  |  490.097   |
|             resnet152             |  1  |  327.4474  |
|               vgg16               |  1  |  241.9179  |
|            timm_nfnet             |  1  |  225.1512  |
|           BERT_pytorch            |  1  |  210.2374  |
|            timm_regnet            |  1  |  204.6491  |
|            timm_vovnet            |  1  |  152.4075  |
|             resnet50              |  1  |  130.0606  |
|          resnext50_32x4d          |  1  |  124.8423  |
|            densenet121            |  1  |  116.4706  |
|      timm_vision_transformer      |  1  |  108.2699  |
|           timm_resnest            |  1  |  71.7184   |
|         timm_efficientnet         |  1  |    61.7    |
|             resnet18              |  1  |  60.8499   |
|            tts_angular            |  1  |  54.8638   |
|      nvidia_deeprecommender       |  1  |  54.5474   |
| attention_is_all_you_need_pytorch |  1  |  34.8207   |
|            mnasnet1_0             |  1  |  31.6789   |
|              alexnet              |  1  |  31.0992   |
|        mobilenet_v3_large         |  1  |  27.8738   |
|           mobilenet_v2            |  1  |  27.2902   |
|       functorch_dp_cifar10        |  1  |  27.2832   |
|           squeezenet1_1           |  1  |  20.6629   |
|        shufflenet_v2_x1_0         |  1  |  19.9144   |
|          LearningToPaint          |  1  |  14.7087   |
|               dcgan               |  1  |   7.3288   |
|         soft_actor_critic         | 256 |   3.9583   |
|                drq                |  1  |   3.519    |
|               dlrm                |  1  |   0.703    |
|           lennard_jones           |  1  |   0.0778   |
|     detectron2_fcos_r_50_fpn      |  0  |    nan     |
|    mobilenet_v2_quantized_qat     |  0  |    nan     |
|      resnet50_quantized_qat       |  0  |    nan     |
+-----------------------------------+-----+------------+
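
Combining this table with the speedup table gives a rough estimate of the native PyTorch latency. A minimal sketch, assuming that speedup is defined as eager latency divided by inductor latency (the summary only states that speedup is normalized against native PyTorch); `implied_eager_latency_ms` is a hypothetical helper, and the inputs are the hf_Reformer rows of the two tables.

```python
def implied_eager_latency_ms(speedup, inductor_latency_ms):
    """Estimate eager latency from a speedup ratio and the inductor latency."""
    return speedup * inductor_latency_ms


# hf_Reformer: speedup 1.4271, inductor absolute latency 490.097 ms.
eager_ms = implied_eager_latency_ms(speedup=1.4271, inductor_latency_ms=490.097)
print(f"hf_Reformer eager latency ~ {eager_ms:.1f} ms")
```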

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|     MobileBertForQuestionAnswering      | 1  |  1.9192  |
|               GoogleFnet                | 1  |  1.2296  |
|          MobileBertForMaskedLM          | 1  |  1.2079  |
|         Speech2Text2ForCausalLM         | 1  |  1.1074  |
|            XLNetLMHeadModel             | 1  |  1.1002  |
|             OPTForCausalLM              | 1  |  1.0451  |
|     M2M100ForConditionalGeneration      | 1  |  1.0349  |
|        BertForQuestionAnswering         | 1  |  1.0318  |
|      MBartForConditionalGeneration      | 1  |  1.0187  |
|       DebertaForQuestionAnswering       | 1  |  1.0111  |
|     PegasusForConditionalGeneration     | 1  |  1.009   |
|       RobertaForQuestionAnswering       | 1  |  1.0082  |
| BlenderbotSmallForConditionalGeneration | 1  |  0.9945  |
|     DistilBertForQuestionAnswering      | 1  |  0.9928  |
|     PLBartForConditionalGeneration      | 1  |  0.9888  |
|                 BigBird                 | 1  |  0.9722  |
|            TrOCRForCausalLM             | 1  |  0.9624  |
|            MBartForCausalLM             | 1  |  0.9532  |
|           PegasusForCausalLM            | 1  |  0.951   |
|       AlbertForQuestionAnswering        | 1  |  0.9423  |
|       ElectraForQuestionAnswering       | 1  |  0.9421  |
|            AlbertForMaskedLM            | 1  |  0.9392  |
|          DistilBertForMaskedLM          | 1  |  0.9244  |
|            PLBartForCausalLM            | 1  |  0.9079  |
|           DebertaForMaskedLM            | 1  |  0.8949  |
|       BlenderbotSmallForCausalLM        | 1  |  0.8846  |
|            YituTechConvBert             | 1  |  0.863   |
|    MegatronBertForQuestionAnswering     | 1  |  0.854   |
|           ElectraForCausalLM            | 1  |  0.8465  |
|         MegatronBertForCausalLM         | 1  |  0.8386  |
|          AllenaiLongformerBase          | 1  |  0.815   |
|             XGLMForCausalLM             | 1  |  0.8107  |
|           RobertaForCausalLM            | 1  |  0.7954  |
|             BertForMaskedLM             | 1  |  0.787   |
|               DistillGPT2               | 1  |  0.7215  |
|    LayoutLMForSequenceClassification    | 1  |  0.7078  |
|           LayoutLMForMaskedLM           | 1  |  0.7049  |
|       MT5ForConditionalGeneration       | 1  |  0.6901  |
|                CamemBert                | 1  |  0.6848  |
|             BartForCausalLM             | 1  |  0.6845  |
|      GPT2ForSequenceClassification      | 1  |  0.6648  |
|      BartForConditionalGeneration       | 1  |  0.6546  |
|       T5ForConditionalGeneration        | 1  |  0.6156  |
|                 T5Small                 | 1  |  0.6139  |
+-----------------------------------------+----+----------+

Accuracy

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|            AlbertForMaskedLM            | 1  |   pass   |
|       AlbertForQuestionAnswering        | 1  |   pass   |
|                CamemBert                | 1  |   pass   |
|          AllenaiLongformerBase          | 1  |   pass   |
|             BartForCausalLM             | 1  |   pass   |
|      BartForConditionalGeneration       | 1  |   pass   |
|             BertForMaskedLM             | 1  |   pass   |
|        BertForQuestionAnswering         | 1  |   pass   |
|                 BigBird                 | 1  |   pass   |
|       BlenderbotSmallForCausalLM        | 1  |   pass   |
| BlenderbotSmallForConditionalGeneration | 1  |   pass   |
|           DebertaForMaskedLM            | 1  |   pass   |
|           LayoutLMForMaskedLM           | 1  |   pass   |
|       DebertaForQuestionAnswering       | 1  |   pass   |
|          DistilBertForMaskedLM          | 1  |   pass   |
|     DistilBertForQuestionAnswering      | 1  |   pass   |
|               DistillGPT2               | 1  |   pass   |
|           ElectraForCausalLM            | 1  |   pass   |
|       ElectraForQuestionAnswering       | 1  |   pass   |
|      GPT2ForSequenceClassification      | 1  |   pass   |
|               GoogleFnet                | 1  |   pass   |
|    LayoutLMForSequenceClassification    | 1  |   pass   |
|     M2M100ForConditionalGeneration      | 1  |   pass   |
|            MBartForCausalLM             | 1  |   pass   |
|     PLBartForConditionalGeneration      | 1  |   pass   |
|      MBartForConditionalGeneration      | 1  |   pass   |
|       MT5ForConditionalGeneration       | 1  |   pass   |
|         MegatronBertForCausalLM         | 1  |   pass   |
|    MegatronBertForQuestionAnswering     | 1  |   pass   |
|          MobileBertForMaskedLM          | 1  |   pass   |
|     MobileBertForQuestionAnswering      | 1  |   pass   |
|             OPTForCausalLM              | 1  |   pass   |
|            PLBartForCausalLM            | 1  |   pass   |
|           PegasusForCausalLM            | 1  |   pass   |
|            XLNetLMHeadModel             | 1  |   pass   |
|     PegasusForConditionalGeneration     | 1  |   pass   |
|           RobertaForCausalLM            | 1  |   pass   |
|       RobertaForQuestionAnswering       | 1  |   pass   |
|         Speech2Text2ForCausalLM         | 1  |   pass   |
|       T5ForConditionalGeneration        | 1  |   pass   |
|                 T5Small                 | 1  |   pass   |
|            TrOCRForCausalLM             | 1  |   pass   |
|             XGLMForCausalLM             | 1  |   pass   |
|            YituTechConvBert             | 1  |   pass   |
+-----------------------------------------+----+----------+

Compilation latency (sec)

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|      BartForConditionalGeneration       | 1  | 48.9971  |
|       AlbertForQuestionAnswering        | 1  | 44.5563  |
|          AllenaiLongformerBase          | 1  |  43.111  |
|            AlbertForMaskedLM            | 1  | 35.9983  |
|    MegatronBertForQuestionAnswering     | 1  |  35.094  |
|      GPT2ForSequenceClassification      | 1  |   33.0   |
|          MobileBertForMaskedLM          | 1  | 31.5125  |
|     M2M100ForConditionalGeneration      | 1  | 31.1394  |
|                 T5Small                 | 1  | 31.0737  |
|       MT5ForConditionalGeneration       | 1  | 30.9215  |
|     MobileBertForQuestionAnswering      | 1  | 30.6931  |
|     PegasusForConditionalGeneration     | 1  | 30.2567  |
|       T5ForConditionalGeneration        | 1  | 29.9066  |
|             XGLMForCausalLM             | 1  | 29.3261  |
|      MBartForConditionalGeneration      | 1  | 29.1963  |
|         MegatronBertForCausalLM         | 1  | 26.1944  |
|             BertForMaskedLM             | 1  | 25.8671  |
|           DebertaForMaskedLM            | 1  | 24.9838  |
|            TrOCRForCausalLM             | 1  | 24.3498  |
|       DebertaForQuestionAnswering       | 1  | 23.4213  |
|            XLNetLMHeadModel             | 1  | 23.0369  |
|            YituTechConvBert             | 1  | 22.6534  |
|             BartForCausalLM             | 1  | 21.9363  |
|                 BigBird                 | 1  | 21.8919  |
|           PegasusForCausalLM            | 1  | 21.2559  |
|            MBartForCausalLM             | 1  | 20.7836  |
| BlenderbotSmallForConditionalGeneration | 1  | 20.5169  |
|     PLBartForConditionalGeneration      | 1  | 20.4035  |
|             OPTForCausalLM              | 1  | 19.5484  |
|    LayoutLMForSequenceClassification    | 1  |  18.248  |
|                CamemBert                | 1  | 17.9571  |
|           LayoutLMForMaskedLM           | 1  |  17.482  |
|           RobertaForCausalLM            | 1  | 17.1473  |
|               DistillGPT2               | 1  | 17.0975  |
|           ElectraForCausalLM            | 1  | 16.7446  |
|       RobertaForQuestionAnswering       | 1  | 16.5735  |
|        BertForQuestionAnswering         | 1  | 16.0552  |
|       ElectraForQuestionAnswering       | 1  | 15.7711  |
|         Speech2Text2ForCausalLM         | 1  | 15.4567  |
|       BlenderbotSmallForCausalLM        | 1  | 15.2025  |
|          DistilBertForMaskedLM          | 1  | 14.5338  |
|            PLBartForCausalLM            | 1  |  14.525  |
|     DistilBertForQuestionAnswering      | 1  | 13.7605  |
|               GoogleFnet                | 1  | 13.6029  |
+-----------------------------------------+----+----------+

Peak Memory Compression Ratio

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|               GoogleFnet                | 1  |  0.984   |
|       DebertaForQuestionAnswering       | 1  |  0.9767  |
|           DebertaForMaskedLM            | 1  |  0.9749  |
|     M2M100ForConditionalGeneration      | 1  |  0.9727  |
|            TrOCRForCausalLM             | 1  |  0.9716  |
|            PLBartForCausalLM            | 1  |  0.9715  |
|           PegasusForCausalLM            | 1  |  0.9705  |
|            MBartForCausalLM             | 1  |  0.9695  |
|     DistilBertForQuestionAnswering      | 1  |  0.9672  |
|          DistilBertForMaskedLM          | 1  |  0.965   |
|     PegasusForConditionalGeneration     | 1  |  0.9648  |
|             XGLMForCausalLM             | 1  |  0.9611  |
|             OPTForCausalLM              | 1  |  0.9591  |
|       BlenderbotSmallForCausalLM        | 1  |  0.9536  |
|         Speech2Text2ForCausalLM         | 1  |  0.9526  |
|        BertForQuestionAnswering         | 1  |  0.9512  |
|       RobertaForQuestionAnswering       | 1  |  0.9507  |
|    MegatronBertForQuestionAnswering     | 1  |  0.9406  |
|     PLBartForConditionalGeneration      | 1  |  0.9398  |
|      BartForConditionalGeneration       | 1  |  0.9387  |
|         MegatronBertForCausalLM         | 1  |  0.933   |
|      MBartForConditionalGeneration      | 1  |  0.927   |
|           RobertaForCausalLM            | 1  |  0.925   |
|             BertForMaskedLM             | 1  |  0.9249  |
|      GPT2ForSequenceClassification      | 1  |  0.9058  |
|       ElectraForQuestionAnswering       | 1  |  0.892   |
|           ElectraForCausalLM            | 1  |  0.8905  |
| BlenderbotSmallForConditionalGeneration | 1  |  0.8836  |
|             BartForCausalLM             | 1  |  0.8579  |
|            XLNetLMHeadModel             | 1  |  0.8531  |
|                CamemBert                | 1  |  0.8527  |
|           LayoutLMForMaskedLM           | 1  |  0.8519  |
|          AllenaiLongformerBase          | 1  |  0.8253  |
|                 BigBird                 | 1  |  0.825   |
|            YituTechConvBert             | 1  |  0.8133  |
|          MobileBertForMaskedLM          | 1  |  0.7559  |
|               DistillGPT2               | 1  |  0.7556  |
|       T5ForConditionalGeneration        | 1  |  0.7544  |
|     MobileBertForQuestionAnswering      | 1  |  0.7307  |
|    LayoutLMForSequenceClassification    | 1  |  0.7296  |
|                 T5Small                 | 1  |   0.71   |
|            AlbertForMaskedLM            | 1  |  0.6963  |
|       AlbertForQuestionAnswering        | 1  |  0.6799  |
|       MT5ForConditionalGeneration       | 1  |  0.4384  |
+-----------------------------------------+----+----------+

Absolute latency (ms)

+-----------------------------------------+----+------------+
|                  name                   | bs |  inductor  |
+-----------------------------------------+----+------------+
|       AlbertForQuestionAnswering        | 1  | 16162.4794 |
|            AlbertForMaskedLM            | 1  | 16118.1871 |
|      BartForConditionalGeneration       | 1  | 10419.4101 |
|             BartForCausalLM             | 1  |  4643.22   |
|            XLNetLMHeadModel             | 1  | 3907.6893  |
|          AllenaiLongformerBase          | 1  | 2926.5363  |
|      GPT2ForSequenceClassification      | 1  | 2699.8503  |
|       T5ForConditionalGeneration        | 1  | 2494.1865  |
|                 T5Small                 | 1  | 2493.0742  |
|                 BigBird                 | 1  | 2295.8959  |
|             XGLMForCausalLM             | 1  | 1288.9455  |
|                CamemBert                | 1  | 1207.9531  |
|           LayoutLMForMaskedLM           | 1  | 1176.0064  |
|           DebertaForMaskedLM            | 1  | 1075.9537  |
|            YituTechConvBert             | 1  | 1047.5884  |
|     M2M100ForConditionalGeneration      | 1  | 1023.4148  |
|       DebertaForQuestionAnswering       | 1  |  966.3726  |
|    LayoutLMForSequenceClassification    | 1  |  930.3851  |
|     PegasusForConditionalGeneration     | 1  |  909.4274  |
|      MBartForConditionalGeneration      | 1  |  905.5053  |
|               DistillGPT2               | 1  |  851.721   |
|       MT5ForConditionalGeneration       | 1  |  836.0849  |
|         MegatronBertForCausalLM         | 1  |  736.6892  |
|               GoogleFnet                | 1  |  703.621   |
|    MegatronBertForQuestionAnswering     | 1  |  658.2027  |
|           PegasusForCausalLM            | 1  |  468.2069  |
|            MBartForCausalLM             | 1  |  465.857   |
|            TrOCRForCausalLM             | 1  |  461.3907  |
|     PLBartForConditionalGeneration      | 1  |  353.2526  |
|           ElectraForCausalLM            | 1  |  345.684   |
|             OPTForCausalLM              | 1  |  284.7286  |
| BlenderbotSmallForConditionalGeneration | 1  |  275.3079  |
|             BertForMaskedLM             | 1  |  271.0192  |
|           RobertaForCausalLM            | 1  |  268.3139  |
|       ElectraForQuestionAnswering       | 1  |  214.5785  |
|            PLBartForCausalLM            | 1  |  213.3463  |
|       RobertaForQuestionAnswering       | 1  |  204.6172  |
|        BertForQuestionAnswering         | 1  |  199.7581  |
|          MobileBertForMaskedLM          | 1  |  177.9792  |
|          DistilBertForMaskedLM          | 1  |  172.2906  |
|       BlenderbotSmallForCausalLM        | 1  |  169.7299  |
|     DistilBertForQuestionAnswering      | 1  |  104.4706  |
|     MobileBertForQuestionAnswering      | 1  |  61.2836   |
|         Speech2Text2ForCausalLM         | 1  |  33.5208   |
+-----------------------------------------+----+------------+

timm_models suite with float32 precision

Performance speedup

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          pnasnet5large          | 1  |  1.6555  |
|           regnety_002           | 1  |  1.4812  |
|           fbnetc_100            | 1  |  1.4328  |
|          spnasnet_100           | 1  |  1.4102  |
|           mnasnet_100           | 1  |  1.4019  |
|         mobilenetv2_100         | 1  |  1.3831  |
|            fbnetv3_b            | 1  |  1.3684  |
|          inception_v3           | 1  |  1.3519  |
|       gluon_inception_v3        | 1  |  1.3388  |
|        adv_inception_v3         | 1  |  1.3315  |
|        ese_vovnet19b_dw         | 1  |  1.3231  |
|          botnet26t_256          | 1  |  1.3139  |
|      mobilenetv3_large_100      | 1  |  1.2819  |
|          gmixer_24_224          | 1  |  1.2701  |
|            lcnet_050            | 1  |  1.2552  |
|            gernet_l             | 1  |  1.1916  |
|          cspdarknet53           | 1  |  1.1765  |
|            hrnet_w18            | 1  |  1.1667  |
|             dla102              | 1  |  1.1396  |
|            repvgg_a2            | 1  |  1.1216  |
|           volo_d1_224           | 1  |  1.1141  |
|           res2next50            | 1  |  1.1042  |
|        gluon_xception65         | 1  |  1.0977  |
|        res2net50_14w_8s         | 1  |  1.0855  |
|           resnest101e           | 1  |  1.0801  |
|      beit_base_patch16_224      | 1  |  1.0752  |
|             dpn107              | 1  |  1.0698  |
|        tnt_s_patch16_224        | 1  |  1.058   |
|          ghostnet_100           | 1  |  1.0541  |
|        convmixer_768_32         | 1  |  1.0473  |
|        res2net101_26w_4s        | 1  |  1.0445  |
|            mixnet_l             | 1  |  1.0012  |
|         crossvit_9_240          | 1  |  0.9966  |
|           convit_base           | 1  |  0.9947  |
|           tf_mixnet_l           | 1  |  0.9911  |
|           rexnet_100            | 1  |   0.99   |
|           dm_nfnet_f0           | 1  |  0.9863  |
|           selecsls42b           | 1  |  0.9758  |
|          resmlp_12_224          | 1  |  0.9638  |
| deit_base_distilled_patch16_224 | 1  |  0.9633  |
|         coat_lite_mini          | 1  |  0.9632  |
|            nfnet_l0             | 1  |  0.9569  |
|         visformer_small         | 1  |  0.9539  |
|        sebotnet33ts_256         | 1  |  0.9414  |
|        eca_halonext26ts         | 1  |  0.9327  |
|      vit_base_patch16_224       | 1  |  0.932   |
|       eca_botnext26ts_256       | 1  |  0.9271  |
|            pit_b_224            | 1  |  0.9026  |
|      xcit_large_24_p8_224       | 1  |  0.8754  |
|           mobilevit_s           | 1  |  0.869   |
|          mixer_b16_224          | 1  |  0.8577  |
|  swin_base_patch4_window7_224   | 1  |  0.7902  |
|            tinynet_a            | 1  |  0.7672  |
|       tf_efficientnet_b0        | 1  |  0.7611  |
|         poolformer_m36          | 1  |  0.7558  |
|          cait_m36_384           | 1  |  0.7343  |
|          gmlp_s16_224           | 1  |  0.7227  |
|        twins_pcpvt_base         | 1  |  0.6653  |
|          convnext_base          | 1  |  0.664   |
|          jx_nest_base           | 1  |  0.6213  |
|     swsl_resnext101_32x16d      | 1  |  0.0663  |
+---------------------------------+----+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 1  |     pass      |
|          mixer_b16_224          | 1  |     pass      |
|        ese_vovnet19b_dw         | 1  |     pass      |
|          botnet26t_256          | 1  |     pass      |
|         coat_lite_mini          | 1  |     pass      |
|        convmixer_768_32         | 1  |     pass      |
|          convnext_base          | 1  |     pass      |
|         crossvit_9_240          | 1  |     pass      |
|          cspdarknet53           | 1  |     pass      |
| deit_base_distilled_patch16_224 | 1  |     pass      |
|             dla102              | 1  |     pass      |
|           dm_nfnet_f0           | 1  |     pass      |
|             dpn107              | 1  |     pass      |
|       eca_botnext26ts_256       | 1  |     pass      |
|        eca_halonext26ts         | 1  |     pass      |
|           mnasnet_100           | 1  |     pass      |
|           fbnetc_100            | 1  |     pass      |
|            fbnetv3_b            | 1  |     pass      |
|            gernet_l             | 1  |     pass      |
|       gluon_inception_v3        | 1  |     pass      |
|        gluon_xception65         | 1  |     pass      |
|          gmixer_24_224          | 1  |     pass      |
|          gmlp_s16_224           | 1  |     pass      |
|            hrnet_w18            | 1  |     pass      |
|          inception_v3           | 1  |     pass      |
|          jx_nest_base           | 1  |     pass      |
|            lcnet_050            | 1  |     pass      |
|      xcit_large_24_p8_224       | 1  |     pass      |
|      beit_base_patch16_224      | 1  |     pass      |
|            mixnet_l             | 1  |     pass      |
|         mobilenetv2_100         | 1  |     pass      |
|           volo_d1_224           | 1  |     pass      |
|      mobilenetv3_large_100      | 1  |     pass      |
|           mobilevit_s           | 1  |     pass      |
|            nfnet_l0             | 1  |     pass      |
|            pit_b_224            | 1  |     pass      |
|          pnasnet5large          | 1  |     pass      |
|         poolformer_m36          | 1  |     pass      |
|           regnety_002           | 1  |     pass      |
|            repvgg_a2            | 1  |     pass      |
|        res2net101_26w_4s        | 1  |     pass      |
|        res2net50_14w_8s         | 1  |     pass      |
|           res2next50            | 1  |     pass      |
|          resmlp_12_224          | 1  |     pass      |
|           resnest101e           | 1  |     pass      |
|           rexnet_100            | 1  |     pass      |
|        sebotnet33ts_256         | 1  |     pass      |
|           selecsls42b           | 1  |     pass      |
|          spnasnet_100           | 1  |     pass      |
|  swin_base_patch4_window7_224   | 1  |     pass      |
|     swsl_resnext101_32x16d      | 1  |     pass      |
|       tf_efficientnet_b0        | 1  |     pass      |
|           tf_mixnet_l           | 1  |     pass      |
|            tinynet_a            | 1  |     pass      |
|        tnt_s_patch16_224        | 1  |     pass      |
|        twins_pcpvt_base         | 1  |     pass      |
|         visformer_small         | 1  |     pass      |
|      vit_base_patch16_224       | 1  |     pass      |
|           convit_base           | 1  |  fail_to_run  |
|          ghostnet_100           | 1  | fail_accuracy |
|          cait_m36_384           | 1  | fail_accuracy |
+---------------------------------+----+---------------+

Compilation latency (sec)

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|     swsl_resnext101_32x16d      | 1  | 84.6102  |
|          pnasnet5large          | 1  | 62.6202  |
|        twins_pcpvt_base         | 1  | 56.5052  |
|           tf_mixnet_l           | 1  | 52.8404  |
|           mobilevit_s           | 1  | 46.8707  |
|           rexnet_100            | 1  | 45.3422  |
|            mixnet_l             | 1  |  45.226  |
|         coat_lite_mini          | 1  | 44.8874  |
|  swin_base_patch4_window7_224   | 1  | 44.2222  |
|          cait_m36_384           | 1  | 43.4292  |
|       tf_efficientnet_b0        | 1  | 40.8872  |
|             dpn107              | 1  | 39.7548  |
|            tinynet_a            | 1  | 39.2097  |
|           dm_nfnet_f0           | 1  | 39.0219  |
|            hrnet_w18            | 1  |  38.823  |
|      xcit_large_24_p8_224       | 1  |  38.371  |
|            fbnetv3_b            | 1  | 37.2976  |
|        sebotnet33ts_256         | 1  | 35.6706  |
|          jx_nest_base           | 1  | 35.5893  |
|            nfnet_l0             | 1  | 34.3365  |
|        res2net50_14w_8s         | 1  | 33.8936  |
|        eca_halonext26ts         | 1  | 33.0631  |
|         poolformer_m36          | 1  |  32.494  |
|      mobilenetv3_large_100      | 1  |  31.978  |
|         crossvit_9_240          | 1  | 31.2509  |
|           volo_d1_224           | 1  | 30.2039  |
|          ghostnet_100           | 1  | 29.9092  |
|       eca_botnext26ts_256       | 1  | 29.5306  |
|        tnt_s_patch16_224        | 1  |  28.646  |
|          spnasnet_100           | 1  |  28.506  |
|          cspdarknet53           | 1  | 28.3475  |
|           fbnetc_100            | 1  | 28.0247  |
|           resnest101e           | 1  | 27.7135  |
|          botnet26t_256          | 1  | 27.3314  |
|           regnety_002           | 1  | 27.2782  |
|          inception_v3           | 1  | 27.0959  |
|        adv_inception_v3         | 1  | 27.0685  |
|       gluon_inception_v3        | 1  | 26.9963  |
|            pit_b_224            | 1  | 26.5334  |
|        res2net101_26w_4s        | 1  | 26.3073  |
|           mnasnet_100           | 1  | 25.4305  |
|         mobilenetv2_100         | 1  | 24.2003  |
|          convnext_base          | 1  | 23.9086  |
|          gmlp_s16_224           | 1  | 23.5881  |
|         visformer_small         | 1  | 23.3023  |
|           convit_base           | 1  | 23.1088  |
|           selecsls42b           | 1  | 21.8964  |
|           res2next50            | 1  | 21.7189  |
|             dla102              | 1  | 20.6287  |
|            gernet_l             | 1  |  19.839  |
|          gmixer_24_224          | 1  | 19.6573  |
|        ese_vovnet19b_dw         | 1  | 19.0535  |
| deit_base_distilled_patch16_224 | 1  | 18.5192  |
|          mixer_b16_224          | 1  |  18.021  |
|      vit_base_patch16_224       | 1  | 17.9716  |
|            repvgg_a2            | 1  | 17.6986  |
|      beit_base_patch16_224      | 1  | 16.9351  |
|            lcnet_050            | 1  | 16.8918  |
|        gluon_xception65         | 1  | 14.7129  |
|        convmixer_768_32         | 1  | 14.0338  |
|          resmlp_12_224          | 1  | 13.6005  |
+---------------------------------+----+----------+

Peak Memory Compression Ratio

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          pnasnet5large          | 1  |  0.956   |
| deit_base_distilled_patch16_224 | 1  |  0.9518  |
|            pit_b_224            | 1  |  0.9516  |
|      vit_base_patch16_224       | 1  |  0.9513  |
|            repvgg_a2            | 1  |   0.95   |
|          resmlp_12_224          | 1  |  0.9487  |
|        ese_vovnet19b_dw         | 1  |  0.9479  |
|            gernet_l             | 1  |  0.9477  |
|             dpn107              | 1  |  0.9459  |
|           convit_base           | 1  |  0.945   |
|      beit_base_patch16_224      | 1  |  0.9445  |
|          cait_m36_384           | 1  |  0.9429  |
|          mixer_b16_224          | 1  |  0.9393  |
|          convnext_base          | 1  |  0.9366  |
|          cspdarknet53           | 1  |  0.9334  |
|          botnet26t_256          | 1  |  0.9315  |
|            lcnet_050            | 1  |  0.9311  |
|         coat_lite_mini          | 1  |  0.9253  |
|           mnasnet_100           | 1  |  0.9248  |
|         mobilenetv2_100         | 1  |  0.9208  |
|       eca_botnext26ts_256       | 1  |  0.9207  |
|      xcit_large_24_p8_224       | 1  |   0.92   |
|           regnety_002           | 1  |  0.9191  |
|          spnasnet_100           | 1  |  0.9179  |
|           fbnetc_100            | 1  |  0.9176  |
|        eca_halonext26ts         | 1  |  0.9142  |
|           rexnet_100            | 1  |  0.9096  |
|        sebotnet33ts_256         | 1  |  0.9093  |
|          ghostnet_100           | 1  |  0.9057  |
|      mobilenetv3_large_100      | 1  |  0.8935  |
|       tf_efficientnet_b0        | 1  |  0.8927  |
|            mixnet_l             | 1  |  0.8886  |
|         crossvit_9_240          | 1  |  0.8855  |
|           mobilevit_s           | 1  |  0.8842  |
|          gmixer_24_224          | 1  |  0.8837  |
|           tf_mixnet_l           | 1  |  0.8837  |
|         visformer_small         | 1  |  0.8829  |
|            tinynet_a            | 1  |  0.8809  |
|  swin_base_patch4_window7_224   | 1  |  0.877   |
|          jx_nest_base           | 1  |  0.8548  |
|           volo_d1_224           | 1  |  0.8435  |
|        tnt_s_patch16_224        | 1  |  0.8427  |
|          inception_v3           | 1  |  0.8365  |
|        adv_inception_v3         | 1  |  0.8365  |
|     swsl_resnext101_32x16d      | 1  |  0.8359  |
|           res2next50            | 1  |  0.8356  |
|       gluon_inception_v3        | 1  |  0.8355  |
|           selecsls42b           | 1  |  0.829   |
|        convmixer_768_32         | 1  |  0.8208  |
|            fbnetv3_b            | 1  |  0.8183  |
|        res2net50_14w_8s         | 1  |  0.8166  |
|             dla102              | 1  |  0.8133  |
|        gluon_xception65         | 1  |  0.8008  |
|            hrnet_w18            | 1  |  0.7942  |
|          gmlp_s16_224           | 1  |  0.7915  |
|            nfnet_l0             | 1  |  0.7703  |
|           resnest101e           | 1  |  0.7647  |
|        res2net101_26w_4s        | 1  |  0.7614  |
|         poolformer_m36          | 1  |   0.75   |
|        twins_pcpvt_base         | 1  |  0.731   |
|           dm_nfnet_f0           | 1  |  0.6665  |
+---------------------------------+----+----------+

Absolute latency (ms)

+---------------------------------+----+------------+
|              name               | bs |  inductor  |
+---------------------------------+----+------------+
|     swsl_resnext101_32x16d      | 1  | 13498.4926 |
|          cait_m36_384           | 1  | 4985.9813  |
|      xcit_large_24_p8_224       | 1  | 2097.5778  |
|           resnest101e           | 1  |  980.7668  |
|          pnasnet5large          | 1  |  484.1883  |
|          jx_nest_base           | 1  |  470.3207  |
|           convit_base           | 1  |  393.0478  |
|  swin_base_patch4_window7_224   | 1  |  372.0333  |
|          convnext_base          | 1  |  362.3913  |
|           dm_nfnet_f0           | 1  |  353.6843  |
|            pit_b_224            | 1  |  340.8464  |
|      vit_base_patch16_224       | 1  |  325.1788  |
| deit_base_distilled_patch16_224 | 1  |  316.8631  |
|      beit_base_patch16_224      | 1  |  316.0932  |
|         poolformer_m36          | 1  |  287.6353  |
|             dpn107              | 1  |  287.2039  |
|        convmixer_768_32         | 1  |  287.1061  |
|          mixer_b16_224          | 1  |  256.1196  |
|            hrnet_w18            | 1  |  238.2177  |
|        twins_pcpvt_base         | 1  |  223.0628  |
|        gluon_xception65         | 1  |  221.8742  |
|            nfnet_l0             | 1  |  215.7706  |
|           volo_d1_224           | 1  |  203.9664  |
|          gmlp_s16_224           | 1  |  203.4203  |
|             dla102              | 1  |  191.7719  |
|        tnt_s_patch16_224        | 1  |  184.6197  |
|        sebotnet33ts_256         | 1  |  183.8818  |
|          cspdarknet53           | 1  |  181.7662  |
|            repvgg_a2            | 1  |  160.6748  |
|        adv_inception_v3         | 1  |  160.2409  |
|       gluon_inception_v3        | 1  |  160.1876  |
|          inception_v3           | 1  |  160.1089  |
|        res2net101_26w_4s        | 1  |  154.658   |
|           mobilevit_s           | 1  |  149.3921  |
|         visformer_small         | 1  |  141.4109  |
|        res2net50_14w_8s         | 1  |  140.7363  |
|           selecsls42b           | 1  |  131.5543  |
|           res2next50            | 1  |  126.2329  |
|        eca_halonext26ts         | 1  |  122.1701  |
|       eca_botnext26ts_256       | 1  |  119.1028  |
|          gmixer_24_224          | 1  |  109.0381  |
|         coat_lite_mini          | 1  |  102.7589  |
|           tf_mixnet_l           | 1  |  97.3142   |
|            gernet_l             | 1  |  94.0719   |
|          botnet26t_256          | 1  |  88.7531   |
|            mixnet_l             | 1  |  88.0298   |
|         crossvit_9_240          | 1  |  71.2385   |
|       tf_efficientnet_b0        | 1  |  69.9641   |
|          resmlp_12_224          | 1  |  67.8867   |
|            tinynet_a            | 1  |  55.4643   |
|           rexnet_100            | 1  |  54.8369   |
|            fbnetv3_b            | 1  |  46.7386   |
|        ese_vovnet19b_dw         | 1  |  44.5951   |
|          ghostnet_100           | 1  |  34.9799   |
|           fbnetc_100            | 1  |  28.5539   |
|         mobilenetv2_100         | 1  |  27.4313   |
|          spnasnet_100           | 1  |  26.9549   |
|           mnasnet_100           | 1  |  24.4474   |
|      mobilenetv3_large_100      | 1  |  23.7742   |
|           regnety_002           | 1  |   15.387   |
|            lcnet_050            | 1  |   8.0342   |
+---------------------------------+----+------------+

@zxd1997066
Contributor

[cppwrapper_dynamic_shape] Performance Dashboard for float32 precision -- Single-Socket Multi-threads (2024-04-21 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. All experiments run on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration: a forward and backward pass for training, or a forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint reduction ratio.
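As a minimal sketch of the normalization described above, assuming speedup is the native (eager) latency divided by the compiled latency, so that values above 1 mean the compiled model is faster. The function name and the sample latencies are hypothetical and are not taken from the benchmark harness.

```python
def speedup(eager_latency_ms: float, compiled_latency_ms: float) -> float:
    """Speedup of a compiled model relative to native (eager) PyTorch.

    Assumption: speedup = eager latency / compiled latency.
    """
    if eager_latency_ms <= 0 or compiled_latency_ms <= 0:
        raise ValueError("latencies must be positive")
    return eager_latency_ms / compiled_latency_ms


# Hypothetical latencies, chosen only to illustrate the direction of the ratio.
print(speedup(200.0, 100.0))  # 2.0 -> compiled run is twice as fast
print(speedup(50.0, 100.0))   # 0.5 -> compiled run is twice as slow
```

Under this reading, an entry below 1.0 in a "Performance speedup" table indicates a slowdown relative to eager mode.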

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. Experimental setup does not include an optimizer.

SW information:

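+-------------------+--------+------------------------------------------+
| SW                | Branch | Commit                                   |
+-------------------+--------+------------------------------------------+
| PyTorch           | main   | bad8d25881d850eaf0b326f6ce5c78305e38c001 |
| Torchbench        | main   | d6015d42                                 |
| torchaudio        | main   | 2.2.0a0+ea437b3                          |
| torchtext         | main   | 0.16.0a0+b0ebddc                         |
| torchvision       | main   | 0.19.0a0+2c4665f                         |
| torchdata         | main   | 0.7.1a0+0790338                          |
| dynamo_benchmarks | main   | nightly                                  |
+-------------------+--------+------------------------------------------+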

HW information

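+------------------+----------------------------------------------------------------+
| Item             | Value                                                          |
+------------------+----------------------------------------------------------------+
| Manufacturer     | Amazon EC2                                                     |
| Product Name     | c6i.16xlarge                                                   |
| CPU Model        | Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz                  |
| Installed Memory | 128GB (1x128GB DDR4 3200 MT/s [Unknown])                       |
| OS               | Ubuntu 22.04.2 LTS                                             |
| Kernel           | 5.19.0-1022-aws                                                |
| Microcode        | 0xd000389                                                      |
| GCC              | gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0                      |
| GLIBC            | ldd (Ubuntu GLIBC 2.35-0ubuntu3.1) 2.35                        |
| Binutils         | GNU ld (GNU Binutils for Ubuntu) 2.38                          |
| Python           | Python 3.10.6                                                  |
| OpenSSL          | OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022) |
+------------------+----------------------------------------------------------------+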

Test command

export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
CORES=$(lscpu | grep Core | awk '{print $4}')
export OMP_NUM_THREADS=$CORES

python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--node_id 0" --devices=cpu --dtypes=float32 --inference --compilers=inductor --extra-args="--timeout 9000" 

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
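The filtering step can be sketched as follows. This is an illustration only: the record layout and model names are hypothetical, and the status strings (`pass`, `fail_accuracy`, `fail_to_run`) are the ones that appear in the Accuracy tables of this dashboard.

```python
# Hypothetical per-model records; field names and values are made up for illustration.
results = [
    {"name": "model_a", "accuracy": "pass", "speedup": 1.25},
    {"name": "model_b", "accuracy": "fail_accuracy", "speedup": 1.10},
    {"name": "model_c", "accuracy": "fail_to_run", "speedup": 0.0},
    {"name": "model_d", "accuracy": "pass", "speedup": 0.95},
]


def passing_only(records):
    """Keep only the models whose accuracy check passed."""
    return [r for r in records if r["accuracy"] == "pass"]


kept = passing_only(results)
print([r["name"] for r in kept])  # ['model_a', 'model_d']
```

Only the records in `kept` would then feed the speedup, compilation latency, and memory statistics.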

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 79%, 62/78 | 100%, 46/46 | 75%, 45/60  |
+----------+------------+-------------+-------------+
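Each cell in the pass-rate table pairs a percentage with the underlying counts. A small sketch of that formatting, assuming the percentage is rounded to the nearest whole number (the helper name is hypothetical):

```python
def pass_rate(passed: int, total: int) -> str:
    """Format a pass rate as '<percent>%, <passed>/<total>'."""
    if total <= 0:
        raise ValueError("total must be positive")
    return f"{passed / total:.0%}, {passed}/{total}"


# Counts taken from the table above.
print(pass_rate(62, 78))  # 79%, 62/78
print(pass_rate(46, 46))  # 100%, 46/46
print(pass_rate(45, 60))  # 75%, 45/60
```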

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.39x    |    1.29x    |    1.83x    |
+----------+------------+-------------+-------------+
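For reference, a geometric mean of per-model speedups can be computed as below. This is a generic sketch, not the dashboard's own aggregation script, and the sample values are hypothetical. The geometric mean is the usual choice for ratios because a 2x speedup and a 2x slowdown cancel out, which an arithmetic mean would not reflect.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive ratios, computed in log space for stability."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty sequence of positive numbers")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical speedups: one model twice as fast, one twice as slow.
sample = [2.0, 0.5]
print(round(geometric_mean(sample), 4))   # 1.0
print(round(sum(sample) / len(sample), 4))  # 1.25 (arithmetic mean overstates the gain)
```

On Python 3.8 and later, `statistics.geometric_mean` from the standard library gives the same result.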

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   47.74    |    45.51    |    44.56    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.90x    |    0.98x    |    0.99x    |
+----------+------------+-------------+-------------+
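The summary does not spell out how this ratio is formed. The sketch below assumes it is the eager peak memory divided by the compiled peak memory, which is consistent with "higher is better"; the function names and the sample sizes are hypothetical.

```python
def compression_ratio(eager_peak_mb: float, compiled_peak_mb: float) -> float:
    """Assumed definition: eager peak memory / compiled peak memory."""
    if eager_peak_mb <= 0 or compiled_peak_mb <= 0:
        raise ValueError("peak memory values must be positive")
    return eager_peak_mb / compiled_peak_mb


def relative_footprint(ratio: float) -> float:
    """Compiled peak memory expressed as a multiple of the eager peak memory."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / ratio


# Hypothetical sizes: the compiled run peaks at 1000 MB, eager at 900 MB.
print(round(compression_ratio(900.0, 1000.0), 2))  # 0.9
# Under the assumed definition, 0.90x means roughly 11% more peak memory than eager.
print(round(relative_footprint(0.90), 3))  # 1.111
```

Read this way, a ratio below 1.0 means the compiled model has a larger peak footprint than eager mode.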

torchbench suite with float32 precision

Performance speedup

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_equation_of_state     | 1048576 | 11.011533 |
|          squeezenet1_1          |   16    | 3.130585  |
|       mobilenet_v3_large        |   32    | 2.964608  |
|           mnasnet1_0            |   32    | 2.960907  |
|          mobilenet_v2           |   16    | 2.911283  |
|        timm_efficientnet        |   64    | 2.852742  |
|       shufflenet_v2_x1_0        |   64    | 2.541593  |
|          timm_resnest           |   32    | 2.304185  |
|            resnet50             |   32    | 2.220962  |
|        phlippe_densenet         |   128   | 2.095688  |
|           densenet121           |   64    | 2.066775  |
|            resnet152            |   32    | 2.011895  |
|       doctr_det_predictor       |    1    | 1.933875  |
|             hf_GPT2             |    1    | 1.909815  |
|           timm_regnet           |   32    |  1.87947  |
|           timm_nfnet            |   128   | 1.866607  |
|         resnext50_32x4d         |    8    | 1.801933  |
|         phlippe_resnet          |   128   | 1.741357  |
|           timm_vovnet           |   32    | 1.686708  |
|            resnet18             |    8    | 1.672323  |
|      doctr_reco_predictor       |    1    | 1.655302  |
|             alexnet             |   128   | 1.601905  |
|     functorch_maml_omniglot     |    1    | 1.573616  |
|            hf_Albert            |    1    |  1.56143  |
|          hf_Bert_large          |    1    | 1.540917  |
|            moondream            |    1    | 1.531219  |
|          fastNLP_Bert           |    1    | 1.520962  |
|             yolov3              |    8    | 1.520568  |
|          hf_GPT2_large          |    1    | 1.516699  |
|        basic_gnn_edgecnn        |    1    | 1.506952  |
|              dcgan              |   256   | 1.481005  |
|             hf_Bert             |    1    | 1.467202  |
|          hf_Longformer          |    1    | 1.465647  |
|          basic_gnn_gcn          |    1    | 1.451337  |
|         LearningToPaint         |   96    |  1.44862  |
|          hf_DistilBert          |    1    | 1.344469  |
|              vgg16              |    4    | 1.301489  |
|             hf_Bart             |    1    | 1.253448  |
|           hf_BigBird            |    1    | 1.243217  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1.235924  |
|        hf_distil_whisper        |    1    | 1.227145  |
|           hf_T5_large           |    1    | 1.210905  |
|          pytorch_unet           |    1    | 1.208832  |
|         basic_gnn_sage          |    1    | 1.208467  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1.191823  |
|         pytorch_stargan         |   16    | 1.185902  |
|          BERT_pytorch           |    2    | 1.185135  |
|              dlrm               |  2048   | 1.181556  |
|          maml_omniglot          |    5    | 1.179998  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 1.170186  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 1.165778  |
|    detectron2_fcos_r_50_fpn     |    1    | 1.159338  |
|      torch_multimodal_clip      |   32    | 1.149951  |
|          basic_gnn_gin          |    1    | 1.131702  |
|              hf_T5              |    1    |  1.13115  |
|        soft_actor_critic        |   256   | 1.128506  |
|               drq               |    1    | 1.086186  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1.066888  |
|     timm_vision_transformer     |   32    | 1.064479  |
|          lennard_jones          |  1000   | 1.054142  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1.051431  |
|     nvidia_deeprecommender      |   256   |  1.04998  |
|           hf_Reformer           |    1    | 1.046194  |
|       speech_transformer        |    1    | 1.029397  |
|             demucs              |    1    |  1.01661  |
|  timm_vision_transformer_large  |   32    | 1.008735  |
|     resnet50_quantized_qat      |   32    | 1.008659  |
|   mobilenet_v2_quantized_qat    |   96    | 1.002629  |
|           tts_angular           |   64    | 0.995706  |
|           hf_T5_base            |    1    | 0.826797  |
|       Background_Matting        |    1    | 0.823745  |
|              maml               |    1    |  0.71891  |
|     pyhpc_isoneutral_mixing     | 1048576 | 0.681607  |
|         opacus_cifar10          |   64    | 0.639961  |
|      functorch_dp_cifar10       |   64    | 0.607801  |
|        timm_efficientdet        |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+

Accuracy

+--------------------------------+---------+--------------------+
|              name              |   bs    |      inductor      |
+--------------------------------+---------+--------------------+
|          hf_T5_large           |    4    |  pass_due_to_skip  |
|       Background_Matting       |    1    |  pass_due_to_skip  |
| timm_vision_transformer_large  |    4    |  pass_due_to_skip  |
|         hf_GPT2_large          |    4    |  pass_due_to_skip  |
|              maml              |    1    |  pass_due_to_skip  |
|         basic_gnn_sage         |    1    |        pass        |
|           hf_T5_base           |    4    |        pass        |
|      doctr_det_predictor       |    4    |        pass        |
|              dlrm              |    4    |        pass        |
|    detectron2_fcos_r_50_fpn    |    4    |        pass        |
|             demucs             |    1    |        pass        |
|         basic_gnn_gcn          |    1    |        pass        |
|         basic_gnn_gin          |    1    |        pass        |
|              drq               |    1    |        pass        |
|       basic_gnn_edgecnn        |    1    |        pass        |
|        LearningToPaint         |    4    |        pass        |
|      functorch_dp_cifar10      |    4    |        pass        |
|      doctr_reco_predictor      |    4    |        pass        |
|             yolov3             |    4    |        pass        |
|          fastNLP_Bert          |    4    |        pass        |
|         maml_omniglot          |    5    |        pass        |
|    functorch_maml_omniglot     |    1    |        pass        |
|            hf_Bart             |    4    |        pass        |
|            hf_Bert             |    4    |        pass        |
|             hf_T5              |    4    |        pass        |
|         hf_Bert_large          |    4    |        pass        |
|          hf_Reformer           |    4    |        pass        |
|         hf_Longformer          |    4    |        pass        |
|           hf_BigBird           |    4    |        pass        |
|         hf_DistilBert          |    4    |        pass        |
|            hf_GPT2             |    2    |        pass        |
|           hf_Albert            |    4    |        pass        |
|       hf_distil_whisper        |    4    |        pass        |
|            alexnet             |    4    |        pass        |
|        pytorch_stargan         |   16    |        pass        |
|         lennard_jones          |    4    |        pass        |
|         opacus_cifar10         |    4    |        pass        |
|    pyhpc_isoneutral_mixing     |    4    |        pass        |
|    pyhpc_equation_of_state     |    4    |        pass        |
|         phlippe_resnet         |    4    |        pass        |
|        phlippe_densenet        |    4    |        pass        |
|       mobilenet_v3_large       |    4    |        pass        |
|     nvidia_deeprecommender     |    4    |        pass        |
|             vgg16              |    4    |        pass        |
|  pytorch_CycleGAN_and_pix2pix  |    1    |        pass        |
|   mobilenet_v2_quantized_qat   |    4    |        pass        |
|             llama              |    4    |        pass        |
| pyhpc_turbulent_kinetic_energy | 1048576 |        pass        |
|          BERT_pytorch          |    4    |        pass        |
|           moondream            |    4    |        pass        |
|          pytorch_unet          |    2    |        pass        |
|       soft_actor_critic        |   256   |        pass        |
|       speech_transformer       |    1    |        pass        |
|         squeezenet1_1          |    4    |        pass        |
|       timm_efficientnet        |    4    |        pass        |
|           timm_nfnet           |    4    |        pass        |
|          timm_regnet           |    4    |        pass        |
|          timm_resnest          |    4    |        pass        |
|    timm_vision_transformer     |    4    |        pass        |
|          timm_vovnet           |    4    |        pass        |
|     torch_multimodal_clip      |    4    |        pass        |
|          tts_angular           |    4    |        pass        |
|     resnet50_quantized_qat     |    4    |        pass        |
|       timm_efficientdet        |    0    | model_fail_to_load |
|              moco              |    0    | model_fail_to_load |
|         DALLE2_pytorch         |    0    | model_fail_to_load |
|          Super_SloMo           |    4    |    fail_to_run     |
|        vision_maskrcnn         |    1    |    fail_to_run     |
|             dcgan              |    4    |   fail_accuracy    |
|          densenet121           |    4    |   fail_accuracy    |
|       shufflenet_v2_x1_0       |    4    |   fail_accuracy    |
|          mobilenet_v2          |    4    |   fail_accuracy    |
|           mnasnet1_0           |    4    |   fail_accuracy    |
|        resnext50_32x4d         |    4    |   fail_accuracy    |
|            resnet50            |    4    |   fail_accuracy    |
|           resnet152            |    4    |   fail_accuracy    |
|            resnet18            |    4    |   fail_accuracy    |
+--------------------------------+---------+--------------------+

Compilation latency (sec)

+---------------------------------+---------+------------+
|              name               |   bs    |  inductor  |
+---------------------------------+---------+------------+
|           hf_BigBird            |    1    | 470.252834 |
|    detectron2_fcos_r_50_fpn     |    1    | 406.31848  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 230.10697  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 222.699224 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 215.248073 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 206.154828 |
|              maml               |    1    | 140.405017 |
|           hf_T5_large           |    1    | 106.818246 |
|       speech_transformer        |    1    | 89.744089  |
|          hf_Longformer          |    1    | 85.150963  |
|           hf_Reformer           |    1    | 80.477242  |
|  timm_vision_transformer_large  |   32    |  68.32991  |
|      torch_multimodal_clip      |   32    | 65.654858  |
|          basic_gnn_gcn          |    1    | 60.990657  |
|            resnet152            |   32    | 58.773005  |
|           densenet121           |   64    | 58.450862  |
|          fastNLP_Bert           |    1    | 55.091871  |
|          hf_GPT2_large          |    1    | 53.020688  |
|           hf_T5_base            |    1    | 50.425958  |
|            moondream            |    1    | 49.534872  |
|     pyhpc_isoneutral_mixing     | 1048576 | 48.499184  |
|       doctr_det_predictor       |    1    | 47.029238  |
|          hf_Bert_large          |    1    | 44.304363  |
|        hf_distil_whisper        |    1    | 43.486575  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 42.618701  |
|           timm_nfnet            |   128   | 37.928953  |
|           timm_regnet           |   32    | 37.854668  |
|          BERT_pytorch           |    2    |  37.09382  |
|             yolov3              |    8    | 35.874482  |
|             demucs              |    1    | 34.462895  |
|        timm_efficientnet        |   64    | 33.325567  |
|             hf_Bart             |    1    | 31.529661  |
|              hf_T5              |    1    | 31.067474  |
|        phlippe_densenet         |   128   | 29.446973  |
|       shufflenet_v2_x1_0        |   64    | 29.261453  |
|     timm_vision_transformer     |   32    | 28.961786  |
|       mobilenet_v3_large        |   32    | 28.087406  |
|             hf_Bert             |    1    | 27.264459  |
|         pytorch_stargan         |   16    | 26.219576  |
|         opacus_cifar10          |   64    | 26.218402  |
|            hf_Albert            |    1    | 25.815746  |
|      doctr_reco_predictor       |    1    | 25.642484  |
|             hf_GPT2             |    1    | 24.907378  |
|       Background_Matting        |    1    | 24.582417  |
|      functorch_dp_cifar10       |   64    |  24.51645  |
|          mobilenet_v2           |   16    |  24.34692  |
|          timm_resnest           |   32    | 24.107161  |
|           timm_vovnet           |   32    | 23.649233  |
|         resnext50_32x4d         |    8    | 23.412303  |
|            resnet50             |   32    | 23.286366  |
|           mnasnet1_0            |   32    | 21.940634  |
|          hf_DistilBert          |    1    | 21.132672  |
|          pytorch_unet           |    1    | 19.952172  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 19.590325  |
|          squeezenet1_1          |   16    | 19.573539  |
|            resnet18             |    8    | 18.171427  |
|         LearningToPaint         |   96    | 17.921915  |
|         phlippe_resnet          |   128   | 17.611682  |
|              vgg16              |    4    |  17.41797  |
|     pyhpc_equation_of_state     | 1048576 | 17.412794  |
|             alexnet             |   128   | 17.098679  |
|               drq               |    1    | 15.928759  |
|     functorch_maml_omniglot     |    1    | 15.805736  |
|          maml_omniglot          |    5    | 15.719782  |
|              dlrm               |  2048   | 15.344388  |
|              dcgan              |   256   | 15.143332  |
|        basic_gnn_edgecnn        |    1    | 14.940679  |
|     nvidia_deeprecommender      |   256   | 14.883743  |
|          basic_gnn_gin          |    1    | 14.872804  |
|         basic_gnn_sage          |    1    |  14.85394  |
|        soft_actor_critic        |   256   | 14.541094  |
|          lennard_jones          |  1000   | 14.103588  |
|           tts_angular           |   64    | 13.958789  |
|   mobilenet_v2_quantized_qat    |   96    |  0.126372  |
|     resnet50_quantized_qat      |   32    |  0.10142   |
|        timm_efficientdet        |    0    |    0.0     |
|              moco               |    0    |    0.0     |
|         DALLE2_pytorch          |    0    |    0.0     |
+---------------------------------+---------+------------+

Peak memory compression ratio (higher is better)

+---------------------------------+---------+----------+
|              name               |   bs    | inductor |
+---------------------------------+---------+----------+
|           timm_nfnet            |   128   | 0.991844 |
|  timm_vision_transformer_large  |   32    | 0.991486 |
|              dlrm               |  2048   | 0.988164 |
|           hf_T5_base            |    1    | 0.987876 |
|        timm_efficientnet        |   64    | 0.985654 |
|   mobilenet_v2_quantized_qat    |   96    | 0.983696 |
|     resnet50_quantized_qat      |   32    | 0.98354  |
|       Background_Matting        |    1    | 0.983479 |
|           timm_regnet           |   32    | 0.983372 |
|            resnet152            |   32    | 0.982504 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 0.981745 |
|             yolov3              |    8    | 0.981718 |
|           densenet121           |   64    | 0.981268 |
|     nvidia_deeprecommender      |   256   | 0.979373 |
|          pytorch_unet           |    1    | 0.979278 |
|             demucs              |    1    | 0.979275 |
|      torch_multimodal_clip      |   32    | 0.977086 |
|           timm_vovnet           |   32    | 0.975281 |
|            resnet50             |   32    | 0.974769 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 0.973075 |
|          hf_GPT2_large          |    1    | 0.972494 |
|         LearningToPaint         |   96    |  0.9721  |
|          timm_resnest           |   32    | 0.971509 |
|           mnasnet1_0            |   32    | 0.964804 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 0.964534 |
|     timm_vision_transformer     |   32    | 0.964447 |
|    detectron2_fcos_r_50_fpn     |    1    | 0.963098 |
|          mobilenet_v2           |   16    | 0.960151 |
|       mobilenet_v3_large        |   32    | 0.959716 |
|              vgg16              |    4    | 0.958478 |
|       shufflenet_v2_x1_0        |   64    | 0.957729 |
|       doctr_det_predictor       |    1    | 0.956722 |
|             alexnet             |   128   | 0.956536 |
|        basic_gnn_edgecnn        |    1    | 0.954346 |
|         resnext50_32x4d         |    8    | 0.95238  |
|        phlippe_densenet         |   128   | 0.94866  |
|         pytorch_stargan         |   16    | 0.945367 |
|      doctr_reco_predictor       |    1    | 0.935081 |
|          BERT_pytorch           |    2    | 0.933385 |
|          squeezenet1_1          |   16    | 0.922195 |
|           tts_angular           |   64    | 0.921412 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 0.915941 |
|              dcgan              |   256   | 0.913103 |
|          basic_gnn_gcn          |    1    | 0.913002 |
|     pyhpc_equation_of_state     | 1048576 | 0.906223 |
|            resnet18             |    8    | 0.905263 |
|         phlippe_resnet          |   128   | 0.90169  |
|        hf_distil_whisper        |    1    | 0.893787 |
|        soft_actor_critic        |   256   | 0.892874 |
|         opacus_cifar10          |   64    | 0.890251 |
|           hf_BigBird            |    1    | 0.889129 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 0.881049 |
|          lennard_jones          |  1000   | 0.867946 |
|          maml_omniglot          |    5    | 0.858201 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.855812 |
|     functorch_maml_omniglot     |    1    | 0.852787 |
|          fastNLP_Bert           |    1    | 0.839321 |
|         basic_gnn_sage          |    1    | 0.826963 |
|          basic_gnn_gin          |    1    | 0.822398 |
|      functorch_dp_cifar10       |   64    | 0.821553 |
|       speech_transformer        |    1    | 0.812747 |
|          hf_Bert_large          |    1    | 0.802714 |
|          hf_Longformer          |    1    | 0.799665 |
|              maml               |    1    |  0.7995  |
|            moondream            |    1    | 0.795344 |
|            hf_Albert            |    1    | 0.792638 |
|             hf_Bert             |    1    | 0.792608 |
|           hf_T5_large           |    1    | 0.79018  |
|               drq               |    1    | 0.765039 |
|          hf_DistilBert          |    1    | 0.763024 |
|             hf_GPT2             |    1    | 0.760499 |
|              hf_T5              |    1    | 0.75541  |
|             hf_Bart             |    1    | 0.740088 |
|           hf_Reformer           |    1    | 0.73224  |
|     pyhpc_isoneutral_mixing     | 1048576 | 0.69015  |
|        timm_efficientdet        |    0    |   0.0    |
|              moco               |    0    |   0.0    |
|         DALLE2_pytorch          |    0    |   0.0    |
+---------------------------------+---------+----------+

Absolute latency (ms)

+---------------------------------+---------+-------------+
|              name               |   bs    |  inductor   |
+---------------------------------+---------+-------------+
|  timm_vision_transformer_large  |   32    | 4476.475357 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1430.973495 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1294.163173 |
|           hf_T5_base            |    1    | 1245.580455 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1167.083876 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1091.220667 |
|          hf_GPT2_large          |    1    | 547.852636  |
|           timm_nfnet            |   128   | 530.712294  |
|           hf_T5_large           |    1    | 392.603217  |
|            moondream            |    1    | 387.903438  |
|        hf_distil_whisper        |    1    | 343.273919  |
|       Background_Matting        |    1    | 342.998609  |
|          pytorch_unet           |    1    | 223.727235  |
|           timm_regnet           |   32    | 216.443467  |
|            resnet152            |   32    | 186.199442  |
|           densenet121           |   64    |  181.19152  |
|    detectron2_fcos_r_50_fpn     |    1    | 177.604818  |
|      torch_multimodal_clip      |   32    | 165.055183  |
|             yolov3              |    8    | 162.302978  |
|             demucs              |    1    | 137.242601  |
|           hf_BigBird            |    1    | 120.505737  |
|           timm_vovnet           |   32    | 114.862604  |
|          hf_Bert_large          |    1    | 104.779004  |
|     timm_vision_transformer     |   32    | 101.973639  |
|         pytorch_stargan         |   16    |  96.359131  |
|       doctr_det_predictor       |    1    |  85.301115  |
|            resnet50             |   32    |  74.318316  |
|          hf_Longformer          |    1    |  69.586717  |
|             hf_Bart             |    1    |  53.038488  |
|          timm_resnest           |   32    |  52.721766  |
|       speech_transformer        |    1    |  51.417513  |
|        timm_efficientnet        |   64    |  49.178539  |
|              maml               |    1    |  48.703122  |
|             alexnet             |   128   |  43.370176  |
|              hf_T5              |    1    |  43.173188  |
|             hf_Bert             |    1    |  40.135952  |
|   mobilenet_v2_quantized_qat    |   96    |  38.414008  |
|         LearningToPaint         |   96    |  36.468496  |
|              vgg16              |    4    |  35.917037  |
|           hf_Reformer           |    1    |  35.860127  |
|            hf_Albert            |    1    |  34.866152  |
|     nvidia_deeprecommender      |   256   |  34.42996   |
|          fastNLP_Bert           |    1    |  32.840076  |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  32.705863  |
|     pyhpc_isoneutral_mixing     | 1048576 |  28.716607  |
| pyhpc_turbulent_kinetic_energy  | 1048576 |  28.126082  |
|          BERT_pytorch           |    2    |  27.700835  |
|          hf_DistilBert          |    1    |  26.421886  |
|     resnet50_quantized_qat      |   32    |  24.030806  |
|         resnext50_32x4d         |    8    |  23.749618  |
|             hf_GPT2             |    1    |  22.958395  |
|        basic_gnn_edgecnn        |    1    |  20.788086  |
|           tts_angular           |   64    |  20.573812  |
|        phlippe_densenet         |   128   |  19.859285  |
|              dcgan              |   256   |  18.117257  |
|       shufflenet_v2_x1_0        |   64    |  15.613253  |
|           mnasnet1_0            |   32    |  14.581734  |
|       mobilenet_v3_large        |   32    |  13.541678  |
|         opacus_cifar10          |   64    |  10.611018  |
|      functorch_dp_cifar10       |   64    |  10.465411  |
|          basic_gnn_gcn          |    1    |  9.609066   |
|            resnet18             |    8    |  9.604997   |
|          mobilenet_v2           |   16    |  9.173339   |
|              dlrm               |  2048   |  6.393615   |
|          squeezenet1_1          |   16    |  5.557064   |
|         basic_gnn_sage          |    1    |  5.452668   |
|          basic_gnn_gin          |    1    |  5.005861   |
|         phlippe_resnet          |   128   |  4.472209   |
|      doctr_reco_predictor       |    1    |  3.263038   |
|     pyhpc_equation_of_state     | 1048576 |  1.152132   |
|               drq               |    1    |  0.823935   |
|        soft_actor_critic        |   256   |  0.524066   |
|          maml_omniglot          |    5    |  0.503278   |
|     functorch_maml_omniglot     |    1    |  0.427771   |
|          lennard_jones          |  1000   |  0.213845   |
|        timm_efficientdet        |    0    |     0.0     |
|              moco               |    0    |     0.0     |
|         DALLE2_pytorch          |    0    |     0.0     |
+---------------------------------+---------+-------------+
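
The speedup and absolute latency tables can be combined to estimate the native PyTorch latency of a model. A rough sketch, assuming the latency column is the inductor latency and that speedup is eager latency divided by inductor latency; `estimated_eager_latency_ms` is a hypothetical helper.

```python
def estimated_eager_latency_ms(speedup: float, inductor_ms: float) -> float:
    """Back out the eager latency, assuming speedup = eager_ms / inductor_ms."""
    return speedup * inductor_ms


# (speedup, inductor latency in ms), copied from the torchbench tables above.
rows = {
    "squeezenet1_1": (3.130585, 5.557064),
    "hf_T5_base": (0.826797, 1245.580455),
}
for name, (speedup, inductor_ms) in rows.items():
    eager_ms = estimated_eager_latency_ms(speedup, inductor_ms)
    print(f"{name}: inductor {inductor_ms:.1f} ms, estimated eager {eager_ms:.1f} ms")
```

This makes it easier to see whether a large ratio is on a model where the absolute saving is only a few milliseconds.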

huggingface suite with float32 precision

Performance speedup (over native PyTorch, higher is better)

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            XLNetLMHeadModel             |  8  | 5.596451 |
|     MobileBertForQuestionAnswering      | 128 | 1.827277 |
|      GPT2ForSequenceClassification      |  4  | 1.796245 |
|           ElectraForCausalLM            | 32  | 1.710669 |
|       ElectraForQuestionAnswering       | 64  | 1.672294 |
|          MobileBertForMaskedLM          | 128 | 1.620638 |
|               DistillGPT2               | 16  | 1.499322 |
|      DebertaV2ForQuestionAnswering      |  1  | 1.432437 |
|       RobertaForQuestionAnswering       | 16  | 1.419132 |
|    LayoutLMForSequenceClassification    | 16  | 1.40932  |
|        BertForQuestionAnswering         | 16  | 1.394351 |
|            YituTechConvBert             | 16  | 1.389314 |
|           RobertaForCausalLM            | 16  | 1.388058 |
|               GoogleFnet                | 16  | 1.357345 |
|           LayoutLMForMaskedLM           | 16  | 1.339278 |
|                CamemBert                | 16  | 1.332891 |
|             BertForMaskedLM             | 16  | 1.323692 |
|          AllenaiLongformerBase          |  4  | 1.310319 |
|    MegatronBertForQuestionAnswering     |  8  | 1.290994 |
|         MegatronBertForCausalLM         |  4  | 1.269791 |
|       DebertaForQuestionAnswering       | 16  | 1.228072 |
|     PLBartForConditionalGeneration      |  4  | 1.225967 |
|      MBartForConditionalGeneration      |  2  | 1.195375 |
|           DebertaForMaskedLM            |  8  | 1.187287 |
|             OPTForCausalLM              |  2  | 1.178791 |
|       MT5ForConditionalGeneration       | 16  | 1.177624 |
|       T5ForConditionalGeneration        |  4  | 1.175045 |
|                 T5Small                 |  4  | 1.17427  |
| BlenderbotSmallForConditionalGeneration | 64  | 1.151444 |
|            AlbertForMaskedLM            |  4  | 1.133684 |
|       AlbertForQuestionAnswering        |  4  | 1.125176 |
|          DistilBertForMaskedLM          | 128 | 1.101081 |
|         Speech2Text2ForCausalLM         | 256 | 1.089162 |
|      BartForConditionalGeneration       |  2  | 1.088394 |
|       BlenderbotSmallForCausalLM        | 64  | 1.084805 |
|     DistilBertForQuestionAnswering      | 256 | 1.083705 |
|     M2M100ForConditionalGeneration      | 16  | 1.08357  |
|             XGLMForCausalLM             |  8  | 1.083401 |
|          DebertaV2ForMaskedLM           |  2  | 1.077529 |
|            PLBartForCausalLM            |  8  | 1.054407 |
|     PegasusForConditionalGeneration     | 32  | 1.046348 |
|            TrOCRForCausalLM             | 32  | 1.045984 |
|            MBartForCausalLM             |  4  | 1.045438 |
|             BartForCausalLM             |  4  | 1.03368  |
|          BlenderbotForCausalLM          |  4  | 1.027662 |
|           PegasusForCausalLM            | 32  | 1.021083 |
+-----------------------------------------+-----+----------+

Accuracy

+-----------------------------------------+----+------------------+
|                  name                   | bs |     inductor     |
+-----------------------------------------+----+------------------+
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip |
|                CamemBert                | 1  |       pass       |
|       AlbertForQuestionAnswering        | 1  |       pass       |
|           DebertaForMaskedLM            | 1  |       pass       |
|          AllenaiLongformerBase          | 1  |       pass       |
|             BartForCausalLM             | 1  |       pass       |
|      BartForConditionalGeneration       | 1  |       pass       |
|             BertForMaskedLM             | 1  |       pass       |
|        BertForQuestionAnswering         | 1  |       pass       |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |
|       DebertaForQuestionAnswering       | 1  |       pass       |
|           LayoutLMForMaskedLM           | 1  |       pass       |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |
|          DistilBertForMaskedLM          | 1  |       pass       |
|     DistilBertForQuestionAnswering      | 1  |       pass       |
|               DistillGPT2               | 1  |       pass       |
|           ElectraForCausalLM            | 1  |       pass       |
|       ElectraForQuestionAnswering       | 1  |       pass       |
|      GPT2ForSequenceClassification      | 1  |       pass       |
|               GoogleFnet                | 1  |       pass       |
|    LayoutLMForSequenceClassification    | 1  |       pass       |
|            MBartForCausalLM             | 1  |       pass       |
|            XLNetLMHeadModel             | 1  |       pass       |
|             XGLMForCausalLM             | 1  |       pass       |
|            AlbertForMaskedLM            | 1  |       pass       |
|      MBartForConditionalGeneration      | 1  |       pass       |
|       MT5ForConditionalGeneration       | 1  |       pass       |
|         MegatronBertForCausalLM         | 1  |       pass       |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |
|          MobileBertForMaskedLM          | 1  |       pass       |
|     MobileBertForQuestionAnswering      | 1  |       pass       |
|             OPTForCausalLM              | 1  |       pass       |
|            PLBartForCausalLM            | 1  |       pass       |
|     PLBartForConditionalGeneration      | 1  |       pass       |
|           PegasusForCausalLM            | 1  |       pass       |
|     M2M100ForConditionalGeneration      | 1  |       pass       |
|     PegasusForConditionalGeneration     | 1  |       pass       |
|           RobertaForCausalLM            | 1  |       pass       |
|       RobertaForQuestionAnswering       | 1  |       pass       |
|         Speech2Text2ForCausalLM         | 1  |       pass       |
|       T5ForConditionalGeneration        | 1  |       pass       |
|                 T5Small                 | 1  |       pass       |
|            TrOCRForCausalLM             | 1  |       pass       |
|            YituTechConvBert             | 1  |       pass       |
+-----------------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------------+-----+-----------+
|                  name                   | bs  | inductor  |
+-----------------------------------------+-----+-----------+
|          AllenaiLongformerBase          |  4  | 160.34052 |
|     MobileBertForQuestionAnswering      | 128 | 80.426212 |
|          MobileBertForMaskedLM          | 128 | 80.385404 |
|     PegasusForConditionalGeneration     | 32  | 73.695261 |
|      MBartForConditionalGeneration      |  2  | 73.694154 |
|     M2M100ForConditionalGeneration      | 16  | 72.088318 |
|          BlenderbotForCausalLM          |  4  | 62.577574 |
|             XGLMForCausalLM             |  8  | 60.564372 |
|      BartForConditionalGeneration       |  2  | 59.859235 |
|          DebertaV2ForMaskedLM           |  2  | 58.737959 |
|            XLNetLMHeadModel             |  8  | 57.092758 |
|       MT5ForConditionalGeneration       | 16  | 56.004738 |
| BlenderbotSmallForConditionalGeneration | 64  | 51.987899 |
|         MegatronBertForCausalLM         |  4  | 50.865661 |
|    MegatronBertForQuestionAnswering     |  8  | 50.696666 |
|            YituTechConvBert             | 16  | 50.371741 |
|      DebertaV2ForQuestionAnswering      |  1  | 48.204793 |
|     PLBartForConditionalGeneration      |  4  | 43.539459 |
|                 T5Small                 |  4  | 42.389562 |
|       T5ForConditionalGeneration        |  4  | 42.36473  |
|             OPTForCausalLM              |  2  | 37.228183 |
|       DebertaForQuestionAnswering       | 16  | 37.08474  |
|            TrOCRForCausalLM             | 32  | 37.039082 |
|            MBartForCausalLM             |  4  | 37.017046 |
|           PegasusForCausalLM            | 32  | 36.888637 |
|           DebertaForMaskedLM            |  8  | 36.801598 |
|           ElectraForCausalLM            | 32  | 32.649477 |
|           RobertaForCausalLM            | 16  | 32.383696 |
|       ElectraForQuestionAnswering       | 64  | 32.295521 |
|       RobertaForQuestionAnswering       | 16  | 32.120804 |
|           LayoutLMForMaskedLM           | 16  | 31.852021 |
|             BertForMaskedLM             | 16  | 31.754636 |
|        BertForQuestionAnswering         | 16  | 31.702713 |
|                CamemBert                | 16  | 31.626018 |
|    LayoutLMForSequenceClassification    | 16  | 31.408515 |
|       AlbertForQuestionAnswering        |  4  | 30.981298 |
|             BartForCausalLM             |  4  | 30.968191 |
|      GPT2ForSequenceClassification      |  4  | 30.909456 |
|            AlbertForMaskedLM            |  4  | 30.532764 |
|       BlenderbotSmallForCausalLM        | 64  | 29.562488 |
|     DistilBertForQuestionAnswering      | 256 | 26.869441 |
|          DistilBertForMaskedLM          | 128 | 26.510667 |
|            PLBartForCausalLM            |  8  | 26.196991 |
|               GoogleFnet                | 16  | 25.859583 |
|         Speech2Text2ForCausalLM         | 256 | 25.687024 |
|               DistillGPT2               | 16  | 23.635769 |
+-----------------------------------------+-----+-----------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            AlbertForMaskedLM            |  4  | 0.99456  |
|       AlbertForQuestionAnswering        |  4  | 0.994047 |
|     DistilBertForQuestionAnswering      | 256 | 0.992408 |
|           RobertaForCausalLM            | 16  | 0.992406 |
|            TrOCRForCausalLM             | 32  | 0.992356 |
|          DistilBertForMaskedLM          | 128 | 0.991901 |
|             OPTForCausalLM              |  2  | 0.991891 |
|           ElectraForCausalLM            | 32  | 0.991511 |
|               GoogleFnet                | 16  | 0.991388 |
|               DistillGPT2               | 16  | 0.991246 |
|             BertForMaskedLM             | 16  | 0.991069 |
|       ElectraForQuestionAnswering       | 64  | 0.990963 |
|           LayoutLMForMaskedLM           | 16  | 0.990688 |
|            PLBartForCausalLM            |  8  | 0.990653 |
|                CamemBert                | 16  | 0.990423 |
|            MBartForCausalLM             |  4  | 0.990122 |
|            YituTechConvBert             | 16  | 0.989755 |
|       DebertaForQuestionAnswering       | 16  | 0.988803 |
|       RobertaForQuestionAnswering       | 16  | 0.988743 |
|     PegasusForConditionalGeneration     | 32  | 0.988687 |
| BlenderbotSmallForConditionalGeneration | 64  | 0.988121 |
|         Speech2Text2ForCausalLM         | 256 | 0.988068 |
|      GPT2ForSequenceClassification      |  4  | 0.987896 |
|        BertForQuestionAnswering         | 16  | 0.987822 |
|     PLBartForConditionalGeneration      |  4  | 0.987411 |
|    LayoutLMForSequenceClassification    | 16  | 0.987119 |
|       BlenderbotSmallForCausalLM        | 64  | 0.986932 |
|           PegasusForCausalLM            | 32  | 0.98682  |
|             BartForCausalLM             |  4  | 0.986426 |
|      MBartForConditionalGeneration      |  2  | 0.986371 |
|          BlenderbotForCausalLM          |  4  | 0.985836 |
|         MegatronBertForCausalLM         |  4  | 0.985324 |
|          MobileBertForMaskedLM          | 128 | 0.984892 |
|           DebertaForMaskedLM            |  8  | 0.984892 |
|       MT5ForConditionalGeneration       | 16  | 0.984753 |
|            XLNetLMHeadModel             |  8  | 0.983785 |
|      BartForConditionalGeneration       |  2  | 0.983408 |
|       T5ForConditionalGeneration        |  4  | 0.982957 |
|                 T5Small                 |  4  | 0.982066 |
|    MegatronBertForQuestionAnswering     |  8  | 0.981324 |
|     MobileBertForQuestionAnswering      | 128 | 0.97832  |
|     M2M100ForConditionalGeneration      | 16  | 0.977733 |
|          DebertaV2ForMaskedLM           |  2  | 0.976777 |
|             XGLMForCausalLM             |  8  | 0.973164 |
|          AllenaiLongformerBase          |  4  | 0.972932 |
|      DebertaV2ForQuestionAnswering      |  1  | 0.869781 |
+-----------------------------------------+-----+----------+

Absolute latency (ms)

+-----------------------------------------+-----+-------------+
|                  name                   | bs  |  inductor   |
+-----------------------------------------+-----+-------------+
|       AlbertForQuestionAnswering        |  4  | 2692.257211 |
|            AlbertForMaskedLM            |  4  | 2681.000303 |
|            XLNetLMHeadModel             |  8  | 1298.626108 |
|     PegasusForConditionalGeneration     | 32  |  991.87818  |
|            TrOCRForCausalLM             | 32  | 974.378098  |
|     DistilBertForQuestionAnswering      | 256 |  890.38258  |
|    MegatronBertForQuestionAnswering     |  8  | 784.909817  |
|            MBartForCausalLM             |  4  | 677.311163  |
|          BlenderbotForCausalLM          |  4  | 675.490503  |
|      MBartForConditionalGeneration      |  2  | 673.262472  |
|          DistilBertForMaskedLM          | 128 | 671.993386  |
|          DebertaV2ForMaskedLM           |  2  | 610.467211  |
|           RobertaForCausalLM            | 16  |  606.42634  |
|     M2M100ForConditionalGeneration      | 16  | 600.669037  |
|             OPTForCausalLM              |  2  | 596.230415  |
|      BartForConditionalGeneration       |  2  | 595.173181  |
|            YituTechConvBert             | 16  | 580.487044  |
|                CamemBert                | 16  |  566.71479  |
|             BertForMaskedLM             | 16  |  565.05776  |
|           LayoutLMForMaskedLM           | 16  | 561.626914  |
|          AllenaiLongformerBase          |  4  | 536.272564  |
|       DebertaForQuestionAnswering       | 16  | 528.599538  |
|             BartForCausalLM             |  4  | 523.845566  |
|            PLBartForCausalLM            |  8  | 510.541227  |
|           PegasusForCausalLM            | 32  | 494.875879  |
| BlenderbotSmallForConditionalGeneration | 64  | 487.592057  |
|     PLBartForConditionalGeneration      |  4  | 474.766769  |
|         MegatronBertForCausalLM         |  4  | 451.282273  |
|        BertForQuestionAnswering         | 16  | 448.836301  |
|    LayoutLMForSequenceClassification    | 16  | 446.695444  |
|       RobertaForQuestionAnswering       | 16  | 433.364872  |
|               GoogleFnet                | 16  | 409.076547  |
|          MobileBertForMaskedLM          | 128 | 394.136863  |
|               DistillGPT2               | 16  | 385.677667  |
|             XGLMForCausalLM             |  8  | 382.994375  |
|           DebertaForMaskedLM            |  8  | 369.637322  |
|       ElectraForQuestionAnswering       | 64  | 333.570069  |
|                 T5Small                 |  4  | 331.194846  |
|       T5ForConditionalGeneration        |  4  |  331.09336  |
|         Speech2Text2ForCausalLM         | 256 | 279.247271  |
|       BlenderbotSmallForCausalLM        | 64  | 277.494333  |
|      GPT2ForSequenceClassification      |  4  | 273.554715  |
|           ElectraForCausalLM            | 32  | 255.767902  |
|     MobileBertForQuestionAnswering      | 128 | 242.037068  |
|       MT5ForConditionalGeneration       | 16  | 226.330932  |
|      DebertaV2ForQuestionAnswering      |  1  | 223.811108  |
+-----------------------------------------+-----+-------------+

timm_models suite with float32 precision


Performance speedup

+---------------------------------+------+----------+
|              name               |  bs  | inductor |
+---------------------------------+------+----------+
|           fbnetc_100            | 512  | 3.941087 |
|           mnasnet_100           | 512  | 3.852914 |
|            lcnet_050            | 256  | 3.830848 |
|         mobilenetv2_100         | 128  | 3.803532 |
|      mobilenetv3_large_100      | 512  | 3.663712 |
|          spnasnet_100           | 128  | 3.562066 |
|            fbnetv3_b            | 256  | 3.484032 |
|           regnety_002           | 1024 | 3.286181 |
|           rexnet_100            | 256  | 3.095038 |
|       tf_efficientnet_b0        | 128  | 2.941182 |
|            tinynet_a            | 128  | 2.720907 |
|          pnasnet5large          |  16  | 2.648821 |
|        ese_vovnet19b_dw         | 256  | 2.59969  |
|          botnet26t_256          | 128  | 2.591725 |
|            hrnet_w18            | 128  | 2.572404 |
|           res2next50            | 128  | 2.413865 |
|          ghostnet_100           | 512  | 2.349077 |
|       eca_botnext26ts_256       | 128  |  2.3435  |
|       gluon_inception_v3        | 256  | 2.291902 |
|        eca_halonext26ts         | 128  | 2.244696 |
|          inception_v3           | 128  | 2.229542 |
|        adv_inception_v3         | 128  | 2.214725 |
|           resnest101e           |  64  | 2.21426  |
|             dla102              | 128  | 2.214159 |
|        res2net50_14w_8s         | 128  | 2.152574 |
|        res2net101_26w_4s        | 128  | 2.135322 |
|          cspdarknet53           |  64  | 2.071137 |
|            repvgg_a2            | 128  | 2.06498  |
|            nfnet_l0             | 128  | 2.003898 |
|        convmixer_768_32         |  32  | 1.960591 |
|           tf_mixnet_l           | 128  | 1.958513 |
|            gernet_l             | 128  | 1.913153 |
|           dm_nfnet_f0           | 128  | 1.855113 |
|        sebotnet33ts_256         |  64  | 1.774088 |
|           selecsls42b           | 128  | 1.773003 |
|            mixnet_l             | 128  | 1.762061 |
|         visformer_small         | 128  | 1.672752 |
|         poolformer_m36          |  64  | 1.670224 |
|           volo_d1_224           |  64  | 1.637871 |
|     swsl_resnext101_32x16d      |  32  | 1.588801 |
|             dpn107              |  64  | 1.503046 |
|            levit_128            | 1024 | 1.467957 |
|           mobilevit_s           |  64  | 1.45237  |
|          gmlp_s16_224           | 128  | 1.277836 |
|          resmlp_12_224          | 128  | 1.195718 |
|      xcit_large_24_p8_224       |  16  | 1.194968 |
|           convit_base           |  64  | 1.174185 |
|          gmixer_24_224          | 128  | 1.156863 |
|          cait_m36_384           |  4   | 1.150961 |
|  swin_base_patch4_window7_224   |  64  | 1.122343 |
|        tnt_s_patch16_224        | 128  | 1.104315 |
|        twins_pcpvt_base         | 128  | 1.090059 |
|          convnext_base          |  64  | 1.062577 |
|      beit_base_patch16_224      |  64  | 1.056601 |
|          mixer_b16_224          | 128  | 1.053151 |
|          jx_nest_base           |  32  |  1.0395  |
|      vit_base_patch16_224       |  64  | 1.032574 |
|            pit_b_224            |  64  | 1.030763 |
| deit_base_distilled_patch16_224 |  64  | 1.030029 |
|         crossvit_9_240          | 256  | 0.985947 |
+---------------------------------+------+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 8  |     pass      |
|           dm_nfnet_f0           | 8  |     pass      |
|            mixnet_l             | 8  |     pass      |
|       eca_botnext26ts_256       | 8  |     pass      |
|         crossvit_9_240          | 8  |     pass      |
| deit_base_distilled_patch16_224 | 8  |     pass      |
|          convnext_base          | 8  |     pass      |
|           regnety_002           | 8  |     pass      |
|           convit_base           | 8  |     pass      |
|          gmlp_s16_224           | 8  |     pass      |
|          cait_m36_384           | 8  |     pass      |
|             dpn107              | 8  |     pass      |
|      vit_base_patch16_224       | 8  |     pass      |
|          cspdarknet53           | 8  |     pass      |
|        eca_halonext26ts         | 8  |     pass      |
|        ese_vovnet19b_dw         | 8  |     pass      |
|           fbnetc_100            | 8  |     pass      |
|            fbnetv3_b            | 8  |     pass      |
|            gernet_l             | 8  |     pass      |
|          botnet26t_256          | 8  |     pass      |
|       gluon_inception_v3        | 8  |     pass      |
|          gmixer_24_224          | 8  |     pass      |
|          jx_nest_base           | 8  |     pass      |
|        convmixer_768_32         | 8  |     pass      |
|        twins_pcpvt_base         | 8  |     pass      |
|         poolformer_m36          | 8  |     pass      |
|            lcnet_050            | 8  |     pass      |
|          mixer_b16_224          | 8  |     pass      |
|        tnt_s_patch16_224        | 8  |     pass      |
|           mnasnet_100           | 8  |     pass      |
|         mobilenetv2_100         | 8  |     pass      |
|      mobilenetv3_large_100      | 8  |     pass      |
|           mobilevit_s           | 8  |     pass      |
|            nfnet_l0             | 8  |     pass      |
|            pit_b_224            | 8  |     pass      |
|      beit_base_patch16_224      | 8  |     pass      |
|            repvgg_a2            | 8  |     pass      |
|  swin_base_patch4_window7_224   | 8  |     pass      |
|          inception_v3           | 8  |     pass      |
|           tf_mixnet_l           | 8  |     pass      |
|       tf_efficientnet_b0        | 8  |     pass      |
|            tinynet_a            | 8  |     pass      |
|          spnasnet_100           | 8  |     pass      |
|        sebotnet33ts_256         | 8  |     pass      |
|          resmlp_12_224          | 8  |     pass      |
|         coat_lite_mini          | 8  |  fail_to_run  |
|           resnest101e           | 8  | fail_accuracy |
|            levit_128            | 8  | fail_accuracy |
|          ghostnet_100           | 8  | fail_accuracy |
|          pnasnet5large          | 8  | fail_accuracy |
|        res2net101_26w_4s        | 8  | fail_accuracy |
|        res2net50_14w_8s         | 8  | fail_accuracy |
|             dla102              | 8  | fail_accuracy |
|     swsl_resnext101_32x16d      | 8  | fail_accuracy |
|           rexnet_100            | 8  | fail_accuracy |
|           selecsls42b           | 8  | fail_accuracy |
|           res2next50            | 8  | fail_accuracy |
|            hrnet_w18            | 8  | fail_accuracy |
|           volo_d1_224           | 8  | fail_accuracy |
|         visformer_small         | 8  | fail_accuracy |
|      xcit_large_24_p8_224       | 8  | fail_accuracy |
+---------------------------------+----+---------------+

Compilation latency (sec)

+---------------------------------+------+------------+
|              name               |  bs  |  inductor  |
+---------------------------------+------+------------+
|          pnasnet5large          |  16  | 495.482876 |
|            hrnet_w18            | 128  | 277.323326 |
|          cait_m36_384           |  4   | 106.268642 |
|      xcit_large_24_p8_224       |  16  | 100.217863 |
|  swin_base_patch4_window7_224   |  64  | 99.474005  |
|        res2net101_26w_4s        | 128  | 90.237686  |
|           tf_mixnet_l           | 128  | 89.044504  |
|            mixnet_l             | 128  | 78.796561  |
|        res2net50_14w_8s         | 128  | 78.131552  |
|           mobilevit_s           |  64  | 76.761147  |
|         poolformer_m36          |  64  | 75.935293  |
|           resnest101e           |  64  | 75.408628  |
|        tnt_s_patch16_224        | 128  | 70.430286  |
|          jx_nest_base           |  32  | 68.864327  |
|        twins_pcpvt_base         | 128  | 68.266476  |
|             dpn107              |  64  | 65.853396  |
|           volo_d1_224           |  64  | 57.141631  |
|            fbnetv3_b            | 256  |  55.01293  |
|        eca_halonext26ts         | 128  | 51.574353  |
|          gmixer_24_224          | 128  | 48.968955  |
|            levit_128            | 1024 | 48.087173  |
|         crossvit_9_240          | 256  | 47.941832  |
|          gmlp_s16_224           | 128  | 47.280456  |
|          convnext_base          |  64  | 46.692022  |
|           convit_base           |  64  | 44.465542  |
|        sebotnet33ts_256         |  64  | 43.898885  |
|        adv_inception_v3         | 128  | 40.700959  |
|          inception_v3           | 128  | 40.652208  |
|             dla102              | 128  | 40.037153  |
|          ghostnet_100           | 512  | 39.663164  |
|       gluon_inception_v3        | 256  | 39.226524  |
|           res2next50            | 128  | 39.044987  |
|            tinynet_a            | 128  | 38.267999  |
|           rexnet_100            | 256  | 37.265849  |
|           dm_nfnet_f0           | 128  | 37.008834  |
|       tf_efficientnet_b0        | 128  | 36.569358  |
|         visformer_small         | 128  | 36.370337  |
|       eca_botnext26ts_256       | 128  | 36.003185  |
|     swsl_resnext101_32x16d      |  32  | 35.994667  |
|        convmixer_768_32         |  32  | 35.556395  |
|            nfnet_l0             | 128  | 34.556944  |
|          botnet26t_256          | 128  | 32.800999  |
|            pit_b_224            |  64  | 32.118271  |
|          mixer_b16_224          | 128  | 31.469843  |
|      beit_base_patch16_224      |  64  | 30.672224  |
|      vit_base_patch16_224       |  64  | 29.564422  |
| deit_base_distilled_patch16_224 |  64  | 29.389587  |
|          cspdarknet53           |  64  | 29.258553  |
|           regnety_002           | 1024 | 27.808249  |
|      mobilenetv3_large_100      | 512  | 27.600138  |
|          resmlp_12_224          | 128  | 24.740412  |
|          spnasnet_100           | 128  |  24.68533  |
|         mobilenetv2_100         | 128  | 24.660352  |
|            gernet_l             | 128  | 23.593164  |
|           fbnetc_100            | 512  | 23.514128  |
|            repvgg_a2            | 128  | 23.340747  |
|        ese_vovnet19b_dw         | 256  | 22.633326  |
|            lcnet_050            | 256  | 22.266568  |
|           mnasnet_100           | 512  | 21.044005  |
|           selecsls42b           | 128  |  20.93049  |
+---------------------------------+------+------------+

Peak Memory Compression Ratio

+---------------------------------+------+----------+
|              name               |  bs  | inductor |
+---------------------------------+------+----------+
|        ese_vovnet19b_dw         | 256  | 0.99743  |
|      mobilenetv3_large_100      | 512  | 0.996905 |
|           fbnetc_100            | 512  | 0.996825 |
|           mnasnet_100           | 512  | 0.996589 |
|            fbnetv3_b            | 256  | 0.996415 |
|          ghostnet_100           | 512  | 0.996193 |
|           dm_nfnet_f0           | 128  | 0.996093 |
|          convnext_base          |  64  | 0.995958 |
|           regnety_002           | 1024 | 0.995819 |
|            levit_128            | 1024 | 0.99493  |
|        eca_halonext26ts         | 128  | 0.994608 |
|       eca_botnext26ts_256       | 128  | 0.994435 |
|        res2net101_26w_4s        | 128  | 0.994238 |
|           rexnet_100            | 256  | 0.994133 |
|            nfnet_l0             | 128  | 0.994109 |
|             dpn107              |  64  | 0.99406  |
|             dla102              | 128  | 0.993961 |
|       gluon_inception_v3        | 256  | 0.993868 |
|           res2next50            | 128  | 0.99355  |
|        convmixer_768_32         |  32  | 0.993453 |
|          mixer_b16_224          | 128  | 0.993369 |
|        res2net50_14w_8s         | 128  | 0.993321 |
|      xcit_large_24_p8_224       |  16  | 0.993194 |
|            mixnet_l             | 128  | 0.99312  |
|       tf_efficientnet_b0        | 128  | 0.993116 |
|          gmlp_s16_224           | 128  | 0.993067 |
|          botnet26t_256          | 128  | 0.992906 |
|         mobilenetv2_100         | 128  | 0.992888 |
|          gmixer_24_224          | 128  | 0.992834 |
|           tf_mixnet_l           | 128  | 0.992428 |
|         visformer_small         | 128  | 0.992414 |
|           convit_base           |  64  | 0.99222  |
|          pnasnet5large          |  16  | 0.992157 |
|           resnest101e           |  64  | 0.992157 |
|            gernet_l             | 128  | 0.991893 |
|        twins_pcpvt_base         | 128  | 0.991556 |
|        sebotnet33ts_256         |  64  | 0.990753 |
|           mobilevit_s           |  64  | 0.990673 |
|          inception_v3           | 128  | 0.990642 |
|      beit_base_patch16_224      |  64  | 0.990521 |
|        adv_inception_v3         | 128  | 0.990408 |
|           selecsls42b           | 128  | 0.990055 |
|        tnt_s_patch16_224        | 128  | 0.989721 |
|          spnasnet_100           | 128  | 0.989051 |
|            tinynet_a            | 128  | 0.988988 |
|            pit_b_224            |  64  | 0.988958 |
|          cait_m36_384           |  4   | 0.988817 |
|          resmlp_12_224          | 128  | 0.98861  |
|         poolformer_m36          |  64  | 0.98845  |
|            hrnet_w18            | 128  | 0.988389 |
|  swin_base_patch4_window7_224   |  64  | 0.988124 |
|      vit_base_patch16_224       |  64  | 0.987656 |
| deit_base_distilled_patch16_224 |  64  | 0.987607 |
|     swsl_resnext101_32x16d      |  32  | 0.986558 |
|            lcnet_050            | 256  | 0.985792 |
|            repvgg_a2            | 128  | 0.984479 |
|           volo_d1_224           |  64  | 0.983973 |
|          jx_nest_base           |  32  |  0.9833  |
|          cspdarknet53           |  64  | 0.978205 |
|         crossvit_9_240          | 256  | 0.973779 |
+---------------------------------+------+----------+

Absolute latency (ms)

+---------------------------------+------+-------------+
|              name               |  bs  |  inductor   |
+---------------------------------+------+-------------+
|      xcit_large_24_p8_224       |  16  | 1461.044199 |
|          convnext_base          |  64  | 1162.048271 |
|          cait_m36_384           |  4   | 1111.04861  |
|          mixer_b16_224          | 128  | 1054.101633 |
|           dm_nfnet_f0           | 128  | 948.308555  |
|           convit_base           |  64  | 921.990798  |
|             dpn107              |  64  | 917.434565  |
|  swin_base_patch4_window7_224   |  64  | 833.394161  |
|        twins_pcpvt_base         | 128  | 827.613416  |
|        tnt_s_patch16_224        | 128  | 818.992948  |
|       gluon_inception_v3        | 256  | 806.608374  |
| deit_base_distilled_patch16_224 |  64  | 693.375908  |
|      vit_base_patch16_224       |  64  | 689.310819  |
|      beit_base_patch16_224      |  64  | 680.665321  |
|        res2net101_26w_4s        | 128  | 639.663817  |
|     swsl_resnext101_32x16d      |  32  | 639.080732  |
|            nfnet_l0             | 128  | 603.972843  |
|            levit_128            | 1024 | 563.240453  |
|          gmixer_24_224          | 128  | 561.013088  |
|            pit_b_224            |  64  | 556.956289  |
|          gmlp_s16_224           | 128  |  553.96907  |
|        ese_vovnet19b_dw         | 256  | 551.193059  |
|          jx_nest_base           |  32  | 525.409965  |
|             dla102              | 128  | 523.266705  |
|         crossvit_9_240          | 256  | 519.389536  |
|           resnest101e           |  64  | 488.237382  |
|         poolformer_m36          |  64  | 468.415703  |
|        convmixer_768_32         |  32  | 441.825572  |
|           volo_d1_224           |  64  |  433.10414  |
|            hrnet_w18            | 128  | 431.757292  |
|          inception_v3           | 128  |  406.0987   |
|        adv_inception_v3         | 128  | 405.447642  |
|        res2net50_14w_8s         | 128  | 395.026913  |
|         visformer_small         | 128  | 389.198695  |
|          ghostnet_100           | 512  | 355.898419  |
|           res2next50            | 128  | 355.555635  |
|            mixnet_l             | 128  | 345.827795  |
|          pnasnet5large          |  16  | 334.732041  |
|            repvgg_a2            | 128  | 333.114519  |
|           tf_mixnet_l           | 128  | 331.978504  |
|        eca_halonext26ts         | 128  | 310.929453  |
|           fbnetc_100            | 512  | 304.436663  |
|       eca_botnext26ts_256       | 128  |  290.91009  |
|            gernet_l             | 128  | 290.261208  |
|           regnety_002           | 1024 | 280.463938  |
|        sebotnet33ts_256         |  64  |  279.03047  |
|          botnet26t_256          | 128  | 274.671404  |
|          resmlp_12_224          | 128  | 263.583227  |
|           mobilevit_s           |  64  | 258.779006  |
|           mnasnet_100           | 512  | 257.839289  |
|          cspdarknet53           |  64  | 257.588738  |
|            fbnetv3_b            | 256  | 239.836178  |
|           selecsls42b           | 128  |  231.08715  |
|      mobilenetv3_large_100      | 512  | 227.229916  |
|           rexnet_100            | 256  | 225.140985  |
|       tf_efficientnet_b0        | 128  |  118.2646   |
|            tinynet_a            | 128  |  86.530946  |
|         mobilenetv2_100         | 128  |  71.666488  |
|          spnasnet_100           | 128  |  65.37866   |
|            lcnet_050            | 256  |  26.974227  |
+---------------------------------+------+-------------+

@zxd1997066
Contributor

[cppwrapper_dynamic_shape] Performance Dashboard for float32 precision -- Single-core Single-thread (2024-04-21 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration: a forward and backward pass for training, or a forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint compression ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

SW information:

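+-------------------+--------+------------------------------------------+
| SW                | Branch | Commit                                   |
+-------------------+--------+------------------------------------------+
| Pytorch           | main   | bad8d25881d850eaf0b326f6ce5c78305e38c001 |
| Torchbench        | main   | d6015d42                                 |
| torchaudio        | main   | 2.2.0a0+ea437b3                          |
| torchtext         | main   | 0.16.0a0+b0ebddc                         |
| torchvision       | main   | 0.19.0a0+2c4665f                         |
| torchdata         | main   | 0.7.1a0+0790338                          |
| dynamo_benchmarks | main   | nightly                                  |
+-------------------+--------+------------------------------------------+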

HW information

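+------------------+----------------------------------------------------------------+
| Item             | Value                                                          |
+------------------+----------------------------------------------------------------+
| Manufacturer     | Amazon EC2                                                     |
| Product Name     | c6i.16xlarge                                                   |
| CPU Model        | Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz                  |
| Installed Memory | 128GB (1x128GB DDR4 3200 MT/s [Unknown])                       |
| OS               | Ubuntu 22.04.2 LTS                                             |
| Kernel           | 5.19.0-1022-aws                                                |
| Microcode        | 0xd000389                                                      |
| GCC              | gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0                      |
| GLIBC            | ldd (Ubuntu GLIBC 2.35-0ubuntu3.1) 2.35                        |
| Binutils         | GNU ld (GNU Binutils for Ubuntu) 2.38                          |
| Python           | Python 3.10.6                                                  |
| OpenSSL          | OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022) |
+------------------+----------------------------------------------------------------+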

Test command

export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
export OMP_NUM_THREADS=1

python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--core_list 0 --ncores_per_instance 1" --devices=cpu --dtypes=float32 --inference --compilers=inductor --batch_size=1 --threads 1 --extra-args="--timeout 9000"

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
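
The filtering step described above can be sketched roughly as follows. This is only an illustration, not the benchmark runner's actual code: the dictionary layout and the helper name `drop_accuracy_failures` are assumptions, and the sample rows are copied from the timm_models tables earlier in this thread.

```python
# Hypothetical helper (not part of the benchmark runner): keep only the
# models whose accuracy status is exactly "pass".
def drop_accuracy_failures(metrics, accuracy):
    return {
        name: value
        for name, value in metrics.items()
        if accuracy.get(name) == "pass"
    }


# Sample rows copied from the timm_models Accuracy table.
accuracy = {
    "fbnetc_100": "pass",
    "mnasnet_100": "pass",
    "resnest101e": "fail_accuracy",
    "coat_lite_mini": "fail_to_run",
}

# Sample rows copied from the timm_models Performance speedup table.
speedup = {
    "fbnetc_100": 3.941087,
    "mnasnet_100": 3.852914,
    "resnest101e": 2.21426,
}

kept = drop_accuracy_failures(speedup, accuracy)
print(sorted(kept))
```

Models with no accuracy record are dropped as well, which is a conservative choice made for this sketch.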

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 86%, 68/79 | 100%, 46/46 | 75%, 45/60  |
+----------+------------+-------------+-------------+
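
Each Passrate cell pairs a percentage with the underlying passed/total counts. A minimal sketch of that formatting is shown below; the function name `pass_rate` is hypothetical, and rounding to the nearest whole percent is an assumption that happens to match the three cells in the table.

```python
def pass_rate(passed, total):
    """Format a pass rate as '<percent>%, <passed>/<total>'."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Assumption: the percentage is rounded to the nearest whole number.
    return f"{100 * passed / total:.0f}%, {passed}/{total}"


# Counts taken from the Passrate table above.
for suite, passed, total in [
    ("torchbench", 68, 79),
    ("huggingface", 46, 46),
    ("timm_models", 45, 60),
]:
    print(suite, pass_rate(passed, total))
```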

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.57x    |    1.20x    |    1.54x    |
+----------+------------+-------------+-------------+
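
For readers unfamiliar with the metric, the geometric mean of per-model speedups can be computed as in the sketch below. The function name is hypothetical, and the three sample values are a handful of rows from the torchbench table in this comment, so the printed number is not the suite-level figure reported above.

```python
import math


def geometric_mean(values):
    """Geometric mean via the mean of logarithms (all values must be > 0)."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("need a non-empty sequence of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# A few per-model speedups from the torchbench table (not the full suite).
sample = {
    "squeezenet1_1": 3.587657,
    "resnet50": 1.765417,
    "hf_T5_base": 0.592218,
}
print(f"{geometric_mean(sample.values()):.2f}x")
```

The geometric mean is the usual choice for averaging ratios, since a 2x speedup and a 0.5x slowdown cancel out to 1x instead of averaging to 1.25x.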

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   55.57    |    37.40    |    34.44    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.86x    |    0.81x    |    0.82x    |
+----------+------------+-------------+-------------+
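
One way to read these ratios, assuming the ratio is defined as native PyTorch peak memory divided by inductor peak memory (an assumption consistent with "higher is better", not something stated in this comment), is sketched below. The function name and the byte counts are hypothetical.

```python
def compression_ratio(eager_peak_bytes, compiled_peak_bytes):
    """Assumed definition: eager peak memory / compiled peak memory."""
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled peak memory must be positive")
    return eager_peak_bytes / compiled_peak_bytes


# Hypothetical numbers: the compiled run peaks higher than the eager run,
# so the ratio falls below 1.0.
ratio = compression_ratio(860_000_000, 1_000_000_000)
print(f"{ratio:.2f}x")
```

Under this reading, a value below 1.0x means the compiled model used more peak memory than native PyTorch.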

torchbench suite with float32 precision


Performance speedup

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_isoneutral_mixing     |    1    | 56.775445 |
|     pyhpc_equation_of_state     |    1    | 29.957414 |
|          squeezenet1_1          |    1    | 3.587657  |
|         basic_gnn_sage          |    1    | 3.586685  |
|     functorch_maml_omniglot     |    1    | 3.444776  |
|          basic_gnn_gin          |    1    | 3.442157  |
|          maml_omniglot          |    5    |  2.8989   |
|          basic_gnn_gcn          |    1    | 2.861673  |
|           timm_nfnet            |    1    | 2.803839  |
|         opacus_cifar10          |    1    | 2.494537  |
|       shufflenet_v2_x1_0        |    1    |  2.31924  |
|      functorch_dp_cifar10       |    1    |  2.31835  |
|              dcgan              |    1    | 2.256601  |
|            resnet18             |    1    | 2.246083  |
|          lennard_jones          |    1    | 2.184502  |
|          mobilenet_v2           |    1    | 2.051139  |
|          timm_resnest           |    1    | 2.017242  |
|           mnasnet1_0            |    1    | 1.938797  |
|       mobilenet_v3_large        |    1    | 1.901117  |
|           densenet121           |    1    | 1.887915  |
|         phlippe_resnet          |    1    | 1.866538  |
|        phlippe_densenet         |    1    | 1.847067  |
|            resnet50             |    1    | 1.765417  |
|        timm_efficientnet        |    1    | 1.727114  |
|            resnet152            |    1    | 1.680913  |
|         LearningToPaint         |    1    | 1.622904  |
|           timm_vovnet           |    1    | 1.610113  |
|              llama              |    1    | 1.556233  |
|      doctr_reco_predictor       |    1    | 1.511601  |
|         resnext50_32x4d         |    1    |  1.50516  |
|           timm_regnet           |    1    | 1.493108  |
|              dlrm               |    1    | 1.485395  |
|              vgg16              |    1    |  1.44724  |
|        basic_gnn_edgecnn        |    1    | 1.402609  |
|             yolov3              |    1    | 1.381093  |
|             alexnet             |    1    | 1.367756  |
|          BERT_pytorch           |    1    |  1.31004  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1.300642  |
|       doctr_det_predictor       |    1    | 1.297189  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1.294304  |
|               drq               |    1    | 1.285922  |
|            hf_Albert            |    1    | 1.279996  |
|             hf_GPT2             |    1    | 1.251829  |
|              maml               |    1    | 1.246779  |
|            moondream            |    1    | 1.228772  |
|          hf_GPT2_large          |    1    | 1.228223  |
|     timm_vision_transformer     |    1    | 1.220839  |
|          fastNLP_Bert           |    1    | 1.197436  |
|         pytorch_stargan         |   16    | 1.191811  |
|  timm_vision_transformer_large  |    1    | 1.172134  |
|          hf_Bert_large          |    1    | 1.167108  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 1.165617  |
|           hf_BigBird            |    1    | 1.160507  |
|             hf_Bert             |    1    | 1.157964  |
|          hf_DistilBert          |    1    |  1.1363   |
|      torch_multimodal_clip      |    1    | 1.132179  |
|             hf_Bart             |    1    | 1.099708  |
|        hf_distil_whisper        |    1    | 1.083766  |
|       speech_transformer        |    1    | 1.072568  |
|          pytorch_unet           |    1    | 1.070186  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1.047552  |
|        soft_actor_critic        |   256   | 1.046845  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1.045639  |
|    detectron2_fcos_r_50_fpn     |    1    | 1.044499  |
|          hf_Longformer          |    1    | 1.018363  |
|             demucs              |    1    |  1.00435  |
|           tts_angular           |    1    | 0.999736  |
|     resnet50_quantized_qat      |    1    | 0.996327  |
|   mobilenet_v2_quantized_qat    |    1    | 0.988582  |
|     nvidia_deeprecommender      |    1    | 0.959485  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.913384  |
|           hf_Reformer           |    1    | 0.861134  |
|       Background_Matting        |    1    | 0.832634  |
|           hf_T5_large           |    1    | 0.792708  |
|              hf_T5              |    1    | 0.711471  |
|           hf_T5_base            |    1    | 0.592218  |
|              moco               |    0    |    0.0    |
|        timm_efficientdet        |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+
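For anyone who wants a single summary number from a speedup table like the one above, here is a minimal sketch. It assumes a geometric mean is the summary of interest and that rows with a 0.0 speedup (the bs = 0 models that failed to load) should be left out. The helper names `parse_rows` and `geomean_speedup` are hypothetical and not part of the benchmark scripts; the sample rows are copied from the table.

```python
import math

def parse_rows(table_text):
    """Parse '| name | bs | value |' rows, skipping borders and the header."""
    rows = []
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # border lines start with '+', blank lines are empty
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 3 or cells[0] == "name":
            continue
        rows.append((cells[0], int(cells[1]), float(cells[2])))
    return rows

def geomean_speedup(rows):
    """Geometric mean over rows with a positive speedup.

    Assumption: a 0.0 entry means the model produced no measurement,
    so it is excluded rather than treated as a zero speedup.
    """
    values = [value for _, _, value in rows if value > 0.0]
    return math.exp(sum(math.log(v) for v in values) / len(values))

SAMPLE = """
+---------------------------------+---------+-----------+
|            resnet18             |    1    | 2.246083  |
|              hf_T5              |    1    | 0.711471  |
|              moco               |    0    |    0.0    |
+---------------------------------+---------+-----------+
"""

rows = parse_rows(SAMPLE)
print(len(rows), geomean_speedup(rows))
```

Pasting the full table into `SAMPLE` gives the figure for the whole suite under the same assumptions.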

Accuracy

+---------------------------------+---------+--------------------+
|              name               |   bs    |      inductor      |
+---------------------------------+---------+--------------------+
|           hf_T5_large           |    1    |  pass_due_to_skip  |
|       Background_Matting        |    1    |  pass_due_to_skip  |
|  timm_vision_transformer_large  |    1    |  pass_due_to_skip  |
|              maml               |    1    |  pass_due_to_skip  |
|          hf_GPT2_large          |    1    |  pass_due_to_skip  |
|         basic_gnn_sage          |    1    |        pass        |
|          hf_Longformer          |    1    |        pass        |
| detectron2_fasterrcnn_r_101_dc5 |    1    |        pass        |
| detectron2_fasterrcnn_r_101_c4  |    1    |        pass        |
|             demucs              |    1    |        pass        |
|              dcgan              |    1    |        pass        |
|        basic_gnn_edgecnn        |    1    |        pass        |
|          basic_gnn_gin          |    1    |        pass        |
|          basic_gnn_gcn          |    1    |        pass        |
|    detectron2_fcos_r_50_fpn     |    1    |        pass        |
|         LearningToPaint         |    1    |        pass        |
|       doctr_det_predictor       |    1    |        pass        |
|           hf_Reformer           |    1    |        pass        |
| detectron2_fasterrcnn_r_50_dc5  |    1    |        pass        |
|  detectron2_fasterrcnn_r_50_c4  |    1    |        pass        |
|              dlrm               |    1    |        pass        |
|           hf_T5_base            |    1    |        pass        |
|      doctr_reco_predictor       |    1    |        pass        |
|            hf_Albert            |    1    |        pass        |
|             hf_GPT2             |    1    |        pass        |
|          hf_DistilBert          |    1    |        pass        |
|           hf_BigBird            |    1    |        pass        |
|          hf_Bert_large          |    1    |        pass        |
|             hf_Bert             |    1    |        pass        |
|             hf_Bart             |    1    |        pass        |
|               drq               |    1    |        pass        |
|             yolov3              |    1    |        pass        |
|      functorch_dp_cifar10       |    1    |        pass        |
|          fastNLP_Bert           |    1    |        pass        |
|     functorch_maml_omniglot     |    1    |        pass        |
|          lennard_jones          |    1    |        pass        |
|             alexnet             |    1    |        pass        |
|          pytorch_unet           |    1    |        pass        |
|  pytorch_CycleGAN_and_pix2pix   |    1    |        pass        |
|          BERT_pytorch           |    1    |        pass        |
|            moondream            |    1    |        pass        |
|     pyhpc_equation_of_state     |    1    |        pass        |
|         phlippe_resnet          |    1    |        pass        |
|        phlippe_densenet         |    1    |        pass        |
|         opacus_cifar10          |    1    |        pass        |
|     nvidia_deeprecommender      |    1    |        pass        |
|     resnet50_quantized_qat      |    1    |        pass        |
|       mobilenet_v3_large        |    1    |        pass        |
|   mobilenet_v2_quantized_qat    |    1    |        pass        |
| pyhpc_turbulent_kinetic_energy  | 1048576 |        pass        |
|              llama              |    1    |        pass        |
|        hf_distil_whisper        |    1    |        pass        |
|     pyhpc_isoneutral_mixing     |    1    |        pass        |
|          maml_omniglot          |    5    |        pass        |
|              hf_T5              |    1    |        pass        |
|              vgg16              |    1    |        pass        |
|           tts_angular           |    1    |        pass        |
|      torch_multimodal_clip      |    1    |        pass        |
|           timm_vovnet           |    1    |        pass        |
|     timm_vision_transformer     |    1    |        pass        |
|          timm_resnest           |    1    |        pass        |
|           timm_nfnet            |    1    |        pass        |
|         pytorch_stargan         |   16    |        pass        |
|        timm_efficientnet        |    1    |        pass        |
|          squeezenet1_1          |    1    |        pass        |
|       speech_transformer        |    1    |        pass        |
|        soft_actor_critic        |   256   |        pass        |
|           timm_regnet           |    1    |        pass        |
|         DALLE2_pytorch          |    0    | model_fail_to_load |
|              moco               |    0    | model_fail_to_load |
|        timm_efficientdet        |    0    | model_fail_to_load |
| detectron2_fasterrcnn_r_101_fpn |    1    |    fail_to_run     |
|         vision_maskrcnn         |    1    |    fail_to_run     |
|           Super_SloMo           |    1    |    fail_to_run     |
| detectron2_fasterrcnn_r_50_fpn  |    1    |    fail_to_run     |
|         resnext50_32x4d         |    1    |   fail_accuracy    |
|            resnet152            |    1    |   fail_accuracy    |
|       shufflenet_v2_x1_0        |    1    |   fail_accuracy    |
|           mnasnet1_0            |    1    |   fail_accuracy    |
|          mobilenet_v2           |    1    |   fail_accuracy    |
|            resnet18             |    1    |   fail_accuracy    |
|            resnet50             |    1    |   fail_accuracy    |
|           densenet121           |    1    |   fail_accuracy    |
+---------------------------------+---------+--------------------+
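The accuracy table mixes several status strings (pass, pass_due_to_skip, fail_accuracy, fail_to_run, model_fail_to_load). A short sketch for tallying them is below. The function name `status_counts` is hypothetical and the sample rows are copied from the table above; this is not how the dashboard itself is generated.

```python
from collections import Counter

def status_counts(table_text):
    """Count the status column of an accuracy table in '| name | bs | status |' form."""
    counts = Counter()
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip '+---+' borders and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) == 3 and cells[0] != "name":
            counts[cells[2]] += 1
    return counts

SAMPLE = """
|              name               |   bs    |      inductor      |
|           hf_T5_large           |    1    |  pass_due_to_skip  |
|         basic_gnn_sage          |    1    |        pass        |
|          hf_Longformer          |    1    |        pass        |
|              moco               |    0    | model_fail_to_load |
|           Super_SloMo           |    1    |    fail_to_run     |
|            resnet18             |    1    |   fail_accuracy    |
"""

counts = status_counts(SAMPLE)
for status, n in counts.most_common():
    print(f"{status}: {n}")
```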

Compilation latency (sec)

+---------------------------------+---------+------------+
|              name               |   bs    |  inductor  |
+---------------------------------+---------+------------+
|           hf_BigBird            |    1    | 478.175128 |
|    detectron2_fcos_r_50_fpn     |    1    | 413.614773 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 226.277581 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 220.242276 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 209.798396 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 204.723185 |
|              maml               |    1    | 142.833579 |
|           hf_T5_large           |    1    | 111.180205 |
|           hf_T5_base            |    1    | 105.68489  |
|       speech_transformer        |    1    | 90.345207  |
|          hf_Longformer          |    1    | 84.954469  |
|           hf_Reformer           |    1    | 81.428985  |
|          basic_gnn_gcn          |    1    | 62.003793  |
|          fastNLP_Bert           |    1    | 54.491161  |
|            resnet152            |    1    | 53.443071  |
|           densenet121           |    1    |  50.25351  |
|  timm_vision_transformer_large  |    1    | 49.796346  |
|       doctr_det_predictor       |    1    | 45.717841  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 42.848896  |
|          hf_Bert_large          |    1    | 42.681337  |
|          hf_GPT2_large          |    1    | 41.721205  |
|            moondream            |    1    | 41.326625  |
|        hf_distil_whisper        |    1    | 40.553426  |
|      torch_multimodal_clip      |    1    | 36.711515  |
|             demucs              |    1    | 35.056946  |
|           timm_regnet           |    1    | 33.680725  |
|           timm_nfnet            |    1    | 32.626198  |
|              hf_T5              |    1    | 31.780141  |
|             hf_Bart             |    1    | 30.745214  |
|       Background_Matting        |    1    |  29.50512  |
|             yolov3              |    1    | 29.449724  |
|        timm_efficientnet        |    1    | 28.463651  |
|          BERT_pytorch           |    1    | 28.166348  |
|             hf_Bert             |    1    | 26.703023  |
|        phlippe_densenet         |    1    | 26.185213  |
|      doctr_reco_predictor       |    1    | 26.131495  |
|       shufflenet_v2_x1_0        |    1    | 25.341691  |
|            hf_Albert            |    1    | 24.868069  |
|             hf_GPT2             |    1    | 24.338008  |
|         pytorch_stargan         |   16    | 24.279376  |
|       mobilenet_v3_large        |    1    |  23.57727  |
|     timm_vision_transformer     |    1    | 23.555338  |
|              llama              |    1    | 23.425623  |
|         opacus_cifar10          |    1    | 21.888043  |
|            resnet50             |    1    | 21.775742  |
|         resnext50_32x4d         |    1    | 21.753633  |
|           timm_vovnet           |    1    | 21.670528  |
|          timm_resnest           |    1    | 21.346667  |
|          mobilenet_v2           |    1    | 21.131828  |
|           mnasnet1_0            |    1    | 20.868357  |
|          hf_DistilBert          |    1    | 20.771632  |
|      functorch_dp_cifar10       |    1    | 20.609299  |
|     pyhpc_isoneutral_mixing     |    1    | 20.548868  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 18.956895  |
|          pytorch_unet           |    1    |  18.64347  |
|          squeezenet1_1          |    1    | 18.057882  |
|            resnet18             |    1    | 17.455359  |
|         LearningToPaint         |    1    | 17.235456  |
|              vgg16              |    1    | 17.195135  |
|     pyhpc_equation_of_state     |    1    | 17.156086  |
|         phlippe_resnet          |    1    |  17.04454  |
|             alexnet             |    1    | 16.837857  |
|               drq               |    1    | 16.116831  |
|     functorch_maml_omniglot     |    1    | 15.958993  |
|          maml_omniglot          |    5    | 15.866956  |
|     nvidia_deeprecommender      |    1    | 15.678137  |
|              dlrm               |    1    | 15.310724  |
|              dcgan              |    1    | 15.060839  |
|          basic_gnn_gin          |    1    | 14.759822  |
|        soft_actor_critic        |   256   | 14.730881  |
|         basic_gnn_sage          |    1    | 14.711204  |
|        basic_gnn_edgecnn        |    1    | 14.705098  |
|          lennard_jones          |    1    | 14.258884  |
|           tts_angular           |    1    |  14.06379  |
|   mobilenet_v2_quantized_qat    |    1    |  0.093643  |
|     resnet50_quantized_qat      |    1    |  0.067853  |
|        timm_efficientdet        |    0    |    0.0     |
|              moco               |    0    |    0.0     |
|         DALLE2_pytorch          |    0    |    0.0     |
+---------------------------------+---------+------------+

Peak Memory Compression Ratio

+---------------------------------+---------+----------+
|              name               |   bs    | inductor |
+---------------------------------+---------+----------+
|              dlrm               |    1    | 0.988115 |
|           hf_T5_base            |    1    | 0.987952 |
|       Background_Matting        |    1    | 0.983343 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 0.982432 |
|             demucs              |    1    | 0.982399 |
|     resnet50_quantized_qat      |    1    | 0.982037 |
|          pytorch_unet           |    1    | 0.977276 |
|          hf_GPT2_large          |    1    | 0.974504 |
|   mobilenet_v2_quantized_qat    |    1    | 0.973175 |
|       doctr_det_predictor       |    1    | 0.971353 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 0.97125  |
|    detectron2_fcos_r_50_fpn     |    1    | 0.960247 |
|        basic_gnn_edgecnn        |    1    | 0.955071 |
|         LearningToPaint         |    1    | 0.949146 |
|           hf_BigBird            |    1    | 0.944876 |
|         pytorch_stargan         |   16    | 0.94434  |
|      doctr_reco_predictor       |    1    |  0.9432  |
|              llama              |    1    | 0.920446 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 0.91696  |
|          basic_gnn_gin          |    1    | 0.908988 |
|      torch_multimodal_clip      |    1    | 0.908959 |
|          basic_gnn_gcn          |    1    | 0.90763  |
|         basic_gnn_sage          |    1    | 0.904145 |
|        hf_distil_whisper        |    1    | 0.893406 |
|           tts_angular           |    1    | 0.888889 |
|        soft_actor_critic        |   256   | 0.888478 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 0.886882 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 0.88682  |
|         opacus_cifar10          |    1    | 0.884248 |
|        timm_efficientnet        |    1    | 0.880037 |
|          mobilenet_v2           |    1    | 0.877347 |
|           mnasnet1_0            |    1    | 0.86585  |
|          squeezenet1_1          |    1    | 0.862588 |
|          maml_omniglot          |    5    | 0.859705 |
|          lennard_jones          |    1    | 0.859393 |
|              dcgan              |    1    | 0.859141 |
|          timm_resnest           |    1    | 0.856739 |
|     functorch_maml_omniglot     |    1    | 0.855941 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.854812 |
|          fastNLP_Bert           |    1    | 0.85023  |
|       mobilenet_v3_large        |    1    | 0.848345 |
|       shufflenet_v2_x1_0        |    1    | 0.844709 |
|         phlippe_resnet          |    1    | 0.841935 |
|     pyhpc_equation_of_state     |    1    | 0.83167  |
|       speech_transformer        |    1    | 0.827006 |
|        phlippe_densenet         |    1    | 0.824787 |
|         resnext50_32x4d         |    1    | 0.815825 |
|           timm_nfnet            |    1    | 0.815658 |
|          hf_Longformer          |    1    | 0.806742 |
|     pyhpc_isoneutral_mixing     |    1    | 0.806718 |
|          hf_Bert_large          |    1    | 0.806521 |
|           hf_T5_large           |    1    | 0.804486 |
|            hf_Albert            |    1    | 0.802936 |
|     timm_vision_transformer     |    1    | 0.80028  |
|             hf_Bert             |    1    | 0.797144 |
|            moondream            |    1    | 0.796748 |
|              maml               |    1    | 0.788443 |
|             yolov3              |    1    | 0.782409 |
|          BERT_pytorch           |    1    | 0.780508 |
|          hf_DistilBert          |    1    | 0.774319 |
|            resnet50             |    1    | 0.77285  |
|            resnet18             |    1    | 0.769231 |
|             hf_GPT2             |    1    | 0.766595 |
|           densenet121           |    1    | 0.764895 |
|           timm_regnet           |    1    | 0.764585 |
|               drq               |    1    | 0.762046 |
|           timm_vovnet           |    1    | 0.761647 |
|             hf_Bart             |    1    | 0.756035 |
|      functorch_dp_cifar10       |    1    |   0.75   |
|              hf_T5              |    1    | 0.749995 |
|           hf_Reformer           |    1    | 0.745011 |
|             alexnet             |    1    | 0.741474 |
|  timm_vision_transformer_large  |    1    | 0.732283 |
|              vgg16              |    1    | 0.724355 |
|            resnet152            |    1    | 0.695833 |
|     nvidia_deeprecommender      |    1    | 0.673728 |
|        timm_efficientdet        |    0    |   0.0    |
|              moco               |    0    |   0.0    |
|         DALLE2_pytorch          |    0    |   0.0    |
+---------------------------------+---------+----------+
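To read the ratios above as extra memory, here is a rough sketch. It assumes the ratio is defined as eager peak memory divided by inductor peak memory, so values below 1 mean inductor used more memory than eager. That definition is an assumption and is not stated in this comment; `extra_memory_fraction` is a hypothetical helper.

```python
def extra_memory_fraction(compression_ratio):
    """Extra peak memory used by inductor, as a fraction of eager peak memory.

    Assumption: compression_ratio = eager_peak / inductor_peak.
    Returns None for a ratio of 0.0, which marks models with no measurement.
    """
    if compression_ratio <= 0.0:
        return None
    return 1.0 / compression_ratio - 1.0

# Values copied from the table above.
for name, ratio in [("functorch_dp_cifar10", 0.75), ("dlrm", 0.988115), ("moco", 0.0)]:
    extra = extra_memory_fraction(ratio)
    label = "n/a" if extra is None else f"{extra:.1%}"
    print(f"{name}: {label}")
```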

Absolute latency (ms)

+---------------------------------+---------+--------------+
|              name               |   bs    |   inductor   |
+---------------------------------+---------+--------------+
|           hf_T5_base            |    1    | 26469.38237  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 11779.636971 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 11200.368235 |
|          hf_GPT2_large          |    1    | 10133.546611 |
|           hf_T5_large           |    1    | 7461.328999  |
|            moondream            |    1    | 7310.961182  |
|        hf_distil_whisper        |    1    | 6926.595396  |
|       Background_Matting        |    1    | 6694.436309  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 5738.064964  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 5054.699644  |
|          pytorch_unet           |    1    | 4720.720295  |
|  timm_vision_transformer_large  |    1    |  2784.29623  |
|    detectron2_fcos_r_50_fpn     |    1    | 2528.455195  |
|             demucs              |    1    |  2346.29628  |
|         pytorch_stargan         |   16    | 2011.341866  |
|          hf_Bert_large          |    1    | 1757.051187  |
|       doctr_det_predictor       |    1    | 1726.964802  |
|           hf_BigBird            |    1    | 1464.981592  |
|      torch_multimodal_clip      |    1    | 1250.032753  |
|          hf_Longformer          |    1    |  1110.03399  |
|             hf_Bart             |    1    |  880.279219  |
|              hf_T5              |    1    |  758.933901  |
|             hf_Bert             |    1    |  678.223355  |
|       speech_transformer        |    1    |  670.883464  |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  623.570923  |
|            hf_Albert            |    1    |  573.512861  |
|          fastNLP_Bert           |    1    |  524.20188   |
|             yolov3              |    1    |  427.106025  |
|          hf_DistilBert          |    1    |  414.035894  |
|           hf_Reformer           |    1    |  411.754538  |
|             hf_GPT2             |    1    |  354.224886  |
|        basic_gnn_edgecnn        |    1    |  230.350348  |
| pyhpc_turbulent_kinetic_energy  | 1048576 |  209.77799   |
|              vgg16              |    1    |  189.304296  |
|           timm_regnet           |    1    |  148.249385  |
|          BERT_pytorch           |    1    |  139.425546  |
|            resnet152            |    1    |  135.762501  |
|           timm_nfnet            |    1    |  95.177212   |
|           timm_vovnet           |    1    |  79.023788   |
|              maml               |    1    |  74.450538   |
|     nvidia_deeprecommender      |    1    |  58.115344   |
|     timm_vision_transformer     |    1    |  57.521969   |
|         resnext50_32x4d         |    1    |  56.247899   |
|           tts_angular           |    1    |   54.02811   |
|            resnet50             |    1    |  51.185995   |
|           densenet121           |    1    |  39.507617   |
|          basic_gnn_gcn          |    1    |  35.402401   |
|          timm_resnest           |    1    |  32.815518   |
|      doctr_reco_predictor       |    1    |  23.040855   |
|             alexnet             |    1    |  22.035443   |
|            resnet18             |    1    |  21.563953   |
|              llama              |    1    |   20.39326   |
|     resnet50_quantized_qat      |    1    |  17.196109   |
|          basic_gnn_gin          |    1    |  16.756453   |
|         basic_gnn_sage          |    1    |  16.378553   |
|        timm_efficientnet        |    1    |   12.31737   |
|         LearningToPaint         |    1    |   9.509714   |
|           mnasnet1_0            |    1    |   7.141764   |
|          mobilenet_v2           |    1    |   6.984315   |
|       mobilenet_v3_large        |    1    |   6.624153   |
|   mobilenet_v2_quantized_qat    |    1    |   6.000043   |
|          squeezenet1_1          |    1    |   5.514184   |
|       shufflenet_v2_x1_0        |    1    |   4.914999   |
|        soft_actor_critic        |   256   |   3.380855   |
|        phlippe_densenet         |    1    |   2.647069   |
|      functorch_dp_cifar10       |    1    |   2.176118   |
|         opacus_cifar10          |    1    |   2.132657   |
|               drq               |    1    |   1.790686   |
|              dcgan              |    1    |   1.641117   |
|         phlippe_resnet          |    1    |   1.212071   |
|     functorch_maml_omniglot     |    1    |   0.830444   |
|          maml_omniglot          |    5    |   0.754271   |
|              dlrm               |    1    |    0.5437    |
|     pyhpc_isoneutral_mixing     |    1    |   0.048479   |
|     pyhpc_equation_of_state     |    1    |   0.035872   |
|          lennard_jones          |    1    |   0.031783   |
|              moco               |    0    |     0.0      |
|         DALLE2_pytorch          |    0    |     0.0      |
|        timm_efficientdet        |    0    |     0.0      |
+---------------------------------+---------+--------------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|          MobileBertForMaskedLM          | 1  | 2.085892 |
|     MobileBertForQuestionAnswering      | 1  |  1.7379  |
|            XLNetLMHeadModel             | 1  | 1.383574 |
|         Speech2Text2ForCausalLM         | 1  | 1.364712 |
|            YituTechConvBert             | 1  | 1.331075 |
|      GPT2ForSequenceClassification      | 1  | 1.323234 |
| BlenderbotSmallForConditionalGeneration | 1  | 1.31055  |
|          DistilBertForMaskedLM          | 1  | 1.302419 |
|     DistilBertForQuestionAnswering      | 1  | 1.297592 |
|       BlenderbotSmallForCausalLM        | 1  | 1.291929 |
|       DebertaForQuestionAnswering       | 1  | 1.26157  |
|       MT5ForConditionalGeneration       | 1  | 1.251497 |
|          BlenderbotForCausalLM          | 1  | 1.250015 |
|     PegasusForConditionalGeneration     | 1  | 1.248669 |
|     M2M100ForConditionalGeneration      | 1  | 1.246622 |
|             XGLMForCausalLM             | 1  | 1.245078 |
|           DebertaForMaskedLM            | 1  | 1.239016 |
|           PegasusForCausalLM            | 1  | 1.237744 |
|               GoogleFnet                | 1  | 1.222821 |
|            AlbertForMaskedLM            | 1  | 1.206681 |
|       AlbertForQuestionAnswering        | 1  | 1.203694 |
|               DistillGPT2               | 1  | 1.180611 |
|    LayoutLMForSequenceClassification    | 1  | 1.180211 |
|           ElectraForCausalLM            | 1  | 1.179982 |
|             BertForMaskedLM             | 1  | 1.178546 |
|    MegatronBertForQuestionAnswering     | 1  | 1.173778 |
|                CamemBert                | 1  | 1.168778 |
|         MegatronBertForCausalLM         | 1  | 1.168394 |
|           RobertaForCausalLM            | 1  | 1.168375 |
|       ElectraForQuestionAnswering       | 1  | 1.162817 |
|        BertForQuestionAnswering         | 1  | 1.162085 |
|           LayoutLMForMaskedLM           | 1  | 1.16201  |
|      DebertaV2ForQuestionAnswering      | 1  | 1.158336 |
|          DebertaV2ForMaskedLM           | 1  | 1.152185 |
|            TrOCRForCausalLM             | 1  | 1.151527 |
|       RobertaForQuestionAnswering       | 1  | 1.140947 |
|     PLBartForConditionalGeneration      | 1  | 1.088241 |
|      MBartForConditionalGeneration      | 1  | 1.068061 |
|             BartForCausalLM             | 1  | 1.061262 |
|      BartForConditionalGeneration       | 1  | 1.052916 |
|             OPTForCausalLM              | 1  | 1.031638 |
|            PLBartForCausalLM            | 1  | 1.022988 |
|            MBartForCausalLM             | 1  | 1.009923 |
|          AllenaiLongformerBase          | 1  | 0.965982 |
|                 T5Small                 | 1  | 0.623194 |
|       T5ForConditionalGeneration        | 1  | 0.621746 |
+-----------------------------------------+----+----------+

Accuracy

+-----------------------------------------+----+------------------+
|                  name                   | bs |     inductor     |
+-----------------------------------------+----+------------------+
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip |
|                CamemBert                | 1  |       pass       |
|       AlbertForQuestionAnswering        | 1  |       pass       |
|           DebertaForMaskedLM            | 1  |       pass       |
|          AllenaiLongformerBase          | 1  |       pass       |
|             BartForCausalLM             | 1  |       pass       |
|      BartForConditionalGeneration       | 1  |       pass       |
|             BertForMaskedLM             | 1  |       pass       |
|        BertForQuestionAnswering         | 1  |       pass       |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |
|       DebertaForQuestionAnswering       | 1  |       pass       |
|           LayoutLMForMaskedLM           | 1  |       pass       |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |
|          DistilBertForMaskedLM          | 1  |       pass       |
|     DistilBertForQuestionAnswering      | 1  |       pass       |
|               DistillGPT2               | 1  |       pass       |
|           ElectraForCausalLM            | 1  |       pass       |
|       ElectraForQuestionAnswering       | 1  |       pass       |
|      GPT2ForSequenceClassification      | 1  |       pass       |
|               GoogleFnet                | 1  |       pass       |
|    LayoutLMForSequenceClassification    | 1  |       pass       |
|            MBartForCausalLM             | 1  |       pass       |
|            XLNetLMHeadModel             | 1  |       pass       |
|             XGLMForCausalLM             | 1  |       pass       |
|            AlbertForMaskedLM            | 1  |       pass       |
|      MBartForConditionalGeneration      | 1  |       pass       |
|       MT5ForConditionalGeneration       | 1  |       pass       |
|         MegatronBertForCausalLM         | 1  |       pass       |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |
|          MobileBertForMaskedLM          | 1  |       pass       |
|     MobileBertForQuestionAnswering      | 1  |       pass       |
|             OPTForCausalLM              | 1  |       pass       |
|            PLBartForCausalLM            | 1  |       pass       |
|     PLBartForConditionalGeneration      | 1  |       pass       |
|           PegasusForCausalLM            | 1  |       pass       |
|     M2M100ForConditionalGeneration      | 1  |       pass       |
|     PegasusForConditionalGeneration     | 1  |       pass       |
|           RobertaForCausalLM            | 1  |       pass       |
|       RobertaForQuestionAnswering       | 1  |       pass       |
|         Speech2Text2ForCausalLM         | 1  |       pass       |
|       T5ForConditionalGeneration        | 1  |       pass       |
|                 T5Small                 | 1  |       pass       |
|            TrOCRForCausalLM             | 1  |       pass       |
|            YituTechConvBert             | 1  |       pass       |
+-----------------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------------+----+------------+
|                  name                   | bs |  inductor  |
+-----------------------------------------+----+------------+
|          MobileBertForMaskedLM          | 1  | 122.565586 |
|     MobileBertForQuestionAnswering      | 1  | 120.547243 |
|          AllenaiLongformerBase          | 1  | 92.625341  |
|      BartForConditionalGeneration       | 1  | 56.538442  |
|     PegasusForConditionalGeneration     | 1  | 49.745877  |
|     M2M100ForConditionalGeneration      | 1  | 49.580941  |
|      MBartForConditionalGeneration      | 1  | 49.256905  |
|          BlenderbotForCausalLM          | 1  | 47.572154  |
|            XLNetLMHeadModel             | 1  | 46.337159  |
|             XGLMForCausalLM             | 1  | 46.307666  |
|         MegatronBertForCausalLM         | 1  | 43.696453  |
|          DebertaV2ForMaskedLM           | 1  | 43.658679  |
|      DebertaV2ForQuestionAnswering      | 1  | 43.624622  |
|    MegatronBertForQuestionAnswering     | 1  | 43.348654  |
|       MT5ForConditionalGeneration       | 1  | 42.407179  |
| BlenderbotSmallForConditionalGeneration | 1  | 38.589726  |
|       T5ForConditionalGeneration        | 1  | 36.266575  |
|                 T5Small                 | 1  | 36.220696  |
|            YituTechConvBert             | 1  | 35.771282  |
|     PLBartForConditionalGeneration      | 1  | 31.842087  |
|            MBartForCausalLM             | 1  | 28.839078  |
|            TrOCRForCausalLM             | 1  | 28.804611  |
|           PegasusForCausalLM            | 1  |  28.46897  |
|             OPTForCausalLM              | 1  | 28.178171  |
|           ElectraForCausalLM            | 1  | 27.377208  |
|       ElectraForQuestionAnswering       | 1  | 27.046133  |
|                CamemBert                | 1  | 27.020528  |
|           RobertaForCausalLM            | 1  | 27.014805  |
|           LayoutLMForMaskedLM           | 1  | 26.990287  |
|       RobertaForQuestionAnswering       | 1  |  26.94923  |
|             BertForMaskedLM             | 1  | 26.896743  |
|        BertForQuestionAnswering         | 1  | 26.797579  |
|           DebertaForMaskedLM            | 1  | 26.770029  |
|       DebertaForQuestionAnswering       | 1  | 26.721506  |
|    LayoutLMForSequenceClassification    | 1  | 26.656438  |
|             BartForCausalLM             | 1  | 26.262212  |
|       BlenderbotSmallForCausalLM        | 1  | 23.312397  |
|      GPT2ForSequenceClassification      | 1  | 23.199596  |
|               GoogleFnet                | 1  | 21.763865  |
|            PLBartForCausalLM            | 1  | 21.448745  |
|     DistilBertForQuestionAnswering      | 1  |  21.12008  |
|          DistilBertForMaskedLM          | 1  | 21.101711  |
|         Speech2Text2ForCausalLM         | 1  | 21.003086  |
|               DistillGPT2               | 1  | 19.435826  |
|            AlbertForMaskedLM            | 1  | 17.382242  |
|       AlbertForQuestionAnswering        | 1  | 17.336577  |
+-----------------------------------------+----+------------+

Peak Memory Compression Ratio

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|             OPTForCausalLM              | 1  | 0.98617  |
|      MBartForConditionalGeneration      | 1  | 0.977017 |
|      GPT2ForSequenceClassification      | 1  | 0.952319 |
|          AllenaiLongformerBase          | 1  | 0.94955  |
|            MBartForCausalLM             | 1  | 0.926021 |
|            XLNetLMHeadModel             | 1  | 0.910649 |
|       T5ForConditionalGeneration        | 1  | 0.906015 |
|     PLBartForConditionalGeneration      | 1  | 0.905952 |
|                 T5Small                 | 1  | 0.905943 |
|            PLBartForCausalLM            | 1  | 0.903065 |
|       DebertaForQuestionAnswering       | 1  | 0.873127 |
|               GoogleFnet                | 1  | 0.854071 |
|       RobertaForQuestionAnswering       | 1  | 0.849243 |
|        BertForQuestionAnswering         | 1  | 0.844138 |
|       ElectraForQuestionAnswering       | 1  | 0.843787 |
|    LayoutLMForSequenceClassification    | 1  | 0.841914 |
|      DebertaV2ForQuestionAnswering      | 1  | 0.835871 |
|    MegatronBertForQuestionAnswering     | 1  | 0.832504 |
|               DistillGPT2               | 1  | 0.830961 |
|           DebertaForMaskedLM            | 1  | 0.819384 |
|           LayoutLMForMaskedLM           | 1  | 0.814816 |
|         Speech2Text2ForCausalLM         | 1  | 0.814512 |
|                CamemBert                | 1  | 0.812392 |
|         MegatronBertForCausalLM         | 1  | 0.811932 |
|           RobertaForCausalLM            | 1  | 0.810668 |
|             BertForMaskedLM             | 1  | 0.806909 |
|     DistilBertForQuestionAnswering      | 1  | 0.800998 |
|           ElectraForCausalLM            | 1  |  0.7992  |
|          BlenderbotForCausalLM          | 1  | 0.798053 |
|          DebertaV2ForMaskedLM           | 1  | 0.796472 |
|             BartForCausalLM             | 1  | 0.789957 |
|            TrOCRForCausalLM             | 1  | 0.786982 |
|       MT5ForConditionalGeneration       | 1  | 0.781112 |
|      BartForConditionalGeneration       | 1  | 0.777543 |
|            YituTechConvBert             | 1  | 0.776104 |
|       BlenderbotSmallForCausalLM        | 1  | 0.762298 |
|           PegasusForCausalLM            | 1  | 0.750355 |
|          DistilBertForMaskedLM          | 1  | 0.746445 |
| BlenderbotSmallForConditionalGeneration | 1  | 0.737752 |
|     MobileBertForQuestionAnswering      | 1  | 0.733103 |
|     PegasusForConditionalGeneration     | 1  | 0.715788 |
|     M2M100ForConditionalGeneration      | 1  | 0.705872 |
|             XGLMForCausalLM             | 1  | 0.705763 |
|          MobileBertForMaskedLM          | 1  | 0.701622 |
|            AlbertForMaskedLM            | 1  | 0.44789  |
|       AlbertForQuestionAnswering        | 1  | 0.44352  |
+-----------------------------------------+----+----------+

Absolute latency (ms)

+-----------------------------------------+----+--------------+
|                  name                   | bs |   inductor   |
+-----------------------------------------+----+--------------+
|            AlbertForMaskedLM            | 1  | 12732.369735 |
|       AlbertForQuestionAnswering        | 1  | 12692.146884 |
|      MBartForConditionalGeneration      | 1  | 6125.456976  |
|      BartForConditionalGeneration       | 1  |  5748.17409  |
|             OPTForCausalLM              | 1  | 5213.425431  |
|          DebertaV2ForMaskedLM           | 1  | 5046.285601  |
|      DebertaV2ForQuestionAnswering      | 1  |  3954.61823  |
|            XLNetLMHeadModel             | 1  | 3111.247388  |
|            MBartForCausalLM             | 1  |  3040.74034  |
|          BlenderbotForCausalLM          | 1  | 2626.901826  |
|             BartForCausalLM             | 1  | 2545.015466  |
|       T5ForConditionalGeneration        | 1  |  2479.41544  |
|                 T5Small                 | 1  | 2457.076078  |
|          AllenaiLongformerBase          | 1  |  2408.36028  |
|     PLBartForConditionalGeneration      | 1  | 2174.090061  |
|         MegatronBertForCausalLM         | 1  | 2035.673808  |
|    MegatronBertForQuestionAnswering     | 1  | 1855.596213  |
|      GPT2ForSequenceClassification      | 1  | 1313.876308  |
|            PLBartForCausalLM            | 1  | 1217.471062  |
|             XGLMForCausalLM             | 1  |  832.739533  |
|           DebertaForMaskedLM            | 1  |  784.818728  |
|           RobertaForCausalLM            | 1  |  776.671268  |
|     M2M100ForConditionalGeneration      | 1  |  711.593943  |
|                CamemBert                | 1  |  688.916308  |
|           LayoutLMForMaskedLM           | 1  |  683.362061  |
|             BertForMaskedLM             | 1  |  680.959531  |
|            YituTechConvBert             | 1  |  668.837935  |
|     PegasusForConditionalGeneration     | 1  |  602.433977  |
|            TrOCRForCausalLM             | 1  |  586.156605  |
|       DebertaForQuestionAnswering       | 1  |  557.552012  |
|       RobertaForQuestionAnswering       | 1  |  551.589131  |
|        BertForQuestionAnswering         | 1  |  545.471605  |
|    LayoutLMForSequenceClassification    | 1  |  541.218104  |
|               DistillGPT2               | 1  |  507.052972  |
|               GoogleFnet                | 1  |  472.095481  |
|       MT5ForConditionalGeneration       | 1  |  301.992143  |
|           PegasusForCausalLM            | 1  |  299.791492  |
| BlenderbotSmallForConditionalGeneration | 1  |  143.750454  |
|           ElectraForCausalLM            | 1  |  135.422008  |
|          DistilBertForMaskedLM          | 1  |  99.423357   |
|       ElectraForQuestionAnswering       | 1  |   93.11353   |
|       BlenderbotSmallForCausalLM        | 1  |  83.987355   |
|          MobileBertForMaskedLM          | 1  |  63.617416   |
|     DistilBertForQuestionAnswering      | 1  |  63.363023   |
|     MobileBertForQuestionAnswering      | 1  |  36.761912   |
|         Speech2Text2ForCausalLM         | 1  |  18.226191   |
+-----------------------------------------+----+--------------+

timm_models suite with float32 precision

Performance speedup

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          pnasnet5large          | 1  | 2.45117  |
|          inception_v3           | 1  | 2.275755 |
|       gluon_inception_v3        | 1  | 2.251344 |
|        adv_inception_v3         | 1  | 2.232316 |
|           dm_nfnet_f0           | 1  | 2.229979 |
|            nfnet_l0             | 1  | 2.198749 |
|          ghostnet_100           | 1  | 2.15491  |
|         mobilenetv2_100         | 1  | 2.131096 |
|          spnasnet_100           | 1  | 2.072692 |
|            levit_128            | 1  | 2.034356 |
|            lcnet_050            | 1  | 2.033662 |
|           mnasnet_100           | 1  | 2.022395 |
|           fbnetc_100            | 1  | 1.997452 |
|            repvgg_a2            | 1  | 1.991048 |
|            hrnet_w18            | 1  | 1.990854 |
|      mobilenetv3_large_100      | 1  | 1.955342 |
|           regnety_002           | 1  | 1.89346  |
|           rexnet_100            | 1  | 1.809458 |
|       tf_efficientnet_b0        | 1  | 1.805205 |
|            fbnetv3_b            | 1  | 1.804086 |
|           selecsls42b           | 1  | 1.789008 |
|             dla102              | 1  | 1.72376  |
|        ese_vovnet19b_dw         | 1  | 1.713475 |
|            tinynet_a            | 1  | 1.699406 |
|          botnet26t_256          | 1  | 1.693723 |
|       eca_botnext26ts_256       | 1  | 1.674356 |
|          cspdarknet53           | 1  | 1.634206 |
|           resnest101e           | 1  | 1.629079 |
|        eca_halonext26ts         | 1  | 1.571609 |
|           res2next50            | 1  | 1.559975 |
|        res2net50_14w_8s         | 1  | 1.540462 |
|         poolformer_m36          | 1  | 1.534682 |
|           volo_d1_224           | 1  | 1.502508 |
|           mobilevit_s           | 1  | 1.501608 |
|        res2net101_26w_4s        | 1  | 1.499298 |
|           tf_mixnet_l           | 1  | 1.485762 |
|         visformer_small         | 1  | 1.413458 |
|           convit_base           | 1  | 1.386255 |
|          gmixer_24_224          | 1  | 1.384483 |
|            mixnet_l             | 1  | 1.362373 |
|     swsl_resnext101_32x16d      | 1  | 1.347222 |
|        twins_pcpvt_base         | 1  | 1.340592 |
|            gernet_l             | 1  | 1.316146 |
|  swin_base_patch4_window7_224   | 1  | 1.29011  |
|      beit_base_patch16_224      | 1  | 1.279615 |
|          resmlp_12_224          | 1  | 1.267181 |
|        convmixer_768_32         | 1  | 1.257619 |
|             dpn107              | 1  | 1.216123 |
|          mixer_b16_224          | 1  | 1.210232 |
| deit_base_distilled_patch16_224 | 1  | 1.203397 |
|      vit_base_patch16_224       | 1  | 1.199871 |
|      xcit_large_24_p8_224       | 1  | 1.189057 |
|         crossvit_9_240          | 1  | 1.187208 |
|        tnt_s_patch16_224        | 1  | 1.18103  |
|          jx_nest_base           | 1  | 1.172565 |
|            pit_b_224            | 1  | 1.165758 |
|          gmlp_s16_224           | 1  | 1.158256 |
|          convnext_base          | 1  | 1.144592 |
|        sebotnet33ts_256         | 1  | 1.11116  |
|          cait_m36_384           | 1  | 0.98103  |
+---------------------------------+----+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 1  |     pass      |
|           dm_nfnet_f0           | 1  |     pass      |
|            mixnet_l             | 1  |     pass      |
|       eca_botnext26ts_256       | 1  |     pass      |
|         crossvit_9_240          | 1  |     pass      |
| deit_base_distilled_patch16_224 | 1  |     pass      |
|          convnext_base          | 1  |     pass      |
|           regnety_002           | 1  |     pass      |
|           convit_base           | 1  |     pass      |
|          gmlp_s16_224           | 1  |     pass      |
|          cait_m36_384           | 1  |     pass      |
|             dpn107              | 1  |     pass      |
|      vit_base_patch16_224       | 1  |     pass      |
|          cspdarknet53           | 1  |     pass      |
|        eca_halonext26ts         | 1  |     pass      |
|        ese_vovnet19b_dw         | 1  |     pass      |
|           fbnetc_100            | 1  |     pass      |
|            fbnetv3_b            | 1  |     pass      |
|            gernet_l             | 1  |     pass      |
|          botnet26t_256          | 1  |     pass      |
|       gluon_inception_v3        | 1  |     pass      |
|          gmixer_24_224          | 1  |     pass      |
|          jx_nest_base           | 1  |     pass      |
|        convmixer_768_32         | 1  |     pass      |
|        twins_pcpvt_base         | 1  |     pass      |
|         poolformer_m36          | 1  |     pass      |
|            lcnet_050            | 1  |     pass      |
|          mixer_b16_224          | 1  |     pass      |
|        tnt_s_patch16_224        | 1  |     pass      |
|           mnasnet_100           | 1  |     pass      |
|         mobilenetv2_100         | 1  |     pass      |
|      mobilenetv3_large_100      | 1  |     pass      |
|           mobilevit_s           | 1  |     pass      |
|            nfnet_l0             | 1  |     pass      |
|            pit_b_224            | 1  |     pass      |
|      beit_base_patch16_224      | 1  |     pass      |
|            repvgg_a2            | 1  |     pass      |
|  swin_base_patch4_window7_224   | 1  |     pass      |
|          inception_v3           | 1  |     pass      |
|           tf_mixnet_l           | 1  |     pass      |
|       tf_efficientnet_b0        | 1  |     pass      |
|            tinynet_a            | 1  |     pass      |
|          spnasnet_100           | 1  |     pass      |
|        sebotnet33ts_256         | 1  |     pass      |
|          resmlp_12_224          | 1  |     pass      |
|         coat_lite_mini          | 1  |  fail_to_run  |
|           resnest101e           | 1  | fail_accuracy |
|            levit_128            | 1  | fail_accuracy |
|          ghostnet_100           | 1  | fail_accuracy |
|          pnasnet5large          | 1  | fail_accuracy |
|        res2net101_26w_4s        | 1  | fail_accuracy |
|        res2net50_14w_8s         | 1  | fail_accuracy |
|             dla102              | 1  | fail_accuracy |
|     swsl_resnext101_32x16d      | 1  | fail_accuracy |
|           rexnet_100            | 1  | fail_accuracy |
|           selecsls42b           | 1  | fail_accuracy |
|           res2next50            | 1  | fail_accuracy |
|            hrnet_w18            | 1  | fail_accuracy |
|           volo_d1_224           | 1  | fail_accuracy |
|         visformer_small         | 1  | fail_accuracy |
|      xcit_large_24_p8_224       | 1  | fail_accuracy |
+---------------------------------+----+---------------+

Compilation latency (sec)

+---------------------------------+----+------------+
|              name               | bs |  inductor  |
+---------------------------------+----+------------+
|          pnasnet5large          | 1  | 490.874258 |
|            hrnet_w18            | 1  | 266.515619 |
|        res2net101_26w_4s        | 1  | 80.194112  |
|           tf_mixnet_l           | 1  | 72.476506  |
|           resnest101e           | 1  | 71.850546  |
|            mixnet_l             | 1  |  67.52312  |
|          cait_m36_384           | 1  | 67.409465  |
|        res2net50_14w_8s         | 1  | 64.726407  |
|        twins_pcpvt_base         | 1  | 61.281544  |
|      xcit_large_24_p8_224       | 1  |  58.39541  |
|         poolformer_m36          | 1  | 57.078928  |
|  swin_base_patch4_window7_224   | 1  | 56.062974  |
|             dpn107              | 1  | 51.193965  |
|        tnt_s_patch16_224        | 1  | 45.805067  |
|          jx_nest_base           | 1  | 45.024692  |
|           mobilevit_s           | 1  | 43.277467  |
|            fbnetv3_b            | 1  | 42.911109  |
|          convnext_base          | 1  | 38.170333  |
|             dla102              | 1  | 37.341686  |
|        adv_inception_v3         | 1  | 36.584814  |
|       gluon_inception_v3        | 1  | 36.544629  |
|          inception_v3           | 1  | 36.536228  |
|          gmlp_s16_224           | 1  | 35.258835  |
|           volo_d1_224           | 1  | 34.990813  |
|          ghostnet_100           | 1  | 34.683771  |
|          gmixer_24_224          | 1  | 34.605083  |
|           res2next50            | 1  | 34.436759  |
|         crossvit_9_240          | 1  | 33.239095  |
|            tinynet_a            | 1  | 32.638258  |
|           dm_nfnet_f0           | 1  | 32.321751  |
|     swsl_resnext101_32x16d      | 1  | 32.139088  |
|        sebotnet33ts_256         | 1  | 31.713947  |
|        eca_halonext26ts         | 1  | 31.673934  |
|            levit_128            | 1  | 31.132793  |
|           rexnet_100            | 1  |  30.48105  |
|            nfnet_l0             | 1  | 30.230325  |
|        convmixer_768_32         | 1  | 30.088706  |
|       tf_efficientnet_b0        | 1  | 29.845435  |
|           convit_base           | 1  | 27.267359  |
|         visformer_small         | 1  | 26.542806  |
|       eca_botnext26ts_256       | 1  |  26.47967  |
|          cspdarknet53           | 1  | 25.675656  |
|           regnety_002           | 1  | 25.625366  |
|            pit_b_224            | 1  | 25.595022  |
|          botnet26t_256          | 1  |  25.51369  |
|      beit_base_patch16_224      | 1  | 24.617362  |
|      mobilenetv3_large_100      | 1  | 23.878685  |
| deit_base_distilled_patch16_224 | 1  | 23.617544  |
|      vit_base_patch16_224       | 1  | 23.586444  |
|           fbnetc_100            | 1  | 23.539171  |
|          spnasnet_100           | 1  | 23.289252  |
|          mixer_b16_224          | 1  |  23.27914  |
|            gernet_l             | 1  | 22.610419  |
|            repvgg_a2            | 1  | 22.524757  |
|         mobilenetv2_100         | 1  | 21.523405  |
|        ese_vovnet19b_dw         | 1  | 21.470961  |
|           mnasnet_100           | 1  | 21.216722  |
|          resmlp_12_224          | 1  | 20.392134  |
|           selecsls42b           | 1  | 19.720248  |
|            lcnet_050            | 1  | 18.623536  |
+---------------------------------+----+------------+

Peak Memory Compression Ratio

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          cait_m36_384           | 1  | 0.945033 |
|          pnasnet5large          | 1  | 0.931058 |
|        convmixer_768_32         | 1  |  0.9185  |
|            nfnet_l0             | 1  | 0.904133 |
|      xcit_large_24_p8_224       | 1  | 0.89033  |
|        ese_vovnet19b_dw         | 1  | 0.888804 |
|            fbnetv3_b            | 1  | 0.883541 |
|         mobilenetv2_100         | 1  | 0.881878 |
|       tf_efficientnet_b0        | 1  | 0.879652 |
|           mnasnet_100           | 1  | 0.876929 |
|          spnasnet_100           | 1  | 0.875029 |
|      mobilenetv3_large_100      | 1  | 0.869221 |
|           fbnetc_100            | 1  | 0.867555 |
|            tinynet_a            | 1  | 0.863951 |
|       eca_botnext26ts_256       | 1  | 0.862589 |
|            lcnet_050            | 1  | 0.862103 |
|           rexnet_100            | 1  | 0.861363 |
|         poolformer_m36          | 1  | 0.85571  |
|           dm_nfnet_f0           | 1  | 0.855393 |
|           mobilevit_s           | 1  | 0.851105 |
|          ghostnet_100           | 1  | 0.850899 |
|           tf_mixnet_l           | 1  | 0.849152 |
|        eca_halonext26ts         | 1  | 0.843947 |
|          botnet26t_256          | 1  | 0.843575 |
|            mixnet_l             | 1  | 0.841754 |
|           regnety_002           | 1  | 0.841435 |
|          resmlp_12_224          | 1  | 0.829409 |
|         visformer_small         | 1  | 0.821309 |
|           res2next50            | 1  | 0.816252 |
|            levit_128            | 1  | 0.809084 |
|             dpn107              | 1  | 0.802441 |
|          convnext_base          | 1  | 0.802236 |
|        sebotnet33ts_256         | 1  | 0.79944  |
|        res2net50_14w_8s         | 1  | 0.799304 |
|            hrnet_w18            | 1  | 0.797637 |
|          cspdarknet53           | 1  | 0.795567 |
|          gmlp_s16_224           | 1  | 0.794777 |
|          gmixer_24_224          | 1  | 0.791685 |
|        tnt_s_patch16_224        | 1  | 0.786048 |
|           volo_d1_224           | 1  | 0.785705 |
|        twins_pcpvt_base         | 1  | 0.783944 |
|         crossvit_9_240          | 1  | 0.782695 |
|           convit_base           | 1  | 0.781468 |
|             dla102              | 1  | 0.778767 |
|          mixer_b16_224          | 1  | 0.776942 |
|          jx_nest_base           | 1  | 0.774045 |
|           resnest101e           | 1  | 0.774016 |
|      beit_base_patch16_224      | 1  | 0.771362 |
|       gluon_inception_v3        | 1  | 0.769822 |
|          inception_v3           | 1  | 0.769554 |
|        adv_inception_v3         | 1  | 0.769058 |
| deit_base_distilled_patch16_224 | 1  | 0.76332  |
|      vit_base_patch16_224       | 1  | 0.760396 |
|            pit_b_224            | 1  | 0.755338 |
|  swin_base_patch4_window7_224   | 1  | 0.742816 |
|        res2net101_26w_4s        | 1  | 0.741378 |
|           selecsls42b           | 1  | 0.741199 |
|            gernet_l             | 1  | 0.735898 |
|            repvgg_a2            | 1  | 0.692561 |
|     swsl_resnext101_32x16d      | 1  | 0.640778 |
+---------------------------------+----+----------+

Absolute latency (ms)

+---------------------------------+----+-------------+
|              name               | bs |  inductor   |
+---------------------------------+----+-------------+
|          cait_m36_384           | 1  | 3641.937596 |
|      xcit_large_24_p8_224       | 1  | 1532.303128 |
|     swsl_resnext101_32x16d      | 1  | 440.814026  |
|          pnasnet5large          | 1  | 355.121277  |
|          convnext_base          | 1  |  306.47775  |
|             dpn107              | 1  | 256.873784  |
|        convmixer_768_32         | 1  | 243.923076  |
|          jx_nest_base           | 1  | 232.621075  |
|      beit_base_patch16_224      | 1  |  197.81396  |
| deit_base_distilled_patch16_224 | 1  | 195.928941  |
|      vit_base_patch16_224       | 1  | 195.581285  |
|           convit_base           | 1  | 194.667762  |
|  swin_base_patch4_window7_224   | 1  | 193.898101  |
|            pit_b_224            | 1  | 167.820146  |
|           resnest101e           | 1  |  163.5918   |
|           dm_nfnet_f0           | 1  | 156.980592  |
|          mixer_b16_224          | 1  | 139.355728  |
|         poolformer_m36          | 1  | 136.554654  |
|        res2net101_26w_4s        | 1  | 110.025075  |
|        twins_pcpvt_base         | 1  | 104.464399  |
|            nfnet_l0             | 1  |  93.532397  |
|           volo_d1_224           | 1  |  93.423413  |
|        tnt_s_patch16_224        | 1  |  92.101052  |
|             dla102              | 1  |  88.858274  |
|        sebotnet33ts_256         | 1  |  84.175244  |
|            hrnet_w18            | 1  |  82.740517  |
|          cspdarknet53           | 1  |  81.639195  |
|       gluon_inception_v3        | 1  |  71.592366  |
|          inception_v3           | 1  |  71.515557  |
|        adv_inception_v3         | 1  |  71.493918  |
|          gmlp_s16_224           | 1  |  69.254104  |
|         visformer_small         | 1  |  65.78337   |
|            repvgg_a2            | 1  |  62.558406  |
|        res2net50_14w_8s         | 1  |  62.307177  |
|          gmixer_24_224          | 1  |  61.786075  |
|           res2next50            | 1  |  57.818302  |
|            gernet_l             | 1  |  55.803564  |
|           selecsls42b           | 1  |  44.024688  |
|          botnet26t_256          | 1  |  43.938341  |
|        eca_halonext26ts         | 1  |  43.503999  |
|           mobilevit_s           | 1  |  40.949935  |
|       eca_botnext26ts_256       | 1  |  39.573063  |
|          resmlp_12_224          | 1  |  35.776793  |
|         crossvit_9_240          | 1  |  32.702559  |
|        ese_vovnet19b_dw         | 1  |  30.491427  |
|            mixnet_l             | 1  |  27.108389  |
|           tf_mixnet_l           | 1  |  26.434282  |
|            fbnetv3_b            | 1  |  14.441304  |
|       tf_efficientnet_b0        | 1  |  12.616951  |
|           rexnet_100            | 1  |  12.515301  |
|            tinynet_a            | 1  |  11.253794  |
|           fbnetc_100            | 1  |  8.606279   |
|            levit_128            | 1  |  8.234482   |
|          spnasnet_100           | 1  |  7.637889   |
|          ghostnet_100           | 1  |   7.58111   |
|           mnasnet_100           | 1  |  7.085285   |
|         mobilenetv2_100         | 1  |  6.970685   |
|      mobilenetv3_large_100      | 1  |   6.61853   |
|           regnety_002           | 1  |  5.834446   |
|            lcnet_050            | 1  |  2.223803   |
+---------------------------------+----+-------------+
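
The speedup and absolute latency tables can be combined to estimate what the eager (native PyTorch) latency was for a model. The following is a minimal sketch, assuming speedup is defined as eager latency divided by inductor latency and that the absolute latency column reports the inductor latency; the helper name is illustrative and not part of the benchmark scripts.

```python
def estimated_eager_latency_ms(speedup: float, inductor_latency_ms: float) -> float:
    """Estimate eager latency, assuming speedup = eager_latency / inductor_latency."""
    if speedup <= 0 or inductor_latency_ms <= 0:
        raise ValueError("speedup and latency must be positive")
    return speedup * inductor_latency_ms


# lcnet_050 values copied from the timm_models tables above:
# speedup 2.033662, inductor latency 2.223803 ms.
eager_ms = estimated_eager_latency_ms(2.033662, 2.223803)
print(f"estimated eager latency: {eager_ms:.2f} ms")
```

Because the dashboard reports aggregated measurements, the result is only an approximation of the eager latency.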

@zxd1997066
Contributor

[dynamic] Performance Dashboard for float32 precision -- Single-Socket Multi-threads (2024-04-22 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration: a forward and backward pass for training, or a forward pass only for inference. For accuracy, we check the numerical correctness of the forward pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. Experimental setup does not have optimizer.

SW information:

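+-------------------+--------+------------------------------------------+
| SW                | Branch | Commit                                   |
+-------------------+--------+------------------------------------------+
| Pytorch           | main   | 7706cd7d12781b8c537dd045745738d60c6c31f1 |
| Torchbench        | main   | d6015d42                                 |
| torchaudio        | main   | 2.2.0a0+ea437b3                          |
| torchtext         | main   | 0.16.0a0+b0ebddc                         |
| torchvision       | main   | 0.19.0a0+2c4665f                         |
| torchdata         | main   | 0.7.1a0+0790338                          |
| dynamo_benchmarks | main   | nightly                                  |
+-------------------+--------+------------------------------------------+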

HW information

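+------------------+----------------------------------------------------------------+
| Item             | Value                                                          |
+------------------+----------------------------------------------------------------+
| Manufacturer     | Amazon EC2                                                     |
| Product Name     | c6i.16xlarge                                                   |
| CPU Model        | Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz                  |
| Installed Memory | 128GB (1x128GB DDR4 3200 MT/s [Unknown])                       |
| OS               | Ubuntu 22.04.2 LTS                                             |
| Kernel           | 5.19.0-1022-aws                                                |
| Microcode        | 0xd000389                                                      |
| GCC              | gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0                      |
| GLIBC            | ldd (Ubuntu GLIBC 2.35-0ubuntu3.1) 2.35                        |
| Binutils         | GNU ld (GNU Binutils for Ubuntu) 2.38                          |
| Python           | Python 3.10.6                                                  |
| OpenSSL          | OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022) |
+------------------+----------------------------------------------------------------+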

Test command

export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
CORES=$(lscpu | grep Core | awk '{print $4}')
export OMP_NUM_THREADS=$CORES

python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--node_id 0" --devices=cpu --dtypes=float32 --inference --compilers=inductor --extra-args="--timeout 9000" 

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 79%, 62/78 | 100%, 46/46 | 75%, 45/60  |
+----------+------------+-------------+-------------+
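
The percentages in the Passrate table follow from the pass/total counts. A minimal sketch, assuming the rate is simply the number of passing models divided by the total and rounded to a whole percent (the helper name `pass_rate` is illustrative and not part of the benchmark scripts):

```python
def pass_rate(passed: int, total: int) -> str:
    """Format a pass rate in the style used by the table, e.g. '79%, 62/78'."""
    if total <= 0:
        raise ValueError("total must be positive")
    return f"{passed / total:.0%}, {passed}/{total}"


# Counts copied from the Passrate table above.
counts = {
    "torchbench": (62, 78),
    "huggingface": (46, 46),
    "timm_models": (45, 60),
}
for suite, (passed, total) in counts.items():
    print(f"{suite}: {pass_rate(passed, total)}")
```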

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.36x    |    1.28x    |    1.81x    |
+----------+------------+-------------+-------------+
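
As an illustration of how a geometric mean speedup can be derived from per-model results, here is a minimal sketch. It assumes that only models with accuracy status `pass` and a positive speedup are aggregated; whether statuses such as `pass_due_to_skip` are included in the dashboard numbers is not stated here, and the function and sample values are hypothetical.

```python
import math


def geomean_speedup(rows):
    """Geometric mean of speedups over models that passed accuracy.

    rows: iterable of (name, speedup, accuracy_status) tuples.
    Assumption: only status == "pass" with a positive speedup is kept.
    """
    kept = [speedup for _, speedup, status in rows if status == "pass" and speedup > 0]
    if not kept:
        raise ValueError("no passing models to aggregate")
    # exp(mean(log(x))) avoids overflow from multiplying many ratios together.
    return math.exp(sum(math.log(s) for s in kept) / len(kept))


# Hypothetical sample data, not taken from the dashboard.
sample = [
    ("model_a", 2.0, "pass"),
    ("model_b", 0.5, "pass"),
    ("model_c", 3.0, "fail_accuracy"),  # excluded: failed the accuracy check
    ("model_d", 0.0, "fail_to_run"),    # excluded: no measurement
]
print(f"{geomean_speedup(sample):.2f}x")
```

A geometric mean is the usual choice for averaging ratios, because a 2x speedup and a 0.5x slowdown cancel out to 1x.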

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   26.64    |    41.60    |    49.03    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.90x    |    0.98x    |    0.99x    |
+----------+------------+-------------+-------------+
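
For readers interpreting the ratio, a minimal sketch follows. It assumes the compression ratio is the eager peak memory divided by the compiled peak memory, which is consistent with "higher is better" but is not spelled out in this report; the function name and the sample values are hypothetical.

```python
def compression_ratio(eager_peak_mb: float, compiled_peak_mb: float) -> float:
    """Peak memory compression ratio, assumed to be eager / compiled."""
    if eager_peak_mb <= 0 or compiled_peak_mb <= 0:
        raise ValueError("peak memory values must be positive")
    return eager_peak_mb / compiled_peak_mb


# Hypothetical values: the compiled run peaks higher than eager,
# so the ratio falls below 1.0.
ratio = compression_ratio(eager_peak_mb=900.0, compiled_peak_mb=1000.0)
print(f"{ratio:.2f}x")
```

Under this assumption, a ratio below 1.0x means the compiled model used more peak memory than eager execution.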

torchbench suite with float32 precision

Performance speedup

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_equation_of_state     | 1048576 | 11.010697 |
|          squeezenet1_1          |   16    | 2.908089  |
|        timm_efficientnet        |   64    | 2.777417  |
|           mnasnet1_0            |   32    | 2.644122  |
|       mobilenet_v3_large        |   32    | 2.611947  |
|          mobilenet_v2           |   16    | 2.587159  |
|       shufflenet_v2_x1_0        |   64    | 2.407141  |
|          timm_resnest           |   32    | 2.275795  |
|            resnet50             |   32    | 2.202143  |
|            resnet152            |   32    | 1.995648  |
|           densenet121           |   64    | 1.947928  |
|        phlippe_densenet         |   128   | 1.944809  |
|       doctr_det_predictor       |    1    | 1.937201  |
|             hf_GPT2             |    1    | 1.891385  |
|           timm_nfnet            |   128   | 1.873614  |
|           timm_regnet           |   32    | 1.854058  |
|         resnext50_32x4d         |    8    | 1.742898  |
|         phlippe_resnet          |   128   | 1.729096  |
|           timm_vovnet           |   32    | 1.688946  |
|            resnet18             |    8    | 1.640246  |
|             alexnet             |   128   | 1.617272  |
|          hf_Bert_large          |    1    | 1.572013  |
|            hf_Albert            |    1    | 1.541353  |
|            moondream            |    1    | 1.525613  |
|             yolov3              |    8    | 1.518415  |
|      doctr_reco_predictor       |    1    | 1.509126  |
|          hf_GPT2_large          |    1    | 1.507726  |
|        basic_gnn_edgecnn        |    1    | 1.503446  |
|          fastNLP_Bert           |    1    | 1.491111  |
|         LearningToPaint         |   96    | 1.441759  |
|             hf_Bert             |    1    | 1.436844  |
|     functorch_maml_omniglot     |    1    | 1.402588  |
|              dcgan              |   256   | 1.390561  |
|          hf_Longformer          |    1    |  1.38102  |
|          hf_DistilBert          |    1    | 1.299967  |
|              vgg16              |    4    |  1.28911  |
|          basic_gnn_gcn          |    1    | 1.272189  |
|             hf_Bart             |    1    | 1.266662  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1.238053  |
|        hf_distil_whisper        |    1    | 1.219513  |
|          pytorch_unet           |    1    | 1.215637  |
|         basic_gnn_sage          |    1    | 1.202286  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1.201701  |
|         pytorch_stargan         |   16    | 1.191784  |
|           hf_T5_large           |    1    | 1.190992  |
|           hf_BigBird            |    1    | 1.190112  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 1.164016  |
|    detectron2_fcos_r_50_fpn     |    1    | 1.162697  |
|              dlrm               |  2048   | 1.138735  |
|        soft_actor_critic        |   256   |  1.12607  |
|      torch_multimodal_clip      |   32    | 1.120774  |
|          BERT_pytorch           |    2    | 1.110769  |
|              hf_T5              |    1    | 1.108229  |
|          basic_gnn_gin          |    1    | 1.106848  |
|          lennard_jones          |  1000   | 1.097015  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 1.096159  |
|          maml_omniglot          |    5    | 1.080242  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1.078386  |
|     nvidia_deeprecommender      |   256   | 1.061053  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1.050105  |
|     timm_vision_transformer     |   32    | 1.016276  |
|             demucs              |    1    | 1.011548  |
|           hf_Reformer           |    1    | 1.010569  |
|     resnet50_quantized_qat      |   32    | 1.007654  |
|  timm_vision_transformer_large  |   32    | 1.006452  |
|   mobilenet_v2_quantized_qat    |   96    | 0.997463  |
|           tts_angular           |   64    | 0.996338  |
|       speech_transformer        |    1    | 0.994281  |
|               drq               |    1    | 0.949136  |
|       Background_Matting        |    1    |  0.81133  |
|           hf_T5_base            |    1    | 0.779867  |
|              maml               |    1    | 0.700924  |
|     pyhpc_isoneutral_mixing     | 1048576 | 0.678794  |
|         opacus_cifar10          |   64    | 0.636162  |
|      functorch_dp_cifar10       |   64    | 0.581821  |
|        timm_efficientdet        |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+
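
Speedup is normalized against native PyTorch, so a value below 1.0 means inductor ran slower than eager for that model. The three `0.0` rows (bs 0) are the same models that the Accuracy table reports as `model_fail_to_load`, so they read as placeholders rather than measurements. The following is a minimal sketch, not part of the benchmark tooling, for splitting a pasted table into those groups. The helper names `parse_ascii_table` and `classify_speedups` are hypothetical, and the sample rows are copied from the table above.

```python
def parse_ascii_table(text):
    """Parse a '+---+' bordered table into a list of row dicts (hypothetical helper)."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def classify_speedups(rows, column="inductor"):
    """Group model names by speedup: 0.0 = not measured, < 1.0 = slower than eager."""
    groups = {"failed": [], "slower": [], "faster": []}
    for row in rows:
        value = float(row[column])
        if value == 0.0:
            groups["failed"].append(row["name"])
        elif value < 1.0:
            groups["slower"].append(row["name"])
        else:
            groups["faster"].append(row["name"])
    return groups


# Three rows copied from the speedup table above.
SAMPLE = """
+-------+----+----------+
| name  | bs | inductor |
+-------+----+----------+
| vgg16 | 4  | 1.28911  |
| drq   | 1  | 0.949136 |
| moco  | 0  | 0.0      |
+-------+----+----------+
"""

result = classify_speedups(parse_ascii_table(SAMPLE))
print(result)
```

Treating `0.0` as "not measured" is an assumption drawn from the matching `model_fail_to_load` rows; such rows should be left out of any averaged speedup.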

Accuracy

+--------------------------------+---------+--------------------+
|              name              |   bs    |      inductor      |
+--------------------------------+---------+--------------------+
|          hf_T5_large           |    4    |  pass_due_to_skip  |
|       Background_Matting       |    1    |  pass_due_to_skip  |
| timm_vision_transformer_large  |    4    |  pass_due_to_skip  |
|         hf_GPT2_large          |    4    |  pass_due_to_skip  |
|              maml              |    1    |  pass_due_to_skip  |
|         basic_gnn_sage         |    1    |        pass        |
|           hf_T5_base           |    4    |        pass        |
|      doctr_det_predictor       |    4    |        pass        |
|              dlrm              |    4    |        pass        |
|    detectron2_fcos_r_50_fpn    |    4    |        pass        |
|             demucs             |    1    |        pass        |
|         basic_gnn_gcn          |    1    |        pass        |
|         basic_gnn_gin          |    1    |        pass        |
|              drq               |    1    |        pass        |
|       basic_gnn_edgecnn        |    1    |        pass        |
|        LearningToPaint         |    4    |        pass        |
|      functorch_dp_cifar10      |    4    |        pass        |
|      doctr_reco_predictor      |    4    |        pass        |
|             yolov3             |    4    |        pass        |
|          fastNLP_Bert          |    4    |        pass        |
|         maml_omniglot          |    5    |        pass        |
|    functorch_maml_omniglot     |    1    |        pass        |
|            hf_Bart             |    4    |        pass        |
|            hf_Bert             |    4    |        pass        |
|             hf_T5              |    4    |        pass        |
|         hf_Bert_large          |    4    |        pass        |
|          hf_Reformer           |    4    |        pass        |
|         hf_Longformer          |    4    |        pass        |
|           hf_BigBird           |    4    |        pass        |
|         hf_DistilBert          |    4    |        pass        |
|            hf_GPT2             |    2    |        pass        |
|           hf_Albert            |    4    |        pass        |
|       hf_distil_whisper        |    4    |        pass        |
|            alexnet             |    4    |        pass        |
|        pytorch_stargan         |   16    |        pass        |
|         lennard_jones          |    4    |        pass        |
|         opacus_cifar10         |    4    |        pass        |
|    pyhpc_isoneutral_mixing     |    4    |        pass        |
|    pyhpc_equation_of_state     |    4    |        pass        |
|         phlippe_resnet         |    4    |        pass        |
|        phlippe_densenet        |    4    |        pass        |
|       mobilenet_v3_large       |    4    |        pass        |
|     nvidia_deeprecommender     |    4    |        pass        |
|             vgg16              |    4    |        pass        |
|  pytorch_CycleGAN_and_pix2pix  |    1    |        pass        |
|   mobilenet_v2_quantized_qat   |    4    |        pass        |
|             llama              |    4    |        pass        |
| pyhpc_turbulent_kinetic_energy | 1048576 |        pass        |
|          BERT_pytorch          |    4    |        pass        |
|           moondream            |    4    |        pass        |
|          pytorch_unet          |    2    |        pass        |
|       soft_actor_critic        |   256   |        pass        |
|       speech_transformer       |    1    |        pass        |
|         squeezenet1_1          |    4    |        pass        |
|       timm_efficientnet        |    4    |        pass        |
|           timm_nfnet           |    4    |        pass        |
|          timm_regnet           |    4    |        pass        |
|          timm_resnest          |    4    |        pass        |
|    timm_vision_transformer     |    4    |        pass        |
|          timm_vovnet           |    4    |        pass        |
|     torch_multimodal_clip      |    4    |        pass        |
|          tts_angular           |    4    |        pass        |
|     resnet50_quantized_qat     |    4    |        pass        |
|       timm_efficientdet        |    0    | model_fail_to_load |
|              moco              |    0    | model_fail_to_load |
|         DALLE2_pytorch         |    0    | model_fail_to_load |
|          Super_SloMo           |    4    |    fail_to_run     |
|        vision_maskrcnn         |    1    |    fail_to_run     |
|             dcgan              |    4    |   fail_accuracy    |
|          densenet121           |    4    |   fail_accuracy    |
|       shufflenet_v2_x1_0       |    4    |   fail_accuracy    |
|          mobilenet_v2          |    4    |   fail_accuracy    |
|           mnasnet1_0           |    4    |   fail_accuracy    |
|        resnext50_32x4d         |    4    |   fail_accuracy    |
|            resnet50            |    4    |   fail_accuracy    |
|           resnet152            |    4    |   fail_accuracy    |
|            resnet18            |    4    |   fail_accuracy    |
+--------------------------------+---------+--------------------+

Compilation latency (sec)

+---------------------------------+---------+------------+
|              name               |   bs    |  inductor  |
+---------------------------------+---------+------------+
|           densenet121           |   64    | 104.865562 |
|           hf_BigBird            |    1    | 76.377296  |
|    detectron2_fcos_r_50_fpn     |    1    |  70.31366  |
|  timm_vision_transformer_large  |   32    | 63.384049  |
|           hf_T5_large           |    1    | 62.014033  |
|           timm_nfnet            |   128   | 57.303303  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 52.753034  |
|              maml               |    1    | 52.429297  |
|          hf_Longformer          |    1    | 51.457058  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 50.612961  |
|           hf_Reformer           |    1    | 47.185561  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 46.183372  |
|        phlippe_densenet         |   128   | 45.685869  |
|           hf_T5_base            |    1    | 44.684617  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 43.931894  |
|      torch_multimodal_clip      |   32    | 40.558443  |
|     pyhpc_isoneutral_mixing     | 1048576 | 39.504697  |
|       speech_transformer        |    1    | 39.255441  |
|        timm_efficientnet        |   64    | 39.242816  |
|             yolov3              |    8    | 35.973102  |
|          hf_GPT2_large          |    1    | 35.741927  |
|          BERT_pytorch           |    2    | 35.507912  |
|             demucs              |    1    | 34.647839  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 33.247417  |
|            moondream            |    1    | 32.997072  |
|         opacus_cifar10          |   64    | 31.878993  |
|              hf_T5              |    1    | 31.031521  |
|      functorch_dp_cifar10       |   64    | 30.144885  |
|        hf_distil_whisper        |    1    | 30.141755  |
|     timm_vision_transformer     |   32    | 29.038585  |
|       mobilenet_v3_large        |   32    | 27.968206  |
|          timm_resnest           |   32    | 27.797008  |
|       shufflenet_v2_x1_0        |   64    | 26.747581  |
|       doctr_det_predictor       |    1    | 26.615936  |
|          hf_Bert_large          |    1    | 26.225593  |
|           timm_regnet           |   32    | 25.195439  |
|       Background_Matting        |    1    | 23.724823  |
|           timm_vovnet           |   32    | 23.021285  |
|          fastNLP_Bert           |    1    |  22.61457  |
|             hf_Bart             |    1    | 22.366799  |
|          pytorch_unet           |    1    | 21.303577  |
|         pytorch_stargan         |   16    | 21.117145  |
|            hf_Albert            |    1    | 20.320968  |
|            resnet152            |   32    | 20.273781  |
|             hf_GPT2             |    1    | 19.622124  |
|          hf_DistilBert          |    1    | 18.976521  |
|             hf_Bert             |    1    | 18.847079  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 18.347009  |
|          squeezenet1_1          |   16    | 17.624054  |
|              vgg16              |    4    | 14.304878  |
|      doctr_reco_predictor       |    1    | 13.512199  |
|          mobilenet_v2           |   16    | 12.665911  |
|         resnext50_32x4d         |    8    | 12.438681  |
|            resnet50             |   32    | 12.386594  |
|             alexnet             |   128   | 12.087744  |
|          basic_gnn_gcn          |    1    | 11.843025  |
|          basic_gnn_gin          |    1    | 11.294021  |
|         basic_gnn_sage          |    1    | 11.261843  |
|               drq               |    1    | 10.855582  |
|              dlrm               |  2048   | 10.711366  |
|           mnasnet1_0            |   32    | 10.218416  |
|            resnet18             |    8    | 10.040031  |
|         LearningToPaint         |   96    |  9.874127  |
|     functorch_maml_omniglot     |    1    |  9.689264  |
|        basic_gnn_edgecnn        |    1    |  9.212636  |
|     pyhpc_equation_of_state     | 1048576 |  9.188226  |
|     nvidia_deeprecommender      |   256   |  8.731374  |
|          maml_omniglot          |    5    |  8.648584  |
|         phlippe_resnet          |   128   |  8.359522  |
|        soft_actor_critic        |   256   |  7.961561  |
|          lennard_jones          |  1000   |  6.757823  |
|              dcgan              |   256   |  5.937967  |
|           tts_angular           |   64    |  5.553046  |
|   mobilenet_v2_quantized_qat    |   96    |  0.137945  |
|     resnet50_quantized_qat      |   32    |  0.102071  |
|        timm_efficientdet        |    0    |    0.0     |
|              moco               |    0    |    0.0     |
|         DALLE2_pytorch          |    0    |    0.0     |
+---------------------------------+---------+------------+

Peak Memory Compression Ratio

+---------------------------------+---------+----------+
|              name               |   bs    | inductor |
+---------------------------------+---------+----------+
|  timm_vision_transformer_large  |   32    | 0.996202 |
|           timm_nfnet            |   128   | 0.991875 |
|              dlrm               |  2048   | 0.988079 |
|           hf_T5_base            |    1    | 0.987833 |
|        timm_efficientnet        |   64    | 0.984961 |
|       Background_Matting        |    1    | 0.982992 |
|           timm_regnet           |   32    | 0.982831 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 0.981395 |
|             yolov3              |    8    | 0.979925 |
|           densenet121           |   64    | 0.979149 |
|            resnet152            |   32    | 0.978759 |
|     nvidia_deeprecommender      |   256   | 0.978663 |
|          pytorch_unet           |    1    | 0.978368 |
|             demucs              |    1    | 0.978279 |
|          hf_GPT2_large          |    1    | 0.978118 |
|      torch_multimodal_clip      |   32    | 0.977939 |
|           timm_vovnet           |   32    | 0.97477  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 0.97368  |
|            resnet50             |   32    | 0.973667 |
|         LearningToPaint         |   96    | 0.970975 |
|          timm_resnest           |   32    | 0.970956 |
|        basic_gnn_edgecnn        |    1    | 0.97053  |
|     timm_vision_transformer     |   32    | 0.964651 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 0.964246 |
|    detectron2_fcos_r_50_fpn     |    1    | 0.963836 |
|   mobilenet_v2_quantized_qat    |   96    | 0.963085 |
|           mnasnet1_0            |   32    | 0.962084 |
|     resnet50_quantized_qat      |   32    | 0.961901 |
|          mobilenet_v2           |   16    | 0.959942 |
|       mobilenet_v3_large        |   32    | 0.959796 |
|       doctr_det_predictor       |    1    | 0.956968 |
|       shufflenet_v2_x1_0        |   64    | 0.956924 |
|             alexnet             |   128   | 0.954664 |
|           hf_BigBird            |    1    | 0.948809 |
|         resnext50_32x4d         |    8    | 0.946388 |
|        phlippe_densenet         |   128   | 0.946308 |
|              vgg16              |    4    | 0.944956 |
|         pytorch_stargan         |   16    | 0.944368 |
|      doctr_reco_predictor       |    1    | 0.936197 |
|          BERT_pytorch           |    2    | 0.934266 |
|          basic_gnn_gcn          |    1    | 0.933653 |
|           tts_angular           |   64    | 0.929692 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 0.915833 |
|        hf_distil_whisper        |    1    | 0.912325 |
|          squeezenet1_1          |   16    | 0.911955 |
|     pyhpc_equation_of_state     | 1048576 | 0.908374 |
|              dcgan              |   256   | 0.907821 |
|            resnet18             |    8    | 0.896153 |
|         opacus_cifar10          |   64    | 0.890946 |
|         phlippe_resnet          |   128   | 0.890767 |
|        soft_actor_critic        |   256   | 0.889197 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 0.880722 |
|          lennard_jones          |  1000   | 0.871116 |
|     functorch_maml_omniglot     |    1    | 0.862405 |
|          maml_omniglot          |    5    | 0.860541 |
|         basic_gnn_sage          |    1    | 0.860298 |
|          basic_gnn_gin          |    1    | 0.852798 |
|          fastNLP_Bert           |    1    | 0.846715 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.840343 |
|            moondream            |    1    | 0.824099 |
|      functorch_dp_cifar10       |   64    | 0.821024 |
|       speech_transformer        |    1    | 0.81522  |
|          hf_Bert_large          |    1    | 0.808852 |
|          hf_Longformer          |    1    | 0.800242 |
|              maml               |    1    | 0.800228 |
|             hf_Bert             |    1    | 0.797178 |
|           hf_T5_large           |    1    | 0.791046 |
|            hf_Albert            |    1    | 0.790957 |
|               drq               |    1    | 0.772306 |
|             hf_Bart             |    1    | 0.766662 |
|          hf_DistilBert          |    1    | 0.762532 |
|             hf_GPT2             |    1    | 0.760906 |
|              hf_T5              |    1    | 0.752729 |
|           hf_Reformer           |    1    | 0.73882  |
|     pyhpc_isoneutral_mixing     | 1048576 | 0.68894  |
|        timm_efficientdet        |    0    |   0.0    |
|              moco               |    0    |   0.0    |
|         DALLE2_pytorch          |    0    |   0.0    |
+---------------------------------+---------+----------+

Absolute latency (ms)

+---------------------------------+---------+-------------+
|              name               |   bs    |  inductor   |
+---------------------------------+---------+-------------+
|  timm_vision_transformer_large  |   32    | 4414.126845 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1428.666704 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1296.848698 |
|           hf_T5_base            |    1    | 1257.050593 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1173.366992 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1083.098435 |
|          hf_GPT2_large          |    1    | 541.614722  |
|           timm_nfnet            |   128   | 522.503087  |
|           hf_T5_large           |    1    | 396.244926  |
|            moondream            |    1    | 380.590087  |
|        hf_distil_whisper        |    1    |  340.59861  |
|       Background_Matting        |    1    | 336.206162  |
|          pytorch_unet           |    1    | 220.579493  |
|           timm_regnet           |   32    | 217.614518  |
|           densenet121           |   64    | 191.842727  |
|            resnet152            |   32    | 186.861752  |
|    detectron2_fcos_r_50_fpn     |    1    | 176.463753  |
|      torch_multimodal_clip      |   32    | 164.502088  |
|             yolov3              |    8    |  160.14262  |
|             demucs              |    1    | 141.161883  |
|           hf_BigBird            |    1    | 125.843723  |
|           timm_vovnet           |   32    | 113.045851  |
|     timm_vision_transformer     |   32    | 104.701033  |
|          hf_Bert_large          |    1    | 102.193153  |
|         pytorch_stargan         |   16    |  93.387008  |
|       doctr_det_predictor       |    1    |  84.771064  |
|            resnet50             |   32    |  74.455672  |
|          hf_Longformer          |    1    |  70.820078  |
|       speech_transformer        |    1    |  53.777599  |
|          timm_resnest           |   32    |  53.260992  |
|             hf_Bart             |    1    |  52.409422  |
|        timm_efficientnet        |   64    |  51.047941  |
|              maml               |    1    |  49.016355  |
|              hf_T5              |    1    |  42.674287  |
|             alexnet             |   128   |  42.500339  |
|             hf_Bert             |    1    |  40.727966  |
|   mobilenet_v2_quantized_qat    |   96    |  40.022983  |
|           hf_Reformer           |    1    |   36.7406   |
|         LearningToPaint         |   96    |  36.10582   |
|              vgg16              |    4    |  35.677922  |
|            hf_Albert            |    1    |  35.49324   |
|     nvidia_deeprecommender      |   256   |  33.601122  |
|          fastNLP_Bert           |    1    |  33.060165  |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  33.025231  |
|          BERT_pytorch           |    2    |  29.700861  |
|     pyhpc_isoneutral_mixing     | 1048576 |  28.273715  |
| pyhpc_turbulent_kinetic_energy  | 1048576 |  27.784234  |
|          hf_DistilBert          |    1    |  26.745246  |
|     resnet50_quantized_qat      |   32    |  25.126999  |
|         resnext50_32x4d         |    8    |  24.68052   |
|             hf_GPT2             |    1    |  23.033416  |
|        phlippe_densenet         |   128   |  20.660679  |
|           tts_angular           |   64    |  19.540092  |
|        basic_gnn_edgecnn        |    1    |  18.793714  |
|              dcgan              |   256   |  18.005285  |
|       shufflenet_v2_x1_0        |   64    |  16.540175  |
|           mnasnet1_0            |   32    |  15.58115   |
|       mobilenet_v3_large        |   32    |  14.573311  |
|      functorch_dp_cifar10       |   64    |  10.868066  |
|         opacus_cifar10          |   64    |  10.77925   |
|            resnet18             |    8    |  9.855105   |
|          mobilenet_v2           |   16    |  9.753547   |
|          basic_gnn_gcn          |    1    |  9.374803   |
|              dlrm               |  2048   |  6.680958   |
|          squeezenet1_1          |   16    |  5.949107   |
|         basic_gnn_sage          |    1    |  4.915204   |
|          basic_gnn_gin          |    1    |  4.705467   |
|         phlippe_resnet          |   128   |  4.414737   |
|      doctr_reco_predictor       |    1    |  3.531278   |
|     pyhpc_equation_of_state     | 1048576 |  1.123576   |
|               drq               |    1    |  0.951963   |
|        soft_actor_critic        |   256   |  0.575728   |
|          maml_omniglot          |    5    |  0.547484   |
|     functorch_maml_omniglot     |    1    |  0.479048   |
|          lennard_jones          |  1000   |  0.203403   |
|        timm_efficientdet        |    0    |     0.0     |
|         DALLE2_pytorch          |    0    |     0.0     |
|              moco               |    0    |     0.0     |
+---------------------------------+---------+-------------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            XLNetLMHeadModel             |  8  | 5.569916 |
|     MobileBertForQuestionAnswering      | 128 | 1.809093 |
|      GPT2ForSequenceClassification      |  4  | 1.777181 |
|           ElectraForCausalLM            | 32  | 1.721435 |
|       ElectraForQuestionAnswering       | 64  | 1.659335 |
|          MobileBertForMaskedLM          | 128 | 1.632032 |
|               DistillGPT2               | 16  | 1.506972 |
|      DebertaV2ForQuestionAnswering      |  1  | 1.418682 |
|            YituTechConvBert             | 16  | 1.411224 |
|    LayoutLMForSequenceClassification    | 16  | 1.404241 |
|       RobertaForQuestionAnswering       | 16  | 1.403307 |
|        BertForQuestionAnswering         | 16  | 1.400068 |
|           RobertaForCausalLM            | 16  | 1.37609  |
|               GoogleFnet                | 16  | 1.356846 |
|           LayoutLMForMaskedLM           | 16  | 1.349209 |
|                CamemBert                | 16  | 1.343892 |
|             BertForMaskedLM             | 16  | 1.336746 |
|          AllenaiLongformerBase          |  4  | 1.29756  |
|    MegatronBertForQuestionAnswering     |  8  | 1.288078 |
|         MegatronBertForCausalLM         |  4  | 1.261476 |
|       DebertaForQuestionAnswering       | 16  | 1.225233 |
|     PLBartForConditionalGeneration      |  4  | 1.207406 |
|      MBartForConditionalGeneration      |  2  | 1.175967 |
|             OPTForCausalLM              |  2  | 1.170829 |
|       MT5ForConditionalGeneration       | 16  | 1.167853 |
|           DebertaForMaskedLM            |  8  | 1.165354 |
|                 T5Small                 |  4  | 1.16044  |
| BlenderbotSmallForConditionalGeneration | 64  | 1.140947 |
|       T5ForConditionalGeneration        |  4  | 1.136798 |
|       AlbertForQuestionAnswering        |  4  | 1.132689 |
|            AlbertForMaskedLM            |  4  | 1.127943 |
|          DistilBertForMaskedLM          | 128 | 1.100218 |
|     DistilBertForQuestionAnswering      | 256 | 1.081414 |
|         Speech2Text2ForCausalLM         | 256 | 1.079714 |
|       BlenderbotSmallForCausalLM        | 64  | 1.078842 |
|          DebertaV2ForMaskedLM           |  2  | 1.076111 |
|             XGLMForCausalLM             |  8  | 1.074709 |
|     M2M100ForConditionalGeneration      | 16  | 1.070995 |
|      BartForConditionalGeneration       |  2  | 1.068425 |
|            PLBartForCausalLM            |  8  | 1.048315 |
|            TrOCRForCausalLM             | 32  | 1.039312 |
|     PegasusForConditionalGeneration     | 32  | 1.037975 |
|             BartForCausalLM             |  4  | 1.028213 |
|           PegasusForCausalLM            | 32  | 1.026794 |
|            MBartForCausalLM             |  4  | 1.025601 |
|          BlenderbotForCausalLM          |  4  | 1.018938 |
+-----------------------------------------+-----+----------+

Accuracy

+-----------------------------------------+----+------------------+
|                  name                   | bs |     inductor     |
+-----------------------------------------+----+------------------+
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip |
|                CamemBert                | 1  |       pass       |
|       AlbertForQuestionAnswering        | 1  |       pass       |
|           DebertaForMaskedLM            | 1  |       pass       |
|          AllenaiLongformerBase          | 1  |       pass       |
|             BartForCausalLM             | 1  |       pass       |
|      BartForConditionalGeneration       | 1  |       pass       |
|             BertForMaskedLM             | 1  |       pass       |
|        BertForQuestionAnswering         | 1  |       pass       |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |
|       DebertaForQuestionAnswering       | 1  |       pass       |
|           LayoutLMForMaskedLM           | 1  |       pass       |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |
|          DistilBertForMaskedLM          | 1  |       pass       |
|     DistilBertForQuestionAnswering      | 1  |       pass       |
|               DistillGPT2               | 1  |       pass       |
|           ElectraForCausalLM            | 1  |       pass       |
|       ElectraForQuestionAnswering       | 1  |       pass       |
|      GPT2ForSequenceClassification      | 1  |       pass       |
|               GoogleFnet                | 1  |       pass       |
|    LayoutLMForSequenceClassification    | 1  |       pass       |
|            MBartForCausalLM             | 1  |       pass       |
|            XLNetLMHeadModel             | 1  |       pass       |
|             XGLMForCausalLM             | 1  |       pass       |
|            AlbertForMaskedLM            | 1  |       pass       |
|      MBartForConditionalGeneration      | 1  |       pass       |
|       MT5ForConditionalGeneration       | 1  |       pass       |
|         MegatronBertForCausalLM         | 1  |       pass       |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |
|          MobileBertForMaskedLM          | 1  |       pass       |
|     MobileBertForQuestionAnswering      | 1  |       pass       |
|             OPTForCausalLM              | 1  |       pass       |
|            PLBartForCausalLM            | 1  |       pass       |
|     PLBartForConditionalGeneration      | 1  |       pass       |
|           PegasusForCausalLM            | 1  |       pass       |
|     M2M100ForConditionalGeneration      | 1  |       pass       |
|     PegasusForConditionalGeneration     | 1  |       pass       |
|           RobertaForCausalLM            | 1  |       pass       |
|       RobertaForQuestionAnswering       | 1  |       pass       |
|         Speech2Text2ForCausalLM         | 1  |       pass       |
|       T5ForConditionalGeneration        | 1  |       pass       |
|                 T5Small                 | 1  |       pass       |
|            TrOCRForCausalLM             | 1  |       pass       |
|            YituTechConvBert             | 1  |       pass       |
+-----------------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------------+-----+-----------+
|                  name                   | bs  | inductor  |
+-----------------------------------------+-----+-----------+
|          AllenaiLongformerBase          |  4  | 131.06297 |
|     PegasusForConditionalGeneration     | 32  | 72.004557 |
|          MobileBertForMaskedLM          | 128 | 69.578578 |
|      MBartForConditionalGeneration      |  2  | 69.297979 |
|     MobileBertForQuestionAnswering      | 128 | 68.647646 |
|     M2M100ForConditionalGeneration      | 16  | 68.046361 |
|       MT5ForConditionalGeneration       | 16  | 65.270222 |
|          BlenderbotForCausalLM          |  4  | 58.208999 |
|             XGLMForCausalLM             |  8  | 54.766042 |
|                 T5Small                 |  4  | 54.153298 |
|       T5ForConditionalGeneration        |  4  | 53.95943  |
| BlenderbotSmallForConditionalGeneration | 64  | 51.548942 |
|      BartForConditionalGeneration       |  2  | 49.819502 |
|          DebertaV2ForMaskedLM           |  2  | 49.024262 |
|    MegatronBertForQuestionAnswering     |  8  | 47.073073 |
|         MegatronBertForCausalLM         |  4  | 47.041723 |
|            YituTechConvBert             | 16  | 46.914022 |
|            XLNetLMHeadModel             |  8  | 44.08648  |
|     PLBartForConditionalGeneration      |  4  | 43.283625 |
|             OPTForCausalLM              |  2  | 38.989868 |
|           PegasusForCausalLM            | 32  | 37.435609 |
|            MBartForCausalLM             |  4  | 35.851997 |
|       DebertaForQuestionAnswering       | 16  | 33.099808 |
|           DebertaForMaskedLM            |  8  | 32.807795 |
|            TrOCRForCausalLM             | 32  | 32.309106 |
|      DebertaV2ForQuestionAnswering      |  1  | 31.730247 |
|      GPT2ForSequenceClassification      |  4  | 31.121734 |
|           RobertaForCausalLM            | 16  | 29.083286 |
|       RobertaForQuestionAnswering       | 16  | 28.610483 |
|                CamemBert                | 16  | 28.395726 |
|           ElectraForCausalLM            | 32  | 27.760837 |
|           LayoutLMForMaskedLM           | 16  | 27.265638 |
|       ElectraForQuestionAnswering       | 64  | 27.245594 |
|             BertForMaskedLM             | 16  | 27.080217 |
|        BertForQuestionAnswering         | 16  | 26.948394 |
|       AlbertForQuestionAnswering        |  4  | 26.834068 |
|            AlbertForMaskedLM            |  4  | 26.751742 |
|    LayoutLMForSequenceClassification    | 16  | 26.580595 |
|       BlenderbotSmallForCausalLM        | 64  | 25.779593 |
|     DistilBertForQuestionAnswering      | 256 | 25.217592 |
|               DistillGPT2               | 16  | 25.191254 |
|          DistilBertForMaskedLM          | 128 | 24.671998 |
|             BartForCausalLM             |  4  | 24.665201 |
|         Speech2Text2ForCausalLM         | 256 | 23.642391 |
|            PLBartForCausalLM            |  8  | 23.29451  |
|               GoogleFnet                | 16  | 21.226946 |
+-----------------------------------------+-----+-----------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|       AlbertForQuestionAnswering        |  4  | 0.993963 |
|            AlbertForMaskedLM            |  4  | 0.99352  |
|     DistilBertForQuestionAnswering      | 256 | 0.993136 |
|           RobertaForCausalLM            | 16  | 0.992568 |
|            TrOCRForCausalLM             | 32  | 0.992256 |
|          DistilBertForMaskedLM          | 128 | 0.991941 |
|             OPTForCausalLM              |  2  | 0.991665 |
|           ElectraForCausalLM            | 32  | 0.991423 |
|       ElectraForQuestionAnswering       | 64  | 0.99099  |
|                CamemBert                | 16  | 0.990977 |
|               GoogleFnet                | 16  | 0.990971 |
|             BertForMaskedLM             | 16  | 0.990918 |
|               DistillGPT2               | 16  | 0.990608 |
|            PLBartForCausalLM            |  8  | 0.990518 |
|           LayoutLMForMaskedLM           | 16  | 0.990469 |
|            MBartForCausalLM             |  4  | 0.989775 |
|    MegatronBertForQuestionAnswering     |  8  | 0.989616 |
|            YituTechConvBert             | 16  | 0.989115 |
|     PegasusForConditionalGeneration     | 32  | 0.988983 |
|       DebertaForQuestionAnswering       | 16  | 0.988924 |
|        BertForQuestionAnswering         | 16  | 0.98873  |
|       RobertaForQuestionAnswering       | 16  | 0.988568 |
| BlenderbotSmallForConditionalGeneration | 64  | 0.988158 |
|     PLBartForConditionalGeneration      |  4  | 0.987844 |
|         Speech2Text2ForCausalLM         | 256 | 0.987707 |
|           PegasusForCausalLM            | 32  | 0.987011 |
|      GPT2ForSequenceClassification      |  4  | 0.986912 |
|             BartForCausalLM             |  4  | 0.986631 |
|       BlenderbotSmallForCausalLM        | 64  | 0.986533 |
|          BlenderbotForCausalLM          |  4  | 0.985932 |
|           DebertaForMaskedLM            |  8  | 0.985799 |
|      MBartForConditionalGeneration      |  2  | 0.985738 |
|    LayoutLMForSequenceClassification    | 16  | 0.985599 |
|         MegatronBertForCausalLM         |  4  | 0.98513  |
|      BartForConditionalGeneration       |  2  | 0.983389 |
|       MT5ForConditionalGeneration       | 16  | 0.982809 |
|          MobileBertForMaskedLM          | 128 | 0.982511 |
|       T5ForConditionalGeneration        |  4  | 0.982305 |
|            XLNetLMHeadModel             |  8  | 0.982086 |
|                 T5Small                 |  4  | 0.981163 |
|     M2M100ForConditionalGeneration      | 16  | 0.977846 |
|          DebertaV2ForMaskedLM           |  2  | 0.976298 |
|     MobileBertForQuestionAnswering      | 128 | 0.974999 |
|             XGLMForCausalLM             |  8  | 0.972923 |
|          AllenaiLongformerBase          |  4  | 0.971491 |
|      DebertaV2ForQuestionAnswering      |  1  | 0.869487 |
+-----------------------------------------+-----+----------+

Absolute latency (ms)

+-----------------------------------------+-----+-------------+
|                  name                   | bs  |  inductor   |
+-----------------------------------------+-----+-------------+
|            AlbertForMaskedLM            |  4  | 2640.804665 |
|       AlbertForQuestionAnswering        |  4  | 2626.379504 |
|            XLNetLMHeadModel             |  8  | 1296.469938 |
|     PegasusForConditionalGeneration     | 32  | 981.682803  |
|            TrOCRForCausalLM             | 32  | 960.242513  |
|     DistilBertForQuestionAnswering      | 256 | 875.699726  |
|    MegatronBertForQuestionAnswering     |  8  | 775.794679  |
|            MBartForCausalLM             |  4  | 676.896862  |
|      MBartForConditionalGeneration      |  2  | 670.930277  |
|          BlenderbotForCausalLM          |  4  |  664.48555  |
|          DistilBertForMaskedLM          | 128 | 660.951249  |
|           RobertaForCausalLM            | 16  | 603.774616  |
|          DebertaV2ForMaskedLM           |  2  | 600.144223  |
|             OPTForCausalLM              |  2  | 594.195139  |
|     M2M100ForConditionalGeneration      | 16  | 593.319703  |
|      BartForConditionalGeneration       |  2  | 588.833137  |
|            YituTechConvBert             | 16  | 581.421588  |
|                CamemBert                | 16  | 554.162626  |
|             BertForMaskedLM             | 16  | 551.596373  |
|           LayoutLMForMaskedLM           | 16  | 550.012883  |
|          AllenaiLongformerBase          |  4  | 532.978186  |
|       DebertaForQuestionAnswering       | 16  | 525.945896  |
|             BartForCausalLM             |  4  | 517.404875  |
|            PLBartForCausalLM            |  8  | 506.909877  |
| BlenderbotSmallForConditionalGeneration | 64  | 485.699875  |
|           PegasusForCausalLM            | 32  | 481.792783  |
|     PLBartForConditionalGeneration      |  4  | 476.104366  |
|         MegatronBertForCausalLM         |  4  | 446.324222  |
|    LayoutLMForSequenceClassification    | 16  | 443.518025  |
|        BertForQuestionAnswering         | 16  |  440.6403   |
|       RobertaForQuestionAnswering       | 16  | 432.169399  |
|               GoogleFnet                | 16  | 403.909358  |
|          MobileBertForMaskedLM          | 128 | 392.733528  |
|               DistillGPT2               | 16  | 379.882268  |
|             XGLMForCausalLM             |  8  | 375.988131  |
|           DebertaForMaskedLM            |  8  | 369.541328  |
|       T5ForConditionalGeneration        |  4  | 339.608185  |
|       ElectraForQuestionAnswering       | 64  | 334.775593  |
|                 T5Small                 |  4  | 332.682543  |
|         Speech2Text2ForCausalLM         | 256 | 277.996872  |
|       BlenderbotSmallForCausalLM        | 64  | 273.786896  |
|      GPT2ForSequenceClassification      |  4  | 273.267206  |
|           ElectraForCausalLM            | 32  | 256.646071  |
|     MobileBertForQuestionAnswering      | 128 | 244.033923  |
|       MT5ForConditionalGeneration       | 16  | 221.986909  |
|      DebertaV2ForQuestionAnswering      |  1  | 221.696305  |
+-----------------------------------------+-----+-------------+

timm_models suite with float32 precision


Performance speedup

+---------------------------------+------+----------+
|              name               |  bs  | inductor |
+---------------------------------+------+----------+
|           fbnetc_100            | 512  | 3.879795 |
|           mnasnet_100           | 512  | 3.798906 |
|            lcnet_050            | 256  | 3.753176 |
|         mobilenetv2_100         | 128  | 3.687864 |
|      mobilenetv3_large_100      | 512  | 3.581484 |
|          spnasnet_100           | 128  | 3.46906  |
|            fbnetv3_b            | 256  | 3.421015 |
|           regnety_002           | 1024 | 3.276789 |
|           rexnet_100            | 256  | 3.040837 |
|       tf_efficientnet_b0        | 128  | 2.892381 |
|            tinynet_a            | 128  | 2.769235 |
|        ese_vovnet19b_dw         | 256  | 2.609788 |
|          botnet26t_256          | 128  | 2.585717 |
|          pnasnet5large          |  16  | 2.544517 |
|            hrnet_w18            | 128  | 2.517856 |
|           res2next50            | 128  | 2.399876 |
|       eca_botnext26ts_256       | 128  | 2.333417 |
|          ghostnet_100           | 512  | 2.319189 |
|       gluon_inception_v3        | 256  | 2.318574 |
|          inception_v3           | 128  | 2.267142 |
|           resnest101e           |  64  | 2.252139 |
|        adv_inception_v3         | 128  | 2.240964 |
|        eca_halonext26ts         | 128  | 2.221168 |
|             dla102              | 128  | 2.21133  |
|        res2net101_26w_4s        | 128  | 2.11613  |
|        res2net50_14w_8s         | 128  | 2.113766 |
|            repvgg_a2            | 128  | 2.108568 |
|          cspdarknet53           |  64  | 2.064894 |
|            nfnet_l0             | 128  | 2.001923 |
|        convmixer_768_32         |  32  | 1.977674 |
|            gernet_l             | 128  | 1.909601 |
|           dm_nfnet_f0           | 128  | 1.879865 |
|           tf_mixnet_l           | 128  | 1.818829 |
|           selecsls42b           | 128  | 1.777902 |
|        sebotnet33ts_256         |  64  | 1.760514 |
|         visformer_small         | 128  | 1.676484 |
|         poolformer_m36          |  64  | 1.654793 |
|            mixnet_l             | 128  | 1.650077 |
|           volo_d1_224           |  64  | 1.635111 |
|     swsl_resnext101_32x16d      |  32  | 1.576709 |
|             dpn107              |  64  | 1.510938 |
|            levit_128            | 1024 | 1.457835 |
|           mobilevit_s           |  64  | 1.420586 |
|          gmlp_s16_224           | 128  | 1.274531 |
|          resmlp_12_224          | 128  | 1.18841  |
|      xcit_large_24_p8_224       |  16  | 1.180867 |
|           convit_base           |  64  | 1.167376 |
|          gmixer_24_224          | 128  | 1.140758 |
|          cait_m36_384           |  4   | 1.126853 |
|  swin_base_patch4_window7_224   |  64  | 1.118157 |
|        tnt_s_patch16_224        | 128  | 1.087423 |
|        twins_pcpvt_base         | 128  | 1.072447 |
|          convnext_base          |  64  | 1.056998 |
|          mixer_b16_224          | 128  | 1.054297 |
|      beit_base_patch16_224      |  64  | 1.041885 |
| deit_base_distilled_patch16_224 |  64  | 1.028777 |
|      vit_base_patch16_224       |  64  | 1.025421 |
|            pit_b_224            |  64  | 1.02453  |
|          jx_nest_base           |  32  | 0.992438 |
|         crossvit_9_240          | 256  | 0.973399 |
+---------------------------------+------+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 8  |     pass      |
|           dm_nfnet_f0           | 8  |     pass      |
|            mixnet_l             | 8  |     pass      |
|       eca_botnext26ts_256       | 8  |     pass      |
|         crossvit_9_240          | 8  |     pass      |
| deit_base_distilled_patch16_224 | 8  |     pass      |
|          convnext_base          | 8  |     pass      |
|           regnety_002           | 8  |     pass      |
|           convit_base           | 8  |     pass      |
|          gmlp_s16_224           | 8  |     pass      |
|          cait_m36_384           | 8  |     pass      |
|             dpn107              | 8  |     pass      |
|      vit_base_patch16_224       | 8  |     pass      |
|          cspdarknet53           | 8  |     pass      |
|        eca_halonext26ts         | 8  |     pass      |
|        ese_vovnet19b_dw         | 8  |     pass      |
|           fbnetc_100            | 8  |     pass      |
|            fbnetv3_b            | 8  |     pass      |
|            gernet_l             | 8  |     pass      |
|          botnet26t_256          | 8  |     pass      |
|       gluon_inception_v3        | 8  |     pass      |
|          gmixer_24_224          | 8  |     pass      |
|          jx_nest_base           | 8  |     pass      |
|        convmixer_768_32         | 8  |     pass      |
|        twins_pcpvt_base         | 8  |     pass      |
|         poolformer_m36          | 8  |     pass      |
|            lcnet_050            | 8  |     pass      |
|          mixer_b16_224          | 8  |     pass      |
|        tnt_s_patch16_224        | 8  |     pass      |
|           mnasnet_100           | 8  |     pass      |
|         mobilenetv2_100         | 8  |     pass      |
|      mobilenetv3_large_100      | 8  |     pass      |
|           mobilevit_s           | 8  |     pass      |
|            nfnet_l0             | 8  |     pass      |
|            pit_b_224            | 8  |     pass      |
|      beit_base_patch16_224      | 8  |     pass      |
|            repvgg_a2            | 8  |     pass      |
|  swin_base_patch4_window7_224   | 8  |     pass      |
|          inception_v3           | 8  |     pass      |
|           tf_mixnet_l           | 8  |     pass      |
|       tf_efficientnet_b0        | 8  |     pass      |
|            tinynet_a            | 8  |     pass      |
|          spnasnet_100           | 8  |     pass      |
|        sebotnet33ts_256         | 8  |     pass      |
|          resmlp_12_224          | 8  |     pass      |
|         coat_lite_mini          | 8  |  fail_to_run  |
|           resnest101e           | 8  | fail_accuracy |
|            levit_128            | 8  | fail_accuracy |
|          ghostnet_100           | 8  | fail_accuracy |
|          pnasnet5large          | 8  | fail_accuracy |
|        res2net101_26w_4s        | 8  | fail_accuracy |
|        res2net50_14w_8s         | 8  | fail_accuracy |
|             dla102              | 8  | fail_accuracy |
|     swsl_resnext101_32x16d      | 8  | fail_accuracy |
|           rexnet_100            | 8  | fail_accuracy |
|           selecsls42b           | 8  | fail_accuracy |
|           res2next50            | 8  | fail_accuracy |
|            hrnet_w18            | 8  | fail_accuracy |
|           volo_d1_224           | 8  | fail_accuracy |
|         visformer_small         | 8  | fail_accuracy |
|      xcit_large_24_p8_224       | 8  | fail_accuracy |
+---------------------------------+----+---------------+

Compilation latency (sec)

+---------------------------------+------+------------+
|              name               |  bs  |  inductor  |
+---------------------------------+------+------------+
|  swin_base_patch4_window7_224   |  64  | 130.584259 |
|          pnasnet5large          |  16  | 108.227887 |
|           tf_mixnet_l           | 128  | 98.262406  |
|           mobilevit_s           |  64  | 97.881345  |
|      xcit_large_24_p8_224       |  16  | 95.381978  |
|          cait_m36_384           |  4   | 93.823955  |
|        twins_pcpvt_base         | 128  | 93.702966  |
|          jx_nest_base           |  32  | 88.683194  |
|             dpn107              |  64  |  87.15626  |
|        tnt_s_patch16_224        | 128  | 82.312241  |
|           volo_d1_224           |  64  | 79.219728  |
|         crossvit_9_240          | 256  | 76.953992  |
|            levit_128            | 1024 | 76.006246  |
|        res2net50_14w_8s         | 128  | 73.287554  |
|           rexnet_100            | 256  | 73.113123  |
|        eca_halonext26ts         | 128  | 72.219219  |
|            mixnet_l             | 128  |  72.11244  |
|         poolformer_m36          |  64  |  70.41945  |
|        sebotnet33ts_256         |  64  |  70.22647  |
|          ghostnet_100           | 512  | 67.292931  |
|           dm_nfnet_f0           | 128  | 59.811387  |
|           convit_base           |  64  | 58.820424  |
|            hrnet_w18            | 128  | 58.413424  |
|       eca_botnext26ts_256       | 128  | 57.045736  |
|          convnext_base          |  64  | 54.652663  |
|        res2net101_26w_4s        | 128  | 52.849483  |
|       tf_efficientnet_b0        | 128  | 49.085116  |
|            nfnet_l0             | 128  | 46.576298  |
|            pit_b_224            |  64  | 45.776051  |
|          gmixer_24_224          | 128  |  44.19184  |
|          gmlp_s16_224           | 128  | 43.253068  |
|       gluon_inception_v3        | 256  | 43.158218  |
|           resnest101e           |  64  | 42.952378  |
|          botnet26t_256          | 128  |  42.53002  |
|            fbnetv3_b            | 256  | 42.156752  |
|           res2next50            | 128  | 41.972079  |
|          inception_v3           | 128  | 41.751959  |
|            tinynet_a            | 128  | 41.704833  |
|        adv_inception_v3         | 128  | 41.630993  |
|         visformer_small         | 128  | 35.387619  |
|             dla102              | 128  | 32.710493  |
|          cspdarknet53           |  64  | 32.043146  |
|      vit_base_patch16_224       |  64  | 30.401243  |
| deit_base_distilled_patch16_224 |  64  | 30.379749  |
|          mixer_b16_224          | 128  | 30.117474  |
|      mobilenetv3_large_100      | 512  | 29.562992  |
|        ese_vovnet19b_dw         | 256  | 28.788384  |
|      beit_base_patch16_224      |  64  | 25.545625  |
|           regnety_002           | 1024 | 21.642316  |
|        convmixer_768_32         |  32  | 21.352065  |
|          resmlp_12_224          | 128  |  20.85114  |
|            lcnet_050            | 256  | 17.331933  |
|           selecsls42b           | 128  | 17.294946  |
|            repvgg_a2            | 128  | 17.203864  |
|     swsl_resnext101_32x16d      |  32  | 16.077145  |
|         mobilenetv2_100         | 128  | 12.914336  |
|          spnasnet_100           | 128  | 11.404854  |
|            gernet_l             | 128  | 11.040951  |
|           fbnetc_100            | 512  |  9.874881  |
|           mnasnet_100           | 512  |  9.381413  |
+---------------------------------+------+------------+

Peak Memory Compression Ratio

+---------------------------------+------+----------+
|              name               |  bs  | inductor |
+---------------------------------+------+----------+
|        ese_vovnet19b_dw         | 256  | 0.997572 |
|           fbnetc_100            | 512  | 0.996957 |
|           mnasnet_100           | 512  | 0.996501 |
|            fbnetv3_b            | 256  | 0.996193 |
|      mobilenetv3_large_100      | 512  | 0.996179 |
|          ghostnet_100           | 512  | 0.995879 |
|          convnext_base          |  64  | 0.995783 |
|           regnety_002           | 1024 | 0.995633 |
|            levit_128            | 1024 | 0.994847 |
|           dm_nfnet_f0           | 128  | 0.994766 |
|            nfnet_l0             | 128  | 0.994476 |
|        res2net101_26w_4s        | 128  | 0.994432 |
|       eca_botnext26ts_256       | 128  | 0.994391 |
|           rexnet_100            | 256  | 0.994233 |
|             dpn107              |  64  | 0.994061 |
|       gluon_inception_v3        | 256  | 0.993989 |
|        eca_halonext26ts         | 128  | 0.993923 |
|          mixer_b16_224          | 128  | 0.993442 |
|           tf_mixnet_l           | 128  | 0.993218 |
|        res2net50_14w_8s         | 128  | 0.993164 |
|        twins_pcpvt_base         | 128  | 0.993138 |
|      xcit_large_24_p8_224       |  16  |  0.9931  |
|             dla102              | 128  | 0.993071 |
|           res2next50            | 128  | 0.993014 |
|        convmixer_768_32         |  32  | 0.993011 |
|            mixnet_l             | 128  | 0.992822 |
|           convit_base           |  64  | 0.992787 |
|          gmixer_24_224          | 128  | 0.992706 |
|          gmlp_s16_224           | 128  |  0.9926  |
|          botnet26t_256          | 128  | 0.99257  |
|       tf_efficientnet_b0        | 128  | 0.992247 |
|         visformer_small         | 128  | 0.992049 |
|          pnasnet5large          |  16  | 0.991931 |
|      beit_base_patch16_224      |  64  | 0.991839 |
|            gernet_l             | 128  | 0.991657 |
|           resnest101e           |  64  | 0.99154  |
|           mobilevit_s           |  64  | 0.991068 |
|        sebotnet33ts_256         |  64  | 0.990974 |
|        adv_inception_v3         | 128  | 0.990748 |
|          inception_v3           | 128  | 0.990303 |
|           selecsls42b           | 128  | 0.989837 |
|         mobilenetv2_100         | 128  | 0.989594 |
|        tnt_s_patch16_224        | 128  | 0.989501 |
|          spnasnet_100           | 128  | 0.989376 |
|            pit_b_224            |  64  | 0.989361 |
|          resmlp_12_224          | 128  | 0.989169 |
|      vit_base_patch16_224       |  64  | 0.988904 |
|          cait_m36_384           |  4   | 0.988792 |
| deit_base_distilled_patch16_224 |  64  | 0.988721 |
|  swin_base_patch4_window7_224   |  64  |  0.9885  |
|            tinynet_a            | 128  | 0.98834  |
|         poolformer_m36          |  64  | 0.988106 |
|     swsl_resnext101_32x16d      |  32  | 0.98778  |
|            hrnet_w18            | 128  | 0.986704 |
|            lcnet_050            | 256  | 0.985669 |
|            repvgg_a2            | 128  | 0.983819 |
|           volo_d1_224           |  64  | 0.98374  |
|          jx_nest_base           |  32  | 0.983363 |
|          cspdarknet53           |  64  | 0.98167  |
|         crossvit_9_240          | 256  | 0.974275 |
+---------------------------------+------+----------+

Absolute latency (ms)

+---------------------------------+------+-------------+
|              name               |  bs  |  inductor   |
+---------------------------------+------+-------------+
|      xcit_large_24_p8_224       |  16  | 1455.285955 |
|          convnext_base          |  64  | 1151.212639 |
|          cait_m36_384           |  4   | 1103.220856 |
|          mixer_b16_224          | 128  | 1030.801524 |
|           dm_nfnet_f0           | 128  | 930.500764  |
|           convit_base           |  64  | 922.583457  |
|             dpn107              |  64  | 914.826536  |
|  swin_base_patch4_window7_224   |  64  | 835.339093  |
|        twins_pcpvt_base         | 128  | 828.604317  |
|        tnt_s_patch16_224        | 128  | 826.876523  |
|       gluon_inception_v3        | 256  | 793.158058  |
| deit_base_distilled_patch16_224 |  64  | 679.344193  |
|      beit_base_patch16_224      |  64  | 677.920045  |
|      vit_base_patch16_224       |  64  | 677.844502  |
|        res2net101_26w_4s        | 128  | 638.600184  |
|     swsl_resnext101_32x16d      |  32  | 626.871449  |
|            nfnet_l0             | 128  | 597.419676  |
|            levit_128            | 1024 | 565.357126  |
|          gmixer_24_224          | 128  | 559.452832  |
|          gmlp_s16_224           | 128  | 553.156228  |
|            pit_b_224            |  64  | 550.515468  |
|        ese_vovnet19b_dw         | 256  | 546.976153  |
|          jx_nest_base           |  32  | 539.111227  |
|         crossvit_9_240          | 256  |  522.91748  |
|             dla102              | 128  | 520.917325  |
|           resnest101e           |  64  | 475.620705  |
|         poolformer_m36          |  64  | 467.841786  |
|            hrnet_w18            | 128  | 438.972907  |
|        convmixer_768_32         |  32  | 435.212971  |
|           volo_d1_224           |  64  | 433.766083  |
|          inception_v3           | 128  | 400.034297  |
|        adv_inception_v3         | 128  | 399.534664  |
|        res2net50_14w_8s         | 128  | 397.743963  |
|         visformer_small         | 128  |  383.74711  |
|            mixnet_l             | 128  | 367.553325  |
|          ghostnet_100           | 512  | 360.385778  |
|           tf_mixnet_l           | 128  | 356.393154  |
|           res2next50            | 128  | 354.649235  |
|          pnasnet5large          |  16  | 348.820059  |
|            repvgg_a2            | 128  | 326.634554  |
|        eca_halonext26ts         | 128  | 311.298553  |
|           fbnetc_100            | 512  | 307.998976  |
|       eca_botnext26ts_256       | 128  | 289.527799  |
|            gernet_l             | 128  | 287.423457  |
|           regnety_002           | 1024 |  280.91239  |
|        sebotnet33ts_256         |  64  | 279.088574  |
|          botnet26t_256          | 128  | 273.889068  |
|           mobilevit_s           |  64  | 263.554181  |
|          resmlp_12_224          | 128  | 261.821334  |
|           mnasnet_100           | 512  | 261.120677  |
|          cspdarknet53           |  64  | 257.383905  |
|            fbnetv3_b            | 256  | 243.688452  |
|      mobilenetv3_large_100      | 512  | 230.803881  |
|           selecsls42b           | 128  | 229.601218  |
|           rexnet_100            | 256  | 228.686087  |
|       tf_efficientnet_b0        | 128  | 120.029682  |
|            tinynet_a            | 128  |  85.189293  |
|         mobilenetv2_100         | 128  |  73.608491  |
|          spnasnet_100           | 128  |  66.923873  |
|            lcnet_050            | 256  |  27.528934  |
+---------------------------------+------+-------------+

@zxd1997066
Contributor

[dynamic] Performance Dashboard for float32 precision -- Single-core Single-thread (2024-04-22 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward and backward passes for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

SW information:

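+-------------------+--------+------------------------------------------+
|        SW         | Branch |                  Commit                  |
+-------------------+--------+------------------------------------------+
|      Pytorch      |  main  | 7706cd7d12781b8c537dd045745738d60c6c31f1 |
|    Torchbench     |  main  |                 d6015d42                 |
|    torchaudio     |  main  |             2.2.0a0+ea437b3              |
|     torchtext     |  main  |             0.16.0a0+b0ebddc             |
|    torchvision    |  main  |             0.19.0a0+2c4665f             |
|     torchdata     |  main  |             0.7.1a0+0790338              |
| dynamo_benchmarks |  main  |                 nightly                  |
+-------------------+--------+------------------------------------------+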

HW information

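+------------------+----------------------------------------------------------------+
|       Item       |                             Value                              |
+------------------+----------------------------------------------------------------+
|   Manufacturer   |                           Amazon EC2                           |
|   Product Name   |                          c6i.16xlarge                          |
|    CPU Model     |         Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz          |
| Installed Memory |            128GB (1x128GB DDR4 3200 MT/s [Unknown])            |
|        OS        |                       Ubuntu 22.04.2 LTS                       |
|      Kernel      |                        5.19.0-1022-aws                         |
|    Microcode     |                           0xd000389                            |
|       GCC        |           gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0            |
|      GLIBC       |            ldd (Ubuntu GLIBC 2.35-0ubuntu3.1) 2.35             |
|     Binutils     |             GNU ld (GNU Binutils for Ubuntu) 2.38              |
|      Python      |                         Python 3.10.6                          |
|     OpenSSL      | OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022) |
+------------------+----------------------------------------------------------------+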

Test command

export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
export OMP_NUM_THREADS=1

python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--core_list 0 --ncores_per_instance 1" --devices=cpu --dtypes=float32 --inference --compilers=inductor --batch_size=1 --threads 1 --extra-args="--timeout 9000"

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
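
As a rough illustration of the filtering step described above, the sketch below drops models that did not pass the accuracy check before any aggregate is computed. The function name `drop_failed_models`, the `example_model` entry, and the choice to treat `pass_due_to_skip` as a pass are assumptions made for this example; they are not taken from the benchmark runner.

```python
def drop_failed_models(accuracy, speedup):
    """Keep speedup entries only for models whose accuracy status starts with 'pass'.

    Assumption: 'pass_due_to_skip' is treated as a pass. Models missing from
    the accuracy table are dropped.
    """
    return {
        name: value
        for name, value in speedup.items()
        if accuracy.get(name, "").startswith("pass")
    }


accuracy = {
    "hf_Bert": "pass",
    "hf_T5_large": "pass_due_to_skip",
    "example_model": "fail_accuracy",  # hypothetical entry, not from the tables
}
speedup = {"hf_Bert": 1.15259, "hf_T5_large": 0.79078, "example_model": 1.3}

kept = drop_failed_models(accuracy, speedup)
print(sorted(kept))
```

Only the entries that remain in `kept` would then feed into the speedup, compilation latency, and memory summaries.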

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 86%, 68/79 | 100%, 46/46 | 75%, 45/60  |
+----------+------------+-------------+-------------+
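
The pass-rate cells combine a percentage with the raw counts. A minimal sketch of that formatting is shown below, assuming the percentage is rounded to the nearest integer; the helper name `pass_rate` is hypothetical, and the rounding rule is inferred from the table rather than confirmed by the runner.

```python
def pass_rate(passed, total):
    """Format a pass rate as '<percent>%, <passed>/<total>'."""
    if total <= 0:
        raise ValueError("total must be positive")
    percent = round(100 * passed / total)
    return f"{percent}%, {passed}/{total}"


# Counts taken from the Passrate table above.
for suite, (passed, total) in {
    "torchbench": (68, 79),
    "huggingface": (46, 46),
    "timm_models": (45, 60),
}.items():
    print(suite, pass_rate(passed, total))
```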

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.52x    |    1.19x    |    1.48x    |
+----------+------------+-------------+-------------+
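
For readers unfamiliar with the metric, a geometric mean of per-model speedups can be sketched as follows. This is an illustration only: the helper `geometric_mean` is hypothetical, skipping non-positive entries (such as the 0.0 rows for models that did not run) is an assumption, and the three sample values are a small subset of the torchbench table, so the result is not expected to match the suite-level figure.

```python
import math


def geometric_mean(speedups):
    """Geometric mean of the positive speedups, computed in log space."""
    positive = [s for s in speedups if s > 0]  # assumption: 0.0 marks a failed run
    if not positive:
        raise ValueError("no positive speedups to aggregate")
    return math.exp(sum(math.log(s) for s in positive) / len(positive))


# resnet18, resnet50, hf_T5 from the torchbench speedup table.
sample = [2.182681, 1.778053, 0.709653]
print(f"{geometric_mean(sample):.2f}x")
```

The geometric mean is the usual choice for ratios because a 2x speedup and a 0.5x slowdown average to 1.0x, which an arithmetic mean would not give.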

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   25.84    |    26.53    |    34.43    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.86x    |    0.81x    |    0.82x    |
+----------+------------+-------------+-------------+
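
One way to read these ratios, assuming the ratio is defined as the native PyTorch peak memory divided by the compiled peak memory (consistent with "higher is better", but an assumption here), is sketched below. The function name and the byte counts are hypothetical.

```python
def compression_ratio(eager_peak_bytes, compiled_peak_bytes):
    """Ratio of baseline peak memory to compiled peak memory (higher is better)."""
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled peak memory must be positive")
    return eager_peak_bytes / compiled_peak_bytes


# Hypothetical figures: the compiled run peaks higher than the baseline.
ratio = compression_ratio(860_000_000, 1_000_000_000)
print(f"{ratio:.2f}x")
```

Under this definition, a value below 1.0x means the compiled model used more peak memory than the baseline.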

torchbench suite with float32 precision


Performance speedup

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_isoneutral_mixing     |    1    | 63.116204 |
|     pyhpc_equation_of_state     |    1    | 23.612421 |
|         basic_gnn_sage          |    1    | 3.551246  |
|          basic_gnn_gin          |    1    | 3.465752  |
|          squeezenet1_1          |    1    | 3.355715  |
|     functorch_maml_omniglot     |    1    | 3.344942  |
|          basic_gnn_gcn          |    1    | 2.844917  |
|           timm_nfnet            |    1    | 2.759966  |
|          maml_omniglot          |    5    |  2.73319  |
|            resnet18             |    1    | 2.182681  |
|              dcgan              |    1    | 2.158719  |
|      functorch_dp_cifar10       |    1    | 2.113204  |
|         opacus_cifar10          |    1    | 2.083039  |
|       shufflenet_v2_x1_0        |    1    | 2.078097  |
|          timm_resnest           |    1    | 1.966143  |
|          mobilenet_v2           |    1    | 1.864142  |
|          lennard_jones          |    1    | 1.855511  |
|            resnet50             |    1    | 1.778053  |
|           mnasnet1_0            |    1    | 1.759888  |
|       mobilenet_v3_large        |    1    | 1.732745  |
|            resnet152            |    1    | 1.658468  |
|         phlippe_resnet          |    1    | 1.653415  |
|           densenet121           |    1    | 1.650297  |
|        timm_efficientnet        |    1    | 1.620505  |
|           timm_vovnet           |    1    | 1.596951  |
|         LearningToPaint         |    1    | 1.570736  |
|        phlippe_densenet         |    1    | 1.500156  |
|      doctr_reco_predictor       |    1    | 1.494572  |
|           timm_regnet           |    1    |   1.479   |
|         resnext50_32x4d         |    1    |  1.47352  |
|              vgg16              |    1    | 1.434315  |
|        basic_gnn_edgecnn        |    1    | 1.395968  |
|              llama              |    1    | 1.390323  |
|             yolov3              |    1    | 1.374637  |
|             alexnet             |    1    | 1.349795  |
|          BERT_pytorch           |    1    | 1.291443  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1.291329  |
|       doctr_det_predictor       |    1    | 1.282455  |
|            hf_Albert            |    1    | 1.280157  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1.278154  |
|             hf_GPT2             |    1    | 1.247362  |
|              maml               |    1    | 1.242979  |
|               drq               |    1    | 1.227931  |
|          hf_GPT2_large          |    1    | 1.221635  |
|            moondream            |    1    | 1.219563  |
|          fastNLP_Bert           |    1    | 1.210529  |
|     timm_vision_transformer     |    1    | 1.202181  |
|         pytorch_stargan         |   16    | 1.189584  |
|          hf_Bert_large          |    1    | 1.176764  |
|  timm_vision_transformer_large  |    1    | 1.170115  |
|              dlrm               |    1    | 1.169325  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 1.153841  |
|             hf_Bert             |    1    |  1.15259  |
|           hf_BigBird            |    1    |  1.15235  |
|      torch_multimodal_clip      |    1    | 1.139975  |
|          hf_DistilBert          |    1    | 1.134847  |
|             hf_Bart             |    1    | 1.096735  |
|       speech_transformer        |    1    | 1.069051  |
|        hf_distil_whisper        |    1    | 1.067498  |
| detectron2_fasterrcnn_r_101_dc5 |    1    |  1.05955  |
| detectron2_fasterrcnn_r_50_dc5  |    1    |  1.05089  |
|    detectron2_fcos_r_50_fpn     |    1    | 1.046872  |
|        soft_actor_critic        |   256   | 1.043938  |
|          pytorch_unet           |    1    | 1.039711  |
|          hf_Longformer          |    1    | 1.018168  |
|             demucs              |    1    | 1.002129  |
|           tts_angular           |    1    |  0.99976  |
|     resnet50_quantized_qat      |    1    | 0.996369  |
|   mobilenet_v2_quantized_qat    |    1    | 0.985391  |
|     nvidia_deeprecommender      |    1    | 0.982874  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.917045  |
|           hf_Reformer           |    1    | 0.848207  |
|       Background_Matting        |    1    |  0.82838  |
|           hf_T5_large           |    1    |  0.79078  |
|              hf_T5              |    1    | 0.709653  |
|           hf_T5_base            |    1    | 0.599013  |
|        timm_efficientdet        |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+

Accuracy

+---------------------------------+---------+--------------------+
|              name               |   bs    |      inductor      |
+---------------------------------+---------+--------------------+
|           hf_T5_large           |    1    |  pass_due_to_skip  |
|       Background_Matting        |    1    |  pass_due_to_skip  |
|  timm_vision_transformer_large  |    1    |  pass_due_to_skip  |
|              maml               |    1    |  pass_due_to_skip  |
|          hf_GPT2_large          |    1    |  pass_due_to_skip  |
|         basic_gnn_sage          |    1    |        pass        |
|          hf_Longformer          |    1    |        pass        |
| detectron2_fasterrcnn_r_101_dc5 |    1    |        pass        |
| detectron2_fasterrcnn_r_101_c4  |    1    |        pass        |
|             demucs              |    1    |        pass        |
|              dcgan              |    1    |        pass        |
|        basic_gnn_edgecnn        |    1    |        pass        |
|          basic_gnn_gin          |    1    |        pass        |
|          basic_gnn_gcn          |    1    |        pass        |
|    detectron2_fcos_r_50_fpn     |    1    |        pass        |
|         LearningToPaint         |    1    |        pass        |
|       doctr_det_predictor       |    1    |        pass        |
|           hf_Reformer           |    1    |        pass        |
| detectron2_fasterrcnn_r_50_dc5  |    1    |        pass        |
|  detectron2_fasterrcnn_r_50_c4  |    1    |        pass        |
|              dlrm               |    1    |        pass        |
|           hf_T5_base            |    1    |        pass        |
|      doctr_reco_predictor       |    1    |        pass        |
|            hf_Albert            |    1    |        pass        |
|             hf_GPT2             |    1    |        pass        |
|          hf_DistilBert          |    1    |        pass        |
|           hf_BigBird            |    1    |        pass        |
|          hf_Bert_large          |    1    |        pass        |
|             hf_Bert             |    1    |        pass        |
|             hf_Bart             |    1    |        pass        |
|               drq               |    1    |        pass        |
|             yolov3              |    1    |        pass        |
|      functorch_dp_cifar10       |    1    |        pass        |
|          fastNLP_Bert           |    1    |        pass        |
|     functorch_maml_omniglot     |    1    |        pass        |
|          lennard_jones          |    1    |        pass        |
|             alexnet             |    1    |        pass        |
|          pytorch_unet           |    1    |        pass        |
|  pytorch_CycleGAN_and_pix2pix   |    1    |        pass        |
|          BERT_pytorch           |    1    |        pass        |
|            moondream            |    1    |        pass        |
|     pyhpc_equation_of_state     |    1    |        pass        |
|         phlippe_resnet          |    1    |        pass        |
|        phlippe_densenet         |    1    |        pass        |
|         opacus_cifar10          |    1    |        pass        |
|     nvidia_deeprecommender      |    1    |        pass        |
|     resnet50_quantized_qat      |    1    |        pass        |
|       mobilenet_v3_large        |    1    |        pass        |
|   mobilenet_v2_quantized_qat    |    1    |        pass        |
| pyhpc_turbulent_kinetic_energy  | 1048576 |        pass        |
|              llama              |    1    |        pass        |
|        hf_distil_whisper        |    1    |        pass        |
|     pyhpc_isoneutral_mixing     |    1    |        pass        |
|          maml_omniglot          |    5    |        pass        |
|              hf_T5              |    1    |        pass        |
|              vgg16              |    1    |        pass        |
|           tts_angular           |    1    |        pass        |
|      torch_multimodal_clip      |    1    |        pass        |
|           timm_vovnet           |    1    |        pass        |
|     timm_vision_transformer     |    1    |        pass        |
|          timm_resnest           |    1    |        pass        |
|           timm_nfnet            |    1    |        pass        |
|         pytorch_stargan         |   16    |        pass        |
|        timm_efficientnet        |    1    |        pass        |
|          squeezenet1_1          |    1    |        pass        |
|       speech_transformer        |    1    |        pass        |
|        soft_actor_critic        |   256   |        pass        |
|           timm_regnet           |    1    |        pass        |
|         DALLE2_pytorch          |    0    | model_fail_to_load |
|              moco               |    0    | model_fail_to_load |
|        timm_efficientdet        |    0    | model_fail_to_load |
| detectron2_fasterrcnn_r_101_fpn |    1    |    fail_to_run     |
|         vision_maskrcnn         |    1    |    fail_to_run     |
|           Super_SloMo           |    1    |    fail_to_run     |
| detectron2_fasterrcnn_r_50_fpn  |    1    |    fail_to_run     |
|         resnext50_32x4d         |    1    |   fail_accuracy    |
|            resnet152            |    1    |   fail_accuracy    |
|       shufflenet_v2_x1_0        |    1    |   fail_accuracy    |
|           mnasnet1_0            |    1    |   fail_accuracy    |
|          mobilenet_v2           |    1    |   fail_accuracy    |
|            resnet18             |    1    |   fail_accuracy    |
|            resnet50             |    1    |   fail_accuracy    |
|           densenet121           |    1    |   fail_accuracy    |
+---------------------------------+---------+--------------------+
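
The status column above can be tallied mechanically. The following is a minimal sketch, assuming the table has been copied as plain text in the bordered format shown; `status_counts` and the `sample` string are hypothetical and are not part of the benchmark scripts.

```python
from collections import Counter


def status_counts(table_text):
    """Count values of the last column in a '+---+' bordered text table."""
    counts = Counter()
    for line in table_text.splitlines():
        line = line.strip()
        # Skip border lines ("+----+") and anything that is not a data row.
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip the header row.
        if cells[0] == "name":
            continue
        counts[cells[-1]] += 1
    return counts


# A few rows copied from the table above, for illustration only.
sample = """
+---------------------------------+---------+--------------------+
|              name               |   bs    |      inductor      |
+---------------------------------+---------+--------------------+
|           hf_T5_large           |    1    |  pass_due_to_skip  |
|         basic_gnn_sage          |    1    |        pass        |
|          hf_Longformer          |    1    |        pass        |
|              moco               |    0    | model_fail_to_load |
|           Super_SloMo           |    1    |    fail_to_run     |
|            resnet50             |    1    |   fail_accuracy    |
+---------------------------------+---------+--------------------+
"""

counts = status_counts(sample)
for status, n in counts.most_common():
    print(f"{status}: {n}")
```

Feeding the full table to the same function gives the per-status totals for the whole suite.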

Compilation latency (sec)

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|           hf_T5_base            |    1    | 97.881643 |
|           densenet121           |    1    | 90.600831 |
|           hf_BigBird            |    1    | 75.451486 |
|           hf_T5_large           |    1    | 70.904912 |
|    detectron2_fcos_r_50_fpn     |    1    | 69.517508 |
|              maml               |    1    | 52.499209 |
|          hf_Longformer          |    1    | 51.774838 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 50.111665 |
|           timm_nfnet            |    1    | 48.335726 |
|           hf_Reformer           |    1    | 47.624311 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 43.464742 |
|        phlippe_densenet         |    1    | 42.847719 |
|       speech_transformer        |    1    | 39.221429 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 39.179126 |
|  timm_vision_transformer_large  |    1    | 35.783022 |
|      torch_multimodal_clip      |    1    | 35.663201 |
|             demucs              |    1    | 35.079727 |
|        timm_efficientnet        |    1    | 33.599742 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 33.407801 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 32.508378 |
|              hf_T5              |    1    | 32.44313  |
|       Background_Matting        |    1    | 28.990682 |
|             yolov3              |    1    | 28.903742 |
|         opacus_cifar10          |    1    | 28.637265 |
|        hf_distil_whisper        |    1    | 28.312089 |
|            moondream            |    1    | 27.289737 |
|      functorch_dp_cifar10       |    1    | 27.28962  |
|          hf_GPT2_large          |    1    | 26.181367 |
|          timm_resnest           |    1    | 25.631695 |
|          hf_Bert_large          |    1    | 25.448028 |
|       doctr_det_predictor       |    1    | 24.747611 |
|       shufflenet_v2_x1_0        |    1    | 24.165291 |
|       mobilenet_v3_large        |    1    | 24.051293 |
|              llama              |    1    | 23.709345 |
|          BERT_pytorch           |    1    | 22.716293 |
|          fastNLP_Bert           |    1    | 22.253411 |
|             hf_Bart             |    1    | 22.163814 |
|           timm_vovnet           |    1    | 21.616676 |
|           timm_regnet           |    1    | 21.487284 |
|     timm_vision_transformer     |    1    | 21.238717 |
|          pytorch_unet           |    1    | 20.628493 |
|            hf_Albert            |    1    | 19.829529 |
|         pytorch_stargan         |   16    | 19.303094 |
|             hf_GPT2             |    1    | 19.292954 |
|          hf_DistilBert          |    1    | 18.932221 |
|             hf_Bert             |    1    | 18.632444 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 17.891255 |
|          squeezenet1_1          |    1    | 16.043949 |
|            resnet152            |    1    | 15.605309 |
|              vgg16              |    1    | 14.966455 |
|      doctr_reco_predictor       |    1    | 13.626285 |
|             alexnet             |    1    | 12.596648 |
|     pyhpc_isoneutral_mixing     |    1    | 12.019994 |
|         resnext50_32x4d         |    1    | 11.500326 |
|            resnet50             |    1    | 11.446436 |
|               drq               |    1    | 10.915324 |
|              dlrm               |    1    | 10.501001 |
|            resnet18             |    1    | 10.150574 |
|          mobilenet_v2           |    1    | 10.008185 |
|           mnasnet1_0            |    1    | 9.835579  |
|     functorch_maml_omniglot     |    1    | 9.702112  |
|          basic_gnn_gcn          |    1    | 9.389265  |
|     nvidia_deeprecommender      |    1    | 9.118355  |
|          basic_gnn_gin          |    1    | 8.946713  |
|         LearningToPaint         |    1    |  8.93932  |
|        basic_gnn_edgecnn        |    1    | 8.869932  |
|     pyhpc_equation_of_state     |    1    | 8.732655  |
|          maml_omniglot          |    5    | 8.654829  |
|         phlippe_resnet          |    1    | 8.570029  |
|        soft_actor_critic        |   256   | 8.010461  |
|         basic_gnn_sage          |    1    |  7.83312  |
|          lennard_jones          |    1    | 5.735299  |
|              dcgan              |    1    | 5.666999  |
|           tts_angular           |    1    | 5.497788  |
|   mobilenet_v2_quantized_qat    |    1    | 0.097643  |
|     resnet50_quantized_qat      |    1    | 0.069865  |
|        timm_efficientdet        |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+

Peak Memory Compression Ratio

+---------------------------------+---------+----------+
|              name               |   bs    | inductor |
+---------------------------------+---------+----------+
|              dlrm               |    1    | 0.988366 |
|           hf_T5_base            |    1    | 0.987802 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 0.983127 |
|             demucs              |    1    | 0.982636 |
|       Background_Matting        |    1    | 0.982149 |
|          hf_GPT2_large          |    1    | 0.977922 |
|          pytorch_unet           |    1    | 0.977507 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 0.971959 |
|        basic_gnn_edgecnn        |    1    | 0.971556 |
|       doctr_det_predictor       |    1    | 0.969908 |
|    detectron2_fcos_r_50_fpn     |    1    | 0.963306 |
|     resnet50_quantized_qat      |    1    | 0.956571 |
|           hf_BigBird            |    1    | 0.947569 |
|         LearningToPaint         |    1    | 0.946646 |
|         pytorch_stargan         |   16    | 0.94428  |
|      doctr_reco_predictor       |    1    | 0.942491 |
|         basic_gnn_sage          |    1    | 0.94115  |
|          basic_gnn_gcn          |    1    | 0.938054 |
|          basic_gnn_gin          |    1    | 0.938023 |
|   mobilenet_v2_quantized_qat    |    1    | 0.933356 |
|      torch_multimodal_clip      |    1    | 0.924424 |
|              llama              |    1    | 0.919269 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 0.917634 |
|        hf_distil_whisper        |    1    | 0.914933 |
|           tts_angular           |    1    | 0.888479 |
|        soft_actor_critic        |   256   | 0.888368 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 0.887107 |
|         opacus_cifar10          |    1    | 0.883866 |
|        timm_efficientnet        |    1    | 0.87611  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 0.87491  |
|          mobilenet_v2           |    1    | 0.872272 |
|           mnasnet1_0            |    1    | 0.864081 |
|          maml_omniglot          |    5    | 0.863636 |
|          lennard_jones          |    1    | 0.862385 |
|          squeezenet1_1          |    1    | 0.859693 |
|     functorch_maml_omniglot     |    1    | 0.857759 |
|          fastNLP_Bert           |    1    | 0.854203 |
|          timm_resnest           |    1    | 0.850259 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.849641 |
|       mobilenet_v3_large        |    1    | 0.843694 |
|              dcgan              |    1    | 0.843058 |
|         phlippe_resnet          |    1    | 0.839945 |
|       shufflenet_v2_x1_0        |    1    | 0.839319 |
|     pyhpc_equation_of_state     |    1    | 0.83323  |
|            moondream            |    1    | 0.825857 |
|       speech_transformer        |    1    | 0.82453  |
|        phlippe_densenet         |    1    | 0.818094 |
|           timm_nfnet            |    1    | 0.814579 |
|          hf_Bert_large          |    1    | 0.813881 |
|         resnext50_32x4d         |    1    | 0.811431 |
|     pyhpc_isoneutral_mixing     |    1    | 0.809367 |
|          hf_Longformer          |    1    | 0.80643  |
|           hf_T5_large           |    1    | 0.806344 |
|             hf_Bert             |    1    | 0.803432 |
|     timm_vision_transformer     |    1    | 0.803297 |
|            hf_Albert            |    1    | 0.801942 |
|              maml               |    1    | 0.791618 |
|             hf_Bart             |    1    | 0.784941 |
|             yolov3              |    1    | 0.779146 |
|          BERT_pytorch           |    1    | 0.77898  |
|          hf_DistilBert          |    1    | 0.775858 |
|            resnet50             |    1    | 0.768286 |
|             hf_GPT2             |    1    | 0.767839 |
|               drq               |    1    | 0.761658 |
|           timm_regnet           |    1    | 0.761477 |
|            resnet18             |    1    | 0.759118 |
|           timm_vovnet           |    1    |  0.7561  |
|           densenet121           |    1    | 0.756026 |
|              hf_T5              |    1    |  0.7505  |
|           hf_Reformer           |    1    | 0.744719 |
|      functorch_dp_cifar10       |    1    | 0.743303 |
|             alexnet             |    1    | 0.735319 |
|  timm_vision_transformer_large  |    1    | 0.732628 |
|              vgg16              |    1    | 0.721931 |
|            resnet152            |    1    | 0.69236  |
|     nvidia_deeprecommender      |    1    | 0.672281 |
|        timm_efficientdet        |    0    |   0.0    |
|              moco               |    0    |   0.0    |
|         DALLE2_pytorch          |    0    |   0.0    |
+---------------------------------+---------+----------+

Absolute latency (ms)

+---------------------------------+---------+--------------+
|              name               |   bs    |   inductor   |
+---------------------------------+---------+--------------+
|           hf_T5_base            |    1    | 26138.179626 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 11777.175686 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 11199.020893 |
|          hf_GPT2_large          |    1    | 10106.54702  |
|           hf_T5_large           |    1    | 7474.795627  |
|            moondream            |    1    | 7346.960468  |
|        hf_distil_whisper        |    1    | 6992.037568  |
|       Background_Matting        |    1    | 6730.850254  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 5671.931737  |
| detectron2_fasterrcnn_r_50_dc5  |    1    |  5035.41608  |
|          pytorch_unet           |    1    | 4856.569784  |
|  timm_vision_transformer_large  |    1    | 2781.515158  |
|    detectron2_fcos_r_50_fpn     |    1    | 2533.562701  |
|             demucs              |    1    | 2238.842947  |
|         pytorch_stargan         |   16    | 2017.474233  |
|          hf_Bert_large          |    1    | 1764.986501  |
|       doctr_det_predictor       |    1    | 1744.896678  |
|           hf_BigBird            |    1    | 1478.170427  |
|      torch_multimodal_clip      |    1    | 1243.274895  |
|          hf_Longformer          |    1    | 1113.440622  |
|             hf_Bart             |    1    |  884.515729  |
|              hf_T5              |    1    |  765.949696  |
|             hf_Bert             |    1    |  678.803928  |
|       speech_transformer        |    1    |  674.846432  |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  629.106824  |
|            hf_Albert            |    1    |  571.742418  |
|          fastNLP_Bert           |    1    |  520.121411  |
|             yolov3              |    1    |  430.129246  |
|          hf_DistilBert          |    1    |  415.809936  |
|           hf_Reformer           |    1    |  414.874582  |
|             hf_GPT2             |    1    |  355.132069  |
|        basic_gnn_edgecnn        |    1    |  231.905791  |
| pyhpc_turbulent_kinetic_energy  | 1048576 |  209.993471  |
|              vgg16              |    1    |  191.386858  |
|           timm_regnet           |    1    |  150.232712  |
|          BERT_pytorch           |    1    |  141.64753   |
|            resnet152            |    1    |  137.673604  |
|           timm_nfnet            |    1    |  96.971396   |
|           timm_vovnet           |    1    |  80.284281   |
|              maml               |    1    |  74.597475   |
|     timm_vision_transformer     |    1    |  58.663355   |
|     nvidia_deeprecommender      |    1    |  58.122441   |
|         resnext50_32x4d         |    1    |  57.359837   |
|           tts_angular           |    1    |  54.252711   |
|            resnet50             |    1    |  51.113899   |
|           densenet121           |    1    |  45.482366   |
|          basic_gnn_gcn          |    1    |  35.661331   |
|          timm_resnest           |    1    |  33.528478   |
|      doctr_reco_predictor       |    1    |  23.475193   |
|              llama              |    1    |  22.798956   |
|            resnet18             |    1    |  22.483539   |
|             alexnet             |    1    |  22.207951   |
|     resnet50_quantized_qat      |    1    |  18.171322   |
|          basic_gnn_gin          |    1    |  16.724982   |
|         basic_gnn_sage          |    1    |  16.520064   |
|        timm_efficientnet        |    1    |  13.565358   |
|         LearningToPaint         |    1    |   9.831811   |
|           mnasnet1_0            |    1    |   7.898135   |
|          mobilenet_v2           |    1    |   7.708508   |
|       mobilenet_v3_large        |    1    |   7.326181   |
|   mobilenet_v2_quantized_qat    |    1    |   7.004462   |
|          squeezenet1_1          |    1    |   5.945694   |
|       shufflenet_v2_x1_0        |    1    |   5.581838   |
|        phlippe_densenet         |    1    |   3.431154   |
|        soft_actor_critic        |   256   |   3.388941   |
|         opacus_cifar10          |    1    |   2.563455   |
|      functorch_dp_cifar10       |    1    |   2.44066    |
|               drq               |    1    |   1.89128    |
|              dcgan              |    1    |   1.725073   |
|         phlippe_resnet          |    1    |   1.368362   |
|     functorch_maml_omniglot     |    1    |   0.858882   |
|          maml_omniglot          |    5    |   0.798563   |
|              dlrm               |    1    |   0.698415   |
|     pyhpc_equation_of_state     |    1    |   0.045116   |
|     pyhpc_isoneutral_mixing     |    1    |   0.043802   |
|          lennard_jones          |    1    |   0.038557   |
|              moco               |    0    |     0.0      |
|         DALLE2_pytorch          |    0    |     0.0      |
|        timm_efficientdet        |    0    |     0.0      |
+---------------------------------+---------+--------------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|          MobileBertForMaskedLM          | 1  | 2.033671 |
|     MobileBertForQuestionAnswering      | 1  | 1.582262 |
|            XLNetLMHeadModel             | 1  | 1.382296 |
|            YituTechConvBert             | 1  | 1.320198 |
|         Speech2Text2ForCausalLM         | 1  | 1.310896 |
|      GPT2ForSequenceClassification      | 1  | 1.309258 |
|          DistilBertForMaskedLM          | 1  | 1.297424 |
| BlenderbotSmallForConditionalGeneration | 1  | 1.284084 |
|     DistilBertForQuestionAnswering      | 1  | 1.283192 |
|       BlenderbotSmallForCausalLM        | 1  | 1.28191  |
|       DebertaForQuestionAnswering       | 1  | 1.255937 |
|       MT5ForConditionalGeneration       | 1  | 1.24555  |
|          BlenderbotForCausalLM          | 1  | 1.245536 |
|     M2M100ForConditionalGeneration      | 1  | 1.240182 |
|           DebertaForMaskedLM            | 1  | 1.236303 |
|           PegasusForCausalLM            | 1  | 1.234446 |
|     PegasusForConditionalGeneration     | 1  | 1.233933 |
|               GoogleFnet                | 1  | 1.225219 |
|             XGLMForCausalLM             | 1  | 1.225187 |
|       AlbertForQuestionAnswering        | 1  | 1.20118  |
|            AlbertForMaskedLM            | 1  | 1.200096 |
|           ElectraForCausalLM            | 1  | 1.191699 |
|               DistillGPT2               | 1  | 1.19013  |
|        BertForQuestionAnswering         | 1  | 1.171636 |
|             BertForMaskedLM             | 1  | 1.166184 |
|    MegatronBertForQuestionAnswering     | 1  | 1.16339  |
|           RobertaForCausalLM            | 1  | 1.161601 |
|                CamemBert                | 1  | 1.161287 |
|           LayoutLMForMaskedLM           | 1  | 1.159893 |
|       ElectraForQuestionAnswering       | 1  | 1.15892  |
|    LayoutLMForSequenceClassification    | 1  | 1.15251  |
|            TrOCRForCausalLM             | 1  | 1.152354 |
|         MegatronBertForCausalLM         | 1  | 1.15116  |
|          DebertaV2ForMaskedLM           | 1  | 1.151124 |
|       RobertaForQuestionAnswering       | 1  | 1.15111  |
|      DebertaV2ForQuestionAnswering      | 1  | 1.150184 |
|     PLBartForConditionalGeneration      | 1  | 1.072964 |
|      MBartForConditionalGeneration      | 1  |  1.0645  |
|             BartForCausalLM             | 1  | 1.056435 |
|      BartForConditionalGeneration       | 1  | 1.054282 |
|             OPTForCausalLM              | 1  | 1.044365 |
|            PLBartForCausalLM            | 1  | 1.027391 |
|            MBartForCausalLM             | 1  | 1.011757 |
|          AllenaiLongformerBase          | 1  | 0.966421 |
|       T5ForConditionalGeneration        | 1  | 0.62448  |
|                 T5Small                 | 1  | 0.618171 |
+-----------------------------------------+----+----------+
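
To condense a speedup table like the one above into a single figure and list the models that run slower than native PyTorch, a sketch along these lines may help. It assumes the table is available as plain text in the bordered format shown. `parse_table` and `geomean` are hypothetical helpers, and the geometric mean is only one common way to aggregate ratios, not necessarily the method used for this dashboard's summary. Entries of 0.0, which other tables in this report use for models that failed to load, are excluded from the mean.

```python
import math


def parse_table(text):
    """Parse a '+---+' bordered text table into a list of dicts keyed by header."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def geomean(values):
    """Geometric mean of the strictly positive values (0.0 entries are skipped)."""
    positive = [v for v in values if v > 0]
    return math.exp(sum(math.log(v) for v in positive) / len(positive))


# A few rows copied from the table above, for illustration only.
sample = """
+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|          MobileBertForMaskedLM          | 1  | 2.033671 |
|            MBartForCausalLM             | 1  | 1.011757 |
|          AllenaiLongformerBase          | 1  | 0.966421 |
|                 T5Small                 | 1  | 0.618171 |
+-----------------------------------------+----+----------+
"""

rows = parse_table(sample)
speedups = [float(r["inductor"]) for r in rows]
slower = [r["name"] for r in rows if float(r["inductor"]) < 1.0]

print(f"geomean speedup over {len(speedups)} models: {geomean(speedups):.3f}")
print("slower than native PyTorch:", slower)
```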

Accuracy

+-----------------------------------------+----+------------------+
|                  name                   | bs |     inductor     |
+-----------------------------------------+----+------------------+
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip |
|                CamemBert                | 1  |       pass       |
|       AlbertForQuestionAnswering        | 1  |       pass       |
|           DebertaForMaskedLM            | 1  |       pass       |
|          AllenaiLongformerBase          | 1  |       pass       |
|             BartForCausalLM             | 1  |       pass       |
|      BartForConditionalGeneration       | 1  |       pass       |
|             BertForMaskedLM             | 1  |       pass       |
|        BertForQuestionAnswering         | 1  |       pass       |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |
|       DebertaForQuestionAnswering       | 1  |       pass       |
|           LayoutLMForMaskedLM           | 1  |       pass       |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |
|          DistilBertForMaskedLM          | 1  |       pass       |
|     DistilBertForQuestionAnswering      | 1  |       pass       |
|               DistillGPT2               | 1  |       pass       |
|           ElectraForCausalLM            | 1  |       pass       |
|       ElectraForQuestionAnswering       | 1  |       pass       |
|      GPT2ForSequenceClassification      | 1  |       pass       |
|               GoogleFnet                | 1  |       pass       |
|    LayoutLMForSequenceClassification    | 1  |       pass       |
|            MBartForCausalLM             | 1  |       pass       |
|            XLNetLMHeadModel             | 1  |       pass       |
|             XGLMForCausalLM             | 1  |       pass       |
|            AlbertForMaskedLM            | 1  |       pass       |
|      MBartForConditionalGeneration      | 1  |       pass       |
|       MT5ForConditionalGeneration       | 1  |       pass       |
|         MegatronBertForCausalLM         | 1  |       pass       |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |
|          MobileBertForMaskedLM          | 1  |       pass       |
|     MobileBertForQuestionAnswering      | 1  |       pass       |
|             OPTForCausalLM              | 1  |       pass       |
|            PLBartForCausalLM            | 1  |       pass       |
|     PLBartForConditionalGeneration      | 1  |       pass       |
|           PegasusForCausalLM            | 1  |       pass       |
|     M2M100ForConditionalGeneration      | 1  |       pass       |
|     PegasusForConditionalGeneration     | 1  |       pass       |
|           RobertaForCausalLM            | 1  |       pass       |
|       RobertaForQuestionAnswering       | 1  |       pass       |
|         Speech2Text2ForCausalLM         | 1  |       pass       |
|       T5ForConditionalGeneration        | 1  |       pass       |
|                 T5Small                 | 1  |       pass       |
|            TrOCRForCausalLM             | 1  |       pass       |
|            YituTechConvBert             | 1  |       pass       |
+-----------------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------------+----+-----------+
|                  name                   | bs | inductor  |
+-----------------------------------------+----+-----------+
|          AllenaiLongformerBase          | 1  | 58.802637 |
|          MobileBertForMaskedLM          | 1  | 43.755958 |
|     MobileBertForQuestionAnswering      | 1  | 42.438867 |
|     PegasusForConditionalGeneration     | 1  | 40.675432 |
|     M2M100ForConditionalGeneration      | 1  | 39.965952 |
|      MBartForConditionalGeneration      | 1  | 38.907718 |
|                 T5Small                 | 1  | 38.175352 |
|       T5ForConditionalGeneration        | 1  | 38.163163 |
|          BlenderbotForCausalLM          | 1  | 37.164912 |
|       MT5ForConditionalGeneration       | 1  | 35.870055 |
|            XLNetLMHeadModel             | 1  | 34.593334 |
|             XGLMForCausalLM             | 1  | 33.758503 |
|          DebertaV2ForMaskedLM           | 1  |  31.3084  |
| BlenderbotSmallForConditionalGeneration | 1  | 30.835882 |
|      DebertaV2ForQuestionAnswering      | 1  | 30.086156 |
|            YituTechConvBert             | 1  | 29.165253 |
|      BartForConditionalGeneration       | 1  | 28.582931 |
|         MegatronBertForCausalLM         | 1  | 28.532529 |
|     PLBartForConditionalGeneration      | 1  | 27.780527 |
|    MegatronBertForQuestionAnswering     | 1  | 27.156817 |
|             OPTForCausalLM              | 1  | 25.071791 |
|           PegasusForCausalLM            | 1  | 24.90574  |
|            MBartForCausalLM             | 1  | 24.21002  |
|            TrOCRForCausalLM             | 1  | 22.384093 |
|           DebertaForMaskedLM            | 1  | 22.287506 |
|       DebertaForQuestionAnswering       | 1  | 21.039706 |
|           ElectraForCausalLM            | 1  | 20.135675 |
|           RobertaForCausalLM            | 1  | 20.115043 |
|                CamemBert                | 1  | 20.097238 |
|          DistilBertForMaskedLM          | 1  | 19.467537 |
|       BlenderbotSmallForCausalLM        | 1  | 19.452367 |
|      GPT2ForSequenceClassification      | 1  | 19.053303 |
|           LayoutLMForMaskedLM           | 1  | 19.020972 |
|             BertForMaskedLM             | 1  | 18.924349 |
|         Speech2Text2ForCausalLM         | 1  | 18.890109 |
|       RobertaForQuestionAnswering       | 1  | 18.860147 |
|       ElectraForQuestionAnswering       | 1  | 18.837604 |
|            PLBartForCausalLM            | 1  | 18.626161 |
|    LayoutLMForSequenceClassification    | 1  | 18.604897 |
|             BartForCausalLM             | 1  | 18.300404 |
|     DistilBertForQuestionAnswering      | 1  | 18.238158 |
|        BertForQuestionAnswering         | 1  | 17.630552 |
|               GoogleFnet                | 1  | 17.363914 |
|               DistillGPT2               | 1  | 17.313914 |
|            AlbertForMaskedLM            | 1  | 13.568794 |
|       AlbertForQuestionAnswering        | 1  | 12.208026 |
+-----------------------------------------+----+-----------+

Peak Memory Compression Ratio

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|             OPTForCausalLM              | 1  | 0.986133 |
|      MBartForConditionalGeneration      | 1  | 0.976308 |
|      GPT2ForSequenceClassification      | 1  | 0.954524 |
|          AllenaiLongformerBase          | 1  | 0.950394 |
|            MBartForCausalLM             | 1  | 0.92593  |
|            XLNetLMHeadModel             | 1  | 0.910362 |
|     PLBartForConditionalGeneration      | 1  | 0.909066 |
|       T5ForConditionalGeneration        | 1  | 0.905335 |
|                 T5Small                 | 1  | 0.904663 |
|            PLBartForCausalLM            | 1  | 0.903412 |
|       DebertaForQuestionAnswering       | 1  | 0.874329 |
|       RobertaForQuestionAnswering       | 1  | 0.867718 |
|      BartForConditionalGeneration       | 1  | 0.865494 |
|               GoogleFnet                | 1  | 0.85376  |
|        BertForQuestionAnswering         | 1  | 0.850243 |
|    LayoutLMForSequenceClassification    | 1  | 0.847961 |
|    MegatronBertForQuestionAnswering     | 1  | 0.840243 |
|       ElectraForQuestionAnswering       | 1  | 0.839369 |
|      DebertaV2ForQuestionAnswering      | 1  | 0.836147 |
|               DistillGPT2               | 1  | 0.830076 |
|           DebertaForMaskedLM            | 1  | 0.819304 |
|         MegatronBertForCausalLM         | 1  | 0.818813 |
|           LayoutLMForMaskedLM           | 1  | 0.818159 |
|           RobertaForCausalLM            | 1  | 0.81395  |
|         Speech2Text2ForCausalLM         | 1  | 0.813869 |
|                CamemBert                | 1  | 0.812779 |
|             BertForMaskedLM             | 1  | 0.812679 |
|           ElectraForCausalLM            | 1  | 0.802266 |
|     DistilBertForQuestionAnswering      | 1  | 0.800392 |
|          BlenderbotForCausalLM          | 1  | 0.798094 |
|          DebertaV2ForMaskedLM           | 1  | 0.797549 |
|             BartForCausalLM             | 1  | 0.796402 |
|       MT5ForConditionalGeneration       | 1  | 0.787791 |
|            TrOCRForCausalLM             | 1  | 0.786053 |
|       BlenderbotSmallForCausalLM        | 1  |  0.7636  |
|            YituTechConvBert             | 1  | 0.758668 |
|           PegasusForCausalLM            | 1  | 0.750478 |
|          DistilBertForMaskedLM          | 1  | 0.746479 |
| BlenderbotSmallForConditionalGeneration | 1  | 0.738668 |
|     MobileBertForQuestionAnswering      | 1  | 0.732389 |
|     PegasusForConditionalGeneration     | 1  | 0.716035 |
|     M2M100ForConditionalGeneration      | 1  | 0.713288 |
|          MobileBertForMaskedLM          | 1  | 0.70325  |
|             XGLMForCausalLM             | 1  | 0.698116 |
|            AlbertForMaskedLM            | 1  | 0.448554 |
|       AlbertForQuestionAnswering        | 1  | 0.443514 |
+-----------------------------------------+----+----------+

Absolute latency (ms)

+-----------------------------------------+----+--------------+
|                  name                   | bs |   inductor   |
+-----------------------------------------+----+--------------+
|            AlbertForMaskedLM            | 1  | 12765.444574 |
|       AlbertForQuestionAnswering        | 1  | 12716.728526 |
|      MBartForConditionalGeneration      | 1  | 6130.292939  |
|      BartForConditionalGeneration       | 1  |  5671.96695  |
|             OPTForCausalLM              | 1  | 5209.813566  |
|          DebertaV2ForMaskedLM           | 1  | 5048.851646  |
|      DebertaV2ForQuestionAnswering      | 1  | 3974.351192  |
|            XLNetLMHeadModel             | 1  | 3102.710129  |
|            MBartForCausalLM             | 1  |  3037.76779  |
|          BlenderbotForCausalLM          | 1  | 2633.938067  |
|             BartForCausalLM             | 1  | 2547.170094  |
|                 T5Small                 | 1  | 2489.237204  |
|       T5ForConditionalGeneration        | 1  | 2485.016223  |
|          AllenaiLongformerBase          | 1  |  2399.68067  |
|     PLBartForConditionalGeneration      | 1  | 2201.391422  |
|         MegatronBertForCausalLM         | 1  | 2045.617937  |
|    MegatronBertForQuestionAnswering     | 1  | 1865.668946  |
|      GPT2ForSequenceClassification      | 1  | 1313.756117  |
|            PLBartForCausalLM            | 1  | 1215.089438  |
|             XGLMForCausalLM             | 1  |  834.384589  |
|           DebertaForMaskedLM            | 1  |  784.661003  |
|           RobertaForCausalLM            | 1  |  776.913348  |
|     M2M100ForConditionalGeneration      | 1  |  716.292943  |
|                CamemBert                | 1  |  693.627362  |
|           LayoutLMForMaskedLM           | 1  |  687.722553  |
|             BertForMaskedLM             | 1  |  686.537789  |
|            YituTechConvBert             | 1  |  675.762769  |
|     PegasusForConditionalGeneration     | 1  |  607.681874  |
|            TrOCRForCausalLM             | 1  |  586.838298  |
|       DebertaForQuestionAnswering       | 1  |  562.235716  |
|    LayoutLMForSequenceClassification    | 1  |  545.194539  |
|        BertForQuestionAnswering         | 1  |  544.439221  |
|       RobertaForQuestionAnswering       | 1  |  543.620865  |
|               DistillGPT2               | 1  |  502.07095   |
|               GoogleFnet                | 1  |  473.397664  |
|       MT5ForConditionalGeneration       | 1  |  301.956761  |
|           PegasusForCausalLM            | 1  |  300.538101  |
| BlenderbotSmallForConditionalGeneration | 1  |  146.778156  |
|           ElectraForCausalLM            | 1  |  135.719248  |
|          DistilBertForMaskedLM          | 1  |  100.132543  |
|       ElectraForQuestionAnswering       | 1  |  94.858488   |
|       BlenderbotSmallForCausalLM        | 1  |   85.24072   |
|          MobileBertForMaskedLM          | 1  |  67.105421   |
|     DistilBertForQuestionAnswering      | 1  |  64.398573   |
|     MobileBertForQuestionAnswering      | 1  |  40.779799   |
|         Speech2Text2ForCausalLM         | 1  |  18.997412   |
+-----------------------------------------+----+--------------+

timm_models suite with float32 precision

Performance speedup

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          pnasnet5large          | 1  | 2.385521 |
|          inception_v3           | 1  | 2.226519 |
|           dm_nfnet_f0           | 1  | 2.203482 |
|       gluon_inception_v3        | 1  | 2.199309 |
|        adv_inception_v3         | 1  | 2.177662 |
|            nfnet_l0             | 1  | 2.168966 |
|            repvgg_a2            | 1  | 1.963587 |
|         mobilenetv2_100         | 1  | 1.95605  |
|            hrnet_w18            | 1  | 1.924924 |
|          spnasnet_100           | 1  | 1.867625 |
|           mnasnet_100           | 1  | 1.834467 |
|           fbnetc_100            | 1  | 1.829509 |
|            levit_128            | 1  | 1.788155 |
|            lcnet_050            | 1  | 1.780132 |
|          ghostnet_100           | 1  | 1.777557 |
|      mobilenetv3_large_100      | 1  | 1.743277 |
|           selecsls42b           | 1  | 1.717273 |
|           regnety_002           | 1  | 1.678019 |
|             dla102              | 1  | 1.675982 |
|        ese_vovnet19b_dw         | 1  | 1.658281 |
|          botnet26t_256          | 1  | 1.65237  |
|            fbnetv3_b            | 1  | 1.640967 |
|       tf_efficientnet_b0        | 1  | 1.625381 |
|       eca_botnext26ts_256       | 1  | 1.623314 |
|           rexnet_100            | 1  | 1.621754 |
|           resnest101e           | 1  | 1.601247 |
|          cspdarknet53           | 1  | 1.590638 |
|        eca_halonext26ts         | 1  | 1.518701 |
|         poolformer_m36          | 1  | 1.515128 |
|           res2next50            | 1  | 1.508229 |
|            tinynet_a            | 1  | 1.498551 |
|           volo_d1_224           | 1  | 1.482613 |
|        res2net50_14w_8s         | 1  | 1.473571 |
|        res2net101_26w_4s        | 1  | 1.455236 |
|           mobilevit_s           | 1  | 1.429657 |
|         visformer_small         | 1  | 1.390924 |
|           convit_base           | 1  | 1.381186 |
|          gmixer_24_224          | 1  | 1.357202 |
|     swsl_resnext101_32x16d      | 1  | 1.352234 |
|           tf_mixnet_l           | 1  | 1.327955 |
|        twins_pcpvt_base         | 1  | 1.307104 |
|            gernet_l             | 1  | 1.295627 |
|      beit_base_patch16_224      | 1  | 1.281236 |
|  swin_base_patch4_window7_224   | 1  | 1.27311  |
|          resmlp_12_224          | 1  | 1.261243 |
|        convmixer_768_32         | 1  | 1.244349 |
|          mixer_b16_224          | 1  | 1.205004 |
|      vit_base_patch16_224       | 1  | 1.201321 |
| deit_base_distilled_patch16_224 | 1  | 1.19545  |
|      xcit_large_24_p8_224       | 1  | 1.189618 |
|             dpn107              | 1  | 1.186483 |
|            mixnet_l             | 1  | 1.179142 |
|          jx_nest_base           | 1  | 1.166278 |
|          gmlp_s16_224           | 1  | 1.162118 |
|            pit_b_224            | 1  | 1.161372 |
|        tnt_s_patch16_224        | 1  | 1.149046 |
|          convnext_base          | 1  | 1.140961 |
|         crossvit_9_240          | 1  | 1.138592 |
|        sebotnet33ts_256         | 1  | 1.098782 |
|          cait_m36_384           | 1  | 0.970062 |
+---------------------------------+----+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 1  |     pass      |
|           dm_nfnet_f0           | 1  |     pass      |
|            mixnet_l             | 1  |     pass      |
|       eca_botnext26ts_256       | 1  |     pass      |
|         crossvit_9_240          | 1  |     pass      |
| deit_base_distilled_patch16_224 | 1  |     pass      |
|          convnext_base          | 1  |     pass      |
|           regnety_002           | 1  |     pass      |
|           convit_base           | 1  |     pass      |
|          gmlp_s16_224           | 1  |     pass      |
|          cait_m36_384           | 1  |     pass      |
|             dpn107              | 1  |     pass      |
|      vit_base_patch16_224       | 1  |     pass      |
|          cspdarknet53           | 1  |     pass      |
|        eca_halonext26ts         | 1  |     pass      |
|        ese_vovnet19b_dw         | 1  |     pass      |
|           fbnetc_100            | 1  |     pass      |
|            fbnetv3_b            | 1  |     pass      |
|            gernet_l             | 1  |     pass      |
|          botnet26t_256          | 1  |     pass      |
|       gluon_inception_v3        | 1  |     pass      |
|          gmixer_24_224          | 1  |     pass      |
|          jx_nest_base           | 1  |     pass      |
|        convmixer_768_32         | 1  |     pass      |
|        twins_pcpvt_base         | 1  |     pass      |
|         poolformer_m36          | 1  |     pass      |
|            lcnet_050            | 1  |     pass      |
|          mixer_b16_224          | 1  |     pass      |
|        tnt_s_patch16_224        | 1  |     pass      |
|           mnasnet_100           | 1  |     pass      |
|         mobilenetv2_100         | 1  |     pass      |
|      mobilenetv3_large_100      | 1  |     pass      |
|           mobilevit_s           | 1  |     pass      |
|            nfnet_l0             | 1  |     pass      |
|            pit_b_224            | 1  |     pass      |
|      beit_base_patch16_224      | 1  |     pass      |
|            repvgg_a2            | 1  |     pass      |
|  swin_base_patch4_window7_224   | 1  |     pass      |
|          inception_v3           | 1  |     pass      |
|           tf_mixnet_l           | 1  |     pass      |
|       tf_efficientnet_b0        | 1  |     pass      |
|            tinynet_a            | 1  |     pass      |
|          spnasnet_100           | 1  |     pass      |
|        sebotnet33ts_256         | 1  |     pass      |
|          resmlp_12_224          | 1  |     pass      |
|         coat_lite_mini          | 1  |  fail_to_run  |
|           resnest101e           | 1  | fail_accuracy |
|            levit_128            | 1  | fail_accuracy |
|          ghostnet_100           | 1  | fail_accuracy |
|          pnasnet5large          | 1  | fail_accuracy |
|        res2net101_26w_4s        | 1  | fail_accuracy |
|        res2net50_14w_8s         | 1  | fail_accuracy |
|             dla102              | 1  | fail_accuracy |
|     swsl_resnext101_32x16d      | 1  | fail_accuracy |
|           rexnet_100            | 1  | fail_accuracy |
|           selecsls42b           | 1  | fail_accuracy |
|           res2next50            | 1  | fail_accuracy |
|            hrnet_w18            | 1  | fail_accuracy |
|           volo_d1_224           | 1  | fail_accuracy |
|         visformer_small         | 1  | fail_accuracy |
|      xcit_large_24_p8_224       | 1  | fail_accuracy |
+---------------------------------+----+---------------+

Compilation latency (sec)

+---------------------------------+----+-----------+
|              name               | bs | inductor  |
+---------------------------------+----+-----------+
|          pnasnet5large          | 1  | 81.113034 |
|  swin_base_patch4_window7_224   | 1  | 80.220719 |
|           tf_mixnet_l           | 1  | 68.354396 |
|             dpn107              | 1  | 62.600262 |
|        twins_pcpvt_base         | 1  | 60.972683 |
|           mobilevit_s           | 1  | 58.253894 |
|          jx_nest_base           | 1  | 58.128269 |
|        res2net50_14w_8s         | 1  | 56.83583  |
|           rexnet_100            | 1  | 55.400246 |
|      xcit_large_24_p8_224       | 1  | 54.246758 |
|          cait_m36_384           | 1  |  53.661   |
|          ghostnet_100           | 1  | 52.828538 |
|            mixnet_l             | 1  | 50.740936 |
|         poolformer_m36          | 1  | 50.637302 |
|        sebotnet33ts_256         | 1  | 50.53088  |
|            levit_128            | 1  | 48.908191 |
|           dm_nfnet_f0           | 1  | 48.58063  |
|        eca_halonext26ts         | 1  | 48.415216 |
|         crossvit_9_240          | 1  | 48.152973 |
|        tnt_s_patch16_224        | 1  | 46.888117 |
|           volo_d1_224           | 1  | 43.761765 |
|       eca_botnext26ts_256       | 1  | 41.056724 |
|            hrnet_w18            | 1  | 40.160717 |
|        res2net101_26w_4s        | 1  | 39.712259 |
|       tf_efficientnet_b0        | 1  | 39.369498 |
|            nfnet_l0             | 1  | 37.73966  |
|          convnext_base          | 1  | 37.604934 |
|           resnest101e           | 1  | 37.371878 |
|        adv_inception_v3         | 1  | 35.788736 |
|          inception_v3           | 1  | 35.772918 |
|       gluon_inception_v3        | 1  | 35.755747 |
|            tinynet_a            | 1  | 34.418168 |
|           res2next50            | 1  | 34.223954 |
|            pit_b_224            | 1  | 33.915061 |
|           convit_base           | 1  | 31.016967 |
|          botnet26t_256          | 1  | 30.114502 |
|          cspdarknet53           | 1  | 27.469586 |
|            fbnetv3_b            | 1  | 26.781278 |
|             dla102              | 1  | 26.659506 |
|          gmlp_s16_224           | 1  | 26.274447 |
|          gmixer_24_224          | 1  | 25.075961 |
|         visformer_small         | 1  | 24.482877 |
|        ese_vovnet19b_dw         | 1  | 23.435643 |
|      mobilenetv3_large_100      | 1  | 23.217631 |
|      vit_base_patch16_224       | 1  | 21.522591 |
| deit_base_distilled_patch16_224 | 1  | 20.481216 |
|      beit_base_patch16_224      | 1  | 20.169978 |
|          mixer_b16_224          | 1  | 19.259522 |
|           regnety_002           | 1  | 18.852336 |
|            repvgg_a2            | 1  | 17.133852 |
|          resmlp_12_224          | 1  | 17.079403 |
|        convmixer_768_32         | 1  | 16.983372 |
|           selecsls42b           | 1  | 16.497826 |
|            lcnet_050            | 1  | 14.064905 |
|     swsl_resnext101_32x16d      | 1  | 12.556342 |
|          spnasnet_100           | 1  | 10.80031  |
|           fbnetc_100            | 1  | 10.742984 |
|            gernet_l             | 1  | 10.73914  |
|         mobilenetv2_100         | 1  | 10.446485 |
|           mnasnet_100           | 1  | 10.26034  |
+---------------------------------+----+-----------+

Peak Memory Compression Ratio

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          cait_m36_384           | 1  |  0.9543  |
|          pnasnet5large          | 1  | 0.928854 |
|        convmixer_768_32         | 1  | 0.913837 |
|            nfnet_l0             | 1  | 0.895249 |
|      xcit_large_24_p8_224       | 1  | 0.890686 |
|        ese_vovnet19b_dw         | 1  | 0.889906 |
|            fbnetv3_b            | 1  | 0.880707 |
|         mobilenetv2_100         | 1  | 0.876609 |
|           mnasnet_100           | 1  | 0.876483 |
|          spnasnet_100           | 1  | 0.876251 |
|       tf_efficientnet_b0        | 1  | 0.872411 |
|      mobilenetv3_large_100      | 1  | 0.868182 |
|           fbnetc_100            | 1  | 0.863447 |
|       eca_botnext26ts_256       | 1  | 0.862139 |
|            lcnet_050            | 1  | 0.859533 |
|            tinynet_a            | 1  | 0.859515 |
|           rexnet_100            | 1  | 0.858699 |
|         poolformer_m36          | 1  |  0.8553  |
|           dm_nfnet_f0           | 1  | 0.855119 |
|        eca_halonext26ts         | 1  | 0.854788 |
|           mobilevit_s           | 1  | 0.852488 |
|           tf_mixnet_l           | 1  | 0.844947 |
|          ghostnet_100           | 1  | 0.844825 |
|           regnety_002           | 1  | 0.842002 |
|          botnet26t_256          | 1  | 0.841008 |
|            mixnet_l             | 1  | 0.830525 |
|          resmlp_12_224          | 1  | 0.827824 |
|         visformer_small         | 1  | 0.817684 |
|           res2next50            | 1  | 0.816763 |
|            levit_128            | 1  | 0.808432 |
|          convnext_base          | 1  | 0.801774 |
|             dpn107              | 1  | 0.801567 |
|        sebotnet33ts_256         | 1  | 0.79905  |
|        res2net50_14w_8s         | 1  | 0.796034 |
|          gmlp_s16_224           | 1  | 0.795865 |
|          cspdarknet53           | 1  | 0.793767 |
|          gmixer_24_224          | 1  | 0.790959 |
|            hrnet_w18            | 1  | 0.787176 |
|        tnt_s_patch16_224        | 1  | 0.785795 |
|           volo_d1_224           | 1  | 0.784675 |
|         crossvit_9_240          | 1  | 0.780973 |
|           convit_base           | 1  | 0.780521 |
|        twins_pcpvt_base         | 1  | 0.780302 |
|          mixer_b16_224          | 1  | 0.776296 |
|             dla102              | 1  | 0.775883 |
|           resnest101e           | 1  | 0.772285 |
|      beit_base_patch16_224      | 1  | 0.768307 |
|          jx_nest_base           | 1  | 0.767829 |
|       gluon_inception_v3        | 1  | 0.762367 |
|          inception_v3           | 1  | 0.762319 |
|        adv_inception_v3         | 1  | 0.761916 |
| deit_base_distilled_patch16_224 | 1  | 0.760334 |
|      vit_base_patch16_224       | 1  | 0.759488 |
|            pit_b_224            | 1  | 0.75053  |
|           selecsls42b           | 1  | 0.740214 |
|        res2net101_26w_4s        | 1  | 0.739806 |
|  swin_base_patch4_window7_224   | 1  | 0.739185 |
|            gernet_l             | 1  | 0.737093 |
|            repvgg_a2            | 1  | 0.689544 |
|     swsl_resnext101_32x16d      | 1  | 0.640428 |
+---------------------------------+----+----------+

Absolute latency (ms)

+---------------------------------+----+-------------+
|              name               | bs |  inductor   |
+---------------------------------+----+-------------+
|          cait_m36_384           | 1  | 3671.417852 |
|      xcit_large_24_p8_224       | 1  | 1539.864207 |
|     swsl_resnext101_32x16d      | 1  | 439.681284  |
|          pnasnet5large          | 1  | 370.432622  |
|          convnext_base          | 1  | 307.937352  |
|             dpn107              | 1  | 264.304945  |
|        convmixer_768_32         | 1  | 246.804201  |
|          jx_nest_base           | 1  | 235.907328  |
|      beit_base_patch16_224      | 1  | 198.777359  |
| deit_base_distilled_patch16_224 | 1  | 198.531638  |
|  swin_base_patch4_window7_224   | 1  | 197.642732  |
|      vit_base_patch16_224       | 1  | 196.156981  |
|           convit_base           | 1  | 195.681577  |
|            pit_b_224            | 1  | 169.144879  |
|           resnest101e           | 1  | 167.003076  |
|           dm_nfnet_f0           | 1  | 159.526379  |
|          mixer_b16_224          | 1  | 141.028768  |
|         poolformer_m36          | 1  |  139.22791  |
|        res2net101_26w_4s        | 1  | 114.237085  |
|        twins_pcpvt_base         | 1  | 108.326887  |
|        tnt_s_patch16_224        | 1  |  95.91155   |
|           volo_d1_224           | 1  |  95.569581  |
|            nfnet_l0             | 1  |  95.282704  |
|             dla102              | 1  |  91.684021  |
|            hrnet_w18            | 1  |  86.304687  |
|        sebotnet33ts_256         | 1  |  85.502619  |
|          cspdarknet53           | 1  |  83.604989  |
|        adv_inception_v3         | 1  |  73.91748   |
|          inception_v3           | 1  |  73.788095  |
|       gluon_inception_v3        | 1  |  73.754775  |
|          gmlp_s16_224           | 1  |  69.332361  |
|         visformer_small         | 1  |  67.380246  |
|        res2net50_14w_8s         | 1  |  65.749644  |
|          gmixer_24_224          | 1  |  63.709811  |
|            repvgg_a2            | 1  |  63.525285  |
|           res2next50            | 1  |  60.093903  |
|            gernet_l             | 1  |  57.004061  |
|           selecsls42b           | 1  |  46.180579  |
|          botnet26t_256          | 1  |  45.014274  |
|        eca_halonext26ts         | 1  |  44.563729  |
|           mobilevit_s           | 1  |  42.364177  |
|       eca_botnext26ts_256       | 1  |  40.647688  |
|          resmlp_12_224          | 1  |  36.049733  |
|         crossvit_9_240          | 1  |  34.140978  |
|            mixnet_l             | 1  |  31.832973  |
|        ese_vovnet19b_dw         | 1  |  31.610442  |
|           tf_mixnet_l           | 1  |  30.88183   |
|            fbnetv3_b            | 1  |  16.026336  |
|       tf_efficientnet_b0        | 1  |  14.118791  |
|           rexnet_100            | 1  |  13.897849  |
|            tinynet_a            | 1  |  12.578741  |
|           fbnetc_100            | 1  |  9.519045   |
|            levit_128            | 1  |  9.368169   |
|          ghostnet_100           | 1  |  9.232378   |
|          spnasnet_100           | 1  |  8.556154   |
|           mnasnet_100           | 1  |  7.855862   |
|         mobilenetv2_100         | 1  |  7.659091   |
|      mobilenetv3_large_100      | 1  |  7.473447   |
|           regnety_002           | 1  |  6.599016   |
|            lcnet_050            | 1  |  2.558697   |
+---------------------------------+----+-------------+

@zxd1997066
Contributor

[default] Performance Dashboard for float32 precision -- Single-Socket Multi-threads (2024-04-24 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward and backward passes for training, and one iteration of the forward pass only for inference. For accuracy, we check the numerical correctness of the forward pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency numbers and the peak memory footprint reduction ratio.
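The speedup normalization described above can be sketched as follows. This is a minimal illustration, not the benchmark harness itself: the function names are hypothetical, and it assumes speedup is the native (eager) latency divided by the inductor latency, so values above 1.0 mean inductor is faster.

```python
def speedup(eager_latency_ms: float, compiled_latency_ms: float) -> float:
    """Speedup of the compiled model relative to native (eager) PyTorch.

    Assumption: speedup = eager latency / compiled latency.
    """
    if compiled_latency_ms <= 0:
        raise ValueError("compiled latency must be positive")
    return eager_latency_ms / compiled_latency_ms


def implied_eager_latency_ms(speedup_ratio: float, compiled_latency_ms: float) -> float:
    """Invert the definition above to recover the eager latency from two table columns."""
    return speedup_ratio * compiled_latency_ms


# lcnet_050 from the timm_models tables of the preceding report:
# 1.780132x speedup and 2.558697 ms absolute latency with inductor.
eager_ms = implied_eager_latency_ms(1.780132, 2.558697)
print(f"implied eager latency: {eager_ms:.3f} ms")
```

Under this assumption, the "Performance speedup" and "Absolute latency (ms)" tables together are enough to estimate the eager latency that the dashboard does not list directly.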

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. Experimental setup does not have optimizer.

SW information:

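+-------------------+--------+------------------------------------------+
|        SW         | Branch |                  Commit                  |
+-------------------+--------+------------------------------------------+
|      Pytorch      |  main  | b91f83f18139a6ed2626c30979a38e533d4c7d7c |
|    Torchbench     |  main  |                 d6015d42                 |
|    torchaudio     |  main  |             2.2.0a0+ea437b3              |
|     torchtext     |  main  |             0.16.0a0+b0ebddc             |
|    torchvision    |  main  |             0.19.0a0+2c4665f             |
|     torchdata     |  main  |             0.7.1a0+0790338              |
| dynamo_benchmarks |  main  |                 nightly                  |
+-------------------+--------+------------------------------------------+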

HW information

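+------------------+----------------------------------------------------------------+
|       Item       |                             Value                              |
+------------------+----------------------------------------------------------------+
|   Manufacturer   |                           Amazon EC2                           |
|   Product Name   |                          c6i.16xlarge                          |
|    CPU Model     |         Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz          |
| Installed Memory |            128GB (1x128GB DDR4 3200 MT/s [Unknown])            |
|        OS        |                       Ubuntu 22.04.2 LTS                       |
|      Kernel      |                        5.19.0-1022-aws                         |
|    Microcode     |                           0xd000389                            |
|       GCC        |           gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0            |
|      GLIBC       |            ldd (Ubuntu GLIBC 2.35-0ubuntu3.1) 2.35             |
|     Binutils     |             GNU ld (GNU Binutils for Ubuntu) 2.38              |
|      Python      |                         Python 3.10.6                          |
|     OpenSSL      | OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022) |
+------------------+----------------------------------------------------------------+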

Test command

export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
CORES=$(lscpu | grep Core | awk '{print $4}')
export OMP_NUM_THREADS=$CORES

python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--node_id 0" --devices=cpu --dtypes=float32 --inference --compilers=inductor --extra-args="--timeout 9000" 

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
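The filtering step described above can be sketched like this. It is an illustration only: the helper names are hypothetical, and treating every status that starts with "pass" (including `pass_due_to_skip`) as passing is an assumption, not something the report states.

```python
def passing_models(accuracy: dict[str, str]) -> set[str]:
    """Names of models whose accuracy status counts as a pass.

    Assumption: any status beginning with "pass" is accepted.
    """
    return {name for name, status in accuracy.items() if status.startswith("pass")}


def keep_passing(metric: dict[str, float], accuracy: dict[str, str]) -> dict[str, float]:
    """Drop metric entries for models that failed the accuracy check."""
    ok = passing_models(accuracy)
    return {name: value for name, value in metric.items() if name in ok}


# A few rows copied from the torchbench tables in this report.
accuracy = {
    "basic_gnn_sage": "pass",
    "hf_T5_large": "pass_due_to_skip",
    "timm_efficientdet": "model_fail_to_load",
}
speedups = {
    "basic_gnn_sage": 1.173334,
    "hf_T5_large": 1.178971,
    "timm_efficientdet": 0.0,
}

filtered = keep_passing(speedups, accuracy)
print(sorted(filtered))
```

Filtering first matters for the aggregate numbers: a failed model is recorded with a 0.0 speedup, which would otherwise distort any mean taken over the suite.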

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 85%, 67/79 | 100%, 46/46 | 75%, 45/60  |
+----------+------------+-------------+-------------+
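The pass rate entries in the table above pair a percentage with a passed/total count. A small sketch of that formatting, assuming the percentage is rounded to the nearest integer (which matches 67/79 being shown as 85%); the function name is hypothetical:

```python
def pass_rate(passed: int, total: int) -> str:
    """Format a pass rate in the same 'NN%, passed/total' style as the table."""
    if total <= 0:
        raise ValueError("total must be positive")
    percent = round(100 * passed / total)
    return f"{percent}%, {passed}/{total}"


# Counts taken from the Passrate table.
for suite, passed, total in [
    ("torchbench", 67, 79),
    ("huggingface", 46, 46),
    ("timm_models", 45, 60),
]:
    print(suite, pass_rate(passed, total))
```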

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.39x    |    1.34x    |    1.87x    |
+----------+------------+-------------+-------------+
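For reference, a geometric mean over per-model speedups can be computed as in the sketch below. The function name is hypothetical, and the sample values are only a handful of rows from the torchbench speedup table, so the result is not expected to match the suite-level 1.39x figure.

```python
import math
from typing import Iterable


def geomean(values: Iterable[float]) -> float:
    """Geometric mean, computed in log space to avoid overflow on long lists."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        # Failed models are recorded as 0.0 and must be filtered out beforehand.
        raise ValueError("all values must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# squeezenet1_1, resnet50, hf_Bert, hf_T5_base from the torchbench table.
sample = [3.091996, 2.163858, 1.421058, 0.826266]
print(f"geomean of sample: {geomean(sample):.2f}x")
```

A geometric mean is the usual choice for ratios such as speedups because a 2x gain and a 0.5x regression cancel out to 1.0, which an arithmetic mean would not show.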

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   26.67    |    28.50    |    40.04    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.89x    |    0.98x    |    0.99x    |
+----------+------------+-------------+-------------+

torchbench suite with float32 precision

Performance speedup

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_equation_of_state     | 1048576 | 11.381142 |
|          squeezenet1_1          |   16    | 3.091996  |
|       mobilenet_v3_large        |   32    | 2.961521  |
|          mobilenet_v2           |   16    | 2.918733  |
|           mnasnet1_0            |   32    | 2.822703  |
|        timm_efficientnet        |   64    | 2.806778  |
|       shufflenet_v2_x1_0        |   64    | 2.575939  |
|          timm_resnest           |   32    | 2.278035  |
|            resnet50             |   32    | 2.163858  |
|        phlippe_densenet         |   128   | 2.084609  |
|        soft_actor_critic        |   256   | 1.976317  |
|           densenet121           |   64    | 1.955215  |
|         resnext50_32x4d         |    8    | 1.935512  |
|            resnet152            |   32    | 1.923685  |
|         phlippe_resnet          |   128   | 1.905884  |
|       doctr_det_predictor       |    1    | 1.893728  |
|            resnet18             |    8    | 1.879386  |
|           timm_regnet           |   32    | 1.861978  |
|             hf_GPT2             |    1    | 1.788894  |
|           timm_nfnet            |   128   | 1.759071  |
|           timm_vovnet           |   32    | 1.655478  |
|             alexnet             |   128   | 1.592969  |
|          maml_omniglot          |    5    | 1.563123  |
|          BERT_pytorch           |    2    | 1.556142  |
|             yolov3              |    8    | 1.544169  |
|      doctr_reco_predictor       |    1    | 1.523731  |
|            moondream            |    1    | 1.494357  |
|            hf_Albert            |    1    | 1.483931  |
|          hf_Bert_large          |    1    | 1.483098  |
|          hf_GPT2_large          |    1    | 1.467148  |
|          fastNLP_Bert           |    1    | 1.446542  |
|              vgg16              |    4    | 1.442623  |
|             hf_Bert             |    1    | 1.421058  |
|        basic_gnn_edgecnn        |    1    | 1.420214  |
|          hf_Longformer          |    1    | 1.411049  |
|              dcgan              |   256   | 1.401521  |
|     functorch_maml_omniglot     |    1    | 1.394749  |
|         LearningToPaint         |   96    | 1.394521  |
|              llama              |   32    | 1.370844  |
|          basic_gnn_gcn          |    1    | 1.342161  |
|      torch_multimodal_clip      |   32    | 1.316136  |
|          hf_DistilBert          |    1    | 1.290771  |
|             hf_Bart             |    1    | 1.249306  |
|     timm_vision_transformer     |   32    | 1.229096  |
|          lennard_jones          |  1000   | 1.221197  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1.220575  |
|        hf_distil_whisper        |    1    | 1.207886  |
|     nvidia_deeprecommender      |   256   | 1.199156  |
|         pytorch_stargan         |   16    | 1.187037  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1.185647  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 1.181832  |
|           hf_T5_large           |    1    | 1.178971  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 1.176398  |
|          pytorch_unet           |    1    | 1.175433  |
|         basic_gnn_sage          |    1    | 1.173334  |
|           hf_BigBird            |    1    | 1.157114  |
|              dlrm               |  2048   | 1.150469  |
|    detectron2_fcos_r_50_fpn     |    1    |  1.14415  |
|          basic_gnn_gin          |    1    | 1.114836  |
|              hf_T5              |    1    | 1.111938  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1.071207  |
|  timm_vision_transformer_large  |   32    | 1.051399  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1.043917  |
|             demucs              |    1    | 1.016698  |
|     resnet50_quantized_qat      |   32    | 1.012948  |
|           hf_Reformer           |    1    | 1.001405  |
|   mobilenet_v2_quantized_qat    |   96    | 0.997701  |
|           tts_angular           |   64    |  0.9967   |
|       speech_transformer        |    1    | 0.987614  |
|               drq               |    1    |  0.9666   |
|           hf_T5_base            |    1    | 0.826266  |
|       Background_Matting        |    1    | 0.825365  |
|         opacus_cifar10          |   64    | 0.717701  |
|              maml               |    1    | 0.706132  |
|      functorch_dp_cifar10       |   64    |  0.66326  |
|     pyhpc_isoneutral_mixing     | 1048576 | 0.656843  |
|        timm_efficientdet        |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+

Accuracy

+---------------------------------+---------+--------------------+
|              name               |   bs    |      inductor      |
+---------------------------------+---------+--------------------+
|           hf_T5_large           |    4    |  pass_due_to_skip  |
|       Background_Matting        |    1    |  pass_due_to_skip  |
|  timm_vision_transformer_large  |    4    |  pass_due_to_skip  |
|              maml               |    1    |  pass_due_to_skip  |
|          hf_GPT2_large          |    4    |  pass_due_to_skip  |
|         basic_gnn_sage          |    1    |        pass        |
| detectron2_fasterrcnn_r_50_dc5  |    4    |        pass        |
|          hf_Longformer          |    4    |        pass        |
| detectron2_fasterrcnn_r_101_dc5 |    4    |        pass        |
| detectron2_fasterrcnn_r_101_c4  |    4    |        pass        |
|             demucs              |    1    |        pass        |
|        basic_gnn_edgecnn        |    1    |        pass        |
|          basic_gnn_gin          |    1    |        pass        |
|          basic_gnn_gcn          |    1    |        pass        |
|              dlrm               |    4    |        pass        |
|         LearningToPaint         |    4    |        pass        |
|      doctr_reco_predictor       |    4    |        pass        |
|           hf_Reformer           |    4    |        pass        |
|    detectron2_fcos_r_50_fpn     |    4    |        pass        |
|  detectron2_fasterrcnn_r_50_c4  |    4    |        pass        |
|       doctr_det_predictor       |    4    |        pass        |
|           hf_T5_base            |    4    |        pass        |
|               drq               |    1    |        pass        |
|            hf_Albert            |    4    |        pass        |
|             hf_GPT2             |    2    |        pass        |
|          hf_DistilBert          |    4    |        pass        |
|           hf_BigBird            |    4    |        pass        |
|          hf_Bert_large          |    4    |        pass        |
|             hf_Bert             |    4    |        pass        |
|          fastNLP_Bert           |    4    |        pass        |
|             hf_Bart             |    4    |        pass        |
|             yolov3              |    4    |        pass        |
|      functorch_dp_cifar10       |    4    |        pass        |
|     functorch_maml_omniglot     |    1    |        pass        |
|          lennard_jones          |    4    |        pass        |
|             alexnet             |    4    |        pass        |
|          pytorch_unet           |    2    |        pass        |
|  pytorch_CycleGAN_and_pix2pix   |    1    |        pass        |
|          BERT_pytorch           |    4    |        pass        |
|            moondream            |    4    |        pass        |
|     pyhpc_equation_of_state     |    4    |        pass        |
|         phlippe_resnet          |    4    |        pass        |
|        phlippe_densenet         |    4    |        pass        |
|         opacus_cifar10          |    4    |        pass        |
|     nvidia_deeprecommender      |    4    |        pass        |
|     resnet50_quantized_qat      |    4    |        pass        |
|       mobilenet_v3_large        |    4    |        pass        |
|   mobilenet_v2_quantized_qat    |    4    |        pass        |
| pyhpc_turbulent_kinetic_energy  | 1048576 |        pass        |
|              llama              |    4    |        pass        |
|        hf_distil_whisper        |    4    |        pass        |
|     pyhpc_isoneutral_mixing     |    4    |        pass        |
|          maml_omniglot          |    5    |        pass        |
|              hf_T5              |    4    |        pass        |
|           timm_regnet           |    4    |        pass        |
|              vgg16              |    4    |        pass        |
|           tts_angular           |    4    |        pass        |
|      torch_multimodal_clip      |    4    |        pass        |
|           timm_vovnet           |    4    |        pass        |
|     timm_vision_transformer     |    4    |        pass        |
|          timm_resnest           |    4    |        pass        |
|           timm_nfnet            |    4    |        pass        |
|         pytorch_stargan         |   16    |        pass        |
|        timm_efficientnet        |    4    |        pass        |
|          squeezenet1_1          |    4    |        pass        |
|       speech_transformer        |    1    |        pass        |
|        soft_actor_critic        |   256   |        pass        |
|        timm_efficientdet        |    0    | model_fail_to_load |
|         DALLE2_pytorch          |    0    | model_fail_to_load |
|              moco               |    0    | model_fail_to_load |
| detectron2_fasterrcnn_r_101_fpn |    4    |    fail_to_run     |
|         vision_maskrcnn         |    1    |    fail_to_run     |
|           Super_SloMo           |    4    |    fail_to_run     |
| detectron2_fasterrcnn_r_50_fpn  |    4    |    fail_to_run     |
|         resnext50_32x4d         |    4    |   fail_accuracy    |
|            resnet152            |    4    |   fail_accuracy    |
|       shufflenet_v2_x1_0        |    4    |   fail_accuracy    |
|           mnasnet1_0            |    4    |   fail_accuracy    |
|              dcgan              |    4    |   fail_accuracy    |
|          mobilenet_v2           |    4    |   fail_accuracy    |
|            resnet18             |    4    |   fail_accuracy    |
|            resnet50             |    4    |   fail_accuracy    |
|           densenet121           |    4    |   fail_accuracy    |
+---------------------------------+---------+--------------------+

Compilation latency (sec)

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|           densenet121           |   64    | 97.886561 |
|           hf_BigBird            |    1    | 76.088662 |
|    detectron2_fcos_r_50_fpn     |    1    | 70.076765 |
|           hf_T5_large           |    1    | 61.832829 |
|           timm_nfnet            |   128   | 52.854639 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 52.703295 |
|              maml               |    1    | 52.369783 |
|          hf_Longformer          |    1    | 51.134604 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 50.481881 |
|           hf_Reformer           |    1    | 47.213485 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 46.052316 |
|           hf_T5_base            |    1    | 44.272721 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 43.827034 |
|        phlippe_densenet         |   128   | 43.258844 |
|  timm_vision_transformer_large  |   32    | 41.487073 |
|      torch_multimodal_clip      |   32    | 39.240375 |
|     pyhpc_isoneutral_mixing     | 1048576 | 39.225797 |
|       speech_transformer        |    1    | 39.127635 |
|        timm_efficientnet        |   64    | 35.933434 |
|          hf_GPT2_large          |    1    | 35.702164 |
|             demucs              |    1    | 34.574273 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 33.231182 |
|            moondream            |    1    | 32.842333 |
|              hf_T5              |    1    | 31.039114 |
|             yolov3              |    8    | 30.740161 |
|        hf_distil_whisper        |    1    | 30.023929 |
|         opacus_cifar10          |   64    | 29.164461 |
|      functorch_dp_cifar10       |   64    | 27.842551 |
|          timm_resnest           |   32    | 27.133727 |
|       doctr_det_predictor       |    1    | 26.492857 |
|          hf_Bert_large          |    1    | 26.062661 |
|       shufflenet_v2_x1_0        |   64    | 25.16437  |
|       mobilenet_v3_large        |   32    | 24.751684 |
|       Background_Matting        |    1    | 23.796953 |
|          BERT_pytorch           |    2    | 23.755198 |
|          fastNLP_Bert           |    1    | 22.545453 |
|           timm_vovnet           |   32    | 22.51213  |
|             hf_Bart             |    1    | 22.241804 |
|              llama              |   32    | 22.109737 |
|           timm_regnet           |   32    | 22.052581 |
|     timm_vision_transformer     |   32    | 21.646862 |
|          pytorch_unet           |    1    | 21.259171 |
|            hf_Albert            |    1    | 20.223735 |
|             hf_GPT2             |    1    | 19.523932 |
|          hf_DistilBert          |    1    | 18.911154 |
|             hf_Bert             |    1    | 18.829215 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 18.256875 |
|          squeezenet1_1          |   16    | 16.542362 |
|         pytorch_stargan         |   16    | 16.379851 |
|            resnet152            |   32    | 15.86083  |
|              vgg16              |    4    | 14.815839 |
|      doctr_reco_predictor       |    1    | 13.469577 |
|             alexnet             |   128   | 12.830355 |
|          basic_gnn_gcn          |    1    | 11.828041 |
|         resnext50_32x4d         |    8    | 11.704408 |
|            resnet50             |   32    | 11.638226 |
|          basic_gnn_gin          |    1    | 11.264192 |
|         basic_gnn_sage          |    1    | 11.245919 |
|               drq               |    1    | 10.848902 |
|              dlrm               |  2048   | 10.687534 |
|            resnet18             |    8    | 10.350192 |
|          mobilenet_v2           |   16    | 10.14095  |
|           mnasnet1_0            |   32    | 9.887048  |
|     functorch_maml_omniglot     |    1    | 9.698448  |
|          maml_omniglot          |    5    | 9.330447  |
|        basic_gnn_edgecnn        |    1    | 9.203868  |
|     pyhpc_equation_of_state     | 1048576 | 9.162666  |
|         LearningToPaint         |   96    | 9.097672  |
|         phlippe_resnet          |   128   | 8.615795  |
|     nvidia_deeprecommender      |   256   | 8.531079  |
|        soft_actor_critic        |   256   | 6.758547  |
|          lennard_jones          |  1000   |  6.71879  |
|              dcgan              |   256   | 5.652473  |
|           tts_angular           |   64    | 5.531428  |
|   mobilenet_v2_quantized_qat    |   96    |  0.10962  |
|     resnet50_quantized_qat      |   32    | 0.078967  |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
|        timm_efficientdet        |    0    |    0.0    |
+---------------------------------+---------+-----------+

Peak Memory Compression Ratio

+---------------------------------+---------+----------+
|              name               |   bs    | inductor |
+---------------------------------+---------+----------+
|           timm_nfnet            |   128   | 0.993104 |
|              dlrm               |  2048   | 0.988051 |
|           hf_T5_base            |    1    |  0.9878  |
|        timm_efficientnet        |   64    | 0.984811 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 0.981966 |
|       Background_Matting        |    1    | 0.981827 |
|  timm_vision_transformer_large  |   32    | 0.979595 |
|             demucs              |    1    | 0.978769 |
|          pytorch_unet           |    1    | 0.978371 |
|          hf_GPT2_large          |    1    | 0.978224 |
|           densenet121           |   64    | 0.975462 |
|             yolov3              |    8    | 0.974938 |
|        basic_gnn_edgecnn        |    1    | 0.97364  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 0.973037 |
|            resnet50             |   32    | 0.971763 |
|         LearningToPaint         |   96    | 0.969552 |
|           timm_vovnet           |   32    | 0.969346 |
|            resnet152            |   32    | 0.969276 |
|          timm_resnest           |   32    | 0.969275 |
|      torch_multimodal_clip      |   32    | 0.966019 |
|    detectron2_fcos_r_50_fpn     |    1    | 0.965366 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 0.963223 |
|     resnet50_quantized_qat      |   32    | 0.963115 |
|     timm_vision_transformer     |   32    | 0.962912 |
|   mobilenet_v2_quantized_qat    |   96    | 0.962528 |
|           timm_regnet           |   32    | 0.962279 |
|           mnasnet1_0            |   32    | 0.959185 |
|       doctr_det_predictor       |    1    | 0.957293 |
|          mobilenet_v2           |   16    | 0.956068 |
|       mobilenet_v3_large        |   32    | 0.954096 |
|       shufflenet_v2_x1_0        |   64    | 0.952887 |
|           hf_BigBird            |    1    | 0.949243 |
|         pytorch_stargan         |   16    | 0.946384 |
|        phlippe_densenet         |   128   | 0.945346 |
|         resnext50_32x4d         |    8    | 0.943599 |
|          basic_gnn_gin          |    1    | 0.938014 |
|      doctr_reco_predictor       |    1    | 0.936772 |
|          basic_gnn_gcn          |    1    | 0.932782 |
|              llama              |   32    | 0.932234 |
|           tts_angular           |   64    | 0.92907  |
|          squeezenet1_1          |   16    | 0.919259 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 0.916694 |
|              dcgan              |   256   | 0.913624 |
|        hf_distil_whisper        |    1    | 0.911766 |
|     pyhpc_equation_of_state     | 1048576 | 0.907493 |
|            resnet18             |    8    | 0.899733 |
|             alexnet             |   128   | 0.895495 |
|         phlippe_resnet          |   128   | 0.894322 |
|         opacus_cifar10          |   64    | 0.890254 |
|        soft_actor_critic        |   256   | 0.883272 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 0.882586 |
|          lennard_jones          |  1000   | 0.863326 |
|          maml_omniglot          |    5    | 0.861622 |
|         basic_gnn_sage          |    1    | 0.858548 |
|     functorch_maml_omniglot     |    1    | 0.857912 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.84983  |
|          fastNLP_Bert           |    1    | 0.844465 |
|            moondream            |    1    | 0.824903 |
|       speech_transformer        |    1    | 0.814229 |
|              maml               |    1    | 0.811647 |
|          hf_Bert_large          |    1    | 0.809599 |
|           hf_T5_large           |    1    | 0.803123 |
|          BERT_pytorch           |    2    | 0.79986  |
|          hf_Longformer          |    1    | 0.799687 |
|             hf_Bert             |    1    | 0.797055 |
|      functorch_dp_cifar10       |   64    | 0.796236 |
|            hf_Albert            |    1    | 0.792383 |
|     nvidia_deeprecommender      |   256   | 0.775267 |
|              vgg16              |    4    | 0.772757 |
|               drq               |    1    | 0.771883 |
|             hf_Bart             |    1    | 0.766284 |
|          hf_DistilBert          |    1    | 0.763583 |
|             hf_GPT2             |    1    | 0.761386 |
|              hf_T5              |    1    | 0.759999 |
|           hf_Reformer           |    1    | 0.734753 |
|     pyhpc_isoneutral_mixing     | 1048576 | 0.689116 |
|              moco               |    0    |   0.0    |
|        timm_efficientdet        |    0    |   0.0    |
|         DALLE2_pytorch          |    0    |   0.0    |
+---------------------------------+---------+----------+
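Note for anyone aggregating these numbers: the three rows with bs 0 and a value of 0.0 (moco, timm_efficientdet, DALLE2_pytorch) appear to be placeholders for the models reported as model_fail_to_load in the accuracy table, not real measurements. A minimal sketch of dropping them before averaging is below. The `valid_rows` helper and the use of a plain arithmetic mean are assumptions for illustration, not the dashboard's actual aggregation script.

```python
from statistics import mean

# A few (name, bs, ratio) rows copied from the table above.
rows = [
    ("timm_nfnet", 128, 0.993104),
    ("pyhpc_isoneutral_mixing", 1048576, 0.689116),
    ("moco", 0, 0.0),
    ("timm_efficientdet", 0, 0.0),
    ("DALLE2_pytorch", 0, 0.0),
]


def valid_rows(rows):
    """Hypothetical helper: keep only rows with a non-zero batch size.

    Assumption: bs == 0 marks a model that never ran, so its 0.0 value
    is a placeholder and should not enter any average.
    """
    return [r for r in rows if r[1] > 0]


kept = valid_rows(rows)
naive = mean(r[2] for r in rows)      # placeholders drag this down
filtered = mean(r[2] for r in kept)   # average over measured models only

print(f"kept {len(kept)} of {len(rows)} rows")
print(f"naive mean = {naive:.4f}, filtered mean = {filtered:.4f}")
```

The same filter would apply to the compilation latency and absolute latency tables, which carry the same three 0.0 rows.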

Absolute latency (ms)

+---------------------------------+---------+-------------+
|              name               |   bs    |  inductor   |
+---------------------------------+---------+-------------+
|  timm_vision_transformer_large  |   32    | 4497.222589 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1454.993718 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1325.472825 |
|           hf_T5_base            |    1    | 1276.666976 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1181.64614  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1092.762782 |
|          hf_GPT2_large          |    1    | 582.773629  |
|           timm_nfnet            |   128   | 569.247717  |
|           hf_T5_large           |    1    | 417.489479  |
|            moondream            |    1    | 411.653835  |
|        hf_distil_whisper        |    1    | 363.737421  |
|       Background_Matting        |    1    | 353.214884  |
|          pytorch_unet           |    1    | 236.765423  |
|           timm_regnet           |   32    | 223.584107  |
|            resnet152            |   32    | 197.161589  |
|           densenet121           |   64    |  195.97749  |
|    detectron2_fcos_r_50_fpn     |    1    | 184.426896  |
|             yolov3              |    8    | 164.460278  |
|      torch_multimodal_clip      |   32    | 154.004784  |
|             demucs              |    1    |  143.7933   |
|           hf_BigBird            |    1    | 133.354223  |
|           timm_vovnet           |   32    | 119.417927  |
|          hf_Bert_large          |    1    | 113.493279  |
|         pytorch_stargan         |   16    |  98.470207  |
|     timm_vision_transformer     |   32    |  91.870106  |
|       doctr_det_predictor       |    1    |  89.563473  |
|            resnet50             |   32    |  77.770348  |
|          hf_Longformer          |    1    |  72.741763  |
|       speech_transformer        |    1    |  55.487622  |
|             hf_Bart             |    1    |  55.010033  |
|          timm_resnest           |   32    |  54.227235  |
|        timm_efficientnet        |   64    |  50.221178  |
|              maml               |    1    |  49.265395  |
|              hf_T5              |    1    |  44.721009  |
|             alexnet             |   128   |  44.227201  |
|             hf_Bert             |    1    |  42.453809  |
|   mobilenet_v2_quantized_qat    |   96    |  39.777038  |
|         LearningToPaint         |   96    |  37.726007  |
|           hf_Reformer           |    1    |  37.619604  |
|            hf_Albert            |    1    |  37.487797  |
|          fastNLP_Bert           |    1    |  35.111563  |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  34.217603  |
|              vgg16              |    4    |  33.09032   |
|     nvidia_deeprecommender      |   256   |  30.837044  |
|     pyhpc_isoneutral_mixing     | 1048576 |  29.46102   |
|          hf_DistilBert          |    1    |  27.94448   |
| pyhpc_turbulent_kinetic_energy  | 1048576 |  27.823728  |
|     resnet50_quantized_qat      |   32    |  25.166909  |
|             hf_GPT2             |    1    |  24.733764  |
|              llama              |   32    |  23.087142  |
|         resnext50_32x4d         |    8    |  23.075974  |
|        basic_gnn_edgecnn        |    1    |   21.341    |
|          BERT_pytorch           |    2    |  21.18125   |
|           tts_angular           |   64    |  20.66614   |
|        phlippe_densenet         |   128   |  19.858236  |
|              dcgan              |   256   |  18.91205   |
|       shufflenet_v2_x1_0        |   64    |  15.563649  |
|           mnasnet1_0            |   32    |  14.901066  |
|       mobilenet_v3_large        |   32    |  13.07631   |
|          basic_gnn_gcn          |    1    |  10.154699  |
|      functorch_dp_cifar10       |   64    |   9.63284   |
|         opacus_cifar10          |   64    |  9.500214   |
|          mobilenet_v2           |   16    |  9.448476   |
|            resnet18             |    8    |  9.086077   |
|              dlrm               |  2048   |  6.854705   |
|          squeezenet1_1          |   16    |  5.552335   |
|         basic_gnn_sage          |    1    |   5.34069   |
|          basic_gnn_gin          |    1    |   5.14978   |
|         phlippe_resnet          |   128   |  4.022234   |
|      doctr_reco_predictor       |    1    |   3.55373   |
|     pyhpc_equation_of_state     | 1048576 |  1.114881   |
|               drq               |    1    |  0.936341   |
|     functorch_maml_omniglot     |    1    |  0.476441   |
|          maml_omniglot          |    5    |   0.36952   |
|        soft_actor_critic        |   256   |   0.30987   |
|          lennard_jones          |  1000   |  0.185583   |
|              moco               |    0    |     0.0     |
|         DALLE2_pytorch          |    0    |     0.0     |
|        timm_efficientdet        |    0    |     0.0     |
+---------------------------------+---------+-------------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            XLNetLMHeadModel             |  8  | 5.625656 |
|     MobileBertForQuestionAnswering      | 128 | 1.977411 |
|      GPT2ForSequenceClassification      |  4  | 1.932479 |
|           ElectraForCausalLM            | 32  | 1.907951 |
|       ElectraForQuestionAnswering       | 64  | 1.818443 |
|          MobileBertForMaskedLM          | 128 | 1.667413 |
|               DistillGPT2               | 16  | 1.513761 |
|       RobertaForQuestionAnswering       | 16  | 1.449331 |
|            YituTechConvBert             | 16  | 1.424892 |
|               GoogleFnet                | 16  | 1.417889 |
|        BertForQuestionAnswering         | 16  | 1.417303 |
|           RobertaForCausalLM            | 16  | 1.403108 |
|    LayoutLMForSequenceClassification    | 16  | 1.401535 |
|    MegatronBertForQuestionAnswering     |  8  | 1.379915 |
|           LayoutLMForMaskedLM           | 16  | 1.374117 |
|      DebertaV2ForQuestionAnswering      |  1  | 1.37352  |
|             BertForMaskedLM             | 16  | 1.363848 |
|          AllenaiLongformerBase          |  4  | 1.357802 |
|                CamemBert                | 16  | 1.356252 |
|         MegatronBertForCausalLM         |  4  | 1.345353 |
|             XGLMForCausalLM             |  8  | 1.285697 |
|     PLBartForConditionalGeneration      |  4  | 1.27844  |
|           DebertaForMaskedLM            |  8  | 1.275208 |
|       DebertaForQuestionAnswering       | 16  | 1.26436  |
|       AlbertForQuestionAnswering        |  4  | 1.244086 |
|            AlbertForMaskedLM            |  4  | 1.242817 |
| BlenderbotSmallForConditionalGeneration | 64  | 1.235309 |
|      MBartForConditionalGeneration      |  2  | 1.224784 |
|          BlenderbotForCausalLM          |  4  | 1.22294  |
|             OPTForCausalLM              |  2  | 1.212708 |
|          DistilBertForMaskedLM          | 128 | 1.193352 |
|         Speech2Text2ForCausalLM         | 256 | 1.191137 |
|     DistilBertForQuestionAnswering      | 256 | 1.178364 |
|          DebertaV2ForMaskedLM           |  2  | 1.177697 |
|       MT5ForConditionalGeneration       | 16  | 1.160527 |
|     M2M100ForConditionalGeneration      | 16  | 1.159202 |
|     PegasusForConditionalGeneration     | 32  | 1.143571 |
|       BlenderbotSmallForCausalLM        | 64  | 1.140002 |
|      BartForConditionalGeneration       |  2  | 1.131203 |
|             BartForCausalLM             |  4  | 1.12912  |
|           PegasusForCausalLM            | 32  | 1.12318  |
|            MBartForCausalLM             |  4  | 1.10696  |
|            TrOCRForCausalLM             | 32  | 1.070889 |
|            PLBartForCausalLM            |  8  | 1.060149 |
|                 T5Small                 |  4  | 1.007568 |
|       T5ForConditionalGeneration        |  4  | 1.003613 |
+-----------------------------------------+-----+----------+
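For reference, a speedup table in this format can be summarized with a short script. The sketch below parses the ASCII layout and computes a geometric mean over the speedup column. `parse_table` and `geomean` are hypothetical helpers written for this comment, and the geometric mean is an assumption about a reasonable summary statistic for ratios, not necessarily what the dashboard reports.

```python
import math


def parse_table(text):
    """Parse a '|'-delimited ASCII table into (name, bs, value) tuples."""
    rows = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +-----+ border lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if cells[0] == "name":
            continue  # skip the header row
        name, bs, value = cells
        rows.append((name, int(bs), float(value)))
    return rows


def geomean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Three rows copied from the huggingface speedup table above.
sample = """
+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            XLNetLMHeadModel             |  8  | 5.625656 |
|     MobileBertForQuestionAnswering      | 128 | 1.977411 |
|                 T5Small                 |  4  | 1.007568 |
+-----------------------------------------+-----+----------+
"""

rows = parse_table(sample)
speedups = [value for _, _, value in rows]
print(f"{len(rows)} models, geomean speedup = {geomean(speedups):.3f}")
```

A geometric mean keeps a single outlier such as XLNetLMHeadModel from dominating the summary the way it would in an arithmetic mean. Pasting the full table into `sample` gives the suite-level figure.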

Accuracy

+-----------------------------------------+----+------------------+
|                  name                   | bs |     inductor     |
+-----------------------------------------+----+------------------+
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip |
|                CamemBert                | 1  |       pass       |
|       AlbertForQuestionAnswering        | 1  |       pass       |
|           DebertaForMaskedLM            | 1  |       pass       |
|          AllenaiLongformerBase          | 1  |       pass       |
|             BartForCausalLM             | 1  |       pass       |
|      BartForConditionalGeneration       | 1  |       pass       |
|             BertForMaskedLM             | 1  |       pass       |
|        BertForQuestionAnswering         | 1  |       pass       |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |
|       DebertaForQuestionAnswering       | 1  |       pass       |
|           LayoutLMForMaskedLM           | 1  |       pass       |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |
|          DistilBertForMaskedLM          | 1  |       pass       |
|     DistilBertForQuestionAnswering      | 1  |       pass       |
|               DistillGPT2               | 1  |       pass       |
|           ElectraForCausalLM            | 1  |       pass       |
|       ElectraForQuestionAnswering       | 1  |       pass       |
|      GPT2ForSequenceClassification      | 1  |       pass       |
|               GoogleFnet                | 1  |       pass       |
|    LayoutLMForSequenceClassification    | 1  |       pass       |
|            MBartForCausalLM             | 1  |       pass       |
|            XLNetLMHeadModel             | 1  |       pass       |
|             XGLMForCausalLM             | 1  |       pass       |
|            AlbertForMaskedLM            | 1  |       pass       |
|      MBartForConditionalGeneration      | 1  |       pass       |
|       MT5ForConditionalGeneration       | 1  |       pass       |
|         MegatronBertForCausalLM         | 1  |       pass       |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |
|          MobileBertForMaskedLM          | 1  |       pass       |
|     MobileBertForQuestionAnswering      | 1  |       pass       |
|             OPTForCausalLM              | 1  |       pass       |
|            PLBartForCausalLM            | 1  |       pass       |
|     PLBartForConditionalGeneration      | 1  |       pass       |
|           PegasusForCausalLM            | 1  |       pass       |
|     M2M100ForConditionalGeneration      | 1  |       pass       |
|     PegasusForConditionalGeneration     | 1  |       pass       |
|           RobertaForCausalLM            | 1  |       pass       |
|       RobertaForQuestionAnswering       | 1  |       pass       |
|         Speech2Text2ForCausalLM         | 1  |       pass       |
|       T5ForConditionalGeneration        | 1  |       pass       |
|                 T5Small                 | 1  |       pass       |
|            TrOCRForCausalLM             | 1  |       pass       |
|            YituTechConvBert             | 1  |       pass       |
+-----------------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------------+-----+-----------+
|                  name                   | bs  | inductor  |
+-----------------------------------------+-----+-----------+
|          AllenaiLongformerBase          |  4  | 93.324282 |
|          MobileBertForMaskedLM          | 128 | 46.242591 |
|     MobileBertForQuestionAnswering      | 128 | 44.17412  |
|     PegasusForConditionalGeneration     | 32  | 43.159918 |
|     M2M100ForConditionalGeneration      | 16  | 42.792419 |
|       MT5ForConditionalGeneration       | 16  | 42.335188 |
|      MBartForConditionalGeneration      |  2  | 41.761301 |
|                 T5Small                 |  4  | 38.35015  |
|       T5ForConditionalGeneration        |  4  | 38.345147 |
|          BlenderbotForCausalLM          |  4  | 37.978974 |
|             XGLMForCausalLM             |  8  | 34.27415  |
|          DebertaV2ForMaskedLM           |  2  | 33.82947  |
| BlenderbotSmallForConditionalGeneration | 64  | 33.290642 |
|      DebertaV2ForQuestionAnswering      |  1  | 31.529848 |
|     PLBartForConditionalGeneration      |  4  | 30.867259 |
|            YituTechConvBert             | 16  | 30.762914 |
|      BartForConditionalGeneration       |  2  | 30.055721 |
|         MegatronBertForCausalLM         |  4  | 29.571859 |
|    MegatronBertForQuestionAnswering     |  8  | 28.10267  |
|             OPTForCausalLM              |  2  | 27.592917 |
|           PegasusForCausalLM            | 32  | 25.921296 |
|            MBartForCausalLM             |  4  |  25.1759  |
|            TrOCRForCausalLM             | 32  | 24.578353 |
|           DebertaForMaskedLM            |  8  | 23.966421 |
|       DebertaForQuestionAnswering       | 16  | 22.928547 |
|           RobertaForCausalLM            | 16  | 21.155104 |
|           ElectraForCausalLM            | 32  | 21.13363  |
|      GPT2ForSequenceClassification      |  4  | 21.021105 |
|          DistilBertForMaskedLM          | 128 | 20.978946 |
|       BlenderbotSmallForCausalLM        | 64  | 20.705701 |
|                CamemBert                | 16  | 20.690609 |
|            AlbertForMaskedLM            |  4  | 20.441124 |
|         Speech2Text2ForCausalLM         | 256 | 20.220081 |
|            PLBartForCausalLM            |  8  | 19.939996 |
|     DistilBertForQuestionAnswering      | 256 | 19.883514 |
|           LayoutLMForMaskedLM           | 16  | 19.416548 |
|             BertForMaskedLM             | 16  | 19.319255 |
|       AlbertForQuestionAnswering        |  4  | 19.227631 |
|       ElectraForQuestionAnswering       | 64  | 19.148251 |
|       RobertaForQuestionAnswering       | 16  | 19.129301 |
|    LayoutLMForSequenceClassification    | 16  | 19.029344 |
|             BartForCausalLM             |  4  | 18.908424 |
|               DistillGPT2               | 16  | 18.884836 |
|               GoogleFnet                | 16  | 18.303727 |
|        BertForQuestionAnswering         | 16  | 18.008409 |
|            XLNetLMHeadModel             |  8  | 14.687539 |
+-----------------------------------------+-----+-----------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|       AlbertForQuestionAnswering        |  4  | 0.994397 |
|            AlbertForMaskedLM            |  4  | 0.994373 |
|     DistilBertForQuestionAnswering      | 256 | 0.993293 |
|             OPTForCausalLM              |  2  | 0.993124 |
|           RobertaForCausalLM            | 16  | 0.992439 |
|            TrOCRForCausalLM             | 32  | 0.992245 |
|               DistillGPT2               | 16  | 0.992125 |
|          DistilBertForMaskedLM          | 128 | 0.991473 |
|           ElectraForCausalLM            | 32  | 0.991119 |
|               GoogleFnet                | 16  | 0.990842 |
|       ElectraForQuestionAnswering       | 64  | 0.990832 |
|            PLBartForCausalLM            |  8  | 0.990615 |
|                CamemBert                | 16  | 0.990513 |
|           LayoutLMForMaskedLM           | 16  | 0.99036  |
|             BertForMaskedLM             | 16  | 0.990332 |
|            MBartForCausalLM             |  4  | 0.990096 |
|            YituTechConvBert             | 16  | 0.989037 |
| BlenderbotSmallForConditionalGeneration | 64  | 0.988781 |
|       DebertaForQuestionAnswering       | 16  | 0.988751 |
|         Speech2Text2ForCausalLM         | 256 | 0.988719 |
|    LayoutLMForSequenceClassification    | 16  | 0.988713 |
|       RobertaForQuestionAnswering       | 16  | 0.988651 |
|    MegatronBertForQuestionAnswering     |  8  | 0.988626 |
|        BertForQuestionAnswering         | 16  | 0.988493 |
|     PLBartForConditionalGeneration      |  4  | 0.98811  |
|          MobileBertForMaskedLM          | 128 | 0.987319 |
|           PegasusForCausalLM            | 32  | 0.987192 |
|      GPT2ForSequenceClassification      |  4  | 0.986615 |
|       BlenderbotSmallForCausalLM        | 64  | 0.986355 |
|           DebertaForMaskedLM            |  8  | 0.985513 |
|             BartForCausalLM             |  4  | 0.985416 |
|            XLNetLMHeadModel             |  8  | 0.985274 |
|         MegatronBertForCausalLM         |  4  | 0.984468 |
|       T5ForConditionalGeneration        |  4  | 0.983595 |
|                 T5Small                 |  4  | 0.983312 |
|     MobileBertForQuestionAnswering      | 128 | 0.982728 |
|          AllenaiLongformerBase          |  4  | 0.979235 |
|     PegasusForConditionalGeneration     | 32  | 0.977624 |
|      MBartForConditionalGeneration      |  2  | 0.976315 |
|      BartForConditionalGeneration       |  2  | 0.968363 |
|       MT5ForConditionalGeneration       | 16  | 0.963579 |
|             XGLMForCausalLM             |  8  | 0.931748 |
|     M2M100ForConditionalGeneration      | 16  | 0.927184 |
|          DebertaV2ForMaskedLM           |  2  | 0.904541 |
|      DebertaV2ForQuestionAnswering      |  1  | 0.869473 |
|          BlenderbotForCausalLM          |  4  | 0.843835 |
+-----------------------------------------+-----+----------+

Absolute latency (ms)

+-----------------------------------------+-----+-------------+
|                  name                   | bs  |  inductor   |
+-----------------------------------------+-----+-------------+
|            AlbertForMaskedLM            |  4  | 2546.514995 |
|       AlbertForQuestionAnswering        |  4  | 2538.046463 |
|            XLNetLMHeadModel             |  8  | 1301.294243 |
|            TrOCRForCausalLM             | 32  |  982.98248  |
|     PegasusForConditionalGeneration     | 32  | 946.836382  |
|     DistilBertForQuestionAnswering      | 256 | 845.841662  |
|    MegatronBertForQuestionAnswering     |  8  | 751.226576  |
|      MBartForConditionalGeneration      |  2  | 672.604224  |
|            MBartForCausalLM             |  4  | 655.742685  |
|          DistilBertForMaskedLM          | 128 | 639.008386  |
|           RobertaForCausalLM            | 16  | 612.979406  |
|      BartForConditionalGeneration       |  2  | 596.548677  |
|          BlenderbotForCausalLM          |  4  |  593.08685  |
|             OPTForCausalLM              |  2  | 589.066469  |
|          DebertaV2ForMaskedLM           |  2  | 586.380067  |
|     M2M100ForConditionalGeneration      | 16  | 585.942258  |
|            YituTechConvBert             | 16  | 583.335688  |
|                CamemBert                | 16  | 568.629779  |
|             BertForMaskedLM             | 16  | 559.705411  |
|           LayoutLMForMaskedLM           | 16  | 558.287257  |
|       DebertaForQuestionAnswering       | 16  | 525.767833  |
|          AllenaiLongformerBase          |  4  | 525.642896  |
|            PLBartForCausalLM            |  8  | 520.930221  |
|             BartForCausalLM             |  4  | 498.709254  |
|     PLBartForConditionalGeneration      |  4  | 467.718329  |
| BlenderbotSmallForConditionalGeneration | 64  | 467.321681  |
|           PegasusForCausalLM            | 32  | 466.081426  |
|    LayoutLMForSequenceClassification    | 16  | 455.848458  |
|        BertForQuestionAnswering         | 16  | 450.609689  |
|         MegatronBertForCausalLM         |  4  | 440.121849  |
|       RobertaForQuestionAnswering       | 16  |  434.41424  |
|               GoogleFnet                | 16  |  401.62971  |
|       T5ForConditionalGeneration        |  4  | 397.948022  |
|                 T5Small                 |  4  | 396.888024  |
|               DistillGPT2               | 16  | 390.841434  |
|          MobileBertForMaskedLM          | 128 | 383.223094  |
|           DebertaForMaskedLM            |  8  | 350.232573  |
|             XGLMForCausalLM             |  8  | 333.074115  |
|       ElectraForQuestionAnswering       | 64  | 310.853807  |
|       BlenderbotSmallForCausalLM        | 64  | 272.863504  |
|         Speech2Text2ForCausalLM         | 256 | 259.836807  |
|      GPT2ForSequenceClassification      |  4  | 257.865069  |
|      DebertaV2ForQuestionAnswering      |  1  | 239.315891  |
|       MT5ForConditionalGeneration       | 16  | 237.854786  |
|           ElectraForCausalLM            | 32  | 231.746618  |
|     MobileBertForQuestionAnswering      | 128 | 226.136501  |
+-----------------------------------------+-----+-------------+

timm_models suite with float32 precision


Performance speedup

+---------------------------------+------+----------+
|              name               |  bs  | inductor |
+---------------------------------+------+----------+
|           fbnetc_100            | 512  | 3.878189 |
|           mnasnet_100           | 512  | 3.835496 |
|            lcnet_050            | 256  | 3.83501  |
|         mobilenetv2_100         | 128  | 3.739399 |
|      mobilenetv3_large_100      | 512  | 3.54905  |
|          spnasnet_100           | 128  | 3.538917 |
|            fbnetv3_b            | 256  | 3.388267 |
|           regnety_002           | 1024 | 3.262662 |
|           rexnet_100            | 256  | 3.035341 |
|       tf_efficientnet_b0        | 128  | 2.882394 |
|            tinynet_a            | 128  | 2.837495 |
|          pnasnet5large          |  16  | 2.591362 |
|        ese_vovnet19b_dw         | 256  | 2.573388 |
|            hrnet_w18            | 128  | 2.512522 |
|          botnet26t_256          | 128  | 2.507726 |
|           res2next50            | 128  | 2.369534 |
|          ghostnet_100           | 512  | 2.312447 |
|       eca_botnext26ts_256       | 128  | 2.301321 |
|       gluon_inception_v3        | 256  | 2.259099 |
|          inception_v3           | 128  | 2.229241 |
|        adv_inception_v3         | 128  | 2.198825 |
|           resnest101e           |  64  | 2.192323 |
|        eca_halonext26ts         | 128  | 2.180765 |
|             dla102              | 128  | 2.162911 |
|        res2net50_14w_8s         | 128  | 2.108578 |
|        res2net101_26w_4s        | 128  | 2.086147 |
|          cspdarknet53           |  64  | 2.011799 |
|            repvgg_a2            | 128  | 2.003286 |
|            nfnet_l0             | 128  | 1.942547 |
|        convmixer_768_32         |  32  | 1.928222 |
|           tf_mixnet_l           | 128  | 1.877418 |
|            gernet_l             | 128  | 1.864062 |
|           dm_nfnet_f0           | 128  | 1.771829 |
|           selecsls42b           | 128  | 1.75518  |
|        sebotnet33ts_256         |  64  | 1.747348 |
|           volo_d1_224           |  64  | 1.692298 |
|            mixnet_l             | 128  | 1.691642 |
|           mobilevit_s           |  64  | 1.658191 |
|         poolformer_m36          |  64  | 1.633161 |
|         visformer_small         | 128  | 1.626734 |
|     swsl_resnext101_32x16d      |  32  | 1.596469 |
|           convit_base           |  64  | 1.542994 |
|             dpn107              |  64  | 1.483915 |
|            levit_128            | 1024 | 1.457185 |
|          gmlp_s16_224           | 128  | 1.393942 |
|      xcit_large_24_p8_224       |  16  | 1.331125 |
|          gmixer_24_224          | 128  | 1.324965 |
|  swin_base_patch4_window7_224   |  64  | 1.282035 |
|          mixer_b16_224          | 128  | 1.213533 |
|        twins_pcpvt_base         | 128  | 1.20996  |
|        tnt_s_patch16_224        | 128  | 1.197998 |
|          convnext_base          |  64  | 1.182683 |
|      beit_base_patch16_224      |  64  | 1.169741 |
|      vit_base_patch16_224       |  64  | 1.15023  |
| deit_base_distilled_patch16_224 |  64  | 1.149956 |
|          cait_m36_384           |  4   | 1.145679 |
|            pit_b_224            |  64  | 1.105505 |
|          jx_nest_base           |  32  | 1.089209 |
|         crossvit_9_240          | 256  | 1.04868  |
|          resmlp_12_224          | 128  | 0.745251 |
+---------------------------------+------+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 8  |     pass      |
|           dm_nfnet_f0           | 8  |     pass      |
|            mixnet_l             | 8  |     pass      |
|       eca_botnext26ts_256       | 8  |     pass      |
|         crossvit_9_240          | 8  |     pass      |
| deit_base_distilled_patch16_224 | 8  |     pass      |
|          convnext_base          | 8  |     pass      |
|           regnety_002           | 8  |     pass      |
|           convit_base           | 8  |     pass      |
|          gmlp_s16_224           | 8  |     pass      |
|          cait_m36_384           | 8  |     pass      |
|             dpn107              | 8  |     pass      |
|      vit_base_patch16_224       | 8  |     pass      |
|          cspdarknet53           | 8  |     pass      |
|        eca_halonext26ts         | 8  |     pass      |
|        ese_vovnet19b_dw         | 8  |     pass      |
|           fbnetc_100            | 8  |     pass      |
|            fbnetv3_b            | 8  |     pass      |
|            gernet_l             | 8  |     pass      |
|          botnet26t_256          | 8  |     pass      |
|       gluon_inception_v3        | 8  |     pass      |
|          gmixer_24_224          | 8  |     pass      |
|          jx_nest_base           | 8  |     pass      |
|        convmixer_768_32         | 8  |     pass      |
|        twins_pcpvt_base         | 8  |     pass      |
|         poolformer_m36          | 8  |     pass      |
|            lcnet_050            | 8  |     pass      |
|          mixer_b16_224          | 8  |     pass      |
|        tnt_s_patch16_224        | 8  |     pass      |
|           mnasnet_100           | 8  |     pass      |
|         mobilenetv2_100         | 8  |     pass      |
|      mobilenetv3_large_100      | 8  |     pass      |
|           mobilevit_s           | 8  |     pass      |
|            nfnet_l0             | 8  |     pass      |
|            pit_b_224            | 8  |     pass      |
|      beit_base_patch16_224      | 8  |     pass      |
|            repvgg_a2            | 8  |     pass      |
|  swin_base_patch4_window7_224   | 8  |     pass      |
|          inception_v3           | 8  |     pass      |
|           tf_mixnet_l           | 8  |     pass      |
|       tf_efficientnet_b0        | 8  |     pass      |
|            tinynet_a            | 8  |     pass      |
|          spnasnet_100           | 8  |     pass      |
|        sebotnet33ts_256         | 8  |     pass      |
|          resmlp_12_224          | 8  |     pass      |
|         coat_lite_mini          | 8  |  fail_to_run  |
|           resnest101e           | 8  | fail_accuracy |
|            levit_128            | 8  | fail_accuracy |
|          ghostnet_100           | 8  | fail_accuracy |
|          pnasnet5large          | 8  | fail_accuracy |
|        res2net101_26w_4s        | 8  | fail_accuracy |
|        res2net50_14w_8s         | 8  | fail_accuracy |
|             dla102              | 8  | fail_accuracy |
|     swsl_resnext101_32x16d      | 8  | fail_accuracy |
|           rexnet_100            | 8  | fail_accuracy |
|           selecsls42b           | 8  | fail_accuracy |
|           res2next50            | 8  | fail_accuracy |
|            hrnet_w18            | 8  | fail_accuracy |
|           volo_d1_224           | 8  | fail_accuracy |
|         visformer_small         | 8  | fail_accuracy |
|      xcit_large_24_p8_224       | 8  | fail_accuracy |
+---------------------------------+----+---------------+

Compilation latency (sec)

+---------------------------------+------+-----------+
|              name               |  bs  | inductor  |
+---------------------------------+------+-----------+
|          pnasnet5large          |  16  | 94.355032 |
|  swin_base_patch4_window7_224   |  64  | 90.13833  |
|           mobilevit_s           |  64  | 84.653041 |
|           tf_mixnet_l           | 128  | 78.776524 |
|             dpn107              |  64  | 77.663459 |
|        twins_pcpvt_base         | 128  | 70.902956 |
|           rexnet_100            | 256  | 69.73633  |
|        eca_halonext26ts         | 128  | 67.110951 |
|        sebotnet33ts_256         |  64  | 64.611345 |
|        res2net50_14w_8s         | 128  | 64.555585 |
|          jx_nest_base           |  32  | 64.095214 |
|      xcit_large_24_p8_224       |  16  | 63.921244 |
|          ghostnet_100           | 512  | 63.421382 |
|            levit_128            | 1024 | 62.278676 |
|          cait_m36_384           |  4   | 61.196852 |
|         crossvit_9_240          | 256  | 59.359167 |
|        tnt_s_patch16_224        | 128  | 58.569065 |
|            mixnet_l             | 128  | 57.310479 |
|           dm_nfnet_f0           | 128  | 55.931596 |
|         poolformer_m36          |  64  | 55.87715  |
|       eca_botnext26ts_256       | 128  | 53.311338 |
|           volo_d1_224           |  64  | 52.393085 |
|        res2net101_26w_4s        | 128  | 44.317459 |
|          convnext_base          |  64  | 44.268008 |
|       tf_efficientnet_b0        | 128  | 43.93968  |
|            nfnet_l0             | 128  | 42.996295 |
|            hrnet_w18            | 128  | 42.416997 |
|       gluon_inception_v3        | 256  | 41.132754 |
|           convit_base           |  64  | 40.16918  |
|          botnet26t_256          | 128  | 39.714888 |
|        adv_inception_v3         | 128  | 39.436388 |
|          inception_v3           | 128  | 39.337183 |
|           res2next50            | 128  | 38.133579 |
|            tinynet_a            | 128  | 37.570239 |
|            pit_b_224            |  64  | 36.229731 |
|           resnest101e           |  64  | 32.199008 |
|            fbnetv3_b            | 256  | 31.308982 |
|         visformer_small         | 128  | 29.873717 |
|          cspdarknet53           |  64  | 29.160824 |
|          gmlp_s16_224           | 128  | 28.985707 |
|        ese_vovnet19b_dw         | 256  | 28.600137 |
|             dla102              | 128  | 28.543467 |
|          gmixer_24_224          | 128  | 27.726461 |
|      mobilenetv3_large_100      | 512  | 26.558364 |
|      vit_base_patch16_224       |  64  | 22.644912 |
|      beit_base_patch16_224      |  64  | 21.571188 |
| deit_base_distilled_patch16_224 |  64  | 21.46516  |
|          mixer_b16_224          | 128  | 21.274524 |
|           regnety_002           | 1024 | 20.356346 |
|          resmlp_12_224          | 128  | 19.13671  |
|        convmixer_768_32         |  32  | 16.999859 |
|           selecsls42b           | 128  | 16.790047 |
|            repvgg_a2            | 128  | 16.566212 |
|            lcnet_050            | 256  | 14.575582 |
|     swsl_resnext101_32x16d      |  32  | 13.306207 |
|          spnasnet_100           | 128  | 10.762222 |
|         mobilenetv2_100         | 128  | 10.559087 |
|            gernet_l             | 128  | 10.388664 |
|           fbnetc_100            | 512  | 9.638315  |
|           mnasnet_100           | 512  | 9.228089  |
+---------------------------------+------+-----------+

Peak Memory Compression Ratio

+---------------------------------+------+----------+
|              name               |  bs  | inductor |
+---------------------------------+------+----------+
|        ese_vovnet19b_dw         | 256  | 0.997592 |
|           fbnetc_100            | 512  | 0.996746 |
|      mobilenetv3_large_100      | 512  | 0.996398 |
|            fbnetv3_b            | 256  | 0.996322 |
|           mnasnet_100           | 512  | 0.99565  |
|           regnety_002           | 1024 | 0.995544 |
|          ghostnet_100           | 512  | 0.995246 |
|           dm_nfnet_f0           | 128  | 0.995172 |
|       eca_botnext26ts_256       | 128  | 0.994356 |
|          convnext_base          |  64  | 0.993937 |
|            levit_128            | 1024 | 0.993851 |
|           rexnet_100            | 256  | 0.99353  |
|        eca_halonext26ts         | 128  | 0.993452 |
|        res2net101_26w_4s        | 128  | 0.993145 |
|            nfnet_l0             | 128  | 0.993137 |
|           res2next50            | 128  | 0.992934 |
|         mobilenetv2_100         | 128  | 0.992876 |
|          botnet26t_256          | 128  | 0.992654 |
|       tf_efficientnet_b0        | 128  | 0.992456 |
|           tf_mixnet_l           | 128  | 0.992449 |
|        convmixer_768_32         |  32  | 0.992329 |
|            mixnet_l             | 128  | 0.992034 |
|          cspdarknet53           |  64  | 0.991955 |
|        twins_pcpvt_base         | 128  | 0.991864 |
|            gernet_l             | 128  | 0.991781 |
|          gmlp_s16_224           | 128  | 0.991723 |
|       gluon_inception_v3        | 256  | 0.991625 |
|        sebotnet33ts_256         |  64  | 0.991389 |
|         visformer_small         | 128  | 0.991367 |
|          gmixer_24_224          | 128  | 0.990941 |
|           mobilevit_s           |  64  | 0.990621 |
|      xcit_large_24_p8_224       |  16  | 0.990613 |
|          mixer_b16_224          | 128  | 0.990531 |
|        res2net50_14w_8s         | 128  | 0.990328 |
|             dla102              | 128  | 0.989397 |
|           selecsls42b           | 128  | 0.989196 |
|           convit_base           |  64  | 0.988775 |
|  swin_base_patch4_window7_224   |  64  | 0.988711 |
|      beit_base_patch16_224      |  64  | 0.988266 |
|          pnasnet5large          |  16  | 0.987914 |
|            tinynet_a            | 128  | 0.987466 |
|          spnasnet_100           | 128  | 0.987466 |
|        tnt_s_patch16_224        | 128  | 0.98725  |
|            hrnet_w18            | 128  | 0.987219 |
|         poolformer_m36          |  64  | 0.987055 |
| deit_base_distilled_patch16_224 |  64  | 0.986584 |
|          resmlp_12_224          | 128  | 0.986297 |
|      vit_base_patch16_224       |  64  | 0.98624  |
|        adv_inception_v3         | 128  | 0.986139 |
|          inception_v3           | 128  | 0.985976 |
|             dpn107              |  64  | 0.985418 |
|           resnest101e           |  64  | 0.985121 |
|            lcnet_050            | 256  | 0.984065 |
|            pit_b_224            |  64  | 0.983375 |
|           volo_d1_224           |  64  | 0.981606 |
|            repvgg_a2            | 128  | 0.981396 |
|          jx_nest_base           |  32  | 0.980063 |
|     swsl_resnext101_32x16d      |  32  | 0.98006  |
|          cait_m36_384           |  4   | 0.97885  |
|         crossvit_9_240          | 256  | 0.967401 |
+---------------------------------+------+----------+

Absolute latency (ms)

+---------------------------------+------+-------------+
|              name               |  bs  |  inductor   |
+---------------------------------+------+-------------+
|      xcit_large_24_p8_224       |  16  | 1340.050147 |
|          cait_m36_384           |  4   | 1136.539014 |
|          convnext_base          |  64  | 1063.838953 |
|           dm_nfnet_f0           | 128  | 1007.816663 |
|             dpn107              |  64  | 943.654637  |
|          mixer_b16_224          | 128  | 943.084099  |
|       gluon_inception_v3        | 256  | 838.188972  |
|        tnt_s_patch16_224        | 128  | 764.853867  |
|        twins_pcpvt_base         | 128  | 751.269358  |
|  swin_base_patch4_window7_224   |  64  | 740.821966  |
|           convit_base           |  64  | 719.414509  |
|        res2net101_26w_4s        | 128  | 662.996318  |
|     swsl_resnext101_32x16d      |  32  | 650.819639  |
| deit_base_distilled_patch16_224 |  64  | 640.420534  |
|      vit_base_patch16_224       |  64  |  638.07423  |
|      beit_base_patch16_224      |  64  | 633.360444  |
|            nfnet_l0             | 128  | 629.738426  |
|            levit_128            | 1024 | 571.014222  |
|        ese_vovnet19b_dw         | 256  | 562.610549  |
|             dla102              | 128  | 542.782991  |
|            pit_b_224            |  64  | 535.849147  |
|          gmlp_s16_224           | 128  | 516.874339  |
|          jx_nest_base           |  32  | 505.914239  |
|           resnest101e           |  64  | 498.534231  |
|         crossvit_9_240          | 256  | 498.253604  |
|          gmixer_24_224          | 128  | 494.957394  |
|         poolformer_m36          |  64  | 486.180466  |
|        convmixer_768_32         |  32  | 451.601922  |
|            hrnet_w18            | 128  | 440.318819  |
|          resmlp_12_224          | 128  | 432.210837  |
|           volo_d1_224           |  64  | 422.166789  |
|          inception_v3           | 128  |  420.2595   |
|        adv_inception_v3         | 128  | 419.527039  |
|        res2net50_14w_8s         | 128  | 409.959499  |
|         visformer_small         | 128  |  404.39138  |
|           res2next50            | 128  | 368.194037  |
|          ghostnet_100           | 512  | 362.030133  |
|            mixnet_l             | 128  | 359.234152  |
|          pnasnet5large          |  16  | 347.155634  |
|            repvgg_a2            | 128  | 346.716959  |
|           tf_mixnet_l           | 128  | 346.362841  |
|        eca_halonext26ts         | 128  | 320.767157  |
|           fbnetc_100            | 512  | 309.423783  |
|            gernet_l             | 128  | 304.432042  |
|       eca_botnext26ts_256       | 128  | 299.174417  |
|        sebotnet33ts_256         |  64  | 286.805387  |
|          botnet26t_256          | 128  | 286.706687  |
|           regnety_002           | 1024 | 283.657629  |
|          cspdarknet53           |  64  | 268.615155  |
|           mnasnet_100           | 512  | 259.777371  |
|            fbnetv3_b            | 256  | 247.356193  |
|           selecsls42b           | 128  | 236.758113  |
|      mobilenetv3_large_100      | 512  | 233.752281  |
|           rexnet_100            | 256  | 229.158805  |
|           mobilevit_s           |  64  | 227.347592  |
|       tf_efficientnet_b0        | 128  | 120.824464  |
|            tinynet_a            | 128  |  82.726334  |
|         mobilenetv2_100         | 128  |  72.704523  |
|          spnasnet_100           | 128  |  66.205933  |
|            lcnet_050            | 256  |  27.023361  |
+---------------------------------+------+-------------+

@zxd1997066
Contributor

[default] Performance Dashboard for float32 precision -- Single-core Single-thread (2024-04-24 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration: a forward and backward pass for training, or a forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We also report the mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. Experimental setup does not have optimizer.

SW information:

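+-------------------+--------+------------------------------------------+
| SW                | Branch | Commit                                   |
+-------------------+--------+------------------------------------------+
| Pytorch           | main   | b91f83f18139a6ed2626c30979a38e533d4c7d7c |
| Torchbench        | main   | d6015d42                                 |
| torchaudio        | main   | 2.2.0a0+ea437b3                          |
| torchtext         | main   | 0.16.0a0+b0ebddc                         |
| torchvision       | main   | 0.19.0a0+2c4665f                         |
| torchdata         | main   | 0.7.1a0+0790338                          |
| dynamo_benchmarks | main   | nightly                                  |
+-------------------+--------+------------------------------------------+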

HW information

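+------------------+----------------------------------------------------------------+
| Item             | Value                                                          |
+------------------+----------------------------------------------------------------+
| Manufacturer     | Amazon EC2                                                     |
| Product Name     | c6i.16xlarge                                                   |
| CPU Model        | Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz                  |
| Installed Memory | 128GB (1x128GB DDR4 3200 MT/s [Unknown])                       |
| OS               | Ubuntu 22.04.2 LTS                                             |
| Kernel           | 5.19.0-1022-aws                                                |
| Microcode        | 0xd000389                                                      |
| GCC              | gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0                      |
| GLIBC            | ldd (Ubuntu GLIBC 2.35-0ubuntu3.1) 2.35                        |
| Binutils         | GNU ld (GNU Binutils for Ubuntu) 2.38                          |
| Python           | Python 3.10.6                                                  |
| OpenSSL          | OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022) |
+------------------+----------------------------------------------------------------+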

Test command

export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
export OMP_NUM_THREADS=1

python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--core_list 0 --ncores_per_instance 1" --devices=cpu --dtypes=float32 --inference --compilers=inductor --batch_size=1 --threads 1 --extra-args="--timeout 9000"

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 86%, 68/79 | 100%, 46/46 | 75%, 45/60  |
+----------+------------+-------------+-------------+
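
For readers who want to recompute a pass rate from a per-model accuracy table, here is a minimal sketch. It assumes that any status beginning with `pass` (including `pass_due_to_skip`) counts as a pass; the function name `pass_rate` is hypothetical, and the dashboard does not state which statuses, if any, are left out of the denominator, so the caller decides which rows to include.

```python
def pass_rate(statuses):
    """Return (passed, total, percent) for a list of accuracy status strings.

    Assumption: every status that starts with "pass" counts as passed.
    Which rows belong in the denominator is left to the caller.
    """
    total = len(statuses)
    if total == 0:
        raise ValueError("no statuses given")
    passed = sum(1 for s in statuses if s.startswith("pass"))
    return passed, total, round(100 * passed / total)


# Illustrative statuses only, not taken from the tables in this issue.
example = ["pass", "pass_due_to_skip", "fail_accuracy", "fail_to_run"]
passed, total, percent = pass_rate(example)
print(f"{percent}%, {passed}/{total}")
```

The final line prints the result in the same "percent, passed/total" layout that the Passrate table uses.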

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.54x    |    1.19x    |    1.47x    |
+----------+------------+-------------+-------------+
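
The geometric mean aggregates per-model speedups, where each speedup is the native PyTorch latency divided by the inductor latency. The sketch below shows one way to compute it. The helper names `speedup` and `geomean` are hypothetical, the latency values are illustrative, and dropping `0.0` placeholder rows (models that did not run) is an assumption; the dashboard's own script may filter models differently.

```python
import math


def speedup(eager_ms, compiled_ms):
    """Speedup normalized against eager (native PyTorch) latency."""
    if compiled_ms <= 0:
        raise ValueError("compiled latency must be positive")
    return eager_ms / compiled_ms


def geomean(values):
    """Geometric mean of the positive entries, computed in log space."""
    # Assumption: 0.0 entries are placeholders for models that did not run.
    positive = [v for v in values if v > 0]
    if not positive:
        raise ValueError("no positive values to average")
    return math.exp(sum(math.log(v) for v in positive) / len(positive))


# Illustrative (eager_ms, compiled_ms) pairs, not taken from this dashboard.
pairs = [(200.0, 100.0), (90.0, 100.0), (150.0, 100.0)]
speedups = [speedup(eager, compiled) for eager, compiled in pairs]
print(f"{geomean(speedups):.2f}x")
```

A geometric mean is the usual choice for ratios because a 2x speedup and a 0.5x slowdown cancel out to 1x, which an arithmetic mean would not do.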

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   25.80    |    26.56    |    34.41    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.86x    |    0.81x    |    0.82x    |
+----------+------------+-------------+-------------+
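
To make the "higher is better" reading concrete, the sketch below assumes the ratio is eager peak memory divided by compiled peak memory. That direction is an assumption consistent with the table heading, not something this comment states, and the function names are hypothetical. Under that assumption, a ratio below 1 means the compiled run peaks higher than the eager run.

```python
def compression_ratio(eager_peak_mb, compiled_peak_mb):
    """Assumed definition: eager peak memory / compiled peak memory."""
    if eager_peak_mb <= 0 or compiled_peak_mb <= 0:
        raise ValueError("peak memory values must be positive")
    return eager_peak_mb / compiled_peak_mb


def relative_compiled_peak(ratio):
    """Compiled peak memory expressed as a multiple of the eager peak."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / ratio


# Illustrative values only: 860 MB eager peak, 1000 MB compiled peak.
ratio = compression_ratio(860.0, 1000.0)
print(f"ratio {ratio:.2f}x, compiled peak is {relative_compiled_peak(ratio):.2f}x the eager peak")
```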

torchbench suite with float32 precision


Performance speedup

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_isoneutral_mixing     |    1    | 64.836681 |
|     pyhpc_equation_of_state     |    1    | 23.788546 |
|          maml_omniglot          |    5    | 3.873013  |
|         basic_gnn_sage          |    1    | 3.584538  |
|          basic_gnn_gin          |    1    | 3.481038  |
|     functorch_maml_omniglot     |    1    | 3.363808  |
|          squeezenet1_1          |    1    |  3.28367  |
|          basic_gnn_gcn          |    1    | 2.865743  |
|           timm_nfnet            |    1    | 2.777572  |
|         opacus_cifar10          |    1    | 2.245157  |
|            resnet18             |    1    | 2.174256  |
|              dcgan              |    1    |  2.12962  |
|      functorch_dp_cifar10       |    1    | 2.078445  |
|       shufflenet_v2_x1_0        |    1    | 2.064275  |
|          timm_resnest           |    1    | 1.968291  |
|          mobilenet_v2           |    1    | 1.879139  |
|          lennard_jones          |    1    | 1.801877  |
|            resnet50             |    1    | 1.768648  |
|           mnasnet1_0            |    1    | 1.746825  |
|       mobilenet_v3_large        |    1    | 1.701604  |
|         phlippe_resnet          |    1    | 1.669452  |
|            resnet152            |    1    | 1.651452  |
|           densenet121           |    1    | 1.647581  |
|        timm_efficientnet        |    1    | 1.605707  |
|           timm_vovnet           |    1    |  1.58814  |
|         LearningToPaint         |    1    |  1.56792  |
|      doctr_reco_predictor       |    1    | 1.496623  |
|         resnext50_32x4d         |    1    | 1.478457  |
|           timm_regnet           |    1    | 1.478084  |
|        phlippe_densenet         |    1    | 1.458846  |
|              vgg16              |    1    | 1.431545  |
|        basic_gnn_edgecnn        |    1    | 1.399552  |
|              llama              |    1    | 1.379003  |
|             yolov3              |    1    | 1.374561  |
|       doctr_det_predictor       |    1    | 1.363116  |
|             alexnet             |    1    |  1.36218  |
|          BERT_pytorch           |    1    | 1.294292  |
|            hf_Albert            |    1    | 1.285047  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1.281663  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1.277285  |
|              maml               |    1    | 1.251834  |
|             hf_GPT2             |    1    |  1.2476   |
|               drq               |    1    | 1.224278  |
|          hf_GPT2_large          |    1    | 1.223112  |
|            moondream            |    1    | 1.220414  |
|          fastNLP_Bert           |    1    | 1.207819  |
|     timm_vision_transformer     |    1    | 1.197211  |
|         pytorch_stargan         |   16    | 1.181433  |
|        soft_actor_critic        |   256   | 1.179311  |
|          hf_Bert_large          |    1    | 1.172015  |
|  timm_vision_transformer_large  |    1    | 1.169647  |
|              dlrm               |    1    | 1.156533  |
|             hf_Bert             |    1    | 1.156456  |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  1.15408  |
|           hf_BigBird            |    1    | 1.153829  |
|      torch_multimodal_clip      |    1    | 1.136615  |
|          hf_DistilBert          |    1    | 1.135743  |
|             hf_Bart             |    1    | 1.102135  |
|       speech_transformer        |    1    |  1.06867  |
|        hf_distil_whisper        |    1    | 1.068003  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1.059921  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1.047519  |
|    detectron2_fcos_r_50_fpn     |    1    | 1.044103  |
|          pytorch_unet           |    1    | 1.039071  |
|          hf_Longformer          |    1    | 1.015259  |
|             demucs              |    1    | 1.000521  |
|           tts_angular           |    1    | 1.000297  |
|     resnet50_quantized_qat      |    1    | 0.996245  |
|   mobilenet_v2_quantized_qat    |    1    | 0.986425  |
|     nvidia_deeprecommender      |    1    | 0.934626  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.918861  |
|           hf_Reformer           |    1    |  0.84762  |
|       Background_Matting        |    1    | 0.798272  |
|           hf_T5_large           |    1    | 0.791747  |
|              hf_T5              |    1    |  0.70592  |
|           hf_T5_base            |    1    | 0.599824  |
|        timm_efficientdet        |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+

Accuracy

+---------------------------------+---------+--------------------+
|              name               |   bs    |      inductor      |
+---------------------------------+---------+--------------------+
|           hf_T5_large           |    1    |  pass_due_to_skip  |
|       Background_Matting        |    1    |  pass_due_to_skip  |
|  timm_vision_transformer_large  |    1    |  pass_due_to_skip  |
|              maml               |    1    |  pass_due_to_skip  |
|          hf_GPT2_large          |    1    |  pass_due_to_skip  |
|         basic_gnn_sage          |    1    |        pass        |
|          hf_Longformer          |    1    |        pass        |
| detectron2_fasterrcnn_r_101_dc5 |    1    |        pass        |
| detectron2_fasterrcnn_r_101_c4  |    1    |        pass        |
|             demucs              |    1    |        pass        |
|              dcgan              |    1    |        pass        |
|        basic_gnn_edgecnn        |    1    |        pass        |
|          basic_gnn_gin          |    1    |        pass        |
|          basic_gnn_gcn          |    1    |        pass        |
|    detectron2_fcos_r_50_fpn     |    1    |        pass        |
|         LearningToPaint         |    1    |        pass        |
|       doctr_det_predictor       |    1    |        pass        |
|           hf_Reformer           |    1    |        pass        |
| detectron2_fasterrcnn_r_50_dc5  |    1    |        pass        |
|  detectron2_fasterrcnn_r_50_c4  |    1    |        pass        |
|              dlrm               |    1    |        pass        |
|           hf_T5_base            |    1    |        pass        |
|      doctr_reco_predictor       |    1    |        pass        |
|            hf_Albert            |    1    |        pass        |
|             hf_GPT2             |    1    |        pass        |
|          hf_DistilBert          |    1    |        pass        |
|           hf_BigBird            |    1    |        pass        |
|          hf_Bert_large          |    1    |        pass        |
|             hf_Bert             |    1    |        pass        |
|             hf_Bart             |    1    |        pass        |
|               drq               |    1    |        pass        |
|             yolov3              |    1    |        pass        |
|      functorch_dp_cifar10       |    1    |        pass        |
|          fastNLP_Bert           |    1    |        pass        |
|     functorch_maml_omniglot     |    1    |        pass        |
|          lennard_jones          |    1    |        pass        |
|             alexnet             |    1    |        pass        |
|          pytorch_unet           |    1    |        pass        |
|  pytorch_CycleGAN_and_pix2pix   |    1    |        pass        |
|          BERT_pytorch           |    1    |        pass        |
|            moondream            |    1    |        pass        |
|     pyhpc_equation_of_state     |    1    |        pass        |
|         phlippe_resnet          |    1    |        pass        |
|        phlippe_densenet         |    1    |        pass        |
|         opacus_cifar10          |    1    |        pass        |
|     nvidia_deeprecommender      |    1    |        pass        |
|     resnet50_quantized_qat      |    1    |        pass        |
|       mobilenet_v3_large        |    1    |        pass        |
|   mobilenet_v2_quantized_qat    |    1    |        pass        |
| pyhpc_turbulent_kinetic_energy  | 1048576 |        pass        |
|              llama              |    1    |        pass        |
|        hf_distil_whisper        |    1    |        pass        |
|     pyhpc_isoneutral_mixing     |    1    |        pass        |
|          maml_omniglot          |    5    |        pass        |
|              hf_T5              |    1    |        pass        |
|              vgg16              |    1    |        pass        |
|           tts_angular           |    1    |        pass        |
|      torch_multimodal_clip      |    1    |        pass        |
|           timm_vovnet           |    1    |        pass        |
|     timm_vision_transformer     |    1    |        pass        |
|          timm_resnest           |    1    |        pass        |
|           timm_nfnet            |    1    |        pass        |
|         pytorch_stargan         |   16    |        pass        |
|        timm_efficientnet        |    1    |        pass        |
|          squeezenet1_1          |    1    |        pass        |
|       speech_transformer        |    1    |        pass        |
|        soft_actor_critic        |   256   |        pass        |
|           timm_regnet           |    1    |        pass        |
|         DALLE2_pytorch          |    0    | model_fail_to_load |
|              moco               |    0    | model_fail_to_load |
|        timm_efficientdet        |    0    | model_fail_to_load |
| detectron2_fasterrcnn_r_101_fpn |    1    |    fail_to_run     |
|         vision_maskrcnn         |    1    |    fail_to_run     |
|           Super_SloMo           |    1    |    fail_to_run     |
| detectron2_fasterrcnn_r_50_fpn  |    1    |    fail_to_run     |
|         resnext50_32x4d         |    1    |   fail_accuracy    |
|            resnet152            |    1    |   fail_accuracy    |
|       shufflenet_v2_x1_0        |    1    |   fail_accuracy    |
|           mnasnet1_0            |    1    |   fail_accuracy    |
|          mobilenet_v2           |    1    |   fail_accuracy    |
|            resnet18             |    1    |   fail_accuracy    |
|            resnet50             |    1    |   fail_accuracy    |
|           densenet121           |    1    |   fail_accuracy    |
+---------------------------------+---------+--------------------+

Compilation latency (sec)

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|           hf_T5_base            |    1    | 99.59301  |
|           densenet121           |    1    | 90.565526 |
|           hf_BigBird            |    1    | 75.406243 |
|           hf_T5_large           |    1    | 70.962283 |
|    detectron2_fcos_r_50_fpn     |    1    | 69.622274 |
|              maml               |    1    | 52.49393  |
|          hf_Longformer          |    1    | 51.53361  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 49.896418 |
|           timm_nfnet            |    1    | 48.394692 |
|           hf_Reformer           |    1    | 47.707499 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 43.533301 |
|        phlippe_densenet         |    1    | 42.839908 |
|       speech_transformer        |    1    | 39.153188 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 39.104128 |
|  timm_vision_transformer_large  |    1    | 36.04118  |
|      torch_multimodal_clip      |    1    | 35.708939 |
|             demucs              |    1    | 35.077267 |
|        timm_efficientnet        |    1    | 33.642652 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 33.35182  |
|              hf_T5              |    1    | 32.507357 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 31.703904 |
|       Background_Matting        |    1    | 30.459074 |
|             yolov3              |    1    | 28.870139 |
|         opacus_cifar10          |    1    | 28.652444 |
|        hf_distil_whisper        |    1    | 28.197365 |
|          hf_GPT2_large          |    1    | 27.422653 |
|      functorch_dp_cifar10       |    1    | 27.356727 |
|            moondream            |    1    | 26.885854 |
|          timm_resnest           |    1    | 25.602367 |
|          hf_Bert_large          |    1    | 25.50339  |
|       doctr_det_predictor       |    1    | 24.233582 |
|       shufflenet_v2_x1_0        |    1    | 24.122254 |
|       mobilenet_v3_large        |    1    | 24.083279 |
|              llama              |    1    | 23.758222 |
|          BERT_pytorch           |    1    | 22.741696 |
|             hf_Bart             |    1    | 22.182116 |
|          fastNLP_Bert           |    1    | 22.100431 |
|           timm_vovnet           |    1    | 21.645096 |
|           timm_regnet           |    1    | 21.297812 |
|     timm_vision_transformer     |    1    | 21.234835 |
|          pytorch_unet           |    1    | 20.746018 |
|            hf_Albert            |    1    | 19.818691 |
|             hf_GPT2             |    1    | 19.321286 |
|          hf_DistilBert          |    1    | 18.918863 |
|             hf_Bert             |    1    | 18.652033 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 17.898314 |
|          squeezenet1_1          |    1    | 16.036627 |
|            resnet152            |    1    | 15.559981 |
|              vgg16              |    1    | 14.958484 |
|         pytorch_stargan         |   16    | 14.625137 |
|      doctr_reco_predictor       |    1    | 13.60853  |
|             alexnet             |    1    | 12.63502  |
|     pyhpc_isoneutral_mixing     |    1    | 11.992269 |
|         resnext50_32x4d         |    1    | 11.480588 |
|            resnet50             |    1    | 11.459312 |
|               drq               |    1    | 10.914949 |
|              dlrm               |    1    | 10.518867 |
|            resnet18             |    1    | 10.140359 |
|          mobilenet_v2           |    1    | 10.009013 |
|           mnasnet1_0            |    1    | 9.824366  |
|     functorch_maml_omniglot     |    1    | 9.725826  |
|          basic_gnn_gcn          |    1    | 9.385162  |
|          maml_omniglot          |    5    | 9.343808  |
|     nvidia_deeprecommender      |    1    | 9.129407  |
|         LearningToPaint         |    1    | 8.935623  |
|          basic_gnn_gin          |    1    | 8.934672  |
|        basic_gnn_edgecnn        |    1    | 8.877048  |
|     pyhpc_equation_of_state     |    1    | 8.711456  |
|         phlippe_resnet          |    1    |  8.57111  |
|         basic_gnn_sage          |    1    | 7.837492  |
|        soft_actor_critic        |   256   | 6.790817  |
|          lennard_jones          |    1    |  5.75129  |
|              dcgan              |    1    |  5.67369  |
|           tts_angular           |    1    | 5.491635  |
|   mobilenet_v2_quantized_qat    |    1    | 0.096932  |
|     resnet50_quantized_qat      |    1    | 0.070857  |
|        timm_efficientdet        |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+

Peak Memory Compression Ratio

+---------------------------------+---------+----------+
|              name               |   bs    | inductor |
+---------------------------------+---------+----------+
|              dlrm               |    1    | 0.988462 |
|           hf_T5_base            |    1    | 0.987529 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 0.982786 |
|       Background_Matting        |    1    | 0.982299 |
|             demucs              |    1    | 0.982003 |
|          pytorch_unet           |    1    | 0.977584 |
|          hf_GPT2_large          |    1    | 0.97683  |
|        basic_gnn_edgecnn        |    1    | 0.972503 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 0.970973 |
|    detectron2_fcos_r_50_fpn     |    1    | 0.963313 |
|     resnet50_quantized_qat      |    1    | 0.955856 |
|       doctr_det_predictor       |    1    | 0.955658 |
|         LearningToPaint         |    1    | 0.948148 |
|           hf_BigBird            |    1    | 0.947417 |
|         pytorch_stargan         |   16    | 0.946026 |
|          basic_gnn_gin          |    1    | 0.942669 |
|      doctr_reco_predictor       |    1    | 0.941623 |
|          basic_gnn_gcn          |    1    | 0.93754  |
|         basic_gnn_sage          |    1    | 0.936649 |
|   mobilenet_v2_quantized_qat    |    1    | 0.931286 |
|      torch_multimodal_clip      |    1    | 0.926868 |
|              llama              |    1    | 0.919334 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 0.916225 |
|        hf_distil_whisper        |    1    | 0.915464 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 0.886957 |
|           tts_angular           |    1    | 0.885216 |
|        soft_actor_critic        |   256   | 0.884004 |
|         opacus_cifar10          |    1    | 0.88254  |
|        timm_efficientnet        |    1    | 0.876797 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 0.875824 |
|          mobilenet_v2           |    1    | 0.867004 |
|          squeezenet1_1          |    1    | 0.863837 |
|          lennard_jones          |    1    | 0.861554 |
|          maml_omniglot          |    5    | 0.860991 |
|           mnasnet1_0            |    1    | 0.860102 |
|     functorch_maml_omniglot     |    1    | 0.856988 |
|          fastNLP_Bert           |    1    | 0.854464 |
|              dcgan              |    1    | 0.853211 |
|          timm_resnest           |    1    | 0.853013 |
|       mobilenet_v3_large        |    1    | 0.850194 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.849789 |
|         phlippe_resnet          |    1    | 0.839566 |
|       shufflenet_v2_x1_0        |    1    | 0.836243 |
|     pyhpc_equation_of_state     |    1    | 0.832552 |
|            moondream            |    1    | 0.825235 |
|       speech_transformer        |    1    | 0.824245 |
|        phlippe_densenet         |    1    | 0.818244 |
|          hf_Bert_large          |    1    | 0.813773 |
|           timm_nfnet            |    1    | 0.81308  |
|         resnext50_32x4d         |    1    | 0.812572 |
|     pyhpc_isoneutral_mixing     |    1    | 0.810855 |
|          hf_Longformer          |    1    | 0.808926 |
|     timm_vision_transformer     |    1    | 0.805701 |
|           hf_T5_large           |    1    | 0.804425 |
|             hf_Bert             |    1    | 0.80319  |
|            hf_Albert            |    1    | 0.801361 |
|              maml               |    1    | 0.800276 |
|             hf_Bart             |    1    | 0.784513 |
|             yolov3              |    1    | 0.779206 |
|          BERT_pytorch           |    1    | 0.779146 |
|          hf_DistilBert          |    1    | 0.776056 |
|            resnet50             |    1    | 0.768126 |
|             hf_GPT2             |    1    | 0.76812  |
|               drq               |    1    | 0.762398 |
|           timm_regnet           |    1    | 0.761079 |
|            resnet18             |    1    | 0.758282 |
|           timm_vovnet           |    1    | 0.758262 |
|           densenet121           |    1    | 0.756924 |
|              hf_T5              |    1    | 0.747336 |
|           hf_Reformer           |    1    | 0.743715 |
|      functorch_dp_cifar10       |    1    | 0.743202 |
|  timm_vision_transformer_large  |    1    | 0.732487 |
|             alexnet             |    1    | 0.732454 |
|              vgg16              |    1    | 0.722546 |
|            resnet152            |    1    | 0.692908 |
|     nvidia_deeprecommender      |    1    | 0.672499 |
|        timm_efficientdet        |    0    |   0.0    |
|              moco               |    0    |   0.0    |
|         DALLE2_pytorch          |    0    |   0.0    |
+---------------------------------+---------+----------+

Absolute latency (ms)

+---------------------------------+---------+--------------+
|              name               |   bs    |   inductor   |
+---------------------------------+---------+--------------+
|           hf_T5_base            |    1    | 26178.054299 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 11788.253352 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 11202.681354 |
|          hf_GPT2_large          |    1    | 10098.191645 |
|           hf_T5_large           |    1    | 7480.562244  |
|            moondream            |    1    | 7345.451415  |
|       Background_Matting        |    1    | 6980.566107  |
|        hf_distil_whisper        |    1    | 6950.916865  |
| detectron2_fasterrcnn_r_101_dc5 |    1    |  5666.63074  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 5052.152964  |
|          pytorch_unet           |    1    | 4856.804935  |
|  timm_vision_transformer_large  |    1    | 2779.210976  |
|    detectron2_fcos_r_50_fpn     |    1    | 2538.336766  |
|             demucs              |    1    | 2368.675069  |
|         pytorch_stargan         |   16    | 2032.129812  |
|          hf_Bert_large          |    1    | 1757.728038  |
|       doctr_det_predictor       |    1    | 1640.073426  |
|           hf_BigBird            |    1    | 1472.152631  |
|      torch_multimodal_clip      |    1    | 1244.374207  |
|          hf_Longformer          |    1    | 1114.976497  |
|             hf_Bart             |    1    |  881.599298  |
|              hf_T5              |    1    |  768.057861  |
|             hf_Bert             |    1    |  677.878741  |
|       speech_transformer        |    1    |  674.92059   |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  627.683169  |
|            hf_Albert            |    1    |  571.332932  |
|          fastNLP_Bert           |    1    |  520.889811  |
|             yolov3              |    1    |  430.437904  |
|           hf_Reformer           |    1    |  415.595526  |
|          hf_DistilBert          |    1    |  415.564674  |
|             hf_GPT2             |    1    |  355.263388  |
|        basic_gnn_edgecnn        |    1    |  232.284875  |
| pyhpc_turbulent_kinetic_energy  | 1048576 |  209.522964  |
|              vgg16              |    1    |  191.200776  |
|           timm_regnet           |    1    |  150.140076  |
|          BERT_pytorch           |    1    |  141.336565  |
|            resnet152            |    1    |  138.224041  |
|           timm_nfnet            |    1    |  96.603052   |
|           timm_vovnet           |    1    |  80.099245   |
|              maml               |    1    |  74.143184   |
|     timm_vision_transformer     |    1    |  58.674017   |
|     nvidia_deeprecommender      |    1    |  58.157965   |
|         resnext50_32x4d         |    1    |  57.280549   |
|           tts_angular           |    1    |  54.135569   |
|            resnet50             |    1    |  50.905386   |
|           densenet121           |    1    |  45.375549   |
|          basic_gnn_gcn          |    1    |  35.666822   |
|          timm_resnest           |    1    |  33.478882   |
|      doctr_reco_predictor       |    1    |  23.330501   |
|              llama              |    1    |  22.820339   |
|            resnet18             |    1    |  22.438048   |
|             alexnet             |    1    |  22.201599   |
|     resnet50_quantized_qat      |    1    |  18.198251   |
|          basic_gnn_gin          |    1    |  16.751794   |
|         basic_gnn_sage          |    1    |  16.551188   |
|        timm_efficientnet        |    1    |  13.575586   |
|         LearningToPaint         |    1    |   9.845946   |
|           mnasnet1_0            |    1    |   7.889271   |
|          mobilenet_v2           |    1    |   7.642326   |
|       mobilenet_v3_large        |    1    |   7.475033   |
|   mobilenet_v2_quantized_qat    |    1    |   6.968785   |
|          squeezenet1_1          |    1    |   5.936627   |
|       shufflenet_v2_x1_0        |    1    |   5.551174   |
|        phlippe_densenet         |    1    |   3.359685   |
|        soft_actor_critic        |   256   |   3.001661   |
|      functorch_dp_cifar10       |    1    |   2.45949    |
|         opacus_cifar10          |    1    |   2.390685   |
|               drq               |    1    |   1.891541   |
|              dcgan              |    1    |   1.735071   |
|         phlippe_resnet          |    1    |   1.340804   |
|     functorch_maml_omniglot     |    1    |   0.855369   |
|              dlrm               |    1    |   0.705562   |
|          maml_omniglot          |    5    |   0.567008   |
|     pyhpc_equation_of_state     |    1    |   0.043446   |
|     pyhpc_isoneutral_mixing     |    1    |   0.042994   |
|          lennard_jones          |    1    |   0.037674   |
|              moco               |    0    |     0.0      |
|         DALLE2_pytorch          |    0    |     0.0      |
|        timm_efficientdet        |    0    |     0.0      |
+---------------------------------+---------+--------------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|          MobileBertForMaskedLM          | 1  | 2.022237 |
|     MobileBertForQuestionAnswering      | 1  | 1.565184 |
|            XLNetLMHeadModel             | 1  | 1.382437 |
|            YituTechConvBert             | 1  | 1.324849 |
|         Speech2Text2ForCausalLM         | 1  | 1.316339 |
|      GPT2ForSequenceClassification      | 1  | 1.301636 |
|          DistilBertForMaskedLM          | 1  | 1.291831 |
|     DistilBertForQuestionAnswering      | 1  | 1.284367 |
| BlenderbotSmallForConditionalGeneration | 1  | 1.279999 |
|       BlenderbotSmallForCausalLM        | 1  | 1.272099 |
|          BlenderbotForCausalLM          | 1  | 1.248728 |
|       DebertaForQuestionAnswering       | 1  | 1.247172 |
|       MT5ForConditionalGeneration       | 1  | 1.244782 |
|     M2M100ForConditionalGeneration      | 1  | 1.241863 |
|           PegasusForCausalLM            | 1  | 1.235469 |
|     PegasusForConditionalGeneration     | 1  | 1.234234 |
|             XGLMForCausalLM             | 1  | 1.233965 |
|           DebertaForMaskedLM            | 1  | 1.231343 |
|               GoogleFnet                | 1  | 1.226521 |
|            AlbertForMaskedLM            | 1  | 1.204066 |
|       AlbertForQuestionAnswering        | 1  | 1.202261 |
|           ElectraForCausalLM            | 1  | 1.195439 |
|               DistillGPT2               | 1  | 1.18601  |
|        BertForQuestionAnswering         | 1  | 1.166605 |
|             BertForMaskedLM             | 1  | 1.164237 |
|    LayoutLMForSequenceClassification    | 1  | 1.163678 |
|    MegatronBertForQuestionAnswering     | 1  | 1.162858 |
|           RobertaForCausalLM            | 1  | 1.162464 |
|                CamemBert                | 1  | 1.160208 |
|       RobertaForQuestionAnswering       | 1  | 1.159397 |
|           LayoutLMForMaskedLM           | 1  | 1.159281 |
|      DebertaV2ForQuestionAnswering      | 1  | 1.155451 |
|            TrOCRForCausalLM             | 1  | 1.155365 |
|          DebertaV2ForMaskedLM           | 1  | 1.150897 |
|         MegatronBertForCausalLM         | 1  | 1.148558 |
|       ElectraForQuestionAnswering       | 1  | 1.145349 |
|     PLBartForConditionalGeneration      | 1  | 1.088225 |
|      MBartForConditionalGeneration      | 1  | 1.067858 |
|             BartForCausalLM             | 1  | 1.055097 |
|      BartForConditionalGeneration       | 1  | 1.044944 |
|             OPTForCausalLM              | 1  | 1.028895 |
|            PLBartForCausalLM            | 1  | 1.024993 |
|            MBartForCausalLM             | 1  | 1.012442 |
|          AllenaiLongformerBase          | 1  | 0.969429 |
|                 T5Small                 | 1  | 0.620168 |
|       T5ForConditionalGeneration        | 1  | 0.614044 |
+-----------------------------------------+----+----------+
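A speedup below 1.0 means the compiled model ran slower than native PyTorch. The following is a minimal sketch, not part of the dashboard tooling, of how a table in this layout could be parsed to list those models. The helper name `parse_table` and the embedded `SAMPLE` excerpt (four rows copied from the table above) are illustrative assumptions.

```python
def parse_table(text):
    """Parse a '+---+'-bordered ASCII table into a list of dicts keyed by header.

    Hypothetical helper: assumes every data line starts with '|' and that
    the first such line is the header row.
    """
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")
    ]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


# Excerpt copied from the huggingface "Performance speedup" table.
SAMPLE = """
+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|            MBartForCausalLM             | 1  | 1.012442 |
|          AllenaiLongformerBase          | 1  | 0.969429 |
|                 T5Small                 | 1  | 0.620168 |
|       T5ForConditionalGeneration        | 1  | 0.614044 |
+-----------------------------------------+----+----------+
"""

table = parse_table(SAMPLE)

# Models whose speedup is below 1.0, i.e. slower than native PyTorch.
slower = [row["name"] for row in table if float(row["inductor"]) < 1.0]
print(slower)
```

The same helper should work on the other tables in this comment, since they share the border and header layout. Non-numeric columns such as the accuracy status would need a different filter.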

Accuracy

+-----------------------------------------+----+------------------+
|                  name                   | bs |     inductor     |
+-----------------------------------------+----+------------------+
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip |
|                CamemBert                | 1  |       pass       |
|       AlbertForQuestionAnswering        | 1  |       pass       |
|           DebertaForMaskedLM            | 1  |       pass       |
|          AllenaiLongformerBase          | 1  |       pass       |
|             BartForCausalLM             | 1  |       pass       |
|      BartForConditionalGeneration       | 1  |       pass       |
|             BertForMaskedLM             | 1  |       pass       |
|        BertForQuestionAnswering         | 1  |       pass       |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |
|       DebertaForQuestionAnswering       | 1  |       pass       |
|           LayoutLMForMaskedLM           | 1  |       pass       |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |
|          DistilBertForMaskedLM          | 1  |       pass       |
|     DistilBertForQuestionAnswering      | 1  |       pass       |
|               DistillGPT2               | 1  |       pass       |
|           ElectraForCausalLM            | 1  |       pass       |
|       ElectraForQuestionAnswering       | 1  |       pass       |
|      GPT2ForSequenceClassification      | 1  |       pass       |
|               GoogleFnet                | 1  |       pass       |
|    LayoutLMForSequenceClassification    | 1  |       pass       |
|            MBartForCausalLM             | 1  |       pass       |
|            XLNetLMHeadModel             | 1  |       pass       |
|             XGLMForCausalLM             | 1  |       pass       |
|            AlbertForMaskedLM            | 1  |       pass       |
|      MBartForConditionalGeneration      | 1  |       pass       |
|       MT5ForConditionalGeneration       | 1  |       pass       |
|         MegatronBertForCausalLM         | 1  |       pass       |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |
|          MobileBertForMaskedLM          | 1  |       pass       |
|     MobileBertForQuestionAnswering      | 1  |       pass       |
|             OPTForCausalLM              | 1  |       pass       |
|            PLBartForCausalLM            | 1  |       pass       |
|     PLBartForConditionalGeneration      | 1  |       pass       |
|           PegasusForCausalLM            | 1  |       pass       |
|     M2M100ForConditionalGeneration      | 1  |       pass       |
|     PegasusForConditionalGeneration     | 1  |       pass       |
|           RobertaForCausalLM            | 1  |       pass       |
|       RobertaForQuestionAnswering       | 1  |       pass       |
|         Speech2Text2ForCausalLM         | 1  |       pass       |
|       T5ForConditionalGeneration        | 1  |       pass       |
|                 T5Small                 | 1  |       pass       |
|            TrOCRForCausalLM             | 1  |       pass       |
|            YituTechConvBert             | 1  |       pass       |
+-----------------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------------+----+-----------+
|                  name                   | bs | inductor  |
+-----------------------------------------+----+-----------+
|          AllenaiLongformerBase          | 1  | 58.59442  |
|          MobileBertForMaskedLM          | 1  | 43.970498 |
|     MobileBertForQuestionAnswering      | 1  | 42.786866 |
|     PegasusForConditionalGeneration     | 1  | 40.640757 |
|     M2M100ForConditionalGeneration      | 1  | 39.946781 |
|      MBartForConditionalGeneration      | 1  | 39.337321 |
|                 T5Small                 | 1  | 38.361529 |
|       T5ForConditionalGeneration        | 1  | 38.329249 |
|          BlenderbotForCausalLM          | 1  | 37.177714 |
|       MT5ForConditionalGeneration       | 1  | 35.895476 |
|            XLNetLMHeadModel             | 1  | 34.363716 |
|             XGLMForCausalLM             | 1  | 33.837404 |
|          DebertaV2ForMaskedLM           | 1  | 31.310704 |
| BlenderbotSmallForConditionalGeneration | 1  | 30.68504  |
|      DebertaV2ForQuestionAnswering      | 1  | 30.011133 |
|            YituTechConvBert             | 1  | 29.174345 |
|      BartForConditionalGeneration       | 1  | 28.804158 |
|         MegatronBertForCausalLM         | 1  | 28.492866 |
|     PLBartForConditionalGeneration      | 1  | 27.823048 |
|    MegatronBertForQuestionAnswering     | 1  | 27.115461 |
|             OPTForCausalLM              | 1  | 25.040831 |
|           PegasusForCausalLM            | 1  | 24.708539 |
|            MBartForCausalLM             | 1  | 24.24729  |
|            TrOCRForCausalLM             | 1  | 22.465096 |
|           DebertaForMaskedLM            | 1  | 22.226633 |
|       DebertaForQuestionAnswering       | 1  | 21.086495 |
|           ElectraForCausalLM            | 1  | 20.238536 |
|           RobertaForCausalLM            | 1  | 20.111168 |
|                CamemBert                | 1  | 20.058113 |
|          DistilBertForMaskedLM          | 1  | 19.508218 |
|       BlenderbotSmallForCausalLM        | 1  | 19.493637 |
|      GPT2ForSequenceClassification      | 1  | 19.145142 |
|       ElectraForQuestionAnswering       | 1  | 19.028483 |
|       RobertaForQuestionAnswering       | 1  | 18.999056 |
|           LayoutLMForMaskedLM           | 1  | 18.972895 |
|         Speech2Text2ForCausalLM         | 1  | 18.972539 |
|             BertForMaskedLM             | 1  | 18.912908 |
|            PLBartForCausalLM            | 1  | 18.597208 |
|    LayoutLMForSequenceClassification    | 1  | 18.530603 |
|     DistilBertForQuestionAnswering      | 1  | 18.208556 |
|             BartForCausalLM             | 1  | 18.125469 |
|        BertForQuestionAnswering         | 1  | 17.616636 |
|               GoogleFnet                | 1  | 17.371851 |
|               DistillGPT2               | 1  |  17.3366  |
|            AlbertForMaskedLM            | 1  | 13.764339 |
|       AlbertForQuestionAnswering        | 1  | 12.391946 |
+-----------------------------------------+----+-----------+

Peak Memory Compression Ratio

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|             OPTForCausalLM              | 1  | 0.98642  |
|      MBartForConditionalGeneration      | 1  | 0.974885 |
|       T5ForConditionalGeneration        | 1  | 0.953584 |
|      GPT2ForSequenceClassification      | 1  | 0.952786 |
|          AllenaiLongformerBase          | 1  | 0.950108 |
|            MBartForCausalLM             | 1  | 0.926311 |
|     PLBartForConditionalGeneration      | 1  | 0.909981 |
|            XLNetLMHeadModel             | 1  | 0.906801 |
|                 T5Small                 | 1  | 0.905458 |
|            PLBartForCausalLM            | 1  | 0.902164 |
|       DebertaForQuestionAnswering       | 1  | 0.873124 |
|      BartForConditionalGeneration       | 1  | 0.864055 |
|               GoogleFnet                | 1  | 0.853852 |
|       RobertaForQuestionAnswering       | 1  | 0.850844 |
|        BertForQuestionAnswering         | 1  | 0.848289 |
|    LayoutLMForSequenceClassification    | 1  | 0.847698 |
|    MegatronBertForQuestionAnswering     | 1  | 0.840342 |
|       ElectraForQuestionAnswering       | 1  | 0.839739 |
|      DebertaV2ForQuestionAnswering      | 1  | 0.836422 |
|               DistillGPT2               | 1  | 0.829612 |
|           DebertaForMaskedLM            | 1  | 0.82364  |
|         MegatronBertForCausalLM         | 1  | 0.819128 |
|           LayoutLMForMaskedLM           | 1  | 0.818719 |
|         Speech2Text2ForCausalLM         | 1  | 0.815561 |
|             BertForMaskedLM             | 1  | 0.814855 |
|                CamemBert                | 1  | 0.813343 |
|           RobertaForCausalLM            | 1  | 0.803774 |
|           ElectraForCausalLM            | 1  | 0.80262  |
|     DistilBertForQuestionAnswering      | 1  | 0.800875 |
|             BartForCausalLM             | 1  | 0.799746 |
|          BlenderbotForCausalLM          | 1  | 0.798022 |
|          DebertaV2ForMaskedLM           | 1  | 0.79592  |
|       MT5ForConditionalGeneration       | 1  | 0.787504 |
|            TrOCRForCausalLM             | 1  | 0.78742  |
|            YituTechConvBert             | 1  | 0.768073 |
|       BlenderbotSmallForCausalLM        | 1  | 0.762418 |
|           PegasusForCausalLM            | 1  | 0.749807 |
|          DistilBertForMaskedLM          | 1  | 0.745166 |
| BlenderbotSmallForConditionalGeneration | 1  | 0.738556 |
|     MobileBertForQuestionAnswering      | 1  | 0.732331 |
|     PegasusForConditionalGeneration     | 1  | 0.715298 |
|     M2M100ForConditionalGeneration      | 1  | 0.711922 |
|          MobileBertForMaskedLM          | 1  | 0.703162 |
|             XGLMForCausalLM             | 1  | 0.698087 |
|            AlbertForMaskedLM            | 1  | 0.44788  |
|       AlbertForQuestionAnswering        | 1  | 0.442859 |
+-----------------------------------------+----+----------+

Absolute latency (ms)

+-----------------------------------------+----+--------------+
|                  name                   | bs |   inductor   |
+-----------------------------------------+----+--------------+
|            AlbertForMaskedLM            | 1  | 12736.647774 |
|       AlbertForQuestionAnswering        | 1  | 12696.620941 |
|      MBartForConditionalGeneration      | 1  | 6138.748912  |
|      BartForConditionalGeneration       | 1  | 5706.513016  |
|             OPTForCausalLM              | 1  | 5217.861386  |
|          DebertaV2ForMaskedLM           | 1  | 5046.813139  |
|      DebertaV2ForQuestionAnswering      | 1  | 3965.132805  |
|            XLNetLMHeadModel             | 1  | 3119.198697  |
|            MBartForCausalLM             | 1  | 3045.887936  |
|          BlenderbotForCausalLM          | 1  | 2630.792054  |
|             BartForCausalLM             | 1  | 2550.714352  |
|       T5ForConditionalGeneration        | 1  |  2501.68347  |
|                 T5Small                 | 1  | 2495.775447  |
|          AllenaiLongformerBase          | 1  | 2403.982541  |
|     PLBartForConditionalGeneration      | 1  | 2187.960384  |
|         MegatronBertForCausalLM         | 1  | 2048.836128  |
|    MegatronBertForQuestionAnswering     | 1  | 1866.527626  |
|      GPT2ForSequenceClassification      | 1  | 1321.980487  |
|            PLBartForCausalLM            | 1  | 1217.682102  |
|             XGLMForCausalLM             | 1  |  834.882551  |
|           DebertaForMaskedLM            | 1  |  789.045302  |
|           RobertaForCausalLM            | 1  |  779.458286  |
|     M2M100ForConditionalGeneration      | 1  |  716.31942   |
|                CamemBert                | 1  |  697.002971  |
|             BertForMaskedLM             | 1  |  686.388304  |
|           LayoutLMForMaskedLM           | 1  |  684.81221   |
|            YituTechConvBert             | 1  |  673.31869   |
|     PegasusForConditionalGeneration     | 1  |  608.102978  |
|            TrOCRForCausalLM             | 1  |  587.128465  |
|       DebertaForQuestionAnswering       | 1  |  565.019551  |
|        BertForQuestionAnswering         | 1  |  547.221371  |
|    LayoutLMForSequenceClassification    | 1  |  544.51228   |
|       RobertaForQuestionAnswering       | 1  |  544.284431  |
|               DistillGPT2               | 1  |  503.523847  |
|               GoogleFnet                | 1  |  473.546457  |
|       MT5ForConditionalGeneration       | 1  |  301.326347  |
|           PegasusForCausalLM            | 1  |  300.58579   |
| BlenderbotSmallForConditionalGeneration | 1  |  146.395113  |
|           ElectraForCausalLM            | 1  |  135.086777  |
|          DistilBertForMaskedLM          | 1  |  100.714995  |
|       ElectraForQuestionAnswering       | 1  |  95.837439   |
|       BlenderbotSmallForCausalLM        | 1  |  85.029399   |
|          MobileBertForMaskedLM          | 1  |  67.466303   |
|     DistilBertForQuestionAnswering      | 1  |  64.300438   |
|     MobileBertForQuestionAnswering      | 1  |  40.900468   |
|         Speech2Text2ForCausalLM         | 1  |  18.995919   |
+-----------------------------------------+----+--------------+

timm_models suite with float32 precision

Performance speedup

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          pnasnet5large          | 1  | 2.342391 |
|          inception_v3           | 1  | 2.203406 |
|           dm_nfnet_f0           | 1  | 2.200404 |
|       gluon_inception_v3        | 1  | 2.186986 |
|            nfnet_l0             | 1  | 2.177047 |
|        adv_inception_v3         | 1  | 2.172885 |
|            repvgg_a2            | 1  | 1.960173 |
|         mobilenetv2_100         | 1  | 1.953248 |
|            hrnet_w18            | 1  | 1.917405 |
|          spnasnet_100           | 1  | 1.883354 |
|           mnasnet_100           | 1  | 1.840407 |
|           fbnetc_100            | 1  | 1.836202 |
|            levit_128            | 1  | 1.808523 |
|          ghostnet_100           | 1  | 1.792935 |
|            lcnet_050            | 1  | 1.772094 |
|           selecsls42b           | 1  | 1.761478 |
|      mobilenetv3_large_100      | 1  | 1.754406 |
|           regnety_002           | 1  | 1.689097 |
|             dla102              | 1  | 1.684051 |
|       tf_efficientnet_b0        | 1  | 1.681406 |
|          botnet26t_256          | 1  | 1.655068 |
|        ese_vovnet19b_dw         | 1  | 1.64901  |
|           rexnet_100            | 1  | 1.645056 |
|            fbnetv3_b            | 1  | 1.636234 |
|           resnest101e           | 1  | 1.60926  |
|       eca_botnext26ts_256       | 1  | 1.603472 |
|          cspdarknet53           | 1  | 1.59755  |
|            tinynet_a            | 1  | 1.524736 |
|         poolformer_m36          | 1  | 1.512409 |
|           res2next50            | 1  | 1.508233 |
|        eca_halonext26ts         | 1  | 1.505954 |
|        res2net50_14w_8s         | 1  | 1.468619 |
|           volo_d1_224           | 1  | 1.466711 |
|        res2net101_26w_4s        | 1  | 1.454838 |
|           mobilevit_s           | 1  | 1.445994 |
|         visformer_small         | 1  |  1.3886  |
|           convit_base           | 1  | 1.378149 |
|          gmixer_24_224          | 1  | 1.351984 |
|     swsl_resnext101_32x16d      | 1  | 1.349493 |
|           tf_mixnet_l           | 1  | 1.328998 |
|        twins_pcpvt_base         | 1  | 1.292235 |
|            gernet_l             | 1  | 1.290307 |
|      beit_base_patch16_224      | 1  | 1.279693 |
|          resmlp_12_224          | 1  | 1.272482 |
|  swin_base_patch4_window7_224   | 1  | 1.253196 |
|        convmixer_768_32         | 1  | 1.241785 |
|          mixer_b16_224          | 1  | 1.20529  |
|             dpn107              | 1  | 1.201511 |
| deit_base_distilled_patch16_224 | 1  | 1.195211 |
|      vit_base_patch16_224       | 1  | 1.193331 |
|      xcit_large_24_p8_224       | 1  | 1.187548 |
|            mixnet_l             | 1  | 1.176329 |
|          jx_nest_base           | 1  | 1.157731 |
|            pit_b_224            | 1  | 1.153081 |
|        tnt_s_patch16_224        | 1  | 1.136103 |
|          convnext_base          | 1  | 1.127415 |
|         crossvit_9_240          | 1  | 1.121787 |
|          gmlp_s16_224           | 1  | 1.11907  |
|        sebotnet33ts_256         | 1  | 1.086082 |
|          cait_m36_384           | 1  | 0.971262 |
+---------------------------------+----+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 1  |     pass      |
|           dm_nfnet_f0           | 1  |     pass      |
|            mixnet_l             | 1  |     pass      |
|       eca_botnext26ts_256       | 1  |     pass      |
|         crossvit_9_240          | 1  |     pass      |
| deit_base_distilled_patch16_224 | 1  |     pass      |
|          convnext_base          | 1  |     pass      |
|           regnety_002           | 1  |     pass      |
|           convit_base           | 1  |     pass      |
|          gmlp_s16_224           | 1  |     pass      |
|          cait_m36_384           | 1  |     pass      |
|             dpn107              | 1  |     pass      |
|      vit_base_patch16_224       | 1  |     pass      |
|          cspdarknet53           | 1  |     pass      |
|        eca_halonext26ts         | 1  |     pass      |
|        ese_vovnet19b_dw         | 1  |     pass      |
|           fbnetc_100            | 1  |     pass      |
|            fbnetv3_b            | 1  |     pass      |
|            gernet_l             | 1  |     pass      |
|          botnet26t_256          | 1  |     pass      |
|       gluon_inception_v3        | 1  |     pass      |
|          gmixer_24_224          | 1  |     pass      |
|          jx_nest_base           | 1  |     pass      |
|        convmixer_768_32         | 1  |     pass      |
|        twins_pcpvt_base         | 1  |     pass      |
|         poolformer_m36          | 1  |     pass      |
|            lcnet_050            | 1  |     pass      |
|          mixer_b16_224          | 1  |     pass      |
|        tnt_s_patch16_224        | 1  |     pass      |
|           mnasnet_100           | 1  |     pass      |
|         mobilenetv2_100         | 1  |     pass      |
|      mobilenetv3_large_100      | 1  |     pass      |
|           mobilevit_s           | 1  |     pass      |
|            nfnet_l0             | 1  |     pass      |
|            pit_b_224            | 1  |     pass      |
|      beit_base_patch16_224      | 1  |     pass      |
|            repvgg_a2            | 1  |     pass      |
|  swin_base_patch4_window7_224   | 1  |     pass      |
|          inception_v3           | 1  |     pass      |
|           tf_mixnet_l           | 1  |     pass      |
|       tf_efficientnet_b0        | 1  |     pass      |
|            tinynet_a            | 1  |     pass      |
|          spnasnet_100           | 1  |     pass      |
|        sebotnet33ts_256         | 1  |     pass      |
|          resmlp_12_224          | 1  |     pass      |
|         coat_lite_mini          | 1  |  fail_to_run  |
|           resnest101e           | 1  | fail_accuracy |
|            levit_128            | 1  | fail_accuracy |
|          ghostnet_100           | 1  | fail_accuracy |
|          pnasnet5large          | 1  | fail_accuracy |
|        res2net101_26w_4s        | 1  | fail_accuracy |
|        res2net50_14w_8s         | 1  | fail_accuracy |
|             dla102              | 1  | fail_accuracy |
|     swsl_resnext101_32x16d      | 1  | fail_accuracy |
|           rexnet_100            | 1  | fail_accuracy |
|           selecsls42b           | 1  | fail_accuracy |
|           res2next50            | 1  | fail_accuracy |
|            hrnet_w18            | 1  | fail_accuracy |
|           volo_d1_224           | 1  | fail_accuracy |
|         visformer_small         | 1  | fail_accuracy |
|      xcit_large_24_p8_224       | 1  | fail_accuracy |
+---------------------------------+----+---------------+

Compilation latency (sec)

+---------------------------------+----+-----------+
|              name               | bs | inductor  |
+---------------------------------+----+-----------+
|          pnasnet5large          | 1  | 80.806648 |
|  swin_base_patch4_window7_224   | 1  | 80.235296 |
|           tf_mixnet_l           | 1  | 68.412894 |
|             dpn107              | 1  | 62.517242 |
|        twins_pcpvt_base         | 1  | 60.887812 |
|           mobilevit_s           | 1  | 58.48696  |
|          jx_nest_base           | 1  | 57.945048 |
|        res2net50_14w_8s         | 1  | 56.723887 |
|           rexnet_100            | 1  | 55.279185 |
|      xcit_large_24_p8_224       | 1  | 54.076557 |
|          cait_m36_384           | 1  | 53.682891 |
|          ghostnet_100           | 1  | 52.748746 |
|            mixnet_l             | 1  | 50.830275 |
|        sebotnet33ts_256         | 1  | 50.543141 |
|         poolformer_m36          | 1  | 50.408489 |
|            levit_128            | 1  | 49.084695 |
|           dm_nfnet_f0           | 1  | 48.423752 |
|        eca_halonext26ts         | 1  | 48.416245 |
|         crossvit_9_240          | 1  | 48.086705 |
|        tnt_s_patch16_224        | 1  | 46.890346 |
|           volo_d1_224           | 1  | 43.689336 |
|       eca_botnext26ts_256       | 1  | 41.095969 |
|            hrnet_w18            | 1  | 40.300047 |
|        res2net101_26w_4s        | 1  | 39.611812 |
|       tf_efficientnet_b0        | 1  | 39.267584 |
|            nfnet_l0             | 1  | 37.725647 |
|          convnext_base          | 1  | 37.594869 |
|           resnest101e           | 1  | 37.427867 |
|        adv_inception_v3         | 1  |  35.7195  |
|          inception_v3           | 1  |  35.7115  |
|       gluon_inception_v3        | 1  | 35.696646 |
|            tinynet_a            | 1  | 34.283764 |
|           res2next50            | 1  | 34.184985 |
|            pit_b_224            | 1  | 33.873778 |
|           convit_base           | 1  | 31.033424 |
|          botnet26t_256          | 1  | 30.116727 |
|          cspdarknet53           | 1  | 27.421471 |
|            fbnetv3_b            | 1  | 26.792021 |
|             dla102              | 1  | 26.76231  |
|          gmlp_s16_224           | 1  | 26.324569 |
|          gmixer_24_224          | 1  | 25.106852 |
|         visformer_small         | 1  | 24.425603 |
|        ese_vovnet19b_dw         | 1  | 23.420705 |
|      mobilenetv3_large_100      | 1  | 23.088823 |
|      vit_base_patch16_224       | 1  | 21.472136 |
| deit_base_distilled_patch16_224 | 1  | 20.403049 |
|      beit_base_patch16_224      | 1  | 20.15114  |
|          mixer_b16_224          | 1  | 19.243657 |
|           regnety_002           | 1  | 18.968917 |
|            repvgg_a2            | 1  | 17.133628 |
|        convmixer_768_32         | 1  | 17.109042 |
|          resmlp_12_224          | 1  | 17.064039 |
|           selecsls42b           | 1  | 16.460178 |
|            lcnet_050            | 1  | 14.081843 |
|     swsl_resnext101_32x16d      | 1  | 12.522865 |
|           fbnetc_100            | 1  | 10.86117  |
|            gernet_l             | 1  | 10.851653 |
|          spnasnet_100           | 1  | 10.799458 |
|         mobilenetv2_100         | 1  | 10.331441 |
|           mnasnet_100           | 1  | 10.154863 |
+---------------------------------+----+-----------+

Peak Memory Compression Ratio

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          cait_m36_384           | 1  | 0.947153 |
|          pnasnet5large          | 1  | 0.929077 |
|        convmixer_768_32         | 1  | 0.918536 |
|            nfnet_l0             | 1  | 0.89723  |
|      xcit_large_24_p8_224       | 1  | 0.890178 |
|         mobilenetv2_100         | 1  | 0.885597 |
|        ese_vovnet19b_dw         | 1  | 0.884432 |
|            fbnetv3_b            | 1  | 0.877833 |
|          spnasnet_100           | 1  | 0.875682 |
|           mnasnet_100           | 1  | 0.875196 |
|       tf_efficientnet_b0        | 1  | 0.87269  |
|           fbnetc_100            | 1  | 0.869816 |
|      mobilenetv3_large_100      | 1  | 0.868529 |
|       eca_botnext26ts_256       | 1  | 0.862945 |
|           rexnet_100            | 1  | 0.861025 |
|            lcnet_050            | 1  | 0.860152 |
|            tinynet_a            | 1  | 0.856804 |
|         poolformer_m36          | 1  | 0.855421 |
|           dm_nfnet_f0           | 1  | 0.854553 |
|           mobilevit_s           | 1  | 0.850717 |
|        eca_halonext26ts         | 1  | 0.847223 |
|          ghostnet_100           | 1  | 0.846037 |
|           tf_mixnet_l           | 1  | 0.844628 |
|          botnet26t_256          | 1  | 0.844071 |
|           regnety_002           | 1  | 0.84357  |
|            mixnet_l             | 1  | 0.839789 |
|          resmlp_12_224          | 1  | 0.82809  |
|         visformer_small         | 1  | 0.817461 |
|           res2next50            | 1  | 0.816018 |
|            levit_128            | 1  | 0.807568 |
|             dpn107              | 1  | 0.80208  |
|          convnext_base          | 1  | 0.801555 |
|        sebotnet33ts_256         | 1  | 0.799759 |
|          gmixer_24_224          | 1  | 0.798064 |
|            hrnet_w18            | 1  | 0.79798  |
|          gmlp_s16_224           | 1  | 0.79494  |
|        res2net50_14w_8s         | 1  | 0.794904 |
|          cspdarknet53           | 1  | 0.794349 |
|        tnt_s_patch16_224        | 1  | 0.786531 |
|           volo_d1_224           | 1  | 0.785874 |
|           convit_base           | 1  | 0.781294 |
|         crossvit_9_240          | 1  | 0.778773 |
|          mixer_b16_224          | 1  | 0.777661 |
|        twins_pcpvt_base         | 1  | 0.776471 |
|             dla102              | 1  |  0.7744  |
|          jx_nest_base           | 1  | 0.773493 |
|           resnest101e           | 1  | 0.773057 |
|      beit_base_patch16_224      | 1  | 0.771629 |
|          inception_v3           | 1  | 0.762994 |
|        adv_inception_v3         | 1  | 0.762507 |
| deit_base_distilled_patch16_224 | 1  | 0.762244 |
|       gluon_inception_v3        | 1  | 0.761432 |
|      vit_base_patch16_224       | 1  | 0.759429 |
|            pit_b_224            | 1  | 0.754623 |
|  swin_base_patch4_window7_224   | 1  | 0.738957 |
|           selecsls42b           | 1  | 0.738423 |
|        res2net101_26w_4s        | 1  | 0.738386 |
|            gernet_l             | 1  | 0.735982 |
|            repvgg_a2            | 1  | 0.689091 |
|     swsl_resnext101_32x16d      | 1  | 0.640534 |
+---------------------------------+----+----------+

Absolute latency (ms)

+---------------------------------+----+-------------+
|              name               | bs |  inductor   |
+---------------------------------+----+-------------+
|          cait_m36_384           | 1  | 3681.27369  |
|      xcit_large_24_p8_224       | 1  | 1536.508757 |
|     swsl_resnext101_32x16d      | 1  | 439.641347  |
|          pnasnet5large          | 1  |  369.9716   |
|          convnext_base          | 1  |  311.02988  |
|             dpn107              | 1  | 260.578589  |
|        convmixer_768_32         | 1  | 247.125977  |
|          jx_nest_base           | 1  | 236.855842  |
|  swin_base_patch4_window7_224   | 1  | 199.574444  |
|      beit_base_patch16_224      | 1  | 197.843515  |
| deit_base_distilled_patch16_224 | 1  | 197.421342  |
|      vit_base_patch16_224       | 1  | 197.007226  |
|           convit_base           | 1  | 196.032279  |
|            pit_b_224            | 1  | 170.423757  |
|           resnest101e           | 1  |  165.53062  |
|           dm_nfnet_f0           | 1  | 159.086489  |
|          mixer_b16_224          | 1  | 140.593204  |
|         poolformer_m36          | 1  | 138.729607  |
|        res2net101_26w_4s        | 1  | 113.782948  |
|        twins_pcpvt_base         | 1  | 108.532177  |
|           volo_d1_224           | 1  |  96.069759  |
|        tnt_s_patch16_224        | 1  |  95.859272  |
|            nfnet_l0             | 1  |  95.027696  |
|             dla102              | 1  |  90.940452  |
|            hrnet_w18            | 1  |  86.274679  |
|        sebotnet33ts_256         | 1  |  85.620634  |
|          cspdarknet53           | 1  |  83.228434  |
|          inception_v3           | 1  |  73.891372  |
|       gluon_inception_v3        | 1  |  73.845903  |
|        adv_inception_v3         | 1  |  73.564933  |
|          gmlp_s16_224           | 1  |  71.57598   |
|         visformer_small         | 1  |  67.277701  |
|        res2net50_14w_8s         | 1  |  65.561696  |
|          gmixer_24_224          | 1  |  63.599658  |
|            repvgg_a2            | 1  |  63.596562  |
|           res2next50            | 1  |  59.983251  |
|            gernet_l             | 1  |  57.115341  |
|           selecsls42b           | 1  |  44.869202  |
|          botnet26t_256          | 1  |  44.852936  |
|        eca_halonext26ts         | 1  |  44.838773  |
|           mobilevit_s           | 1  |  42.685734  |
|       eca_botnext26ts_256       | 1  |  40.623721  |
|          resmlp_12_224          | 1  |  35.952014  |
|         crossvit_9_240          | 1  |  34.608964  |
|        ese_vovnet19b_dw         | 1  |  32.018345  |
|            mixnet_l             | 1  |  31.837985  |
|           tf_mixnet_l           | 1  |  30.553789  |
|            fbnetv3_b            | 1  |  16.067251  |
|       tf_efficientnet_b0        | 1  |  13.963444  |
|           rexnet_100            | 1  |  13.700749  |
|            tinynet_a            | 1  |  12.454811  |
|           fbnetc_100            | 1  |  9.429198   |
|            levit_128            | 1  |  9.266905   |
|          ghostnet_100           | 1  |   9.18921   |
|          spnasnet_100           | 1  |  8.455408   |
|           mnasnet_100           | 1  |  7.841671   |
|         mobilenetv2_100         | 1  |  7.640656   |
|      mobilenetv3_large_100      | 1  |  7.434393   |
|           regnety_002           | 1  |  6.558924   |
|            lcnet_050            | 1  |  2.555948   |
+---------------------------------+----+-------------+

@WeizhuoZhang-intel
Contributor

[amp] Performance Dashboard for amp precision -- Single-Socket Multi-threads (2024-04-27 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8488C. Each experiment runs one iteration of the forward and backward pass for training, and of the forward pass only for inference. For accuracy, we check the numerical correctness of the forward pass outputs and gradients by comparing with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.
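
The speedup normalization described above can be sketched as follows. This is a minimal illustration and not the benchmark harness itself: the function name `speedup` and the latency values are hypothetical, and it assumes speedup is the native (eager) latency divided by the compiled latency.

```python
def speedup(eager_ms: float, compiled_ms: float) -> float:
    """Return eager latency divided by compiled latency.

    A value above 1.0 means the compiled model is faster than native PyTorch;
    a value below 1.0 means it is slower.
    """
    if eager_ms <= 0 or compiled_ms <= 0:
        raise ValueError("latencies must be positive")
    return eager_ms / compiled_ms


# Hypothetical latencies in milliseconds, for illustration only.
print(speedup(200.0, 100.0))
```

Under this definition, a model's native latency can be recovered from the dashboard by multiplying its speedup by its absolute inductor latency.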

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. Experimental setup does not have optimizer.

SW information:

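+-------------------+--------+------------------------------------------+
| SW                | Branch | Commit                                   |
+-------------------+--------+------------------------------------------+
| Pytorch           | main   | 7478b7f1cac9686f00edf3db4667cf86d2421531 |
| Torchbench        | main   | d6015d42                                 |
| torchaudio        | main   | 2.2.0a0+ea437b3                          |
| torchtext         | main   | 0.16.0a0+b0ebddc                         |
| torchvision       | main   | 0.19.0a0+2c4665f                         |
| torchdata         | main   | 0.7.1a0+0790338                          |
| dynamo_benchmarks | main   | nightly                                  |
+-------------------+--------+------------------------------------------+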

HW information

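+------------------+----------------------------------------------------------------+
| Item             | Value                                                          |
+------------------+----------------------------------------------------------------+
| Manufacturer     | Amazon EC2                                                     |
| Product Name     | c7i.metal-24xl                                                 |
| CPU Model        | Intel(R) Xeon(R) Platinum 8488C CPU @ 2.40GHz                  |
| Installed Memory | 192GB (8x24GB DDR5 4800 MT/s [4800 MT/s])                      |
| OS               | Ubuntu 22.04.3 LTS                                             |
| Kernel           | 6.2.0-1017-aws                                                 |
| Microcode        | 0x2b0004d0                                                     |
| GCC              | gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0                      |
| GLIBC            | ldd (Ubuntu GLIBC 2.35-0ubuntu3.4) 2.35                        |
| Binutils         | GNU ld (GNU Binutils for Ubuntu) 2.38                          |
| Python           | Python 3.8.18                                                  |
| OpenSSL          | OpenSSL 3.2.0 23 Nov 2023 (Library: OpenSSL 3.2.0 23 Nov 2023) |
+------------------+----------------------------------------------------------------+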

Test command

export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
CORES=$(lscpu | grep Core | awk '{print $4}')
export OMP_NUM_THREADS=$CORES

python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--node_id 0" --devices=cpu --dtypes=float32 --inference --compilers=inductor --extra-args="--timeout 9000" 

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
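
The filtering step above could look roughly like the sketch below. The helper name `keep_passing` is hypothetical, and treating every status that starts with `pass` (including `pass_due_to_skip`) as passing is an assumption, not something this report states. The sample statuses and speedups are copied from a few rows of the torchbench tables in this comment.

```python
from typing import Dict


def keep_passing(accuracy: Dict[str, str], metric: Dict[str, float]) -> Dict[str, float]:
    """Keep only the metric entries whose model has a passing accuracy status.

    Assumption: any status beginning with "pass" counts as passing.
    Models without an accuracy entry are dropped.
    """
    return {
        name: value
        for name, value in metric.items()
        if accuracy.get(name, "").startswith("pass")
    }


# A few rows taken from the torchbench accuracy and speedup tables.
accuracy = {
    "alexnet": "pass",
    "hf_T5_large": "pass_due_to_skip",
    "resnet50": "fail_accuracy",
}
speedups = {"alexnet": 3.222426, "hf_T5_large": 1.569735, "resnet50": 4.092874}

print(keep_passing(accuracy, speedups))
```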

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 78%, 61/78 | 100%, 46/46 | 73%, 44/60  |
+----------+------------+-------------+-------------+
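
The cells in the pass rate table pair a percentage with a passed/total count. A minimal sketch of that formatting follows; the function name `format_passrate` is hypothetical, and truncating the percentage to an integer is an assumption (rounding to nearest gives the same result for the three cells above).

```python
def format_passrate(passed: int, total: int) -> str:
    """Format a pass rate as '<percent>%, <passed>/<total>'.

    The percentage is truncated to an integer using integer arithmetic,
    which avoids floating-point edge cases.
    """
    if total <= 0 or not 0 <= passed <= total:
        raise ValueError("need 0 <= passed <= total and total > 0")
    percent = passed * 100 // total
    return f"{percent}%, {passed}/{total}"


# Counts taken from the pass rate table above.
for passed, total in [(61, 78), (46, 46), (44, 60)]:
    print(format_passrate(passed, total))
```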

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.74x    |    2.05x    |    2.23x    |
+----------+------------+-------------+-------------+
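
For reference, a geometric mean over per-model speedups can be computed as in the sketch below. The function name `geomean` and the sample values are hypothetical. Rows reported as 0.0 in the per-model tables (models that failed to load or run) are skipped here because the logarithm is undefined for them; whether the dashboard excludes them in exactly this way is an assumption.

```python
import math
from typing import Iterable


def geomean(speedups: Iterable[float]) -> float:
    """Geometric mean of the positive entries in `speedups`.

    Non-positive entries (for example 0.0 placeholders for failed models)
    are skipped. Raises ValueError if nothing remains.
    """
    valid = [s for s in speedups if s > 0]
    if not valid:
        raise ValueError("no positive speedups to average")
    return math.exp(sum(math.log(s) for s in valid) / len(valid))


# Hypothetical values: a 2x speedup and a 2x slowdown cancel out.
print(geomean([2.0, 0.5, 0.0]))
```

The geometric mean is the usual choice for ratios because a 2x speedup and a 2x slowdown average to 1.0x, whereas the arithmetic mean of 2.0 and 0.5 would be 1.25.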

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   21.72    |    33.42    |    38.71    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.91x    |    0.97x    |    0.98x    |
+----------+------------+-------------+-------------+
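
One way to read these ratios, assuming the compression ratio is defined as native peak memory divided by inductor peak memory (consistent with "higher is better", but not stated in this report), is to invert them to get the compiled footprint relative to native PyTorch. The helper name `relative_peak_memory` is hypothetical.

```python
def relative_peak_memory(compression_ratio: float) -> float:
    """Return compiled peak memory as a multiple of native peak memory.

    Assumes compression_ratio = native_peak / compiled_peak, so the
    inverse gives compiled_peak / native_peak.
    """
    if compression_ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return 1.0 / compression_ratio


# Ratios from the summary table above.
for suite, ratio in [("torchbench", 0.91), ("huggingface", 0.97), ("timm_models", 0.98)]:
    print(f"{suite}: {relative_peak_memory(ratio):.2f}x native peak memory")
```

Under that assumption, a ratio below 1.0 means the compiled model peaks at a higher memory footprint than native PyTorch.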

torchbench suite with amp precision

Performance speedup

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_equation_of_state     | 1048576 | 12.443958 |
|          timm_resnest           |   32    | 4.345165  |
|            resnet50             |   32    | 4.092874  |
|          squeezenet1_1          |   16    | 3.973014  |
|           mnasnet1_0            |   32    | 3.511731  |
|         phlippe_resnet          |   128   | 3.496461  |
|          mobilenet_v2           |   16    |  3.47719  |
|              vgg16              |    4    | 3.368533  |
|         resnext50_32x4d         |    8    | 3.252932  |
|             alexnet             |   128   | 3.222426  |
|            resnet152            |   32    | 3.202378  |
|            resnet18             |    8    | 3.077132  |
|           timm_vovnet           |   32    | 3.075005  |
|             yolov3              |    8    | 3.065514  |
|       mobilenet_v3_large        |   32    | 2.906846  |
|       shufflenet_v2_x1_0        |   64    | 2.646128  |
|        soft_actor_critic        |   256   | 2.454883  |
|             hf_GPT2             |    1    | 2.411202  |
|        phlippe_densenet         |   128   | 2.306272  |
|           timm_regnet           |   32    | 2.296116  |
|           densenet121           |   64    | 2.226555  |
|          lennard_jones          |  1000   | 2.199729  |
|           timm_nfnet            |   128   |  2.16691  |
|             hf_Bert             |    1    | 2.146987  |
|          pytorch_unet           |    1    | 2.115179  |
|          hf_DistilBert          |    1    | 2.026763  |
|        timm_efficientnet        |   64    | 2.021669  |
|     functorch_maml_omniglot     |    1    | 2.019915  |
|          BERT_pytorch           |    2    |  1.97511  |
|               drq               |    1    | 1.970714  |
|          hf_Bert_large          |    1    | 1.954537  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 1.952944  |
|           hf_T5_base            |    1    |  1.93816  |
|              dcgan              |   256   | 1.919307  |
|         LearningToPaint         |   96    | 1.874912  |
|             hf_Bart             |    1    | 1.868308  |
|      doctr_reco_predictor       |    1    | 1.814022  |
|          fastNLP_Bert           |    1    | 1.757387  |
|              hf_T5              |    1    | 1.731027  |
|            moondream            |    1    | 1.658769  |
|       Background_Matting        |    1    | 1.608443  |
|            hf_Albert            |    1    | 1.586983  |
|           hf_T5_large           |    1    | 1.569735  |
|          hf_GPT2_large          |    1    | 1.565244  |
|     timm_vision_transformer     |   32    | 1.502825  |
|        basic_gnn_edgecnn        |    1    | 1.495397  |
|        hf_distil_whisper        |    1    |  1.42594  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 1.392385  |
|          maml_omniglot          |    5    | 1.366753  |
|          hf_Longformer          |    1    |  1.36553  |
|         pytorch_stargan         |   16    | 1.267031  |
|    detectron2_fcos_r_50_fpn     |    1    | 1.265462  |
|       speech_transformer        |    1    |   1.261   |
|          basic_gnn_gin          |    1    |  1.2509   |
|           hf_Reformer           |    1    | 1.242098  |
|          basic_gnn_gcn          |    1    |  1.23789  |
|         basic_gnn_sage          |    1    | 1.218857  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1.203038  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1.153921  |
|     nvidia_deeprecommender      |   256   | 1.136797  |
|      torch_multimodal_clip      |   32    | 1.104064  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1.091065  |
|  timm_vision_transformer_large  |   32    | 1.081248  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1.072568  |
|              dlrm               |  2048   | 1.068416  |
|             demucs              |    1    | 1.068372  |
|           hf_BigBird            |    1    | 1.062681  |
|   mobilenet_v2_quantized_qat    |   96    | 1.006512  |
|           tts_angular           |   64    | 0.990197  |
|     resnet50_quantized_qat      |   32    |  0.98052  |
|     pyhpc_isoneutral_mixing     | 1048576 | 0.860113  |
|              maml               |    1    | 0.758617  |
|         opacus_cifar10          |   64    | 0.484367  |
|      functorch_dp_cifar10       |   64    | 0.449953  |
|        timm_efficientdet        |    0    |    0.0    |
|       doctr_det_predictor       |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+

Accuracy

+--------------------------------+---------+--------------------+
|              name              |   bs    |      inductor      |
+--------------------------------+---------+--------------------+
|          hf_T5_large           |    4    |  pass_due_to_skip  |
|       Background_Matting       |    1    |  pass_due_to_skip  |
| timm_vision_transformer_large  |    4    |  pass_due_to_skip  |
|         hf_GPT2_large          |    4    |  pass_due_to_skip  |
|              maml              |    1    |  pass_due_to_skip  |
|         basic_gnn_gin          |    1    |        pass        |
|              dlrm              |    4    |        pass        |
|    detectron2_fcos_r_50_fpn    |    4    |        pass        |
|             demucs             |    1    |        pass        |
|             dcgan              |    4    |        pass        |
|         basic_gnn_sage         |    1    |        pass        |
|       basic_gnn_edgecnn        |    1    |        pass        |
|         basic_gnn_gcn          |    1    |        pass        |
|              drq               |    1    |        pass        |
|        LearningToPaint         |    4    |        pass        |
|      functorch_dp_cifar10      |    4    |        pass        |
|       hf_distil_whisper        |    4    |        pass        |
|           hf_T5_base           |    4    |        pass        |
|             yolov3             |    4    |        pass        |
|          fastNLP_Bert          |    4    |        pass        |
|        pytorch_stargan         |   16    |        pass        |
|    functorch_maml_omniglot     |    1    |        pass        |
|           hf_Albert            |    4    |        pass        |
|            hf_Bart             |    4    |        pass        |
|             hf_T5              |    4    |        pass        |
|            hf_Bert             |    4    |        pass        |
|          hf_Reformer           |    4    |        pass        |
|         hf_Longformer          |    4    |        pass        |
|            hf_GPT2             |    2    |        pass        |
|         hf_Bert_large          |    4    |        pass        |
|           hf_BigBird           |    4    |        pass        |
|         hf_DistilBert          |    4    |        pass        |
|         maml_omniglot          |    5    |        pass        |
|            alexnet             |    4    |        pass        |
|             vgg16              |    4    |        pass        |
|          BERT_pytorch          |    4    |        pass        |
|         opacus_cifar10         |    4    |        pass        |
|    pyhpc_isoneutral_mixing     |    4    |        pass        |
|    pyhpc_equation_of_state     |    4    |        pass        |
|         phlippe_resnet         |    4    |        pass        |
|        phlippe_densenet        |    4    |        pass        |
|       mobilenet_v3_large       |    4    |        pass        |
|          tts_angular           |    4    |        pass        |
|           moondream            |    4    |        pass        |
|  pytorch_CycleGAN_and_pix2pix  |    1    |        pass        |
|   mobilenet_v2_quantized_qat   |    4    |        pass        |
|             llama              |    4    |        pass        |
| pyhpc_turbulent_kinetic_energy | 1048576 |        pass        |
|     nvidia_deeprecommender     |    4    |        pass        |
|         lennard_jones          |    4    |        pass        |
|          pytorch_unet          |    2    |        pass        |
|     resnet50_quantized_qat     |    4    |        pass        |
|       soft_actor_critic        |   256   |        pass        |
|       speech_transformer       |    1    |        pass        |
|         squeezenet1_1          |    4    |        pass        |
|       timm_efficientnet        |    4    |        pass        |
|           timm_nfnet           |    4    |        pass        |
|          timm_regnet           |    4    |        pass        |
|          timm_resnest          |    4    |        pass        |
|    timm_vision_transformer     |    4    |        pass        |
|          timm_vovnet           |    4    |        pass        |
|     torch_multimodal_clip      |    4    |        pass        |
|       timm_efficientdet        |    0    | model_fail_to_load |
|              moco              |    0    | model_fail_to_load |
|         DALLE2_pytorch         |    0    | model_fail_to_load |
|          Super_SloMo           |    4    |    fail_to_run     |
|        vision_maskrcnn         |    1    |    fail_to_run     |
|          densenet121           |    4    |   fail_accuracy    |
|       shufflenet_v2_x1_0       |    4    |   fail_accuracy    |
|        resnext50_32x4d         |    4    |   fail_accuracy    |
|          mobilenet_v2          |    4    |   fail_accuracy    |
|           mnasnet1_0           |    4    |   fail_accuracy    |
|            resnet50            |    4    |   fail_accuracy    |
|           resnet152            |    4    |   fail_accuracy    |
|      doctr_reco_predictor      |    4    |   fail_accuracy    |
|            resnet18            |    4    |   fail_accuracy    |
|      doctr_det_predictor       |    0    | eager_fail_to_run  |
+--------------------------------+---------+--------------------+

Compilation latency (sec)

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|           densenet121           |   64    | 84.058569 |
|           hf_BigBird            |    1    | 67.410947 |
|    detectron2_fcos_r_50_fpn     |    1    | 58.137217 |
|  timm_vision_transformer_large  |   32    | 52.077556 |
|           hf_T5_large           |    1    | 51.391132 |
|              maml               |    1    | 48.37673  |
|           timm_nfnet            |   128   | 46.417903 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 42.911842 |
|          hf_Longformer          |    1    | 39.859007 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 39.84215  |
|            moondream            |    1    | 37.676833 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 37.504928 |
|        phlippe_densenet         |   128   | 37.291949 |
|           hf_Reformer           |    1    | 37.029975 |
|          hf_GPT2_large          |    1    | 35.503019 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 34.807904 |
|       speech_transformer        |    1    | 34.009784 |
|           hf_T5_base            |    1    | 33.860636 |
|      torch_multimodal_clip      |   32    | 32.93908  |
|        timm_efficientnet        |   64    | 32.163766 |
|             yolov3              |    8    | 30.076911 |
|             demucs              |    1    | 30.045565 |
|          BERT_pytorch           |    2    | 27.905339 |
|        hf_distil_whisper        |    1    | 27.599541 |
|         opacus_cifar10          |   64    | 26.004131 |
|      functorch_dp_cifar10       |   64    | 24.555924 |
|              hf_T5              |    1    | 24.298928 |
|     timm_vision_transformer     |   32    | 23.49826  |
|          timm_resnest           |   32    | 22.568316 |
|       mobilenet_v3_large        |   32    | 22.346015 |
|       shufflenet_v2_x1_0        |   64    | 22.008793 |
|           timm_regnet           |   32    | 21.711434 |
|     pyhpc_isoneutral_mixing     | 1048576 | 21.624104 |
|          hf_Bert_large          |    1    | 20.950362 |
|             hf_GPT2             |    1    | 19.831419 |
|       Background_Matting        |    1    | 19.109667 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 18.859299 |
|           timm_vovnet           |   32    | 18.676733 |
|            resnet152            |   32    | 18.07133  |
|          fastNLP_Bert           |    1    | 17.080437 |
|         pytorch_stargan         |   16    | 16.910746 |
|          pytorch_unet           |    1    | 16.87021  |
|             hf_Bart             |    1    | 16.814419 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 15.279722 |
|            hf_Albert            |    1    | 15.025107 |
|             hf_Bert             |    1    | 14.565242 |
|          squeezenet1_1          |   16    | 14.511585 |
|          hf_DistilBert          |    1    | 12.508266 |
|      doctr_reco_predictor       |    1    | 11.132333 |
|              vgg16              |    4    | 10.954144 |
|          mobilenet_v2           |   16    | 10.845487 |
|          basic_gnn_gcn          |    1    | 10.843225 |
|         resnext50_32x4d         |    8    | 10.805189 |
|            resnet50             |   32    | 10.733394 |
|         basic_gnn_sage          |    1    | 10.137744 |
|             alexnet             |   128   |   9.192   |
|               drq               |    1    | 9.000507  |
|           mnasnet1_0            |   32    | 8.805682  |
|     nvidia_deeprecommender      |   256   | 8.447046  |
|            resnet18             |    8    | 8.373816  |
|     functorch_maml_omniglot     |    1    | 7.977215  |
|          maml_omniglot          |    5    | 7.965409  |
|              dlrm               |  2048   | 7.963641  |
|          basic_gnn_gin          |    1    |  7.59445  |
|         LearningToPaint         |   96    | 7.547208  |
|         phlippe_resnet          |   128   | 7.115796  |
|     pyhpc_equation_of_state     | 1048576 | 7.101214  |
|        basic_gnn_edgecnn        |    1    | 6.861923  |
|        soft_actor_critic        |   256   | 6.696272  |
|          lennard_jones          |  1000   | 5.603674  |
|              dcgan              |   256   | 5.168648  |
|           tts_angular           |   64    | 4.666296  |
|   mobilenet_v2_quantized_qat    |   96    |  0.23188  |
|     resnet50_quantized_qat      |   32    |  0.20425  |
|       doctr_det_predictor       |    0    |    0.0    |
|        timm_efficientdet        |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+

Peak Memory Compression Ratio

+---------------------------------+---------+----------+
|              name               |   bs    | inductor |
+---------------------------------+---------+----------+
|           timm_nfnet            |   128   | 0.988014 |
|              dlrm               |  2048   | 0.987367 |
|             demucs              |    1    | 0.983412 |
|           hf_T5_base            |    1    | 0.98254  |
|           timm_regnet           |   32    | 0.979416 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 0.978199 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 0.978131 |
|        timm_efficientnet        |   64    | 0.975973 |
|      torch_multimodal_clip      |   32    | 0.975085 |
|            resnet152            |   32    | 0.972515 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 0.969601 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 0.969369 |
|         LearningToPaint         |   96    | 0.967344 |
|          pytorch_unet           |    1    | 0.967318 |
|             yolov3              |    8    | 0.96723  |
|           densenet121           |   64    | 0.967036 |
|    detectron2_fcos_r_50_fpn     |    1    | 0.963977 |
|           timm_vovnet           |   32    | 0.963814 |
|        basic_gnn_edgecnn        |    1    | 0.963557 |
|            resnet50             |   32    | 0.961785 |
|           hf_BigBird            |    1    | 0.957794 |
|          timm_resnest           |   32    | 0.955527 |
|       Background_Matting        |    1    | 0.955372 |
|     timm_vision_transformer     |   32    | 0.954272 |
|   mobilenet_v2_quantized_qat    |   96    | 0.953036 |
|     resnet50_quantized_qat      |   32    | 0.953014 |
|             alexnet             |   128   | 0.952048 |
|          basic_gnn_gcn          |    1    | 0.943246 |
|           mnasnet1_0            |   32    | 0.942711 |
|          mobilenet_v2           |   16    | 0.940186 |
|       mobilenet_v3_large        |   32    | 0.937378 |
|       shufflenet_v2_x1_0        |   64    | 0.936845 |
|            moondream            |    1    | 0.935516 |
|     pyhpc_equation_of_state     | 1048576 | 0.935387 |
|         resnext50_32x4d         |    8    | 0.930586 |
|         basic_gnn_sage          |    1    | 0.930233 |
|          basic_gnn_gin          |    1    | 0.930023 |
|             hf_Bert             |    1    | 0.927942 |
|      doctr_reco_predictor       |    1    | 0.926154 |
|  timm_vision_transformer_large  |   32    | 0.923829 |
|         pytorch_stargan         |   16    | 0.923586 |
|       speech_transformer        |    1    | 0.923555 |
|     nvidia_deeprecommender      |   256   | 0.920466 |
|          hf_GPT2_large          |    1    | 0.918953 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 0.918947 |
|          fastNLP_Bert           |    1    | 0.918214 |
|        phlippe_densenet         |   128   | 0.916798 |
|          hf_Bert_large          |    1    | 0.915128 |
|            hf_Albert            |    1    | 0.914251 |
|          BERT_pytorch           |    2    | 0.910974 |
|             hf_GPT2             |    1    |  0.9059  |
|               drq               |    1    | 0.902339 |
|          squeezenet1_1          |   16    | 0.901563 |
|            resnet18             |    8    | 0.901316 |
|              dcgan              |   256   | 0.900726 |
|          hf_Longformer          |    1    | 0.896783 |
|           tts_angular           |   64    | 0.89667  |
|          hf_DistilBert          |    1    | 0.895351 |
|             hf_Bart             |    1    | 0.893053 |
|              hf_T5              |    1    | 0.892732 |
|         opacus_cifar10          |   64    | 0.891639 |
|        soft_actor_critic        |   256   | 0.888734 |
|        hf_distil_whisper        |    1    | 0.885167 |
|              vgg16              |    4    | 0.880879 |
|         phlippe_resnet          |   128   | 0.879552 |
|      functorch_dp_cifar10       |   64    | 0.875581 |
|          lennard_jones          |  1000   | 0.865979 |
|          maml_omniglot          |    5    | 0.859935 |
|     functorch_maml_omniglot     |    1    | 0.857143 |
|           hf_T5_large           |    1    | 0.847504 |
|           hf_Reformer           |    1    | 0.815948 |
|              maml               |    1    | 0.78321  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.761603 |
|     pyhpc_isoneutral_mixing     | 1048576 | 0.566371 |
|        timm_efficientdet        |    0    |   0.0    |
|       doctr_det_predictor       |    0    |   0.0    |
|              moco               |    0    |   0.0    |
|         DALLE2_pytorch          |    0    |   0.0    |
+---------------------------------+---------+----------+

Absolute latency (ms)

+---------------------------------+---------+------------+
|              name               |   bs    |  inductor  |
+---------------------------------+---------+------------+
|  timm_vision_transformer_large  |   32    | 845.699632 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 782.915448 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 781.019706 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 703.35663  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 671.085919 |
|           hf_T5_base            |    1    | 428.742135 |
|          hf_GPT2_large          |    1    | 151.512865 |
|           hf_T5_large           |    1    | 127.578293 |
|           timm_nfnet            |   128   | 123.882346 |
|            moondream            |    1    | 115.729319 |
|        hf_distil_whisper        |    1    | 92.548188  |
|    detectron2_fcos_r_50_fpn     |    1    | 83.174362  |
|           hf_BigBird            |    1    | 81.081857  |
|       Background_Matting        |    1    | 67.052779  |
|      torch_multimodal_clip      |   32    | 57.578416  |
|          pytorch_unet           |    1    | 57.289921  |
|           densenet121           |   64    | 54.655925  |
|             demucs              |    1    |  51.99695  |
|           timm_regnet           |   32    | 45.160416  |
|              maml               |    1    | 42.649871  |
|          hf_Longformer          |    1    | 38.390296  |
|          hf_Bert_large          |    1    | 34.874046  |
|            resnet152            |   32    | 33.628964  |
|             yolov3              |    8    | 30.889313  |
|   mobilenet_v2_quantized_qat    |   96    | 30.296809  |
|     pyhpc_isoneutral_mixing     | 1048576 | 29.531509  |
|        timm_efficientnet        |   64    | 29.283341  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 28.093707  |
|       speech_transformer        |    1    | 23.065781  |
|           hf_Reformer           |    1    | 21.414858  |
|     nvidia_deeprecommender      |   256   | 21.016112  |
|     timm_vision_transformer     |   32    | 20.430165  |
|         pytorch_stargan         |   16    | 18.406796  |
|             hf_Bart             |    1    | 18.353533  |
|              hf_T5              |    1    | 18.178575  |
|           timm_vovnet           |   32    | 17.317528  |
|          fastNLP_Bert           |    1    | 17.157346  |
|         opacus_cifar10          |   64    | 14.625862  |
|      functorch_dp_cifar10       |   64    | 14.578798  |
|            hf_Albert            |    1    | 14.526756  |
|             hf_Bert             |    1    | 13.892222  |
|            resnet50             |   32    | 12.631592  |
|          BERT_pytorch           |    2    | 12.342717  |
|             hf_GPT2             |    1    | 10.892478  |
|     resnet50_quantized_qat      |   32    | 10.270025  |
|          timm_resnest           |   32    |  9.558193  |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  9.469662  |
|       shufflenet_v2_x1_0        |   64    |  8.487955  |
|          hf_DistilBert          |    1    |  8.134544  |
|         LearningToPaint         |   96    |  8.072396  |
|           tts_angular           |   64    |  7.880127  |
|       mobilenet_v3_large        |   32    |  7.764363  |
|         resnext50_32x4d         |    8    |  7.654608  |
|        basic_gnn_edgecnn        |    1    |  7.229437  |
|          basic_gnn_gcn          |    1    |  7.137742  |
|        phlippe_densenet         |   128   |  7.051913  |
|              vgg16              |    4    |  6.765352  |
|             alexnet             |   128   |  6.003276  |
|           mnasnet1_0            |   32    |  5.786484  |
|          mobilenet_v2           |   16    |  4.359233  |
|              dlrm               |  2048   |  4.113164  |
|         basic_gnn_sage          |    1    |  3.88467   |
|          basic_gnn_gin          |    1    |  3.381753  |
|          squeezenet1_1          |   16    |  2.995056  |
|            resnet18             |    8    |  2.94439   |
|      doctr_reco_predictor       |    1    |  2.851076  |
|              dcgan              |   256   |  2.673818  |
|         phlippe_resnet          |   128   |  1.540983  |
|     pyhpc_equation_of_state     | 1048576 |  1.462089  |
|               drq               |    1    |  0.753767  |
|          maml_omniglot          |    5    |  0.486133  |
|     functorch_maml_omniglot     |    1    |  0.354601  |
|        soft_actor_critic        |   256   |  0.237116  |
|          lennard_jones          |  1000   |  0.137356  |
|       doctr_det_predictor       |    0    |    0.0     |
|        timm_efficientdet        |    0    |    0.0     |
|         DALLE2_pytorch          |    0    |    0.0     |
|              moco               |    0    |    0.0     |
+---------------------------------+---------+------------+

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+-----+-----------+
|                  name                   | bs  | inductor  |
+-----------------------------------------+-----+-----------+
|            XLNetLMHeadModel             |  8  | 11.774862 |
|       ElectraForQuestionAnswering       | 64  | 4.166614  |
|           ElectraForCausalLM            | 32  | 3.546239  |
|          MobileBertForMaskedLM          | 128 | 3.319789  |
|     MobileBertForQuestionAnswering      | 128 | 3.214801  |
|    LayoutLMForSequenceClassification    | 16  | 3.053942  |
|       RobertaForQuestionAnswering       | 16  | 3.050891  |
|           RobertaForCausalLM            | 16  | 3.005414  |
|             BertForMaskedLM             | 16  | 2.975271  |
|        BertForQuestionAnswering         | 16  | 2.964658  |
|           LayoutLMForMaskedLM           | 16  |  2.95167  |
|                CamemBert                | 16  | 2.921472  |
|       T5ForConditionalGeneration        |  4  | 2.638325  |
|                 T5Small                 |  4  | 2.469584  |
|            YituTechConvBert             | 16  | 2.411148  |
|         MegatronBertForCausalLM         |  4  | 2.180937  |
|    MegatronBertForQuestionAnswering     |  8  | 2.180529  |
|       MT5ForConditionalGeneration       | 16  | 2.177398  |
|               DistillGPT2               | 16  |  2.08638  |
|             OPTForCausalLM              |  2  | 1.943174  |
|         Speech2Text2ForCausalLM         | 256 | 1.892963  |
|      DebertaV2ForQuestionAnswering      |  1  | 1.886981  |
|       BlenderbotSmallForCausalLM        | 64  | 1.874694  |
|      GPT2ForSequenceClassification      |  4  | 1.850502  |
|          DistilBertForMaskedLM          | 128 | 1.826833  |
|           DebertaForMaskedLM            |  8  | 1.803341  |
|             XGLMForCausalLM             |  8  | 1.776776  |
|            PLBartForCausalLM            |  8  | 1.729647  |
| BlenderbotSmallForConditionalGeneration | 64  | 1.689789  |
|          BlenderbotForCausalLM          |  4  | 1.671721  |
|     PLBartForConditionalGeneration      |  4  | 1.628082  |
|            MBartForCausalLM             |  4  |  1.62709  |
|            TrOCRForCausalLM             | 32  |  1.60904  |
|     DistilBertForQuestionAnswering      | 256 | 1.598619  |
|      MBartForConditionalGeneration      |  2  | 1.510366  |
|       DebertaForQuestionAnswering       | 16  |  1.50853  |
|           PegasusForCausalLM            | 32  | 1.501486  |
|     PegasusForConditionalGeneration     | 32  | 1.498671  |
|     M2M100ForConditionalGeneration      | 16  | 1.447903  |
|      BartForConditionalGeneration       |  2  | 1.432586  |
|             BartForCausalLM             |  4  | 1.427808  |
|               GoogleFnet                | 16  | 1.347446  |
|            AlbertForMaskedLM            |  4  | 1.316131  |
|       AlbertForQuestionAnswering        |  4  | 1.313311  |
|          AllenaiLongformerBase          |  4  | 1.051358  |
|          DebertaV2ForMaskedLM           |  2  | 0.866624  |
+-----------------------------------------+-----+-----------+
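
For anyone who wants to post-process these tables, here is a minimal sketch of parsing the ASCII layout used in this comment and summarizing a speedup column. The helper names `parse_table` and `geomean` are hypothetical and not part of the benchmark scripts. The geometric mean is only one common way to aggregate ratios, and it is an assumption here rather than a statement of how the dashboard computes its summary. The three sample rows are copied from the table above.

```python
import math


def parse_table(text):
    """Parse an ASCII table ('|' separated cells, '+---+' rule lines) into dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip rule lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def geomean(values):
    """Geometric mean of positive numbers (an assumed aggregation choice)."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Three rows copied from the huggingface speedup table above.
sample = """
+-----------------------+-----+----------+
|         name          | bs  | inductor |
+-----------------------+-----+----------+
|      GoogleFnet       | 16  | 1.347446 |
| AllenaiLongformerBase |  4  | 1.051358 |
| DebertaV2ForMaskedLM  |  2  | 0.866624 |
+-----------------------+-----+----------+
"""

rows = parse_table(sample)
speedups = [float(r["inductor"]) for r in rows]
# Models slower than eager have a speedup below 1.0.
slowdowns = [r["name"] for r in rows if float(r["inductor"]) < 1.0]

print("geomean speedup:", geomean(speedups))
print("slower than eager:", slowdowns)
```

The same parser works for the compilation latency, memory and absolute latency tables, since they share the three-column layout. Rows that report `0.0` for models that failed to load or run would need to be filtered out before taking a geometric mean.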

Accuracy

+-----------------------------------------+----+------------------+
|                  name                   | bs |     inductor     |
+-----------------------------------------+----+------------------+
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip |
|                CamemBert                | 1  |       pass       |
|       AlbertForQuestionAnswering        | 1  |       pass       |
|           DebertaForMaskedLM            | 1  |       pass       |
|          AllenaiLongformerBase          | 1  |       pass       |
|             BartForCausalLM             | 1  |       pass       |
|      BartForConditionalGeneration       | 1  |       pass       |
|             BertForMaskedLM             | 1  |       pass       |
|        BertForQuestionAnswering         | 1  |       pass       |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |
|       DebertaForQuestionAnswering       | 1  |       pass       |
|           LayoutLMForMaskedLM           | 1  |       pass       |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |
|          DistilBertForMaskedLM          | 1  |       pass       |
|     DistilBertForQuestionAnswering      | 1  |       pass       |
|               DistillGPT2               | 1  |       pass       |
|           ElectraForCausalLM            | 1  |       pass       |
|       ElectraForQuestionAnswering       | 1  |       pass       |
|      GPT2ForSequenceClassification      | 1  |       pass       |
|               GoogleFnet                | 1  |       pass       |
|    LayoutLMForSequenceClassification    | 1  |       pass       |
|            MBartForCausalLM             | 1  |       pass       |
|            XLNetLMHeadModel             | 1  |       pass       |
|             XGLMForCausalLM             | 1  |       pass       |
|            AlbertForMaskedLM            | 1  |       pass       |
|      MBartForConditionalGeneration      | 1  |       pass       |
|       MT5ForConditionalGeneration       | 1  |       pass       |
|         MegatronBertForCausalLM         | 1  |       pass       |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |
|          MobileBertForMaskedLM          | 1  |       pass       |
|     MobileBertForQuestionAnswering      | 1  |       pass       |
|             OPTForCausalLM              | 1  |       pass       |
|            PLBartForCausalLM            | 1  |       pass       |
|     PLBartForConditionalGeneration      | 1  |       pass       |
|           PegasusForCausalLM            | 1  |       pass       |
|     M2M100ForConditionalGeneration      | 1  |       pass       |
|     PegasusForConditionalGeneration     | 1  |       pass       |
|           RobertaForCausalLM            | 1  |       pass       |
|       RobertaForQuestionAnswering       | 1  |       pass       |
|         Speech2Text2ForCausalLM         | 1  |       pass       |
|       T5ForConditionalGeneration        | 1  |       pass       |
|                 T5Small                 | 1  |       pass       |
|            TrOCRForCausalLM             | 1  |       pass       |
|            YituTechConvBert             | 1  |       pass       |
+-----------------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------------+-----+------------+
|                  name                   | bs  |  inductor  |
+-----------------------------------------+-----+------------+
|          AllenaiLongformerBase          |  4  | 100.808843 |
|          MobileBertForMaskedLM          | 128 | 58.625619  |
|     M2M100ForConditionalGeneration      | 16  | 56.642587  |
|     MobileBertForQuestionAnswering      | 128 |  56.60743  |
|     PegasusForConditionalGeneration     | 32  | 56.388547  |
|      MBartForConditionalGeneration      |  2  | 54.449449  |
|          BlenderbotForCausalLM          |  4  | 50.695166  |
|       MT5ForConditionalGeneration       | 16  | 49.193567  |
|             XGLMForCausalLM             |  8  | 46.327102  |
|          DebertaV2ForMaskedLM           |  2  | 41.950282  |
|       T5ForConditionalGeneration        |  4  | 40.210126  |
|                 T5Small                 |  4  |  40.03964  |
|      BartForConditionalGeneration       |  2  | 38.490385  |
|            XLNetLMHeadModel             |  8  | 38.214855  |
|         MegatronBertForCausalLM         |  4  | 37.875503  |
| BlenderbotSmallForConditionalGeneration | 64  | 37.696995  |
|    MegatronBertForQuestionAnswering     |  8  | 37.346326  |
|            YituTechConvBert             | 16  | 36.399873  |
|     PLBartForConditionalGeneration      |  4  | 33.498661  |
|             OPTForCausalLM              |  2  | 33.137675  |
|           PegasusForCausalLM            | 32  | 31.127456  |
|            MBartForCausalLM             |  4  | 29.876615  |
|      GPT2ForSequenceClassification      |  4  |  27.99927  |
|       DebertaForQuestionAnswering       | 16  | 26.474354  |
|      DebertaV2ForQuestionAnswering      |  1  | 26.151529  |
|           DebertaForMaskedLM            |  8  | 26.017072  |
|            TrOCRForCausalLM             | 32  | 25.932217  |
|               DistillGPT2               | 16  | 23.358327  |
|           RobertaForCausalLM            | 16  | 22.756811  |
|            AlbertForMaskedLM            |  4  | 22.608382  |
|       AlbertForQuestionAnswering        |  4  | 22.510629  |
|       RobertaForQuestionAnswering       | 16  | 22.117833  |
|                CamemBert                | 16  |  22.0824   |
|    LayoutLMForSequenceClassification    | 16  | 21.820922  |
|           ElectraForCausalLM            | 32  | 21.435958  |
|           LayoutLMForMaskedLM           | 16  | 21.240305  |
|             BertForMaskedLM             | 16  | 20.981901  |
|        BertForQuestionAnswering         | 16  |  20.83665  |
|       ElectraForQuestionAnswering       | 64  | 20.774808  |
|       BlenderbotSmallForCausalLM        | 64  | 20.703746  |
|          DistilBertForMaskedLM          | 128 | 20.330927  |
|     DistilBertForQuestionAnswering      | 256 | 20.143036  |
|             BartForCausalLM             |  4  | 20.001899  |
|         Speech2Text2ForCausalLM         | 256 | 19.118864  |
|            PLBartForCausalLM            |  8  | 18.958578  |
|               GoogleFnet                | 16  | 17.329742  |
+-----------------------------------------+-----+------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            PLBartForCausalLM            |  8  | 0.991631 |
|            AlbertForMaskedLM            |  4  | 0.99158  |
|               GoogleFnet                | 16  | 0.991277 |
|       AlbertForQuestionAnswering        |  4  | 0.990885 |
|          DistilBertForMaskedLM          | 128 | 0.990169 |
|           ElectraForCausalLM            | 32  | 0.990168 |
|               DistillGPT2               | 16  | 0.989637 |
|       ElectraForQuestionAnswering       | 64  | 0.987559 |
|                CamemBert                | 16  | 0.987545 |
|            YituTechConvBert             | 16  | 0.987308 |
|             BertForMaskedLM             | 16  | 0.986943 |
|       BlenderbotSmallForCausalLM        | 64  | 0.986871 |
|             OPTForCausalLM              |  2  | 0.986723 |
|         Speech2Text2ForCausalLM         | 256 | 0.986576 |
|     DistilBertForQuestionAnswering      | 256 | 0.986139 |
|           RobertaForCausalLM            | 16  | 0.985147 |
|           LayoutLMForMaskedLM           | 16  | 0.984646 |
|            TrOCRForCausalLM             | 32  | 0.983897 |
|       DebertaForQuestionAnswering       | 16  | 0.98233  |
| BlenderbotSmallForConditionalGeneration | 64  | 0.982082 |
|        BertForQuestionAnswering         | 16  | 0.981246 |
|       RobertaForQuestionAnswering       | 16  | 0.980423 |
|    LayoutLMForSequenceClassification    | 16  | 0.980089 |
|          MobileBertForMaskedLM          | 128 | 0.978504 |
|       MT5ForConditionalGeneration       | 16  | 0.977966 |
|      GPT2ForSequenceClassification      |  4  | 0.97776  |
|                 T5Small                 |  4  | 0.972016 |
|             BartForCausalLM             |  4  | 0.969741 |
|          AllenaiLongformerBase          |  4  | 0.96833  |
|       T5ForConditionalGeneration        |  4  | 0.968235 |
|           DebertaForMaskedLM            |  8  | 0.968121 |
|           PegasusForCausalLM            | 32  | 0.967489 |
|     MobileBertForQuestionAnswering      | 128 | 0.96716  |
|     PLBartForConditionalGeneration      |  4  | 0.964722 |
|    MegatronBertForQuestionAnswering     |  8  | 0.954454 |
|            XLNetLMHeadModel             |  8  | 0.951057 |
|         MegatronBertForCausalLM         |  4  | 0.945238 |
|             XGLMForCausalLM             |  8  | 0.938992 |
|          BlenderbotForCausalLM          |  4  | 0.936034 |
|      MBartForConditionalGeneration      |  2  | 0.932417 |
|      DebertaV2ForQuestionAnswering      |  1  | 0.928998 |
|     PegasusForConditionalGeneration     | 32  | 0.918433 |
|     M2M100ForConditionalGeneration      | 16  | 0.915244 |
|      BartForConditionalGeneration       |  2  | 0.913877 |
|            MBartForCausalLM             |  4  | 0.90739  |
|          DebertaV2ForMaskedLM           |  2  | 0.885394 |
+-----------------------------------------+-----+----------+

Absolute latency (ms)

+-----------------------------------------+-----+------------+
|                  name                   | bs  |  inductor  |
+-----------------------------------------+-----+------------+
|          AllenaiLongformerBase          |  4  | 947.595933 |
|            AlbertForMaskedLM            |  4  | 532.920554 |
|       AlbertForQuestionAnswering        |  4  | 527.923774 |
|            XLNetLMHeadModel             |  8  | 395.520053 |
|               GoogleFnet                | 16  | 274.88109  |
|          DebertaV2ForMaskedLM           |  2  | 212.697871 |
|             OPTForCausalLM              |  2  | 206.777547 |
|            MBartForCausalLM             |  4  | 176.638114 |
|            TrOCRForCausalLM             | 32  | 173.069754 |
|      MBartForConditionalGeneration      |  2  | 172.197306 |
|     PegasusForConditionalGeneration     | 32  | 165.116272 |
|    MegatronBertForQuestionAnswering     |  8  | 150.993884 |
|     DistilBertForQuestionAnswering      | 256 | 145.64915  |
|            PLBartForCausalLM            |  8  | 135.112405 |
|      BartForConditionalGeneration       |  2  | 132.78853  |
|     PLBartForConditionalGeneration      |  4  | 123.046904 |
|          BlenderbotForCausalLM          |  4  | 121.962769 |
|     M2M100ForConditionalGeneration      | 16  | 121.153403 |
|            YituTechConvBert             | 16  | 118.633218 |
|       DebertaForQuestionAnswering       | 16  | 116.592905 |
|                 T5Small                 |  4  | 114.637252 |
|       T5ForConditionalGeneration        |  4  | 105.480373 |
|          DistilBertForMaskedLM          | 128 | 104.77977  |
| BlenderbotSmallForConditionalGeneration | 64  | 103.985256 |
|           RobertaForCausalLM            | 16  | 103.550087 |
|             BartForCausalLM             |  4  | 96.237102  |
|                CamemBert                | 16  | 93.637738  |
|             BertForMaskedLM             | 16  |  93.63184  |
|           LayoutLMForMaskedLM           | 16  | 92.884102  |
|          MobileBertForMaskedLM          | 128 |  86.94387  |
|         MegatronBertForCausalLM         |  4  | 84.778388  |
|               DistillGPT2               | 16  | 82.262531  |
|           PegasusForCausalLM            | 32  | 79.966046  |
|             XGLMForCausalLM             |  8  | 76.361216  |
|        BertForQuestionAnswering         | 16  | 75.653226  |
|    LayoutLMForSequenceClassification    | 16  | 73.742454  |
|       RobertaForQuestionAnswering       | 16  | 73.650218  |
|           DebertaForMaskedLM            |  8  | 73.448815  |
|      GPT2ForSequenceClassification      |  4  | 70.116451  |
|       ElectraForQuestionAnswering       | 64  | 61.019157  |
|         Speech2Text2ForCausalLM         | 256 | 58.827853  |
|       MT5ForConditionalGeneration       | 16  | 58.053492  |
|           ElectraForCausalLM            | 32  | 57.733673  |
|      DebertaV2ForQuestionAnswering      |  1  | 56.173805  |
|       BlenderbotSmallForCausalLM        | 64  |  53.62636  |
|     MobileBertForQuestionAnswering      | 128 | 50.823048  |
+-----------------------------------------+-----+------------+

timm_models suite with amp precision

Performance speedup

+---------------------------------+------+----------+
|              name               |  bs  | inductor |
+---------------------------------+------+----------+
|           mnasnet_100           | 512  | 4.622594 |
|           fbnetc_100            | 512  | 4.621306 |
|           resnest101e           |  64  | 4.558618 |
|      mobilenetv3_large_100      | 512  | 4.25095  |
|            lcnet_050            | 256  | 4.221768 |
|           regnety_002           | 1024 | 4.200456 |
|         mobilenetv2_100         | 128  | 4.127641 |
|        ese_vovnet19b_dw         | 256  | 4.108087 |
|          cspdarknet53           |  64  | 3.908167 |
|          botnet26t_256          | 128  | 3.870786 |
|           res2next50            | 128  | 3.84389  |
|          spnasnet_100           | 128  | 3.820652 |
|       gluon_inception_v3        | 256  | 3.707281 |
|          inception_v3           | 128  | 3.700533 |
|            hrnet_w18            | 128  | 3.675333 |
|        res2net101_26w_4s        | 128  | 3.671879 |
|        adv_inception_v3         | 128  | 3.624968 |
|             dla102              | 128  | 3.58833  |
|        res2net50_14w_8s         | 128  | 3.539038 |
|          pnasnet5large          |  16  | 3.49944  |
|            fbnetv3_b            | 256  | 3.49361  |
|           rexnet_100            | 256  | 3.459037 |
|            gernet_l             | 128  | 3.421828 |
|     swsl_resnext101_32x16d      |  32  | 3.265739 |
|            nfnet_l0             | 128  | 3.152725 |
|            tinynet_a            | 128  | 3.036878 |
|       eca_botnext26ts_256       | 128  | 2.990669 |
|        eca_halonext26ts         | 128  | 2.708999 |
|           volo_d1_224           |  64  | 2.622637 |
|           dm_nfnet_f0           | 128  | 2.520862 |
|           selecsls42b           | 128  | 2.519809 |
|            repvgg_a2            | 128  | 2.494434 |
|       tf_efficientnet_b0        | 128  | 2.375544 |
|          ghostnet_100           | 512  | 2.345429 |
|         visformer_small         | 128  | 2.26516  |
|         poolformer_m36          |  64  | 2.067349 |
|        convmixer_768_32         |  32  | 1.98886  |
|          convnext_base          |  64  | 1.876291 |
|             dpn107              |  64  | 1.874312 |
|            levit_128            | 1024 | 1.861652 |
|      xcit_large_24_p8_224       |  16  | 1.763508 |
|           tf_mixnet_l           | 128  | 1.706511 |
|            mixnet_l             | 128  | 1.678824 |
|          gmlp_s16_224           | 128  | 1.644142 |
|  swin_base_patch4_window7_224   |  64  | 1.606953 |
|        twins_pcpvt_base         | 128  | 1.587443 |
|           mobilevit_s           |  64  | 1.520327 |
|        sebotnet33ts_256         |  64  | 1.499806 |
| deit_base_distilled_patch16_224 |  64  | 1.468761 |
|      beit_base_patch16_224      |  64  | 1.450584 |
|          mixer_b16_224          | 128  | 1.383889 |
|      vit_base_patch16_224       |  64  | 1.362157 |
|           convit_base           |  64  | 1.351873 |
|            pit_b_224            |  64  | 1.332813 |
|         crossvit_9_240          | 256  | 1.332419 |
|          gmixer_24_224          | 128  | 1.321483 |
|        tnt_s_patch16_224        | 128  | 1.232425 |
|          jx_nest_base           |  32  | 1.113842 |
|          resmlp_12_224          | 128  | 1.094632 |
|          cait_m36_384           |  4   | 0.838237 |
+---------------------------------+------+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 8  |     pass      |
|          jx_nest_base           | 8  |     pass      |
|          cspdarknet53           | 8  |     pass      |
| deit_base_distilled_patch16_224 | 8  |     pass      |
|        convmixer_768_32         | 8  |     pass      |
|           convit_base           | 8  |     pass      |
|          gmlp_s16_224           | 8  |     pass      |
|          inception_v3           | 8  |     pass      |
|           dm_nfnet_f0           | 8  |     pass      |
|             dpn107              | 8  |     pass      |
|       eca_botnext26ts_256       | 8  |     pass      |
|         crossvit_9_240          | 8  |     pass      |
|        eca_halonext26ts         | 8  |     pass      |
|        ese_vovnet19b_dw         | 8  |     pass      |
|           fbnetc_100            | 8  |     pass      |
|            fbnetv3_b            | 8  |     pass      |
|            gernet_l             | 8  |     pass      |
|          botnet26t_256          | 8  |     pass      |
|       gluon_inception_v3        | 8  |     pass      |
|          gmixer_24_224          | 8  |     pass      |
|          convnext_base          | 8  |     pass      |
|            mixnet_l             | 8  |     pass      |
|          cait_m36_384           | 8  |     pass      |
|      vit_base_patch16_224       | 8  |     pass      |
|           regnety_002           | 8  |     pass      |
|          mixer_b16_224          | 8  |     pass      |
|        twins_pcpvt_base         | 8  |     pass      |
|           mnasnet_100           | 8  |     pass      |
|         mobilenetv2_100         | 8  |     pass      |
|      mobilenetv3_large_100      | 8  |     pass      |
|           mobilevit_s           | 8  |     pass      |
|            nfnet_l0             | 8  |     pass      |
|            pit_b_224            | 8  |     pass      |
|         poolformer_m36          | 8  |     pass      |
|      beit_base_patch16_224      | 8  |     pass      |
|            repvgg_a2            | 8  |     pass      |
|        sebotnet33ts_256         | 8  |     pass      |
|          spnasnet_100           | 8  |     pass      |
|  swin_base_patch4_window7_224   | 8  |     pass      |
|       tf_efficientnet_b0        | 8  |     pass      |
|           tf_mixnet_l           | 8  |     pass      |
|            tinynet_a            | 8  |     pass      |
|        tnt_s_patch16_224        | 8  |     pass      |
|          resmlp_12_224          | 8  |     pass      |
|            lcnet_050            | 8  |  fail_to_run  |
|         coat_lite_mini          | 8  |  fail_to_run  |
|            levit_128            | 8  | fail_accuracy |
|          ghostnet_100           | 8  | fail_accuracy |
|           resnest101e           | 8  | fail_accuracy |
|          pnasnet5large          | 8  | fail_accuracy |
|        res2net101_26w_4s        | 8  | fail_accuracy |
|        res2net50_14w_8s         | 8  | fail_accuracy |
|             dla102              | 8  | fail_accuracy |
|           volo_d1_224           | 8  | fail_accuracy |
|           rexnet_100            | 8  | fail_accuracy |
|           selecsls42b           | 8  | fail_accuracy |
|           res2next50            | 8  | fail_accuracy |
|         visformer_small         | 8  | fail_accuracy |
|            hrnet_w18            | 8  | fail_accuracy |
|     swsl_resnext101_32x16d      | 8  | fail_accuracy |
|      xcit_large_24_p8_224       | 8  | fail_accuracy |
+---------------------------------+----+---------------+

Compilation latency (sec)

+---------------------------------+------+-----------+
|              name               |  bs  | inductor  |
+---------------------------------+------+-----------+
|  swin_base_patch4_window7_224   |  64  | 91.309225 |
|          pnasnet5large          |  16  | 88.44868  |
|           tf_mixnet_l           | 128  | 78.581046 |
|          cait_m36_384           |  4   | 76.354171 |
|      xcit_large_24_p8_224       |  16  | 74.785856 |
|             dpn107              |  64  | 70.47647  |
|          jx_nest_base           |  32  | 67.165675 |
|        tnt_s_patch16_224        | 128  | 64.801121 |
|            levit_128            | 1024 | 60.72735  |
|        twins_pcpvt_base         | 128  | 60.495047 |
|         crossvit_9_240          | 256  | 59.361634 |
|           mobilevit_s           |  64  | 59.225914 |
|        res2net50_14w_8s         | 128  | 58.716555 |
|           rexnet_100            | 256  | 58.465336 |
|            mixnet_l             | 128  | 57.982928 |
|        sebotnet33ts_256         |  64  | 56.603192 |
|        eca_halonext26ts         | 128  | 56.153419 |
|         poolformer_m36          |  64  | 55.669137 |
|           volo_d1_224           |  64  | 53.210645 |
|          ghostnet_100           | 512  | 52.684546 |
|            hrnet_w18            | 128  | 52.228669 |
|           dm_nfnet_f0           | 128  | 47.997233 |
|           resnest101e           |  64  | 47.649611 |
|         visformer_small         | 128  | 47.411549 |
|       eca_botnext26ts_256       | 128  | 45.351368 |
|           convit_base           |  64  | 44.829644 |
|        res2net101_26w_4s        | 128  | 44.009608 |
|          convnext_base          |  64  | 42.840493 |
|       tf_efficientnet_b0        | 128  | 40.028811 |
|            nfnet_l0             | 128  | 37.470189 |
|       gluon_inception_v3        | 256  | 35.202772 |
|            fbnetv3_b            | 256  | 34.837226 |
|            tinynet_a            | 128  | 34.367262 |
|          botnet26t_256          | 128  | 34.264065 |
|        adv_inception_v3         | 128  | 34.053806 |
|          inception_v3           | 128  | 34.038214 |
|          gmixer_24_224          | 128  | 33.935297 |
|           res2next50            | 128  | 33.927013 |
|            pit_b_224            |  64  | 31.677666 |
|          gmlp_s16_224           | 128  | 30.899696 |
|             dla102              | 128  | 27.649597 |
|          cspdarknet53           |  64  | 26.478872 |
| deit_base_distilled_patch16_224 |  64  | 26.155787 |
|      vit_base_patch16_224       |  64  | 24.484742 |
|      mobilenetv3_large_100      | 512  | 24.421407 |
|        ese_vovnet19b_dw         | 256  | 22.581012 |
|      beit_base_patch16_224      |  64  | 20.394902 |
|          mixer_b16_224          | 128  | 19.86463  |
|        convmixer_768_32         |  32  | 19.137797 |
|           regnety_002           | 1024 | 19.009553 |
|          resmlp_12_224          | 128  | 18.47641  |
|            repvgg_a2            | 128  | 15.310874 |
|     swsl_resnext101_32x16d      |  32  | 14.493492 |
|           selecsls42b           | 128  | 14.428248 |
|            lcnet_050            | 256  | 14.203367 |
|         mobilenetv2_100         | 128  | 11.359194 |
|            gernet_l             | 128  | 10.125306 |
|          spnasnet_100           | 128  | 10.077286 |
|           fbnetc_100            | 512  | 10.015267 |
|           mnasnet_100           | 512  | 9.189981  |
+---------------------------------+------+-----------+

Peak Memory Compression Ratio

+---------------------------------+------+----------+
|              name               |  bs  | inductor |
+---------------------------------+------+----------+
|        ese_vovnet19b_dw         | 256  | 0.995902 |
|           fbnetc_100            | 512  | 0.993949 |
|           mnasnet_100           | 512  | 0.993814 |
|           rexnet_100            | 256  | 0.993556 |
|       gluon_inception_v3        | 256  | 0.992657 |
|      mobilenetv3_large_100      | 512  | 0.992457 |
|            fbnetv3_b            | 256  | 0.992377 |
|           dm_nfnet_f0           | 128  | 0.99228  |
|            levit_128            | 1024 | 0.992258 |
|          ghostnet_100           | 512  | 0.991689 |
|            nfnet_l0             | 128  | 0.989797 |
|           convit_base           |  64  | 0.989759 |
|           res2next50            | 128  | 0.989733 |
|             dpn107              |  64  | 0.989399 |
|       eca_botnext26ts_256       | 128  | 0.989071 |
|        res2net101_26w_4s        | 128  | 0.989009 |
|           resnest101e           |  64  | 0.988858 |
|          mixer_b16_224          | 128  | 0.988833 |
|        eca_halonext26ts         | 128  | 0.988788 |
|             dla102              | 128  | 0.988282 |
|           tf_mixnet_l           | 128  | 0.987781 |
|        twins_pcpvt_base         | 128  | 0.987727 |
|        adv_inception_v3         | 128  | 0.987717 |
|           regnety_002           | 1024 | 0.987472 |
|          inception_v3           | 128  | 0.98744  |
|      xcit_large_24_p8_224       |  16  | 0.987393 |
|         visformer_small         | 128  | 0.987229 |
|        res2net50_14w_8s         | 128  | 0.987118 |
|       tf_efficientnet_b0        | 128  | 0.987043 |
|        convmixer_768_32         |  32  | 0.98688  |
|          gmixer_24_224          | 128  | 0.986492 |
|            mixnet_l             | 128  | 0.98627  |
|          cspdarknet53           |  64  | 0.986021 |
|          botnet26t_256          | 128  | 0.985654 |
|            gernet_l             | 128  | 0.985518 |
|      beit_base_patch16_224      |  64  | 0.985357 |
|          gmlp_s16_224           | 128  | 0.985187 |
|         mobilenetv2_100         | 128  | 0.985123 |
|          convnext_base          |  64  | 0.984649 |
| deit_base_distilled_patch16_224 |  64  | 0.984555 |
|        tnt_s_patch16_224        | 128  | 0.984446 |
|            hrnet_w18            | 128  | 0.984415 |
|          pnasnet5large          |  16  | 0.984365 |
|      vit_base_patch16_224       |  64  | 0.98427  |
|            tinynet_a            | 128  | 0.983887 |
|         crossvit_9_240          | 256  | 0.983796 |
|         poolformer_m36          |  64  | 0.98321  |
|            pit_b_224            |  64  | 0.981869 |
|  swin_base_patch4_window7_224   |  64  | 0.981771 |
|           selecsls42b           | 128  | 0.981686 |
|          resmlp_12_224          | 128  | 0.981395 |
|           mobilevit_s           |  64  | 0.980998 |
|          spnasnet_100           | 128  | 0.980348 |
|           volo_d1_224           |  64  | 0.979141 |
|     swsl_resnext101_32x16d      |  32  | 0.979044 |
|            lcnet_050            | 256  | 0.976209 |
|          cait_m36_384           |  4   | 0.972826 |
|          jx_nest_base           |  32  | 0.972324 |
|            repvgg_a2            | 128  | 0.971041 |
|        sebotnet33ts_256         |  64  | 0.836896 |
+---------------------------------+------+----------+

Absolute latency (ms)

+---------------------------------+------+------------+
|              name               |  bs  |  inductor  |
+---------------------------------+------+------------+
|          cait_m36_384           |  4   | 469.833826 |
|      xcit_large_24_p8_224       |  16  | 273.563107 |
|           convit_base           |  64  | 248.870111 |
|        tnt_s_patch16_224        | 128  | 246.209255 |
|             dpn107              |  64  | 227.076416 |
|           dm_nfnet_f0           | 128  | 214.071541 |
|            levit_128            | 1024 | 213.872789 |
|       gluon_inception_v3        | 256  | 181.678702 |
|        ese_vovnet19b_dw         | 256  | 174.171413 |
|          convnext_base          |  64  | 169.461232 |
|          mixer_b16_224          | 128  | 166.975409 |
|            nfnet_l0             | 128  | 157.187485 |
|         poolformer_m36          |  64  | 153.655254 |
|        twins_pcpvt_base         | 128  | 145.954963 |
|  swin_base_patch4_window7_224   |  64  | 140.369901 |
|           tf_mixnet_l           | 128  | 138.270186 |
|            mixnet_l             | 128  | 135.613909 |
|         crossvit_9_240          | 256  | 134.604067 |
|          ghostnet_100           | 512  | 127.077122 |
|           volo_d1_224           |  64  | 125.007425 |
|           mobilevit_s           |  64  | 124.616277 |
|          jx_nest_base           |  32  | 121.416443 |
|        sebotnet33ts_256         |  64  | 115.891835 |
|          gmixer_24_224          | 128  | 111.338789 |
|      vit_base_patch16_224       |  64  | 109.320731 |
|          pnasnet5large          |  16  | 108.097818 |
|      beit_base_patch16_224      |  64  | 107.808669 |
|        eca_halonext26ts         | 128  | 106.026821 |
|          resmlp_12_224          | 128  | 106.012806 |
|        res2net101_26w_4s        | 128  | 105.48364  |
|            pit_b_224            |  64  | 103.198939 |
|          gmlp_s16_224           | 128  | 102.057087 |
|        convmixer_768_32         |  32  | 102.043938 |
| deit_base_distilled_patch16_224 |  64  | 101.92877  |
|           fbnetc_100            | 512  | 99.242029  |
|       eca_botnext26ts_256       | 128  | 95.320762  |
|     swsl_resnext101_32x16d      |  32  | 95.000497  |
|           regnety_002           | 1024 | 88.287078  |
|             dla102              | 128  | 88.268387  |
|          inception_v3           | 128  |  88.26447  |
|        adv_inception_v3         | 128  | 87.513356  |
|         visformer_small         | 128  | 86.924606  |
|            hrnet_w18            | 128  |  86.50523  |
|        res2net50_14w_8s         | 128  | 85.989228  |
|           rexnet_100            | 256  | 84.497408  |
|            fbnetv3_b            | 256  | 84.273823  |
|           mnasnet_100           | 512  | 83.003964  |
|           res2next50            | 128  | 81.599129  |
|      mobilenetv3_large_100      | 512  | 81.105758  |
|           resnest101e           |  64  | 78.542101  |
|          botnet26t_256          | 128  | 73.708855  |
|       tf_efficientnet_b0        | 128  | 56.775496  |
|          cspdarknet53           |  64  | 51.689701  |
|            repvgg_a2            | 128  | 48.775566  |
|            gernet_l             | 128  | 38.559239  |
|            tinynet_a            | 128  | 37.220749  |
|           selecsls42b           | 128  | 35.038166  |
|         mobilenetv2_100         | 128  | 22.860358  |
|          spnasnet_100           | 128  | 19.959799  |
|            lcnet_050            | 256  |  8.190327  |
+---------------------------------+------+------------+

@WeizhuoZhang-intel
Contributor

[amp] Performance Dashboard for amp precision -- Single-core Single-thread (2024-04-27 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8488C. Each experiment runs one iteration: a forward and backward pass for training, or a forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint compression ratio.
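
The normalization against native PyTorch can be sketched as follows. This is a minimal illustration, assuming speedup is the eager (native PyTorch) latency divided by the compiled latency; the function name `speedup` and the sample latencies are hypothetical and are not taken from the benchmark harness.

```python
def speedup(eager_latency_ms: float, compiled_latency_ms: float) -> float:
    """Return how many times faster the compiled model runs than eager mode.

    Assumption: both latencies are measured for the same model, batch size
    and thread configuration.
    """
    if compiled_latency_ms <= 0:
        raise ValueError("compiled latency must be positive")
    return eager_latency_ms / compiled_latency_ms


# Hypothetical latencies, not values from the tables in this issue.
print(speedup(120.0, 60.0))
```

A value above 1 means the compiled model is faster than eager mode; a value below 1 means it is slower.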

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

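SW information

+-------------------+--------+------------------------------------------+
| SW                | Branch | Commit                                   |
+-------------------+--------+------------------------------------------+
| PyTorch           | main   | 7478b7f1cac9686f00edf3db4667cf86d2421531 |
| Torchbench        | main   | d6015d42                                 |
| torchaudio        | main   | 2.2.0a0+ea437b3                          |
| torchtext         | main   | 0.16.0a0+b0ebddc                         |
| torchvision       | main   | 0.19.0a0+2c4665f                         |
| torchdata         | main   | 0.7.1a0+0790338                          |
| dynamo_benchmarks | main   | nightly                                  |
+-------------------+--------+------------------------------------------+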

HW information

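+------------------+----------------------------------------------------------------+
| Item             | Value                                                          |
+------------------+----------------------------------------------------------------+
| Manufacturer     | Amazon EC2                                                     |
| Product Name     | c7i.metal-24xl                                                 |
| CPU Model        | Intel(R) Xeon(R) Platinum 8488C CPU @ 2.40GHz                  |
| Installed Memory | 192GB (8x24GB DDR5 4800 MT/s [4800 MT/s])                      |
| OS               | Ubuntu 22.04.3 LTS                                             |
| Kernel           | 6.2.0-1017-aws                                                 |
| Microcode        | 0x2b0004d0                                                     |
| GCC              | gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0                      |
| GLIBC            | ldd (Ubuntu GLIBC 2.35-0ubuntu3.4) 2.35                        |
| Binutils         | GNU ld (GNU Binutils for Ubuntu) 2.38                          |
| Python           | Python 3.8.18                                                  |
| OpenSSL          | OpenSSL 3.2.0 23 Nov 2023 (Library: OpenSSL 3.2.0 23 Nov 2023) |
+------------------+----------------------------------------------------------------+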

Test command

export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
export OMP_NUM_THREADS=1

python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--core_list 0 --ncores_per_instance 1" --devices=cpu --dtypes=float32 --inference --compilers=inductor --batch_size=1 --threads 1 --extra-args="--timeout 9000"

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
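
Removing the failing models before aggregating can be sketched as follows. This is a minimal illustration, assuming that any accuracy status beginning with `pass` counts as passing; the helper name `filter_passing` is hypothetical. The three sample rows are copied from the torchbench tables in this comment.

```python
from typing import Dict


def filter_passing(perf: Dict[str, float], accuracy: Dict[str, str]) -> Dict[str, float]:
    """Keep only the models whose accuracy status starts with 'pass'.

    Assumption: 'pass' and 'pass_due_to_skip' both count as passing;
    models with no accuracy entry are dropped.
    """
    return {
        name: value
        for name, value in perf.items()
        if accuracy.get(name, "").startswith("pass")
    }


# Speedups and accuracy statuses taken from the torchbench tables.
perf = {"resnet18": 5.7357, "alexnet": 4.01877, "vgg16": 4.906899}
accuracy = {"resnet18": "fail_accuracy", "alexnet": "pass", "vgg16": "pass"}

kept = filter_passing(perf, accuracy)
print(sorted(kept))  # resnet18 is dropped because it fails the accuracy check
```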

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 80%, 63/79 | 98%, 45/46  | 75%, 45/60  |
+----------+------------+-------------+-------------+
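
The pass-rate entries can be reproduced from status counts with a short sketch. It assumes that statuses beginning with `pass` (including `pass_due_to_skip`) are counted as passes, and it takes the denominator as given in the table; the function name `pass_rate` is hypothetical. In the torchbench accuracy table for this run, 58 rows read `pass` and 5 read `pass_due_to_skip`.

```python
from typing import List


def pass_rate(statuses: List[str], total: int) -> str:
    """Format a pass rate in the 'NN%, passed/total' style used in the table.

    Assumption: every status that starts with 'pass' counts as a pass.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    passed = sum(1 for status in statuses if status.startswith("pass"))
    return f"{round(100 * passed / total)}%, {passed}/{total}"


# Status counts from the torchbench accuracy table; the denominator 79 is
# taken from the pass-rate table as reported.
statuses = ["pass"] * 58 + ["pass_due_to_skip"] * 5 + ["fail_accuracy"] * 12
print(pass_rate(statuses, total=79))  # 80%, 63/79
```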

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   2.40x    |    1.77x    |    2.54x    |
+----------+------------+-------------+-------------+
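
For reference, a geometric mean of per-model speedups can be computed as in the sketch below. This is an illustration rather than the dashboard's own aggregation code; the function name `geometric_mean` is hypothetical. Rows reported as `0.0` in the performance tables (models that did not load or run) have no logarithm, so this sketch assumes they are filtered out first.

```python
import math
from typing import Sequence


def geometric_mean(values: Sequence[float]) -> float:
    """Return the geometric mean of strictly positive values."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive; drop failed (0.0) rows first")
    # Average in log space to avoid overflow from a long running product.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Three passing torchbench models from the table below (alexnet, vgg16, dcgan).
sample = [4.01877, 4.906899, 9.440673]
print(f"{geometric_mean(sample):.2f}x")
```

The geometric mean is the usual choice for averaging ratios because a 2x speedup and a 0.5x slowdown cancel to 1x, which an arithmetic mean would not show.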

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   18.39    |    21.27    |    25.84    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.89x    |    0.90x    |    0.86x    |
+----------+------------+-------------+-------------+
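
One way to read this ratio is sketched below. It assumes the compression ratio is the eager peak memory divided by the compiled peak memory, which is consistent with the "higher is better" label but is not spelled out in this comment; the function name and the sample figures are hypothetical.

```python
def compression_ratio(eager_peak_mb: float, compiled_peak_mb: float) -> float:
    """Return eager peak memory divided by compiled peak memory.

    Assumption: a ratio above 1 means the compiled model uses less peak
    memory than eager mode; a ratio below 1 means it uses more.
    """
    if compiled_peak_mb <= 0:
        raise ValueError("compiled peak memory must be positive")
    return eager_peak_mb / compiled_peak_mb


# Hypothetical peak memory figures, not measurements from this run.
print(compression_ratio(900.0, 1000.0))  # 0.9
```

Under this reading, the ratios below 1.00x in the table mean the compiled models peak slightly higher in memory than eager mode.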

torchbench suite with amp precision

Performance speedup

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_isoneutral_mixing     |    1    | 49.828601 |
|     pyhpc_equation_of_state     |    1    | 20.966484 |
|              dcgan              |    1    | 9.440673  |
|          squeezenet1_1          |    1    | 8.533793  |
|          lennard_jones          |    1    | 6.192456  |
|          timm_resnest           |    1    | 6.070967  |
|            resnet18             |    1    |  5.7357   |
|         opacus_cifar10          |    1    |  5.56847  |
|      functorch_dp_cifar10       |    1    | 5.499795  |
|            resnet50             |    1    | 5.257349  |
|           timm_nfnet            |    1    | 4.955201  |
|         resnext50_32x4d         |    1    | 4.915313  |
|              vgg16              |    1    | 4.906899  |
|         LearningToPaint         |    1    | 4.851726  |
|          mobilenet_v2           |    1    | 4.747005  |
|           mnasnet1_0            |    1    | 4.413679  |
|            resnet152            |    1    | 4.323552  |
|           timm_vovnet           |    1    | 4.234655  |
|     nvidia_deeprecommender      |    1    | 4.097899  |
|             alexnet             |    1    |  4.01877  |
|             yolov3              |    1    | 3.911425  |
|      doctr_reco_predictor       |    1    | 3.860266  |
|       mobilenet_v3_large        |    1    | 3.727164  |
|              llama              |    1    | 3.588445  |
|       shufflenet_v2_x1_0        |    1    | 3.449608  |
|     functorch_maml_omniglot     |    1    | 3.322825  |
|           densenet121           |    1    | 3.220455  |
|           timm_regnet           |    1    | 3.094248  |
|         phlippe_resnet          |    1    | 2.939302  |
|              dlrm               |    1    |  2.93558  |
|          basic_gnn_gcn          |    1    | 2.780107  |
|        phlippe_densenet         |    1    | 2.647227  |
|    detectron2_fcos_r_50_fpn     |    1    | 2.620495  |
|          maml_omniglot          |    5    | 2.611445  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 2.474838  |
|               drq               |    1    | 2.454359  |
|        timm_efficientnet        |    1    | 2.377067  |
|          BERT_pytorch           |    1    |  2.15372  |
|          pytorch_unet           |    1    | 2.146055  |
| detectron2_fasterrcnn_r_101_dc5 |    1    |  2.07594  |
| detectron2_fasterrcnn_r_101_c4  |    1    |  1.99886  |
|          basic_gnn_gin          |    1    | 1.983735  |
|        soft_actor_critic        |   256   |  1.9178   |
|       Background_Matting        |    1    | 1.846673  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1.840449  |
|         basic_gnn_sage          |    1    | 1.823059  |
|     timm_vision_transformer     |    1    | 1.812759  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1.787001  |
|          hf_Bert_large          |    1    | 1.729726  |
|             hf_Bert             |    1    | 1.685112  |
|             hf_GPT2             |    1    | 1.674442  |
|        basic_gnn_edgecnn        |    1    |  1.65568  |
|          hf_DistilBert          |    1    | 1.640617  |
| pyhpc_turbulent_kinetic_energy  | 1048576 |  1.59622  |
|         pytorch_stargan         |   16    | 1.580523  |
|  timm_vision_transformer_large  |    1    | 1.538612  |
|             hf_Bart             |    1    |  1.50022  |
|          hf_GPT2_large          |    1    | 1.448646  |
|           hf_T5_base            |    1    |  1.42816  |
|        hf_distil_whisper        |    1    | 1.328366  |
|           hf_BigBird            |    1    | 1.311553  |
|            moondream            |    1    | 1.282627  |
|       speech_transformer        |    1    | 1.255005  |
|           hf_T5_large           |    1    | 1.214387  |
|            hf_Albert            |    1    | 1.170107  |
|          fastNLP_Bert           |    1    | 1.145831  |
|              hf_T5              |    1    | 1.064986  |
|           hf_Reformer           |    1    | 1.040109  |
|             demucs              |    1    | 1.029498  |
|      torch_multimodal_clip      |    1    | 1.027574  |
|           tts_angular           |    1    | 0.997963  |
|     resnet50_quantized_qat      |    1    | 0.986989  |
|   mobilenet_v2_quantized_qat    |    1    | 0.981308  |
|              maml               |    1    | 0.973683  |
|          hf_Longformer          |    1    | 0.902847  |
|       doctr_det_predictor       |    0    |    0.0    |
|        timm_efficientdet        |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
|              moco               |    0    |    0.0    |
+---------------------------------+---------+-----------+

Accuracy

+---------------------------------+---------+--------------------+
|              name               |   bs    |      inductor      |
+---------------------------------+---------+--------------------+
|           hf_T5_large           |    1    |  pass_due_to_skip  |
|       Background_Matting        |    1    |  pass_due_to_skip  |
|  timm_vision_transformer_large  |    1    |  pass_due_to_skip  |
|              maml               |    1    |  pass_due_to_skip  |
|          hf_GPT2_large          |    1    |  pass_due_to_skip  |
|        basic_gnn_edgecnn        |    1    |        pass        |
|             demucs              |    1    |        pass        |
|              dcgan              |    1    |        pass        |
|         basic_gnn_sage          |    1    |        pass        |
|          basic_gnn_gin          |    1    |        pass        |
|          basic_gnn_gcn          |    1    |        pass        |
|         LearningToPaint         |    1    |        pass        |
|    detectron2_fcos_r_50_fpn     |    1    |        pass        |
|               drq               |    1    |        pass        |
|           hf_Reformer           |    1    |        pass        |
|          lennard_jones          |    1    |        pass        |
|           hf_T5_base            |    1    |        pass        |
|          hf_Longformer          |    1    |        pass        |
|      doctr_reco_predictor       |    1    |        pass        |
|              dlrm               |    1    |        pass        |
|     resnet50_quantized_qat      |    1    |        pass        |
|             yolov3              |    1    |        pass        |
|          fastNLP_Bert           |    1    |        pass        |
|             hf_GPT2             |    1    |        pass        |
|      functorch_dp_cifar10       |    1    |        pass        |
|             hf_Bert             |    1    |        pass        |
|          hf_Bert_large          |    1    |        pass        |
|           hf_BigBird            |    1    |        pass        |
|          hf_DistilBert          |    1    |        pass        |
|             hf_Bart             |    1    |        pass        |
|            hf_Albert            |    1    |        pass        |
|     functorch_maml_omniglot     |    1    |        pass        |
|          pytorch_unet           |    1    |        pass        |
|             alexnet             |    1    |        pass        |
|        soft_actor_critic        |   256   |        pass        |
|     pyhpc_isoneutral_mixing     |    1    |        pass        |
|          BERT_pytorch           |    1    |        pass        |
|            moondream            |    1    |        pass        |
|     pyhpc_equation_of_state     |    1    |        pass        |
|         phlippe_resnet          |    1    |        pass        |
|       speech_transformer        |    1    |        pass        |
|         opacus_cifar10          |    1    |        pass        |
|     nvidia_deeprecommender      |    1    |        pass        |
|          maml_omniglot          |    5    |        pass        |
|       mobilenet_v3_large        |    1    |        pass        |
|   mobilenet_v2_quantized_qat    |    1    |        pass        |
| pyhpc_turbulent_kinetic_energy  | 1048576 |        pass        |
|              llama              |    1    |        pass        |
|        hf_distil_whisper        |    1    |        pass        |
|        phlippe_densenet         |    1    |        pass        |
|  pytorch_CycleGAN_and_pix2pix   |    1    |        pass        |
|     timm_vision_transformer     |    1    |        pass        |
|              hf_T5              |    1    |        pass        |
|          squeezenet1_1          |    1    |        pass        |
|        timm_efficientnet        |    1    |        pass        |
|         pytorch_stargan         |   16    |        pass        |
|          timm_resnest           |    1    |        pass        |
|           timm_nfnet            |    1    |        pass        |
|           timm_vovnet           |    1    |        pass        |
|      torch_multimodal_clip      |    1    |        pass        |
|           tts_angular           |    1    |        pass        |
|              vgg16              |    1    |        pass        |
|           timm_regnet           |    1    |        pass        |
|         DALLE2_pytorch          |    0    | model_fail_to_load |
|              moco               |    0    | model_fail_to_load |
|        timm_efficientdet        |    0    | model_fail_to_load |
|         vision_maskrcnn         |    1    |    fail_to_run     |
|           Super_SloMo           |    1    |    fail_to_run     |
| detectron2_fasterrcnn_r_101_fpn |    1    |    fail_to_run     |
| detectron2_fasterrcnn_r_50_fpn  |    1    |    fail_to_run     |
|          mobilenet_v2           |    1    |   fail_accuracy    |
|           mnasnet1_0            |    1    |   fail_accuracy    |
| detectron2_fasterrcnn_r_101_c4  |    1    |   fail_accuracy    |
|           densenet121           |    1    |   fail_accuracy    |
| detectron2_fasterrcnn_r_101_dc5 |    1    |   fail_accuracy    |
|       shufflenet_v2_x1_0        |    1    |   fail_accuracy    |
|         resnext50_32x4d         |    1    |   fail_accuracy    |
|            resnet50             |    1    |   fail_accuracy    |
|            resnet18             |    1    |   fail_accuracy    |
|            resnet152            |    1    |   fail_accuracy    |
| detectron2_fasterrcnn_r_50_dc5  |    1    |   fail_accuracy    |
|  detectron2_fasterrcnn_r_50_c4  |    1    |   fail_accuracy    |
|       doctr_det_predictor       |    0    | eager_fail_to_run  |
+---------------------------------+---------+--------------------+

Compilation latency (sec)

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|           densenet121           |    1    | 69.838742 |
|           hf_BigBird            |    1    | 65.472902 |
|    detectron2_fcos_r_50_fpn     |    1    | 50.905778 |
|              maml               |    1    | 46.771026 |
|           hf_T5_large           |    1    | 46.676691 |
|          hf_Longformer          |    1    | 39.859488 |
|           timm_nfnet            |    1    | 36.967188 |
|            moondream            |    1    | 36.523224 |
|           hf_Reformer           |    1    | 36.029543 |
|       speech_transformer        |    1    | 33.104973 |
|      torch_multimodal_clip      |    1    | 32.93901  |
|        phlippe_densenet         |    1    | 32.69792  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 30.064274 |
|  timm_vision_transformer_large  |    1    | 29.264727 |
|             demucs              |    1    | 28.784854 |
|          hf_GPT2_large          |    1    | 28.40904  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 28.295901 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 27.323895 |
|        timm_efficientnet        |    1    | 25.734029 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 25.248223 |
|        hf_distil_whisper        |    1    | 23.888354 |
|              hf_T5              |    1    | 23.645762 |
|             yolov3              |    1    | 22.578992 |
|         opacus_cifar10          |    1    | 21.50073  |
|              llama              |    1    | 20.986111 |
|      functorch_dp_cifar10       |    1    | 20.414051 |
|          hf_Bert_large          |    1    | 19.678667 |
|          timm_resnest           |    1    | 19.253065 |
|             hf_GPT2             |    1    | 19.016189 |
|       shufflenet_v2_x1_0        |    1    | 18.364766 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 18.107526 |
|     timm_vision_transformer     |    1    | 17.277328 |
|       mobilenet_v3_large        |    1    | 17.256032 |
|          BERT_pytorch           |    1    | 16.93074  |
|          fastNLP_Bert           |    1    | 16.620021 |
|             hf_Bart             |    1    | 16.247946 |
|           timm_regnet           |    1    | 16.18165  |
|           timm_vovnet           |    1    | 15.981745 |
|         pytorch_stargan         |   16    | 15.84908  |
|           hf_T5_base            |    1    | 15.464989 |
|            hf_Albert            |    1    | 14.457252 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 14.237593 |
|             hf_Bert             |    1    | 13.833935 |
|          squeezenet1_1          |    1    | 12.575902 |
|       Background_Matting        |    1    | 12.507939 |
|          hf_DistilBert          |    1    | 11.899531 |
|            resnet152            |    1    | 11.246682 |
|      doctr_reco_predictor       |    1    | 10.752941 |
|              vgg16              |    1    | 9.442189  |
|     pyhpc_isoneutral_mixing     |    1    | 8.966687  |
|               drq               |    1    | 8.740537  |
|          basic_gnn_gcn          |    1    | 8.489893  |
|          pytorch_unet           |    1    | 8.412606  |
|            resnet50             |    1    | 8.130059  |
|         resnext50_32x4d         |    1    | 8.073149  |
|             alexnet             |    1    | 8.016773  |
|          maml_omniglot          |    5    | 7.750731  |
|     functorch_maml_omniglot     |    1    | 7.750597  |
|     nvidia_deeprecommender      |    1    | 7.565936  |
|              dlrm               |    1    | 7.557279  |
|         basic_gnn_sage          |    1    |  7.24813  |
|          mobilenet_v2           |    1    | 7.196967  |
|            resnet18             |    1    |  7.18553  |
|           mnasnet1_0            |    1    | 6.946687  |
|     pyhpc_equation_of_state     |    1    | 6.615588  |
|        soft_actor_critic        |   256   | 6.489252  |
|        basic_gnn_edgecnn        |    1    | 6.272225  |
|         LearningToPaint         |    1    |  6.22719  |
|         phlippe_resnet          |    1    | 5.995195  |
|          basic_gnn_gin          |    1    | 5.530621  |
|          lennard_jones          |    1    | 4.683965  |
|              dcgan              |    1    | 4.612112  |
|           tts_angular           |    1    | 4.495216  |
|   mobilenet_v2_quantized_qat    |    1    | 0.191807  |
|     resnet50_quantized_qat      |    1    | 0.174251  |
|        timm_efficientdet        |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|       doctr_det_predictor       |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+

Peak Memory Compression Ratio

+---------------------------------+---------+----------+
|              name               |   bs    | inductor |
+---------------------------------+---------+----------+
|              dlrm               |    1    | 0.98692  |
|             demucs              |    1    | 0.982405 |
|           hf_T5_large           |    1    | 0.98033  |
|           hf_T5_base            |    1    | 0.979179 |
|  detectron2_fasterrcnn_r_50_c4  |    1    |  0.9769  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 0.974556 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 0.970923 |
|       Background_Matting        |    1    | 0.970456 |
|          pytorch_unet           |    1    | 0.967826 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 0.964605 |
|              llama              |    1    | 0.96452  |
|    detectron2_fcos_r_50_fpn     |    1    | 0.962505 |
|           hf_BigBird            |    1    | 0.960355 |
|        basic_gnn_edgecnn        |    1    | 0.960164 |
|         LearningToPaint         |    1    | 0.95427  |
|     resnet50_quantized_qat      |    1    | 0.94513  |
|      doctr_reco_predictor       |    1    | 0.944111 |
|      torch_multimodal_clip      |    1    | 0.941528 |
|          basic_gnn_gcn          |    1    | 0.940315 |
|              hf_T5              |    1    | 0.934472 |
|         basic_gnn_sage          |    1    | 0.930791 |
|          basic_gnn_gin          |    1    | 0.930683 |
|        hf_distil_whisper        |    1    | 0.928629 |
|       speech_transformer        |    1    | 0.927148 |
|         pytorch_stargan         |   16    | 0.923746 |
|          fastNLP_Bert           |    1    | 0.922403 |
|             hf_Bert             |    1    | 0.91957  |
|          hf_GPT2_large          |    1    | 0.918819 |
|            hf_Albert            |    1    | 0.915938 |
|   mobilenet_v2_quantized_qat    |    1    | 0.914585 |
|          BERT_pytorch           |    1    | 0.910241 |
|          hf_DistilBert          |    1    | 0.909229 |
|             hf_GPT2             |    1    | 0.908424 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 0.90819  |
|          hf_Longformer          |    1    | 0.897553 |
|          hf_Bert_large          |    1    | 0.895822 |
|         opacus_cifar10          |    1    | 0.894837 |
|               drq               |    1    | 0.893453 |
|             hf_Bart             |    1    | 0.890358 |
|           tts_angular           |    1    | 0.890125 |
|        soft_actor_critic        |   256   | 0.887798 |
|          mobilenet_v2           |    1    | 0.875023 |
|           timm_nfnet            |    1    | 0.873196 |
|        timm_efficientnet        |    1    | 0.872116 |
|  timm_vision_transformer_large  |    1    | 0.869408 |
|          squeezenet1_1          |    1    | 0.867648 |
|            moondream            |    1    | 0.864693 |
|       mobilenet_v3_large        |    1    | 0.862018 |
|           mnasnet1_0            |    1    | 0.861807 |
|          lennard_jones          |    1    | 0.861111 |
|              dcgan              |    1    | 0.860643 |
|              vgg16              |    1    | 0.860399 |
|          maml_omniglot          |    5    | 0.859706 |
|     functorch_maml_omniglot     |    1    | 0.858766 |
|     timm_vision_transformer     |    1    | 0.856637 |
|         phlippe_resnet          |    1    | 0.84743  |
|       shufflenet_v2_x1_0        |    1    | 0.845052 |
|          timm_resnest           |    1    | 0.844205 |
|             alexnet             |    1    | 0.843366 |
|      functorch_dp_cifar10       |    1    | 0.839982 |
|     nvidia_deeprecommender      |    1    | 0.837071 |
|     pyhpc_equation_of_state     |    1    | 0.833613 |
|           hf_Reformer           |    1    | 0.831033 |
|         resnext50_32x4d         |    1    | 0.828364 |
|             yolov3              |    1    | 0.828231 |
|            resnet18             |    1    | 0.811124 |
|            resnet50             |    1    | 0.808672 |
|     pyhpc_isoneutral_mixing     |    1    | 0.80858  |
|           timm_vovnet           |    1    | 0.807799 |
|        phlippe_densenet         |    1    | 0.807017 |
|           densenet121           |    1    | 0.802644 |
|              maml               |    1    | 0.794512 |
|           timm_regnet           |    1    | 0.793914 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.763422 |
|            resnet152            |    1    | 0.759933 |
|        timm_efficientdet        |    0    |   0.0    |
|         DALLE2_pytorch          |    0    |   0.0    |
|              moco               |    0    |   0.0    |
|       doctr_det_predictor       |    0    |   0.0    |
+---------------------------------+---------+----------+

Absolute latency (ms)

+---------------------------------+---------+-------------+
|              name               |   bs    |  inductor   |
+---------------------------------+---------+-------------+
|           hf_T5_base            |    1    | 9783.72824  |
|          hf_GPT2_large          |    1    | 3041.937894 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 2913.886533 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 2901.175214 |
|            moondream            |    1    | 2439.478399 |
|           hf_T5_large           |    1    | 2180.167034 |
|        hf_distil_whisper        |    1    | 1982.765005 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1758.876977 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1740.536324 |
|          pytorch_unet           |    1    | 1502.767747 |
|       Background_Matting        |    1    | 1462.367276 |
|             demucs              |    1    | 1201.644335 |
|  timm_vision_transformer_large  |    1    | 958.298881  |
|    detectron2_fcos_r_50_fpn     |    1    | 706.783515  |
|           hf_BigBird            |    1    | 516.727672  |
|          hf_Longformer          |    1    |  488.92735  |
|      torch_multimodal_clip      |    1    | 454.275153  |
|          hf_Bert_large          |    1    | 439.726162  |
|             hf_Bart             |    1    | 240.054156  |
|              hf_T5              |    1    | 235.049702  |
|         pytorch_stargan         |   16    | 234.428494  |
|          fastNLP_Bert           |    1    | 220.559134  |
|            hf_Albert            |    1    | 192.839198  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 182.953337  |
|       speech_transformer        |    1    | 177.054787  |
|             hf_Bert             |    1    | 171.418507  |
|           hf_Reformer           |    1    | 169.117118  |
|             hf_GPT2             |    1    | 129.131413  |
|          hf_DistilBert          |    1    | 102.033127  |
|        basic_gnn_edgecnn        |    1    |  89.017607  |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  86.347656  |
|             yolov3              |    1    |  70.00464   |
|          BERT_pytorch           |    1    |  50.131716  |
|              maml               |    1    |  48.38877   |
|              vgg16              |    1    |  40.345697  |
|     nvidia_deeprecommender      |    1    |  38.931343  |
|           timm_nfnet            |    1    |  35.706769  |
|           timm_regnet           |    1    |  35.496561  |
|           tts_angular           |    1    |  30.036513  |
|          basic_gnn_gcn          |    1    |  29.201184  |
|            resnet152            |    1    |  28.776414  |
|     timm_vision_transformer     |    1    |  18.553444  |
|         basic_gnn_sage          |    1    |  15.902727  |
|          basic_gnn_gin          |    1    |  15.243177  |
|         resnext50_32x4d         |    1    |  12.949256  |
|           densenet121           |    1    |  12.747016  |
|           timm_vovnet           |    1    |  12.151468  |
|             alexnet             |    1    |  11.693379  |
|              llama              |    1    |  11.153778  |
|            resnet50             |    1    |  10.564609  |
|        timm_efficientnet        |    1    |   9.2094    |
|     resnet50_quantized_qat      |    1    |  7.599557   |
|          timm_resnest           |    1    |  6.178243   |
|      doctr_reco_predictor       |    1    |  5.279622   |
|   mobilenet_v2_quantized_qat    |    1    |  4.796093   |
|            resnet18             |    1    |  3.762343   |
|       mobilenet_v3_large        |    1    |  3.753954   |
|           mnasnet1_0            |    1    |  3.125732   |
|          mobilenet_v2           |    1    |  3.054001   |
|       shufflenet_v2_x1_0        |    1    |  2.920329   |
|         LearningToPaint         |    1    |  2.443249   |
|          squeezenet1_1          |    1    |  1.895159   |
|      functorch_dp_cifar10       |    1    |  1.784321   |
|         opacus_cifar10          |    1    |  1.774865   |
|        phlippe_densenet         |    1    |  1.723219   |
|               drq               |    1    |   0.91958   |
|         phlippe_resnet          |    1    |  0.635289   |
|          maml_omniglot          |    5    |  0.601353   |
|        soft_actor_critic        |   256   |  0.584675   |
|              dcgan              |    1    |  0.521611   |
|     functorch_maml_omniglot     |    1    |  0.516523   |
|              dlrm               |    1    |  0.480208   |
|     pyhpc_isoneutral_mixing     |    1    |  0.029168   |
|     pyhpc_equation_of_state     |    1    |  0.021944   |
|          lennard_jones          |    1    |  0.017856   |
|        timm_efficientdet        |    0    |     0.0     |
|              moco               |    0    |     0.0     |
|       doctr_det_predictor       |    0    |     0.0     |
|         DALLE2_pytorch          |    0    |     0.0     |
+---------------------------------+---------+-------------+

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|          MobileBertForMaskedLM          | 1  | 3.407654 |
|     PegasusForConditionalGeneration     | 1  | 2.803089 |
|             XGLMForCausalLM             | 1  | 2.675791 |
|     MobileBertForQuestionAnswering      | 1  | 2.668054 |
|          DistilBertForMaskedLM          | 1  | 2.630945 |
|     M2M100ForConditionalGeneration      | 1  | 2.581039 |
|           PegasusForCausalLM            | 1  | 2.576536 |
|          BlenderbotForCausalLM          | 1  | 2.531334 |
|     DistilBertForQuestionAnswering      | 1  | 2.500604 |
|            YituTechConvBert             | 1  | 2.456637 |
|       BlenderbotSmallForCausalLM        | 1  | 2.409704 |
|       MT5ForConditionalGeneration       | 1  | 2.348492 |
|         Speech2Text2ForCausalLM         | 1  | 2.324935 |
| BlenderbotSmallForConditionalGeneration | 1  | 2.285014 |
|            XLNetLMHeadModel             | 1  | 1.879495 |
|           DebertaForMaskedLM            | 1  | 1.868532 |
|       DebertaForQuestionAnswering       | 1  | 1.848949 |
|           LayoutLMForMaskedLM           | 1  | 1.824124 |
|           ElectraForCausalLM            | 1  | 1.812921 |
|             BertForMaskedLM             | 1  | 1.810476 |
|           RobertaForCausalLM            | 1  | 1.799144 |
|                CamemBert                | 1  | 1.779235 |
|    LayoutLMForSequenceClassification    | 1  | 1.734932 |
|            TrOCRForCausalLM             | 1  | 1.719084 |
|       RobertaForQuestionAnswering       | 1  | 1.694367 |
|        BertForQuestionAnswering         | 1  | 1.690076 |
|         MegatronBertForCausalLM         | 1  | 1.655337 |
|    MegatronBertForQuestionAnswering     | 1  | 1.654729 |
|       ElectraForQuestionAnswering       | 1  | 1.62906  |
|               DistillGPT2               | 1  | 1.542819 |
|          DebertaV2ForMaskedLM           | 1  | 1.498284 |
|             OPTForCausalLM              | 1  | 1.458512 |
|      GPT2ForSequenceClassification      | 1  | 1.447885 |
|      DebertaV2ForQuestionAnswering      | 1  | 1.447355 |
|             BartForCausalLM             | 1  | 1.373231 |
|      BartForConditionalGeneration       | 1  | 1.341487 |
|            PLBartForCausalLM            | 1  | 1.330423 |
|     PLBartForConditionalGeneration      | 1  | 1.32426  |
|            MBartForCausalLM             | 1  | 1.302535 |
|      MBartForConditionalGeneration      | 1  | 1.281657 |
|               GoogleFnet                | 1  | 1.224514 |
|       T5ForConditionalGeneration        | 1  | 1.135165 |
|                 T5Small                 | 1  | 1.131078 |
|            AlbertForMaskedLM            | 1  | 1.123514 |
|       AlbertForQuestionAnswering        | 1  | 1.116248 |
|          AllenaiLongformerBase          | 1  | 0.814742 |
+-----------------------------------------+----+----------+
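
For readers who want to summarize a table like the one above, here is a minimal sketch that parses the bordered text layout and computes a geometric mean of the `inductor` column. The helper names `parse_ascii_table` and `geomean` are hypothetical and are not part of the benchmark tooling. The geometric mean is just one common way to aggregate speedup ratios and is not necessarily the statistic this dashboard uses.

```python
import math


def parse_ascii_table(text):
    """Parse a '+---+'-bordered table into a list of dicts keyed by header name."""
    rows = []
    header = None
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines such as +-----+ and any blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def geomean(values):
    """Geometric mean of strictly positive numbers."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Two rows copied from the table above, used only as sample input.
sample = """
+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|          MobileBertForMaskedLM          | 1  | 3.407654 |
|          AllenaiLongformerBase          | 1  | 0.814742 |
+-----------------------------------------+----+----------+
"""

rows = parse_ascii_table(sample)
speedups = [float(row["inductor"]) for row in rows]
print(len(rows), geomean(speedups))
```

The same parser should work on the latency and memory tables, since they share this layout. Some tables in this thread list models that did not run with a value of 0.0; those rows would need to be filtered out first, because the logarithm is undefined at zero.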

Accuracy

+-----------------------------------------+----+------------------+
|                  name                   | bs |     inductor     |
+-----------------------------------------+----+------------------+
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip |
|                CamemBert                | 1  |       pass       |
|       AlbertForQuestionAnswering        | 1  |       pass       |
|           DebertaForMaskedLM            | 1  |       pass       |
|          AllenaiLongformerBase          | 1  |       pass       |
|             BartForCausalLM             | 1  |       pass       |
|      BartForConditionalGeneration       | 1  |       pass       |
|             BertForMaskedLM             | 1  |       pass       |
|        BertForQuestionAnswering         | 1  |       pass       |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |
|       DebertaForQuestionAnswering       | 1  |       pass       |
|           LayoutLMForMaskedLM           | 1  |       pass       |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |
|          DistilBertForMaskedLM          | 1  |       pass       |
|     DistilBertForQuestionAnswering      | 1  |       pass       |
|               DistillGPT2               | 1  |       pass       |
|           ElectraForCausalLM            | 1  |       pass       |
|       ElectraForQuestionAnswering       | 1  |       pass       |
|               GoogleFnet                | 1  |       pass       |
|            YituTechConvBert             | 1  |       pass       |
|    LayoutLMForSequenceClassification    | 1  |       pass       |
|            MBartForCausalLM             | 1  |       pass       |
|            XLNetLMHeadModel             | 1  |       pass       |
|             XGLMForCausalLM             | 1  |       pass       |
|            AlbertForMaskedLM            | 1  |       pass       |
|      MBartForConditionalGeneration      | 1  |       pass       |
|       MT5ForConditionalGeneration       | 1  |       pass       |
|         MegatronBertForCausalLM         | 1  |       pass       |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |
|          MobileBertForMaskedLM          | 1  |       pass       |
|     MobileBertForQuestionAnswering      | 1  |       pass       |
|             OPTForCausalLM              | 1  |       pass       |
|            PLBartForCausalLM            | 1  |       pass       |
|     PLBartForConditionalGeneration      | 1  |       pass       |
|           PegasusForCausalLM            | 1  |       pass       |
|     M2M100ForConditionalGeneration      | 1  |       pass       |
|     PegasusForConditionalGeneration     | 1  |       pass       |
|           RobertaForCausalLM            | 1  |       pass       |
|       RobertaForQuestionAnswering       | 1  |       pass       |
|         Speech2Text2ForCausalLM         | 1  |       pass       |
|       T5ForConditionalGeneration        | 1  |       pass       |
|                 T5Small                 | 1  |       pass       |
|            TrOCRForCausalLM             | 1  |       pass       |
|      GPT2ForSequenceClassification      | 1  |  fail_accuracy   |
+-----------------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------------+----+-----------+
|                  name                   | bs | inductor  |
+-----------------------------------------+----+-----------+
|          AllenaiLongformerBase          | 1  | 47.492696 |
|     MobileBertForQuestionAnswering      | 1  | 36.182355 |
|          MobileBertForMaskedLM          | 1  | 36.112172 |
|     PegasusForConditionalGeneration     | 1  | 33.667338 |
|     M2M100ForConditionalGeneration      | 1  | 32.450932 |
|      MBartForConditionalGeneration      | 1  | 31.266022 |
|          BlenderbotForCausalLM          | 1  | 31.057471 |
|             XGLMForCausalLM             | 1  | 28.37595  |
|       MT5ForConditionalGeneration       | 1  | 27.950137 |
|            XLNetLMHeadModel             | 1  | 27.111493 |
|         MegatronBertForCausalLM         | 1  | 25.544746 |
|                 T5Small                 | 1  | 24.916875 |
|       T5ForConditionalGeneration        | 1  | 24.878571 |
|    MegatronBertForQuestionAnswering     | 1  | 24.57836  |
|      DebertaV2ForQuestionAnswering      | 1  | 23.994927 |
|          DebertaV2ForMaskedLM           | 1  | 23.512694 |
| BlenderbotSmallForConditionalGeneration | 1  | 22.413356 |
|           PegasusForCausalLM            | 1  | 21.666225 |
|      BartForConditionalGeneration       | 1  | 21.28709  |
|            YituTechConvBert             | 1  | 20.927318 |
|     PLBartForConditionalGeneration      | 1  | 20.388096 |
|            MBartForCausalLM             | 1  | 19.692146 |
|      GPT2ForSequenceClassification      | 1  | 19.689505 |
|             OPTForCausalLM              | 1  | 19.205498 |
|               DistillGPT2               | 1  | 17.330151 |
|       DebertaForQuestionAnswering       | 1  | 16.547476 |
|           DebertaForMaskedLM            | 1  | 16.45255  |
|       AlbertForQuestionAnswering        | 1  | 16.437589 |
|            AlbertForMaskedLM            | 1  | 16.286948 |
|            TrOCRForCausalLM             | 1  | 16.237993 |
|       RobertaForQuestionAnswering       | 1  | 16.029546 |
|                CamemBert                | 1  | 15.999096 |
|           RobertaForCausalLM            | 1  | 15.914386 |
|    LayoutLMForSequenceClassification    | 1  | 15.741634 |
|           LayoutLMForMaskedLM           | 1  | 15.212241 |
|           ElectraForCausalLM            | 1  | 15.173034 |
|       ElectraForQuestionAnswering       | 1  | 15.136537 |
|        BertForQuestionAnswering         | 1  | 14.909738 |
|             BertForMaskedLM             | 1  | 14.802126 |
|       BlenderbotSmallForCausalLM        | 1  | 14.130658 |
|         Speech2Text2ForCausalLM         | 1  | 13.998887 |
|               GoogleFnet                | 1  | 13.589718 |
|     DistilBertForQuestionAnswering      | 1  | 13.301127 |
|          DistilBertForMaskedLM          | 1  | 13.277916 |
|            PLBartForCausalLM            | 1  | 13.252254 |
|             BartForCausalLM             | 1  | 12.688647 |
+-----------------------------------------+----+-----------+

Peak Memory Compression Ratio

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|             OPTForCausalLM              | 1  | 0.977731 |
|            XLNetLMHeadModel             | 1  | 0.973768 |
|               DistillGPT2               | 1  | 0.95223  |
|             BartForCausalLM             | 1  | 0.946806 |
|            PLBartForCausalLM            | 1  | 0.944955 |
|            MBartForCausalLM             | 1  | 0.943901 |
|    MegatronBertForQuestionAnswering     | 1  | 0.943112 |
|      GPT2ForSequenceClassification      | 1  | 0.942699 |
|       RobertaForQuestionAnswering       | 1  | 0.939819 |
|       MT5ForConditionalGeneration       | 1  | 0.935334 |
|           LayoutLMForMaskedLM           | 1  | 0.933962 |
|          BlenderbotForCausalLM          | 1  | 0.93262  |
|                CamemBert                | 1  | 0.931771 |
|                 T5Small                 | 1  | 0.929771 |
|       T5ForConditionalGeneration        | 1  | 0.929086 |
|            YituTechConvBert             | 1  | 0.929048 |
|           RobertaForCausalLM            | 1  | 0.926604 |
|     PLBartForConditionalGeneration      | 1  | 0.925991 |
|             BertForMaskedLM             | 1  | 0.925342 |
|            TrOCRForCausalLM             | 1  | 0.925325 |
|           DebertaForMaskedLM            | 1  | 0.922888 |
|      BartForConditionalGeneration       | 1  | 0.922409 |
|      MBartForConditionalGeneration      | 1  | 0.921198 |
|     M2M100ForConditionalGeneration      | 1  | 0.919944 |
|             XGLMForCausalLM             | 1  | 0.918331 |
|    LayoutLMForSequenceClassification    | 1  | 0.916369 |
|        BertForQuestionAnswering         | 1  | 0.916026 |
|      DebertaV2ForQuestionAnswering      | 1  | 0.915334 |
|          AllenaiLongformerBase          | 1  | 0.908998 |
|          DebertaV2ForMaskedLM           | 1  | 0.907013 |
|       BlenderbotSmallForCausalLM        | 1  | 0.904336 |
|          DistilBertForMaskedLM          | 1  | 0.902414 |
|       DebertaForQuestionAnswering       | 1  | 0.902287 |
|           PegasusForCausalLM            | 1  | 0.895507 |
|           ElectraForCausalLM            | 1  | 0.891247 |
|         MegatronBertForCausalLM         | 1  | 0.888518 |
|     PegasusForConditionalGeneration     | 1  | 0.88561  |
|     DistilBertForQuestionAnswering      | 1  | 0.875987 |
|         Speech2Text2ForCausalLM         | 1  | 0.864682 |
|       ElectraForQuestionAnswering       | 1  | 0.862312 |
|               GoogleFnet                | 1  | 0.855612 |
| BlenderbotSmallForConditionalGeneration | 1  | 0.848541 |
|          MobileBertForMaskedLM          | 1  | 0.780104 |
|     MobileBertForQuestionAnswering      | 1  | 0.758844 |
|            AlbertForMaskedLM            | 1  | 0.687663 |
|       AlbertForQuestionAnswering        | 1  | 0.667702 |
+-----------------------------------------+----+----------+

Absolute latency (ms)

+-----------------------------------------+----+-------------+
|                  name                   | bs |  inductor   |
+-----------------------------------------+----+-------------+
|            AlbertForMaskedLM            | 1  | 4137.023262 |
|       AlbertForQuestionAnswering        | 1  | 4127.645545 |
|             OPTForCausalLM              | 1  | 2140.038838 |
|      MBartForConditionalGeneration      | 1  | 1783.10622  |
|      BartForConditionalGeneration       | 1  | 1409.012679 |
|          DebertaV2ForMaskedLM           | 1  | 1332.31877  |
|          AllenaiLongformerBase          | 1  | 1104.158711 |
|            XLNetLMHeadModel             | 1  | 1042.866665 |
|      DebertaV2ForQuestionAnswering      | 1  | 1033.308426 |
|            MBartForCausalLM             | 1  | 961.716202  |
|                 T5Small                 | 1  | 806.944309  |
|       T5ForConditionalGeneration        | 1  | 806.674494  |
|          BlenderbotForCausalLM          | 1  | 750.152307  |
|     PLBartForConditionalGeneration      | 1  |  650.15265  |
|             BartForCausalLM             | 1  | 608.676922  |
|         MegatronBertForCausalLM         | 1  | 505.911722  |
|      GPT2ForSequenceClassification      | 1  | 467.984981  |
|    MegatronBertForQuestionAnswering     | 1  | 463.674145  |
|               GoogleFnet                | 1  | 449.645772  |
|            PLBartForCausalLM            | 1  | 374.388284  |
|             XGLMForCausalLM             | 1  | 240.315914  |
|           DebertaForMaskedLM            | 1  |  212.57168  |
|     M2M100ForConditionalGeneration      | 1  | 210.199459  |
|           RobertaForCausalLM            | 1  |  204.85136  |
|            YituTechConvBert             | 1  | 194.781048  |
|                CamemBert                | 1  | 179.901983  |
|     PegasusForConditionalGeneration     | 1  | 176.826632  |
|           LayoutLMForMaskedLM           | 1  | 176.289278  |
|             BertForMaskedLM             | 1  | 175.285384  |
|            TrOCRForCausalLM             | 1  | 172.135147  |
|               DistillGPT2               | 1  | 163.844628  |
|       DebertaForQuestionAnswering       | 1  | 145.904757  |
|    LayoutLMForSequenceClassification    | 1  |  140.42053  |
|        BertForQuestionAnswering         | 1  | 140.065959  |
|       RobertaForQuestionAnswering       | 1  | 139.702369  |
|       MT5ForConditionalGeneration       | 1  | 109.964046  |
|           PegasusForCausalLM            | 1  |  87.294164  |
| BlenderbotSmallForConditionalGeneration | 1  |  52.32288   |
|           ElectraForCausalLM            | 1  |  49.945237  |
|          DistilBertForMaskedLM          | 1  |  29.925289  |
|          MobileBertForMaskedLM          | 1  |  29.488203  |
|       ElectraForQuestionAnswering       | 1  |  29.090274  |
|       BlenderbotSmallForCausalLM        | 1  |  27.969184  |
|     DistilBertForQuestionAnswering      | 1  |  18.170235  |
|     MobileBertForQuestionAnswering      | 1  |  16.600136  |
|         Speech2Text2ForCausalLM         | 1  |  5.405227   |
+-----------------------------------------+----+-------------+

timm_models suite with amp precision

Performance speedup

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|           resnest101e           | 1  | 5.382345 |
|          pnasnet5large          | 1  | 5.356115 |
|        ese_vovnet19b_dw         | 1  | 5.281948 |
|         mobilenetv2_100         | 1  | 4.844529 |
|          cspdarknet53           | 1  | 4.75785  |
|     swsl_resnext101_32x16d      | 1  | 4.595599 |
|           fbnetc_100            | 1  | 4.592137 |
|           mnasnet_100           | 1  | 4.541181 |
|          spnasnet_100           | 1  | 4.479129 |
|          inception_v3           | 1  | 4.456737 |
|          botnet26t_256          | 1  | 4.423364 |
|           res2next50            | 1  | 4.400259 |
|             dla102              | 1  | 4.357039 |
|       gluon_inception_v3        | 1  | 4.34969  |
|        adv_inception_v3         | 1  | 4.283916 |
|           selecsls42b           | 1  | 4.249402 |
|            gernet_l             | 1  | 4.097928 |
|      mobilenetv3_large_100      | 1  | 3.935802 |
|           dm_nfnet_f0           | 1  | 3.907828 |
|        res2net50_14w_8s         | 1  | 3.879919 |
|            fbnetv3_b            | 1  | 3.835599 |
|        res2net101_26w_4s        | 1  | 3.777076 |
|            hrnet_w18            | 1  | 3.745326 |
|            nfnet_l0             | 1  | 3.743917 |
|           regnety_002           | 1  | 3.562605 |
|            lcnet_050            | 1  | 3.447718 |
|            repvgg_a2            | 1  | 3.440078 |
|       eca_botnext26ts_256       | 1  | 3.318491 |
|          ghostnet_100           | 1  | 3.229233 |
|         visformer_small         | 1  | 3.051099 |
|         poolformer_m36          | 1  | 2.89116  |
|             dpn107              | 1  | 2.889853 |
|        eca_halonext26ts         | 1  | 2.722109 |
|           rexnet_100            | 1  | 2.641254 |
|           mobilevit_s           | 1  | 2.619138 |
|            tinynet_a            | 1  | 2.513059 |
|            levit_128            | 1  | 2.492718 |
|       tf_efficientnet_b0        | 1  | 2.344159 |
|           tf_mixnet_l           | 1  | 2.248406 |
|            mixnet_l             | 1  | 2.140655 |
|        twins_pcpvt_base         | 1  | 2.033565 |
|          gmixer_24_224          | 1  | 1.877029 |
|           volo_d1_224           | 1  | 1.852166 |
|          gmlp_s16_224           | 1  | 1.780351 |
|        convmixer_768_32         | 1  | 1.770857 |
|            pit_b_224            | 1  | 1.719256 |
|      beit_base_patch16_224      | 1  | 1.707349 |
|  swin_base_patch4_window7_224   | 1  | 1.62261  |
|      vit_base_patch16_224       | 1  | 1.595081 |
|      xcit_large_24_p8_224       | 1  | 1.561849 |
|          convnext_base          | 1  | 1.539251 |
|           convit_base           | 1  | 1.519424 |
|          cait_m36_384           | 1  | 1.435842 |
|         crossvit_9_240          | 1  | 1.434054 |
|        tnt_s_patch16_224        | 1  | 1.379968 |
| deit_base_distilled_patch16_224 | 1  | 1.343949 |
|          jx_nest_base           | 1  | 1.334892 |
|        sebotnet33ts_256         | 1  | 1.319965 |
|          resmlp_12_224          | 1  | 1.185635 |
|          mixer_b16_224          | 1  | 1.166584 |
+---------------------------------+----+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 1  |     pass      |
|           dm_nfnet_f0           | 1  |     pass      |
|            mixnet_l             | 1  |     pass      |
|       eca_botnext26ts_256       | 1  |     pass      |
|         crossvit_9_240          | 1  |     pass      |
| deit_base_distilled_patch16_224 | 1  |     pass      |
|          convnext_base          | 1  |     pass      |
|           regnety_002           | 1  |     pass      |
|           convit_base           | 1  |     pass      |
|          gmlp_s16_224           | 1  |     pass      |
|          cait_m36_384           | 1  |     pass      |
|             dpn107              | 1  |     pass      |
|      vit_base_patch16_224       | 1  |     pass      |
|          cspdarknet53           | 1  |     pass      |
|        eca_halonext26ts         | 1  |     pass      |
|        ese_vovnet19b_dw         | 1  |     pass      |
|           fbnetc_100            | 1  |     pass      |
|            fbnetv3_b            | 1  |     pass      |
|            gernet_l             | 1  |     pass      |
|          botnet26t_256          | 1  |     pass      |
|       gluon_inception_v3        | 1  |     pass      |
|          gmixer_24_224          | 1  |     pass      |
|          jx_nest_base           | 1  |     pass      |
|        convmixer_768_32         | 1  |     pass      |
|        twins_pcpvt_base         | 1  |     pass      |
|         poolformer_m36          | 1  |     pass      |
|            lcnet_050            | 1  |     pass      |
|          mixer_b16_224          | 1  |     pass      |
|        tnt_s_patch16_224        | 1  |     pass      |
|           mnasnet_100           | 1  |     pass      |
|         mobilenetv2_100         | 1  |     pass      |
|      mobilenetv3_large_100      | 1  |     pass      |
|           mobilevit_s           | 1  |     pass      |
|            nfnet_l0             | 1  |     pass      |
|            pit_b_224            | 1  |     pass      |
|      beit_base_patch16_224      | 1  |     pass      |
|            repvgg_a2            | 1  |     pass      |
|  swin_base_patch4_window7_224   | 1  |     pass      |
|          inception_v3           | 1  |     pass      |
|           tf_mixnet_l           | 1  |     pass      |
|       tf_efficientnet_b0        | 1  |     pass      |
|            tinynet_a            | 1  |     pass      |
|          spnasnet_100           | 1  |     pass      |
|        sebotnet33ts_256         | 1  |     pass      |
|          resmlp_12_224          | 1  |     pass      |
|         coat_lite_mini          | 1  |  fail_to_run  |
|           resnest101e           | 1  | fail_accuracy |
|            levit_128            | 1  | fail_accuracy |
|          ghostnet_100           | 1  | fail_accuracy |
|          pnasnet5large          | 1  | fail_accuracy |
|        res2net101_26w_4s        | 1  | fail_accuracy |
|        res2net50_14w_8s         | 1  | fail_accuracy |
|             dla102              | 1  | fail_accuracy |
|     swsl_resnext101_32x16d      | 1  | fail_accuracy |
|           rexnet_100            | 1  | fail_accuracy |
|           selecsls42b           | 1  | fail_accuracy |
|           res2next50            | 1  | fail_accuracy |
|            hrnet_w18            | 1  | fail_accuracy |
|           volo_d1_224           | 1  | fail_accuracy |
|         visformer_small         | 1  | fail_accuracy |
|      xcit_large_24_p8_224       | 1  | fail_accuracy |
+---------------------------------+----+---------------+

Compilation latency (sec)

+---------------------------------+----+-----------+
|              name               | bs | inductor  |
+---------------------------------+----+-----------+
|          pnasnet5large          | 1  | 62.472559 |
|  swin_base_patch4_window7_224   | 1  | 61.716097 |
|           tf_mixnet_l           | 1  | 52.978734 |
|             dpn107              | 1  |  49.317   |
|          jx_nest_base           | 1  | 44.397741 |
|           rexnet_100            | 1  | 43.399844 |
|        res2net50_14w_8s         | 1  | 43.098643 |
|         crossvit_9_240          | 1  |  41.6744  |
|          ghostnet_100           | 1  | 40.377657 |
|            mixnet_l             | 1  | 39.18258  |
|        sebotnet33ts_256         | 1  | 38.890151 |
|            levit_128            | 1  | 38.713414 |
|         poolformer_m36          | 1  | 38.506178 |
|      xcit_large_24_p8_224       | 1  | 38.472027 |
|        tnt_s_patch16_224        | 1  | 37.872555 |
|        twins_pcpvt_base         | 1  | 37.63125  |
|           dm_nfnet_f0           | 1  | 37.252363 |
|        eca_halonext26ts         | 1  | 37.242547 |
|           mobilevit_s           | 1  | 36.380924 |
|          cait_m36_384           | 1  | 35.078352 |
|         visformer_small         | 1  | 34.571322 |
|           volo_d1_224           | 1  | 32.130895 |
|           resnest101e           | 1  | 31.727342 |
|       eca_botnext26ts_256       | 1  | 31.466678 |
|       tf_efficientnet_b0        | 1  | 31.320356 |
|            hrnet_w18            | 1  | 30.652824 |
|        res2net101_26w_4s        | 1  | 29.731423 |
|            nfnet_l0             | 1  | 28.730982 |
|          convnext_base          | 1  | 28.119997 |
|       gluon_inception_v3        | 1  | 27.198337 |
|        adv_inception_v3         | 1  | 27.195058 |
|          inception_v3           | 1  | 27.161974 |
|            tinynet_a            | 1  | 26.402645 |
|           res2next50            | 1  | 25.645254 |
|           convit_base           | 1  | 25.234083 |
|            pit_b_224            | 1  | 23.35073  |
|          botnet26t_256          | 1  | 22.726482 |
|             dla102              | 1  | 20.74277  |
|          cspdarknet53           | 1  | 20.705247 |
|            fbnetv3_b            | 1  | 20.47764  |
| deit_base_distilled_patch16_224 | 1  | 19.36066  |
|          gmixer_24_224          | 1  | 18.104369 |
|      vit_base_patch16_224       | 1  | 17.555762 |
|      mobilenetv3_large_100      | 1  | 17.538681 |
|        ese_vovnet19b_dw         | 1  | 17.43489  |
|          gmlp_s16_224           | 1  | 16.672756 |
|           regnety_002           | 1  | 14.140759 |
|      beit_base_patch16_224      | 1  | 13.877075 |
|          resmlp_12_224          | 1  | 13.797674 |
|            repvgg_a2            | 1  | 12.741665 |
|          mixer_b16_224          | 1  | 12.437604 |
|        convmixer_768_32         | 1  | 12.316906 |
|           selecsls42b           | 1  | 12.055945 |
|            lcnet_050            | 1  | 10.335146 |
|     swsl_resnext101_32x16d      | 1  | 9.351945  |
|           fbnetc_100            | 1  | 7.755633  |
|          spnasnet_100           | 1  | 7.717313  |
|            gernet_l             | 1  | 7.660946  |
|         mobilenetv2_100         | 1  | 7.592802  |
|           mnasnet_100           | 1  | 7.362309  |
+---------------------------------+----+-----------+

Peak Memory Compression Ratio

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|        convmixer_768_32         | 1  |  0.9128  |
|            nfnet_l0             | 1  | 0.912508 |
|          convnext_base          | 1  | 0.901706 |
|      vit_base_patch16_224       | 1  | 0.898571 |
|      beit_base_patch16_224      | 1  | 0.89755  |
|          pnasnet5large          | 1  | 0.897462 |
|           dm_nfnet_f0           | 1  | 0.895012 |
|          cait_m36_384           | 1  | 0.894497 |
|          resmlp_12_224          | 1  | 0.892803 |
| deit_base_distilled_patch16_224 | 1  | 0.891494 |
|        ese_vovnet19b_dw         | 1  | 0.889668 |
|  swin_base_patch4_window7_224   | 1  | 0.886923 |
|         mobilenetv2_100         | 1  | 0.886379 |
|           convit_base           | 1  | 0.886112 |
|           volo_d1_224           | 1  | 0.881365 |
|         poolformer_m36          | 1  | 0.877681 |
|          mixer_b16_224          | 1  | 0.877148 |
|         visformer_small         | 1  | 0.876765 |
|           mnasnet_100           | 1  | 0.874668 |
|      mobilenetv3_large_100      | 1  | 0.873412 |
|            pit_b_224            | 1  | 0.873178 |
|          spnasnet_100           | 1  | 0.872665 |
|          gmlp_s16_224           | 1  | 0.871714 |
|           fbnetc_100            | 1  | 0.871273 |
|          gmixer_24_224          | 1  | 0.868596 |
|        twins_pcpvt_base         | 1  | 0.867689 |
|            lcnet_050            | 1  | 0.867468 |
|        eca_halonext26ts         | 1  | 0.866935 |
|            fbnetv3_b            | 1  | 0.865371 |
|       eca_botnext26ts_256       | 1  | 0.864205 |
|            tinynet_a            | 1  | 0.863406 |
|       tf_efficientnet_b0        | 1  | 0.863227 |
|          botnet26t_256          | 1  | 0.862687 |
|           mobilevit_s           | 1  | 0.862439 |
|           rexnet_100            | 1  | 0.855583 |
|      xcit_large_24_p8_224       | 1  | 0.850456 |
|          jx_nest_base           | 1  | 0.850023 |
|          ghostnet_100           | 1  | 0.847435 |
|           regnety_002           | 1  | 0.845677 |
|           tf_mixnet_l           | 1  | 0.842164 |
|        tnt_s_patch16_224        | 1  | 0.840789 |
|            mixnet_l             | 1  | 0.837972 |
|        sebotnet33ts_256         | 1  | 0.835414 |
|         crossvit_9_240          | 1  | 0.830796 |
|            levit_128            | 1  | 0.826943 |
|             dpn107              | 1  | 0.826093 |
|          cspdarknet53           | 1  | 0.824691 |
|           res2next50            | 1  | 0.821295 |
|        res2net50_14w_8s         | 1  | 0.811405 |
|             dla102              | 1  | 0.811004 |
|       gluon_inception_v3        | 1  | 0.80872  |
|          inception_v3           | 1  | 0.808312 |
|        adv_inception_v3         | 1  | 0.807569 |
|            hrnet_w18            | 1  | 0.805046 |
|           resnest101e           | 1  | 0.802028 |
|           selecsls42b           | 1  | 0.795267 |
|            gernet_l             | 1  | 0.787455 |
|            repvgg_a2            | 1  | 0.781929 |
|        res2net101_26w_4s        | 1  | 0.774035 |
|     swsl_resnext101_32x16d      | 1  | 0.727484 |
+---------------------------------+----+----------+

Absolute latency (ms)

+---------------------------------+----+-------------+
|              name               | bs |  inductor   |
+---------------------------------+----+-------------+
|          cait_m36_384           | 1  | 1638.508807 |
|      xcit_large_24_p8_224       | 1  | 448.813181  |
|          pnasnet5large          | 1  |  123.87366  |
|          convnext_base          | 1  | 104.141579  |
|        convmixer_768_32         | 1  |   100.004   |
|          jx_nest_base           | 1  |  99.701369  |
|           convit_base           | 1  |  96.396719  |
|     swsl_resnext101_32x16d      | 1  |  93.649239  |
| deit_base_distilled_patch16_224 | 1  |  84.967815  |
|  swin_base_patch4_window7_224   | 1  |  83.227347  |
|      beit_base_patch16_224      | 1  |  76.197216  |
|          mixer_b16_224          | 1  |  73.140539  |
|      vit_base_patch16_224       | 1  |  72.251536  |
|             dpn107              | 1  |  69.542809  |
|            pit_b_224            | 1  |  65.117086  |
|         poolformer_m36          | 1  |  51.848099  |
|           dm_nfnet_f0           | 1  |   50.2731   |
|        tnt_s_patch16_224        | 1  |  49.727227  |
|        twins_pcpvt_base         | 1  |  43.057363  |
|        sebotnet33ts_256         | 1  |  41.790681  |
|           volo_d1_224           | 1  |  40.487297  |
|            nfnet_l0             | 1  |  34.129274  |
|           resnest101e           | 1  |  30.860728  |
|        res2net101_26w_4s        | 1  |  27.712088  |
|          gmlp_s16_224           | 1  |  27.092361  |
|          gmixer_24_224          | 1  |  25.278317  |
|          resmlp_12_224          | 1  |  22.453373  |
|           mobilevit_s           | 1  |  21.815094  |
|         visformer_small         | 1  |  21.386698  |
|            hrnet_w18            | 1  |  21.300143  |
|        adv_inception_v3         | 1  |  20.858975  |
|          inception_v3           | 1  |  20.655034  |
|       gluon_inception_v3        | 1  |  20.585864  |
|             dla102              | 1  |  18.622804  |
|        eca_halonext26ts         | 1  |  18.498803  |
|          cspdarknet53           | 1  |  17.20218   |
|           tf_mixnet_l           | 1  |  16.924235  |
|        res2net50_14w_8s         | 1  |  16.709336  |
|            mixnet_l             | 1  |  16.577223  |
|       eca_botnext26ts_256       | 1  |   14.7606   |
|           res2next50            | 1  |  14.747987  |
|         crossvit_9_240          | 1  |  14.641245  |
|            repvgg_a2            | 1  |  13.102081  |
|            gernet_l             | 1  |  11.615114  |
|          botnet26t_256          | 1  |  11.056721  |
|           selecsls42b           | 1  |  10.418943  |
|       tf_efficientnet_b0        | 1  |  9.669172   |
|        ese_vovnet19b_dw         | 1  |   9.22758   |
|           rexnet_100            | 1  |   8.48563   |
|            tinynet_a            | 1  |  8.139692   |
|            fbnetv3_b            | 1  |  7.962501   |
|            levit_128            | 1  |  5.893398   |
|          ghostnet_100           | 1  |  4.984154   |
|           fbnetc_100            | 1  |  3.791146   |
|      mobilenetv3_large_100      | 1  |  3.735017   |
|          spnasnet_100           | 1  |  3.489905   |
|           mnasnet_100           | 1  |  3.103795   |
|         mobilenetv2_100         | 1  |  3.051508   |
|           regnety_002           | 1  |  3.019299   |
|            lcnet_050            | 1  |  1.385196   |
+---------------------------------+----+-------------+
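
Because speedup is reported relative to native PyTorch, the speedup and absolute-latency tables above can be combined to estimate eager-mode latency. The following is a minimal sketch, assuming speedup = eager latency / inductor latency (the summary implies this but does not state the formula). `implied_eager_ms` is a hypothetical helper, and the three rows are copied from the two tables.

```python
# Rows copied from the speedup and absolute-latency tables above:
# name -> (speedup over eager, inductor latency in ms)
timm_rows = {
    "lcnet_050": (3.447718, 1.385196),
    "regnety_002": (3.562605, 3.019299),
    "mixer_b16_224": (1.166584, 73.140539),
}


def implied_eager_ms(speedup: float, inductor_ms: float) -> float:
    """Estimate eager latency, assuming speedup = eager_ms / inductor_ms."""
    return speedup * inductor_ms


for name, (speedup, inductor_ms) in timm_rows.items():
    eager_ms = implied_eager_ms(speedup, inductor_ms)
    print(f"{name}: eager ~{eager_ms:.2f} ms, inductor {inductor_ms:.2f} ms")
```

The result is only an estimate: the dashboard does not publish eager latencies, so the numbers cannot be checked against the source.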

@WeizhuoZhang-intel
Contributor

[amp] Performance Dashboard for amp precision -- Single-Socket Multi-threads (2024-04-27 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8488C. Each experiment runs one iteration of the forward and backward passes for training, and one iteration of the forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. Experimental setup does not have optimizer.

SW information:

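+-------------------+--------+------------------------------------------+
|        SW         | Branch |                  Commit                  |
+-------------------+--------+------------------------------------------+
|      Pytorch      |  main  | 7478b7f1cac9686f00edf3db4667cf86d2421531 |
|    Torchbench     |  main  |                 d6015d42                 |
|    torchaudio     |  main  |             2.2.0a0+ea437b3              |
|     torchtext     |  main  |             0.16.0a0+b0ebddc             |
|    torchvision    |  main  |             0.19.0a0+2c4665f             |
|     torchdata     |  main  |             0.7.1a0+0790338              |
| dynamo_benchmarks |  main  |                 nightly                  |
+-------------------+--------+------------------------------------------+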

HW information

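+------------------+----------------------------------------------------------------+
|       Item       |                             Value                              |
+------------------+----------------------------------------------------------------+
|   Manufacturer   |                           Amazon EC2                           |
|   Product Name   |                         c7i.metal-24xl                         |
|    CPU Model     |         Intel(R) Xeon(R) Platinum 8488C CPU @ 2.40GHz          |
| Installed Memory |           192GB (8x24GB DDR5 4800 MT/s [4800 MT/s])            |
|        OS        |                       Ubuntu 22.04.3 LTS                       |
|      Kernel      |                         6.2.0-1017-aws                         |
|    Microcode     |                           0x2b0004d0                           |
|       GCC        |           gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0            |
|      GLIBC       |            ldd (Ubuntu GLIBC 2.35-0ubuntu3.4) 2.35             |
|     Binutils     |             GNU ld (GNU Binutils for Ubuntu) 2.38              |
|      Python      |                         Python 3.8.18                          |
|     OpenSSL      | OpenSSL 3.2.0 23 Nov 2023 (Library: OpenSSL 3.2.0 23 Nov 2023) |
+------------------+----------------------------------------------------------------+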

Test command

export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
CORES=$(lscpu | grep Core | awk '{print $4}')
export OMP_NUM_THREADS=$CORES

python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--node_id 0" --devices=cpu --dtypes=float32 --inference --compilers=inductor --extra-args="--timeout 9000" 
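
The `CORES=...` line in the test command takes the fourth whitespace-separated field of the `lscpu` line that contains `Core`, which is typically `Core(s) per socket:`, and uses it as the OpenMP thread count. A rough Python equivalent is sketched below. The `lscpu` excerpt and the core count in it are made up for illustration and are not taken from the benchmark machine; `cores_per_socket` is a hypothetical helper.

```python
import os

# Hypothetical lscpu excerpt; the numbers are illustrative only.
SAMPLE_LSCPU = """\
Thread(s) per core:                 2
Core(s) per socket:                 32
Socket(s):                          1
"""


def cores_per_socket(lscpu_text: str) -> int:
    """Mirror `grep Core | awk '{print $4}'` for the first matching line.

    The match is case-sensitive, so "Thread(s) per core" is skipped.
    """
    for line in lscpu_text.splitlines():
        if "Core" in line:
            return int(line.split()[3])
    raise ValueError("no line containing 'Core' found")


cores = cores_per_socket(SAMPLE_LSCPU)
# Environment that would be handed to the benchmark subprocess.
env = dict(os.environ, OMP_NUM_THREADS=str(cores))
print(env["OMP_NUM_THREADS"])
```

Unlike the shell pipeline, this sketch stops at the first matching line; if `lscpu` printed several lines containing `Core`, the shell version would emit one value per line.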

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 78%, 62/79 | 100%, 46/46 | 75%, 45/60  |
+----------+------------+-------------+-------------+

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.80x    |    2.02x    |    2.33x    |
+----------+------------+-------------+-------------+
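
For readers who want to recompute a geometric mean from the per-model tables, here is a minimal sketch. It assumes that only rows with a `pass` or `pass_due_to_skip` accuracy status are kept, which is one reading of "we remove the models that fail accuracy checks". The four rows are a small sample from the torchbench tables in this comment, so the result is not expected to match the suite-level figure. `geomean_speedup` is a hypothetical helper.

```python
import math

# Small sample of torchbench rows: name -> (speedup, accuracy status)
sample_rows = {
    "hf_Bert": (2.157282, "pass"),
    "alexnet": (3.203805, "pass"),
    "dlrm": (1.066276, "pass"),
    "resnet50": (4.804628, "fail_accuracy"),  # dropped by the filter below
}


def geomean_speedup(rows: dict) -> float:
    """Geometric mean of speedups over rows whose status starts with 'pass'."""
    kept = [s for s, status in rows.values() if status.startswith("pass") and s > 0]
    if not kept:
        raise ValueError("no passing rows to average")
    # exp(mean(log(x))) is the geometric mean and avoids overflow in long products.
    return math.exp(sum(math.log(s) for s in kept) / len(kept))


print(f"{geomean_speedup(sample_rows):.2f}x")
```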

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   21.10    |    24.01    |    31.04    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.91x    |    0.97x    |    0.98x    |
+----------+------------+-------------+-------------+

torchbench suite with amp precision


Performance speedup

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_equation_of_state     | 1048576 | 11.846421 |
|          timm_resnest           |   32    | 5.145247  |
|            resnet50             |   32    | 4.804628  |
|          squeezenet1_1          |   16    | 4.506084  |
|         resnext50_32x4d         |    8    | 4.330565  |
|         phlippe_resnet          |   128   | 4.325408  |
|          mobilenet_v2           |   16    |  4.11363  |
|            resnet18             |    8    | 3.981659  |
|           mnasnet1_0            |   32    | 3.938204  |
|              vgg16              |    4    | 3.929082  |
|            resnet152            |   32    | 3.660472  |
|             yolov3              |    8    | 3.354989  |
|           timm_vovnet           |   32    | 3.337508  |
|       mobilenet_v3_large        |   32    | 3.322968  |
|             alexnet             |   128   | 3.203805  |
|       shufflenet_v2_x1_0        |   64    | 2.887261  |
|          maml_omniglot          |    5    |  2.63683  |
|           timm_regnet           |   32    | 2.560177  |
|        phlippe_densenet         |   128   | 2.534758  |
|        soft_actor_critic        |   256   | 2.503665  |
|        timm_efficientnet        |   64    | 2.467808  |
|             hf_GPT2             |    1    | 2.377862  |
|          lennard_jones          |  1000   | 2.277851  |
|              llama              |   32    | 2.221798  |
|           timm_nfnet            |   128   | 2.205328  |
|           densenet121           |   64    | 2.181728  |
|             hf_Bert             |    1    | 2.157282  |
|          hf_DistilBert          |    1    | 2.053225  |
|          pytorch_unet           |    1    |  2.04605  |
|               drq               |    1    | 2.001549  |
|     functorch_maml_omniglot     |    1    | 1.992311  |
|          hf_Bert_large          |    1    | 1.973619  |
|         LearningToPaint         |   96    | 1.916444  |
|           hf_T5_base            |    1    | 1.915616  |
|          BERT_pytorch           |    2    | 1.897983  |
|              dcgan              |   256   |  1.87616  |
|             hf_Bart             |    1    | 1.872421  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 1.869319  |
|          fastNLP_Bert           |    1    | 1.787624  |
|      doctr_reco_predictor       |    1    | 1.786283  |
|              hf_T5              |    1    | 1.733535  |
|            moondream            |    1    | 1.673536  |
|       Background_Matting        |    1    | 1.628582  |
|        basic_gnn_edgecnn        |    1    | 1.627669  |
|            hf_Albert            |    1    | 1.601856  |
|           hf_T5_large           |    1    | 1.574916  |
|          hf_GPT2_large          |    1    | 1.556307  |
|     timm_vision_transformer     |   32    | 1.489057  |
|        hf_distil_whisper        |    1    | 1.423693  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 1.415926  |
|          hf_Longformer          |    1    | 1.350847  |
|         pytorch_stargan         |   16    | 1.308187  |
|          basic_gnn_gin          |    1    | 1.302475  |
|       speech_transformer        |    1    | 1.268959  |
|    detectron2_fcos_r_50_fpn     |    1    | 1.256127  |
|         basic_gnn_sage          |    1    | 1.233769  |
|           hf_Reformer           |    1    | 1.227484  |
| detectron2_fasterrcnn_r_101_c4  |    1    |  1.20617  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1.153735  |
|     nvidia_deeprecommender      |   256   | 1.146289  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1.091754  |
|          basic_gnn_gcn          |    1    | 1.074517  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1.071867  |
|  timm_vision_transformer_large  |   32    | 1.071677  |
|              dlrm               |  2048   | 1.066276  |
|             demucs              |    1    | 1.052086  |
|           hf_BigBird            |    1    | 1.048293  |
|   mobilenet_v2_quantized_qat    |   96    | 1.004899  |
|     resnet50_quantized_qat      |   32    | 0.986927  |
|           tts_angular           |   64    | 0.986152  |
|     pyhpc_isoneutral_mixing     | 1048576 | 0.853854  |
|      torch_multimodal_clip      |   32    | 0.783132  |
|              maml               |    1    | 0.744462  |
|         opacus_cifar10          |   64    | 0.473324  |
|      functorch_dp_cifar10       |   64    | 0.454508  |
|         DALLE2_pytorch          |    0    |    0.0    |
|        timm_efficientdet        |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|       doctr_det_predictor       |    0    |    0.0    |
+---------------------------------+---------+-----------+

Accuracy

+---------------------------------+---------+--------------------+
|              name               |   bs    |      inductor      |
+---------------------------------+---------+--------------------+
|           hf_T5_large           |    4    |  pass_due_to_skip  |
|          hf_GPT2_large          |    4    |  pass_due_to_skip  |
|       Background_Matting        |    1    |  pass_due_to_skip  |
|  timm_vision_transformer_large  |    4    |  pass_due_to_skip  |
|              maml               |    1    |  pass_due_to_skip  |
|              dcgan              |    4    |        pass        |
|         basic_gnn_sage          |    1    |        pass        |
|          basic_gnn_gin          |    1    |        pass        |
|          basic_gnn_gcn          |    1    |        pass        |
|        basic_gnn_edgecnn        |    1    |        pass        |
|          fastNLP_Bert           |    4    |        pass        |
|         LearningToPaint         |    4    |        pass        |
|          hf_Longformer          |    4    |        pass        |
|           hf_Reformer           |    4    |        pass        |
|          lennard_jones          |    4    |        pass        |
|           hf_T5_base            |    4    |        pass        |
|          pytorch_unet           |    2    |        pass        |
|             demucs              |    1    |        pass        |
|              dlrm               |    4    |        pass        |
|    detectron2_fcos_r_50_fpn     |    4    |        pass        |
|        soft_actor_critic        |   256   |        pass        |
|               drq               |    1    |        pass        |
|             yolov3              |    4    |        pass        |
|      functorch_dp_cifar10       |    4    |        pass        |
|             hf_Bart             |    4    |        pass        |
|             hf_Bert             |    4    |        pass        |
|     functorch_maml_omniglot     |    1    |        pass        |
|          hf_Bert_large          |    4    |        pass        |
|           hf_BigBird            |    4    |        pass        |
|          hf_DistilBert          |    4    |        pass        |
|            hf_Albert            |    4    |        pass        |
|             hf_GPT2             |    2    |        pass        |
|     resnet50_quantized_qat      |    4    |        pass        |
|             alexnet             |    4    |        pass        |
|       speech_transformer        |    1    |        pass        |
|        hf_distil_whisper        |    4    |        pass        |
|          BERT_pytorch           |    4    |        pass        |
|            moondream            |    4    |        pass        |
|     pyhpc_equation_of_state     |    4    |        pass        |
|          squeezenet1_1          |    4    |        pass        |
|        phlippe_densenet         |    4    |        pass        |
|         opacus_cifar10          |    4    |        pass        |
|     nvidia_deeprecommender      |    4    |        pass        |
|          maml_omniglot          |    5    |        pass        |
|       mobilenet_v3_large        |    4    |        pass        |
|   mobilenet_v2_quantized_qat    |    4    |        pass        |
| pyhpc_turbulent_kinetic_energy  | 1048576 |        pass        |
|              llama              |    4    |        pass        |
|         phlippe_resnet          |    4    |        pass        |
|     pyhpc_isoneutral_mixing     |    4    |        pass        |
|           timm_vovnet           |    4    |        pass        |
|        timm_efficientnet        |    4    |        pass        |
|         pytorch_stargan         |   16    |        pass        |
|           timm_nfnet            |    4    |        pass        |
|     timm_vision_transformer     |    4    |        pass        |
|          timm_resnest           |    4    |        pass        |
|      torch_multimodal_clip      |    4    |        pass        |
|           tts_angular           |    4    |        pass        |
|              vgg16              |    4    |        pass        |
|           timm_regnet           |    4    |        pass        |
|              hf_T5              |    4    |        pass        |
|  pytorch_CycleGAN_and_pix2pix   |    1    |        pass        |
|        timm_efficientdet        |    0    | model_fail_to_load |
|         DALLE2_pytorch          |    0    | model_fail_to_load |
|              moco               |    0    | model_fail_to_load |
|         vision_maskrcnn         |    1    |    fail_to_run     |
|           Super_SloMo           |    4    |    fail_to_run     |
| detectron2_fasterrcnn_r_50_fpn  |    4    |    fail_to_run     |
| detectron2_fasterrcnn_r_101_fpn |    4    |    fail_to_run     |
|           mnasnet1_0            |    4    |   fail_accuracy    |
|          mobilenet_v2           |    4    |   fail_accuracy    |
| detectron2_fasterrcnn_r_101_c4  |    4    |   fail_accuracy    |
|           densenet121           |    4    |   fail_accuracy    |
|      doctr_reco_predictor       |    4    |   fail_accuracy    |
| detectron2_fasterrcnn_r_101_dc5 |    4    |   fail_accuracy    |
|       shufflenet_v2_x1_0        |    4    |   fail_accuracy    |
|         resnext50_32x4d         |    4    |   fail_accuracy    |
|            resnet50             |    4    |   fail_accuracy    |
|            resnet18             |    4    |   fail_accuracy    |
|            resnet152            |    4    |   fail_accuracy    |
| detectron2_fasterrcnn_r_50_dc5  |    4    |   fail_accuracy    |
|  detectron2_fasterrcnn_r_50_c4  |    4    |   fail_accuracy    |
|       doctr_det_predictor       |    0    | eager_fail_to_run  |
+---------------------------------+---------+--------------------+
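
The torchbench pass rate in the summary can be tallied from the status column above. The sketch below rests on two assumptions inferred from the counts rather than stated in the source: `pass_due_to_skip` counts as passing, and models with `model_fail_to_load` or `eager_fail_to_run` are left out of the denominator. `pass_rate` is a hypothetical helper.

```python
from collections import Counter

# Status counts tallied from the torchbench accuracy table above.
status_counts = Counter({
    "pass": 57,
    "pass_due_to_skip": 5,
    "fail_accuracy": 13,
    "fail_to_run": 4,
    "model_fail_to_load": 3,
    "eager_fail_to_run": 1,
})

# Assumption: skipped accuracy checks are reported as passing.
PASSING = {"pass", "pass_due_to_skip"}
# Assumption: models that never load, or fail in eager mode, are not counted.
EXCLUDED = {"model_fail_to_load", "eager_fail_to_run"}


def pass_rate(counts):
    """Return (passed, total) under the assumptions above."""
    passed = sum(n for status, n in counts.items() if status in PASSING)
    total = sum(n for status, n in counts.items() if status not in EXCLUDED)
    return passed, total


passed, total = pass_rate(status_counts)
print(f"{passed / total:.0%}, {passed}/{total}")  # prints "78%, 62/79"
```

Under these assumptions the tally matches the torchbench entry in the Passrate table, which suggests, but does not prove, that the dashboard uses the same rule.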

Compilation latency (sec)

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|           densenet121           |   64    | 79.774921 |
|           hf_BigBird            |    1    | 67.90861  |
|    detectron2_fcos_r_50_fpn     |    1    | 59.22134  |
|           hf_T5_large           |    1    | 52.935066 |
|              maml               |    1    | 49.051428 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 44.113319 |
|           timm_nfnet            |   128   | 42.849979 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 40.805285 |
|          hf_Longformer          |    1    | 40.654636 |
|            moondream            |    1    | 40.619983 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 38.489208 |
|      torch_multimodal_clip      |   32    | 37.982547 |
|           hf_Reformer           |    1    | 37.682739 |
|  timm_vision_transformer_large  |   32    | 37.066228 |
|          hf_GPT2_large          |    1    | 36.703785 |
|           hf_T5_base            |    1    | 35.301497 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 35.289454 |
|        phlippe_densenet         |   128   | 34.730776 |
|       speech_transformer        |    1    | 34.702835 |
|             demucs              |    1    | 31.410985 |
|        timm_efficientnet        |   64    | 29.016155 |
|        hf_distil_whisper        |    1    | 28.771464 |
|             yolov3              |    8    | 25.761111 |
|              hf_T5              |    1    | 24.87305  |
|         opacus_cifar10          |   64    | 22.966139 |
|      functorch_dp_cifar10       |   64    | 22.01646  |
|     pyhpc_isoneutral_mixing     | 1048576 | 21.850568 |
|          timm_resnest           |   32    | 21.42921  |
|          hf_Bert_large          |    1    | 21.408265 |
|              llama              |   32    | 21.316054 |
|             hf_GPT2             |    1    | 20.458066 |
|          BERT_pytorch           |    2    | 20.31987  |
|       shufflenet_v2_x1_0        |   64    | 20.096339 |
|       Background_Matting        |    1    | 19.52593  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 18.894171 |
|       mobilenet_v3_large        |   32    | 18.743042 |
|     timm_vision_transformer     |   32    | 18.549875 |
|           timm_regnet           |   32    | 18.00525  |
|          pytorch_unet           |    1    | 17.704531 |
|           timm_vovnet           |   32    | 17.654679 |
|          fastNLP_Bert           |    1    | 17.523612 |
|             hf_Bart             |    1    | 17.450311 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 15.530498 |
|            hf_Albert            |    1    | 15.147558 |
|             hf_Bert             |    1    | 14.891587 |
|          squeezenet1_1          |   16    | 13.606502 |
|         pytorch_stargan         |   16    | 13.575258 |
|          hf_DistilBert          |    1    | 12.770784 |
|            resnet152            |   32    | 12.32871  |
|      doctr_reco_predictor       |    1    | 11.275842 |
|          basic_gnn_gcn          |    1    | 10.987002 |
|              vgg16              |    4    | 10.633266 |
|         basic_gnn_sage          |    1    | 10.279225 |
|               drq               |    1    | 9.144867  |
|             alexnet             |   128   | 8.968616  |
|            resnet50             |   32    | 8.845064  |
|         resnext50_32x4d         |    8    | 8.784954  |
|     nvidia_deeprecommender      |   256   | 8.381569  |
|              dlrm               |  2048   | 8.206753  |
|     functorch_maml_omniglot     |    1    | 8.112864  |
|          maml_omniglot          |    5    | 7.808082  |
|            resnet18             |    8    | 7.725787  |
|          mobilenet_v2           |   16    | 7.660062  |
|          basic_gnn_gin          |    1    | 7.657166  |
|           mnasnet1_0            |   32    | 7.434165  |
|     pyhpc_equation_of_state     | 1048576 | 7.208224  |
|        basic_gnn_edgecnn        |    1    |  6.89665  |
|         LearningToPaint         |   96    | 6.758726  |
|        soft_actor_critic        |   256   | 6.601031  |
|         phlippe_resnet          |   128   | 6.341258  |
|          lennard_jones          |  1000   | 5.414023  |
|              dcgan              |   256   | 4.915043  |
|           tts_angular           |   64    | 4.774871  |
|   mobilenet_v2_quantized_qat    |   96    | 0.209496  |
|     resnet50_quantized_qat      |   32    | 0.184186  |
|        timm_efficientdet        |    0    |    0.0    |
|       doctr_det_predictor       |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+

Peak Memory Compression Ratio

+---------------------------------+---------+----------+
|              name               |   bs    | inductor |
+---------------------------------+---------+----------+
|              dlrm               |  2048   | 0.987365 |
|           timm_nfnet            |   128   | 0.98704  |
|           hf_T5_base            |    1    |  0.9824  |
|             demucs              |    1    | 0.982067 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 0.977717 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 0.977524 |
|        timm_efficientnet        |   64    | 0.976885 |
|           timm_regnet           |   32    | 0.973853 |
|             yolov3              |    8    | 0.972472 |
|       Background_Matting        |    1    | 0.971294 |
|              llama              |   32    | 0.970404 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 0.969893 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 0.968557 |
|          pytorch_unet           |    1    | 0.968467 |
|            resnet152            |   32    | 0.967254 |
|           densenet121           |   64    | 0.966605 |
|         LearningToPaint         |   96    | 0.966162 |
|           timm_vovnet           |   32    | 0.963787 |
|    detectron2_fcos_r_50_fpn     |    1    | 0.963128 |
|        basic_gnn_edgecnn        |    1    | 0.961958 |
|            resnet50             |   32    | 0.960191 |
|           hf_BigBird            |    1    | 0.959923 |
|     timm_vision_transformer     |   32    | 0.956559 |
|          timm_resnest           |   32    | 0.955505 |
|   mobilenet_v2_quantized_qat    |   96    | 0.954643 |
|     resnet50_quantized_qat      |   32    | 0.953916 |
|             alexnet             |   128   | 0.953047 |
|           mnasnet1_0            |   32    | 0.946008 |
|          basic_gnn_gcn          |    1    | 0.94533  |
|      torch_multimodal_clip      |   32    | 0.944949 |
|          mobilenet_v2           |   16    | 0.939757 |
|       mobilenet_v3_large        |   32    | 0.939657 |
|       shufflenet_v2_x1_0        |   64    | 0.938484 |
|     pyhpc_equation_of_state     | 1048576 | 0.936464 |
|         resnext50_32x4d         |    8    | 0.93421  |
|         basic_gnn_sage          |    1    | 0.931387 |
|          basic_gnn_gin          |    1    | 0.930128 |
|             hf_Bert             |    1    | 0.926673 |
|  timm_vision_transformer_large  |   32    | 0.924215 |
|       speech_transformer        |    1    | 0.923664 |
|      doctr_reco_predictor       |    1    | 0.922806 |
|     nvidia_deeprecommender      |   256   | 0.922262 |
|            moondream            |    1    | 0.920755 |
|          hf_GPT2_large          |    1    | 0.918804 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 0.918083 |
|          fastNLP_Bert           |    1    | 0.917225 |
|          hf_Bert_large          |    1    | 0.915293 |
|         pytorch_stargan         |   16    | 0.914745 |
|        phlippe_densenet         |   128   | 0.91474  |
|          BERT_pytorch           |    2    | 0.911908 |
|             hf_GPT2             |    1    | 0.90997  |
|            hf_Albert            |    1    | 0.904039 |
|               drq               |    1    | 0.903226 |
|              dcgan              |   256   | 0.902913 |
|            resnet18             |    8    | 0.902519 |
|          squeezenet1_1          |   16    | 0.901582 |
|          hf_Longformer          |    1    | 0.898121 |
|           tts_angular           |   64    | 0.895402 |
|          hf_DistilBert          |    1    | 0.89519  |
|         opacus_cifar10          |   64    | 0.891549 |
|             hf_Bart             |    1    | 0.890868 |
|              hf_T5              |    1    | 0.890855 |
|        soft_actor_critic        |   256   | 0.889807 |
|        hf_distil_whisper        |    1    | 0.886153 |
|              vgg16              |    4    | 0.880617 |
|         phlippe_resnet          |   128   | 0.876234 |
|      functorch_dp_cifar10       |   64    | 0.873231 |
|          lennard_jones          |  1000   | 0.867698 |
|          maml_omniglot          |    5    | 0.858306 |
|     functorch_maml_omniglot     |    1    | 0.856911 |
|           hf_T5_large           |    1    | 0.846756 |
|           hf_Reformer           |    1    | 0.818544 |
|              maml               |    1    | 0.780822 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.761336 |
|     pyhpc_isoneutral_mixing     | 1048576 | 0.567313 |
|         DALLE2_pytorch          |    0    |   0.0    |
|       doctr_det_predictor       |    0    |   0.0    |
|        timm_efficientdet        |    0    |   0.0    |
|              moco               |    0    |   0.0    |
+---------------------------------+---------+----------+

Absolute latency (ms)

+---------------------------------+---------+------------+
|              name               |   bs    |  inductor  |
+---------------------------------+---------+------------+
|  timm_vision_transformer_large  |   32    | 856.897075 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 785.93369  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 784.885324 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 705.155834 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 672.076366 |
|           hf_T5_base            |    1    | 428.568779 |
|          hf_GPT2_large          |    1    | 152.426699 |
|           hf_T5_large           |    1    | 126.527212 |
|           timm_nfnet            |   128   | 120.193756 |
|            moondream            |    1    | 115.371239 |
|        hf_distil_whisper        |    1    | 92.252611  |
|    detectron2_fcos_r_50_fpn     |    1    | 83.582014  |
|      torch_multimodal_clip      |   32    | 79.650565  |
|           hf_BigBird            |    1    |  79.3417   |
|       Background_Matting        |    1    | 67.053088  |
|          pytorch_unet           |    1    | 57.421464  |
|             demucs              |    1    | 53.018883  |
|           densenet121           |   64    | 52.796426  |
|              maml               |    1    | 43.834612  |
|           timm_regnet           |   32    |  41.15515  |
|          hf_Longformer          |    1    | 38.454033  |
|          hf_Bert_large          |    1    | 34.718287  |
|   mobilenet_v2_quantized_qat    |   96    | 30.671545  |
|     pyhpc_isoneutral_mixing     | 1048576 | 29.736467  |
|            resnet152            |   32    | 28.939512  |
|             yolov3              |    8    | 28.643905  |
| pyhpc_turbulent_kinetic_energy  | 1048576 |  27.97986  |
|        timm_efficientnet        |   64    | 24.127243  |
|       speech_transformer        |    1    | 23.239239  |
|           hf_Reformer           |    1    | 21.689994  |
|     nvidia_deeprecommender      |   256   | 20.930799  |
|     timm_vision_transformer     |   32    |  20.6048   |
|             hf_Bart             |    1    | 18.453769  |
|              hf_T5              |    1    | 18.176745  |
|          fastNLP_Bert           |    1    | 17.202063  |
|         pytorch_stargan         |   16    | 17.124079  |
|           timm_vovnet           |   32    | 16.429483  |
|            hf_Albert            |    1    | 14.633503  |
|         opacus_cifar10          |   64    | 14.438901  |
|      functorch_dp_cifar10       |   64    | 14.221714  |
|             hf_Bert             |    1    | 13.940141  |
|          BERT_pytorch           |    2    | 12.118743  |
|             hf_GPT2             |    1    | 11.049157  |
|            resnet50             |   32    | 10.872455  |
|     resnet50_quantized_qat      |   32    | 10.330701  |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  9.603668  |
|              llama              |   32    |  8.954674  |
|          timm_resnest           |   32    |  8.615809  |
|         LearningToPaint         |   96    |  8.136017  |
|          hf_DistilBert          |    1    |  8.132954  |
|           tts_angular           |   64    |  7.908875  |
|       shufflenet_v2_x1_0        |   64    |  7.703037  |
|          basic_gnn_gcn          |    1    |  7.531776  |
|       mobilenet_v3_large        |   32    |  6.514628  |
|        phlippe_densenet         |   128   |  6.432864  |
|        basic_gnn_edgecnn        |    1    |  6.305403  |
|              vgg16              |    4    |  6.106264  |
|             alexnet             |   128   |  5.985651  |
|         resnext50_32x4d         |    8    |  5.889975  |
|           mnasnet1_0            |   32    |  5.117751  |
|              dlrm               |  2048   |  4.191277  |
|         basic_gnn_sage          |    1    |  3.838782  |
|          mobilenet_v2           |   16    |  3.686188  |
|          basic_gnn_gin          |    1    |  3.387736  |
|      doctr_reco_predictor       |    1    |  2.88306   |
|          squeezenet1_1          |   16    |  2.589292  |
|              dcgan              |   256   |  2.556968  |
|            resnet18             |    8    |  2.142823  |
|     pyhpc_equation_of_state     | 1048576 |  1.530129  |
|         phlippe_resnet          |   128   |  1.242826  |
|               drq               |    1    |  0.762951  |
|     functorch_maml_omniglot     |    1    |  0.361814  |
|          maml_omniglot          |    5    |  0.256924  |
|        soft_actor_critic        |   256   |  0.235966  |
|          lennard_jones          |  1000   |  0.132375  |
|        timm_efficientdet        |    0    |    0.0     |
|              moco               |    0    |    0.0     |
|       doctr_det_predictor       |    0    |    0.0     |
|         DALLE2_pytorch          |    0    |    0.0     |
+---------------------------------+---------+------------+

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+-----+-----------+
|                  name                   | bs  | inductor  |
+-----------------------------------------+-----+-----------+
|            XLNetLMHeadModel             |  8  | 11.846873 |
|       ElectraForQuestionAnswering       | 64  | 4.250612  |
|           ElectraForCausalLM            | 32  | 3.575614  |
|          MobileBertForMaskedLM          | 128 | 3.366259  |
|     MobileBertForQuestionAnswering      | 128 | 3.257002  |
|    LayoutLMForSequenceClassification    | 16  | 3.125301  |
|       RobertaForQuestionAnswering       | 16  | 3.098751  |
|           RobertaForCausalLM            | 16  | 3.034729  |
|        BertForQuestionAnswering         | 16  | 2.977359  |
|                CamemBert                | 16  | 2.953028  |
|           LayoutLMForMaskedLM           | 16  | 2.899235  |
|             BertForMaskedLM             | 16  | 2.888736  |
|    MegatronBertForQuestionAnswering     |  8  | 2.642489  |
|            YituTechConvBert             | 16  | 2.407352  |
|         MegatronBertForCausalLM         |  4  | 2.219608  |
|               DistillGPT2               | 16  | 2.082995  |
|             OPTForCausalLM              |  2  | 1.952999  |
|       MT5ForConditionalGeneration       | 16  | 1.930745  |
|         Speech2Text2ForCausalLM         | 256 | 1.902223  |
|                 T5Small                 |  4  | 1.867864  |
|       BlenderbotSmallForCausalLM        | 64  | 1.865432  |
|       T5ForConditionalGeneration        |  4  |  1.86428  |
|      DebertaV2ForQuestionAnswering      |  1  | 1.853778  |
|      GPT2ForSequenceClassification      |  4  | 1.846444  |
|          DistilBertForMaskedLM          | 128 | 1.828095  |
|             XGLMForCausalLM             |  8  | 1.759622  |
|            PLBartForCausalLM            |  8  | 1.693066  |
|          BlenderbotForCausalLM          |  4  |  1.66064  |
| BlenderbotSmallForConditionalGeneration | 64  | 1.655602  |
|           DebertaForMaskedLM            |  8  | 1.638262  |
|            MBartForCausalLM             |  4  | 1.628393  |
|     DistilBertForQuestionAnswering      | 256 | 1.603815  |
|            TrOCRForCausalLM             | 32  | 1.583008  |
|     M2M100ForConditionalGeneration      | 16  | 1.532227  |
|      MBartForConditionalGeneration      |  2  | 1.513469  |
|       DebertaForQuestionAnswering       | 16  | 1.512612  |
|     PegasusForConditionalGeneration     | 32  |  1.49438  |
|           PegasusForCausalLM            | 32  | 1.489804  |
|      BartForConditionalGeneration       |  2  | 1.424268  |
|             BartForCausalLM             |  4  | 1.412573  |
|               GoogleFnet                | 16  | 1.393762  |
|     PLBartForConditionalGeneration      |  4  | 1.362421  |
|       AlbertForQuestionAnswering        |  4  | 1.331315  |
|            AlbertForMaskedLM            |  4  | 1.266202  |
|          AllenaiLongformerBase          |  4  | 1.049299  |
|          DebertaV2ForMaskedLM           |  2  | 0.846355  |
+-----------------------------------------+-----+-----------+
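To summarize a table like the one above in a single number, one option is a geometric mean of the per-model speedups. The snippet below is a minimal sketch, not the dashboard's own tooling: `parse_table` and `geomean` are hypothetical helper names, the sample rows are copied from the table above, and dropping `0.0` placeholder rows (models that did not run) is an assumption about how such rows should be treated.

```python
import math


def parse_table(text):
    """Parse a '+---+' / '| a | b |' style table into a list of row dicts."""
    rows, header = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip '+---+' borders and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def geomean(values):
    """Geometric mean over strictly positive values (assumption: 0.0 rows are dropped)."""
    positive = [v for v in values if v > 0]
    if not positive:
        return float("nan")
    return math.exp(sum(math.log(v) for v in positive) / len(positive))


# Three rows copied from the huggingface amp speedup table above.
SAMPLE = """
+-----------------------------------------+-----+-----------+
|                  name                   | bs  | inductor  |
+-----------------------------------------+-----+-----------+
|            XLNetLMHeadModel             |  8  | 11.846873 |
|          AllenaiLongformerBase          |  4  | 1.049299  |
|          DebertaV2ForMaskedLM           |  2  | 0.846355  |
+-----------------------------------------+-----+-----------+
"""

rows = parse_table(SAMPLE)
speedups = [float(r["inductor"]) for r in rows]
print(f"{len(rows)} models, geomean speedup {geomean(speedups):.3f}x")
```

A geometric mean is used here because speedups are ratios; a single outlier such as XLNetLMHeadModel would dominate an arithmetic mean.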

Accuracy

+-----------------------------------------+----+------------------+
|                  name                   | bs |     inductor     |
+-----------------------------------------+----+------------------+
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip |
|                CamemBert                | 1  |       pass       |
|       AlbertForQuestionAnswering        | 1  |       pass       |
|           DebertaForMaskedLM            | 1  |       pass       |
|          AllenaiLongformerBase          | 1  |       pass       |
|             BartForCausalLM             | 1  |       pass       |
|      BartForConditionalGeneration       | 1  |       pass       |
|             BertForMaskedLM             | 1  |       pass       |
|        BertForQuestionAnswering         | 1  |       pass       |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |
|       DebertaForQuestionAnswering       | 1  |       pass       |
|           LayoutLMForMaskedLM           | 1  |       pass       |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |
|          DistilBertForMaskedLM          | 1  |       pass       |
|     DistilBertForQuestionAnswering      | 1  |       pass       |
|               DistillGPT2               | 1  |       pass       |
|           ElectraForCausalLM            | 1  |       pass       |
|       ElectraForQuestionAnswering       | 1  |       pass       |
|      GPT2ForSequenceClassification      | 1  |       pass       |
|               GoogleFnet                | 1  |       pass       |
|    LayoutLMForSequenceClassification    | 1  |       pass       |
|            MBartForCausalLM             | 1  |       pass       |
|            XLNetLMHeadModel             | 1  |       pass       |
|             XGLMForCausalLM             | 1  |       pass       |
|            AlbertForMaskedLM            | 1  |       pass       |
|      MBartForConditionalGeneration      | 1  |       pass       |
|       MT5ForConditionalGeneration       | 1  |       pass       |
|         MegatronBertForCausalLM         | 1  |       pass       |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |
|          MobileBertForMaskedLM          | 1  |       pass       |
|     MobileBertForQuestionAnswering      | 1  |       pass       |
|             OPTForCausalLM              | 1  |       pass       |
|            PLBartForCausalLM            | 1  |       pass       |
|     PLBartForConditionalGeneration      | 1  |       pass       |
|           PegasusForCausalLM            | 1  |       pass       |
|     M2M100ForConditionalGeneration      | 1  |       pass       |
|     PegasusForConditionalGeneration     | 1  |       pass       |
|           RobertaForCausalLM            | 1  |       pass       |
|       RobertaForQuestionAnswering       | 1  |       pass       |
|         Speech2Text2ForCausalLM         | 1  |       pass       |
|       T5ForConditionalGeneration        | 1  |       pass       |
|                 T5Small                 | 1  |       pass       |
|            TrOCRForCausalLM             | 1  |       pass       |
|            YituTechConvBert             | 1  |       pass       |
+-----------------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------------+-----+-----------+
|                  name                   | bs  | inductor  |
+-----------------------------------------+-----+-----------+
|          AllenaiLongformerBase          |  4  | 72.746326 |
|     M2M100ForConditionalGeneration      | 16  | 40.109417 |
|     PegasusForConditionalGeneration     | 32  | 39.637567 |
|          BlenderbotForCausalLM          |  4  | 37.881629 |
|      MBartForConditionalGeneration      |  2  | 37.446601 |
|          MobileBertForMaskedLM          | 128 | 36.993643 |
|       MT5ForConditionalGeneration       | 16  | 36.949223 |
|     MobileBertForQuestionAnswering      | 128 | 35.869436 |
|             XGLMForCausalLM             |  8  | 32.793017 |
|                 T5Small                 |  4  | 31.539628 |
|       T5ForConditionalGeneration        |  4  | 31.441642 |
|          DebertaV2ForMaskedLM           |  2  | 28.693166 |
|             OPTForCausalLM              |  2  | 27.320937 |
|      DebertaV2ForQuestionAnswering      |  1  | 27.157659 |
| BlenderbotSmallForConditionalGeneration | 64  | 26.361476 |
|         MegatronBertForCausalLM         |  4  | 25.181972 |
|      BartForConditionalGeneration       |  2  | 25.100075 |
|           PegasusForCausalLM            | 32  | 25.062261 |
|     PLBartForConditionalGeneration      |  4  | 24.970515 |
|    MegatronBertForQuestionAnswering     |  8  | 24.09154  |
|            MBartForCausalLM             |  4  | 23.538583 |
|      GPT2ForSequenceClassification      |  4  | 22.396165 |
|            YituTechConvBert             | 16  | 22.294423 |
|               DistillGPT2               | 16  | 21.214243 |
|            TrOCRForCausalLM             | 32  | 19.546505 |
|       DebertaForQuestionAnswering       | 16  | 19.451207 |
|           DebertaForMaskedLM            |  8  | 19.180736 |
|          DistilBertForMaskedLM          | 128 | 16.997459 |
|     DistilBertForQuestionAnswering      | 256 | 16.790137 |
|       BlenderbotSmallForCausalLM        | 64  | 16.487101 |
|         Speech2Text2ForCausalLM         | 256 | 16.21644  |
|               GoogleFnet                | 16  | 16.141198 |
|            PLBartForCausalLM            |  8  | 15.978994 |
|           RobertaForCausalLM            | 16  | 15.652025 |
|            AlbertForMaskedLM            |  4  | 15.583679 |
|       AlbertForQuestionAnswering        |  4  | 15.227525 |
|                CamemBert                | 16  | 15.101664 |
|             BartForCausalLM             |  4  | 14.989691 |
|       RobertaForQuestionAnswering       | 16  | 14.987468 |
|    LayoutLMForSequenceClassification    | 16  | 14.854643 |
|            XLNetLMHeadModel             |  8  | 14.744866 |
|           ElectraForCausalLM            | 32  | 14.23138  |
|           LayoutLMForMaskedLM           | 16  | 14.182431 |
|             BertForMaskedLM             | 16  | 14.000408 |
|        BertForQuestionAnswering         | 16  | 13.846975 |
|       ElectraForQuestionAnswering       | 64  | 13.521881 |
+-----------------------------------------+-----+-----------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            AlbertForMaskedLM            |  4  | 0.992737 |
|            PLBartForCausalLM            |  8  | 0.991906 |
|       AlbertForQuestionAnswering        |  4  | 0.991682 |
|               GoogleFnet                | 16  | 0.99092  |
|               DistillGPT2               | 16  | 0.990296 |
|          DistilBertForMaskedLM          | 128 | 0.990233 |
|           ElectraForCausalLM            | 32  | 0.990165 |
|                CamemBert                | 16  | 0.98812  |
|            YituTechConvBert             | 16  | 0.987571 |
|             OPTForCausalLM              |  2  | 0.98726  |
|       BlenderbotSmallForCausalLM        | 64  | 0.987245 |
|         Speech2Text2ForCausalLM         | 256 | 0.986656 |
|     DistilBertForQuestionAnswering      | 256 | 0.986586 |
|           RobertaForCausalLM            | 16  | 0.986087 |
|             BertForMaskedLM             | 16  | 0.985845 |
|           LayoutLMForMaskedLM           | 16  | 0.985556 |
|       MT5ForConditionalGeneration       | 16  | 0.984482 |
|       ElectraForQuestionAnswering       | 64  | 0.984416 |
|       DebertaForQuestionAnswering       | 16  | 0.983353 |
| BlenderbotSmallForConditionalGeneration | 64  | 0.982405 |
|          MobileBertForMaskedLM          | 128 | 0.982032 |
|            TrOCRForCausalLM             | 32  | 0.981817 |
|    LayoutLMForSequenceClassification    | 16  | 0.981453 |
|       RobertaForQuestionAnswering       | 16  | 0.98076  |
|        BertForQuestionAnswering         | 16  | 0.980363 |
|          AllenaiLongformerBase          |  4  | 0.978627 |
|      GPT2ForSequenceClassification      |  4  | 0.976143 |
|       T5ForConditionalGeneration        |  4  | 0.973459 |
|                 T5Small                 |  4  | 0.973181 |
|     MobileBertForQuestionAnswering      | 128 | 0.972756 |
|             BartForCausalLM             |  4  | 0.971692 |
|           DebertaForMaskedLM            |  8  | 0.969458 |
|           PegasusForCausalLM            | 32  | 0.967773 |
|     PLBartForConditionalGeneration      |  4  | 0.967018 |
|    MegatronBertForQuestionAnswering     |  8  | 0.956908 |
|            XLNetLMHeadModel             |  8  | 0.955346 |
|         MegatronBertForCausalLM         |  4  | 0.948385 |
|             XGLMForCausalLM             |  8  | 0.939001 |
|          BlenderbotForCausalLM          |  4  | 0.936244 |
|      MBartForConditionalGeneration      |  2  | 0.933549 |
|      DebertaV2ForQuestionAnswering      |  1  | 0.92826  |
|     PegasusForConditionalGeneration     | 32  | 0.919073 |
|     M2M100ForConditionalGeneration      | 16  | 0.915858 |
|      BartForConditionalGeneration       |  2  | 0.913951 |
|            MBartForCausalLM             |  4  | 0.91093  |
|          DebertaV2ForMaskedLM           |  2  | 0.888631 |
+-----------------------------------------+-----+----------+

Absolute latency (ms)

+-----------------------------------------+-----+------------+
|                  name                   | bs  |  inductor  |
+-----------------------------------------+-----+------------+
|          AllenaiLongformerBase          |  4  | 951.766839 |
|            AlbertForMaskedLM            |  4  | 565.65106  |
|       AlbertForQuestionAnswering        |  4  | 533.640809 |
|            XLNetLMHeadModel             |  8  | 396.253199 |
|               GoogleFnet                | 16  | 266.646808 |
|          DebertaV2ForMaskedLM           |  2  | 218.057096 |
|             OPTForCausalLM              |  2  | 205.968166 |
|            TrOCRForCausalLM             | 32  | 176.753737 |
|            MBartForCausalLM             |  4  | 176.610159 |
|      MBartForConditionalGeneration      |  2  | 173.162703 |
|     PegasusForConditionalGeneration     | 32  | 167.195414 |
|       T5ForConditionalGeneration        |  4  | 152.08024  |
|                 T5Small                 |  4  | 151.800443 |
|     PLBartForConditionalGeneration      |  4  | 147.690847 |
|     DistilBertForQuestionAnswering      | 256 | 146.901017 |
|            PLBartForCausalLM            |  8  | 136.549413 |
|      BartForConditionalGeneration       |  2  | 132.884687 |
|    MegatronBertForQuestionAnswering     |  8  | 125.396114 |
|          BlenderbotForCausalLM          |  4  | 123.03024  |
|            YituTechConvBert             | 16  | 118.820236 |
|       DebertaForQuestionAnswering       | 16  | 117.392119 |
|     M2M100ForConditionalGeneration      | 16  | 116.255378 |
| BlenderbotSmallForConditionalGeneration | 64  | 104.630423 |
|          DistilBertForMaskedLM          | 128 | 104.164832 |
|           RobertaForCausalLM            | 16  | 102.268883 |
|             BartForCausalLM             |  4  | 97.310965  |
|           LayoutLMForMaskedLM           | 16  | 94.280842  |
|                CamemBert                | 16  | 92.978111  |
|             BertForMaskedLM             | 16  | 92.799512  |
|          MobileBertForMaskedLM          | 128 | 86.271849  |
|         MegatronBertForCausalLM         |  4  |  83.65648  |
|               DistillGPT2               | 16  | 82.901923  |
|           PegasusForCausalLM            | 32  | 81.127051  |
|           DebertaForMaskedLM            |  8  | 79.050404  |
|             XGLMForCausalLM             |  8  | 76.227729  |
|        BertForQuestionAnswering         | 16  | 75.619327  |
|    LayoutLMForSequenceClassification    | 16  | 73.025159  |
|       RobertaForQuestionAnswering       | 16  | 72.370282  |
|      GPT2ForSequenceClassification      |  4  | 69.962329  |
|       MT5ForConditionalGeneration       | 16  | 65.245994  |
|       ElectraForQuestionAnswering       | 64  | 60.370815  |
|         Speech2Text2ForCausalLM         | 256 | 58.767814  |
|      DebertaV2ForQuestionAnswering      |  1  | 57.915405  |
|           ElectraForCausalLM            | 32  | 57.419185  |
|       BlenderbotSmallForCausalLM        | 64  | 53.555687  |
|     MobileBertForQuestionAnswering      | 128 | 50.686996  |
+-----------------------------------------+-----+------------+

timm_models suite with amp precision

Performance speedup

+---------------------------------+------+----------+
|              name               |  bs  | inductor |
+---------------------------------+------+----------+
|           resnest101e           |  64  | 5.036248 |
|           mnasnet_100           | 512  | 4.708549 |
|           fbnetc_100            | 512  | 4.697192 |
|          spnasnet_100           | 128  | 4.65402  |
|            lcnet_050            | 256  |  4.479   |
|      mobilenetv3_large_100      | 512  | 4.357488 |
|          cspdarknet53           |  64  | 4.335126 |
|           regnety_002           | 1024 | 4.289812 |
|         mobilenetv2_100         | 128  |  4.2833  |
|          botnet26t_256          | 128  | 4.225179 |
|        ese_vovnet19b_dw         | 256  | 4.110605 |
|            hrnet_w18            | 128  | 4.064987 |
|           res2next50            | 128  | 4.044073 |
|        res2net101_26w_4s        | 128  | 3.865294 |
|          inception_v3           | 128  | 3.795838 |
|             dla102              | 128  | 3.777601 |
|          pnasnet5large          |  16  | 3.769363 |
|       gluon_inception_v3        | 256  | 3.74652  |
|            gernet_l             | 128  | 3.708461 |
|        res2net50_14w_8s         | 128  | 3.694382 |
|        adv_inception_v3         | 128  | 3.649629 |
|            fbnetv3_b            | 256  | 3.619479 |
|     swsl_resnext101_32x16d      |  32  | 3.563564 |
|           rexnet_100            | 256  | 3.553939 |
|       eca_botnext26ts_256       | 128  | 3.327205 |
|            nfnet_l0             | 128  | 3.174627 |
|        eca_halonext26ts         | 128  | 2.965377 |
|           volo_d1_224           |  64  |  2.6418  |
|            repvgg_a2            | 128  | 2.637029 |
|           dm_nfnet_f0           | 128  | 2.633008 |
|           selecsls42b           | 128  | 2.626501 |
|          ghostnet_100           | 512  | 2.403228 |
|       tf_efficientnet_b0        | 128  | 2.395036 |
|            tinynet_a            | 128  | 2.378484 |
|         visformer_small         | 128  | 2.319353 |
|           convit_base           |  64  | 2.103588 |
|        convmixer_768_32         |  32  | 2.018377 |
|         poolformer_m36          |  64  | 2.011904 |
|             dpn107              |  64  | 1.925697 |
|            levit_128            | 1024 | 1.863192 |
|          convnext_base          |  64  | 1.85974  |
|           tf_mixnet_l           | 128  | 1.839484 |
|            mixnet_l             | 128  | 1.768992 |
|      xcit_large_24_p8_224       |  16  | 1.750372 |
|        twins_pcpvt_base         | 128  | 1.633391 |
|  swin_base_patch4_window7_224   |  64  | 1.608933 |
|        sebotnet33ts_256         |  64  | 1.588832 |
|          gmlp_s16_224           | 128  | 1.588241 |
|           mobilevit_s           |  64  | 1.551421 |
|      beit_base_patch16_224      |  64  | 1.447452 |
| deit_base_distilled_patch16_224 |  64  | 1.437987 |
|          mixer_b16_224          | 128  | 1.375454 |
|          gmixer_24_224          | 128  | 1.369802 |
|      vit_base_patch16_224       |  64  | 1.350473 |
|         crossvit_9_240          | 256  | 1.346188 |
|            pit_b_224            |  64  | 1.339445 |
|        tnt_s_patch16_224        | 128  | 1.258112 |
|          jx_nest_base           |  32  | 1.099688 |
|          resmlp_12_224          | 128  | 1.054985 |
|          cait_m36_384           |  4   | 0.846597 |
+---------------------------------+------+----------+
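Some models listed in the speedup table above are marked `fail_accuracy` in the Accuracy table that follows (resnest101e, for example). The snippet below is a minimal sketch of restricting speedup numbers to models that pass the accuracy check. `keep_passing` is a hypothetical helper, not part of the benchmark scripts, and treating every status that starts with `pass` (including `pass_due_to_skip`) as passing is an assumption.

```python
def keep_passing(speedups, accuracy, ok_prefix="pass"):
    """Return only the speedup entries whose accuracy status starts with ok_prefix."""
    kept = {}
    for name, speedup in speedups.items():
        # Assumption: a model with no accuracy entry is treated as not passing.
        status = accuracy.get(name, "missing")
        if status.startswith(ok_prefix):
            kept[name] = speedup
    return kept


# Values copied from the timm_models amp tables in this comment.
speedups = {
    "resnest101e": 5.036248,
    "mnasnet_100": 4.708549,
    "cait_m36_384": 0.846597,
}
accuracy = {
    "resnest101e": "fail_accuracy",
    "mnasnet_100": "pass",
    "cait_m36_384": "pass",
}

passing = keep_passing(speedups, accuracy)
print(sorted(passing))
```

Joining on the model name rather than on row position matters because the speedup and accuracy tables are sorted differently and use different batch sizes.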

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 8  |     pass      |
|           dm_nfnet_f0           | 8  |     pass      |
|            mixnet_l             | 8  |     pass      |
|       eca_botnext26ts_256       | 8  |     pass      |
|         crossvit_9_240          | 8  |     pass      |
| deit_base_distilled_patch16_224 | 8  |     pass      |
|          convnext_base          | 8  |     pass      |
|           regnety_002           | 8  |     pass      |
|           convit_base           | 8  |     pass      |
|          gmlp_s16_224           | 8  |     pass      |
|          cait_m36_384           | 8  |     pass      |
|             dpn107              | 8  |     pass      |
|      vit_base_patch16_224       | 8  |     pass      |
|          cspdarknet53           | 8  |     pass      |
|        eca_halonext26ts         | 8  |     pass      |
|        ese_vovnet19b_dw         | 8  |     pass      |
|           fbnetc_100            | 8  |     pass      |
|            fbnetv3_b            | 8  |     pass      |
|            gernet_l             | 8  |     pass      |
|          botnet26t_256          | 8  |     pass      |
|       gluon_inception_v3        | 8  |     pass      |
|          gmixer_24_224          | 8  |     pass      |
|          jx_nest_base           | 8  |     pass      |
|        convmixer_768_32         | 8  |     pass      |
|        twins_pcpvt_base         | 8  |     pass      |
|         poolformer_m36          | 8  |     pass      |
|            lcnet_050            | 8  |     pass      |
|          mixer_b16_224          | 8  |     pass      |
|        tnt_s_patch16_224        | 8  |     pass      |
|           mnasnet_100           | 8  |     pass      |
|         mobilenetv2_100         | 8  |     pass      |
|      mobilenetv3_large_100      | 8  |     pass      |
|           mobilevit_s           | 8  |     pass      |
|            nfnet_l0             | 8  |     pass      |
|            pit_b_224            | 8  |     pass      |
|      beit_base_patch16_224      | 8  |     pass      |
|            repvgg_a2            | 8  |     pass      |
|  swin_base_patch4_window7_224   | 8  |     pass      |
|          inception_v3           | 8  |     pass      |
|           tf_mixnet_l           | 8  |     pass      |
|       tf_efficientnet_b0        | 8  |     pass      |
|            tinynet_a            | 8  |     pass      |
|          spnasnet_100           | 8  |     pass      |
|        sebotnet33ts_256         | 8  |     pass      |
|          resmlp_12_224          | 8  |     pass      |
|         coat_lite_mini          | 8  |  fail_to_run  |
|           resnest101e           | 8  | fail_accuracy |
|            levit_128            | 8  | fail_accuracy |
|          ghostnet_100           | 8  | fail_accuracy |
|          pnasnet5large          | 8  | fail_accuracy |
|        res2net101_26w_4s        | 8  | fail_accuracy |
|        res2net50_14w_8s         | 8  | fail_accuracy |
|             dla102              | 8  | fail_accuracy |
|     swsl_resnext101_32x16d      | 8  | fail_accuracy |
|           rexnet_100            | 8  | fail_accuracy |
|           selecsls42b           | 8  | fail_accuracy |
|           res2next50            | 8  | fail_accuracy |
|            hrnet_w18            | 8  | fail_accuracy |
|           volo_d1_224           | 8  | fail_accuracy |
|         visformer_small         | 8  | fail_accuracy |
|      xcit_large_24_p8_224       | 8  | fail_accuracy |
+---------------------------------+----+---------------+

Compilation latency (sec)

+---------------------------------+------+-----------+
|              name               |  bs  | inductor  |
+---------------------------------+------+-----------+
|          pnasnet5large          |  16  | 75.841614 |
|  swin_base_patch4_window7_224   |  64  | 71.935624 |
|             dpn107              |  64  | 62.341153 |
|           tf_mixnet_l           | 128  | 62.057738 |
|           rexnet_100            | 256  | 54.843974 |
|           mobilevit_s           |  64  | 53.013552 |
|        eca_halonext26ts         | 128  | 52.309114 |
|        sebotnet33ts_256         |  64  | 51.273166 |
|            levit_128            | 1024 | 50.602076 |
|          jx_nest_base           |  32  | 50.428295 |
|         crossvit_9_240          | 256  | 49.847094 |
|        res2net50_14w_8s         | 128  | 49.510266 |
|          ghostnet_100           | 512  | 49.225497 |
|        tnt_s_patch16_224        | 128  | 48.306549 |
|          cait_m36_384           |  4   | 46.509534 |
|      xcit_large_24_p8_224       |  16  | 45.343176 |
|            mixnet_l             | 128  | 45.081639 |
|           dm_nfnet_f0           | 128  | 44.255508 |
|         poolformer_m36          |  64  | 43.946671 |
|        twins_pcpvt_base         | 128  | 43.489115 |
|         visformer_small         | 128  | 42.486217 |
|       eca_botnext26ts_256       | 128  | 41.695962 |
|           volo_d1_224           |  64  | 39.996581 |
|           resnest101e           |  64  | 36.845532 |
|       tf_efficientnet_b0        | 128  | 35.943572 |
|        res2net101_26w_4s        | 128  | 34.44006  |
|           convit_base           |  64  | 34.23339  |
|          convnext_base          |  64  | 33.977216 |
|            hrnet_w18            | 128  | 33.534055 |
|            nfnet_l0             | 128  | 33.174012 |
|       gluon_inception_v3        | 256  | 32.575689 |
|        adv_inception_v3         | 128  | 31.435353 |
|          botnet26t_256          | 128  | 31.281978 |
|          inception_v3           | 128  | 31.206118 |
|            tinynet_a            | 128  | 29.909534 |
|           res2next50            | 128  | 29.827454 |
|            pit_b_224            |  64  | 26.082676 |
|            fbnetv3_b            | 256  | 24.71832  |
|             dla102              | 128  | 23.961003 |
|          cspdarknet53           |  64  | 22.71451  |
|        ese_vovnet19b_dw         | 256  | 21.522487 |
| deit_base_distilled_patch16_224 |  64  | 21.378954 |
|      mobilenetv3_large_100      | 512  | 21.007699 |
|          gmixer_24_224          | 128  | 20.338798 |
|      vit_base_patch16_224       |  64  | 19.533052 |
|          gmlp_s16_224           | 128  | 18.795959 |
|           regnety_002           | 1024 | 16.304307 |
|          resmlp_12_224          | 128  | 15.520773 |
|      beit_base_patch16_224      |  64  | 15.325011 |
|          mixer_b16_224          | 128  | 14.018424 |
|        convmixer_768_32         |  32  | 13.705195 |
|            repvgg_a2            | 128  | 13.518063 |
|           selecsls42b           | 128  | 13.19107  |
|            lcnet_050            | 256  | 11.323918 |
|     swsl_resnext101_32x16d      |  32  | 10.837349 |
|          spnasnet_100           | 128  | 8.366123  |
|           fbnetc_100            | 512  | 8.361864  |
|         mobilenetv2_100         | 128  | 8.335463  |
|            gernet_l             | 128  | 8.101776  |
|           mnasnet_100           | 512  | 7.823164  |
+---------------------------------+------+-----------+

Peak Memory Compression Ratio

+---------------------------------+------+----------+
|              name               |  bs  | inductor |
+---------------------------------+------+----------+
|        ese_vovnet19b_dw         | 256  | 0.99522  |
|            fbnetv3_b            | 256  | 0.993723 |
|           fbnetc_100            | 512  | 0.993567 |
|           mnasnet_100           | 512  | 0.99314  |
|      mobilenetv3_large_100      | 512  | 0.992745 |
|           rexnet_100            | 256  | 0.992375 |
|           regnety_002           | 1024 | 0.992113 |
|            levit_128            | 1024 | 0.991792 |
|          ghostnet_100           | 512  | 0.991729 |
|           dm_nfnet_f0           | 128  | 0.991111 |
|       gluon_inception_v3        | 256  | 0.99098  |
|             dla102              | 128  | 0.990157 |
|       eca_botnext26ts_256       | 128  | 0.98987  |
|           convit_base           |  64  | 0.989447 |
|           res2next50            | 128  | 0.989257 |
|      xcit_large_24_p8_224       |  16  | 0.989183 |
|          mixer_b16_224          | 128  | 0.989042 |
|        eca_halonext26ts         | 128  | 0.988875 |
|        twins_pcpvt_base         | 128  | 0.988716 |
|             dpn107              |  64  | 0.988568 |
|          gmlp_s16_224           | 128  | 0.987728 |
|          inception_v3           | 128  | 0.987654 |
|        res2net101_26w_4s        | 128  | 0.987602 |
|         visformer_small         | 128  | 0.987469 |
|           tf_mixnet_l           | 128  | 0.987441 |
|          botnet26t_256          | 128  | 0.987342 |
|        convmixer_768_32         |  32  | 0.987314 |
|            mixnet_l             | 128  | 0.987122 |
|         mobilenetv2_100         | 128  | 0.987031 |
|        adv_inception_v3         | 128  | 0.986978 |
|       tf_efficientnet_b0        | 128  |  0.9867  |
|        res2net50_14w_8s         | 128  | 0.986428 |
|            gernet_l             | 128  | 0.985993 |
|            nfnet_l0             | 128  | 0.985962 |
|          cspdarknet53           |  64  | 0.985176 |
|          convnext_base          |  64  | 0.98515  |
|      beit_base_patch16_224      |  64  | 0.984972 |
|          gmixer_24_224          | 128  | 0.984891 |
|        tnt_s_patch16_224        | 128  | 0.984739 |
|            tinynet_a            | 128  | 0.984701 |
|            hrnet_w18            | 128  | 0.984567 |
|          pnasnet5large          |  16  | 0.984515 |
|         crossvit_9_240          | 256  | 0.983584 |
|  swin_base_patch4_window7_224   |  64  | 0.983488 |
|      vit_base_patch16_224       |  64  | 0.983233 |
| deit_base_distilled_patch16_224 |  64  | 0.983171 |
|           mobilevit_s           |  64  | 0.982508 |
|          resmlp_12_224          | 128  | 0.982414 |
|            pit_b_224            |  64  | 0.981794 |
|           selecsls42b           | 128  | 0.981368 |
|         poolformer_m36          |  64  | 0.981144 |
|          spnasnet_100           | 128  | 0.980949 |
|           resnest101e           |  64  | 0.980032 |
|           volo_d1_224           |  64  | 0.978913 |
|            lcnet_050            | 256  | 0.975891 |
|          cait_m36_384           |  4   | 0.97476  |
|          jx_nest_base           |  32  | 0.973228 |
|            repvgg_a2            | 128  | 0.971719 |
|     swsl_resnext101_32x16d      |  32  | 0.961014 |
|        sebotnet33ts_256         |  64  | 0.835055 |
+---------------------------------+------+----------+

Absolute latency (ms)

+---------------------------------+------+------------+
|              name               |  bs  |  inductor  |
+---------------------------------+------+------------+
|          cait_m36_384           |  4   | 467.158738 |
|      xcit_large_24_p8_224       |  16  | 277.552882 |
|        tnt_s_patch16_224        | 128  | 246.073249 |
|             dpn107              |  64  | 223.289532 |
|            levit_128            | 1024 | 213.308221 |
|           dm_nfnet_f0           | 128  | 207.528335 |
|       gluon_inception_v3        | 256  | 180.251139 |
|        ese_vovnet19b_dw         | 256  | 174.180469 |
|          convnext_base          |  64  | 170.812413 |
|          mixer_b16_224          | 128  | 169.255785 |
|           convit_base           |  64  | 160.264823 |
|         poolformer_m36          |  64  | 156.749434 |
|            nfnet_l0             | 128  | 156.203947 |
|        twins_pcpvt_base         | 128  | 148.409797 |
|  swin_base_patch4_window7_224   |  64  | 141.436132 |
|         crossvit_9_240          | 256  | 135.701578 |
|           tf_mixnet_l           | 128  | 130.130961 |
|            mixnet_l             | 128  | 129.805461 |
|           volo_d1_224           |  64  | 125.201126 |
|          ghostnet_100           | 512  | 124.861705 |
|          jx_nest_base           |  32  | 123.913325 |
|           mobilevit_s           |  64  | 114.827176 |
|        sebotnet33ts_256         |  64  | 114.597171 |
|          gmixer_24_224          | 128  | 112.339725 |
|      vit_base_patch16_224       |  64  | 110.547877 |
| deit_base_distilled_patch16_224 |  64  | 110.378298 |
|      beit_base_patch16_224      |  64  | 107.901262 |
|          resmlp_12_224          | 128  | 106.503412 |
|        eca_halonext26ts         | 128  | 104.714654 |
|            pit_b_224            |  64  | 103.40242  |
|          gmlp_s16_224           | 128  | 102.14247  |
|        res2net101_26w_4s        | 128  | 101.966475 |
|        convmixer_768_32         |  32  | 101.796824 |
|          pnasnet5large          |  16  | 101.382906 |
|           fbnetc_100            | 512  | 97.580415  |
|       eca_botnext26ts_256       | 128  |  92.0275   |
|        adv_inception_v3         | 128  | 87.450981  |
|           regnety_002           | 1024 |  86.9916   |
|          inception_v3           | 128  | 86.706155  |
|         visformer_small         | 128  | 85.719968  |
|     swsl_resnext101_32x16d      |  32  | 85.688244  |
|             dla102              | 128  | 84.944025  |
|        res2net50_14w_8s         | 128  | 83.570163  |
|           rexnet_100            | 256  | 82.389429  |
|           mnasnet_100           | 512  | 82.112799  |
|            fbnetv3_b            | 256  | 81.706693  |
|            hrnet_w18            | 128  | 81.219401  |
|           res2next50            | 128  |  79.0352   |
|      mobilenetv3_large_100      | 512  | 78.495988  |
|           resnest101e           |  64  | 73.148835  |
|          botnet26t_256          | 128  | 70.892428  |
|       tf_efficientnet_b0        | 128  | 56.586299  |
|          cspdarknet53           |  64  | 49.338371  |
|            repvgg_a2            | 128  |  46.9759   |
|            tinynet_a            | 128  | 39.379041  |
|            gernet_l             | 128  | 35.987927  |
|           selecsls42b           | 128  | 33.363761  |
|         mobilenetv2_100         | 128  | 22.108894  |
|          spnasnet_100           | 128  | 16.977991  |
|            lcnet_050            | 256  |  7.736581  |
+---------------------------------+------+------------+

@WeizhuoZhang-intel
Contributor

[amp] Performance Dashboard for amp precision -- Single-core Single-thread (2024-04-27 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8488C. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. Experimental setup does not have optimizer.

SW information:

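+-------------------+--------+------------------------------------------+
| SW                | Branch | Commit                                   |
+-------------------+--------+------------------------------------------+
| PyTorch           | main   | 7478b7f1cac9686f00edf3db4667cf86d2421531 |
| Torchbench        | main   | d6015d42                                 |
| torchaudio        | main   | 2.2.0a0+ea437b3                          |
| torchtext         | main   | 0.16.0a0+b0ebddc                         |
| torchvision       | main   | 0.19.0a0+2c4665f                         |
| torchdata         | main   | 0.7.1a0+0790338                          |
| dynamo_benchmarks | main   | nightly                                  |
+-------------------+--------+------------------------------------------+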

HW information

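+------------------+----------------------------------------------------------------+
| Item             | Value                                                          |
+------------------+----------------------------------------------------------------+
| Manufacturer     | Amazon EC2                                                     |
| Product Name     | c7i.metal-24xl                                                 |
| CPU Model        | Intel(R) Xeon(R) Platinum 8488C CPU @ 2.40GHz                  |
| Installed Memory | 192GB (8x24GB DDR5 4800 MT/s [4800 MT/s])                      |
| OS               | Ubuntu 22.04.3 LTS                                             |
| Kernel           | 6.2.0-1017-aws                                                 |
| Microcode        | 0x2b0004d0                                                     |
| GCC              | gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0                      |
| GLIBC            | ldd (Ubuntu GLIBC 2.35-0ubuntu3.4) 2.35                        |
| Binutils         | GNU ld (GNU Binutils for Ubuntu) 2.38                          |
| Python           | Python 3.8.18                                                  |
| OpenSSL          | OpenSSL 3.2.0 23 Nov 2023 (Library: OpenSSL 3.2.0 23 Nov 2023) |
+------------------+----------------------------------------------------------------+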

Test command

export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
export OMP_NUM_THREADS=1

python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--core_list 0 --ncores_per_instance 1" --devices=cpu --dtypes=float32 --inference --compilers=inductor --batch_size=1 --threads 1 --extra-args="--timeout 9000"

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
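
A minimal sketch of that filtering step, assuming per-model results are held as plain dictionaries. The record layout and the helper name `drop_accuracy_failures` are hypothetical and not part of the benchmark runner; only the status strings and sample values are taken from the torchbench tables in this comment. Treating every status that starts with `pass` as passing is also an assumption.

```python
# Hypothetical in-memory layout; values copied from the torchbench tables.
accuracy = {
    "resnet18": "fail_accuracy",
    "alexnet": "pass",
    "hf_T5_large": "pass_due_to_skip",
    "Super_SloMo": "fail_to_run",
}
speedup = {"resnet18": 5.845043, "alexnet": 4.108328, "hf_T5_large": 1.20919}


def drop_accuracy_failures(metric, accuracy):
    """Keep only the models whose accuracy status starts with 'pass'."""
    # Assumption: 'pass' and 'pass_due_to_skip' both count as passing.
    return {
        name: value
        for name, value in metric.items()
        if accuracy.get(name, "").startswith("pass")
    }


kept = drop_accuracy_failures(speedup, accuracy)
print(sorted(kept))  # ['alexnet', 'hf_T5_large']
```

The same filter would apply to the compilation latency and memory tables before any suite-level mean is taken.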

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 80%, 63/79 | 98%, 45/46  | 75%, 45/60  |
+----------+------------+-------------+-------------+

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   2.44x    |    1.76x    |    2.55x    |
+----------+------------+-------------+-------------+

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   18.60    |    21.68    |    26.06    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.89x    |    0.90x    |    0.86x    |
+----------+------------+-------------+-------------+

torchbench suite with amp precision

Performance speedup

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_isoneutral_mixing     |    1    | 49.967517 |
|     pyhpc_equation_of_state     |    1    | 20.550304 |
|              dcgan              |    1    | 9.418206  |
|          squeezenet1_1          |    1    | 8.380501  |
|          timm_resnest           |    1    | 6.679343  |
|          lennard_jones          |    1    | 6.232648  |
|            resnet18             |    1    | 5.845043  |
|          maml_omniglot          |    5    | 5.809353  |
|         opacus_cifar10          |    1    | 5.604349  |
|      functorch_dp_cifar10       |    1    | 5.571682  |
|            resnet50             |    1    | 5.271456  |
|         LearningToPaint         |    1    | 5.002961  |
|         resnext50_32x4d         |    1    | 4.910784  |
|           timm_nfnet            |    1    | 4.881065  |
|          mobilenet_v2           |    1    | 4.685891  |
|              vgg16              |    1    | 4.638741  |
|           mnasnet1_0            |    1    | 4.417426  |
|            resnet152            |    1    | 4.255526  |
|           timm_vovnet           |    1    | 4.174769  |
|     nvidia_deeprecommender      |    1    | 4.152537  |
|             alexnet             |    1    | 4.108328  |
|      doctr_reco_predictor       |    1    | 3.927046  |
|             yolov3              |    1    | 3.912831  |
|       mobilenet_v3_large        |    1    | 3.784338  |
|              llama              |    1    | 3.733841  |
|       shufflenet_v2_x1_0        |    1    | 3.466075  |
|     functorch_maml_omniglot     |    1    | 3.333208  |
|           densenet121           |    1    | 3.188606  |
|           timm_regnet           |    1    | 3.101055  |
|         phlippe_resnet          |    1    | 2.904985  |
|              dlrm               |    1    | 2.826724  |
|          basic_gnn_gcn          |    1    |  2.80677  |
|    detectron2_fcos_r_50_fpn     |    1    | 2.618034  |
|        phlippe_densenet         |    1    | 2.577769  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 2.479367  |
|               drq               |    1    |  2.42679  |
|        timm_efficientnet        |    1    | 2.371521  |
|          BERT_pytorch           |    1    | 2.199833  |
|          pytorch_unet           |    1    | 2.128032  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 2.090461  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 2.008158  |
|        soft_actor_critic        |   256   | 1.992993  |
|          basic_gnn_gin          |    1    | 1.966777  |
|       Background_Matting        |    1    | 1.832443  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1.814788  |
| detectron2_fasterrcnn_r_50_dc5  |    1    |  1.79921  |
|         basic_gnn_sage          |    1    | 1.798542  |
|     timm_vision_transformer     |    1    |  1.78707  |
|             hf_Bert             |    1    | 1.760897  |
|          hf_Bert_large          |    1    | 1.739527  |
|             hf_GPT2             |    1    | 1.705287  |
|        basic_gnn_edgecnn        |    1    | 1.671881  |
|          hf_DistilBert          |    1    | 1.636433  |
|         pytorch_stargan         |   16    | 1.609769  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 1.595058  |
|  timm_vision_transformer_large  |    1    | 1.533589  |
|             hf_Bart             |    1    | 1.516317  |
|          hf_GPT2_large          |    1    | 1.473555  |
|           hf_T5_base            |    1    | 1.438041  |
|        hf_distil_whisper        |    1    | 1.330228  |
|           hf_BigBird            |    1    | 1.310516  |
|            moondream            |    1    |  1.28077  |
|       speech_transformer        |    1    | 1.271927  |
|           hf_T5_large           |    1    |  1.20919  |
|          fastNLP_Bert           |    1    | 1.167278  |
|            hf_Albert            |    1    | 1.162571  |
|              hf_T5              |    1    | 1.085954  |
|           hf_Reformer           |    1    | 1.041464  |
|      torch_multimodal_clip      |    1    | 1.030481  |
|             demucs              |    1    | 1.027527  |
|           tts_angular           |    1    | 0.999008  |
|     resnet50_quantized_qat      |    1    | 0.986578  |
|   mobilenet_v2_quantized_qat    |    1    | 0.978046  |
|              maml               |    1    |  0.96704  |
|          hf_Longformer          |    1    | 0.913587  |
|        timm_efficientdet        |    0    |    0.0    |
|       doctr_det_predictor       |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
|              moco               |    0    |    0.0    |
+---------------------------------+---------+-----------+

Accuracy

+---------------------------------+---------+--------------------+
|              name               |   bs    |      inductor      |
+---------------------------------+---------+--------------------+
|           hf_T5_large           |    1    |  pass_due_to_skip  |
|       Background_Matting        |    1    |  pass_due_to_skip  |
|  timm_vision_transformer_large  |    1    |  pass_due_to_skip  |
|              maml               |    1    |  pass_due_to_skip  |
|          hf_GPT2_large          |    1    |  pass_due_to_skip  |
|        basic_gnn_edgecnn        |    1    |        pass        |
|             demucs              |    1    |        pass        |
|              dcgan              |    1    |        pass        |
|         basic_gnn_sage          |    1    |        pass        |
|          basic_gnn_gin          |    1    |        pass        |
|          basic_gnn_gcn          |    1    |        pass        |
|         LearningToPaint         |    1    |        pass        |
|    detectron2_fcos_r_50_fpn     |    1    |        pass        |
|               drq               |    1    |        pass        |
|           hf_Reformer           |    1    |        pass        |
|          lennard_jones          |    1    |        pass        |
|           hf_T5_base            |    1    |        pass        |
|          hf_Longformer          |    1    |        pass        |
|      doctr_reco_predictor       |    1    |        pass        |
|              dlrm               |    1    |        pass        |
|     resnet50_quantized_qat      |    1    |        pass        |
|             yolov3              |    1    |        pass        |
|          fastNLP_Bert           |    1    |        pass        |
|             hf_GPT2             |    1    |        pass        |
|      functorch_dp_cifar10       |    1    |        pass        |
|             hf_Bert             |    1    |        pass        |
|          hf_Bert_large          |    1    |        pass        |
|           hf_BigBird            |    1    |        pass        |
|          hf_DistilBert          |    1    |        pass        |
|             hf_Bart             |    1    |        pass        |
|            hf_Albert            |    1    |        pass        |
|     functorch_maml_omniglot     |    1    |        pass        |
|          pytorch_unet           |    1    |        pass        |
|             alexnet             |    1    |        pass        |
|        soft_actor_critic        |   256   |        pass        |
|     pyhpc_isoneutral_mixing     |    1    |        pass        |
|          BERT_pytorch           |    1    |        pass        |
|            moondream            |    1    |        pass        |
|     pyhpc_equation_of_state     |    1    |        pass        |
|         phlippe_resnet          |    1    |        pass        |
|       speech_transformer        |    1    |        pass        |
|         opacus_cifar10          |    1    |        pass        |
|     nvidia_deeprecommender      |    1    |        pass        |
|          maml_omniglot          |    5    |        pass        |
|       mobilenet_v3_large        |    1    |        pass        |
|   mobilenet_v2_quantized_qat    |    1    |        pass        |
| pyhpc_turbulent_kinetic_energy  | 1048576 |        pass        |
|              llama              |    1    |        pass        |
|        hf_distil_whisper        |    1    |        pass        |
|        phlippe_densenet         |    1    |        pass        |
|  pytorch_CycleGAN_and_pix2pix   |    1    |        pass        |
|     timm_vision_transformer     |    1    |        pass        |
|              hf_T5              |    1    |        pass        |
|          squeezenet1_1          |    1    |        pass        |
|        timm_efficientnet        |    1    |        pass        |
|         pytorch_stargan         |   16    |        pass        |
|          timm_resnest           |    1    |        pass        |
|           timm_nfnet            |    1    |        pass        |
|           timm_vovnet           |    1    |        pass        |
|      torch_multimodal_clip      |    1    |        pass        |
|           tts_angular           |    1    |        pass        |
|              vgg16              |    1    |        pass        |
|           timm_regnet           |    1    |        pass        |
|         DALLE2_pytorch          |    0    | model_fail_to_load |
|              moco               |    0    | model_fail_to_load |
|        timm_efficientdet        |    0    | model_fail_to_load |
|         vision_maskrcnn         |    1    |    fail_to_run     |
|           Super_SloMo           |    1    |    fail_to_run     |
| detectron2_fasterrcnn_r_101_fpn |    1    |    fail_to_run     |
| detectron2_fasterrcnn_r_50_fpn  |    1    |    fail_to_run     |
|          mobilenet_v2           |    1    |   fail_accuracy    |
|           mnasnet1_0            |    1    |   fail_accuracy    |
| detectron2_fasterrcnn_r_101_c4  |    1    |   fail_accuracy    |
|           densenet121           |    1    |   fail_accuracy    |
| detectron2_fasterrcnn_r_101_dc5 |    1    |   fail_accuracy    |
|       shufflenet_v2_x1_0        |    1    |   fail_accuracy    |
|         resnext50_32x4d         |    1    |   fail_accuracy    |
|            resnet50             |    1    |   fail_accuracy    |
|            resnet18             |    1    |   fail_accuracy    |
|            resnet152            |    1    |   fail_accuracy    |
| detectron2_fasterrcnn_r_50_dc5  |    1    |   fail_accuracy    |
|  detectron2_fasterrcnn_r_50_c4  |    1    |   fail_accuracy    |
|       doctr_det_predictor       |    0    | eager_fail_to_run  |
+---------------------------------+---------+--------------------+

Compilation latency (sec)

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|           densenet121           |    1    | 70.579194 |
|           hf_BigBird            |    1    | 65.929286 |
|    detectron2_fcos_r_50_fpn     |    1    |  51.6771  |
|              maml               |    1    | 47.75485  |
|           hf_T5_large           |    1    | 47.554721 |
|          hf_Longformer          |    1    | 40.199042 |
|           timm_nfnet            |    1    | 37.56869  |
|            moondream            |    1    | 37.470566 |
|           hf_Reformer           |    1    | 36.386836 |
|      torch_multimodal_clip      |    1    | 33.513255 |
|       speech_transformer        |    1    | 33.449511 |
|        phlippe_densenet         |    1    | 32.933609 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 30.836948 |
|  timm_vision_transformer_large  |    1    | 29.819574 |
|             demucs              |    1    | 29.494119 |
|          hf_GPT2_large          |    1    | 29.083867 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 28.871767 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 27.989867 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 26.244476 |
|        timm_efficientnet        |    1    | 25.885133 |
|        hf_distil_whisper        |    1    | 24.582147 |
|              hf_T5              |    1    | 23.94062  |
|             yolov3              |    1    | 22.977527 |
|         opacus_cifar10          |    1    | 21.684323 |
|              llama              |    1    | 21.496875 |
|      functorch_dp_cifar10       |    1    | 20.56319  |
|          hf_Bert_large          |    1    | 20.018146 |
|          timm_resnest           |    1    | 19.424277 |
|             hf_GPT2             |    1    | 19.343861 |
|       shufflenet_v2_x1_0        |    1    | 18.464054 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 18.304265 |
|     timm_vision_transformer     |    1    | 17.475465 |
|       mobilenet_v3_large        |    1    | 17.428613 |
|          BERT_pytorch           |    1    | 17.139666 |
|          fastNLP_Bert           |    1    | 16.857436 |
|           timm_regnet           |    1    | 16.400679 |
|             hf_Bart             |    1    | 16.354867 |
|           timm_vovnet           |    1    | 16.169855 |
|           hf_T5_base            |    1    | 15.872139 |
|            hf_Albert            |    1    | 14.599245 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 14.429842 |
|             hf_Bert             |    1    | 13.995077 |
|       Background_Matting        |    1    | 12.951744 |
|          squeezenet1_1          |    1    | 12.638318 |
|         pytorch_stargan         |   16    | 12.45228  |
|          hf_DistilBert          |    1    | 12.06794  |
|            resnet152            |    1    |  11.466   |
|      doctr_reco_predictor       |    1    | 10.927155 |
|              vgg16              |    1    | 9.727972  |
|     pyhpc_isoneutral_mixing     |    1    | 9.049268  |
|               drq               |    1    | 8.909614  |
|          pytorch_unet           |    1    | 8.767333  |
|          basic_gnn_gcn          |    1    | 8.608047  |
|            resnet50             |    1    |  8.20616  |
|         resnext50_32x4d         |    1    | 8.158192  |
|             alexnet             |    1    | 8.132387  |
|     functorch_maml_omniglot     |    1    | 7.816295  |
|     nvidia_deeprecommender      |    1    | 7.804957  |
|              dlrm               |    1    | 7.742108  |
|          maml_omniglot          |    5    |  7.50944  |
|         basic_gnn_sage          |    1    | 7.311903  |
|          mobilenet_v2           |    1    | 7.264377  |
|            resnet18             |    1    | 7.237676  |
|           mnasnet1_0            |    1    | 6.967003  |
|     pyhpc_equation_of_state     |    1    | 6.645739  |
|        basic_gnn_edgecnn        |    1    | 6.365929  |
|        soft_actor_critic        |   256   | 6.348829  |
|         LearningToPaint         |    1    | 6.320285  |
|         phlippe_resnet          |    1    | 6.036382  |
|          basic_gnn_gin          |    1    | 5.587119  |
|          lennard_jones          |    1    | 4.762646  |
|              dcgan              |    1    | 4.642563  |
|           tts_angular           |    1    | 4.542383  |
|   mobilenet_v2_quantized_qat    |    1    | 0.191112  |
|     resnet50_quantized_qat      |    1    | 0.174898  |
|        timm_efficientdet        |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
|       doctr_det_predictor       |    0    |    0.0    |
+---------------------------------+---------+-----------+

Peak Memory Compression Ratio

+---------------------------------+---------+----------+
|              name               |   bs    | inductor |
+---------------------------------+---------+----------+
|              dlrm               |    1    | 0.986776 |
|             demucs              |    1    | 0.982432 |
|           hf_T5_large           |    1    | 0.980904 |
|           hf_T5_base            |    1    | 0.979345 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 0.976921 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 0.976771 |
|       Background_Matting        |    1    | 0.970683 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 0.970115 |
|          pytorch_unet           |    1    | 0.967826 |
|              llama              |    1    | 0.964455 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 0.964442 |
|    detectron2_fcos_r_50_fpn     |    1    | 0.961905 |
|           hf_BigBird            |    1    |  0.9611  |
|        basic_gnn_edgecnn        |    1    | 0.960199 |
|         LearningToPaint         |    1    | 0.954295 |
|     resnet50_quantized_qat      |    1    | 0.94617  |
|      doctr_reco_predictor       |    1    | 0.943476 |
|      torch_multimodal_clip      |    1    | 0.941551 |
|          basic_gnn_gcn          |    1    | 0.940181 |
|              hf_T5              |    1    | 0.934413 |
|         basic_gnn_sage          |    1    | 0.932011 |
|          basic_gnn_gin          |    1    | 0.93059  |
|        hf_distil_whisper        |    1    | 0.928824 |
|         pytorch_stargan         |   16    | 0.928397 |
|       speech_transformer        |    1    | 0.926091 |
|             hf_Bert             |    1    | 0.922854 |
|          fastNLP_Bert           |    1    | 0.921846 |
|          hf_GPT2_large          |    1    | 0.919118 |
|            hf_Albert            |    1    | 0.916844 |
|   mobilenet_v2_quantized_qat    |    1    | 0.914694 |
|          BERT_pytorch           |    1    | 0.910165 |
|          hf_DistilBert          |    1    | 0.908687 |
|             hf_GPT2             |    1    | 0.908657 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 0.907813 |
|          hf_Longformer          |    1    | 0.896935 |
|          hf_Bert_large          |    1    | 0.895127 |
|               drq               |    1    | 0.893179 |
|         opacus_cifar10          |    1    | 0.893018 |
|             hf_Bart             |    1    | 0.890397 |
|           tts_angular           |    1    | 0.890125 |
|        soft_actor_critic        |   256   | 0.888889 |
|          mobilenet_v2           |    1    | 0.873601 |
|           timm_nfnet            |    1    | 0.872879 |
|  timm_vision_transformer_large  |    1    |  0.8694  |
|          squeezenet1_1          |    1    | 0.867536 |
|        timm_efficientnet        |    1    | 0.86564  |
|            moondream            |    1    | 0.864845 |
|       mobilenet_v3_large        |    1    | 0.863694 |
|     timm_vision_transformer     |    1    | 0.860665 |
|              vgg16              |    1    | 0.860449 |
|              dcgan              |    1    | 0.860215 |
|          maml_omniglot          |    5    | 0.859706 |
|          lennard_jones          |    1    | 0.859619 |
|           mnasnet1_0            |    1    | 0.859494 |
|     functorch_maml_omniglot     |    1    | 0.858306 |
|          timm_resnest           |    1    | 0.854347 |
|         phlippe_resnet          |    1    | 0.846026 |
|             alexnet             |    1    | 0.843811 |
|       shufflenet_v2_x1_0        |    1    | 0.843259 |
|      functorch_dp_cifar10       |    1    | 0.839722 |
|     nvidia_deeprecommender      |    1    | 0.836801 |
|     pyhpc_equation_of_state     |    1    | 0.833333 |
|           hf_Reformer           |    1    | 0.830053 |
|             yolov3              |    1    | 0.827218 |
|         resnext50_32x4d         |    1    | 0.826729 |
|            resnet18             |    1    | 0.812283 |
|     pyhpc_isoneutral_mixing     |    1    |  0.8116  |
|        phlippe_densenet         |    1    | 0.808821 |
|           timm_vovnet           |    1    |  0.808   |
|            resnet50             |    1    | 0.807132 |
|           densenet121           |    1    | 0.802612 |
|              maml               |    1    | 0.794457 |
|           timm_regnet           |    1    | 0.793398 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.76387  |
|            resnet152            |    1    | 0.760069 |
|        timm_efficientdet        |    0    |   0.0    |
|         DALLE2_pytorch          |    0    |   0.0    |
|              moco               |    0    |   0.0    |
|       doctr_det_predictor       |    0    |   0.0    |
+---------------------------------+---------+----------+

Absolute latency (ms)

+---------------------------------+---------+-------------+
|              name               |   bs    |  inductor   |
+---------------------------------+---------+-------------+
|           hf_T5_base            |    1    | 9815.592358 |
|          hf_GPT2_large          |    1    | 3047.777915 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 2922.890057 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 2887.271924 |
|            moondream            |    1    | 2442.942489 |
|           hf_T5_large           |    1    | 2179.307242 |
|        hf_distil_whisper        |    1    | 1995.107928 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1753.101904 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1740.151894 |
|          pytorch_unet           |    1    | 1512.987319 |
|       Background_Matting        |    1    | 1487.15727  |
|             demucs              |    1    | 1200.928853 |
|  timm_vision_transformer_large  |    1    | 959.216462  |
|    detectron2_fcos_r_50_fpn     |    1    | 719.847629  |
|           hf_BigBird            |    1    | 522.251188  |
|          hf_Longformer          |    1    | 490.254076  |
|      torch_multimodal_clip      |    1    | 452.984443  |
|          hf_Bert_large          |    1    | 439.482966  |
|             hf_Bart             |    1    | 242.345983  |
|              hf_T5              |    1    | 235.967586  |
|         pytorch_stargan         |   16    | 233.359452  |
|          fastNLP_Bert           |    1    | 221.405756  |
|            hf_Albert            |    1    | 194.946387  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 184.365438  |
|       speech_transformer        |    1    | 176.007435  |
|             hf_Bert             |    1    | 171.721882  |
|           hf_Reformer           |    1    | 169.204886  |
|             hf_GPT2             |    1    | 130.094988  |
|          hf_DistilBert          |    1    | 103.534745  |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  89.005778  |
|        basic_gnn_edgecnn        |    1    |  88.931254  |
|             yolov3              |    1    |  70.597825  |
|          BERT_pytorch           |    1    |  50.371181  |
|              maml               |    1    |  48.906979  |
|              vgg16              |    1    |  41.370673  |
|     nvidia_deeprecommender      |    1    |  38.452949  |
|           timm_regnet           |    1    |  35.83613   |
|           timm_nfnet            |    1    |  35.83256   |
|           tts_angular           |    1    |  30.211379  |
|          basic_gnn_gcn          |    1    |  29.101512  |
|            resnet152            |    1    |  29.054804  |
|     timm_vision_transformer     |    1    |   18.6081   |
|         basic_gnn_sage          |    1    |  16.059777  |
|          basic_gnn_gin          |    1    |  15.243748  |
|         resnext50_32x4d         |    1    |  13.020418  |
|           densenet121           |    1    |  12.830312  |
|           timm_vovnet           |    1    |  12.260359  |
|             alexnet             |    1    |  11.401221  |
|              llama              |    1    |  11.274005  |
|            resnet50             |    1    |  10.475948  |
|        timm_efficientnet        |    1    |  9.239829   |
|     resnet50_quantized_qat      |    1    |  7.576081   |
|          timm_resnest           |    1    |  6.179932   |
|      doctr_reco_predictor       |    1    |  5.301373   |
|   mobilenet_v2_quantized_qat    |    1    |  4.840723   |
|       mobilenet_v3_large        |    1    |  3.756908   |
|            resnet18             |    1    |  3.730091   |
|           mnasnet1_0            |    1    |  3.099277   |
|          mobilenet_v2           |    1    |  3.090951   |
|       shufflenet_v2_x1_0        |    1    |  2.892706   |
|         LearningToPaint         |    1    |  2.371388   |
|          squeezenet1_1          |    1    |  1.899339   |
|      functorch_dp_cifar10       |    1    |  1.790431   |
|         opacus_cifar10          |    1    |  1.769071   |
|        phlippe_densenet         |    1    |   1.73579   |
|               drq               |    1    |  0.933427   |
|         phlippe_resnet          |    1    |  0.636103   |
|        soft_actor_critic        |   256   |   0.57198   |
|              dcgan              |    1    |  0.533103   |
|     functorch_maml_omniglot     |    1    |  0.518953   |
|              dlrm               |    1    |   0.48252   |
|          maml_omniglot          |    5    |  0.271423   |
|     pyhpc_isoneutral_mixing     |    1    |  0.029722   |
|     pyhpc_equation_of_state     |    1    |  0.022478   |
|          lennard_jones          |    1    |  0.017845   |
|        timm_efficientdet        |    0    |     0.0     |
|              moco               |    0    |     0.0     |
|       doctr_det_predictor       |    0    |     0.0     |
|         DALLE2_pytorch          |    0    |     0.0     |
+---------------------------------+---------+-------------+

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|          MobileBertForMaskedLM          | 1  | 3.437697 |
|     PegasusForConditionalGeneration     | 1  | 2.793496 |
|     MobileBertForQuestionAnswering      | 1  | 2.683774 |
|             XGLMForCausalLM             | 1  | 2.660472 |
|          DistilBertForMaskedLM          | 1  | 2.652202 |
|           PegasusForCausalLM            | 1  | 2.597506 |
|     DistilBertForQuestionAnswering      | 1  | 2.528823 |
|     M2M100ForConditionalGeneration      | 1  | 2.522236 |
|          BlenderbotForCausalLM          | 1  | 2.492318 |
|            YituTechConvBert             | 1  | 2.430005 |
|       MT5ForConditionalGeneration       | 1  | 2.397343 |
|       BlenderbotSmallForCausalLM        | 1  | 2.353225 |
|         Speech2Text2ForCausalLM         | 1  | 2.310875 |
| BlenderbotSmallForConditionalGeneration | 1  | 2.24582  |
|           DebertaForMaskedLM            | 1  | 1.870955 |
|            XLNetLMHeadModel             | 1  | 1.862896 |
|       DebertaForQuestionAnswering       | 1  | 1.834845 |
|           LayoutLMForMaskedLM           | 1  | 1.826953 |
|           ElectraForCausalLM            | 1  | 1.793856 |
|                CamemBert                | 1  | 1.791466 |
|             BertForMaskedLM             | 1  | 1.784112 |
|           RobertaForCausalLM            | 1  | 1.779905 |
|    LayoutLMForSequenceClassification    | 1  | 1.736176 |
|            TrOCRForCausalLM             | 1  | 1.699903 |
|        BertForQuestionAnswering         | 1  | 1.692879 |
|    MegatronBertForQuestionAnswering     | 1  | 1.686646 |
|       RobertaForQuestionAnswering       | 1  | 1.674531 |
|         MegatronBertForCausalLM         | 1  | 1.667842 |
|       ElectraForQuestionAnswering       | 1  | 1.607373 |
|               DistillGPT2               | 1  | 1.54808  |
|          DebertaV2ForMaskedLM           | 1  | 1.496805 |
|             OPTForCausalLM              | 1  | 1.465089 |
|      GPT2ForSequenceClassification      | 1  | 1.444558 |
|      DebertaV2ForQuestionAnswering      | 1  | 1.426219 |
|             BartForCausalLM             | 1  | 1.383401 |
|      BartForConditionalGeneration       | 1  | 1.337012 |
|            PLBartForCausalLM            | 1  | 1.331689 |
|     PLBartForConditionalGeneration      | 1  | 1.327155 |
|            MBartForCausalLM             | 1  | 1.299485 |
|      MBartForConditionalGeneration      | 1  | 1.279027 |
|               GoogleFnet                | 1  | 1.22026  |
|       T5ForConditionalGeneration        | 1  | 1.132109 |
|                 T5Small                 | 1  | 1.131175 |
|            AlbertForMaskedLM            | 1  | 1.092077 |
|       AlbertForQuestionAnswering        | 1  | 1.081243 |
|          AllenaiLongformerBase          | 1  | 0.807531 |
+-----------------------------------------+----+----------+

Accuracy

+-----------------------------------------+----+------------------+
|                  name                   | bs |     inductor     |
+-----------------------------------------+----+------------------+
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip |
|                CamemBert                | 1  |       pass       |
|       AlbertForQuestionAnswering        | 1  |       pass       |
|           DebertaForMaskedLM            | 1  |       pass       |
|          AllenaiLongformerBase          | 1  |       pass       |
|             BartForCausalLM             | 1  |       pass       |
|      BartForConditionalGeneration       | 1  |       pass       |
|             BertForMaskedLM             | 1  |       pass       |
|        BertForQuestionAnswering         | 1  |       pass       |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |
|       DebertaForQuestionAnswering       | 1  |       pass       |
|           LayoutLMForMaskedLM           | 1  |       pass       |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |
|          DistilBertForMaskedLM          | 1  |       pass       |
|     DistilBertForQuestionAnswering      | 1  |       pass       |
|               DistillGPT2               | 1  |       pass       |
|           ElectraForCausalLM            | 1  |       pass       |
|       ElectraForQuestionAnswering       | 1  |       pass       |
|               GoogleFnet                | 1  |       pass       |
|            YituTechConvBert             | 1  |       pass       |
|    LayoutLMForSequenceClassification    | 1  |       pass       |
|            MBartForCausalLM             | 1  |       pass       |
|            XLNetLMHeadModel             | 1  |       pass       |
|             XGLMForCausalLM             | 1  |       pass       |
|            AlbertForMaskedLM            | 1  |       pass       |
|      MBartForConditionalGeneration      | 1  |       pass       |
|       MT5ForConditionalGeneration       | 1  |       pass       |
|         MegatronBertForCausalLM         | 1  |       pass       |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |
|          MobileBertForMaskedLM          | 1  |       pass       |
|     MobileBertForQuestionAnswering      | 1  |       pass       |
|             OPTForCausalLM              | 1  |       pass       |
|            PLBartForCausalLM            | 1  |       pass       |
|     PLBartForConditionalGeneration      | 1  |       pass       |
|           PegasusForCausalLM            | 1  |       pass       |
|     M2M100ForConditionalGeneration      | 1  |       pass       |
|     PegasusForConditionalGeneration     | 1  |       pass       |
|           RobertaForCausalLM            | 1  |       pass       |
|       RobertaForQuestionAnswering       | 1  |       pass       |
|         Speech2Text2ForCausalLM         | 1  |       pass       |
|       T5ForConditionalGeneration        | 1  |       pass       |
|                 T5Small                 | 1  |       pass       |
|            TrOCRForCausalLM             | 1  |       pass       |
|      GPT2ForSequenceClassification      | 1  |  fail_accuracy   |
+-----------------------------------------+----+------------------+
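
The summary at the top of this issue says that models failing the accuracy check are removed before performance is aggregated. The sketch below shows one way that filtering could be applied to the tables in this comment. It uses a handful of rows copied from the huggingface amp tables above. The `parse_table` helper, the choice to count `pass_due_to_skip` as passing, and the use of a geometric mean are all assumptions made for illustration, not the dashboard's actual aggregation script.

```python
import math

def parse_table(text):
    """Parse a '+---+'-bordered ASCII table into a list of dicts keyed by header.

    Hypothetical helper for illustration; not part of the benchmark tooling.
    """
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
        if line.strip().startswith("|")
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]

# A few rows copied from the "Performance speedup" table above.
SPEEDUP = """
+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|          MobileBertForMaskedLM          | 1  | 3.437697 |
|          DebertaV2ForMaskedLM           | 1  | 1.496805 |
|      GPT2ForSequenceClassification      | 1  | 1.444558 |
|          AllenaiLongformerBase          | 1  | 0.807531 |
+-----------------------------------------+----+----------+
"""

# The matching rows from the "Accuracy" table above.
ACCURACY = """
+-----------------------------------------+----+------------------+
|                  name                   | bs |     inductor     |
+-----------------------------------------+----+------------------+
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |
|          AllenaiLongformerBase          | 1  |       pass       |
|          MobileBertForMaskedLM          | 1  |       pass       |
|      GPT2ForSequenceClassification      | 1  |  fail_accuracy   |
+-----------------------------------------+----+------------------+
"""

speedup = {r["name"]: float(r["inductor"]) for r in parse_table(SPEEDUP)}
status = {r["name"]: r["inductor"] for r in parse_table(ACCURACY)}

# Assumption: any status beginning with "pass" (including pass_due_to_skip) is kept.
passing = {n: s for n, s in speedup.items() if status.get(n, "").startswith("pass")}

# Geometric mean of the remaining speedups (assumed aggregation method).
geomean = math.exp(sum(math.log(s) for s in passing.values()) / len(passing))
print(sorted(passing), round(geomean, 3))
```

With these four rows, `GPT2ForSequenceClassification` is dropped because its status is `fail_accuracy`, and the geometric mean is taken over the remaining three models.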

Compilation latency (sec)

+-----------------------------------------+----+-----------+
|                  name                   | bs | inductor  |
+-----------------------------------------+----+-----------+
|          AllenaiLongformerBase          | 1  | 47.869343 |
|          MobileBertForMaskedLM          | 1  | 36.553966 |
|     MobileBertForQuestionAnswering      | 1  | 36.538341 |
|     PegasusForConditionalGeneration     | 1  | 34.586358 |
|     M2M100ForConditionalGeneration      | 1  | 33.08983  |
|          BlenderbotForCausalLM          | 1  | 32.077615 |
|      MBartForConditionalGeneration      | 1  | 32.014832 |
|             XGLMForCausalLM             | 1  | 28.836005 |
|       MT5ForConditionalGeneration       | 1  | 28.437861 |
|            XLNetLMHeadModel             | 1  | 27.527151 |
|         MegatronBertForCausalLM         | 1  | 26.106523 |
|       T5ForConditionalGeneration        | 1  | 25.399593 |
|                 T5Small                 | 1  | 25.321147 |
|    MegatronBertForQuestionAnswering     | 1  | 25.065447 |
|      DebertaV2ForQuestionAnswering      | 1  | 24.444756 |
|          DebertaV2ForMaskedLM           | 1  | 24.201255 |
| BlenderbotSmallForConditionalGeneration | 1  | 22.768986 |
|           PegasusForCausalLM            | 1  | 22.087837 |
|      BartForConditionalGeneration       | 1  | 21.787148 |
|            YituTechConvBert             | 1  | 21.17981  |
|     PLBartForConditionalGeneration      | 1  | 20.737277 |
|            MBartForCausalLM             | 1  | 20.208013 |
|      GPT2ForSequenceClassification      | 1  | 20.062108 |
|             OPTForCausalLM              | 1  | 19.905354 |
|               DistillGPT2               | 1  | 17.600112 |
|       AlbertForQuestionAnswering        | 1  | 17.500815 |
|            AlbertForMaskedLM            | 1  | 17.309924 |
|       DebertaForQuestionAnswering       | 1  | 16.818895 |
|           DebertaForMaskedLM            | 1  | 16.735079 |
|            TrOCRForCausalLM             | 1  | 16.515574 |
|       RobertaForQuestionAnswering       | 1  | 16.296878 |
|           RobertaForCausalLM            | 1  | 16.232977 |
|                CamemBert                | 1  | 16.209266 |
|    LayoutLMForSequenceClassification    | 1  | 15.983177 |
|           LayoutLMForMaskedLM           | 1  | 15.455894 |
|           ElectraForCausalLM            | 1  | 15.248355 |
|        BertForQuestionAnswering         | 1  | 15.22362  |
|             BertForMaskedLM             | 1  | 15.087415 |
|       ElectraForQuestionAnswering       | 1  | 15.063536 |
|       BlenderbotSmallForCausalLM        | 1  | 14.331566 |
|         Speech2Text2ForCausalLM         | 1  | 14.109315 |
|               GoogleFnet                | 1  | 13.778022 |
|     DistilBertForQuestionAnswering      | 1  | 13.493766 |
|            PLBartForCausalLM            | 1  | 13.484188 |
|          DistilBertForMaskedLM          | 1  | 13.452258 |
|             BartForCausalLM             | 1  | 13.041503 |
+-----------------------------------------+----+-----------+

Peak Memory Compression Ratio

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|             OPTForCausalLM              | 1  | 0.977595 |
|            XLNetLMHeadModel             | 1  | 0.973782 |
|               DistillGPT2               | 1  | 0.952985 |
|             BartForCausalLM             | 1  | 0.946377 |
|            PLBartForCausalLM            | 1  | 0.944677 |
|            MBartForCausalLM             | 1  | 0.943482 |
|      GPT2ForSequenceClassification      | 1  | 0.942962 |
|    MegatronBertForQuestionAnswering     | 1  | 0.942605 |
|       RobertaForQuestionAnswering       | 1  | 0.938742 |
|      DebertaV2ForQuestionAnswering      | 1  | 0.93579  |
|           LayoutLMForMaskedLM           | 1  | 0.935493 |
|       MT5ForConditionalGeneration       | 1  | 0.934127 |
|          BlenderbotForCausalLM          | 1  | 0.932677 |
|       T5ForConditionalGeneration        | 1  | 0.929368 |
|                 T5Small                 | 1  | 0.929145 |
|            YituTechConvBert             | 1  | 0.928459 |
|                CamemBert                | 1  | 0.927584 |
|           RobertaForCausalLM            | 1  | 0.926584 |
|     PLBartForConditionalGeneration      | 1  | 0.926124 |
|            TrOCRForCausalLM             | 1  | 0.925557 |
|             BertForMaskedLM             | 1  | 0.925325 |
|           DebertaForMaskedLM            | 1  | 0.922603 |
|      BartForConditionalGeneration       | 1  | 0.922486 |
|      MBartForConditionalGeneration      | 1  | 0.921191 |
|     M2M100ForConditionalGeneration      | 1  | 0.919512 |
|             XGLMForCausalLM             | 1  | 0.917691 |
|    LayoutLMForSequenceClassification    | 1  | 0.916358 |
|        BertForQuestionAnswering         | 1  | 0.916325 |
|          AllenaiLongformerBase          | 1  | 0.909402 |
|          DebertaV2ForMaskedLM           | 1  | 0.906969 |
|       BlenderbotSmallForCausalLM        | 1  | 0.904367 |
|       DebertaForQuestionAnswering       | 1  | 0.902461 |
|          DistilBertForMaskedLM          | 1  | 0.901755 |
|           PegasusForCausalLM            | 1  | 0.895732 |
|           ElectraForCausalLM            | 1  | 0.890415 |
|         MegatronBertForCausalLM         | 1  | 0.888234 |
|     PegasusForConditionalGeneration     | 1  |  0.8851  |
|     DistilBertForQuestionAnswering      | 1  | 0.875562 |
|         Speech2Text2ForCausalLM         | 1  | 0.863593 |
|       ElectraForQuestionAnswering       | 1  | 0.860571 |
|               GoogleFnet                | 1  | 0.856029 |
| BlenderbotSmallForConditionalGeneration | 1  | 0.849093 |
|          MobileBertForMaskedLM          | 1  | 0.784731 |
|     MobileBertForQuestionAnswering      | 1  | 0.757581 |
|            AlbertForMaskedLM            | 1  | 0.688589 |
|       AlbertForQuestionAnswering        | 1  | 0.668134 |
+-----------------------------------------+----+----------+

Absolute latency (ms)

+-----------------------------------------+----+-------------+
|                  name                   | bs |  inductor   |
+-----------------------------------------+----+-------------+
|            AlbertForMaskedLM            | 1  | 4224.197781 |
|       AlbertForQuestionAnswering        | 1  | 4217.049641 |
|             OPTForCausalLM              | 1  | 2157.498129 |
|      MBartForConditionalGeneration      | 1  | 1785.701747 |
|      BartForConditionalGeneration       | 1  | 1406.410089 |
|          DebertaV2ForMaskedLM           | 1  | 1355.574547 |
|          AllenaiLongformerBase          | 1  | 1113.537783 |
|            XLNetLMHeadModel             | 1  | 1047.053947 |
|      DebertaV2ForQuestionAnswering      | 1  | 1042.22374  |
|            MBartForCausalLM             | 1  | 962.342803  |
|       T5ForConditionalGeneration        | 1  | 809.292421  |
|                 T5Small                 | 1  | 808.758778  |
|          BlenderbotForCausalLM          | 1  | 749.464862  |
|     PLBartForConditionalGeneration      | 1  | 652.160528  |
|             BartForCausalLM             | 1  | 614.788261  |
|         MegatronBertForCausalLM         | 1  | 509.092068  |
|      GPT2ForSequenceClassification      | 1  | 470.478392  |
|    MegatronBertForQuestionAnswering     | 1  | 466.459029  |
|               GoogleFnet                | 1  | 451.117314  |
|            PLBartForCausalLM            | 1  | 374.939385  |
|             XGLMForCausalLM             | 1  | 240.634055  |
|           DebertaForMaskedLM            | 1  | 213.217136  |
|     M2M100ForConditionalGeneration      | 1  |  209.4765   |
|           RobertaForCausalLM            | 1  |  208.0948   |
|            YituTechConvBert             | 1  | 196.212173  |
|                CamemBert                | 1  | 181.951734  |
|     PegasusForConditionalGeneration     | 1  | 178.961304  |
|           LayoutLMForMaskedLM           | 1  | 178.782775  |
|             BertForMaskedLM             | 1  | 178.617384  |
|            TrOCRForCausalLM             | 1  | 172.427078  |
|               DistillGPT2               | 1  | 165.663697  |
|       DebertaForQuestionAnswering       | 1  | 146.747351  |
|        BertForQuestionAnswering         | 1  | 142.143122  |
|    LayoutLMForSequenceClassification    | 1  | 141.649028  |
|       RobertaForQuestionAnswering       | 1  | 141.590184  |
|       MT5ForConditionalGeneration       | 1  |  111.00233  |
|           PegasusForCausalLM            | 1  |  88.297397  |
| BlenderbotSmallForConditionalGeneration | 1  |  52.599947  |
|           ElectraForCausalLM            | 1  |  49.992997  |
|          DistilBertForMaskedLM          | 1  |  29.878975  |
|          MobileBertForMaskedLM          | 1  |  29.37485   |
|       ElectraForQuestionAnswering       | 1  |  29.164048  |
|       BlenderbotSmallForCausalLM        | 1  |  28.073636  |
|     DistilBertForQuestionAnswering      | 1  |  18.322514  |
|     MobileBertForQuestionAnswering      | 1  |  16.729135  |
|         Speech2Text2ForCausalLM         | 1  |  5.494295   |
+-----------------------------------------+----+-------------+
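
The speedup and absolute latency tables can be read together. The summary at the top of this issue says speedup is measured by normalizing against native PyTorch. If that means speedup = eager latency / inductor latency, and if both tables come from the same run (both are assumptions here, not something this comment states), the eager-mode latency can be estimated as a product. The sketch below uses three rows copied from the huggingface amp tables above; `implied_eager_ms` is a hypothetical helper name.

```python
# (name, speedup, inductor latency in ms), copied from the tables above.
ROWS = [
    ("AlbertForMaskedLM", 1.092077, 4224.197781),
    ("MobileBertForMaskedLM", 3.437697, 29.37485),
    ("AllenaiLongformerBase", 0.807531, 1113.537783),
]

def implied_eager_ms(speedup, inductor_ms):
    """Estimate eager latency, assuming speedup = eager_ms / inductor_ms."""
    return speedup * inductor_ms

eager = {name: implied_eager_ms(s, ms) for name, s, ms in ROWS}

for name, s, ms in ROWS:
    delta = eager[name] - ms  # positive: inductor is faster than eager
    print(f"{name:24s} inductor={ms:10.2f} ms  eager~{eager[name]:10.2f} ms  delta={delta:+.2f} ms")
```

Under these assumptions, a speedup below 1.0 (as for `AllenaiLongformerBase`) yields an implied eager latency lower than the inductor latency, meaning a slowdown.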

timm_models suite with amp precision

Performance speedup

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|           resnest101e           | 1  | 5.501152 |
|          pnasnet5large          | 1  | 5.347219 |
|        ese_vovnet19b_dw         | 1  | 5.278805 |
|         mobilenetv2_100         | 1  | 4.775536 |
|           fbnetc_100            | 1  | 4.663554 |
|          cspdarknet53           | 1  | 4.603987 |
|          botnet26t_256          | 1  | 4.484101 |
|          spnasnet_100           | 1  | 4.443077 |
|           mnasnet_100           | 1  | 4.442823 |
|             dla102              | 1  | 4.424196 |
|          inception_v3           | 1  | 4.423457 |
|     swsl_resnext101_32x16d      | 1  | 4.421085 |
|           res2next50            | 1  | 4.380559 |
|       gluon_inception_v3        | 1  | 4.327795 |
|        adv_inception_v3         | 1  | 4.323594 |
|           selecsls42b           | 1  |  4.177   |
|            gernet_l             | 1  | 4.056972 |
|           dm_nfnet_f0           | 1  | 4.01962  |
|      mobilenetv3_large_100      | 1  | 3.922834 |
|        res2net50_14w_8s         | 1  | 3.91999  |
|            fbnetv3_b            | 1  | 3.893097 |
|            nfnet_l0             | 1  | 3.835032 |
|        res2net101_26w_4s        | 1  | 3.751096 |
|            repvgg_a2            | 1  | 3.567354 |
|           regnety_002           | 1  | 3.565002 |
|            hrnet_w18            | 1  | 3.514345 |
|            lcnet_050            | 1  | 3.428615 |
|       eca_botnext26ts_256       | 1  | 3.407544 |
|          ghostnet_100           | 1  | 3.149317 |
|         visformer_small         | 1  | 3.059317 |
|             dpn107              | 1  | 2.932961 |
|         poolformer_m36          | 1  | 2.891019 |
|        eca_halonext26ts         | 1  | 2.781282 |
|           rexnet_100            | 1  | 2.668809 |
|           mobilevit_s           | 1  | 2.604388 |
|            tinynet_a            | 1  | 2.497953 |
|       tf_efficientnet_b0        | 1  | 2.348632 |
|            levit_128            | 1  | 2.305941 |
|           tf_mixnet_l           | 1  | 2.219646 |
|            mixnet_l             | 1  | 2.136557 |
|        twins_pcpvt_base         | 1  | 2.027297 |
|          gmixer_24_224          | 1  | 1.884714 |
|           volo_d1_224           | 1  | 1.844107 |
|        convmixer_768_32         | 1  | 1.778181 |
|          gmlp_s16_224           | 1  | 1.766799 |
|      beit_base_patch16_224      | 1  | 1.75332  |
|            pit_b_224            | 1  | 1.737974 |
|  swin_base_patch4_window7_224   | 1  | 1.619421 |
|      vit_base_patch16_224       | 1  | 1.602657 |
|      xcit_large_24_p8_224       | 1  | 1.585866 |
|          convnext_base          | 1  | 1.573751 |
|           convit_base           | 1  | 1.489692 |
|          cait_m36_384           | 1  | 1.436587 |
|         crossvit_9_240          | 1  | 1.431853 |
|        tnt_s_patch16_224        | 1  | 1.383236 |
| deit_base_distilled_patch16_224 | 1  | 1.351251 |
|          jx_nest_base           | 1  | 1.345634 |
|        sebotnet33ts_256         | 1  | 1.342483 |
|          resmlp_12_224          | 1  | 1.187716 |
|          mixer_b16_224          | 1  | 1.155683 |
+---------------------------------+----+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 1  |     pass      |
|           dm_nfnet_f0           | 1  |     pass      |
|            mixnet_l             | 1  |     pass      |
|       eca_botnext26ts_256       | 1  |     pass      |
|         crossvit_9_240          | 1  |     pass      |
| deit_base_distilled_patch16_224 | 1  |     pass      |
|          convnext_base          | 1  |     pass      |
|           regnety_002           | 1  |     pass      |
|           convit_base           | 1  |     pass      |
|          gmlp_s16_224           | 1  |     pass      |
|          cait_m36_384           | 1  |     pass      |
|             dpn107              | 1  |     pass      |
|      vit_base_patch16_224       | 1  |     pass      |
|          cspdarknet53           | 1  |     pass      |
|        eca_halonext26ts         | 1  |     pass      |
|        ese_vovnet19b_dw         | 1  |     pass      |
|           fbnetc_100            | 1  |     pass      |
|            fbnetv3_b            | 1  |     pass      |
|            gernet_l             | 1  |     pass      |
|          botnet26t_256          | 1  |     pass      |
|       gluon_inception_v3        | 1  |     pass      |
|          gmixer_24_224          | 1  |     pass      |
|          jx_nest_base           | 1  |     pass      |
|        convmixer_768_32         | 1  |     pass      |
|        twins_pcpvt_base         | 1  |     pass      |
|         poolformer_m36          | 1  |     pass      |
|            lcnet_050            | 1  |     pass      |
|          mixer_b16_224          | 1  |     pass      |
|        tnt_s_patch16_224        | 1  |     pass      |
|           mnasnet_100           | 1  |     pass      |
|         mobilenetv2_100         | 1  |     pass      |
|      mobilenetv3_large_100      | 1  |     pass      |
|           mobilevit_s           | 1  |     pass      |
|            nfnet_l0             | 1  |     pass      |
|            pit_b_224            | 1  |     pass      |
|      beit_base_patch16_224      | 1  |     pass      |
|            repvgg_a2            | 1  |     pass      |
|  swin_base_patch4_window7_224   | 1  |     pass      |
|          inception_v3           | 1  |     pass      |
|           tf_mixnet_l           | 1  |     pass      |
|       tf_efficientnet_b0        | 1  |     pass      |
|            tinynet_a            | 1  |     pass      |
|          spnasnet_100           | 1  |     pass      |
|        sebotnet33ts_256         | 1  |     pass      |
|          resmlp_12_224          | 1  |     pass      |
|         coat_lite_mini          | 1  |  fail_to_run  |
|           resnest101e           | 1  | fail_accuracy |
|            levit_128            | 1  | fail_accuracy |
|          ghostnet_100           | 1  | fail_accuracy |
|          pnasnet5large          | 1  | fail_accuracy |
|        res2net101_26w_4s        | 1  | fail_accuracy |
|        res2net50_14w_8s         | 1  | fail_accuracy |
|             dla102              | 1  | fail_accuracy |
|     swsl_resnext101_32x16d      | 1  | fail_accuracy |
|           rexnet_100            | 1  | fail_accuracy |
|           selecsls42b           | 1  | fail_accuracy |
|           res2next50            | 1  | fail_accuracy |
|            hrnet_w18            | 1  | fail_accuracy |
|           volo_d1_224           | 1  | fail_accuracy |
|         visformer_small         | 1  | fail_accuracy |
|      xcit_large_24_p8_224       | 1  | fail_accuracy |
+---------------------------------+----+---------------+

Compilation latency (sec)

+---------------------------------+----+-----------+
|              name               | bs | inductor  |
+---------------------------------+----+-----------+
|          pnasnet5large          | 1  | 63.199887 |
|  swin_base_patch4_window7_224   | 1  | 62.464935 |
|           tf_mixnet_l           | 1  | 53.258291 |
|             dpn107              | 1  | 50.101508 |
|          jx_nest_base           | 1  |  44.8854  |
|        res2net50_14w_8s         | 1  | 43.662789 |
|           rexnet_100            | 1  | 43.647773 |
|         crossvit_9_240          | 1  | 41.977995 |
|          ghostnet_100           | 1  | 40.577418 |
|            mixnet_l             | 1  | 39.359018 |
|        sebotnet33ts_256         | 1  | 39.128763 |
|      xcit_large_24_p8_224       | 1  | 39.056232 |
|         poolformer_m36          | 1  | 38.732983 |
|            levit_128            | 1  | 38.715562 |
|        tnt_s_patch16_224        | 1  | 38.213167 |
|        twins_pcpvt_base         | 1  | 37.993401 |
|           dm_nfnet_f0           | 1  | 37.650147 |
|        eca_halonext26ts         | 1  | 37.421702 |
|           mobilevit_s           | 1  | 36.497595 |
|          cait_m36_384           | 1  | 35.563489 |
|         visformer_small         | 1  | 35.093183 |
|           volo_d1_224           | 1  | 32.358297 |
|           resnest101e           | 1  | 32.057673 |
|       eca_botnext26ts_256       | 1  | 31.568417 |
|       tf_efficientnet_b0        | 1  | 31.443591 |
|            hrnet_w18            | 1  | 30.970683 |
|        res2net101_26w_4s        | 1  | 29.980518 |
|            nfnet_l0             | 1  | 29.122747 |
|          convnext_base          | 1  | 28.545045 |
|        adv_inception_v3         | 1  | 27.501551 |
|          inception_v3           | 1  | 27.479327 |
|       gluon_inception_v3        | 1  | 27.472576 |
|            tinynet_a            | 1  | 26.523392 |
|           res2next50            | 1  | 25.926784 |
|           convit_base           | 1  | 25.562481 |
|            pit_b_224            | 1  | 23.542401 |
|          botnet26t_256          | 1  | 22.804377 |
|          cspdarknet53           | 1  | 21.041789 |
|             dla102              | 1  | 21.033871 |
|            fbnetv3_b            | 1  | 20.535621 |
| deit_base_distilled_patch16_224 | 1  | 19.602836 |
|          gmixer_24_224          | 1  | 18.34107  |
|      vit_base_patch16_224       | 1  | 17.776104 |
|      mobilenetv3_large_100      | 1  | 17.615932 |
|        ese_vovnet19b_dw         | 1  | 17.532875 |
|          gmlp_s16_224           | 1  | 16.764892 |
|           regnety_002           | 1  | 14.212472 |
|          resmlp_12_224          | 1  | 13.875327 |
|      beit_base_patch16_224      | 1  | 13.862661 |
|            repvgg_a2            | 1  | 12.868333 |
|          mixer_b16_224          | 1  | 12.528934 |
|        convmixer_768_32         | 1  | 12.389191 |
|           selecsls42b           | 1  | 12.225657 |
|            lcnet_050            | 1  | 10.382233 |
|     swsl_resnext101_32x16d      | 1  | 9.673218  |
|           fbnetc_100            | 1  | 7.827535  |
|          spnasnet_100           | 1  | 7.775008  |
|            gernet_l             | 1  | 7.749711  |
|         mobilenetv2_100         | 1  | 7.648139  |
|           mnasnet_100           | 1  | 7.401917  |
+---------------------------------+----+-----------+

Peak Memory Compression Ratio

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|            nfnet_l0             | 1  | 0.911588 |
|        convmixer_768_32         | 1  | 0.910562 |
|          convnext_base          | 1  | 0.902079 |
|      vit_base_patch16_224       | 1  | 0.897619 |
|      beit_base_patch16_224      | 1  | 0.896275 |
|          cait_m36_384           | 1  | 0.895308 |
|          pnasnet5large          | 1  | 0.891912 |
|        ese_vovnet19b_dw         | 1  | 0.891462 |
| deit_base_distilled_patch16_224 | 1  | 0.891405 |
|          resmlp_12_224          | 1  | 0.890943 |
|           dm_nfnet_f0           | 1  | 0.888344 |
|           convit_base           | 1  | 0.886022 |
|  swin_base_patch4_window7_224   | 1  | 0.884335 |
|         mobilenetv2_100         | 1  | 0.883187 |
|           volo_d1_224           | 1  | 0.880488 |
|         poolformer_m36          | 1  | 0.878183 |
|         visformer_small         | 1  | 0.876983 |
|          mixer_b16_224          | 1  | 0.875792 |
|           mnasnet_100           | 1  | 0.874961 |
|      mobilenetv3_large_100      | 1  | 0.873555 |
|            pit_b_224            | 1  | 0.873218 |
|           fbnetc_100            | 1  | 0.873195 |
|          spnasnet_100           | 1  | 0.871263 |
|          gmlp_s16_224           | 1  | 0.870027 |
|          gmixer_24_224          | 1  | 0.869117 |
|        twins_pcpvt_base         | 1  | 0.867348 |
|            lcnet_050            | 1  | 0.866167 |
|        eca_halonext26ts         | 1  | 0.866163 |
|       eca_botnext26ts_256       | 1  | 0.863547 |
|            fbnetv3_b            | 1  | 0.863254 |
|           mobilevit_s           | 1  | 0.863198 |
|            tinynet_a            | 1  | 0.863137 |
|       tf_efficientnet_b0        | 1  | 0.86214  |
|          botnet26t_256          | 1  | 0.860862 |
|           rexnet_100            | 1  | 0.85542  |
|      xcit_large_24_p8_224       | 1  | 0.850732 |
|          jx_nest_base           | 1  | 0.849725 |
|          ghostnet_100           | 1  | 0.846538 |
|           regnety_002           | 1  | 0.846388 |
|        tnt_s_patch16_224        | 1  | 0.842357 |
|           tf_mixnet_l           | 1  | 0.840939 |
|            mixnet_l             | 1  | 0.839484 |
|        sebotnet33ts_256         | 1  | 0.833802 |
|         crossvit_9_240          | 1  | 0.832729 |
|            levit_128            | 1  | 0.828585 |
|             dpn107              | 1  | 0.82616  |
|          cspdarknet53           | 1  | 0.82331  |
|           res2next50            | 1  | 0.822464 |
|             dla102              | 1  | 0.810063 |
|        res2net50_14w_8s         | 1  | 0.808504 |
|        adv_inception_v3         | 1  | 0.808194 |
|       gluon_inception_v3        | 1  | 0.807961 |
|            hrnet_w18            | 1  | 0.807511 |
|          inception_v3           | 1  | 0.807404 |
|           resnest101e           | 1  | 0.803339 |
|           selecsls42b           | 1  | 0.79552  |
|            gernet_l             | 1  | 0.786777 |
|            repvgg_a2            | 1  | 0.773834 |
|        res2net101_26w_4s        | 1  | 0.773741 |
|     swsl_resnext101_32x16d      | 1  | 0.74476  |
+---------------------------------+----+----------+

Absolute latency (ms)

+---------------------------------+----+-------------+
|              name               | bs |  inductor   |
+---------------------------------+----+-------------+
|          cait_m36_384           | 1  | 1636.051442 |
|      xcit_large_24_p8_224       | 1  | 453.253451  |
|          pnasnet5large          | 1  | 124.142378  |
|          convnext_base          | 1  | 103.822523  |
|          jx_nest_base           | 1  | 101.230214  |
|        convmixer_768_32         | 1  | 100.059859  |
|     swsl_resnext101_32x16d      | 1  |  96.330402  |
|           convit_base           | 1  |  96.113892  |
| deit_base_distilled_patch16_224 | 1  |  85.160866  |
|  swin_base_patch4_window7_224   | 1  |  82.98731   |
|      beit_base_patch16_224      | 1  |  75.728602  |
|          mixer_b16_224          | 1  |  72.980481  |
|      vit_base_patch16_224       | 1  |  72.917041  |
|             dpn107              | 1  |  68.995727  |
|            pit_b_224            | 1  |  65.890875  |
|         poolformer_m36          | 1  |  51.86013   |
|           dm_nfnet_f0           | 1  |  50.656565  |
|        tnt_s_patch16_224        | 1  |  49.89051   |
|        twins_pcpvt_base         | 1  |  43.174571  |
|        sebotnet33ts_256         | 1  |  41.908513  |
|           volo_d1_224           | 1  |  40.646721  |
|            nfnet_l0             | 1  |  33.967185  |
|           resnest101e           | 1  |  31.210086  |
|        res2net101_26w_4s        | 1  |  28.014062  |
|          gmlp_s16_224           | 1  |  27.529634  |
|          gmixer_24_224          | 1  |  25.443697  |
|          resmlp_12_224          | 1  |  22.576581  |
|           mobilevit_s           | 1  |  21.864841  |
|            hrnet_w18            | 1  |  21.610275  |
|         visformer_small         | 1  |  21.562488  |
|       gluon_inception_v3        | 1  |  20.802232  |
|          inception_v3           | 1  |  20.76947   |
|        adv_inception_v3         | 1  |  20.595319  |
|             dla102              | 1  |  18.825752  |
|        eca_halonext26ts         | 1  |  18.443216  |
|          cspdarknet53           | 1  |  17.414992  |
|        res2net50_14w_8s         | 1  |  16.848035  |
|           tf_mixnet_l           | 1  |  16.833149  |
|            mixnet_l             | 1  |  16.725233  |
|       eca_botnext26ts_256       | 1  |  15.009696  |
|           res2next50            | 1  |  14.918319  |
|         crossvit_9_240          | 1  |  14.700202  |
|            repvgg_a2            | 1  |  12.705689  |
|            gernet_l             | 1  |  11.659213  |
|          botnet26t_256          | 1  |  10.995347  |
|           selecsls42b           | 1  |  10.49794   |
|       tf_efficientnet_b0        | 1  |  9.694478   |
|        ese_vovnet19b_dw         | 1  |  9.231032   |
|           rexnet_100            | 1  |  8.511765   |
|            tinynet_a            | 1  |  8.138135   |
|            fbnetv3_b            | 1  |  8.094722   |
|            levit_128            | 1  |  5.827954   |
|          ghostnet_100           | 1  |  4.928684   |
|           fbnetc_100            | 1  |  3.757482   |
|      mobilenetv3_large_100      | 1  |  3.737674   |
|          spnasnet_100           | 1  |  3.509754   |
|           mnasnet_100           | 1  |  3.171781   |
|         mobilenetv2_100         | 1  |   3.09185   |
|           regnety_002           | 1  |  3.028667   |
|            lcnet_050            | 1  |  1.389893   |
+---------------------------------+----+-------------+

@zxd1997066
Contributor

[cppwrapper_static_shape] Performance Dashboard for float32 precision -- Single-Socket Multi-threads (2024-04-28 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward pass outputs and gradients by comparing with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.
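As a rough illustration of "normalizing against the performance of native pytorch", the sketch below times two callables and divides the eager latency by the compiled latency. The helper names (`measure_latency_ms`, `speedup`) and the use of a median over several repeats are assumptions made for this example. They are not taken from the benchmark runner.

```python
import statistics
import time
from typing import Callable


def measure_latency_ms(fn: Callable[[], object], repeats: int = 10) -> float:
    """Median wall-clock latency of fn() in milliseconds (hypothetical helper)."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(eager_ms: float, compiled_ms: float) -> float:
    """Speedup of the compiled model, normalized against the eager baseline."""
    if eager_ms <= 0 or compiled_ms <= 0:
        raise ValueError("latencies must be positive")
    return eager_ms / compiled_ms


# A value above 1.0 means the compiled model is faster than the baseline.
print(speedup(20.0, 10.0))
```

With this definition, a model whose speedup is below 1.0 runs slower under the compiler than in eager mode.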

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

SW information:

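+-------------------+--------+------------------------------------------+
| SW                | Branch | Commit                                   |
+-------------------+--------+------------------------------------------+
| Pytorch           | main   | 7478b7f1cac9686f00edf3db4667cf86d2421531 |
| Torchbench        | main   | d6015d42                                 |
| torchaudio        | main   | 2.2.0a0+ea437b3                          |
| torchtext         | main   | 0.16.0a0+b0ebddc                         |
| torchvision       | main   | 0.19.0a0+2c4665f                         |
| torchdata         | main   | 0.7.1a0+0790338                          |
| dynamo_benchmarks | main   | nightly                                  |
+-------------------+--------+------------------------------------------+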

HW information

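+------------------+----------------------------------------------------------------+
| Item             | Value                                                          |
+------------------+----------------------------------------------------------------+
| Manufacturer     | Amazon EC2                                                     |
| Product Name     | c6i.16xlarge                                                   |
| CPU Model        | Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz                  |
| Installed Memory | 128GB (1x128GB DDR4 3200 MT/s [Unknown])                       |
| OS               | Ubuntu 22.04.2 LTS                                             |
| Kernel           | 5.19.0-1022-aws                                                |
| Microcode        | 0xd000389                                                      |
| GCC              | gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0                      |
| GLIBC            | ldd (Ubuntu GLIBC 2.35-0ubuntu3.1) 2.35                        |
| Binutils         | GNU ld (GNU Binutils for Ubuntu) 2.38                          |
| Python           | Python 3.10.6                                                  |
| OpenSSL          | OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022) |
+------------------+----------------------------------------------------------------+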

Test command

export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
CORES=$(lscpu | grep Core | awk '{print $4}')
export OMP_NUM_THREADS=$CORES

python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--node_id 0" --devices=cpu --dtypes=float32 --inference --compilers=inductor --extra-args="--timeout 9000" 

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
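The filtering step described here could look roughly like the sketch below. It keeps only models whose accuracy status starts with `pass` before any aggregate is computed. The function name `drop_accuracy_failures` is hypothetical, and treating `pass_due_to_skip` as a pass is an assumption of this example. The sample values are copied from the torchbench tables in this comment.

```python
from typing import Dict


def drop_accuracy_failures(
    metrics: Dict[str, float], accuracy: Dict[str, str]
) -> Dict[str, float]:
    """Keep metrics only for models whose accuracy status is a pass.

    Assumption: any status beginning with "pass" (including
    "pass_due_to_skip") counts as passing; models with no recorded
    status are dropped.
    """
    return {
        name: value
        for name, value in metrics.items()
        if accuracy.get(name, "").startswith("pass")
    }


speedups = {"hf_GPT2": 1.90563, "resnet50": 2.225098, "hf_T5_large": 1.217037}
statuses = {
    "hf_GPT2": "pass",
    "resnet50": "fail_accuracy",
    "hf_T5_large": "pass_due_to_skip",
}
kept = drop_accuracy_failures(speedups, statuses)
print(sorted(kept))
```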

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 85%, 67/79 | 100%, 46/46 | 75%, 45/60  |
+----------+------------+-------------+-------------+
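Each cell in the table above pairs a rounded percentage with the underlying pass/total counts. A minimal sketch of that formatting, using a hypothetical helper name and the counts shown in the table:

```python
def format_passrate(passed: int, total: int) -> str:
    """Format a pass rate as '<percent>%, <passed>/<total>'."""
    if total <= 0:
        raise ValueError("total must be positive")
    return f"{100 * passed / total:.0f}%, {passed}/{total}"


for suite, (passed, total) in {
    "torchbench": (67, 79),
    "huggingface": (46, 46),
    "timm_models": (45, 60),
}.items():
    print(suite, format_passrate(passed, total))
```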

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.44x    |    1.38x    |    1.92x    |
+----------+------------+-------------+-------------+
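For reference, a geometric mean over per-model speedups can be computed as the exponential of the mean of the logarithms. The sketch below skips non-positive entries such as the 0.0 rows reported for models that fail to load. Whether the dashboard excludes them in exactly this way is an assumption, and the function name is hypothetical.

```python
import math
from typing import Iterable


def geometric_mean_speedup(speedups: Iterable[float]) -> float:
    """Geometric mean of the positive speedups.

    Assumption: non-positive values (e.g. 0.0 for models that did not
    run) are skipped, since their logarithm is undefined.
    """
    valid = [s for s in speedups if s > 0]
    if not valid:
        raise ValueError("no positive speedups to aggregate")
    return math.exp(sum(math.log(s) for s in valid) / len(valid))


# A 4x speedup and a 1x (no change) result average to 2x, not 2.5x.
print(geometric_mean_speedup([1.0, 4.0, 0.0]))
```

The geometric mean is the usual choice for ratios because a 2x speedup and a 0.5x slowdown cancel out to 1.0x, which an arithmetic mean would not give.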

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   56.46    |    39.63    |    37.01    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.89x    |    0.97x    |    0.99x    |
+----------+------------+-------------+-------------+
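One way to read these numbers, assuming the ratio is defined as eager peak memory divided by compiled peak memory (consistent with "higher is better", but not stated explicitly in this comment), is sketched below. The function name is hypothetical.

```python
def compression_ratio(eager_peak_mb: float, compiled_peak_mb: float) -> float:
    """Peak memory compression ratio (assumed: eager peak / compiled peak)."""
    if eager_peak_mb <= 0 or compiled_peak_mb <= 0:
        raise ValueError("peak memory values must be positive")
    return eager_peak_mb / compiled_peak_mb


# Under this definition, a ratio below 1.0 means the compiled model
# peaks at more memory than the eager baseline.
ratio = compression_ratio(100.0, 125.0)
print(ratio)
```

Under that assumption, a ratio of 0.89x corresponds to the compiled run peaking at roughly 1 / 0.89, or about 1.12 times, the eager footprint.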

torchbench suite with float32 precision


Performance speedup

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_equation_of_state     | 1048576 | 11.107687 |
|          squeezenet1_1          |   16    | 3.403179  |
|       mobilenet_v3_large        |   32    | 3.196743  |
|          mobilenet_v2           |   16    | 3.029048  |
|        timm_efficientnet        |   64    |  2.99501  |
|           mnasnet1_0            |   32    | 2.893412  |
|       shufflenet_v2_x1_0        |   64    | 2.718108  |
|          timm_resnest           |   32    | 2.356622  |
|            resnet50             |   32    | 2.225098  |
|        soft_actor_critic        |   256   | 2.204391  |
|        phlippe_densenet         |   128   | 2.199451  |
|         resnext50_32x4d         |    8    | 2.106781  |
|            resnet18             |    8    | 2.039335  |
|            resnet152            |   32    | 2.030724  |
|         phlippe_resnet          |   128   | 2.009596  |
|           densenet121           |   64    | 1.987502  |
|       doctr_det_predictor       |    1    | 1.964473  |
|           timm_regnet           |   32    | 1.915856  |
|             hf_GPT2             |    1    |  1.90563  |
|          maml_omniglot          |    5    | 1.890181  |
|           timm_nfnet            |   128   | 1.810121  |
|          BERT_pytorch           |    2    | 1.703038  |
|           timm_vovnet           |   32    | 1.695481  |
|             alexnet             |   128   | 1.677166  |
|      doctr_reco_predictor       |    1    | 1.630337  |
|          hf_Bert_large          |    1    | 1.616671  |
|             yolov3              |    8    | 1.610404  |
|            hf_Albert            |    1    |  1.57944  |
|     functorch_maml_omniglot     |    1    | 1.563881  |
|          fastNLP_Bert           |    1    | 1.554981  |
|            moondream            |    1    | 1.528736  |
|          hf_GPT2_large          |    1    |  1.52108  |
|              llama              |   32    | 1.516062  |
|             hf_Bert             |    1    | 1.496686  |
|          hf_Longformer          |    1    | 1.475738  |
|              vgg16              |    4    | 1.453084  |
|        basic_gnn_edgecnn        |    1    | 1.451833  |
|         LearningToPaint         |   96    | 1.439209  |
|              dcgan              |   256   | 1.400515  |
|          lennard_jones          |  1000   | 1.359043  |
|          hf_DistilBert          |    1    |  1.34723  |
|      torch_multimodal_clip      |   32    | 1.336592  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 1.293925  |
|             hf_Bart             |    1    | 1.288505  |
|           hf_BigBird            |    1    | 1.282377  |
|         basic_gnn_sage          |    1    | 1.249675  |
|          basic_gnn_gcn          |    1    |  1.24496  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1.234328  |
|        hf_distil_whisper        |    1    | 1.232716  |
|     nvidia_deeprecommender      |   256   | 1.223505  |
|         pytorch_stargan         |   16    | 1.220446  |
|           hf_T5_large           |    1    | 1.217037  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1.194586  |
|          pytorch_unet           |    1    | 1.192296  |
|              dlrm               |  2048   | 1.162923  |
|    detectron2_fcos_r_50_fpn     |    1    | 1.158992  |
|          basic_gnn_gin          |    1    | 1.153713  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 1.138988  |
|              hf_T5              |    1    | 1.129593  |
|     timm_vision_transformer     |   32    | 1.115172  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1.079925  |
|  timm_vision_transformer_large  |   32    | 1.058562  |
|       speech_transformer        |    1    | 1.053716  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1.052266  |
|           hf_Reformer           |    1    | 1.030322  |
|             demucs              |    1    | 1.016302  |
|               drq               |    1    | 1.013106  |
|           tts_angular           |   64    | 1.000424  |
|   mobilenet_v2_quantized_qat    |   96    | 0.995658  |
|     resnet50_quantized_qat      |   32    | 0.986069  |
|       Background_Matting        |    1    | 0.809306  |
|           hf_T5_base            |    1    | 0.791838  |
|         opacus_cifar10          |   64    | 0.747125  |
|              maml               |    1    | 0.708783  |
|      functorch_dp_cifar10       |   64    | 0.696628  |
|     pyhpc_isoneutral_mixing     | 1048576 | 0.682207  |
|              moco               |    0    |    0.0    |
|        timm_efficientdet        |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+

Accuracy

+---------------------------------+---------+--------------------+
|              name               |   bs    |      inductor      |
+---------------------------------+---------+--------------------+
|           hf_T5_large           |    4    |  pass_due_to_skip  |
|       Background_Matting        |    1    |  pass_due_to_skip  |
|  timm_vision_transformer_large  |    4    |  pass_due_to_skip  |
|              maml               |    1    |  pass_due_to_skip  |
|          hf_GPT2_large          |    4    |  pass_due_to_skip  |
|         basic_gnn_sage          |    1    |        pass        |
| detectron2_fasterrcnn_r_50_dc5  |    4    |        pass        |
|          hf_Longformer          |    4    |        pass        |
| detectron2_fasterrcnn_r_101_dc5 |    4    |        pass        |
| detectron2_fasterrcnn_r_101_c4  |    4    |        pass        |
|             demucs              |    1    |        pass        |
|        basic_gnn_edgecnn        |    1    |        pass        |
|          basic_gnn_gin          |    1    |        pass        |
|          basic_gnn_gcn          |    1    |        pass        |
|              dlrm               |    4    |        pass        |
|         LearningToPaint         |    4    |        pass        |
|      doctr_reco_predictor       |    4    |        pass        |
|           hf_Reformer           |    4    |        pass        |
|    detectron2_fcos_r_50_fpn     |    4    |        pass        |
|  detectron2_fasterrcnn_r_50_c4  |    4    |        pass        |
|       doctr_det_predictor       |    4    |        pass        |
|           hf_T5_base            |    4    |        pass        |
|               drq               |    1    |        pass        |
|            hf_Albert            |    4    |        pass        |
|             hf_GPT2             |    2    |        pass        |
|          hf_DistilBert          |    4    |        pass        |
|           hf_BigBird            |    4    |        pass        |
|          hf_Bert_large          |    4    |        pass        |
|             hf_Bert             |    4    |        pass        |
|          fastNLP_Bert           |    4    |        pass        |
|             hf_Bart             |    4    |        pass        |
|             yolov3              |    4    |        pass        |
|      functorch_dp_cifar10       |    4    |        pass        |
|     functorch_maml_omniglot     |    1    |        pass        |
|          lennard_jones          |    4    |        pass        |
|             alexnet             |    4    |        pass        |
|          pytorch_unet           |    2    |        pass        |
|  pytorch_CycleGAN_and_pix2pix   |    1    |        pass        |
|          BERT_pytorch           |    4    |        pass        |
|            moondream            |    4    |        pass        |
|     pyhpc_equation_of_state     |    4    |        pass        |
|         phlippe_resnet          |    4    |        pass        |
|        phlippe_densenet         |    4    |        pass        |
|         opacus_cifar10          |    4    |        pass        |
|     nvidia_deeprecommender      |    4    |        pass        |
|     resnet50_quantized_qat      |    4    |        pass        |
|       mobilenet_v3_large        |    4    |        pass        |
|   mobilenet_v2_quantized_qat    |    4    |        pass        |
| pyhpc_turbulent_kinetic_energy  | 1048576 |        pass        |
|              llama              |    4    |        pass        |
|        hf_distil_whisper        |    4    |        pass        |
|     pyhpc_isoneutral_mixing     |    4    |        pass        |
|          maml_omniglot          |    5    |        pass        |
|              hf_T5              |    4    |        pass        |
|           timm_regnet           |    4    |        pass        |
|              vgg16              |    4    |        pass        |
|           tts_angular           |    4    |        pass        |
|      torch_multimodal_clip      |    4    |        pass        |
|           timm_vovnet           |    4    |        pass        |
|     timm_vision_transformer     |    4    |        pass        |
|          timm_resnest           |    4    |        pass        |
|           timm_nfnet            |    4    |        pass        |
|         pytorch_stargan         |   16    |        pass        |
|        timm_efficientnet        |    4    |        pass        |
|          squeezenet1_1          |    4    |        pass        |
|       speech_transformer        |    1    |        pass        |
|        soft_actor_critic        |   256   |        pass        |
|        timm_efficientdet        |    0    | model_fail_to_load |
|         DALLE2_pytorch          |    0    | model_fail_to_load |
|              moco               |    0    | model_fail_to_load |
| detectron2_fasterrcnn_r_101_fpn |    4    |    fail_to_run     |
|         vision_maskrcnn         |    1    |    fail_to_run     |
|           Super_SloMo           |    4    |    fail_to_run     |
| detectron2_fasterrcnn_r_50_fpn  |    4    |    fail_to_run     |
|         resnext50_32x4d         |    4    |   fail_accuracy    |
|            resnet152            |    4    |   fail_accuracy    |
|       shufflenet_v2_x1_0        |    4    |   fail_accuracy    |
|           mnasnet1_0            |    4    |   fail_accuracy    |
|              dcgan              |    4    |   fail_accuracy    |
|          mobilenet_v2           |    4    |   fail_accuracy    |
|            resnet18             |    4    |   fail_accuracy    |
|            resnet50             |    4    |   fail_accuracy    |
|           densenet121           |    4    |   fail_accuracy    |
+---------------------------------+---------+--------------------+

Compilation latency (sec)

+---------------------------------+---------+------------+
|              name               |   bs    |  inductor  |
+---------------------------------+---------+------------+
|           hf_BigBird            |    1    | 472.48019  |
|    detectron2_fcos_r_50_fpn     |    1    | 410.351042 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 232.588485 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 224.386749 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 217.794595 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 208.015045 |
|              maml               |    1    | 143.498239 |
|           hf_T5_large           |    1    | 107.707324 |
|       speech_transformer        |    1    | 90.740052  |
|          hf_Longformer          |    1    | 85.068717  |
|           hf_Reformer           |    1    | 80.392083  |
|          basic_gnn_gcn          |    1    | 61.242104  |
|          fastNLP_Bert           |    1    | 55.356868  |
|  timm_vision_transformer_large  |   32    | 54.646453  |
|          hf_GPT2_large          |    1    | 53.756965  |
|            resnet152            |   32    | 52.662217  |
|           densenet121           |   64    | 51.462768  |
|           hf_T5_base            |    1    |  50.82289  |
|            moondream            |    1    | 49.664194  |
|     pyhpc_isoneutral_mixing     | 1048576 | 48.094395  |
|       doctr_det_predictor       |    1    | 47.150119  |
|          hf_Bert_large          |    1    | 44.521787  |
|        hf_distil_whisper        |    1    |  43.82643  |
|      torch_multimodal_clip      |   32    | 43.443676  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 42.462319  |
|             demucs              |    1    | 34.648967  |
|           timm_regnet           |   32    |  33.82971  |
|           timm_nfnet            |   128   | 32.791382  |
|             hf_Bart             |    1    |  31.53508  |
|              hf_T5              |    1    | 31.099113  |
|             yolov3              |    8    | 30.943721  |
|          BERT_pytorch           |    2    | 30.212251  |
|        timm_efficientnet        |   64    | 29.106596  |
|             hf_Bert             |    1    | 27.189313  |
|       shufflenet_v2_x1_0        |   64    | 26.453065  |
|        phlippe_densenet         |   128   | 26.209177  |
|            hf_Albert            |    1    | 25.785651  |
|      doctr_reco_predictor       |    1    | 25.387847  |
|             hf_GPT2             |    1    |  24.80771  |
|       Background_Matting        |    1    |  24.4349   |
|       mobilenet_v3_large        |   32    | 23.827101  |
|     timm_vision_transformer     |   32    | 23.822875  |
|              llama              |   32    | 22.939267  |
|         opacus_cifar10          |   64    |  22.29475  |
|          timm_resnest           |   32    |  22.26817  |
|           timm_vovnet           |   32    | 21.994706  |
|         resnext50_32x4d         |    8    | 21.580932  |
|         pytorch_stargan         |   16    | 21.500719  |
|            resnet50             |   32    | 21.481485  |
|          hf_DistilBert          |    1    | 21.105347  |
|      functorch_dp_cifar10       |   64    | 20.972852  |
|          mobilenet_v2           |   16    | 20.796521  |
|           mnasnet1_0            |   32    | 20.485572  |
|          pytorch_unet           |    1    | 19.959838  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 19.342907  |
|          squeezenet1_1          |   16    | 18.259305  |
|     pyhpc_equation_of_state     | 1048576 | 17.421133  |
|            resnet18             |    8    |  17.24856  |
|         LearningToPaint         |   96    | 17.092958  |
|         phlippe_resnet          |   128   | 16.789853  |
|              vgg16              |    4    | 16.789108  |
|             alexnet             |   128   | 16.706134  |
|               drq               |    1    | 15.803515  |
|     functorch_maml_omniglot     |    1    | 15.665086  |
|              dlrm               |  2048   |  15.25363  |
|          maml_omniglot          |    5    | 15.246607  |
|     nvidia_deeprecommender      |   256   | 14.896109  |
|        basic_gnn_edgecnn        |    1    | 14.827115  |
|              dcgan              |   256   | 14.748871  |
|          basic_gnn_gin          |    1    | 14.728535  |
|         basic_gnn_sage          |    1    | 14.676311  |
|        soft_actor_critic        |   256   | 14.237525  |
|          lennard_jones          |  1000   | 14.226849  |
|           tts_angular           |   64    | 13.778611  |
|   mobilenet_v2_quantized_qat    |   96    |  0.106836  |
|     resnet50_quantized_qat      |   32    |  0.07994   |
|         DALLE2_pytorch          |    0    |    0.0     |
|              moco               |    0    |    0.0     |
|        timm_efficientdet        |    0    |    0.0     |
+---------------------------------+---------+------------+

Peak Memory Compression Ratio

+---------------------------------+---------+----------+
|              name               |   bs    | inductor |
+---------------------------------+---------+----------+
|           timm_nfnet            |   128   | 0.99314  |
|           hf_T5_base            |    1    | 0.987957 |
|              dlrm               |  2048   | 0.987865 |
|        timm_efficientnet        |   64    | 0.984892 |
|       Background_Matting        |    1    | 0.983333 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 0.981263 |
|           densenet121           |   64    | 0.978886 |
|          pytorch_unet           |    1    | 0.978352 |
|             demucs              |    1    | 0.977616 |
|             yolov3              |    8    | 0.975483 |
|  timm_vision_transformer_large  |   32    | 0.974104 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 0.973411 |
|          hf_GPT2_large          |    1    | 0.972882 |
|            resnet50             |   32    | 0.970998 |
|        basic_gnn_edgecnn        |    1    | 0.969706 |
|         LearningToPaint         |   96    | 0.969662 |
|          timm_resnest           |   32    | 0.969419 |
|           timm_vovnet           |   32    | 0.969225 |
|       doctr_det_predictor       |    1    | 0.967431 |
|      torch_multimodal_clip      |   32    | 0.963874 |
|   mobilenet_v2_quantized_qat    |   96    | 0.963567 |
|    detectron2_fcos_r_50_fpn     |    1    | 0.963338 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 0.963264 |
|     resnet50_quantized_qat      |   32    | 0.962423 |
|           timm_regnet           |   32    | 0.961682 |
|           mnasnet1_0            |   32    | 0.959988 |
|     timm_vision_transformer     |   32    | 0.959693 |
|            resnet152            |   32    | 0.958113 |
|          mobilenet_v2           |   16    | 0.955254 |
|       mobilenet_v3_large        |   32    | 0.954804 |
|       shufflenet_v2_x1_0        |   64    | 0.954681 |
|           hf_BigBird            |    1    | 0.951057 |
|         pytorch_stargan         |   16    | 0.947659 |
|        phlippe_densenet         |   128   | 0.944866 |
|         resnext50_32x4d         |    8    | 0.942526 |
|          basic_gnn_gcn          |    1    | 0.939137 |
|      doctr_reco_predictor       |    1    | 0.937291 |
|           tts_angular           |   64    | 0.929651 |
|              llama              |   32    | 0.918798 |
|          squeezenet1_1          |   16    | 0.917325 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 0.916703 |
|              dcgan              |   256   | 0.916315 |
|     pyhpc_equation_of_state     | 1048576 | 0.906121 |
|            resnet18             |    8    | 0.89781  |
|         phlippe_resnet          |   128   | 0.895349 |
|             alexnet             |   128   | 0.895253 |
|        hf_distil_whisper        |    1    | 0.893582 |
|         opacus_cifar10          |   64    | 0.890144 |
|        soft_actor_critic        |   256   | 0.883871 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 0.881613 |
|          lennard_jones          |  1000   | 0.861556 |
|          maml_omniglot          |    5    | 0.856835 |
|         basic_gnn_sage          |    1    | 0.855816 |
|     functorch_maml_omniglot     |    1    | 0.855759 |
|          basic_gnn_gin          |    1    | 0.85293  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.847509 |
|          fastNLP_Bert           |    1    | 0.838382 |
|       speech_transformer        |    1    | 0.815625 |
|          hf_Bert_large          |    1    | 0.802819 |
|      functorch_dp_cifar10       |   64    | 0.801917 |
|          BERT_pytorch           |    2    | 0.800744 |
|          hf_Longformer          |    1    | 0.798782 |
|            moondream            |    1    | 0.797785 |
|             hf_Bert             |    1    | 0.792493 |
|              maml               |    1    | 0.792304 |
|            hf_Albert            |    1    | 0.791617 |
|           hf_T5_large           |    1    | 0.78529  |
|     nvidia_deeprecommender      |   256   | 0.774425 |
|              vgg16              |    4    | 0.771964 |
|               drq               |    1    | 0.766031 |
|          hf_DistilBert          |    1    | 0.762249 |
|             hf_GPT2             |    1    | 0.760809 |
|              hf_T5              |    1    | 0.755819 |
|           hf_Reformer           |    1    | 0.736307 |
|             hf_Bart             |    1    | 0.736212 |
|     pyhpc_isoneutral_mixing     | 1048576 | 0.687301 |
|        timm_efficientdet        |    0    |   0.0    |
|         DALLE2_pytorch          |    0    |   0.0    |
|              moco               |    0    |   0.0    |
+---------------------------------+---------+----------+

Absolute latency (ms)

+---------------------------------+---------+-------------+
|              name               |   bs    |  inductor   |
+---------------------------------+---------+-------------+
|  timm_vision_transformer_large  |   32    | 4235.018249 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1425.136677 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1290.353157 |
|           hf_T5_base            |    1    | 1240.897849 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1160.82743  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1073.796169 |
|          hf_GPT2_large          |    1    | 540.637571  |
|           timm_nfnet            |   128   | 538.641427  |
|           hf_T5_large           |    1    | 389.111515  |
|            moondream            |    1    | 379.439522  |
|        hf_distil_whisper        |    1    | 338.073443  |
|       Background_Matting        |    1    | 336.676863  |
|          pytorch_unet           |    1    |  224.48555  |
|           timm_regnet           |   32    | 210.969779  |
|            resnet152            |   32    | 184.115435  |
|           densenet121           |   64    | 182.497053  |
|    detectron2_fcos_r_50_fpn     |    1    | 175.518222  |
|             yolov3              |    8    | 151.578277  |
|             demucs              |    1    | 144.923083  |
|      torch_multimodal_clip      |   32    | 140.993274  |
|           hf_BigBird            |    1    | 117.161396  |
|           timm_vovnet           |   32    | 112.734881  |
|          hf_Bert_large          |    1    | 101.159606  |
|     timm_vision_transformer     |   32    |  95.874686  |
|         pytorch_stargan         |   16    |  92.136146  |
|       doctr_det_predictor       |    1    |  84.21419   |
|            resnet50             |   32    |  73.656133  |
|          hf_Longformer          |    1    |  69.184786  |
|             hf_Bart             |    1    |  51.648747  |
|          timm_resnest           |   32    |  51.411293  |
|       speech_transformer        |    1    |  51.319936  |
|              maml               |    1    |  48.691242  |
|        timm_efficientnet        |   64    |  46.871584  |
|              hf_T5              |    1    |  42.405816  |
|             alexnet             |   128   |  41.253067  |
|             hf_Bert             |    1    |  40.016073  |
|   mobilenet_v2_quantized_qat    |   96    |  38.890426  |
|         LearningToPaint         |   96    |  36.208257  |
|           hf_Reformer           |    1    |  35.820096  |
|            hf_Albert            |    1    |  34.313847  |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  32.391651  |
|          fastNLP_Bert           |    1    |  32.149943  |
|              vgg16              |    4    |  31.621244  |
|     nvidia_deeprecommender      |   256   |  29.162295  |
|     pyhpc_isoneutral_mixing     | 1048576 |  28.213296  |
|          hf_DistilBert          |    1    |  26.513798  |
|     resnet50_quantized_qat      |   32    |  25.502403  |
| pyhpc_turbulent_kinetic_energy  | 1048576 |  24.689685  |
|             hf_GPT2             |    1    |  22.359449  |
|         resnext50_32x4d         |    8    |  21.385491  |
|              llama              |   32    |  20.420539  |
|           tts_angular           |   64    |  19.494096  |
|        basic_gnn_edgecnn        |    1    |  19.398851  |
|          BERT_pytorch           |    2    |  19.264242  |
|        phlippe_densenet         |   128   |  18.44131   |
|              dcgan              |   256   |  18.360744  |
|       shufflenet_v2_x1_0        |   64    |  14.49216   |
|           mnasnet1_0            |   32    |  13.861747  |
|       mobilenet_v3_large        |   32    |  11.927325  |
|          basic_gnn_gcn          |    1    |  9.826749   |
|      functorch_dp_cifar10       |   64    |  9.135671   |
|         opacus_cifar10          |   64    |  9.028022   |
|          mobilenet_v2           |   16    |  8.617868   |
|            resnet18             |    8    |  8.146327   |
|              dlrm               |  2048   |  6.353521   |
|          squeezenet1_1          |   16    |  5.008153   |
|         basic_gnn_sage          |    1    |  4.875677   |
|          basic_gnn_gin          |    1    |  4.680269   |
|         phlippe_resnet          |   128   |  3.790303   |
|      doctr_reco_predictor       |    1    |  3.296093   |
|     pyhpc_equation_of_state     | 1048576 |  1.114497   |
|               drq               |    1    |  0.888625   |
|     functorch_maml_omniglot     |    1    |  0.424753   |
|          maml_omniglot          |    5    |  0.311047   |
|        soft_actor_critic        |   256   |  0.282867   |
|          lennard_jones          |  1000   |  0.168788   |
|        timm_efficientdet        |    0    |     0.0     |
|              moco               |    0    |     0.0     |
|         DALLE2_pytorch          |    0    |     0.0     |
+---------------------------------+---------+-------------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            XLNetLMHeadModel             |  8  | 5.863381 |
|     MobileBertForQuestionAnswering      | 128 | 2.095946 |
|      GPT2ForSequenceClassification      |  4  | 2.032629 |
|           ElectraForCausalLM            | 32  | 1.945722 |
|       ElectraForQuestionAnswering       | 64  | 1.870288 |
|          MobileBertForMaskedLM          | 128 | 1.763132 |
|               DistillGPT2               | 16  | 1.569375 |
|            YituTechConvBert             | 16  | 1.538031 |
|       RobertaForQuestionAnswering       | 16  | 1.491354 |
|               GoogleFnet                | 16  | 1.467295 |
|    LayoutLMForSequenceClassification    | 16  | 1.466817 |
|        BertForQuestionAnswering         | 16  | 1.455877 |
|    MegatronBertForQuestionAnswering     |  8  | 1.442703 |
|      DebertaV2ForQuestionAnswering      |  1  | 1.439407 |
|           RobertaForCausalLM            | 16  | 1.425502 |
|           LayoutLMForMaskedLM           | 16  | 1.40912  |
|         MegatronBertForCausalLM         |  4  | 1.397137 |
|             BertForMaskedLM             | 16  | 1.394773 |
|                CamemBert                | 16  | 1.391988 |
|          AllenaiLongformerBase          |  4  | 1.385777 |
|           DebertaForMaskedLM            |  8  | 1.322685 |
|       DebertaForQuestionAnswering       | 16  | 1.31463  |
|     PLBartForConditionalGeneration      |  4  | 1.314176 |
|             XGLMForCausalLM             |  8  | 1.301074 |
| BlenderbotSmallForConditionalGeneration | 64  | 1.25655  |
|            AlbertForMaskedLM            |  4  | 1.253194 |
|       AlbertForQuestionAnswering        |  4  | 1.252651 |
|      MBartForConditionalGeneration      |  2  | 1.246072 |
|             OPTForCausalLM              |  2  | 1.239344 |
|          BlenderbotForCausalLM          |  4  | 1.234482 |
|          DebertaV2ForMaskedLM           |  2  | 1.227718 |
|         Speech2Text2ForCausalLM         | 256 | 1.220341 |
|       MT5ForConditionalGeneration       | 16  | 1.206908 |
|          DistilBertForMaskedLM          | 128 | 1.204139 |
|     DistilBertForQuestionAnswering      | 256 | 1.187174 |
|     M2M100ForConditionalGeneration      | 16  | 1.171903 |
|             BartForCausalLM             |  4  | 1.157708 |
|     PegasusForConditionalGeneration     | 32  | 1.157274 |
|       BlenderbotSmallForCausalLM        | 64  | 1.155369 |
|      BartForConditionalGeneration       |  2  | 1.153323 |
|           PegasusForCausalLM            | 32  | 1.136978 |
|            MBartForCausalLM             |  4  | 1.133372 |
|            TrOCRForCausalLM             | 32  | 1.085169 |
|            PLBartForCausalLM            |  8  | 1.078184 |
|       T5ForConditionalGeneration        |  4  | 1.014935 |
|                 T5Small                 |  4  | 1.013725 |
+-----------------------------------------+-----+----------+
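
To summarize a speedup table like the one above as a single number, one option is a geometric mean over the `inductor` column. The sketch below is not part of the dashboard tooling: `parse_table` and `geomean` are hypothetical helper names, the sample contains three rows copied from the table, and skipping non-positive values is an assumption made so that `0.0` placeholder rows (as seen in some tables on this dashboard) do not break the logarithm.

```python
import math


def parse_table(text):
    """Parse '| name | bs | value |' rows into (name, bs, value) tuples.

    Hypothetical helper; assumes the three-column ASCII layout used above.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip '+---+' borders and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 3 or cells[0] == "name":
            continue  # skip the header row and anything malformed
        name, bs, value = cells
        rows.append((name, int(bs), float(value)))
    return rows


def geomean(values):
    """Geometric mean of the positive entries (assumption: <= 0 means no data)."""
    positive = [v for v in values if v > 0]
    if not positive:
        raise ValueError("no positive values to average")
    return math.exp(sum(math.log(v) for v in positive) / len(positive))


SAMPLE = """
+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            XLNetLMHeadModel             |  8  | 5.863381 |
|     MobileBertForQuestionAnswering      | 128 | 2.095946 |
|                 T5Small                 |  4  | 1.013725 |
+-----------------------------------------+-----+----------+
"""

rows = parse_table(SAMPLE)
print(f"parsed {len(rows)} rows, "
      f"geomean speedup = {geomean(v for _, _, v in rows):.3f}")
```

A geometric mean is used here because speedups are ratios; an arithmetic mean would let a single outlier such as `XLNetLMHeadModel` dominate the summary.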

Accuracy

+-----------------------------------------+----+------------------+
|                  name                   | bs |     inductor     |
+-----------------------------------------+----+------------------+
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip |
|                CamemBert                | 1  |       pass       |
|       AlbertForQuestionAnswering        | 1  |       pass       |
|           DebertaForMaskedLM            | 1  |       pass       |
|          AllenaiLongformerBase          | 1  |       pass       |
|             BartForCausalLM             | 1  |       pass       |
|      BartForConditionalGeneration       | 1  |       pass       |
|             BertForMaskedLM             | 1  |       pass       |
|        BertForQuestionAnswering         | 1  |       pass       |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |
|       DebertaForQuestionAnswering       | 1  |       pass       |
|           LayoutLMForMaskedLM           | 1  |       pass       |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |
|          DistilBertForMaskedLM          | 1  |       pass       |
|     DistilBertForQuestionAnswering      | 1  |       pass       |
|               DistillGPT2               | 1  |       pass       |
|           ElectraForCausalLM            | 1  |       pass       |
|       ElectraForQuestionAnswering       | 1  |       pass       |
|      GPT2ForSequenceClassification      | 1  |       pass       |
|               GoogleFnet                | 1  |       pass       |
|    LayoutLMForSequenceClassification    | 1  |       pass       |
|            MBartForCausalLM             | 1  |       pass       |
|            XLNetLMHeadModel             | 1  |       pass       |
|             XGLMForCausalLM             | 1  |       pass       |
|            AlbertForMaskedLM            | 1  |       pass       |
|      MBartForConditionalGeneration      | 1  |       pass       |
|       MT5ForConditionalGeneration       | 1  |       pass       |
|         MegatronBertForCausalLM         | 1  |       pass       |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |
|          MobileBertForMaskedLM          | 1  |       pass       |
|     MobileBertForQuestionAnswering      | 1  |       pass       |
|             OPTForCausalLM              | 1  |       pass       |
|            PLBartForCausalLM            | 1  |       pass       |
|     PLBartForConditionalGeneration      | 1  |       pass       |
|           PegasusForCausalLM            | 1  |       pass       |
|     M2M100ForConditionalGeneration      | 1  |       pass       |
|     PegasusForConditionalGeneration     | 1  |       pass       |
|           RobertaForCausalLM            | 1  |       pass       |
|       RobertaForQuestionAnswering       | 1  |       pass       |
|         Speech2Text2ForCausalLM         | 1  |       pass       |
|       T5ForConditionalGeneration        | 1  |       pass       |
|                 T5Small                 | 1  |       pass       |
|            TrOCRForCausalLM             | 1  |       pass       |
|            YituTechConvBert             | 1  |       pass       |
+-----------------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------------+-----+------------+
|                  name                   | bs  |  inductor  |
+-----------------------------------------+-----+------------+
|          AllenaiLongformerBase          |  4  | 125.986037 |
|          MobileBertForMaskedLM          | 128 | 123.509436 |
|     MobileBertForQuestionAnswering      | 128 | 122.322297 |
|      BartForConditionalGeneration       |  2  | 58.922878  |
|      MBartForConditionalGeneration      |  2  | 55.150973  |
|     PegasusForConditionalGeneration     | 32  |  54.61794  |
|     M2M100ForConditionalGeneration      | 16  | 54.518144  |
|          BlenderbotForCausalLM          |  4  | 50.693414  |
|             XGLMForCausalLM             |  8  | 48.780525  |
|          DebertaV2ForMaskedLM           |  2  | 48.522695  |
|      DebertaV2ForQuestionAnswering      |  1  | 48.253694  |
|         MegatronBertForCausalLM         |  4  | 44.965411  |
|    MegatronBertForQuestionAnswering     |  8  | 44.324463  |
|       MT5ForConditionalGeneration       | 16  | 43.941803  |
| BlenderbotSmallForConditionalGeneration | 64  | 42.163603  |
|            YituTechConvBert             | 16  | 37.244332  |
|     PLBartForConditionalGeneration      |  4  | 34.696932  |
|                 T5Small                 |  4  | 34.127789  |
|       T5ForConditionalGeneration        |  4  | 34.088865  |
|            TrOCRForCausalLM             | 32  |  30.77431  |
|             OPTForCausalLM              |  2  | 30.767033  |
|            MBartForCausalLM             |  4  | 30.417539  |
|           PegasusForCausalLM            | 32  | 30.214386  |
|           DebertaForMaskedLM            |  8  | 29.788582  |
|       DebertaForQuestionAnswering       | 16  | 29.239826  |
|            XLNetLMHeadModel             |  8  |  28.41223  |
|           ElectraForCausalLM            | 32  | 28.047381  |
|           RobertaForCausalLM            | 16  | 27.824063  |
|        BertForQuestionAnswering         | 16  | 27.499328  |
|             BartForCausalLM             |  4  | 27.457565  |
|       ElectraForQuestionAnswering       | 64  |  27.33129  |
|           LayoutLMForMaskedLM           | 16  | 27.304159  |
|                CamemBert                | 16  | 27.264399  |
|             BertForMaskedLM             | 16  | 27.236069  |
|       RobertaForQuestionAnswering       | 16  | 27.172955  |
|    LayoutLMForSequenceClassification    | 16  | 27.033185  |
|      GPT2ForSequenceClassification      |  4  | 25.377119  |
|       AlbertForQuestionAnswering        |  4  | 24.993993  |
|            AlbertForMaskedLM            |  4  | 24.986741  |
|       BlenderbotSmallForCausalLM        | 64  | 24.513198  |
|            PLBartForCausalLM            |  8  | 22.585755  |
|          DistilBertForMaskedLM          | 128 | 22.577282  |
|     DistilBertForQuestionAnswering      | 256 | 22.511886  |
|               GoogleFnet                | 16  | 22.409478  |
|         Speech2Text2ForCausalLM         | 256 | 21.877503  |
|               DistillGPT2               | 16  | 20.364486  |
+-----------------------------------------+-----+------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            AlbertForMaskedLM            |  4  | 0.994454 |
|       AlbertForQuestionAnswering        |  4  | 0.994371 |
|     DistilBertForQuestionAnswering      | 256 | 0.993326 |
|             OPTForCausalLM              |  2  | 0.993223 |
|           RobertaForCausalLM            | 16  | 0.992436 |
|            TrOCRForCausalLM             | 32  | 0.99216  |
|               DistillGPT2               | 16  | 0.992123 |
|          DistilBertForMaskedLM          | 128 | 0.991621 |
|               GoogleFnet                | 16  | 0.991295 |
|           ElectraForCausalLM            | 32  | 0.991187 |
|            PLBartForCausalLM            |  8  | 0.990829 |
|                CamemBert                | 16  | 0.99075  |
|             BertForMaskedLM             | 16  | 0.990316 |
|       ElectraForQuestionAnswering       | 64  | 0.990295 |
|           LayoutLMForMaskedLM           | 16  | 0.990284 |
|            MBartForCausalLM             |  4  | 0.990038 |
| BlenderbotSmallForConditionalGeneration | 64  | 0.98895  |
|       RobertaForQuestionAnswering       | 16  | 0.988729 |
|       DebertaForQuestionAnswering       | 16  | 0.988547 |
|        BertForQuestionAnswering         | 16  | 0.988503 |
|            YituTechConvBert             | 16  | 0.988467 |
|         Speech2Text2ForCausalLM         | 256 | 0.988414 |
|     PLBartForConditionalGeneration      |  4  | 0.988197 |
|    LayoutLMForSequenceClassification    | 16  | 0.987321 |
|      GPT2ForSequenceClassification      |  4  | 0.987308 |
|           PegasusForCausalLM            | 32  | 0.987178 |
|       BlenderbotSmallForCausalLM        | 64  | 0.986917 |
|             BartForCausalLM             |  4  | 0.985745 |
|           DebertaForMaskedLM            |  8  | 0.985671 |
|            XLNetLMHeadModel             |  8  | 0.985581 |
|          MobileBertForMaskedLM          | 128 | 0.984067 |
|       T5ForConditionalGeneration        |  4  | 0.983594 |
|     MobileBertForQuestionAnswering      | 128 | 0.983222 |
|                 T5Small                 |  4  | 0.983028 |
|          AllenaiLongformerBase          |  4  | 0.979129 |
|         MegatronBertForCausalLM         |  4  | 0.976501 |
|     PegasusForConditionalGeneration     | 32  | 0.97535  |
|      BartForConditionalGeneration       |  2  | 0.968684 |
|    MegatronBertForQuestionAnswering     |  8  | 0.968042 |
|       MT5ForConditionalGeneration       | 16  | 0.963198 |
|             XGLMForCausalLM             |  8  | 0.931638 |
|          DebertaV2ForMaskedLM           |  2  | 0.915336 |
|      MBartForConditionalGeneration      |  2  | 0.901472 |
|      DebertaV2ForQuestionAnswering      |  1  | 0.869447 |
|          BlenderbotForCausalLM          |  4  | 0.843911 |
|     M2M100ForConditionalGeneration      | 16  | 0.772331 |
+-----------------------------------------+-----+----------+

Absolute latency (ms)

+-----------------------------------------+-----+-------------+
|                  name                   | bs  |  inductor   |
+-----------------------------------------+-----+-------------+
|            AlbertForMaskedLM            |  4  | 2398.270867 |
|       AlbertForQuestionAnswering        |  4  |  2393.8376  |
|            XLNetLMHeadModel             |  8  | 1227.065145 |
|            TrOCRForCausalLM             | 32  | 929.925647  |
|     PegasusForConditionalGeneration     | 32  | 888.496668  |
|     DistilBertForQuestionAnswering      | 256 | 803.816523  |
|    MegatronBertForQuestionAnswering     |  8  | 696.913096  |
|      MBartForConditionalGeneration      |  2  |  638.08269  |
|            MBartForCausalLM             |  4  | 619.588491  |
|          DistilBertForMaskedLM          | 128 | 607.478436  |
|           RobertaForCausalLM            | 16  | 576.847495  |
|             OPTForCausalLM              |  2  | 566.935377  |
|          BlenderbotForCausalLM          |  4  | 553.155271  |
|      BartForConditionalGeneration       |  2  | 549.193732  |
|     M2M100ForConditionalGeneration      | 16  | 549.190309  |
|          DebertaV2ForMaskedLM           |  2  | 543.010637  |
|            YituTechConvBert             | 16  | 534.650992  |
|                CamemBert                | 16  | 527.859225  |
|             BertForMaskedLM             | 16  | 522.354484  |
|           LayoutLMForMaskedLM           | 16  | 520.000742  |
|          AllenaiLongformerBase          |  4  |  508.92721  |
|       DebertaForQuestionAnswering       | 16  | 497.834228  |
|            PLBartForCausalLM            |  8  |  494.95144  |
|             BartForCausalLM             |  4  | 469.426154  |
| BlenderbotSmallForConditionalGeneration | 64  | 442.305995  |
|     PLBartForConditionalGeneration      |  4  | 441.341708  |
|           PegasusForCausalLM            | 32  | 438.751206  |
|        BertForQuestionAnswering         | 16  | 418.068617  |
|    LayoutLMForSequenceClassification    | 16  | 416.322925  |
|         MegatronBertForCausalLM         |  4  | 409.194822  |
|       RobertaForQuestionAnswering       | 16  | 401.114449  |
|       T5ForConditionalGeneration        |  4  | 384.994116  |
|                 T5Small                 |  4  | 384.420567  |
|               GoogleFnet                | 16  | 376.019869  |
|               DistillGPT2               | 16  |  365.6636   |
|          MobileBertForMaskedLM          | 128 | 357.574228  |
|           DebertaForMaskedLM            |  8  | 331.639765  |
|             XGLMForCausalLM             |  8  | 312.218614  |
|       ElectraForQuestionAnswering       | 64  | 294.935661  |
|       BlenderbotSmallForCausalLM        | 64  |  258.22686  |
|         Speech2Text2ForCausalLM         | 256 | 247.704867  |
|      GPT2ForSequenceClassification      |  4  | 240.131633  |
|      DebertaV2ForQuestionAnswering      |  1  | 221.001213  |
|           ElectraForCausalLM            | 32  | 220.880167  |
|       MT5ForConditionalGeneration       | 16  | 220.429755  |
|     MobileBertForQuestionAnswering      | 128 | 207.896521  |
+-----------------------------------------+-----+-------------+

timm_models suite with float32 precision

Performance speedup

+---------------------------------+------+----------+
|              name               |  bs  | inductor |
+---------------------------------+------+----------+
|            lcnet_050            | 256  | 3.975254 |
|           fbnetc_100            | 512  | 3.966376 |
|           mnasnet_100           | 512  | 3.893928 |
|         mobilenetv2_100         | 128  | 3.811795 |
|          spnasnet_100           | 128  | 3.644112 |
|      mobilenetv3_large_100      | 512  | 3.616141 |
|            fbnetv3_b            | 256  | 3.462143 |
|           regnety_002           | 1024 | 3.334628 |
|           rexnet_100            | 256  | 3.133714 |
|       tf_efficientnet_b0        | 128  | 3.029948 |
|            tinynet_a            | 128  | 2.906429 |
|          pnasnet5large          |  16  | 2.696024 |
|        ese_vovnet19b_dw         | 256  | 2.62723  |
|            hrnet_w18            | 128  | 2.595181 |
|          botnet26t_256          | 128  | 2.582153 |
|           res2next50            | 128  | 2.449116 |
|       eca_botnext26ts_256       | 128  | 2.364691 |
|          ghostnet_100           | 512  | 2.34509  |
|       gluon_inception_v3        | 256  | 2.315822 |
|           resnest101e           |  64  | 2.290149 |
|        eca_halonext26ts         | 128  | 2.262768 |
|          inception_v3           | 128  | 2.262715 |
|        adv_inception_v3         | 128  | 2.240364 |
|             dla102              | 128  | 2.237351 |
|        res2net50_14w_8s         | 128  | 2.165653 |
|        res2net101_26w_4s        | 128  | 2.15408  |
|            repvgg_a2            | 128  | 2.098084 |
|          cspdarknet53           |  64  | 2.077809 |
|            nfnet_l0             | 128  | 1.986808 |
|        convmixer_768_32         |  32  | 1.985628 |
|           tf_mixnet_l           | 128  | 1.921982 |
|            gernet_l             | 128  | 1.919749 |
|           dm_nfnet_f0           | 128  | 1.836539 |
|           selecsls42b           | 128  | 1.802521 |
|        sebotnet33ts_256         |  64  | 1.792328 |
|            mixnet_l             | 128  | 1.749943 |
|           volo_d1_224           |  64  | 1.739573 |
|           mobilevit_s           |  64  | 1.690802 |
|         poolformer_m36          |  64  | 1.685267 |
|         visformer_small         | 128  | 1.68131  |
|     swsl_resnext101_32x16d      |  32  | 1.622629 |
|           convit_base           |  64  | 1.611847 |
|             dpn107              |  64  | 1.50153  |
|            levit_128            | 1024 | 1.481808 |
|          gmlp_s16_224           | 128  | 1.403319 |
|      xcit_large_24_p8_224       |  16  |  1.3468  |
|          gmixer_24_224          | 128  | 1.346364 |
|  swin_base_patch4_window7_224   |  64  | 1.330245 |
|        twins_pcpvt_base         | 128  | 1.259913 |
|          mixer_b16_224          | 128  | 1.225026 |
|        tnt_s_patch16_224        | 128  | 1.215942 |
|          convnext_base          |  64  | 1.209084 |
|      beit_base_patch16_224      |  64  | 1.190172 |
|      vit_base_patch16_224       |  64  | 1.174328 |
| deit_base_distilled_patch16_224 |  64  | 1.17231  |
|          cait_m36_384           |  4   | 1.162248 |
|          jx_nest_base           |  32  | 1.130284 |
|            pit_b_224            |  64  | 1.128853 |
|         crossvit_9_240          | 256  | 1.082963 |
|          resmlp_12_224          | 128  | 0.76722  |
+---------------------------------+------+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 8  |     pass      |
|           dm_nfnet_f0           | 8  |     pass      |
|            mixnet_l             | 8  |     pass      |
|       eca_botnext26ts_256       | 8  |     pass      |
|         crossvit_9_240          | 8  |     pass      |
| deit_base_distilled_patch16_224 | 8  |     pass      |
|          convnext_base          | 8  |     pass      |
|           regnety_002           | 8  |     pass      |
|           convit_base           | 8  |     pass      |
|          gmlp_s16_224           | 8  |     pass      |
|          cait_m36_384           | 8  |     pass      |
|             dpn107              | 8  |     pass      |
|      vit_base_patch16_224       | 8  |     pass      |
|          cspdarknet53           | 8  |     pass      |
|        eca_halonext26ts         | 8  |     pass      |
|        ese_vovnet19b_dw         | 8  |     pass      |
|           fbnetc_100            | 8  |     pass      |
|            fbnetv3_b            | 8  |     pass      |
|            gernet_l             | 8  |     pass      |
|          botnet26t_256          | 8  |     pass      |
|       gluon_inception_v3        | 8  |     pass      |
|          gmixer_24_224          | 8  |     pass      |
|          jx_nest_base           | 8  |     pass      |
|        convmixer_768_32         | 8  |     pass      |
|        twins_pcpvt_base         | 8  |     pass      |
|         poolformer_m36          | 8  |     pass      |
|            lcnet_050            | 8  |     pass      |
|          mixer_b16_224          | 8  |     pass      |
|        tnt_s_patch16_224        | 8  |     pass      |
|           mnasnet_100           | 8  |     pass      |
|         mobilenetv2_100         | 8  |     pass      |
|      mobilenetv3_large_100      | 8  |     pass      |
|           mobilevit_s           | 8  |     pass      |
|            nfnet_l0             | 8  |     pass      |
|            pit_b_224            | 8  |     pass      |
|      beit_base_patch16_224      | 8  |     pass      |
|            repvgg_a2            | 8  |     pass      |
|  swin_base_patch4_window7_224   | 8  |     pass      |
|          inception_v3           | 8  |     pass      |
|           tf_mixnet_l           | 8  |     pass      |
|       tf_efficientnet_b0        | 8  |     pass      |
|            tinynet_a            | 8  |     pass      |
|          spnasnet_100           | 8  |     pass      |
|        sebotnet33ts_256         | 8  |     pass      |
|          resmlp_12_224          | 8  |     pass      |
|         coat_lite_mini          | 8  |  fail_to_run  |
|           resnest101e           | 8  | fail_accuracy |
|            levit_128            | 8  | fail_accuracy |
|          ghostnet_100           | 8  | fail_accuracy |
|          pnasnet5large          | 8  | fail_accuracy |
|        res2net101_26w_4s        | 8  | fail_accuracy |
|        res2net50_14w_8s         | 8  | fail_accuracy |
|             dla102              | 8  | fail_accuracy |
|     swsl_resnext101_32x16d      | 8  | fail_accuracy |
|           rexnet_100            | 8  | fail_accuracy |
|           selecsls42b           | 8  | fail_accuracy |
|           res2next50            | 8  | fail_accuracy |
|            hrnet_w18            | 8  | fail_accuracy |
|           volo_d1_224           | 8  | fail_accuracy |
|         visformer_small         | 8  | fail_accuracy |
|      xcit_large_24_p8_224       | 8  | fail_accuracy |
+---------------------------------+----+---------------+
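
To get a quick count of each status in an accuracy table like the one above, and a list of the models that did not pass, a small tally can help. This is only a sketch: `tally_status` is a hypothetical name, the sample holds four rows copied from the table, and treating any status that starts with `pass` (including `pass_due_to_skip`) as passing is an assumption, not a rule stated by the dashboard.

```python
import re
from collections import Counter

# Matches data rows such as '| levit_128 | 8 | fail_accuracy |'.
# The header row is skipped because its 'bs' cell is not numeric.
ROW = re.compile(r"^\|\s*(\S+)\s*\|\s*(\d+)\s*\|\s*(\S+)\s*\|$")


def tally_status(text):
    """Return (Counter of statuses, names of models whose status is not a pass)."""
    counts = Counter()
    failing = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # borders, header, blank lines
        name, _bs, status = match.groups()
        counts[status] += 1
        if not status.startswith("pass"):  # assumption: 'pass*' counts as passing
            failing.append(name)
    return counts, failing


SAMPLE = """
+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 8  |     pass      |
|         coat_lite_mini          | 8  |  fail_to_run  |
|           resnest101e           | 8  | fail_accuracy |
|            levit_128            | 8  | fail_accuracy |
+---------------------------------+----+---------------+
"""

counts, failing = tally_status(SAMPLE)
print(dict(counts))
print("not passing:", failing)
```

The `failing` list can then be used to cross-check which entries in the speedup, compilation latency, and memory tables belong to models that did not pass the accuracy check.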

Compilation latency (sec)

+---------------------------------+------+------------+
|              name               |  bs  |  inductor  |
+---------------------------------+------+------------+
|          pnasnet5large          |  16  | 477.537024 |
|            hrnet_w18            | 128  | 263.84734  |
|        res2net101_26w_4s        | 128  | 82.958339  |
|          cait_m36_384           |  4   | 75.756253  |
|           tf_mixnet_l           | 128  | 74.770335  |
|            mixnet_l             | 128  |  70.82672  |
|      xcit_large_24_p8_224       |  16  | 70.747453  |
|        res2net50_14w_8s         | 128  | 66.445893  |
|           mobilevit_s           |  64  | 65.759967  |
|           resnest101e           |  64  |  65.31671  |
|        twins_pcpvt_base         | 128  | 63.990577  |
|  swin_base_patch4_window7_224   |  64  | 61.535071  |
|         poolformer_m36          |  64  | 59.939663  |
|             dpn107              |  64  |  56.81883  |
|        tnt_s_patch16_224        | 128  | 54.133228  |
|          jx_nest_base           |  32  | 49.803228  |
|        eca_halonext26ts         | 128  | 45.918425  |
|            fbnetv3_b            | 256  | 43.172362  |
|          convnext_base          |  64  | 40.989993  |
|           volo_d1_224           |  64  |  40.8486   |
|          gmixer_24_224          | 128  | 38.401722  |
|         crossvit_9_240          | 256  | 37.836861  |
|            levit_128            | 1024 | 37.827168  |
|        adv_inception_v3         | 128  | 37.822805  |
|          inception_v3           | 128  | 37.721772  |
|          gmlp_s16_224           | 128  | 37.634114  |
|        sebotnet33ts_256         |  64  | 37.556591  |
|       gluon_inception_v3        | 256  | 36.276331  |
|             dla102              | 128  | 35.830954  |
|          ghostnet_100           | 512  | 35.207499  |
|           res2next50            | 128  | 34.621356  |
|            tinynet_a            | 128  | 33.349816  |
|           rexnet_100            | 256  | 32.757939  |
|     swsl_resnext101_32x16d      |  32  | 32.461196  |
|           dm_nfnet_f0           | 128  | 31.952122  |
|       eca_botnext26ts_256       | 128  | 31.122456  |
|           convit_base           |  64  | 30.657753  |
|       tf_efficientnet_b0        | 128  |  30.55217  |
|        convmixer_768_32         |  32  | 30.108611  |
|            nfnet_l0             | 128  | 29.982481  |
|         visformer_small         | 128  | 29.782307  |
|          botnet26t_256          | 128  | 29.271205  |
|            pit_b_224            |  64  | 26.393175  |
|          mixer_b16_224          | 128  | 25.564914  |
|      beit_base_patch16_224      |  64  |  25.51725  |
|          cspdarknet53           |  64  | 25.395303  |
|           regnety_002           | 1024 | 25.034142  |
| deit_base_distilled_patch16_224 |  64  | 24.342667  |
|      vit_base_patch16_224       |  64  | 24.257795  |
|      mobilenetv3_large_100      | 512  | 23.634784  |
|          spnasnet_100           | 128  | 22.833271  |
|          resmlp_12_224          | 128  | 22.451516  |
|           fbnetc_100            | 512  | 21.977728  |
|            gernet_l             | 128  | 21.737788  |
|        ese_vovnet19b_dw         | 256  | 21.422196  |
|            repvgg_a2            | 128  | 21.418055  |
|         mobilenetv2_100         | 128  | 21.345578  |
|           mnasnet_100           | 512  | 19.965497  |
|           selecsls42b           | 128  | 19.351013  |
|            lcnet_050            | 256  | 18.605893  |
+---------------------------------+------+------------+

Peak Memory Compression Ratio

+---------------------------------+------+----------+
|              name               |  bs  | inductor |
+---------------------------------+------+----------+
|        ese_vovnet19b_dw         | 256  | 0.997541 |
|           fbnetc_100            | 512  | 0.996695 |
|      mobilenetv3_large_100      | 512  | 0.996386 |
|            fbnetv3_b            | 256  | 0.996317 |
|           mnasnet_100           | 512  | 0.995609 |
|          ghostnet_100           | 512  | 0.995559 |
|           regnety_002           | 1024 | 0.995467 |
|           dm_nfnet_f0           | 128  | 0.995134 |
|       eca_botnext26ts_256       | 128  | 0.994373 |
|          convnext_base          |  64  | 0.993995 |
|            levit_128            | 1024 | 0.993946 |
|           rexnet_100            | 256  | 0.993379 |
|        eca_halonext26ts         | 128  | 0.993264 |
|        res2net101_26w_4s        | 128  | 0.993085 |
|           res2next50            | 128  | 0.99284  |
|          botnet26t_256          | 128  | 0.99281  |
|           tf_mixnet_l           | 128  | 0.992591 |
|       tf_efficientnet_b0        | 128  | 0.992578 |
|        convmixer_768_32         |  32  | 0.992325 |
|          cspdarknet53           |  64  | 0.992027 |
|            mixnet_l             | 128  | 0.992011 |
|          gmlp_s16_224           | 128  | 0.991941 |
|       gluon_inception_v3        | 256  | 0.991721 |
|            gernet_l             | 128  | 0.991713 |
|            nfnet_l0             | 128  | 0.991642 |
|        sebotnet33ts_256         |  64  | 0.991578 |
|         visformer_small         | 128  | 0.991528 |
|          mixer_b16_224          | 128  | 0.990849 |
|           mobilevit_s           |  64  | 0.990509 |
|          gmixer_24_224          | 128  | 0.990506 |
|      xcit_large_24_p8_224       |  16  | 0.990444 |
|         mobilenetv2_100         | 128  | 0.990387 |
|        res2net50_14w_8s         | 128  | 0.990307 |
|        twins_pcpvt_base         | 128  | 0.989415 |
|             dla102              | 128  | 0.989307 |
|           selecsls42b           | 128  | 0.989255 |
|          pnasnet5large          |  16  | 0.988796 |
|  swin_base_patch4_window7_224   |  64  | 0.988775 |
|           convit_base           |  64  | 0.988588 |
|            tinynet_a            | 128  | 0.987905 |
|          spnasnet_100           | 128  | 0.987336 |
|        tnt_s_patch16_224        | 128  | 0.987317 |
|         poolformer_m36          |  64  | 0.987237 |
|          resmlp_12_224          | 128  | 0.986633 |
|      beit_base_patch16_224      |  64  | 0.986186 |
|        adv_inception_v3         | 128  | 0.986111 |
|          inception_v3           | 128  | 0.986054 |
|             dpn107              |  64  | 0.985541 |
|           resnest101e           |  64  | 0.985472 |
|            lcnet_050            | 256  | 0.984912 |
|            hrnet_w18            | 128  | 0.984571 |
| deit_base_distilled_patch16_224 |  64  | 0.983871 |
|      vit_base_patch16_224       |  64  | 0.983795 |
|            repvgg_a2            | 128  | 0.981916 |
|            pit_b_224            |  64  | 0.981791 |
|           volo_d1_224           |  64  | 0.981505 |
|          jx_nest_base           |  32  | 0.980007 |
|     swsl_resnext101_32x16d      |  32  | 0.979872 |
|          cait_m36_384           |  4   | 0.978842 |
|         crossvit_9_240          | 256  | 0.966575 |
+---------------------------------+------+----------+

Absolute latency (ms)

+---------------------------------+------+-------------+
|              name               |  bs  |  inductor   |
+---------------------------------+------+-------------+
|      xcit_large_24_p8_224       |  16  | 1282.645277 |
|          cait_m36_384           |  4   | 1085.377484 |
|          convnext_base          |  64  | 1010.168521 |
|           dm_nfnet_f0           | 128  | 952.372749  |
|             dpn107              |  64  | 911.422113  |
|          mixer_b16_224          | 128  | 893.542454  |
|       gluon_inception_v3        | 256  | 794.661919  |
|        tnt_s_patch16_224        | 128  |  741.95085  |
|        twins_pcpvt_base         | 128  | 707.526706  |
|  swin_base_patch4_window7_224   |  64  |  698.78989  |
|           convit_base           |  64  | 676.728796  |
|        res2net101_26w_4s        | 128  |  628.68027  |
|     swsl_resnext101_32x16d      |  32  | 615.030672  |
|            nfnet_l0             | 128  | 605.119367  |
| deit_base_distilled_patch16_224 |  64  | 601.812427  |
|      vit_base_patch16_224       |  64  | 600.262054  |
|      beit_base_patch16_224      |  64  | 597.190489  |
|            levit_128            | 1024 | 557.359181  |
|        ese_vovnet19b_dw         | 256  | 548.338405  |
|             dla102              | 128  | 515.879166  |
|          gmlp_s16_224           | 128  | 506.118418  |
|            pit_b_224            |  64  | 499.909459  |
|          jx_nest_base           |  32  | 476.819106  |
|          gmixer_24_224          | 128  | 476.687411  |
|         crossvit_9_240          | 256  |  469.73207  |
|           resnest101e           |  64  | 467.114815  |
|         poolformer_m36          |  64  |  462.13935  |
|        convmixer_768_32         |  32  | 433.293007  |
|            hrnet_w18            | 128  | 426.821336  |
|          resmlp_12_224          | 128  | 409.701322  |
|           volo_d1_224           |  64  | 402.102114  |
|          inception_v3           | 128  | 399.945374  |
|        adv_inception_v3         | 128  | 397.825921  |
|        res2net50_14w_8s         | 128  |  389.23866  |
|         visformer_small         | 128  | 383.247749  |
|          ghostnet_100           | 512  | 356.994092  |
|            mixnet_l             | 128  | 350.501586  |
|           res2next50            | 128  |  350.21978  |
|           tf_mixnet_l           | 128  |  339.13755  |
|            repvgg_a2            | 128  | 327.644591  |
|          pnasnet5large          |  16  | 326.822454  |
|        eca_halonext26ts         | 128  |  307.35219  |
|           fbnetc_100            | 512  | 302.315605  |
|       eca_botnext26ts_256       | 128  | 287.891754  |
|            gernet_l             | 128  | 286.881238  |
|        sebotnet33ts_256         |  64  | 277.122156  |
|           regnety_002           | 1024 | 276.514079  |
|          botnet26t_256          | 128  | 274.780659  |
|          cspdarknet53           |  64  | 256.020052  |
|           mnasnet_100           | 512  | 255.324103  |
|            fbnetv3_b            | 256  | 240.387362  |
|      mobilenetv3_large_100      | 512  | 229.725681  |
|           selecsls42b           | 128  | 227.264887  |
|           mobilevit_s           |  64  | 222.251033  |
|           rexnet_100            | 256  | 221.980898  |
|       tf_efficientnet_b0        | 128  | 115.363642  |
|            tinynet_a            | 128  |  81.429827  |
|         mobilenetv2_100         | 128  |  71.689808  |
|          spnasnet_100           | 128  |  64.358399  |
|            lcnet_050            | 256  |  26.028453  |
+---------------------------------+------+-------------+

@zxd1997066
Contributor

[dynamic] Performance Dashboard for float32 precision -- Single-Socket Multi-threads (2024-04-28 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration: a forward and a backward pass for training, or a forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint reduction ratio.
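
As a rough illustration of what "normalizing against the performance of native PyTorch" means, the sketch below times two stand-in workloads and divides the baseline latency by the compiled latency. The helper names `median_latency_ms` and `speedup`, the use of a median, and the repeat count are assumptions made for this example; they are not taken from the benchmark runner.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(fn: Callable[[], object], repeats: int = 5) -> float:
    """Run fn `repeats` times and return the median wall-clock latency in ms."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(baseline_ms: float, compiled_ms: float) -> float:
    """Speedup of the compiled run, normalized against the baseline run."""
    if compiled_ms <= 0:
        raise ValueError("compiled latency must be positive")
    return baseline_ms / compiled_ms


if __name__ == "__main__":
    # Stand-in workloads (hypothetical): in a real run these would be the
    # eager forward pass and the compiled forward pass of the same model.
    eager_ms = median_latency_ms(lambda: sum(i * i for i in range(200_000)))
    fast_ms = median_latency_ms(lambda: sum(i * i for i in range(100_000)))
    print(f"speedup: {speedup(eager_ms, fast_ms):.2f}x")
```

A value above 1.0 means the compiled run was faster than the baseline; a value below 1.0 means it was slower.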

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

SW information:

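+-------------------+--------+------------------------------------------+
|        SW         | Branch |                  Commit                  |
+-------------------+--------+------------------------------------------+
|      Pytorch      |  main  | 7478b7f1cac9686f00edf3db4667cf86d2421531 |
|    Torchbench     |  main  |                 d6015d42                 |
|    torchaudio     |  main  |             2.2.0a0+ea437b3              |
|     torchtext     |  main  |             0.16.0a0+b0ebddc             |
|    torchvision    |  main  |             0.19.0a0+2c4665f             |
|     torchdata     |  main  |             0.7.1a0+0790338              |
| dynamo_benchmarks |  main  |                 nightly                  |
+-------------------+--------+------------------------------------------+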

HW information

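+------------------+----------------------------------------------------------------+
|       Item       |                             Value                              |
+------------------+----------------------------------------------------------------+
|   Manufacturer   |                           Amazon EC2                           |
|   Product Name   |                          c6i.16xlarge                          |
|    CPU Model     |         Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz          |
| Installed Memory |            128GB (1x128GB DDR4 3200 MT/s [Unknown])            |
|        OS        |                       Ubuntu 22.04.2 LTS                       |
|      Kernel      |                        5.19.0-1022-aws                         |
|    Microcode     |                           0xd000389                            |
|       GCC        |           gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0            |
|      GLIBC       |            ldd (Ubuntu GLIBC 2.35-0ubuntu3.1) 2.35             |
|     Binutils     |             GNU ld (GNU Binutils for Ubuntu) 2.38              |
|      Python      |                         Python 3.10.6                          |
|     OpenSSL      | OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022) |
+------------------+----------------------------------------------------------------+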

Test command

export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
CORES=$(lscpu | grep Core | awk '{print $4}')
export OMP_NUM_THREADS=$CORES

python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--node_id 0" --devices=cpu --dtypes=float32 --inference --compilers=inductor --extra-args="--timeout 9000" 

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
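
The filtering step described here can be sketched as follows. The function name `drop_failed` and the dictionary layout are hypothetical, and treating both `pass` and `pass_due_to_skip` as passing is an assumption; the three sample entries are copied from the torchbench tables in this comment.

```python
# Statuses treated as passing. This set is an assumption for illustration.
PASSING = {"pass", "pass_due_to_skip"}


def drop_failed(perf: dict, accuracy: dict) -> dict:
    """Keep only the models whose accuracy status is in PASSING."""
    return {name: value for name, value in perf.items()
            if accuracy.get(name) in PASSING}


if __name__ == "__main__":
    # Speedups and accuracy statuses taken from the torchbench tables.
    speedups = {"alexnet": 1.58401, "resnet50": 2.199144, "hf_Bert": 1.457879}
    statuses = {"alexnet": "pass", "resnet50": "fail_accuracy", "hf_Bert": "pass"}
    print(drop_failed(speedups, statuses))
```

Models missing from the accuracy mapping are dropped as well, which is the conservative choice for a summary statistic.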

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 79%, 62/78 | 100%, 46/46 | 75%, 45/60  |
+----------+------------+-------------+-------------+
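
Each pass-rate cell pairs a percentage with the raw counts. A minimal sketch of that formatting, using a hypothetical helper `format_pass_rate` and the counts reported in the table above:

```python
def format_pass_rate(passed: int, total: int) -> str:
    """Format a pass rate the way the table cells show it, e.g. '79%, 62/78'."""
    if total <= 0:
        raise ValueError("total must be positive")
    return f"{passed / total:.0%}, {passed}/{total}"


if __name__ == "__main__":
    for suite, passed, total in [("torchbench", 62, 78),
                                 ("huggingface", 46, 46),
                                 ("timm_models", 45, 60)]:
        print(suite, format_pass_rate(passed, total))
```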

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.36x    |    1.27x    |    1.80x    |
+----------+------------+-------------+-------------+
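
For readers unfamiliar with the metric, a geometric mean of per-model speedups can be computed as in the sketch below. Skipping non-positive entries (such as the 0.0 rows reported for models that failed to load) is an assumption of this example, not a documented behaviour of the dashboard.

```python
import math
from typing import Iterable


def geomean(values: Iterable[float]) -> float:
    """Geometric mean of the positive entries in `values`."""
    # Assumption: non-positive entries (e.g. 0.0 for failed models) are skipped.
    positive = [v for v in values if v > 0]
    if not positive:
        raise ValueError("need at least one positive value")
    return math.exp(sum(math.log(v) for v in positive) / len(positive))


if __name__ == "__main__":
    print(f"{geomean([2.0, 8.0, 0.0]):.2f}x")
```

Unlike an arithmetic mean, the geometric mean is not dominated by a single large outlier, which matters when one model speeds up by an order of magnitude.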

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   26.83    |    42.02    |    49.44    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.90x    |    0.98x    |    0.99x    |
+----------+------------+-------------+-------------+
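
One way to read these ratios, assuming the ratio is defined as baseline peak memory divided by compiled peak memory (an assumption consistent with "higher is better", but not stated in this comment), is sketched below with hypothetical helper names.

```python
def compression_ratio(baseline_peak: float, compiled_peak: float) -> float:
    """Assumed definition: baseline peak memory over compiled peak memory."""
    if compiled_peak <= 0:
        raise ValueError("compiled peak memory must be positive")
    return baseline_peak / compiled_peak


def relative_peak_memory(ratio: float) -> float:
    """Compiled peak memory as a multiple of the baseline, given the ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / ratio


if __name__ == "__main__":
    # Under this assumed definition, 0.90x means the compiled run peaks
    # at roughly 1.11 times the baseline memory.
    print(f"{relative_peak_memory(0.90):.2f}")
```

Under that reading, a ratio below 1.0 indicates that the compiled run used more peak memory than the baseline.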

torchbench suite with float32 precision

Performance speedup

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_equation_of_state     | 1048576 | 11.204793 |
|          squeezenet1_1          |   16    | 2.875353  |
|       mobilenet_v3_large        |   32    | 2.756891  |
|        timm_efficientnet        |   64    | 2.694398  |
|           mnasnet1_0            |   32    | 2.628473  |
|          mobilenet_v2           |   16    | 2.625228  |
|       shufflenet_v2_x1_0        |   64    | 2.303058  |
|          timm_resnest           |   32    | 2.244711  |
|            resnet50             |   32    | 2.199144  |
|            resnet152            |   32    |  1.97769  |
|        phlippe_densenet         |   128   |  1.9647   |
|           densenet121           |   64    | 1.899763  |
|       doctr_det_predictor       |    1    | 1.888395  |
|           timm_nfnet            |   128   | 1.859038  |
|           timm_regnet           |   32    | 1.854989  |
|             hf_GPT2             |    1    | 1.806285  |
|         resnext50_32x4d         |    8    | 1.735473  |
|         phlippe_resnet          |   128   | 1.720559  |
|            resnet18             |    8    | 1.653691  |
|           timm_vovnet           |   32    | 1.641118  |
|             alexnet             |   128   |  1.58401  |
|          hf_Bert_large          |    1    | 1.571047  |
|            hf_Albert            |    1    | 1.523841  |
|            moondream            |    1    |  1.50903  |
|      doctr_reco_predictor       |    1    | 1.501532  |
|          hf_GPT2_large          |    1    | 1.498596  |
|             yolov3              |    8    | 1.495671  |
|        basic_gnn_edgecnn        |    1    | 1.476097  |
|          fastNLP_Bert           |    1    | 1.470048  |
|             hf_Bert             |    1    | 1.457879  |
|          hf_Longformer          |    1    | 1.436399  |
|         LearningToPaint         |   96    | 1.423126  |
|     functorch_maml_omniglot     |    1    | 1.415904  |
|              dcgan              |   256   |  1.3597   |
|          hf_DistilBert          |    1    | 1.313831  |
|              vgg16              |    4    | 1.282694  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 1.277206  |
|             hf_Bart             |    1    | 1.264687  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1.232308  |
|          basic_gnn_gcn          |    1    | 1.230528  |
|         basic_gnn_sage          |    1    | 1.210953  |
|        hf_distil_whisper        |    1    | 1.209915  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1.193103  |
|           hf_BigBird            |    1    | 1.190075  |
|           hf_T5_large           |    1    | 1.186891  |
|          pytorch_unet           |    1    | 1.185692  |
|         pytorch_stargan         |   16    | 1.170994  |
|        soft_actor_critic        |   256   | 1.149688  |
|    detectron2_fcos_r_50_fpn     |    1    | 1.148024  |
|          BERT_pytorch           |    2    | 1.127277  |
|      torch_multimodal_clip      |   32    | 1.114197  |
|          lennard_jones          |  1000   | 1.111204  |
|              dlrm               |  2048   | 1.110591  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 1.107598  |
|          basic_gnn_gin          |    1    | 1.105728  |
|              hf_T5              |    1    |  1.10514  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1.083485  |
|          maml_omniglot          |    5    | 1.072842  |
|     nvidia_deeprecommender      |   256   | 1.056372  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1.050688  |
|     resnet50_quantized_qat      |   32    | 1.019764  |
|       speech_transformer        |    1    | 1.012653  |
|             demucs              |    1    | 1.012406  |
|  timm_vision_transformer_large  |   32    | 1.005548  |
|           hf_Reformer           |    1    | 1.005117  |
|           tts_angular           |   64    | 1.005045  |
|   mobilenet_v2_quantized_qat    |   96    | 1.004396  |
|               drq               |    1    | 0.972809  |
|     timm_vision_transformer     |   32    | 0.957149  |
|           hf_T5_base            |    1    | 0.827006  |
|       Background_Matting        |    1    | 0.802125  |
|              maml               |    1    | 0.700741  |
|     pyhpc_isoneutral_mixing     | 1048576 | 0.681913  |
|         opacus_cifar10          |   64    | 0.626493  |
|      functorch_dp_cifar10       |   64    | 0.584613  |
|        timm_efficientdet        |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+

Accuracy

+--------------------------------+---------+--------------------+
|              name              |   bs    |      inductor      |
+--------------------------------+---------+--------------------+
|          hf_T5_large           |    4    |  pass_due_to_skip  |
|       Background_Matting       |    1    |  pass_due_to_skip  |
| timm_vision_transformer_large  |    4    |  pass_due_to_skip  |
|         hf_GPT2_large          |    4    |  pass_due_to_skip  |
|              maml              |    1    |  pass_due_to_skip  |
|         basic_gnn_sage         |    1    |        pass        |
|           hf_T5_base           |    4    |        pass        |
|      doctr_det_predictor       |    4    |        pass        |
|              dlrm              |    4    |        pass        |
|    detectron2_fcos_r_50_fpn    |    4    |        pass        |
|             demucs             |    1    |        pass        |
|         basic_gnn_gcn          |    1    |        pass        |
|         basic_gnn_gin          |    1    |        pass        |
|              drq               |    1    |        pass        |
|       basic_gnn_edgecnn        |    1    |        pass        |
|        LearningToPaint         |    4    |        pass        |
|      functorch_dp_cifar10      |    4    |        pass        |
|      doctr_reco_predictor      |    4    |        pass        |
|             yolov3             |    4    |        pass        |
|          fastNLP_Bert          |    4    |        pass        |
|         maml_omniglot          |    5    |        pass        |
|    functorch_maml_omniglot     |    1    |        pass        |
|            hf_Bart             |    4    |        pass        |
|            hf_Bert             |    4    |        pass        |
|             hf_T5              |    4    |        pass        |
|         hf_Bert_large          |    4    |        pass        |
|          hf_Reformer           |    4    |        pass        |
|         hf_Longformer          |    4    |        pass        |
|           hf_BigBird           |    4    |        pass        |
|         hf_DistilBert          |    4    |        pass        |
|            hf_GPT2             |    2    |        pass        |
|           hf_Albert            |    4    |        pass        |
|       hf_distil_whisper        |    4    |        pass        |
|            alexnet             |    4    |        pass        |
|        pytorch_stargan         |   16    |        pass        |
|         lennard_jones          |    4    |        pass        |
|         opacus_cifar10         |    4    |        pass        |
|    pyhpc_isoneutral_mixing     |    4    |        pass        |
|    pyhpc_equation_of_state     |    4    |        pass        |
|         phlippe_resnet         |    4    |        pass        |
|        phlippe_densenet        |    4    |        pass        |
|       mobilenet_v3_large       |    4    |        pass        |
|     nvidia_deeprecommender     |    4    |        pass        |
|             vgg16              |    4    |        pass        |
|  pytorch_CycleGAN_and_pix2pix  |    1    |        pass        |
|   mobilenet_v2_quantized_qat   |    4    |        pass        |
|             llama              |    4    |        pass        |
| pyhpc_turbulent_kinetic_energy | 1048576 |        pass        |
|          BERT_pytorch          |    4    |        pass        |
|           moondream            |    4    |        pass        |
|          pytorch_unet          |    2    |        pass        |
|       soft_actor_critic        |   256   |        pass        |
|       speech_transformer       |    1    |        pass        |
|         squeezenet1_1          |    4    |        pass        |
|       timm_efficientnet        |    4    |        pass        |
|           timm_nfnet           |    4    |        pass        |
|          timm_regnet           |    4    |        pass        |
|          timm_resnest          |    4    |        pass        |
|    timm_vision_transformer     |    4    |        pass        |
|          timm_vovnet           |    4    |        pass        |
|     torch_multimodal_clip      |    4    |        pass        |
|          tts_angular           |    4    |        pass        |
|     resnet50_quantized_qat     |    4    |        pass        |
|       timm_efficientdet        |    0    | model_fail_to_load |
|              moco              |    0    | model_fail_to_load |
|         DALLE2_pytorch         |    0    | model_fail_to_load |
|          Super_SloMo           |    4    |    fail_to_run     |
|        vision_maskrcnn         |    1    |    fail_to_run     |
|             dcgan              |    4    |   fail_accuracy    |
|          densenet121           |    4    |   fail_accuracy    |
|       shufflenet_v2_x1_0       |    4    |   fail_accuracy    |
|          mobilenet_v2          |    4    |   fail_accuracy    |
|           mnasnet1_0           |    4    |   fail_accuracy    |
|        resnext50_32x4d         |    4    |   fail_accuracy    |
|            resnet50            |    4    |   fail_accuracy    |
|           resnet152            |    4    |   fail_accuracy    |
|            resnet18            |    4    |   fail_accuracy    |
+--------------------------------+---------+--------------------+

Compilation latency (sec)

+---------------------------------+---------+------------+
|              name               |   bs    |  inductor  |
+---------------------------------+---------+------------+
|           densenet121           |   64    | 105.143423 |
|           hf_BigBird            |    1    | 76.901663  |
|    detectron2_fcos_r_50_fpn     |    1    |  70.2439   |
|  timm_vision_transformer_large  |   32    | 64.267642  |
|           hf_T5_large           |    1    | 62.463745  |
|           timm_nfnet            |   128   |  57.54993  |
|              maml               |    1    | 53.754846  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 52.900664  |
|          hf_Longformer          |    1    | 51.193474  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 50.806082  |
|           hf_Reformer           |    1    |  47.40762  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 46.307147  |
|        phlippe_densenet         |   128   | 45.931353  |
|           hf_T5_base            |    1    | 44.868421  |
| detectron2_fasterrcnn_r_50_dc5  |    1    |  44.06601  |
|      torch_multimodal_clip      |   32    | 40.933501  |
|     pyhpc_isoneutral_mixing     | 1048576 | 39.585752  |
|        timm_efficientnet        |   64    | 39.453477  |
|       speech_transformer        |    1    | 39.412627  |
|          hf_GPT2_large          |    1    | 36.236257  |
|             yolov3              |    8    | 36.010071  |
|          BERT_pytorch           |    2    | 35.381004  |
|             demucs              |    1    | 34.891568  |
|            moondream            |    1    | 33.742028  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 33.527619  |
|         opacus_cifar10          |   64    | 31.895483  |
|              hf_T5              |    1    | 31.273925  |
|        hf_distil_whisper        |    1    | 30.485728  |
|      functorch_dp_cifar10       |   64    | 30.198489  |
|     timm_vision_transformer     |   32    | 29.251042  |
|       mobilenet_v3_large        |   32    | 28.022458  |
|          timm_resnest           |   32    | 27.830625  |
|       shufflenet_v2_x1_0        |   64    | 26.836997  |
|       doctr_det_predictor       |    1    | 26.723539  |
|          hf_Bert_large          |    1    | 26.450876  |
|           timm_regnet           |   32    | 25.375146  |
|       Background_Matting        |    1    | 23.982219  |
|           timm_vovnet           |   32    | 23.147279  |
|          fastNLP_Bert           |    1    |  22.76062  |
|             hf_Bart             |    1    | 22.617793  |
|          pytorch_unet           |    1    | 21.512412  |
|         pytorch_stargan         |   16    | 21.118693  |
|            resnet152            |   32    | 20.490168  |
|            hf_Albert            |    1    | 20.461999  |
|             hf_GPT2             |    1    | 19.788103  |
|          hf_DistilBert          |    1    | 19.159364  |
|             hf_Bert             |    1    | 19.066989  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 18.367999  |
|          squeezenet1_1          |   16    |  17.70189  |
|              vgg16              |    4    | 14.417428  |
|      doctr_reco_predictor       |    1    | 13.624282  |
|          mobilenet_v2           |   16    | 12.723112  |
|         resnext50_32x4d         |    8    | 12.524172  |
|            resnet50             |   32    | 12.505294  |
|             alexnet             |   128   | 12.162179  |
|          basic_gnn_gcn          |    1    | 11.965291  |
|          basic_gnn_gin          |    1    | 11.453173  |
|         basic_gnn_sage          |    1    | 11.423419  |
|               drq               |    1    | 11.032222  |
|              dlrm               |  2048   | 10.909981  |
|           mnasnet1_0            |   32    | 10.308061  |
|            resnet18             |    8    | 10.116205  |
|         LearningToPaint         |   96    |  9.985004  |
|     functorch_maml_omniglot     |    1    |  9.847784  |
|        basic_gnn_edgecnn        |    1    |  9.402118  |
|     pyhpc_equation_of_state     | 1048576 |  9.337999  |
|     nvidia_deeprecommender      |   256   |  8.927174  |
|          maml_omniglot          |    5    |  8.785928  |
|         phlippe_resnet          |   128   |  8.502971  |
|        soft_actor_critic        |   256   |  8.104183  |
|          lennard_jones          |  1000   |  6.892134  |
|              dcgan              |   256   |  6.078336  |
|           tts_angular           |   64    |  5.715434  |
|   mobilenet_v2_quantized_qat    |   96    |  0.134625  |
|     resnet50_quantized_qat      |   32    |  0.095109  |
|        timm_efficientdet        |    0    |    0.0     |
|              moco               |    0    |    0.0     |
|         DALLE2_pytorch          |    0    |    0.0     |
+---------------------------------+---------+------------+

Peak Memory Compression Ratio

+---------------------------------+---------+----------+
|              name               |   bs    | inductor |
+---------------------------------+---------+----------+
|  timm_vision_transformer_large  |   32    | 0.99622  |
|           timm_nfnet            |   128   | 0.993022 |
|              dlrm               |  2048   | 0.987992 |
|           hf_T5_base            |    1    | 0.98792  |
|        timm_efficientnet        |   64    | 0.984484 |
|           timm_regnet           |   32    | 0.982882 |
|       Background_Matting        |    1    | 0.982435 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 0.981796 |
|            resnet152            |   32    | 0.980686 |
|             yolov3              |    8    | 0.979477 |
|             demucs              |    1    | 0.978759 |
|     nvidia_deeprecommender      |   256   | 0.978656 |
|          pytorch_unet           |    1    | 0.978201 |
|          hf_GPT2_large          |    1    | 0.978185 |
|           densenet121           |   64    | 0.977666 |
|      torch_multimodal_clip      |   32    | 0.977489 |
|           timm_vovnet           |   32    | 0.974702 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 0.973946 |
|            resnet50             |   32    | 0.973003 |
|          timm_resnest           |   32    | 0.972867 |
|         LearningToPaint         |   96    | 0.970937 |
|        basic_gnn_edgecnn        |    1    | 0.969704 |
|       doctr_det_predictor       |    1    | 0.968713 |
|    detectron2_fcos_r_50_fpn     |    1    | 0.965053 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 0.964832 |
|     timm_vision_transformer     |   32    | 0.963334 |
|           mnasnet1_0            |   32    | 0.962913 |
|   mobilenet_v2_quantized_qat    |   96    | 0.961931 |
|     resnet50_quantized_qat      |   32    | 0.961383 |
|       mobilenet_v3_large        |   32    | 0.96106  |
|          mobilenet_v2           |   16    | 0.959709 |
|       shufflenet_v2_x1_0        |   64    | 0.956496 |
|             alexnet             |   128   | 0.954872 |
|              vgg16              |    4    | 0.952482 |
|           hf_BigBird            |    1    | 0.950875 |
|         resnext50_32x4d         |    8    | 0.94691  |
|        phlippe_densenet         |   128   | 0.945526 |
|         pytorch_stargan         |   16    | 0.944307 |
|          basic_gnn_gcn          |    1    | 0.941388 |
|      doctr_reco_predictor       |    1    | 0.936189 |
|          BERT_pytorch           |    2    | 0.934634 |
|           tts_angular           |   64    | 0.930151 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 0.916691 |
|          squeezenet1_1          |   16    | 0.91498  |
|              dcgan              |   256   |  0.9135  |
|        hf_distil_whisper        |    1    | 0.911578 |
|     pyhpc_equation_of_state     | 1048576 | 0.907167 |
|            resnet18             |    8    | 0.898591 |
|         phlippe_resnet          |   128   | 0.896043 |
|         opacus_cifar10          |   64    | 0.891383 |
|        soft_actor_critic        |   256   | 0.889713 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 0.881932 |
|          lennard_jones          |  1000   | 0.869816 |
|          maml_omniglot          |    5    | 0.856835 |
|         basic_gnn_sage          |    1    | 0.855838 |
|     functorch_maml_omniglot     |    1    | 0.855603 |
|          basic_gnn_gin          |    1    | 0.853411 |
|          fastNLP_Bert           |    1    | 0.846674 |
| pyhpc_turbulent_kinetic_energy  | 1048576 |  0.8456  |
|            moondream            |    1    | 0.824462 |
|      functorch_dp_cifar10       |   64    | 0.818056 |
|       speech_transformer        |    1    | 0.815529 |
|          hf_Bert_large          |    1    | 0.808102 |
|          hf_Longformer          |    1    | 0.800158 |
|              maml               |    1    | 0.798343 |
|             hf_Bert             |    1    | 0.79688  |
|            hf_Albert            |    1    | 0.791694 |
|           hf_T5_large           |    1    | 0.790063 |
|              hf_T5              |    1    | 0.772542 |
|               drq               |    1    | 0.767443 |
|             hf_Bart             |    1    | 0.766887 |
|          hf_DistilBert          |    1    | 0.762246 |
|             hf_GPT2             |    1    | 0.75974  |
|           hf_Reformer           |    1    | 0.732213 |
|     pyhpc_isoneutral_mixing     | 1048576 | 0.68673  |
|        timm_efficientdet        |    0    |   0.0    |
|              moco               |    0    |   0.0    |
|         DALLE2_pytorch          |    0    |   0.0    |
+---------------------------------+---------+----------+

Absolute latency (ms)

+---------------------------------+---------+-------------+
|              name               |   bs    |  inductor   |
+---------------------------------+---------+-------------+
|  timm_vision_transformer_large  |   32    | 4516.81571  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1435.054635 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1300.912933 |
|           hf_T5_base            |    1    | 1258.33244  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1170.012358 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1081.914347 |
|          hf_GPT2_large          |    1    | 552.737343  |
|           timm_nfnet            |   128   | 530.696468  |
|           hf_T5_large           |    1    | 400.022816  |
|            moondream            |    1    |  389.27524  |
|        hf_distil_whisper        |    1    | 345.418842  |
|       Background_Matting        |    1    |  342.35743  |
|          pytorch_unet           |    1    | 226.916621  |
|           timm_regnet           |   32    | 219.563162  |
|           densenet121           |   64    | 193.425139  |
|            resnet152            |   32    | 189.706378  |
|    detectron2_fcos_r_50_fpn     |    1    |  178.30031  |
|      torch_multimodal_clip      |   32    | 167.734835  |
|             yolov3              |    8    | 164.342648  |
|             demucs              |    1    | 140.778049  |
|           hf_BigBird            |    1    | 128.368761  |
|           timm_vovnet           |   32    | 117.314955  |
|     timm_vision_transformer     |   32    | 113.390461  |
|          hf_Bert_large          |    1    | 104.746179  |
|         pytorch_stargan         |   16    |  97.069805  |
|       doctr_det_predictor       |    1    |  86.786545  |
|            resnet50             |   32    |  75.189044  |
|          hf_Longformer          |    1    |  71.241771  |
|          timm_resnest           |   32    |  54.165656  |
|       speech_transformer        |    1    |  53.68887   |
|             hf_Bart             |    1    |  52.773436  |
|        timm_efficientnet        |   64    |  52.375158  |
|              maml               |    1    |  49.034401  |
|             alexnet             |   128   |  43.424231  |
|              hf_T5              |    1    |  43.148982  |
|             hf_Bert             |    1    |  41.038941  |
|   mobilenet_v2_quantized_qat    |   96    |  39.627816  |
|           hf_Reformer           |    1    |  37.134991  |
|         LearningToPaint         |   96    |  37.033238  |
|              vgg16              |    4    |  36.101703  |
|            hf_Albert            |    1    |  35.834821  |
|     nvidia_deeprecommender      |   256   |  34.147038  |
|          fastNLP_Bert           |    1    |  33.747713  |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  32.96905   |
|          BERT_pytorch           |    2    |  28.837151  |
|     pyhpc_isoneutral_mixing     | 1048576 |  28.202125  |
|          hf_DistilBert          |    1    |   27.4045   |
|     resnet50_quantized_qat      |   32    |  25.289189  |
| pyhpc_turbulent_kinetic_energy  | 1048576 |  25.149432  |
|         resnext50_32x4d         |    8    |  24.756563  |
|             hf_GPT2             |    1    |  23.959612  |
|        phlippe_densenet         |   128   |  20.58504   |
|           tts_angular           |   64    |  19.504716  |
|        basic_gnn_edgecnn        |    1    |  18.555556  |
|              dcgan              |   256   |  18.375403  |
|       shufflenet_v2_x1_0        |   64    |  16.844668  |
|           mnasnet1_0            |   32    |  15.603292  |
|       mobilenet_v3_large        |   32    |  14.008356  |
|         opacus_cifar10          |   64    |  10.803556  |
|      functorch_dp_cifar10       |   64    |  10.797526  |
|          basic_gnn_gcn          |    1    |  9.915328   |
|          mobilenet_v2           |   16    |   9.78651   |
|            resnet18             |    8    |  9.768669   |
|              dlrm               |  2048   |  6.673006   |
|          squeezenet1_1          |   16    |  5.933335   |
|         basic_gnn_sage          |    1    |  4.985299   |
|          basic_gnn_gin          |    1    |  4.704373   |
|         phlippe_resnet          |   128   |  4.414644   |
|      doctr_reco_predictor       |    1    |  3.508102   |
|     pyhpc_equation_of_state     | 1048576 |  1.113466   |
|               drq               |    1    |  0.940646   |
|          maml_omniglot          |    5    |  0.553144   |
|        soft_actor_critic        |   256   |  0.521707   |
|     functorch_maml_omniglot     |    1    |  0.478493   |
|          lennard_jones          |  1000   |  0.203436   |
|        timm_efficientdet        |    0    |     0.0     |
|         DALLE2_pytorch          |    0    |     0.0     |
|              moco               |    0    |     0.0     |
+---------------------------------+---------+-------------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            XLNetLMHeadModel             |  8  | 5.521683 |
|     MobileBertForQuestionAnswering      | 128 | 1.788897 |
|      GPT2ForSequenceClassification      |  4  | 1.740398 |
|           ElectraForCausalLM            | 32  | 1.657063 |
|       ElectraForQuestionAnswering       | 64  | 1.624537 |
|          MobileBertForMaskedLM          | 128 | 1.59867  |
|               DistillGPT2               | 16  | 1.493994 |
|      DebertaV2ForQuestionAnswering      |  1  | 1.415729 |
|            YituTechConvBert             | 16  | 1.414747 |
|       RobertaForQuestionAnswering       | 16  | 1.373083 |
|    LayoutLMForSequenceClassification    | 16  | 1.370792 |
|           RobertaForCausalLM            | 16  | 1.364834 |
|        BertForQuestionAnswering         | 16  | 1.36475  |
|               GoogleFnet                | 16  | 1.341909 |
|           LayoutLMForMaskedLM           | 16  | 1.318982 |
|          AllenaiLongformerBase          |  4  | 1.305081 |
|             BertForMaskedLM             | 16  | 1.30219  |
|                CamemBert                | 16  | 1.301771 |
|         MegatronBertForCausalLM         |  4  | 1.271082 |
|    MegatronBertForQuestionAnswering     |  8  | 1.269143 |
|       DebertaForQuestionAnswering       | 16  | 1.238555 |
|     PLBartForConditionalGeneration      |  4  | 1.206055 |
|           DebertaForMaskedLM            |  8  | 1.186693 |
|      MBartForConditionalGeneration      |  2  | 1.181696 |
|             OPTForCausalLM              |  2  | 1.171991 |
|       MT5ForConditionalGeneration       | 16  | 1.171279 |
|       T5ForConditionalGeneration        |  4  | 1.16336  |
|                 T5Small                 |  4  | 1.150298 |
| BlenderbotSmallForConditionalGeneration | 64  | 1.130597 |
|       AlbertForQuestionAnswering        |  4  | 1.128424 |
|            AlbertForMaskedLM            |  4  | 1.127688 |
|          DistilBertForMaskedLM          | 128 | 1.096249 |
|          DebertaV2ForMaskedLM           |  2  | 1.080223 |
|         Speech2Text2ForCausalLM         | 256 | 1.078165 |
|       BlenderbotSmallForCausalLM        | 64  | 1.075072 |
|             XGLMForCausalLM             |  8  | 1.071872 |
|     M2M100ForConditionalGeneration      | 16  | 1.071275 |
|     DistilBertForQuestionAnswering      | 256 | 1.068022 |
|      BartForConditionalGeneration       |  2  | 1.063748 |
|            PLBartForCausalLM            |  8  |  1.0465  |
|     PegasusForConditionalGeneration     | 32  | 1.039032 |
|            TrOCRForCausalLM             | 32  | 1.038892 |
|             BartForCausalLM             |  4  | 1.026737 |
|           PegasusForCausalLM            | 32  | 1.02351  |
|          BlenderbotForCausalLM          |  4  | 1.022638 |
|            MBartForCausalLM             |  4  | 1.019887 |
+-----------------------------------------+-----+----------+
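
A table like the one above can be reduced to a single summary number. The following is a minimal sketch, not the dashboard's own tooling: the helper names `parse_ascii_table` and `geomean_speedup` are hypothetical, and the choice of geometric mean is an assumption, since this comment does not say how per-model speedups are aggregated. Rows reported as `0.0` (as in the torchbench tables) are treated as placeholders for models that produced no measurement, which is also an assumption.

```python
import math


def parse_ascii_table(text):
    """Parse a '+---+' / '| a | b |' style table into a list of row dicts."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip '+---+' borders and blank lines
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def geomean_speedup(records, column="inductor"):
    """Geometric mean of a numeric column, skipping non-positive placeholders."""
    values = [float(r[column]) for r in records if float(r[column]) > 0.0]
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Three rows copied from the huggingface speedup table above.
sample = """
+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            XLNetLMHeadModel             |  8  | 5.521683 |
|     MobileBertForQuestionAnswering      | 128 | 1.788897 |
|            MBartForCausalLM             |  4  | 1.019887 |
+-----------------------------------------+-----+----------+
"""

records = parse_ascii_table(sample)
print(f"geomean speedup over {len(records)} models: {geomean_speedup(records):.4f}")
```

A geometric mean is used here because speedups are ratios, so a single outlier such as XLNetLMHeadModel pulls the summary less than it would pull an arithmetic mean.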

Accuracy

+-----------------------------------------+----+------------------+
|                  name                   | bs |     inductor     |
+-----------------------------------------+----+------------------+
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip |
|                CamemBert                | 1  |       pass       |
|       AlbertForQuestionAnswering        | 1  |       pass       |
|           DebertaForMaskedLM            | 1  |       pass       |
|          AllenaiLongformerBase          | 1  |       pass       |
|             BartForCausalLM             | 1  |       pass       |
|      BartForConditionalGeneration       | 1  |       pass       |
|             BertForMaskedLM             | 1  |       pass       |
|        BertForQuestionAnswering         | 1  |       pass       |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |
|       DebertaForQuestionAnswering       | 1  |       pass       |
|           LayoutLMForMaskedLM           | 1  |       pass       |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |
|          DistilBertForMaskedLM          | 1  |       pass       |
|     DistilBertForQuestionAnswering      | 1  |       pass       |
|               DistillGPT2               | 1  |       pass       |
|           ElectraForCausalLM            | 1  |       pass       |
|       ElectraForQuestionAnswering       | 1  |       pass       |
|      GPT2ForSequenceClassification      | 1  |       pass       |
|               GoogleFnet                | 1  |       pass       |
|    LayoutLMForSequenceClassification    | 1  |       pass       |
|            MBartForCausalLM             | 1  |       pass       |
|            XLNetLMHeadModel             | 1  |       pass       |
|             XGLMForCausalLM             | 1  |       pass       |
|            AlbertForMaskedLM            | 1  |       pass       |
|      MBartForConditionalGeneration      | 1  |       pass       |
|       MT5ForConditionalGeneration       | 1  |       pass       |
|         MegatronBertForCausalLM         | 1  |       pass       |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |
|          MobileBertForMaskedLM          | 1  |       pass       |
|     MobileBertForQuestionAnswering      | 1  |       pass       |
|             OPTForCausalLM              | 1  |       pass       |
|            PLBartForCausalLM            | 1  |       pass       |
|     PLBartForConditionalGeneration      | 1  |       pass       |
|           PegasusForCausalLM            | 1  |       pass       |
|     M2M100ForConditionalGeneration      | 1  |       pass       |
|     PegasusForConditionalGeneration     | 1  |       pass       |
|           RobertaForCausalLM            | 1  |       pass       |
|       RobertaForQuestionAnswering       | 1  |       pass       |
|         Speech2Text2ForCausalLM         | 1  |       pass       |
|       T5ForConditionalGeneration        | 1  |       pass       |
|                 T5Small                 | 1  |       pass       |
|            TrOCRForCausalLM             | 1  |       pass       |
|            YituTechConvBert             | 1  |       pass       |
+-----------------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------------+-----+------------+
|                  name                   | bs  |  inductor  |
+-----------------------------------------+-----+------------+
|          AllenaiLongformerBase          |  4  | 131.350605 |
|     PegasusForConditionalGeneration     | 32  | 73.010156  |
|          MobileBertForMaskedLM          | 128 | 70.132255  |
|      MBartForConditionalGeneration      |  2  |  69.90885  |
|     MobileBertForQuestionAnswering      | 128 | 69.244738  |
|     M2M100ForConditionalGeneration      | 16  | 68.582889  |
|       MT5ForConditionalGeneration       | 16  |  65.15064  |
|          BlenderbotForCausalLM          |  4  | 58.537182  |
|             XGLMForCausalLM             |  8  | 55.486962  |
|       T5ForConditionalGeneration        |  4  | 54.378626  |
|                 T5Small                 |  4  | 54.371327  |
| BlenderbotSmallForConditionalGeneration | 64  | 52.012857  |
|      BartForConditionalGeneration       |  2  | 50.192657  |
|          DebertaV2ForMaskedLM           |  2  | 49.647701  |
|    MegatronBertForQuestionAnswering     |  8  | 47.599829  |
|         MegatronBertForCausalLM         |  4  | 47.496161  |
|            YituTechConvBert             | 16  | 45.864071  |
|            XLNetLMHeadModel             |  8  | 44.495713  |
|     PLBartForConditionalGeneration      |  4  | 43.436254  |
|             OPTForCausalLM              |  2  | 39.656188  |
|           PegasusForCausalLM            | 32  | 38.140185  |
|            MBartForCausalLM             |  4  | 36.366898  |
|       DebertaForQuestionAnswering       | 16  | 33.480179  |
|           DebertaForMaskedLM            |  8  | 33.099694  |
|            TrOCRForCausalLM             | 32  | 32.983445  |
|      DebertaV2ForQuestionAnswering      |  1  | 32.025132  |
|      GPT2ForSequenceClassification      |  4  | 31.260773  |
|           RobertaForCausalLM            | 16  | 29.620954  |
|                CamemBert                | 16  | 28.922392  |
|       RobertaForQuestionAnswering       | 16  | 28.814505  |
|           ElectraForCausalLM            | 32  | 28.054201  |
|           LayoutLMForMaskedLM           | 16  | 27.745618  |
|             BertForMaskedLM             | 16  | 27.499879  |
|       ElectraForQuestionAnswering       | 64  | 27.456474  |
|       AlbertForQuestionAnswering        |  4  | 27.431452  |
|        BertForQuestionAnswering         | 16  | 27.429143  |
|            AlbertForMaskedLM            |  4  | 27.380186  |
|    LayoutLMForSequenceClassification    | 16  | 27.053136  |
|       BlenderbotSmallForCausalLM        | 64  | 26.201414  |
|     DistilBertForQuestionAnswering      | 256 | 25.803945  |
|               DistillGPT2               | 16  | 25.680369  |
|          DistilBertForMaskedLM          | 128 | 25.130115  |
|             BartForCausalLM             |  4  | 25.059969  |
|         Speech2Text2ForCausalLM         | 256 | 24.074225  |
|            PLBartForCausalLM            |  8  | 23.867377  |
|               GoogleFnet                | 16  | 21.634985  |
+-----------------------------------------+-----+------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            AlbertForMaskedLM            |  4  | 0.993759 |
|       AlbertForQuestionAnswering        |  4  | 0.99352  |
|     DistilBertForQuestionAnswering      | 256 | 0.993007 |
|           RobertaForCausalLM            | 16  | 0.992593 |
|            TrOCRForCausalLM             | 32  | 0.992221 |
|          DistilBertForMaskedLM          | 128 | 0.991806 |
|             OPTForCausalLM              |  2  | 0.991735 |
|           ElectraForCausalLM            | 32  | 0.991345 |
|               GoogleFnet                | 16  | 0.99128  |
|       ElectraForQuestionAnswering       | 64  | 0.990851 |
|                CamemBert                | 16  | 0.990833 |
|             BertForMaskedLM             | 16  | 0.990748 |
|               DistillGPT2               | 16  | 0.990646 |
|            PLBartForCausalLM            |  8  | 0.990525 |
|           LayoutLMForMaskedLM           | 16  | 0.990511 |
|    MegatronBertForQuestionAnswering     |  8  | 0.989963 |
|            MBartForCausalLM             |  4  | 0.989492 |
|       DebertaForQuestionAnswering       | 16  | 0.988833 |
|     PegasusForConditionalGeneration     | 32  | 0.988784 |
|            YituTechConvBert             | 16  | 0.988641 |
|       RobertaForQuestionAnswering       | 16  | 0.988573 |
|    LayoutLMForSequenceClassification    | 16  | 0.988443 |
|        BertForQuestionAnswering         | 16  | 0.988378 |
| BlenderbotSmallForConditionalGeneration | 64  | 0.988048 |
|         Speech2Text2ForCausalLM         | 256 | 0.98785  |
|     PLBartForConditionalGeneration      |  4  | 0.987592 |
|           PegasusForCausalLM            | 32  | 0.987211 |
|             BartForCausalLM             |  4  | 0.987012 |
|      GPT2ForSequenceClassification      |  4  | 0.986782 |
|      MBartForConditionalGeneration      |  2  | 0.986316 |
|          BlenderbotForCausalLM          |  4  | 0.985841 |
|           DebertaForMaskedLM            |  8  | 0.985347 |
|       BlenderbotSmallForCausalLM        | 64  | 0.985289 |
|         MegatronBertForCausalLM         |  4  | 0.985252 |
|          MobileBertForMaskedLM          | 128 | 0.984529 |
|      BartForConditionalGeneration       |  2  | 0.982784 |
|            XLNetLMHeadModel             |  8  | 0.982221 |
|       MT5ForConditionalGeneration       | 16  | 0.981991 |
|       T5ForConditionalGeneration        |  4  | 0.98185  |
|                 T5Small                 |  4  | 0.981642 |
|     MobileBertForQuestionAnswering      | 128 | 0.978293 |
|     M2M100ForConditionalGeneration      | 16  | 0.97719  |
|          DebertaV2ForMaskedLM           |  2  | 0.974464 |
|             XGLMForCausalLM             |  8  | 0.971994 |
|          AllenaiLongformerBase          |  4  | 0.970753 |
|      DebertaV2ForQuestionAnswering      |  1  | 0.869766 |
+-----------------------------------------+-----+----------+

Absolute latency (ms)

+-----------------------------------------+-----+-------------+
|                  name                   | bs  |  inductor   |
+-----------------------------------------+-----+-------------+
|            AlbertForMaskedLM            |  4  | 2689.334756 |
|       AlbertForQuestionAnswering        |  4  | 2678.368757 |
|            XLNetLMHeadModel             |  8  | 1314.923997 |
|     PegasusForConditionalGeneration     | 32  | 1005.816889 |
|            TrOCRForCausalLM             | 32  | 981.439282  |
|     DistilBertForQuestionAnswering      | 256 | 905.686077  |
|    MegatronBertForQuestionAnswering     |  8  | 794.652782  |
|            MBartForCausalLM             |  4  | 695.061407  |
|          BlenderbotForCausalLM          |  4  | 680.288272  |
|      MBartForConditionalGeneration      |  2  | 679.453002  |
|          DistilBertForMaskedLM          | 128 | 677.034319  |
|          DebertaV2ForMaskedLM           |  2  | 622.365539  |
|           RobertaForCausalLM            | 16  |  608.36004  |
|     M2M100ForConditionalGeneration      | 16  | 607.735424  |
|      BartForConditionalGeneration       |  2  | 605.644783  |
|             OPTForCausalLM              |  2  | 601.242471  |
|            YituTechConvBert             | 16  | 585.413521  |
|                CamemBert                | 16  | 573.977351  |
|             BertForMaskedLM             | 16  | 568.124551  |
|           LayoutLMForMaskedLM           | 16  | 564.096874  |
|          AllenaiLongformerBase          |  4  | 541.387743  |
|       DebertaForQuestionAnswering       | 16  | 535.335935  |
|             BartForCausalLM             |  4  | 529.288659  |
|            PLBartForCausalLM            |  8  | 514.767631  |
| BlenderbotSmallForConditionalGeneration | 64  | 497.571358  |
|           PegasusForCausalLM            | 32  | 494.170337  |
|     PLBartForConditionalGeneration      |  4  | 483.210881  |
|        BertForQuestionAnswering         | 16  | 458.298483  |
|         MegatronBertForCausalLM         |  4  | 453.450565  |
|    LayoutLMForSequenceClassification    | 16  | 451.036856  |
|       RobertaForQuestionAnswering       | 16  |  439.7692   |
|               GoogleFnet                | 16  | 413.992547  |
|          MobileBertForMaskedLM          | 128 | 400.565668  |
|               DistillGPT2               | 16  | 387.070432  |
|             XGLMForCausalLM             |  8  | 384.014749  |
|           DebertaForMaskedLM            |  8  | 374.530126  |
|       ElectraForQuestionAnswering       | 64  | 341.363688  |
|                 T5Small                 |  4  | 339.803312  |
|       T5ForConditionalGeneration        |  4  | 335.071258  |
|         Speech2Text2ForCausalLM         | 256 | 282.445758  |
|      GPT2ForSequenceClassification      |  4  |  281.99499  |
|       BlenderbotSmallForCausalLM        | 64  |  280.57966  |
|           ElectraForCausalLM            | 32  | 259.556961  |
|     MobileBertForQuestionAnswering      | 128 | 248.321704  |
|       MT5ForConditionalGeneration       | 16  | 226.708873  |
|      DebertaV2ForQuestionAnswering      |  1  |  226.69189  |
+-----------------------------------------+-----+-------------+
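
The speedup and absolute-latency tables can be combined to estimate the eager-mode baseline. This is a rough sketch under two assumptions that are not stated in this comment: that speedup is defined as eager latency divided by inductor latency, and that both tables come from comparable runs at the same batch size. The function name `implied_eager_ms` is hypothetical.

```python
# Values copied from the huggingface speedup and absolute latency tables above.
speedup = {
    "XLNetLMHeadModel": 5.521683,
    "AlbertForMaskedLM": 1.127688,
}
inductor_ms = {
    "XLNetLMHeadModel": 1314.923997,
    "AlbertForMaskedLM": 2689.334756,
}


def implied_eager_ms(speedup, inductor_ms):
    """Estimate eager latency, assuming speedup = eager_ms / inductor_ms."""
    common = speedup.keys() & inductor_ms.keys()  # models present in both tables
    return {name: speedup[name] * inductor_ms[name] for name in sorted(common)}


for name, eager in implied_eager_ms(speedup, inductor_ms).items():
    print(f"{name}: ~{eager:.1f} ms eager (estimated)")
```

The estimates are only as reliable as the assumed definition of speedup, so they are best read as order-of-magnitude checks rather than measurements.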

timm_models suite with float32 precision

Performance speedup

+---------------------------------+------+----------+
|              name               |  bs  | inductor |
+---------------------------------+------+----------+
|           fbnetc_100            | 512  | 3.910484 |
|           mnasnet_100           | 512  | 3.832764 |
|         mobilenetv2_100         | 128  | 3.732323 |
|            lcnet_050            | 256  | 3.73151  |
|      mobilenetv3_large_100      | 512  | 3.611475 |
|          spnasnet_100           | 128  | 3.481592 |
|            fbnetv3_b            | 256  | 3.419297 |
|           regnety_002           | 1024 | 3.271531 |
|           rexnet_100            | 256  | 3.05303  |
|       tf_efficientnet_b0        | 128  | 2.885309 |
|            tinynet_a            | 128  | 2.69748  |
|        ese_vovnet19b_dw         | 256  | 2.596903 |
|          botnet26t_256          | 128  | 2.55598  |
|          pnasnet5large          |  16  | 2.530712 |
|            hrnet_w18            | 128  | 2.50513  |
|           res2next50            | 128  | 2.37813  |
|          ghostnet_100           | 512  | 2.323943 |
|       eca_botnext26ts_256       | 128  | 2.311382 |
|       gluon_inception_v3        | 256  | 2.291342 |
|          inception_v3           | 128  | 2.225731 |
|           resnest101e           |  64  | 2.220417 |
|        adv_inception_v3         | 128  | 2.210056 |
|        eca_halonext26ts         | 128  | 2.204695 |
|             dla102              | 128  | 2.201319 |
|        res2net101_26w_4s        | 128  | 2.10796  |
|        res2net50_14w_8s         | 128  | 2.102305 |
|            repvgg_a2            | 128  | 2.065686 |
|          cspdarknet53           |  64  | 2.020263 |
|            nfnet_l0             | 128  | 1.999631 |
|        convmixer_768_32         |  32  | 1.942517 |
|            gernet_l             | 128  | 1.89478  |
|           dm_nfnet_f0           | 128  | 1.866512 |
|           tf_mixnet_l           | 128  | 1.811765 |
|           selecsls42b           | 128  | 1.773582 |
|        sebotnet33ts_256         |  64  | 1.755355 |
|            mixnet_l             | 128  | 1.65627  |
|         visformer_small         | 128  | 1.655436 |
|         poolformer_m36          |  64  | 1.647425 |
|           volo_d1_224           |  64  | 1.60478  |
|     swsl_resnext101_32x16d      |  32  | 1.56672  |
|             dpn107              |  64  | 1.49135  |
|            levit_128            | 1024 | 1.449278 |
|           mobilevit_s           |  64  | 1.433507 |
|          gmlp_s16_224           | 128  | 1.207466 |
|          resmlp_12_224          | 128  |  1.1834  |
|      xcit_large_24_p8_224       |  16  | 1.178565 |
|          gmixer_24_224          | 128  | 1.170351 |
|           convit_base           |  64  | 1.168114 |
|          cait_m36_384           |  4   | 1.139692 |
|  swin_base_patch4_window7_224   |  64  | 1.107395 |
|        tnt_s_patch16_224        | 128  | 1.089504 |
|        twins_pcpvt_base         | 128  | 1.071598 |
|          mixer_b16_224          | 128  | 1.056713 |
|          convnext_base          |  64  | 1.053843 |
|      beit_base_patch16_224      |  64  | 1.048797 |
| deit_base_distilled_patch16_224 |  64  | 1.025099 |
|            pit_b_224            |  64  | 1.024947 |
|      vit_base_patch16_224       |  64  | 1.009095 |
|          jx_nest_base           |  32  | 1.008376 |
|         crossvit_9_240          | 256  | 0.998945 |
+---------------------------------+------+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 8  |     pass      |
|           dm_nfnet_f0           | 8  |     pass      |
|            mixnet_l             | 8  |     pass      |
|       eca_botnext26ts_256       | 8  |     pass      |
|         crossvit_9_240          | 8  |     pass      |
| deit_base_distilled_patch16_224 | 8  |     pass      |
|          convnext_base          | 8  |     pass      |
|           regnety_002           | 8  |     pass      |
|           convit_base           | 8  |     pass      |
|          gmlp_s16_224           | 8  |     pass      |
|          cait_m36_384           | 8  |     pass      |
|             dpn107              | 8  |     pass      |
|      vit_base_patch16_224       | 8  |     pass      |
|          cspdarknet53           | 8  |     pass      |
|        eca_halonext26ts         | 8  |     pass      |
|        ese_vovnet19b_dw         | 8  |     pass      |
|           fbnetc_100            | 8  |     pass      |
|            fbnetv3_b            | 8  |     pass      |
|            gernet_l             | 8  |     pass      |
|          botnet26t_256          | 8  |     pass      |
|       gluon_inception_v3        | 8  |     pass      |
|          gmixer_24_224          | 8  |     pass      |
|          jx_nest_base           | 8  |     pass      |
|        convmixer_768_32         | 8  |     pass      |
|        twins_pcpvt_base         | 8  |     pass      |
|         poolformer_m36          | 8  |     pass      |
|            lcnet_050            | 8  |     pass      |
|          mixer_b16_224          | 8  |     pass      |
|        tnt_s_patch16_224        | 8  |     pass      |
|           mnasnet_100           | 8  |     pass      |
|         mobilenetv2_100         | 8  |     pass      |
|      mobilenetv3_large_100      | 8  |     pass      |
|           mobilevit_s           | 8  |     pass      |
|            nfnet_l0             | 8  |     pass      |
|            pit_b_224            | 8  |     pass      |
|      beit_base_patch16_224      | 8  |     pass      |
|            repvgg_a2            | 8  |     pass      |
|  swin_base_patch4_window7_224   | 8  |     pass      |
|          inception_v3           | 8  |     pass      |
|           tf_mixnet_l           | 8  |     pass      |
|       tf_efficientnet_b0        | 8  |     pass      |
|            tinynet_a            | 8  |     pass      |
|          spnasnet_100           | 8  |     pass      |
|        sebotnet33ts_256         | 8  |     pass      |
|          resmlp_12_224          | 8  |     pass      |
|         coat_lite_mini          | 8  |  fail_to_run  |
|           resnest101e           | 8  | fail_accuracy |
|            levit_128            | 8  | fail_accuracy |
|          ghostnet_100           | 8  | fail_accuracy |
|          pnasnet5large          | 8  | fail_accuracy |
|        res2net101_26w_4s        | 8  | fail_accuracy |
|        res2net50_14w_8s         | 8  | fail_accuracy |
|             dla102              | 8  | fail_accuracy |
|     swsl_resnext101_32x16d      | 8  | fail_accuracy |
|           rexnet_100            | 8  | fail_accuracy |
|           selecsls42b           | 8  | fail_accuracy |
|           res2next50            | 8  | fail_accuracy |
|            hrnet_w18            | 8  | fail_accuracy |
|           volo_d1_224           | 8  | fail_accuracy |
|         visformer_small         | 8  | fail_accuracy |
|      xcit_large_24_p8_224       | 8  | fail_accuracy |
+---------------------------------+----+---------------+

Compilation latency (sec)

+---------------------------------+------+------------+
|              name               |  bs  |  inductor  |
+---------------------------------+------+------------+
|  swin_base_patch4_window7_224   |  64  | 131.225337 |
|          pnasnet5large          |  16  | 108.67126  |
|           mobilevit_s           |  64  | 98.761145  |
|           tf_mixnet_l           | 128  | 98.556575  |
|          cait_m36_384           |  4   | 95.101889  |
|      xcit_large_24_p8_224       |  16  | 94.335887  |
|        twins_pcpvt_base         | 128  | 94.262797  |
|          jx_nest_base           |  32  | 88.593626  |
|             dpn107              |  64  | 87.848352  |
|        tnt_s_patch16_224        | 128  | 82.457019  |
|           volo_d1_224           |  64  |  78.70708  |
|            levit_128            | 1024 |  76.12498  |
|         crossvit_9_240          | 256  | 74.644385  |
|           rexnet_100            | 256  | 73.899488  |
|        res2net50_14w_8s         | 128  | 73.895721  |
|            mixnet_l             | 128  | 72.589285  |
|        eca_halonext26ts         | 128  | 72.179861  |
|         poolformer_m36          |  64  |  70.60069  |
|        sebotnet33ts_256         |  64  | 70.462706  |
|          ghostnet_100           | 512  | 68.332003  |
|           dm_nfnet_f0           | 128  | 60.551146  |
|           convit_base           |  64  | 59.231642  |
|            hrnet_w18            | 128  | 59.092188  |
|       eca_botnext26ts_256       | 128  | 57.409871  |
|          convnext_base          |  64  | 55.473985  |
|        res2net101_26w_4s        | 128  | 53.530829  |
|       tf_efficientnet_b0        | 128  | 49.416384  |
|            nfnet_l0             | 128  | 47.063096  |
|            pit_b_224            |  64  | 46.122547  |
|          gmixer_24_224          | 128  | 44.238395  |
|       gluon_inception_v3        | 256  | 44.176676  |
|          gmlp_s16_224           | 128  | 43.879943  |
|           resnest101e           |  64  | 43.273467  |
|            fbnetv3_b            | 256  | 42.992715  |
|          botnet26t_256          | 128  | 42.750209  |
|           res2next50            | 128  | 42.494174  |
|        adv_inception_v3         | 128  | 42.174282  |
|          inception_v3           | 128  | 42.159012  |
|            tinynet_a            | 128  |  42.10106  |
|         visformer_small         | 128  |  35.67077  |
|             dla102              | 128  | 33.325816  |
|          cspdarknet53           |  64  | 32.442733  |
|      vit_base_patch16_224       |  64  | 30.652294  |
| deit_base_distilled_patch16_224 |  64  |  30.58644  |
|      mobilenetv3_large_100      | 512  | 30.483534  |
|          mixer_b16_224          | 128  | 30.450543  |
|        ese_vovnet19b_dw         | 256  | 29.661711  |
|      beit_base_patch16_224      |  64  | 25.924305  |
|           regnety_002           | 1024 | 22.549201  |
|        convmixer_768_32         |  32  | 21.610559  |
|          resmlp_12_224          | 128  | 21.098062  |
|            repvgg_a2            | 128  | 17.678181  |
|            lcnet_050            | 256  |  17.4577   |
|           selecsls42b           | 128  | 17.333253  |
|     swsl_resnext101_32x16d      |  32  | 16.616526  |
|         mobilenetv2_100         | 128  | 13.341247  |
|          spnasnet_100           | 128  | 11.630914  |
|            gernet_l             | 128  | 11.442353  |
|           fbnetc_100            | 512  | 10.834491  |
|           mnasnet_100           | 512  | 10.081847  |
+---------------------------------+------+------------+

Peak Memory Compression Ratio

+---------------------------------+------+----------+
|              name               |  bs  | inductor |
+---------------------------------+------+----------+
|        ese_vovnet19b_dw         | 256  | 0.997703 |
|           fbnetc_100            | 512  | 0.997057 |
|           mnasnet_100           | 512  | 0.996627 |
|      mobilenetv3_large_100      | 512  | 0.99615  |
|            fbnetv3_b            | 256  | 0.996066 |
|          convnext_base          |  64  | 0.995903 |
|           regnety_002           | 1024 | 0.995862 |
|           dm_nfnet_f0           | 128  | 0.995828 |
|          ghostnet_100           | 512  | 0.995748 |
|            levit_128            | 1024 | 0.994846 |
|        res2net101_26w_4s        | 128  | 0.994504 |
|       eca_botnext26ts_256       | 128  | 0.994192 |
|        eca_halonext26ts         | 128  | 0.994002 |
|             dpn107              |  64  | 0.994001 |
|       gluon_inception_v3        | 256  | 0.993932 |
|           rexnet_100            | 256  | 0.99369  |
|             dla102              | 128  | 0.993442 |
|          gmlp_s16_224           | 128  | 0.993384 |
|           res2next50            | 128  | 0.993373 |
|        twins_pcpvt_base         | 128  | 0.993372 |
|          mixer_b16_224          | 128  | 0.993246 |
|      xcit_large_24_p8_224       |  16  | 0.993084 |
|           tf_mixnet_l           | 128  | 0.993059 |
|        res2net50_14w_8s         | 128  | 0.993054 |
|          botnet26t_256          | 128  | 0.992818 |
|        convmixer_768_32         |  32  | 0.992805 |
|           convit_base           |  64  | 0.992803 |
|            mixnet_l             | 128  | 0.992741 |
|          gmixer_24_224          | 128  | 0.992446 |
|       tf_efficientnet_b0        | 128  | 0.992254 |
|      beit_base_patch16_224      |  64  | 0.992152 |
|         visformer_small         | 128  | 0.992134 |
|          pnasnet5large          |  16  | 0.991964 |
|            gernet_l             | 128  | 0.991783 |
|           resnest101e           |  64  | 0.991781 |
|        sebotnet33ts_256         |  64  | 0.990971 |
|           mobilevit_s           |  64  | 0.990913 |
|            nfnet_l0             | 128  | 0.990424 |
|           selecsls42b           | 128  | 0.98957  |
|         mobilenetv2_100         | 128  | 0.98947  |
|          spnasnet_100           | 128  | 0.989336 |
|        tnt_s_patch16_224        | 128  | 0.989132 |
| deit_base_distilled_patch16_224 |  64  | 0.989047 |
|          resmlp_12_224          | 128  | 0.988782 |
|          cait_m36_384           |  4   | 0.988706 |
|            pit_b_224            |  64  | 0.988573 |
|      vit_base_patch16_224       |  64  | 0.988536 |
|          inception_v3           | 128  | 0.988519 |
|  swin_base_patch4_window7_224   |  64  | 0.988344 |
|         poolformer_m36          |  64  | 0.988267 |
|        adv_inception_v3         | 128  | 0.988193 |
|            tinynet_a            | 128  | 0.987798 |
|     swsl_resnext101_32x16d      |  32  | 0.987707 |
|            hrnet_w18            | 128  | 0.986641 |
|            lcnet_050            | 256  | 0.985305 |
|            repvgg_a2            | 128  | 0.984475 |
|           volo_d1_224           |  64  | 0.983741 |
|          jx_nest_base           |  32  | 0.983308 |
|          cspdarknet53           |  64  | 0.98123  |
|         crossvit_9_240          | 256  | 0.974243 |
+---------------------------------+------+----------+

Absolute latency (ms)

+---------------------------------+------+-------------+
|              name               |  bs  |  inductor   |
+---------------------------------+------+-------------+
|      xcit_large_24_p8_224       |  16  | 1478.510991 |
|          convnext_base          |  64  | 1170.069706 |
|          cait_m36_384           |  4   | 1104.172764 |
|          mixer_b16_224          | 128  | 1047.815674 |
|           dm_nfnet_f0           | 128  | 945.563158  |
|           convit_base           |  64  |  935.22055  |
|             dpn107              |  64  | 925.886371  |
|  swin_base_patch4_window7_224   |  64  | 849.101784  |
|        twins_pcpvt_base         | 128  | 836.964029  |
|        tnt_s_patch16_224        | 128  | 832.378339  |
|       gluon_inception_v3        | 256  | 808.445534  |
|      vit_base_patch16_224       |  64  |  704.24405  |
| deit_base_distilled_patch16_224 |  64  | 696.872305  |
|      beit_base_patch16_224      |  64  | 684.858599  |
|        res2net101_26w_4s        | 128  | 647.005025  |
|     swsl_resnext101_32x16d      |  32  | 638.174684  |
|            nfnet_l0             | 128  | 604.546603  |
|          gmlp_s16_224           | 128  | 586.725727  |
|            levit_128            | 1024 | 569.990426  |
|          gmixer_24_224          | 128  |  565.45864  |
|            pit_b_224            |  64  |  558.90923  |
|        ese_vovnet19b_dw         | 256  | 551.590141  |
|          jx_nest_base           |  32  |  536.98047  |
|             dla102              | 128  | 527.234148  |
|         crossvit_9_240          | 256  | 514.606383  |
|           resnest101e           |  64  | 484.104396  |
|         poolformer_m36          |  64  | 474.671385  |
|        convmixer_768_32         |  32  | 445.358958  |
|            hrnet_w18            | 128  | 440.962248  |
|           volo_d1_224           |  64  |  438.70716  |
|          inception_v3           | 128  | 407.412975  |
|        adv_inception_v3         | 128  | 404.802503  |
|        res2net50_14w_8s         | 128  | 402.858091  |
|         visformer_small         | 128  | 391.811371  |
|            mixnet_l             | 128  | 368.102514  |
|           res2next50            | 128  | 361.805434  |
|          ghostnet_100           | 512  | 361.510299  |
|           tf_mixnet_l           | 128  | 359.479813  |
|          pnasnet5large          |  16  | 353.292152  |
|            repvgg_a2            | 128  | 334.025075  |
|        eca_halonext26ts         | 128  | 315.308085  |
|           fbnetc_100            | 512  | 307.262799  |
|       eca_botnext26ts_256       | 128  | 293.963999  |
|            gernet_l             | 128  | 292.331037  |
|           regnety_002           | 1024 | 282.118838  |
|        sebotnet33ts_256         |  64  | 281.372342  |
|          botnet26t_256          | 128  | 277.510883  |
|          resmlp_12_224          | 128  | 266.166918  |
|          cspdarknet53           |  64  | 263.731444  |
|           mobilevit_s           |  64  | 262.626707  |
|           mnasnet_100           | 512  | 260.202317  |
|            fbnetv3_b            | 256  | 244.589375  |
|           selecsls42b           | 128  | 230.937592  |
|      mobilenetv3_large_100      | 512  |  230.20846  |
|           rexnet_100            | 256  | 228.029279  |
|       tf_efficientnet_b0        | 128  | 120.755326  |
|            tinynet_a            | 128  |  87.340924  |
|         mobilenetv2_100         | 128  |  73.94386   |
|          spnasnet_100           | 128  |  67.064858  |
|            lcnet_050            | 256  |  27.679152  |
+---------------------------------+------+-------------+

@zxd1997066
Contributor

[cppwrapper_static_shape] Performance Dashboard for float32 precision -- Single-core Single-thread (2024-04-28 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration: a forward and a backward pass for training, or a forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint reduction ratio.
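
The speedup normalization can be sketched as below. This is an illustrative sketch and not the benchmark harness's own code: the function names are hypothetical, and it assumes speedup is the native PyTorch latency divided by the inductor latency, so that values above 1 mean inductor is faster.

```python
def speedup(eager_ms: float, compiled_ms: float) -> float:
    """Speedup of the compiled model over native PyTorch (assumed definition)."""
    if eager_ms <= 0 or compiled_ms <= 0:
        raise ValueError("latencies must be positive")
    return eager_ms / compiled_ms


def implied_eager_ms(speedup_x: float, compiled_ms: float) -> float:
    """Invert the definition: recover the native latency from a reported speedup."""
    if speedup_x <= 0 or compiled_ms <= 0:
        raise ValueError("speedup and latency must be positive")
    return speedup_x * compiled_ms


if __name__ == "__main__":
    # hf_T5_base row from the torchbench tables in this comment:
    # speedup 0.597567, inductor absolute latency 26082.619496 ms.
    print(implied_eager_ms(0.597567, 26082.619496))
```

Under that assumption, the hf_T5_base row implies a native PyTorch latency of roughly 15.6 s against about 26.1 s for inductor, which is why its speedup is below 1.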

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

SW information:

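+-------------------+--------+------------------------------------------+
| SW                | Branch | Commit                                   |
+-------------------+--------+------------------------------------------+
| PyTorch           | main   | 7478b7f1cac9686f00edf3db4667cf86d2421531 |
| Torchbench        | main   | d6015d42                                 |
| torchaudio        | main   | 2.2.0a0+ea437b3                          |
| torchtext         | main   | 0.16.0a0+b0ebddc                         |
| torchvision       | main   | 0.19.0a0+2c4665f                         |
| torchdata         | main   | 0.7.1a0+0790338                          |
| dynamo_benchmarks | main   | nightly                                  |
+-------------------+--------+------------------------------------------+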

HW information

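+------------------+----------------------------------------------------------------+
| Item             | Value                                                          |
+------------------+----------------------------------------------------------------+
| Manufacturer     | Amazon EC2                                                     |
| Product Name     | c6i.16xlarge                                                   |
| CPU Model        | Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz                  |
| Installed Memory | 128GB (1x128GB DDR4 3200 MT/s [Unknown])                       |
| OS               | Ubuntu 22.04.2 LTS                                             |
| Kernel           | 5.19.0-1022-aws                                                |
| Microcode        | 0xd000389                                                      |
| GCC              | gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0                      |
| GLIBC            | ldd (Ubuntu GLIBC 2.35-0ubuntu3.1) 2.35                        |
| Binutils         | GNU ld (GNU Binutils for Ubuntu) 2.38                          |
| Python           | Python 3.10.6                                                  |
| OpenSSL          | OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022) |
+------------------+----------------------------------------------------------------+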

Test command

export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
export OMP_NUM_THREADS=1

python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--core_list 0 --ncores_per_instance 1" --devices=cpu --dtypes=float32 --inference --compilers=inductor --batch_size=1 --threads 1 --extra-args="--timeout 9000"

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 86%, 68/79 | 100%, 46/46 | 75%, 45/60  |
+----------+------------+-------------+-------------+
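
Each pass-rate cell pairs a percentage with the raw counts. A minimal sketch of that formatting follows; the helper name is hypothetical, and rounding to the nearest whole percent is an assumption (the dashboard may truncate instead, which gives the same result for these three counts).

```python
def pass_rate(passed: int, total: int) -> str:
    """Format a pass-rate cell as '<percent>%, <passed>/<total>'."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    percent = round(100 * passed / total)  # assumption: nearest whole percent
    return f"{percent}%, {passed}/{total}"


if __name__ == "__main__":
    # Counts taken from the Passrate table above.
    for suite, passed, total in [
        ("torchbench", 68, 79),
        ("huggingface", 46, 46),
        ("timm_models", 45, 60),
    ]:
        print(suite, pass_rate(passed, total))
```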

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.58x    |    1.20x    |    1.53x    |
+----------+------------+-------------+-------------+
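
The geometric mean is computed only over models that survive the accuracy filter. The sketch below illustrates that aggregation under stated assumptions: the helper names and row layout are hypothetical, and treating `pass_due_to_skip` as passing is an assumption (it is consistent with the torchbench pass count of 68, which equals the 63 `pass` rows plus the 5 `pass_due_to_skip` rows in the Accuracy table of this comment).

```python
import math
from typing import Iterable, Tuple

# Assumption: these accuracy statuses count as passing.
PASSING = {"pass", "pass_due_to_skip"}


def geomean(values: Iterable[float]) -> float:
    """Geometric mean via the mean of logs, which avoids overflow in a long product."""
    vals = list(values)
    if not vals:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in vals):
        raise ValueError("geometric mean requires positive values")
    return math.exp(sum(math.log(v) for v in vals) / len(vals))


def geomean_speedup(rows: Iterable[Tuple[str, str, float]]) -> float:
    """rows are (model name, accuracy status, speedup); failing models are dropped."""
    kept = [s for _, status, s in rows if status in PASSING and s > 0]
    return geomean(kept)


if __name__ == "__main__":
    # Three rows copied from the torchbench tables in this comment.
    sample = [
        ("squeezenet1_1", "pass", 3.653324),
        ("resnet18", "fail_accuracy", 2.245272),  # dropped by the filter
        ("hf_T5", "pass", 0.708382),
    ]
    print(geomean_speedup(sample))
```

A geometric mean is the natural choice for ratios such as speedups, because a 2x gain and a 0.5x loss cancel to 1x instead of averaging to 1.25x.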

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   55.01    |    36.98    |    33.95    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.86x    |    0.81x    |    0.82x    |
+----------+------------+-------------+-------------+
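
One way to read these ratios is sketched below. The definition used here is an assumption, chosen because it is consistent with "higher is better": the ratio is taken as the native PyTorch peak memory divided by the inductor peak memory. The function names are hypothetical.

```python
def compression_ratio(eager_peak: float, compiled_peak: float) -> float:
    """Assumed definition: native peak memory over compiled peak memory."""
    if eager_peak <= 0 or compiled_peak <= 0:
        raise ValueError("peak memory values must be positive")
    return eager_peak / compiled_peak


def relative_compiled_peak(ratio: float) -> float:
    """Compiled peak memory as a multiple of the native peak, given a ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / ratio


if __name__ == "__main__":
    # Suite-level ratios from the table above.
    for suite, ratio in [("torchbench", 0.86), ("huggingface", 0.81), ("timm_models", 0.82)]:
        print(suite, relative_compiled_peak(ratio))
```

Under this reading, a ratio below 1 means inductor uses more peak memory than native PyTorch; 0.86x corresponds to roughly 1.16 times the native peak.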

torchbench suite with float32 precision

Performance speedup

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_isoneutral_mixing     |    1    | 55.820287 |
|     pyhpc_equation_of_state     |    1    | 30.238986 |
|          maml_omniglot          |    5    | 4.335877  |
|         basic_gnn_sage          |    1    | 3.667324  |
|          squeezenet1_1          |    1    | 3.653324  |
|     functorch_maml_omniglot     |    1    | 3.619388  |
|          basic_gnn_gin          |    1    |  3.52841  |
|          basic_gnn_gcn          |    1    | 2.875414  |
|           timm_nfnet            |    1    |  2.79494  |
|         opacus_cifar10          |    1    |  2.48319  |
|       shufflenet_v2_x1_0        |    1    | 2.426427  |
|      functorch_dp_cifar10       |    1    | 2.323941  |
|          lennard_jones          |    1    | 2.280519  |
|              dcgan              |    1    |  2.27375  |
|            resnet18             |    1    | 2.245272  |
|          mobilenet_v2           |    1    | 2.037251  |
|          timm_resnest           |    1    |  2.01906  |
|           mnasnet1_0            |    1    | 1.927042  |
|       mobilenet_v3_large        |    1    | 1.916351  |
|         phlippe_resnet          |    1    | 1.882416  |
|        phlippe_densenet         |    1    | 1.825706  |
|           densenet121           |    1    | 1.798117  |
|            resnet50             |    1    | 1.794432  |
|        timm_efficientnet        |    1    | 1.713946  |
|            resnet152            |    1    | 1.675215  |
|         LearningToPaint         |    1    | 1.616997  |
|           timm_vovnet           |    1    | 1.607973  |
|              llama              |    1    | 1.559562  |
|              dlrm               |    1    | 1.509305  |
|      doctr_reco_predictor       |    1    | 1.506169  |
|         resnext50_32x4d         |    1    | 1.483173  |
|           timm_regnet           |    1    | 1.476219  |
|              vgg16              |    1    |  1.44953  |
|        basic_gnn_edgecnn        |    1    | 1.397314  |
|             yolov3              |    1    | 1.374155  |
|             alexnet             |    1    | 1.368645  |
|       doctr_det_predictor       |    1    | 1.305044  |
|          BERT_pytorch           |    1    | 1.304395  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1.301693  |
|               drq               |    1    | 1.294374  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1.292469  |
|            hf_Albert            |    1    | 1.262654  |
|             hf_GPT2             |    1    | 1.243887  |
|              maml               |    1    | 1.242491  |
|          hf_GPT2_large          |    1    | 1.222126  |
|     timm_vision_transformer     |    1    | 1.218137  |
|            moondream            |    1    | 1.215656  |
|          fastNLP_Bert           |    1    |  1.21038  |
|         pytorch_stargan         |   16    | 1.192844  |
|        soft_actor_critic        |   256   | 1.192104  |
|  timm_vision_transformer_large  |    1    | 1.170383  |
|          hf_Bert_large          |    1    | 1.161493  |
|           hf_BigBird            |    1    | 1.153699  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 1.145789  |
|          hf_DistilBert          |    1    | 1.142903  |
|             hf_Bert             |    1    | 1.136063  |
|      torch_multimodal_clip      |    1    | 1.123367  |
|             hf_Bart             |    1    | 1.094403  |
|       speech_transformer        |    1    | 1.074102  |
|        hf_distil_whisper        |    1    | 1.070174  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1.069991  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1.058694  |
|    detectron2_fcos_r_50_fpn     |    1    | 1.046027  |
|          pytorch_unet           |    1    | 1.041338  |
|          hf_Longformer          |    1    | 1.011846  |
|             demucs              |    1    | 1.002813  |
|           tts_angular           |    1    | 0.999805  |
|     resnet50_quantized_qat      |    1    | 0.997163  |
|   mobilenet_v2_quantized_qat    |    1    | 0.985836  |
|     nvidia_deeprecommender      |    1    | 0.927341  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.889571  |
|       Background_Matting        |    1    | 0.834124  |
|           hf_Reformer           |    1    | 0.829662  |
|           hf_T5_large           |    1    | 0.785001  |
|              hf_T5              |    1    | 0.708382  |
|           hf_T5_base            |    1    | 0.597567  |
|        timm_efficientdet        |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+

Accuracy

+---------------------------------+---------+--------------------+
|              name               |   bs    |      inductor      |
+---------------------------------+---------+--------------------+
|           hf_T5_large           |    1    |  pass_due_to_skip  |
|       Background_Matting        |    1    |  pass_due_to_skip  |
|  timm_vision_transformer_large  |    1    |  pass_due_to_skip  |
|              maml               |    1    |  pass_due_to_skip  |
|          hf_GPT2_large          |    1    |  pass_due_to_skip  |
|         basic_gnn_sage          |    1    |        pass        |
|          hf_Longformer          |    1    |        pass        |
| detectron2_fasterrcnn_r_101_dc5 |    1    |        pass        |
| detectron2_fasterrcnn_r_101_c4  |    1    |        pass        |
|             demucs              |    1    |        pass        |
|              dcgan              |    1    |        pass        |
|        basic_gnn_edgecnn        |    1    |        pass        |
|          basic_gnn_gin          |    1    |        pass        |
|          basic_gnn_gcn          |    1    |        pass        |
|    detectron2_fcos_r_50_fpn     |    1    |        pass        |
|         LearningToPaint         |    1    |        pass        |
|       doctr_det_predictor       |    1    |        pass        |
|           hf_Reformer           |    1    |        pass        |
| detectron2_fasterrcnn_r_50_dc5  |    1    |        pass        |
|  detectron2_fasterrcnn_r_50_c4  |    1    |        pass        |
|              dlrm               |    1    |        pass        |
|           hf_T5_base            |    1    |        pass        |
|      doctr_reco_predictor       |    1    |        pass        |
|            hf_Albert            |    1    |        pass        |
|             hf_GPT2             |    1    |        pass        |
|          hf_DistilBert          |    1    |        pass        |
|           hf_BigBird            |    1    |        pass        |
|          hf_Bert_large          |    1    |        pass        |
|             hf_Bert             |    1    |        pass        |
|             hf_Bart             |    1    |        pass        |
|               drq               |    1    |        pass        |
|             yolov3              |    1    |        pass        |
|      functorch_dp_cifar10       |    1    |        pass        |
|          fastNLP_Bert           |    1    |        pass        |
|     functorch_maml_omniglot     |    1    |        pass        |
|          lennard_jones          |    1    |        pass        |
|             alexnet             |    1    |        pass        |
|          pytorch_unet           |    1    |        pass        |
|  pytorch_CycleGAN_and_pix2pix   |    1    |        pass        |
|          BERT_pytorch           |    1    |        pass        |
|            moondream            |    1    |        pass        |
|     pyhpc_equation_of_state     |    1    |        pass        |
|         phlippe_resnet          |    1    |        pass        |
|        phlippe_densenet         |    1    |        pass        |
|         opacus_cifar10          |    1    |        pass        |
|     nvidia_deeprecommender      |    1    |        pass        |
|     resnet50_quantized_qat      |    1    |        pass        |
|       mobilenet_v3_large        |    1    |        pass        |
|   mobilenet_v2_quantized_qat    |    1    |        pass        |
| pyhpc_turbulent_kinetic_energy  | 1048576 |        pass        |
|              llama              |    1    |        pass        |
|        hf_distil_whisper        |    1    |        pass        |
|     pyhpc_isoneutral_mixing     |    1    |        pass        |
|          maml_omniglot          |    5    |        pass        |
|              hf_T5              |    1    |        pass        |
|              vgg16              |    1    |        pass        |
|           tts_angular           |    1    |        pass        |
|      torch_multimodal_clip      |    1    |        pass        |
|           timm_vovnet           |    1    |        pass        |
|     timm_vision_transformer     |    1    |        pass        |
|          timm_resnest           |    1    |        pass        |
|           timm_nfnet            |    1    |        pass        |
|         pytorch_stargan         |   16    |        pass        |
|        timm_efficientnet        |    1    |        pass        |
|          squeezenet1_1          |    1    |        pass        |
|       speech_transformer        |    1    |        pass        |
|        soft_actor_critic        |   256   |        pass        |
|           timm_regnet           |    1    |        pass        |
|         DALLE2_pytorch          |    0    | model_fail_to_load |
|              moco               |    0    | model_fail_to_load |
|        timm_efficientdet        |    0    | model_fail_to_load |
| detectron2_fasterrcnn_r_101_fpn |    1    |    fail_to_run     |
|         vision_maskrcnn         |    1    |    fail_to_run     |
|           Super_SloMo           |    1    |    fail_to_run     |
| detectron2_fasterrcnn_r_50_fpn  |    1    |    fail_to_run     |
|         resnext50_32x4d         |    1    |   fail_accuracy    |
|            resnet152            |    1    |   fail_accuracy    |
|       shufflenet_v2_x1_0        |    1    |   fail_accuracy    |
|           mnasnet1_0            |    1    |   fail_accuracy    |
|          mobilenet_v2           |    1    |   fail_accuracy    |
|            resnet18             |    1    |   fail_accuracy    |
|            resnet50             |    1    |   fail_accuracy    |
|           densenet121           |    1    |   fail_accuracy    |
+---------------------------------+---------+--------------------+

Compilation latency (sec)

+---------------------------------+---------+------------+
|              name               |   bs    |  inductor  |
+---------------------------------+---------+------------+
|           hf_BigBird            |    1    | 474.817138 |
|    detectron2_fcos_r_50_fpn     |    1    | 410.918027 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 223.907108 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 218.526583 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 207.945328 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 203.274395 |
|              maml               |    1    | 143.47218  |
|           hf_T5_large           |    1    | 111.60395  |
|           hf_T5_base            |    1    | 105.313573 |
|       speech_transformer        |    1    | 89.849785  |
|          hf_Longformer          |    1    | 84.682062  |
|           hf_Reformer           |    1    | 80.705191  |
|          basic_gnn_gcn          |    1    | 61.457272  |
|           densenet121           |    1    | 56.401453  |
|          fastNLP_Bert           |    1    | 54.194115  |
|            resnet152            |    1    | 52.621578  |
|  timm_vision_transformer_large  |    1    | 49.086597  |
|       doctr_det_predictor       |    1    | 44.997015  |
|          hf_GPT2_large          |    1    | 42.904914  |
|          hf_Bert_large          |    1    | 42.385881  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 42.286518  |
|            moondream            |    1    | 41.674602  |
|        hf_distil_whisper        |    1    | 40.562294  |
|      torch_multimodal_clip      |    1    | 36.164042  |
|             demucs              |    1    |  34.69642  |
|           timm_regnet           |    1    |  33.09124  |
|           timm_nfnet            |    1    | 31.974003  |
|              hf_T5              |    1    |  31.39472  |
|             hf_Bart             |    1    | 30.383726  |
|       Background_Matting        |    1    | 29.137046  |
|             yolov3              |    1    | 28.920293  |
|        timm_efficientnet        |    1    | 27.868352  |
|          BERT_pytorch           |    1    | 27.831466  |
|             hf_Bert             |    1    |  26.30682  |
|        phlippe_densenet         |    1    | 25.835275  |
|      doctr_reco_predictor       |    1    | 25.613237  |
|       shufflenet_v2_x1_0        |    1    | 25.052377  |
|            hf_Albert            |    1    | 24.485577  |
|             hf_GPT2             |    1    | 23.922792  |
|       mobilenet_v3_large        |    1    | 23.208138  |
|              llama              |    1    | 22.996426  |
|     timm_vision_transformer     |    1    | 22.919127  |
|         opacus_cifar10          |    1    | 21.396318  |
|            resnet50             |    1    | 21.258583  |
|         resnext50_32x4d         |    1    | 21.257225  |
|           timm_vovnet           |    1    |  21.1211   |
|          timm_resnest           |    1    | 20.817592  |
|          mobilenet_v2           |    1    | 20.625634  |
|          hf_DistilBert          |    1    | 20.429439  |
|           mnasnet1_0            |    1    | 20.364231  |
|     pyhpc_isoneutral_mixing     |    1    | 20.236431  |
|      functorch_dp_cifar10       |    1    |  20.09656  |
|         pytorch_stargan         |   16    | 19.284807  |
|          pytorch_unet           |    1    | 19.029073  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 18.470013  |
|          squeezenet1_1          |    1    | 17.612915  |
|            resnet18             |    1    | 17.005829  |
|     pyhpc_equation_of_state     |    1    | 16.954571  |
|         LearningToPaint         |    1    | 16.862255  |
|              vgg16              |    1    | 16.779253  |
|         phlippe_resnet          |    1    | 16.690106  |
|             alexnet             |    1    | 16.378729  |
|               drq               |    1    | 15.816653  |
|     functorch_maml_omniglot     |    1    |  15.6225   |
|     nvidia_deeprecommender      |    1    | 15.404439  |
|          maml_omniglot          |    5    | 15.208903  |
|              dlrm               |    1    | 14.967527  |
|              dcgan              |    1    | 14.734053  |
|          basic_gnn_gin          |    1    | 14.434095  |
|        basic_gnn_edgecnn        |    1    |  14.39376  |
|         basic_gnn_sage          |    1    | 14.374733  |
|        soft_actor_critic        |   256   | 14.270062  |
|          lennard_jones          |    1    | 13.922013  |
|           tts_angular           |    1    | 13.719386  |
|   mobilenet_v2_quantized_qat    |    1    |  0.096427  |
|     resnet50_quantized_qat      |    1    |  0.070408  |
|        timm_efficientdet        |    0    |    0.0     |
|              moco               |    0    |    0.0     |
|         DALLE2_pytorch          |    0    |    0.0     |
+---------------------------------+---------+------------+

Peak Memory Compression Ratio

+---------------------------------+---------+----------+
|              name               |   bs    | inductor |
+---------------------------------+---------+----------+
|              dlrm               |    1    | 0.988087 |
|           hf_T5_base            |    1    | 0.988012 |
|             demucs              |    1    | 0.982915 |
|       Background_Matting        |    1    | 0.982589 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 0.982489 |
|          pytorch_unet           |    1    | 0.978305 |
|          hf_GPT2_large          |    1    | 0.974398 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 0.971876 |
|        basic_gnn_edgecnn        |    1    | 0.970343 |
|       doctr_det_predictor       |    1    | 0.969198 |
|    detectron2_fcos_r_50_fpn     |    1    | 0.960801 |
|     resnet50_quantized_qat      |    1    | 0.955998 |
|         pytorch_stargan         |   16    | 0.94792  |
|         LearningToPaint         |    1    | 0.947329 |
|           hf_BigBird            |    1    | 0.944839 |
|      doctr_reco_predictor       |    1    | 0.942308 |
|          basic_gnn_gin          |    1    | 0.939349 |
|          basic_gnn_gcn          |    1    | 0.935439 |
|         basic_gnn_sage          |    1    | 0.933895 |
|   mobilenet_v2_quantized_qat    |    1    | 0.929351 |
|              llama              |    1    | 0.919458 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 0.916078 |
|      torch_multimodal_clip      |    1    | 0.901362 |
|        hf_distil_whisper        |    1    | 0.892625 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 0.887236 |
|           tts_angular           |    1    | 0.88511  |
|        soft_actor_critic        |   256   | 0.883895 |
|         opacus_cifar10          |    1    | 0.883032 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 0.874861 |
|        timm_efficientnet        |    1    | 0.874659 |
|          mobilenet_v2           |    1    | 0.865052 |
|          lennard_jones          |    1    | 0.860252 |
|          squeezenet1_1          |    1    | 0.85904  |
|          maml_omniglot          |    5    | 0.858065 |
|           mnasnet1_0            |    1    | 0.857355 |
|     functorch_maml_omniglot     |    1    | 0.854077 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.850221 |
|          timm_resnest           |    1    | 0.849928 |
|          fastNLP_Bert           |    1    | 0.848323 |
|       mobilenet_v3_large        |    1    | 0.848267 |
|              dcgan              |    1    | 0.844758 |
|       shufflenet_v2_x1_0        |    1    | 0.837838 |
|         phlippe_resnet          |    1    | 0.836623 |
|     pyhpc_equation_of_state     |    1    | 0.831321 |
|       speech_transformer        |    1    | 0.828959 |
|        phlippe_densenet         |    1    | 0.816662 |
|           timm_nfnet            |    1    | 0.81374  |
|     pyhpc_isoneutral_mixing     |    1    | 0.810064 |
|         resnext50_32x4d         |    1    | 0.809996 |
|           hf_T5_large           |    1    | 0.80706  |
|          hf_Bert_large          |    1    | 0.806619 |
|          hf_Longformer          |    1    | 0.805117 |
|            hf_Albert            |    1    | 0.80199  |
|     timm_vision_transformer     |    1    | 0.799858 |
|             hf_Bert             |    1    |  0.7971  |
|            moondream            |    1    | 0.796495 |
|              maml               |    1    | 0.795072 |
|             yolov3              |    1    | 0.779459 |
|          BERT_pytorch           |    1    | 0.778482 |
|          hf_DistilBert          |    1    | 0.777286 |
|            resnet50             |    1    | 0.768313 |
|             hf_GPT2             |    1    | 0.765608 |
|           densenet121           |    1    | 0.760839 |
|           timm_regnet           |    1    | 0.760733 |
|            resnet18             |    1    | 0.75965  |
|               drq               |    1    | 0.759587 |
|           timm_vovnet           |    1    | 0.759314 |
|             hf_Bart             |    1    | 0.758476 |
|              hf_T5              |    1    | 0.750431 |
|      functorch_dp_cifar10       |    1    | 0.744025 |
|             alexnet             |    1    | 0.739499 |
|           hf_Reformer           |    1    | 0.735344 |
|  timm_vision_transformer_large  |    1    | 0.732447 |
|              vgg16              |    1    | 0.719587 |
|            resnet152            |    1    | 0.692737 |
|     nvidia_deeprecommender      |    1    | 0.67234  |
|        timm_efficientdet        |    0    |   0.0    |
|              moco               |    0    |   0.0    |
|         DALLE2_pytorch          |    0    |   0.0    |
+---------------------------------+---------+----------+

Absolute latency (ms)

+---------------------------------+---------+--------------+
|              name               |   bs    |   inductor   |
+---------------------------------+---------+--------------+
|           hf_T5_base            |    1    | 26082.619496 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 11586.067173 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 11012.578296 |
|          hf_GPT2_large          |    1    | 10064.259015 |
|           hf_T5_large           |    1    | 7430.559949  |
|            moondream            |    1    |  7318.9163   |
|        hf_distil_whisper        |    1    |  6910.04135  |
|       Background_Matting        |    1    | 6641.100058  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 5600.734388  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 4977.065811  |
|          pytorch_unet           |    1    | 4832.491929  |
|  timm_vision_transformer_large  |    1    |  2777.2461   |
|    detectron2_fcos_r_50_fpn     |    1    | 2519.961488  |
|             demucs              |    1    |  2195.1643   |
|         pytorch_stargan         |   16    | 2005.786071  |
|          hf_Bert_large          |    1    | 1756.097684  |
|       doctr_det_predictor       |    1    | 1719.273037  |
|           hf_BigBird            |    1    | 1460.016887  |
|      torch_multimodal_clip      |    1    | 1249.971402  |
|          hf_Longformer          |    1    | 1107.426574  |
|             hf_Bart             |    1    |  880.903553  |
|              hf_T5              |    1    |  754.857085  |
|             hf_Bert             |    1    |  685.781113  |
|       speech_transformer        |    1    |  668.264593  |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  629.296966  |
|            hf_Albert            |    1    |  573.893601  |
|          fastNLP_Bert           |    1    |  522.654741  |
|             yolov3              |    1    |   426.0903   |
|           hf_Reformer           |    1    |  416.161513  |
|          hf_DistilBert          |    1    |  412.734829  |
|             hf_GPT2             |    1    |  353.787797  |
|        basic_gnn_edgecnn        |    1    |  228.792233  |
| pyhpc_turbulent_kinetic_energy  | 1048576 |  208.835224  |
|              vgg16              |    1    |  187.972782  |
|           timm_regnet           |    1    |  147.692343  |
|          BERT_pytorch           |    1    |  139.400109  |
|            resnet152            |    1    |  134.666412  |
|           timm_nfnet            |    1    |  94.753547   |
|           timm_vovnet           |    1    |  78.441841   |
|              maml               |    1    |  73.517509   |
|     timm_vision_transformer     |    1    |  57.414568   |
|     nvidia_deeprecommender      |    1    |  57.225322   |
|         resnext50_32x4d         |    1    |  56.123287   |
|           tts_angular           |    1    |  50.235992   |
|            resnet50             |    1    |  49.669512   |
|           densenet121           |    1    |  41.189619   |
|          basic_gnn_gcn          |    1    |  34.708267   |
|          timm_resnest           |    1    |  32.736482   |
|      doctr_reco_predictor       |    1    |  22.907694   |
|             alexnet             |    1    |  21.586174   |
|            resnet18             |    1    |  21.506797   |
|              llama              |    1    |  20.142673   |
|     resnet50_quantized_qat      |    1    |  18.004192   |
|          basic_gnn_gin          |    1    |  16.128457   |
|         basic_gnn_sage          |    1    |   15.85555   |
|        timm_efficientnet        |    1    |   12.39239   |
|         LearningToPaint         |    1    |   9.474081   |
|           mnasnet1_0            |    1    |   6.994136   |
|   mobilenet_v2_quantized_qat    |    1    |   6.905879   |
|          mobilenet_v2           |    1    |   6.843453   |
|       mobilenet_v3_large        |    1    |   6.465834   |
|          squeezenet1_1          |    1    |   5.439731   |
|       shufflenet_v2_x1_0        |    1    |   4.694951   |
|        soft_actor_critic        |   256   |   2.964363   |
|        phlippe_densenet         |    1    |   2.66758    |
|      functorch_dp_cifar10       |    1    |   2.142809   |
|         opacus_cifar10          |    1    |   2.117279   |
|               drq               |    1    |   1.752531   |
|              dcgan              |    1    |   1.607999   |
|         phlippe_resnet          |    1    |   1.180747   |
|     functorch_maml_omniglot     |    1    |   0.798892   |
|              dlrm               |    1    |   0.513427   |
|          maml_omniglot          |    5    |   0.500927   |
|     pyhpc_isoneutral_mixing     |    1    |   0.048596   |
|     pyhpc_equation_of_state     |    1    |   0.035186   |
|          lennard_jones          |    1    |   0.029907   |
|              moco               |    0    |     0.0      |
|         DALLE2_pytorch          |    0    |     0.0      |
|        timm_efficientdet        |    0    |     0.0      |
+---------------------------------+---------+--------------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|          MobileBertForMaskedLM          | 1  | 2.113761 |
|     MobileBertForQuestionAnswering      | 1  | 1.739199 |
|            XLNetLMHeadModel             | 1  | 1.389649 |
|         Speech2Text2ForCausalLM         | 1  | 1.359206 |
| BlenderbotSmallForConditionalGeneration | 1  | 1.308402 |
|            YituTechConvBert             | 1  | 1.306986 |
|          DistilBertForMaskedLM          | 1  | 1.303598 |
|      GPT2ForSequenceClassification      | 1  | 1.302922 |
|     DistilBertForQuestionAnswering      | 1  | 1.29623  |
|       BlenderbotSmallForCausalLM        | 1  | 1.291971 |
|       DebertaForQuestionAnswering       | 1  | 1.290075 |
|           DebertaForMaskedLM            | 1  | 1.265099 |
|          BlenderbotForCausalLM          | 1  | 1.249268 |
|     M2M100ForConditionalGeneration      | 1  | 1.245572 |
|     PegasusForConditionalGeneration     | 1  | 1.244343 |
|             XGLMForCausalLM             | 1  | 1.238996 |
|           PegasusForCausalLM            | 1  | 1.237272 |
|       MT5ForConditionalGeneration       | 1  | 1.236065 |
|               GoogleFnet                | 1  | 1.22352  |
|       AlbertForQuestionAnswering        | 1  | 1.200045 |
|            AlbertForMaskedLM            | 1  | 1.198817 |
|           ElectraForCausalLM            | 1  | 1.191886 |
|               DistillGPT2               | 1  | 1.181152 |
|      DebertaV2ForQuestionAnswering      | 1  | 1.177641 |
|        BertForQuestionAnswering         | 1  | 1.174271 |
|    MegatronBertForQuestionAnswering     | 1  | 1.166972 |
|         MegatronBertForCausalLM         | 1  | 1.165924 |
|          DebertaV2ForMaskedLM           | 1  | 1.165714 |
|       RobertaForQuestionAnswering       | 1  | 1.164654 |
|           RobertaForCausalLM            | 1  | 1.164082 |
|           LayoutLMForMaskedLM           | 1  | 1.15935  |
|                CamemBert                | 1  | 1.154527 |
|            TrOCRForCausalLM             | 1  | 1.149615 |
|             BertForMaskedLM             | 1  | 1.149524 |
|    LayoutLMForSequenceClassification    | 1  | 1.148657 |
|       ElectraForQuestionAnswering       | 1  | 1.143847 |
|     PLBartForConditionalGeneration      | 1  | 1.088251 |
|      MBartForConditionalGeneration      | 1  | 1.065128 |
|             BartForCausalLM             | 1  | 1.057182 |
|      BartForConditionalGeneration       | 1  | 1.053038 |
|             OPTForCausalLM              | 1  | 1.029648 |
|            PLBartForCausalLM            | 1  | 1.025695 |
|            MBartForCausalLM             | 1  | 1.021161 |
|          AllenaiLongformerBase          | 1  | 0.969904 |
|       T5ForConditionalGeneration        | 1  | 0.613871 |
|                 T5Small                 | 1  | 0.612569 |
+-----------------------------------------+----+----------+

Accuracy

+-----------------------------------------+----+------------------+
|                  name                   | bs |     inductor     |
+-----------------------------------------+----+------------------+
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip |
|                CamemBert                | 1  |       pass       |
|       AlbertForQuestionAnswering        | 1  |       pass       |
|           DebertaForMaskedLM            | 1  |       pass       |
|          AllenaiLongformerBase          | 1  |       pass       |
|             BartForCausalLM             | 1  |       pass       |
|      BartForConditionalGeneration       | 1  |       pass       |
|             BertForMaskedLM             | 1  |       pass       |
|        BertForQuestionAnswering         | 1  |       pass       |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |
|       DebertaForQuestionAnswering       | 1  |       pass       |
|           LayoutLMForMaskedLM           | 1  |       pass       |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |
|          DistilBertForMaskedLM          | 1  |       pass       |
|     DistilBertForQuestionAnswering      | 1  |       pass       |
|               DistillGPT2               | 1  |       pass       |
|           ElectraForCausalLM            | 1  |       pass       |
|       ElectraForQuestionAnswering       | 1  |       pass       |
|      GPT2ForSequenceClassification      | 1  |       pass       |
|               GoogleFnet                | 1  |       pass       |
|    LayoutLMForSequenceClassification    | 1  |       pass       |
|            MBartForCausalLM             | 1  |       pass       |
|            XLNetLMHeadModel             | 1  |       pass       |
|             XGLMForCausalLM             | 1  |       pass       |
|            AlbertForMaskedLM            | 1  |       pass       |
|      MBartForConditionalGeneration      | 1  |       pass       |
|       MT5ForConditionalGeneration       | 1  |       pass       |
|         MegatronBertForCausalLM         | 1  |       pass       |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |
|          MobileBertForMaskedLM          | 1  |       pass       |
|     MobileBertForQuestionAnswering      | 1  |       pass       |
|             OPTForCausalLM              | 1  |       pass       |
|            PLBartForCausalLM            | 1  |       pass       |
|     PLBartForConditionalGeneration      | 1  |       pass       |
|           PegasusForCausalLM            | 1  |       pass       |
|     M2M100ForConditionalGeneration      | 1  |       pass       |
|     PegasusForConditionalGeneration     | 1  |       pass       |
|           RobertaForCausalLM            | 1  |       pass       |
|       RobertaForQuestionAnswering       | 1  |       pass       |
|         Speech2Text2ForCausalLM         | 1  |       pass       |
|       T5ForConditionalGeneration        | 1  |       pass       |
|                 T5Small                 | 1  |       pass       |
|            TrOCRForCausalLM             | 1  |       pass       |
|            YituTechConvBert             | 1  |       pass       |
+-----------------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------------+----+------------+
|                  name                   | bs |  inductor  |
+-----------------------------------------+----+------------+
|          MobileBertForMaskedLM          | 1  | 120.637573 |
|     MobileBertForQuestionAnswering      | 1  | 118.76251  |
|          AllenaiLongformerBase          | 1  | 91.788215  |
|      BartForConditionalGeneration       | 1  | 56.036044  |
|     PegasusForConditionalGeneration     | 1  | 49.314641  |
|     M2M100ForConditionalGeneration      | 1  | 49.296684  |
|      MBartForConditionalGeneration      | 1  | 49.014705  |
|          BlenderbotForCausalLM          | 1  | 47.444744  |
|            XLNetLMHeadModel             | 1  | 45.931954  |
|             XGLMForCausalLM             | 1  | 45.925445  |
|          DebertaV2ForMaskedLM           | 1  | 43.077398  |
|         MegatronBertForCausalLM         | 1  | 43.041783  |
|      DebertaV2ForQuestionAnswering      | 1  |   42.944   |
|    MegatronBertForQuestionAnswering     | 1  | 42.914665  |
|       MT5ForConditionalGeneration       | 1  | 41.775736  |
| BlenderbotSmallForConditionalGeneration | 1  | 38.040219  |
|       T5ForConditionalGeneration        | 1  | 35.940725  |
|                 T5Small                 | 1  | 35.859311  |
|            YituTechConvBert             | 1  | 35.345416  |
|     PLBartForConditionalGeneration      | 1  | 31.583288  |
|            TrOCRForCausalLM             | 1  | 28.471696  |
|            MBartForCausalLM             | 1  | 28.453004  |
|             OPTForCausalLM              | 1  | 28.187882  |
|           PegasusForCausalLM            | 1  | 28.042126  |
|           ElectraForCausalLM            | 1  | 26.830242  |
|           LayoutLMForMaskedLM           | 1  | 26.612282  |
|                CamemBert                | 1  | 26.611845  |
|           RobertaForCausalLM            | 1  | 26.590693  |
|       ElectraForQuestionAnswering       | 1  | 26.529374  |
|             BertForMaskedLM             | 1  | 26.487682  |
|       RobertaForQuestionAnswering       | 1  | 26.461942  |
|        BertForQuestionAnswering         | 1  | 26.395605  |
|           DebertaForMaskedLM            | 1  | 26.326379  |
|    LayoutLMForSequenceClassification    | 1  | 26.286944  |
|       DebertaForQuestionAnswering       | 1  | 26.253945  |
|             BartForCausalLM             | 1  | 25.998262  |
|      GPT2ForSequenceClassification      | 1  | 22.966255  |
|       BlenderbotSmallForCausalLM        | 1  | 22.916105  |
|               GoogleFnet                | 1  | 21.384194  |
|            PLBartForCausalLM            | 1  | 21.199052  |
|          DistilBertForMaskedLM          | 1  | 21.005306  |
|     DistilBertForQuestionAnswering      | 1  | 20.877519  |
|         Speech2Text2ForCausalLM         | 1  | 20.524137  |
|               DistillGPT2               | 1  | 19.065841  |
|            AlbertForMaskedLM            | 1  | 18.070015  |
|       AlbertForQuestionAnswering        | 1  |  17.96147  |
+-----------------------------------------+----+------------+

Peak Memory Compression Ratio

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|             OPTForCausalLM              | 1  | 0.986596 |
|      MBartForConditionalGeneration      | 1  | 0.976935 |
|      GPT2ForSequenceClassification      | 1  | 0.953585 |
|          AllenaiLongformerBase          | 1  | 0.948594 |
|            MBartForCausalLM             | 1  | 0.925939 |
|            XLNetLMHeadModel             | 1  | 0.910529 |
|                 T5Small                 | 1  | 0.905641 |
|     PLBartForConditionalGeneration      | 1  | 0.905153 |
|       T5ForConditionalGeneration        | 1  | 0.904876 |
|            PLBartForCausalLM            | 1  | 0.904146 |
|       DebertaForQuestionAnswering       | 1  | 0.872812 |
|               GoogleFnet                | 1  | 0.855694 |
|       RobertaForQuestionAnswering       | 1  | 0.848969 |
|    LayoutLMForSequenceClassification    | 1  | 0.842798 |
|        BertForQuestionAnswering         | 1  | 0.840454 |
|       ElectraForQuestionAnswering       | 1  | 0.838662 |
|      DebertaV2ForQuestionAnswering      | 1  | 0.835112 |
|    MegatronBertForQuestionAnswering     | 1  | 0.831803 |
|               DistillGPT2               | 1  | 0.829128 |
|           DebertaForMaskedLM            | 1  | 0.818905 |
|           LayoutLMForMaskedLM           | 1  | 0.813963 |
|         Speech2Text2ForCausalLM         | 1  | 0.813302 |
|           RobertaForCausalLM            | 1  | 0.811797 |
|            YituTechConvBert             | 1  | 0.810505 |
|         MegatronBertForCausalLM         | 1  | 0.810497 |
|                CamemBert                | 1  | 0.808209 |
|             BertForMaskedLM             | 1  | 0.806728 |
|           ElectraForCausalLM            | 1  | 0.803483 |
|     DistilBertForQuestionAnswering      | 1  | 0.799347 |
|          BlenderbotForCausalLM          | 1  | 0.797941 |
|          DebertaV2ForMaskedLM           | 1  | 0.797509 |
|             BartForCausalLM             | 1  | 0.794052 |
|       MT5ForConditionalGeneration       | 1  | 0.787932 |
|            TrOCRForCausalLM             | 1  | 0.787564 |
|      BartForConditionalGeneration       | 1  | 0.777262 |
|       BlenderbotSmallForCausalLM        | 1  | 0.762335 |
|           PegasusForCausalLM            | 1  | 0.750176 |
|          DistilBertForMaskedLM          | 1  | 0.746017 |
| BlenderbotSmallForConditionalGeneration | 1  | 0.737222 |
|     MobileBertForQuestionAnswering      | 1  | 0.730919 |
|     PegasusForConditionalGeneration     | 1  | 0.715297 |
|     M2M100ForConditionalGeneration      | 1  | 0.704293 |
|          MobileBertForMaskedLM          | 1  | 0.70216  |
|             XGLMForCausalLM             | 1  | 0.697414 |
|            AlbertForMaskedLM            | 1  | 0.448363 |
|       AlbertForQuestionAnswering        | 1  | 0.442835 |
+-----------------------------------------+----+----------+

Absolute latency (ms)

+-----------------------------------------+----+--------------+
|                  name                   | bs |   inductor   |
+-----------------------------------------+----+--------------+
|            AlbertForMaskedLM            | 1  | 12739.159356 |
|       AlbertForQuestionAnswering        | 1  | 12686.86742  |
|      MBartForConditionalGeneration      | 1  | 6126.369852  |
|      BartForConditionalGeneration       | 1  | 5651.755494  |
|             OPTForCausalLM              | 1  |  5192.40276  |
|          DebertaV2ForMaskedLM           | 1  | 5044.010449  |
|      DebertaV2ForQuestionAnswering      | 1  | 3949.125942  |
|            XLNetLMHeadModel             | 1  | 3086.994393  |
|            MBartForCausalLM             | 1  | 3026.824885  |
|          BlenderbotForCausalLM          | 1  | 2617.982155  |
|             BartForCausalLM             | 1  | 2544.098787  |
|                 T5Small                 | 1  | 2475.035664  |
|       T5ForConditionalGeneration        | 1  | 2471.552379  |
|          AllenaiLongformerBase          | 1  | 2392.070129  |
|     PLBartForConditionalGeneration      | 1  | 2164.335359  |
|         MegatronBertForCausalLM         | 1  | 2032.191011  |
|    MegatronBertForQuestionAnswering     | 1  | 1849.561606  |
|      GPT2ForSequenceClassification      | 1  | 1313.679037  |
|            PLBartForCausalLM            | 1  |  1216.17817  |
|             XGLMForCausalLM             | 1  |  827.171742  |
|           DebertaForMaskedLM            | 1  |  780.520589  |
|           RobertaForCausalLM            | 1  |  772.490843  |
|     M2M100ForConditionalGeneration      | 1  |  710.173044  |
|                CamemBert                | 1  |  687.869602  |
|            YituTechConvBert             | 1  |  683.084348  |
|             BertForMaskedLM             | 1  |  682.994662  |
|           LayoutLMForMaskedLM           | 1  |  680.976582  |
|     PegasusForConditionalGeneration     | 1  |  600.459222  |
|            TrOCRForCausalLM             | 1  |  582.60099   |
|       DebertaForQuestionAnswering       | 1  |  557.030655  |
|        BertForQuestionAnswering         | 1  |  541.774732  |
|    LayoutLMForSequenceClassification    | 1  |  541.72697   |
|       RobertaForQuestionAnswering       | 1  |  539.845299  |
|               DistillGPT2               | 1  |  500.624314  |
|               GoogleFnet                | 1  |  470.800512  |
|       MT5ForConditionalGeneration       | 1  |  299.374591  |
|           PegasusForCausalLM            | 1  |  298.244033  |
| BlenderbotSmallForConditionalGeneration | 1  |  142.647261  |
|           ElectraForCausalLM            | 1  |  133.013416  |
|          DistilBertForMaskedLM          | 1  |  98.874599   |
|       ElectraForQuestionAnswering       | 1  |  93.449023   |
|       BlenderbotSmallForCausalLM        | 1  |  83.457712   |
|     DistilBertForQuestionAnswering      | 1  |  63.138789   |
|          MobileBertForMaskedLM          | 1  |  62.826188   |
|     MobileBertForQuestionAnswering      | 1  |  36.582926   |
|         Speech2Text2ForCausalLM         | 1  |  18.133589   |
+-----------------------------------------+----+--------------+
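
The speedup and absolute latency tables can be combined to estimate the native PyTorch latency that is not listed here. The sketch below assumes speedup is defined as eager latency divided by inductor latency (the summary only says speedup is normalized against native PyTorch), and that both numbers come from the same run. The helper name `implied_eager_latency_ms` is hypothetical, not part of the benchmark scripts.

```python
def implied_eager_latency_ms(inductor_latency_ms, speedup):
    """Estimate eager latency, assuming speedup = eager_latency / inductor_latency."""
    if speedup <= 0:
        # Rows reported as 0.0 carry no usable measurement.
        raise ValueError("speedup must be positive")
    return inductor_latency_ms * speedup


# MobileBertForMaskedLM rows from the huggingface tables above.
inductor_ms = 62.826188
speedup = 2.113761

eager_ms = implied_eager_latency_ms(inductor_ms, speedup)
saved_ms = eager_ms - inductor_ms
print(f"implied eager latency: {eager_ms:.2f} ms, saved per iteration: {saved_ms:.2f} ms")
```

A speedup below 1.0 (for example T5Small) yields an implied eager latency lower than the inductor latency, i.e. a regression under the same assumption.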

timm_models suite with float32 precision

see more

Performance speedup

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          pnasnet5large          | 1  | 2.438052 |
|          inception_v3           | 1  | 2.275558 |
|       gluon_inception_v3        | 1  | 2.256363 |
|        adv_inception_v3         | 1  | 2.232624 |
|           dm_nfnet_f0           | 1  | 2.209935 |
|            nfnet_l0             | 1  | 2.189201 |
|         mobilenetv2_100         | 1  | 2.11511  |
|          ghostnet_100           | 1  | 2.071945 |
|          spnasnet_100           | 1  | 2.063889 |
|            levit_128            | 1  | 2.025375 |
|           mnasnet_100           | 1  | 2.017157 |
|           fbnetc_100            | 1  | 2.007317 |
|            lcnet_050            | 1  | 1.990666 |
|            hrnet_w18            | 1  | 1.984999 |
|            repvgg_a2            | 1  | 1.978506 |
|      mobilenetv3_large_100      | 1  | 1.953959 |
|           regnety_002           | 1  | 1.924051 |
|            fbnetv3_b            | 1  | 1.819121 |
|       tf_efficientnet_b0        | 1  | 1.800402 |
|           selecsls42b           | 1  | 1.781732 |
|           rexnet_100            | 1  | 1.773073 |
|             dla102              | 1  | 1.724229 |
|        ese_vovnet19b_dw         | 1  | 1.705927 |
|            tinynet_a            | 1  | 1.685647 |
|          botnet26t_256          | 1  | 1.684265 |
|       eca_botnext26ts_256       | 1  | 1.655748 |
|           resnest101e           | 1  | 1.638729 |
|          cspdarknet53           | 1  | 1.627017 |
|           res2next50            | 1  | 1.556932 |
|        eca_halonext26ts         | 1  | 1.542758 |
|         poolformer_m36          | 1  | 1.536719 |
|        res2net50_14w_8s         | 1  | 1.53515  |
|           volo_d1_224           | 1  | 1.506885 |
|        res2net101_26w_4s        | 1  |  1.4968  |
|           tf_mixnet_l           | 1  | 1.480195 |
|           mobilevit_s           | 1  | 1.473921 |
|         visformer_small         | 1  |  1.4256  |
|           convit_base           | 1  | 1.382318 |
|          gmixer_24_224          | 1  | 1.375919 |
|     swsl_resnext101_32x16d      | 1  | 1.349671 |
|        twins_pcpvt_base         | 1  | 1.33881  |
|            gernet_l             | 1  | 1.311857 |
|            mixnet_l             | 1  | 1.296564 |
|          resmlp_12_224          | 1  | 1.279289 |
|      beit_base_patch16_224      | 1  | 1.274332 |
|  swin_base_patch4_window7_224   | 1  | 1.263583 |
|        convmixer_768_32         | 1  | 1.250664 |
|         crossvit_9_240          | 1  | 1.204803 |
|          mixer_b16_224          | 1  | 1.204082 |
|      vit_base_patch16_224       | 1  | 1.198728 |
| deit_base_distilled_patch16_224 | 1  | 1.197105 |
|             dpn107              | 1  | 1.189172 |
|      xcit_large_24_p8_224       | 1  | 1.185698 |
|        tnt_s_patch16_224        | 1  | 1.183791 |
|          gmlp_s16_224           | 1  | 1.172889 |
|          jx_nest_base           | 1  | 1.168974 |
|            pit_b_224            | 1  | 1.161306 |
|          convnext_base          | 1  | 1.138153 |
|        sebotnet33ts_256         | 1  | 1.083967 |
|          cait_m36_384           | 1  | 0.990816 |
+---------------------------------+----+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 1  |     pass      |
|           dm_nfnet_f0           | 1  |     pass      |
|            mixnet_l             | 1  |     pass      |
|       eca_botnext26ts_256       | 1  |     pass      |
|         crossvit_9_240          | 1  |     pass      |
| deit_base_distilled_patch16_224 | 1  |     pass      |
|          convnext_base          | 1  |     pass      |
|           regnety_002           | 1  |     pass      |
|           convit_base           | 1  |     pass      |
|          gmlp_s16_224           | 1  |     pass      |
|          cait_m36_384           | 1  |     pass      |
|             dpn107              | 1  |     pass      |
|      vit_base_patch16_224       | 1  |     pass      |
|          cspdarknet53           | 1  |     pass      |
|        eca_halonext26ts         | 1  |     pass      |
|        ese_vovnet19b_dw         | 1  |     pass      |
|           fbnetc_100            | 1  |     pass      |
|            fbnetv3_b            | 1  |     pass      |
|            gernet_l             | 1  |     pass      |
|          botnet26t_256          | 1  |     pass      |
|       gluon_inception_v3        | 1  |     pass      |
|          gmixer_24_224          | 1  |     pass      |
|          jx_nest_base           | 1  |     pass      |
|        convmixer_768_32         | 1  |     pass      |
|        twins_pcpvt_base         | 1  |     pass      |
|         poolformer_m36          | 1  |     pass      |
|            lcnet_050            | 1  |     pass      |
|          mixer_b16_224          | 1  |     pass      |
|        tnt_s_patch16_224        | 1  |     pass      |
|           mnasnet_100           | 1  |     pass      |
|         mobilenetv2_100         | 1  |     pass      |
|      mobilenetv3_large_100      | 1  |     pass      |
|           mobilevit_s           | 1  |     pass      |
|            nfnet_l0             | 1  |     pass      |
|            pit_b_224            | 1  |     pass      |
|      beit_base_patch16_224      | 1  |     pass      |
|            repvgg_a2            | 1  |     pass      |
|  swin_base_patch4_window7_224   | 1  |     pass      |
|          inception_v3           | 1  |     pass      |
|           tf_mixnet_l           | 1  |     pass      |
|       tf_efficientnet_b0        | 1  |     pass      |
|            tinynet_a            | 1  |     pass      |
|          spnasnet_100           | 1  |     pass      |
|        sebotnet33ts_256         | 1  |     pass      |
|          resmlp_12_224          | 1  |     pass      |
|         coat_lite_mini          | 1  |  fail_to_run  |
|           resnest101e           | 1  | fail_accuracy |
|            levit_128            | 1  | fail_accuracy |
|          ghostnet_100           | 1  | fail_accuracy |
|          pnasnet5large          | 1  | fail_accuracy |
|        res2net101_26w_4s        | 1  | fail_accuracy |
|        res2net50_14w_8s         | 1  | fail_accuracy |
|             dla102              | 1  | fail_accuracy |
|     swsl_resnext101_32x16d      | 1  | fail_accuracy |
|           rexnet_100            | 1  | fail_accuracy |
|           selecsls42b           | 1  | fail_accuracy |
|           res2next50            | 1  | fail_accuracy |
|            hrnet_w18            | 1  | fail_accuracy |
|           volo_d1_224           | 1  | fail_accuracy |
|         visformer_small         | 1  | fail_accuracy |
|      xcit_large_24_p8_224       | 1  | fail_accuracy |
+---------------------------------+----+---------------+
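
The executive summary states that models failing accuracy checks are removed before performance is measured. A minimal sketch of that filtering, applied to tables in the format used here, might look like the following. The functions `parse_table` and `passing_geomean` are hypothetical helpers, not the dashboard's actual tooling; treating every status that starts with `pass` (including `pass_due_to_skip`) as passing is an assumption, and so is aggregating with a geometric mean.

```python
import math


def parse_table(text):
    """Parse a '+---+' bordered ASCII table into a list of dict rows."""
    rows, header = [], None
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines such as +-----+
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def passing_geomean(speedup_text, accuracy_text):
    """Geometric mean of speedups over models whose accuracy status passes."""
    status = {row["name"]: row["inductor"] for row in parse_table(accuracy_text)}
    kept = []
    for row in parse_table(speedup_text):
        value = float(row["inductor"])
        # Assumption: statuses beginning with "pass" count as passing.
        # Non-positive speedups are skipped because they carry no measurement.
        if status.get(row["name"], "").startswith("pass") and value > 0:
            kept.append(value)
    if not kept:
        return None
    return math.exp(sum(math.log(v) for v in kept) / len(kept))


# Three rows copied from the timm_models tables above.
SPEEDUP = """
+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          pnasnet5large          | 1  | 2.438052 |
|          inception_v3           | 1  | 2.275558 |
|          cait_m36_384           | 1  | 0.990816 |
+---------------------------------+----+----------+
"""

ACCURACY = """
+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|          inception_v3           | 1  |     pass      |
|          cait_m36_384           | 1  |     pass      |
|          pnasnet5large          | 1  | fail_accuracy |
+---------------------------------+----+---------------+
"""

result = passing_geomean(SPEEDUP, ACCURACY)
print(f"geomean speedup over passing models: {result:.4f}")
```

In this sample, `pnasnet5large` is excluded because its status is `fail_accuracy`, so only the two passing rows contribute to the mean.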

Compilation latency (sec)

+---------------------------------+----+------------+
|              name               | bs |  inductor  |
+---------------------------------+----+------------+
|          pnasnet5large          | 1  | 481.080356 |
|            hrnet_w18            | 1  | 263.532436 |
|        res2net101_26w_4s        | 1  | 78.797473  |
|           tf_mixnet_l           | 1  |  71.71143  |
|           resnest101e           | 1  |  70.91221  |
|            mixnet_l             | 1  | 67.717154  |
|          cait_m36_384           | 1  | 66.846219  |
|        res2net50_14w_8s         | 1  | 63.832244  |
|        twins_pcpvt_base         | 1  | 60.190997  |
|      xcit_large_24_p8_224       | 1  | 57.836276  |
|         poolformer_m36          | 1  | 56.014873  |
|  swin_base_patch4_window7_224   | 1  | 55.575525  |
|             dpn107              | 1  |  50.3724   |
|        tnt_s_patch16_224        | 1  | 45.273011  |
|          jx_nest_base           | 1  | 44.455207  |
|           mobilevit_s           | 1  | 42.878684  |
|            fbnetv3_b            | 1  | 42.173063  |
|          convnext_base          | 1  | 37.581832  |
|             dla102              | 1  | 36.693063  |
|          inception_v3           | 1  |  36.3823   |
|       gluon_inception_v3        | 1  | 36.297091  |
|        adv_inception_v3         | 1  | 36.288287  |
|          gmlp_s16_224           | 1  | 34.733482  |
|           volo_d1_224           | 1  | 34.480174  |
|          ghostnet_100           | 1  | 34.239749  |
|          gmixer_24_224          | 1  |  33.96075  |
|           res2next50            | 1  | 33.849254  |
|         crossvit_9_240          | 1  | 32.591554  |
|            tinynet_a            | 1  | 32.068317  |
|           dm_nfnet_f0           | 1  | 31.832617  |
|     swsl_resnext101_32x16d      | 1  | 31.674611  |
|        eca_halonext26ts         | 1  |  31.46007  |
|        sebotnet33ts_256         | 1  | 31.344934  |
|            levit_128            | 1  |  30.70978  |
|           rexnet_100            | 1  | 30.104411  |
|            nfnet_l0             | 1  | 29.701741  |
|        convmixer_768_32         | 1  | 29.668172  |
|       tf_efficientnet_b0        | 1  | 29.225028  |
|           convit_base           | 1  | 26.830989  |
|         visformer_small         | 1  | 26.030532  |
|       eca_botnext26ts_256       | 1  | 26.016179  |
|          cspdarknet53           | 1  | 25.176406  |
|           regnety_002           | 1  | 25.142063  |
|          botnet26t_256          | 1  | 25.081531  |
|            pit_b_224            | 1  | 25.078729  |
|      beit_base_patch16_224      | 1  | 24.149872  |
|      mobilenetv3_large_100      | 1  | 23.485607  |
| deit_base_distilled_patch16_224 | 1  | 23.226784  |
|      vit_base_patch16_224       | 1  | 23.137652  |
|           fbnetc_100            | 1  | 23.105855  |
|          spnasnet_100           | 1  | 22.868156  |
|          mixer_b16_224          | 1  | 22.861108  |
|            gernet_l             | 1  |  22.20291  |
|            repvgg_a2            | 1  | 22.093726  |
|         mobilenetv2_100         | 1  |  21.05914  |
|        ese_vovnet19b_dw         | 1  | 20.997999  |
|           mnasnet_100           | 1  | 20.867016  |
|          resmlp_12_224          | 1  | 19.954159  |
|           selecsls42b           | 1  | 19.326914  |
|            lcnet_050            | 1  | 18.207485  |
+---------------------------------+----+------------+

Peak Memory Compression Ratio

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          cait_m36_384           | 1  | 0.946115 |
|          pnasnet5large          | 1  | 0.929968 |
|        convmixer_768_32         | 1  | 0.918556 |
|            nfnet_l0             | 1  | 0.901468 |
|      xcit_large_24_p8_224       | 1  | 0.891273 |
|        ese_vovnet19b_dw         | 1  | 0.888476 |
|         mobilenetv2_100         | 1  | 0.882122 |
|            fbnetv3_b            | 1  | 0.876763 |
|           mnasnet_100           | 1  | 0.875886 |
|          spnasnet_100           | 1  | 0.874775 |
|       tf_efficientnet_b0        | 1  | 0.87464  |
|       eca_botnext26ts_256       | 1  | 0.870136 |
|           fbnetc_100            | 1  | 0.87012  |
|      mobilenetv3_large_100      | 1  | 0.867337 |
|            tinynet_a            | 1  | 0.86232  |
|           rexnet_100            | 1  | 0.861234 |
|            lcnet_050            | 1  | 0.859239 |
|         poolformer_m36          | 1  | 0.856467 |
|           dm_nfnet_f0           | 1  |  0.8541  |
|        eca_halonext26ts         | 1  | 0.853679 |
|           mobilevit_s           | 1  | 0.850403 |
|            mixnet_l             | 1  | 0.849892 |
|           tf_mixnet_l           | 1  | 0.848648 |
|          ghostnet_100           | 1  | 0.845709 |
|          botnet26t_256          | 1  | 0.841384 |
|           regnety_002           | 1  | 0.840866 |
|          resmlp_12_224          | 1  | 0.827353 |
|         visformer_small         | 1  | 0.818663 |
|           res2next50            | 1  | 0.817565 |
|            levit_128            | 1  | 0.808906 |
|          convnext_base          | 1  | 0.801304 |
|             dpn107              | 1  | 0.80088  |
|            hrnet_w18            | 1  | 0.799367 |
|        res2net50_14w_8s         | 1  | 0.796388 |
|          cspdarknet53           | 1  | 0.793692 |
|          gmlp_s16_224           | 1  | 0.792911 |
|          gmixer_24_224          | 1  | 0.790863 |
|           volo_d1_224           | 1  | 0.788059 |
|        tnt_s_patch16_224        | 1  | 0.784123 |
|         crossvit_9_240          | 1  | 0.783633 |
|        sebotnet33ts_256         | 1  | 0.782643 |
|           convit_base           | 1  | 0.780892 |
|          mixer_b16_224          | 1  | 0.77749  |
|        twins_pcpvt_base         | 1  | 0.776439 |
|             dla102              | 1  | 0.775679 |
|           resnest101e           | 1  | 0.775398 |
|          jx_nest_base           | 1  | 0.773385 |
|      beit_base_patch16_224      | 1  | 0.771681 |
|          inception_v3           | 1  | 0.762577 |
| deit_base_distilled_patch16_224 | 1  | 0.762493 |
|       gluon_inception_v3        | 1  | 0.762417 |
|        adv_inception_v3         | 1  | 0.761883 |
|      vit_base_patch16_224       | 1  | 0.75995  |
|            pit_b_224            | 1  | 0.754177 |
|           selecsls42b           | 1  | 0.740136 |
|  swin_base_patch4_window7_224   | 1  | 0.739625 |
|        res2net101_26w_4s        | 1  | 0.739578 |
|            gernet_l             | 1  | 0.735928 |
|            repvgg_a2            | 1  | 0.690867 |
|     swsl_resnext101_32x16d      | 1  | 0.640417 |
+---------------------------------+----+----------+

Absolute latency (ms)

+---------------------------------+----+-------------+
|              name               | bs |  inductor   |
+---------------------------------+----+-------------+
|          cait_m36_384           | 1  | 3536.465483 |
|      xcit_large_24_p8_224       | 1  | 1527.263702 |
|     swsl_resnext101_32x16d      | 1  |  436.36051  |
|          pnasnet5large          | 1  | 359.187724  |
|          convnext_base          | 1  | 306.278519  |
|             dpn107              | 1  | 259.554752  |
|        convmixer_768_32         | 1  | 241.856383  |
|          jx_nest_base           | 1  |  231.6869   |
|      beit_base_patch16_224      | 1  | 197.153992  |
| deit_base_distilled_patch16_224 | 1  | 196.011707  |
|  swin_base_patch4_window7_224   | 1  | 195.402108  |
|      vit_base_patch16_224       | 1  | 194.726434  |
|           convit_base           | 1  | 192.943444  |
|            pit_b_224            | 1  | 166.857284  |
|           resnest101e           | 1  |  161.70259  |
|           dm_nfnet_f0           | 1  | 156.317087  |
|          mixer_b16_224          | 1  | 139.384025  |
|         poolformer_m36          | 1  | 135.333998  |
|        res2net101_26w_4s        | 1  | 109.353868  |
|        twins_pcpvt_base         | 1  | 103.900511  |
|           volo_d1_224           | 1  |  93.381564  |
|            nfnet_l0             | 1  |  92.79485   |
|        tnt_s_patch16_224        | 1  |  91.576739  |
|             dla102              | 1  |  88.104097  |
|        sebotnet33ts_256         | 1  |  83.989304  |
|            hrnet_w18            | 1  |  82.260379  |
|          cspdarknet53           | 1  |  81.855299  |
|          inception_v3           | 1  |  71.281625  |
|        adv_inception_v3         | 1  |  71.130921  |
|       gluon_inception_v3        | 1  |  71.130828  |
|          gmlp_s16_224           | 1  |  67.709512  |
|         visformer_small         | 1  |  65.026948  |
|            repvgg_a2            | 1  |  62.606376  |
|        res2net50_14w_8s         | 1  |  61.751943  |
|          gmixer_24_224          | 1  |  61.621095  |
|           res2next50            | 1  |  57.494934  |
|            gernet_l             | 1  |  55.510764  |
|          botnet26t_256          | 1  |  43.688905  |
|           selecsls42b           | 1  |  43.549109  |
|        eca_halonext26ts         | 1  |  43.424461  |
|           mobilevit_s           | 1  |  40.948357  |
|       eca_botnext26ts_256       | 1  |  39.79775   |
|          resmlp_12_224          | 1  |  35.235414  |
|         crossvit_9_240          | 1  |  32.328929  |
|        ese_vovnet19b_dw         | 1  |  30.440251  |
|            mixnet_l             | 1  |  27.486721  |
|           tf_mixnet_l           | 1  |  26.774111  |
|            fbnetv3_b            | 1  |  14.245074  |
|           rexnet_100            | 1  |  12.473455  |
|       tf_efficientnet_b0        | 1  |  12.44522   |
|            tinynet_a            | 1  |  11.105364  |
|           fbnetc_100            | 1  |  8.455067   |
|            levit_128            | 1  |  8.164185   |
|          ghostnet_100           | 1  |  7.699794   |
|          spnasnet_100           | 1  |  7.589269   |
|           mnasnet_100           | 1  |  6.983407   |
|         mobilenetv2_100         | 1  |  6.821537   |
|      mobilenetv3_large_100      | 1  |  6.536221   |
|           regnety_002           | 1  |  5.698562   |
|            lcnet_050            | 1  |  2.228523   |
+---------------------------------+----+-------------+

@zxd1997066
Contributor

[dynamic] Performance Dashboard for float32 precision -- Single-core Single-thread (2024-04-28 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward and backward passes for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. Experimental setup does not have optimizer.

SW information:

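+-------------------+--------+------------------------------------------+
| SW                | Branch | Commit                                   |
+-------------------+--------+------------------------------------------+
| Pytorch           | main   | 7478b7f1cac9686f00edf3db4667cf86d2421531 |
| Torchbench        | main   | d6015d42                                 |
| torchaudio        | main   | 2.2.0a0+ea437b3                          |
| torchtext         | main   | 0.16.0a0+b0ebddc                         |
| torchvision       | main   | 0.19.0a0+2c4665f                         |
| torchdata         | main   | 0.7.1a0+0790338                          |
| dynamo_benchmarks | main   | nightly                                  |
+-------------------+--------+------------------------------------------+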

HW information

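+------------------+----------------------------------------------------------------+
| Item             | Value                                                          |
+------------------+----------------------------------------------------------------+
| Manufacturer     | Amazon EC2                                                     |
| Product Name     | c6i.16xlarge                                                   |
| CPU Model        | Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz                  |
| Installed Memory | 128GB (1x128GB DDR4 3200 MT/s [Unknown])                       |
| OS               | Ubuntu 22.04.2 LTS                                             |
| Kernel           | 5.19.0-1022-aws                                                |
| Microcode        | 0xd000389                                                      |
| GCC              | gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0                      |
| GLIBC            | ldd (Ubuntu GLIBC 2.35-0ubuntu3.1) 2.35                        |
| Binutils         | GNU ld (GNU Binutils for Ubuntu) 2.38                          |
| Python           | Python 3.10.6                                                  |
| OpenSSL          | OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022) |
+------------------+----------------------------------------------------------------+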

Test command

export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
export OMP_NUM_THREADS=1

python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--core_list 0 --ncores_per_instance 1" --devices=cpu --dtypes=float32 --inference --compilers=inductor --batch_size=1 --threads 1 --extra-args="--timeout 9000"

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 86%, 68/79 | 100%, 46/46 | 75%, 45/60  |
+----------+------------+-------------+-------------+
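
Each Passrate cell pairs a percentage with the raw counts behind it. The following is a minimal Python sketch of that formatting, assuming the percentage is rounded to the nearest integer (this comment does not say whether the dashboard rounds or truncates). `passrate` is a hypothetical helper for illustration, not a function from the benchmark scripts.

```python
def passrate(passed: int, total: int) -> str:
    """Format a pass rate as 'NN%, passed/total', as in the Passrate table.

    Assumption: the percentage is rounded to the nearest integer.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return f"{passed / total:.0%}, {passed}/{total}"


# Counts taken from the Passrate table above.
for suite, (passed, total) in {
    "torchbench": (68, 79),
    "huggingface": (46, 46),
    "timm_models": (45, 60),
}.items():
    print(f"{suite}: {passrate(passed, total)}")
```

With the counts from the table, this yields the same three cells that the table reports.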

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.52x    |    1.20x    |    1.47x    |
+----------+------------+-------------+-------------+
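
As a rough illustration of how a per-suite geometric mean could be derived from the per-model speedups, here is a sketch in Python. It assumes that only models with a passing accuracy status contribute, in line with the statement above that models failing accuracy checks are removed. The helper name `geomean_speedup` and the `PASSING` set are hypothetical, and the four rows are only a small subset of the torchbench tables in this comment, so the result will not reproduce the 1.52x figure.

```python
from statistics import geometric_mean

# Assumption: these two statuses count as passing.
PASSING = {"pass", "pass_due_to_skip"}


def geomean_speedup(speedups: dict, accuracy: dict) -> float:
    """Geometric mean of speedups over models with a passing accuracy status."""
    kept = [
        value
        for name, value in speedups.items()
        if accuracy.get(name) in PASSING and value > 0
    ]
    return geometric_mean(kept)


# A small subset of rows from the torchbench tables in this comment.
speedups = {
    "squeezenet1_1": 3.364795,
    "hf_T5_base": 0.5994,
    "resnet18": 2.184526,
    "moco": 0.0,
}
accuracy = {
    "squeezenet1_1": "pass",
    "hf_T5_base": "pass",
    "resnet18": "fail_accuracy",
    "moco": "model_fail_to_load",
}
print(f"{geomean_speedup(speedups, accuracy):.2f}x")
```

A geometric mean is the usual choice for averaging ratios, because a 2x speedup and a 0.5x slowdown then cancel out to 1x.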

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   26.03    |    26.80    |    34.48    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.86x    |    0.81x    |    0.82x    |
+----------+------------+-------------+-------------+
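
All three ratios above are below 1.0x. Assuming the ratio is the native PyTorch peak memory divided by the inductor peak memory (which is consistent with "higher is better" but is not spelled out in this comment), a ratio below 1 means the compiled run peaked higher than the native run. The Python sketch below converts a ratio into that relative overhead; both function names are hypothetical.

```python
def compression_ratio(eager_peak: float, compiled_peak: float) -> float:
    """Assumed definition: native peak memory divided by compiled peak memory."""
    return eager_peak / compiled_peak


def compiled_overhead(ratio: float) -> float:
    """Extra peak memory of the compiled run, relative to the native run."""
    return 1.0 / ratio - 1.0


# Suite-level ratios from the table above.
for suite, ratio in {"torchbench": 0.86, "huggingface": 0.81, "timm_models": 0.82}.items():
    print(f"{suite}: compiled peak is {compiled_overhead(ratio):+.1%} vs native")
```

Under this assumption, a ratio of 0.5x would mean the compiled run used twice the peak memory of the native run.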

torchbench suite with float32 precision

Performance speedup

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_isoneutral_mixing     |    1    | 66.578558 |
|     pyhpc_equation_of_state     |    1    | 24.44511  |
|         basic_gnn_sage          |    1    | 3.605212  |
|          basic_gnn_gin          |    1    | 3.502769  |
|          squeezenet1_1          |    1    | 3.364795  |
|     functorch_maml_omniglot     |    1    | 3.350476  |
|          basic_gnn_gcn          |    1    | 2.858916  |
|           timm_nfnet            |    1    |  2.74568  |
|          maml_omniglot          |    5    | 2.744087  |
|         opacus_cifar10          |    1    |  2.22631  |
|            resnet18             |    1    | 2.184526  |
|              dcgan              |    1    | 2.162255  |
|       shufflenet_v2_x1_0        |    1    | 2.069298  |
|      functorch_dp_cifar10       |    1    | 1.968973  |
|          timm_resnest           |    1    | 1.964811  |
|          mobilenet_v2           |    1    | 1.870158  |
|          lennard_jones          |    1    | 1.823808  |
|            resnet50             |    1    | 1.773259  |
|           mnasnet1_0            |    1    | 1.749609  |
|       mobilenet_v3_large        |    1    | 1.704665  |
|         phlippe_resnet          |    1    | 1.659934  |
|            resnet152            |    1    | 1.657347  |
|           densenet121           |    1    | 1.640478  |
|        timm_efficientnet        |    1    | 1.607404  |
|           timm_vovnet           |    1    |  1.59196  |
|         LearningToPaint         |    1    | 1.569541  |
|      doctr_reco_predictor       |    1    | 1.490622  |
|         resnext50_32x4d         |    1    |  1.48451  |
|           timm_regnet           |    1    | 1.479792  |
|        phlippe_densenet         |    1    | 1.474649  |
|              vgg16              |    1    | 1.446705  |
|        basic_gnn_edgecnn        |    1    | 1.399028  |
|             yolov3              |    1    | 1.381054  |
|              llama              |    1    | 1.380079  |
|             alexnet             |    1    | 1.351448  |
|          BERT_pytorch           |    1    | 1.290307  |
|       doctr_det_predictor       |    1    | 1.281657  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1.280469  |
|            hf_Albert            |    1    | 1.280168  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1.277267  |
|              maml               |    1    | 1.250357  |
|             hf_GPT2             |    1    | 1.246574  |
|          hf_GPT2_large          |    1    | 1.222088  |
|            moondream            |    1    | 1.217769  |
|               drq               |    1    | 1.213718  |
|          fastNLP_Bert           |    1    | 1.201658  |
|         pytorch_stargan         |   16    | 1.192251  |
|     timm_vision_transformer     |    1    |  1.19041  |
|          hf_Bert_large          |    1    | 1.171629  |
|  timm_vision_transformer_large  |    1    |  1.16956  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 1.155335  |
|             hf_Bert             |    1    | 1.153443  |
|           hf_BigBird            |    1    | 1.153437  |
|          hf_DistilBert          |    1    | 1.141896  |
|              dlrm               |    1    |  1.13228  |
|      torch_multimodal_clip      |    1    | 1.131613  |
|             hf_Bart             |    1    | 1.092221  |
|       speech_transformer        |    1    |  1.0751   |
|        hf_distil_whisper        |    1    |  1.06849  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1.060874  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1.052111  |
|        soft_actor_critic        |   256   |  1.04379  |
|    detectron2_fcos_r_50_fpn     |    1    | 1.035718  |
|          pytorch_unet           |    1    | 1.029416  |
|          hf_Longformer          |    1    | 1.018274  |
|             demucs              |    1    | 1.001616  |
|           tts_angular           |    1    | 0.999715  |
|     resnet50_quantized_qat      |    1    |  0.99558  |
|   mobilenet_v2_quantized_qat    |    1    | 0.977655  |
|     nvidia_deeprecommender      |    1    | 0.973097  |
| pyhpc_turbulent_kinetic_energy  | 1048576 |  0.90854  |
|           hf_Reformer           |    1    | 0.850767  |
|       Background_Matting        |    1    | 0.800127  |
|           hf_T5_large           |    1    | 0.788488  |
|              hf_T5              |    1    | 0.707373  |
|           hf_T5_base            |    1    |  0.5994   |
|        timm_efficientdet        |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
|              moco               |    0    |    0.0    |
+---------------------------------+---------+-----------+
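
The speedup column can be combined with the Absolute latency (ms) table further down to estimate the native PyTorch latency. This is a sketch under the assumption that speedup is the native latency divided by the inductor latency, which matches "normalizing against the performance of native pytorch" but is not stated explicitly here. `implied_eager_latency_ms` is a hypothetical helper.

```python
def implied_eager_latency_ms(speedup: float, inductor_latency_ms: float) -> float:
    """Native latency implied by a speedup and a compiled latency.

    Assumption: speedup = native latency / inductor latency.
    """
    return speedup * inductor_latency_ms


# hf_T5_base: speedup 0.5994 in this table, 26159.758976 ms in the
# Absolute latency (ms) table of the same comment.
eager_ms = implied_eager_latency_ms(0.5994, 26159.758976)
print(f"hf_T5_base: native ~ {eager_ms:.0f} ms, inductor ~ 26160 ms")
```

A speedup below 1.0, as for hf_T5_base, means the inductor run was slower than the native run.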

Accuracy

+---------------------------------+---------+--------------------+
|              name               |   bs    |      inductor      |
+---------------------------------+---------+--------------------+
|           hf_T5_large           |    1    |  pass_due_to_skip  |
|       Background_Matting        |    1    |  pass_due_to_skip  |
|  timm_vision_transformer_large  |    1    |  pass_due_to_skip  |
|              maml               |    1    |  pass_due_to_skip  |
|          hf_GPT2_large          |    1    |  pass_due_to_skip  |
|         basic_gnn_sage          |    1    |        pass        |
|          hf_Longformer          |    1    |        pass        |
| detectron2_fasterrcnn_r_101_dc5 |    1    |        pass        |
| detectron2_fasterrcnn_r_101_c4  |    1    |        pass        |
|             demucs              |    1    |        pass        |
|              dcgan              |    1    |        pass        |
|        basic_gnn_edgecnn        |    1    |        pass        |
|          basic_gnn_gin          |    1    |        pass        |
|          basic_gnn_gcn          |    1    |        pass        |
|    detectron2_fcos_r_50_fpn     |    1    |        pass        |
|         LearningToPaint         |    1    |        pass        |
|       doctr_det_predictor       |    1    |        pass        |
|           hf_Reformer           |    1    |        pass        |
| detectron2_fasterrcnn_r_50_dc5  |    1    |        pass        |
|  detectron2_fasterrcnn_r_50_c4  |    1    |        pass        |
|              dlrm               |    1    |        pass        |
|           hf_T5_base            |    1    |        pass        |
|      doctr_reco_predictor       |    1    |        pass        |
|            hf_Albert            |    1    |        pass        |
|             hf_GPT2             |    1    |        pass        |
|          hf_DistilBert          |    1    |        pass        |
|           hf_BigBird            |    1    |        pass        |
|          hf_Bert_large          |    1    |        pass        |
|             hf_Bert             |    1    |        pass        |
|             hf_Bart             |    1    |        pass        |
|               drq               |    1    |        pass        |
|             yolov3              |    1    |        pass        |
|      functorch_dp_cifar10       |    1    |        pass        |
|          fastNLP_Bert           |    1    |        pass        |
|     functorch_maml_omniglot     |    1    |        pass        |
|          lennard_jones          |    1    |        pass        |
|             alexnet             |    1    |        pass        |
|          pytorch_unet           |    1    |        pass        |
|  pytorch_CycleGAN_and_pix2pix   |    1    |        pass        |
|          BERT_pytorch           |    1    |        pass        |
|            moondream            |    1    |        pass        |
|     pyhpc_equation_of_state     |    1    |        pass        |
|         phlippe_resnet          |    1    |        pass        |
|        phlippe_densenet         |    1    |        pass        |
|         opacus_cifar10          |    1    |        pass        |
|     nvidia_deeprecommender      |    1    |        pass        |
|     resnet50_quantized_qat      |    1    |        pass        |
|       mobilenet_v3_large        |    1    |        pass        |
|   mobilenet_v2_quantized_qat    |    1    |        pass        |
| pyhpc_turbulent_kinetic_energy  | 1048576 |        pass        |
|              llama              |    1    |        pass        |
|        hf_distil_whisper        |    1    |        pass        |
|     pyhpc_isoneutral_mixing     |    1    |        pass        |
|          maml_omniglot          |    5    |        pass        |
|              hf_T5              |    1    |        pass        |
|              vgg16              |    1    |        pass        |
|           tts_angular           |    1    |        pass        |
|      torch_multimodal_clip      |    1    |        pass        |
|           timm_vovnet           |    1    |        pass        |
|     timm_vision_transformer     |    1    |        pass        |
|          timm_resnest           |    1    |        pass        |
|           timm_nfnet            |    1    |        pass        |
|         pytorch_stargan         |   16    |        pass        |
|        timm_efficientnet        |    1    |        pass        |
|          squeezenet1_1          |    1    |        pass        |
|       speech_transformer        |    1    |        pass        |
|        soft_actor_critic        |   256   |        pass        |
|           timm_regnet           |    1    |        pass        |
|         DALLE2_pytorch          |    0    | model_fail_to_load |
|              moco               |    0    | model_fail_to_load |
|        timm_efficientdet        |    0    | model_fail_to_load |
| detectron2_fasterrcnn_r_101_fpn |    1    |    fail_to_run     |
|         vision_maskrcnn         |    1    |    fail_to_run     |
|           Super_SloMo           |    1    |    fail_to_run     |
| detectron2_fasterrcnn_r_50_fpn  |    1    |    fail_to_run     |
|         resnext50_32x4d         |    1    |   fail_accuracy    |
|            resnet152            |    1    |   fail_accuracy    |
|       shufflenet_v2_x1_0        |    1    |   fail_accuracy    |
|           mnasnet1_0            |    1    |   fail_accuracy    |
|          mobilenet_v2           |    1    |   fail_accuracy    |
|            resnet18             |    1    |   fail_accuracy    |
|            resnet50             |    1    |   fail_accuracy    |
|           densenet121           |    1    |   fail_accuracy    |
+---------------------------------+---------+--------------------+
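
The statuses in the Accuracy table can be tallied to cross-check the Passrate numerator. The Python sketch below uses counts read off the table above and assumes that both `pass` and `pass_due_to_skip` count as passing. The table lists 83 rows in total while the Passrate denominator is 79; this comment does not say which rows are excluded, so the sketch does not try to reproduce the denominator.

```python
from collections import Counter

# Row counts per status, read off the torchbench Accuracy table above.
status_counts = Counter(
    {
        "pass": 63,
        "pass_due_to_skip": 5,
        "model_fail_to_load": 3,
        "fail_to_run": 4,
        "fail_accuracy": 8,
    }
)

# Assumption: both of these statuses count towards the pass rate.
passing = status_counts["pass"] + status_counts["pass_due_to_skip"]
total_rows = sum(status_counts.values())

print(f"passing models: {passing}")
print(f"rows in table:  {total_rows}")
```

The passing count of 68 matches the torchbench numerator in the Passrate table.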

Compilation latency (sec)

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|           hf_T5_base            |    1    | 98.488572 |
|           densenet121           |    1    | 90.608981 |
|           hf_BigBird            |    1    | 75.891181 |
|           hf_T5_large           |    1    | 71.324146 |
|    detectron2_fcos_r_50_fpn     |    1    | 69.741467 |
|              maml               |    1    | 54.00281  |
|          hf_Longformer          |    1    | 51.87178  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 50.384091 |
|           timm_nfnet            |    1    | 48.479256 |
|           hf_Reformer           |    1    | 47.806647 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 43.62218  |
|        phlippe_densenet         |    1    | 42.909936 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 39.507231 |
|       speech_transformer        |    1    | 39.364594 |
|  timm_vision_transformer_large  |    1    | 36.212759 |
|      torch_multimodal_clip      |    1    | 35.798775 |
|             demucs              |    1    | 35.302845 |
|        timm_efficientnet        |    1    | 33.657216 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 33.470723 |
|              hf_T5              |    1    | 32.678603 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 32.006556 |
|       Background_Matting        |    1    | 30.286565 |
|             yolov3              |    1    | 28.977719 |
|         opacus_cifar10          |    1    | 28.687588 |
|        hf_distil_whisper        |    1    | 28.176453 |
|            moondream            |    1    | 27.802987 |
|      functorch_dp_cifar10       |    1    | 27.322731 |
|          hf_GPT2_large          |    1    | 26.86001  |
|          hf_Bert_large          |    1    | 25.717667 |
|          timm_resnest           |    1    | 25.619557 |
|       doctr_det_predictor       |    1    | 24.81239  |
|       shufflenet_v2_x1_0        |    1    | 24.157739 |
|       mobilenet_v3_large        |    1    | 24.115207 |
|              llama              |    1    | 23.96101  |
|          BERT_pytorch           |    1    | 22.902152 |
|          fastNLP_Bert           |    1    | 22.437674 |
|             hf_Bart             |    1    | 22.374856 |
|           timm_vovnet           |    1    | 21.704618 |
|           timm_regnet           |    1    | 21.338272 |
|     timm_vision_transformer     |    1    | 21.30465  |
|          pytorch_unet           |    1    | 21.105211 |
|            hf_Albert            |    1    | 20.095025 |
|             hf_GPT2             |    1    | 19.542618 |
|         pytorch_stargan         |   16    | 19.314272 |
|          hf_DistilBert          |    1    | 19.120913 |
|             hf_Bert             |    1    | 18.879341 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 17.981347 |
|          squeezenet1_1          |    1    | 16.057929 |
|            resnet152            |    1    | 15.636389 |
|              vgg16              |    1    | 15.052367 |
|      doctr_reco_predictor       |    1    | 13.716837 |
|             alexnet             |    1    | 12.663836 |
|     pyhpc_isoneutral_mixing     |    1    | 12.151191 |
|         resnext50_32x4d         |    1    | 11.53296  |
|            resnet50             |    1    | 11.502254 |
|               drq               |    1    | 11.08981  |
|              dlrm               |    1    | 10.70794  |
|            resnet18             |    1    | 10.167229 |
|          mobilenet_v2           |    1    | 10.041572 |
|           mnasnet1_0            |    1    | 9.875949  |
|     functorch_maml_omniglot     |    1    | 9.829261  |
|          basic_gnn_gcn          |    1    | 9.524705  |
|     nvidia_deeprecommender      |    1    | 9.322024  |
|          basic_gnn_gin          |    1    |  9.11122  |
|        basic_gnn_edgecnn        |    1    | 9.045797  |
|         LearningToPaint         |    1    | 9.004123  |
|     pyhpc_equation_of_state     |    1    | 8.834462  |
|          maml_omniglot          |    5    | 8.776954  |
|         phlippe_resnet          |    1    | 8.721136  |
|        soft_actor_critic        |   256   | 8.151572  |
|         basic_gnn_sage          |    1    | 7.991637  |
|          lennard_jones          |    1    |  5.86106  |
|              dcgan              |    1    | 5.793419  |
|           tts_angular           |    1    | 5.611985  |
|   mobilenet_v2_quantized_qat    |    1    |  0.09695  |
|     resnet50_quantized_qat      |    1    | 0.070084  |
|        timm_efficientdet        |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+

Peak Memory Compression Ratio

+---------------------------------+---------+----------+
|              name               |   bs    | inductor |
+---------------------------------+---------+----------+
|              dlrm               |    1    | 0.98827  |
|           hf_T5_base            |    1    | 0.987838 |
|       Background_Matting        |    1    | 0.982721 |
|             demucs              |    1    | 0.981996 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 0.981716 |
|          pytorch_unet           |    1    | 0.978263 |
|          hf_GPT2_large          |    1    | 0.977997 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 0.972757 |
|        basic_gnn_edgecnn        |    1    | 0.970317 |
|       doctr_det_predictor       |    1    | 0.969481 |
|    detectron2_fcos_r_50_fpn     |    1    | 0.962137 |
|     resnet50_quantized_qat      |    1    | 0.955934 |
|         LearningToPaint         |    1    | 0.94917  |
|           hf_BigBird            |    1    | 0.946353 |
|         pytorch_stargan         |   16    | 0.943504 |
|      doctr_reco_predictor       |    1    | 0.942369 |
|          basic_gnn_gin          |    1    | 0.938452 |
|         basic_gnn_sage          |    1    | 0.936504 |
|          basic_gnn_gcn          |    1    | 0.934673 |
|   mobilenet_v2_quantized_qat    |    1    | 0.929187 |
|      torch_multimodal_clip      |    1    | 0.92486  |
|              llama              |    1    | 0.919588 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 0.918225 |
|        hf_distil_whisper        |    1    | 0.916807 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 0.887483 |
|        soft_actor_critic        |   256   | 0.887324 |
|           tts_angular           |    1    | 0.88674  |
|         opacus_cifar10          |    1    | 0.88225  |
|        timm_efficientnet        |    1    | 0.875998 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 0.875494 |
|          mobilenet_v2           |    1    | 0.86712  |
|          lennard_jones          |    1    | 0.86092  |
|          maml_omniglot          |    5    | 0.859459 |
|          squeezenet1_1          |    1    | 0.857416 |
|          fastNLP_Bert           |    1    | 0.85431  |
|     functorch_maml_omniglot     |    1    | 0.853921 |
|           mnasnet1_0            |    1    | 0.853526 |
|          timm_resnest           |    1    |  0.8514  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.850291 |
|              dcgan              |    1    | 0.844914 |
|       mobilenet_v3_large        |    1    | 0.843488 |
|       shufflenet_v2_x1_0        |    1    | 0.835739 |
|         phlippe_resnet          |    1    | 0.835697 |
|     pyhpc_equation_of_state     |    1    | 0.832006 |
|       speech_transformer        |    1    | 0.826526 |
|            moondream            |    1    | 0.825697 |
|        phlippe_densenet         |    1    | 0.817503 |
|           timm_nfnet            |    1    |  0.8141  |
|          hf_Bert_large          |    1    | 0.813577 |
|         resnext50_32x4d         |    1    | 0.812163 |
|     pyhpc_isoneutral_mixing     |    1    | 0.808527 |
|           hf_T5_large           |    1    | 0.807316 |
|          hf_Longformer          |    1    | 0.805227 |
|     timm_vision_transformer     |    1    | 0.803459 |
|             hf_Bert             |    1    | 0.803058 |
|            hf_Albert            |    1    | 0.801699 |
|              maml               |    1    | 0.791307 |
|             hf_Bart             |    1    | 0.783842 |
|             yolov3              |    1    | 0.779309 |
|          BERT_pytorch           |    1    | 0.777464 |
|          hf_DistilBert          |    1    | 0.776603 |
|            resnet50             |    1    | 0.769095 |
|             hf_GPT2             |    1    | 0.767272 |
|            resnet18             |    1    | 0.761679 |
|               drq               |    1    | 0.761095 |
|           timm_regnet           |    1    | 0.760944 |
|           timm_vovnet           |    1    | 0.757546 |
|           densenet121           |    1    | 0.754084 |
|              hf_T5              |    1    | 0.747274 |
|           hf_Reformer           |    1    | 0.744715 |
|      functorch_dp_cifar10       |    1    | 0.742977 |
|             alexnet             |    1    | 0.739389 |
|  timm_vision_transformer_large  |    1    | 0.732722 |
|              vgg16              |    1    | 0.720223 |
|            resnet152            |    1    | 0.692317 |
|     nvidia_deeprecommender      |    1    | 0.672463 |
|              moco               |    0    |   0.0    |
|        timm_efficientdet        |    0    |   0.0    |
|         DALLE2_pytorch          |    0    |   0.0    |
+---------------------------------+---------+----------+

Absolute latency (ms)

+---------------------------------+---------+--------------+
|              name               |   bs    |   inductor   |
+---------------------------------+---------+--------------+
|           hf_T5_base            |    1    | 26159.758976 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 11789.795061 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 11193.992077 |
|          hf_GPT2_large          |    1    | 10114.446553 |
|           hf_T5_large           |    1    | 7505.376771  |
|            moondream            |    1    | 7347.809799  |
|       Background_Matting        |    1    | 6969.580137  |
|        hf_distil_whisper        |    1    | 6950.017115  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 5668.222352  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 5038.187497  |
|          pytorch_unet           |    1    | 4914.373383  |
|  timm_vision_transformer_large  |    1    | 2782.485488  |
|    detectron2_fcos_r_50_fpn     |    1    | 2544.569042  |
|             demucs              |    1    | 2370.141233  |
|         pytorch_stargan         |   16    | 2016.539393  |
|          hf_Bert_large          |    1    | 1766.017707  |
|       doctr_det_predictor       |    1    | 1740.920514  |
|           hf_BigBird            |    1    | 1475.844665  |
|      torch_multimodal_clip      |    1    | 1245.866869  |
|          hf_Longformer          |    1    | 1113.629884  |
|             hf_Bart             |    1    |  892.146302  |
|              hf_T5              |    1    |  765.805865  |
|             hf_Bert             |    1    |  678.757872  |
|       speech_transformer        |    1    |  675.194685  |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  628.421833  |
|            hf_Albert            |    1    |  571.796644  |
|          fastNLP_Bert           |    1    |  523.728554  |
|             yolov3              |    1    |  429.153096  |
|          hf_DistilBert          |    1    |  415.423921  |
|           hf_Reformer           |    1    |  415.339092  |
|             hf_GPT2             |    1    |  355.282928  |
|        basic_gnn_edgecnn        |    1    |  232.906443  |
| pyhpc_turbulent_kinetic_energy  | 1048576 |  210.142058  |
|              vgg16              |    1    |  189.922965  |
|           timm_regnet           |    1    |  150.133828  |
|          BERT_pytorch           |    1    |  141.948793  |
|            resnet152            |    1    |  137.890312  |
|           timm_nfnet            |    1    |  97.330605   |
|           timm_vovnet           |    1    |  80.353371   |
|              maml               |    1    |  73.935719   |
|     timm_vision_transformer     |    1    |  59.169992   |
|     nvidia_deeprecommender      |    1    |  58.193684   |
|         resnext50_32x4d         |    1    |  57.202517   |
|           tts_angular           |    1    |  54.050406   |
|            resnet50             |    1    |  51.206882   |
|           densenet121           |    1    |  45.804359   |
|          basic_gnn_gcn          |    1    |  35.553027   |
|          timm_resnest           |    1    |  33.642584   |
|      doctr_reco_predictor       |    1    |  23.567953   |
|              llama              |    1    |  22.832175   |
|            resnet18             |    1    |  22.275414   |
|             alexnet             |    1    |  22.215602   |
|     resnet50_quantized_qat      |    1    |  18.194176   |
|          basic_gnn_gin          |    1    |  16.580051   |
|         basic_gnn_sage          |    1    |  16.489286   |
|        timm_efficientnet        |    1    |   13.70689   |
|         LearningToPaint         |    1    |   9.84373    |
|           mnasnet1_0            |    1    |   7.852941   |
|          mobilenet_v2           |    1    |   7.677192   |
|       mobilenet_v3_large        |    1    |   7.450555   |
|   mobilenet_v2_quantized_qat    |    1    |   7.023294   |
|          squeezenet1_1          |    1    |   5.937285   |
|       shufflenet_v2_x1_0        |    1    |   5.537944   |
|        phlippe_densenet         |    1    |   3.477018   |
|        soft_actor_critic        |   256   |   3.399957   |
|      functorch_dp_cifar10       |    1    |    2.558     |
|         opacus_cifar10          |    1    |   2.412909   |
|               drq               |    1    |   1.905732   |
|              dcgan              |    1    |   1.719469   |
|         phlippe_resnet          |    1    |   1.355113   |
|     functorch_maml_omniglot     |    1    |   0.859651   |
|          maml_omniglot          |    5    |   0.798828   |
|              dlrm               |    1    |   0.705422   |
|     pyhpc_equation_of_state     |    1    |   0.044329   |
|     pyhpc_isoneutral_mixing     |    1    |   0.04178    |
|          lennard_jones          |    1    |   0.038856   |
|              moco               |    0    |     0.0      |
|         DALLE2_pytorch          |    0    |     0.0      |
|        timm_efficientdet        |    0    |     0.0      |
+---------------------------------+---------+--------------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|          MobileBertForMaskedLM          | 1  | 1.974522 |
|     MobileBertForQuestionAnswering      | 1  | 1.571427 |
|            XLNetLMHeadModel             | 1  | 1.38139  |
|            YituTechConvBert             | 1  | 1.318624 |
|         Speech2Text2ForCausalLM         | 1  | 1.309479 |
|      GPT2ForSequenceClassification      | 1  | 1.308067 |
|          DistilBertForMaskedLM          | 1  | 1.298637 |
|       DebertaForQuestionAnswering       | 1  | 1.296045 |
|     DistilBertForQuestionAnswering      | 1  | 1.28692  |
| BlenderbotSmallForConditionalGeneration | 1  | 1.286615 |
|       BlenderbotSmallForCausalLM        | 1  | 1.26968  |
|           DebertaForMaskedLM            | 1  | 1.264681 |
|       MT5ForConditionalGeneration       | 1  | 1.252565 |
|          BlenderbotForCausalLM          | 1  | 1.247126 |
|     M2M100ForConditionalGeneration      | 1  | 1.236927 |
|     PegasusForConditionalGeneration     | 1  | 1.235291 |
|           PegasusForCausalLM            | 1  | 1.233967 |
|             XGLMForCausalLM             | 1  | 1.232036 |
|               GoogleFnet                | 1  | 1.231025 |
|            AlbertForMaskedLM            | 1  | 1.204472 |
|       AlbertForQuestionAnswering        | 1  | 1.201661 |
|               DistillGPT2               | 1  | 1.18669  |
|      DebertaV2ForQuestionAnswering      | 1  | 1.176705 |
|    MegatronBertForQuestionAnswering     | 1  | 1.170148 |
|             BertForMaskedLM             | 1  | 1.168826 |
|          DebertaV2ForMaskedLM           | 1  | 1.167853 |
|                CamemBert                | 1  | 1.167374 |
|        BertForQuestionAnswering         | 1  | 1.166408 |
|           ElectraForCausalLM            | 1  | 1.162757 |
|           LayoutLMForMaskedLM           | 1  | 1.160314 |
|       RobertaForQuestionAnswering       | 1  | 1.15808  |
|         MegatronBertForCausalLM         | 1  | 1.157557 |
|           RobertaForCausalLM            | 1  | 1.156802 |
|            TrOCRForCausalLM             | 1  | 1.147197 |
|       ElectraForQuestionAnswering       | 1  | 1.14362  |
|    LayoutLMForSequenceClassification    | 1  | 1.141428 |
|     PLBartForConditionalGeneration      | 1  | 1.082203 |
|             BartForCausalLM             | 1  | 1.067349 |
|      MBartForConditionalGeneration      | 1  | 1.062829 |
|      BartForConditionalGeneration       | 1  | 1.041643 |
|             OPTForCausalLM              | 1  | 1.032334 |
|            PLBartForCausalLM            | 1  | 1.02583  |
|            MBartForCausalLM             | 1  | 1.014814 |
|          AllenaiLongformerBase          | 1  | 0.971755 |
|       T5ForConditionalGeneration        | 1  | 0.618784 |
|                 T5Small                 | 1  | 0.617959 |
+-----------------------------------------+----+----------+

Accuracy

+-----------------------------------------+----+------------------+
|                  name                   | bs |     inductor     |
+-----------------------------------------+----+------------------+
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip |
|                CamemBert                | 1  |       pass       |
|       AlbertForQuestionAnswering        | 1  |       pass       |
|           DebertaForMaskedLM            | 1  |       pass       |
|          AllenaiLongformerBase          | 1  |       pass       |
|             BartForCausalLM             | 1  |       pass       |
|      BartForConditionalGeneration       | 1  |       pass       |
|             BertForMaskedLM             | 1  |       pass       |
|        BertForQuestionAnswering         | 1  |       pass       |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |
|       DebertaForQuestionAnswering       | 1  |       pass       |
|           LayoutLMForMaskedLM           | 1  |       pass       |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |
|          DistilBertForMaskedLM          | 1  |       pass       |
|     DistilBertForQuestionAnswering      | 1  |       pass       |
|               DistillGPT2               | 1  |       pass       |
|           ElectraForCausalLM            | 1  |       pass       |
|       ElectraForQuestionAnswering       | 1  |       pass       |
|      GPT2ForSequenceClassification      | 1  |       pass       |
|               GoogleFnet                | 1  |       pass       |
|    LayoutLMForSequenceClassification    | 1  |       pass       |
|            MBartForCausalLM             | 1  |       pass       |
|            XLNetLMHeadModel             | 1  |       pass       |
|             XGLMForCausalLM             | 1  |       pass       |
|            AlbertForMaskedLM            | 1  |       pass       |
|      MBartForConditionalGeneration      | 1  |       pass       |
|       MT5ForConditionalGeneration       | 1  |       pass       |
|         MegatronBertForCausalLM         | 1  |       pass       |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |
|          MobileBertForMaskedLM          | 1  |       pass       |
|     MobileBertForQuestionAnswering      | 1  |       pass       |
|             OPTForCausalLM              | 1  |       pass       |
|            PLBartForCausalLM            | 1  |       pass       |
|     PLBartForConditionalGeneration      | 1  |       pass       |
|           PegasusForCausalLM            | 1  |       pass       |
|     M2M100ForConditionalGeneration      | 1  |       pass       |
|     PegasusForConditionalGeneration     | 1  |       pass       |
|           RobertaForCausalLM            | 1  |       pass       |
|       RobertaForQuestionAnswering       | 1  |       pass       |
|         Speech2Text2ForCausalLM         | 1  |       pass       |
|       T5ForConditionalGeneration        | 1  |       pass       |
|                 T5Small                 | 1  |       pass       |
|            TrOCRForCausalLM             | 1  |       pass       |
|            YituTechConvBert             | 1  |       pass       |
+-----------------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------------+----+-----------+
|                  name                   | bs | inductor  |
+-----------------------------------------+----+-----------+
|          AllenaiLongformerBase          | 1  | 58.889381 |
|          MobileBertForMaskedLM          | 1  | 44.286654 |
|     MobileBertForQuestionAnswering      | 1  | 43.110002 |
|     PegasusForConditionalGeneration     | 1  | 40.874944 |
|     M2M100ForConditionalGeneration      | 1  | 40.382197 |
|      MBartForConditionalGeneration      | 1  | 39.567776 |
|       T5ForConditionalGeneration        | 1  | 38.427248 |
|                 T5Small                 | 1  | 38.41361  |
|          BlenderbotForCausalLM          | 1  | 37.946199 |
|       MT5ForConditionalGeneration       | 1  | 36.122393 |
|            XLNetLMHeadModel             | 1  | 34.826871 |
|             XGLMForCausalLM             | 1  | 34.060976 |
|          DebertaV2ForMaskedLM           | 1  | 31.407515 |
| BlenderbotSmallForConditionalGeneration | 1  | 30.961931 |
|      DebertaV2ForQuestionAnswering      | 1  | 30.142001 |
|      BartForConditionalGeneration       | 1  | 29.441834 |
|         MegatronBertForCausalLM         | 1  | 28.719422 |
|            YituTechConvBert             | 1  | 28.648008 |
|     PLBartForConditionalGeneration      | 1  | 28.028613 |
|    MegatronBertForQuestionAnswering     | 1  | 27.533371 |
|             OPTForCausalLM              | 1  | 25.322076 |
|           PegasusForCausalLM            | 1  | 25.00271  |
|            MBartForCausalLM             | 1  | 24.528785 |
|            TrOCRForCausalLM             | 1  | 22.619233 |
|           DebertaForMaskedLM            | 1  | 22.43868  |
|       DebertaForQuestionAnswering       | 1  | 21.19917  |
|           RobertaForCausalLM            | 1  | 20.431969 |
|           ElectraForCausalLM            | 1  | 20.398139 |
|                CamemBert                | 1  | 20.310345 |
|          DistilBertForMaskedLM          | 1  | 19.654119 |
|       BlenderbotSmallForCausalLM        | 1  | 19.651891 |
|      GPT2ForSequenceClassification      | 1  | 19.407871 |
|           LayoutLMForMaskedLM           | 1  | 19.14808  |
|             BertForMaskedLM             | 1  | 19.071122 |
|       RobertaForQuestionAnswering       | 1  | 19.039675 |
|       ElectraForQuestionAnswering       | 1  | 19.033597 |
|         Speech2Text2ForCausalLM         | 1  | 18.859625 |
|    LayoutLMForSequenceClassification    | 1  | 18.839736 |
|            PLBartForCausalLM            | 1  | 18.808039 |
|             BartForCausalLM             | 1  | 18.559413 |
|     DistilBertForQuestionAnswering      | 1  | 18.435074 |
|        BertForQuestionAnswering         | 1  | 17.91447  |
|               GoogleFnet                | 1  | 17.575886 |
|               DistillGPT2               | 1  | 17.383993 |
|            AlbertForMaskedLM            | 1  | 14.315515 |
|       AlbertForQuestionAnswering        | 1  | 12.936108 |
+-----------------------------------------+----+-----------+

Peak Memory Compression Ratio

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|             OPTForCausalLM              | 1  | 0.986426 |
|      MBartForConditionalGeneration      | 1  | 0.97675  |
|      GPT2ForSequenceClassification      | 1  | 0.955202 |
|          AllenaiLongformerBase          | 1  | 0.947944 |
|            MBartForCausalLM             | 1  | 0.92592  |
|            XLNetLMHeadModel             | 1  | 0.910342 |
|     PLBartForConditionalGeneration      | 1  | 0.909544 |
|                 T5Small                 | 1  | 0.904987 |
|       T5ForConditionalGeneration        | 1  | 0.904554 |
|            PLBartForCausalLM            | 1  | 0.903461 |
|       DebertaForQuestionAnswering       | 1  | 0.872294 |
|      BartForConditionalGeneration       | 1  | 0.865015 |
|       RobertaForQuestionAnswering       | 1  | 0.856327 |
|               GoogleFnet                | 1  | 0.85594  |
|    LayoutLMForSequenceClassification    | 1  | 0.848169 |
|        BertForQuestionAnswering         | 1  | 0.847391 |
|       ElectraForQuestionAnswering       | 1  | 0.841259 |
|    MegatronBertForQuestionAnswering     | 1  | 0.840374 |
|      DebertaV2ForQuestionAnswering      | 1  | 0.835064 |
|               DistillGPT2               | 1  | 0.828222 |
|         MegatronBertForCausalLM         | 1  | 0.819091 |
|           LayoutLMForMaskedLM           | 1  | 0.817887 |
|           DebertaForMaskedLM            | 1  | 0.81569  |
|           RobertaForCausalLM            | 1  | 0.814076 |
|             BertForMaskedLM             | 1  | 0.813224 |
|         Speech2Text2ForCausalLM         | 1  | 0.812796 |
|                CamemBert                | 1  | 0.81258  |
|            YituTechConvBert             | 1  | 0.811937 |
|           ElectraForCausalLM            | 1  | 0.803701 |
|             BartForCausalLM             | 1  | 0.801414 |
|     DistilBertForQuestionAnswering      | 1  | 0.800064 |
|          BlenderbotForCausalLM          | 1  | 0.798207 |
|          DebertaV2ForMaskedLM           | 1  | 0.797119 |
|       MT5ForConditionalGeneration       | 1  | 0.787399 |
|            TrOCRForCausalLM             | 1  | 0.786985 |
|       BlenderbotSmallForCausalLM        | 1  | 0.763321 |
|           PegasusForCausalLM            | 1  |   0.75   |
|          DistilBertForMaskedLM          | 1  | 0.745758 |
| BlenderbotSmallForConditionalGeneration | 1  | 0.738312 |
|     MobileBertForQuestionAnswering      | 1  | 0.732818 |
|     PegasusForConditionalGeneration     | 1  | 0.715279 |
|     M2M100ForConditionalGeneration      | 1  | 0.711892 |
|          MobileBertForMaskedLM          | 1  | 0.70298  |
|             XGLMForCausalLM             | 1  | 0.69701  |
|            AlbertForMaskedLM            | 1  | 0.447818 |
|       AlbertForQuestionAnswering        | 1  | 0.443468 |
+-----------------------------------------+----+----------+

Absolute latency (ms)

+-----------------------------------------+----+--------------+
|                  name                   | bs |   inductor   |
+-----------------------------------------+----+--------------+
|            AlbertForMaskedLM            | 1  | 12747.008658 |
|       AlbertForQuestionAnswering        | 1  | 12720.783331 |
|      MBartForConditionalGeneration      | 1  | 6147.641314  |
|      BartForConditionalGeneration       | 1  | 5721.616915  |
|             OPTForCausalLM              | 1  | 5211.998647  |
|          DebertaV2ForMaskedLM           | 1  |  5054.00689  |
|      DebertaV2ForQuestionAnswering      | 1  | 3963.308705  |
|            XLNetLMHeadModel             | 1  | 3117.057222  |
|            MBartForCausalLM             | 1  | 3041.202863  |
|          BlenderbotForCausalLM          | 1  | 2632.260612  |
|             BartForCausalLM             | 1  | 2554.093623  |
|                 T5Small                 | 1  | 2487.430946  |
|       T5ForConditionalGeneration        | 1  |  2486.21857  |
|          AllenaiLongformerBase          | 1  | 2409.593661  |
|     PLBartForConditionalGeneration      | 1  | 2186.490453  |
|         MegatronBertForCausalLM         | 1  | 2041.990521  |
|    MegatronBertForQuestionAnswering     | 1  | 1865.185701  |
|      GPT2ForSequenceClassification      | 1  | 1314.513332  |
|            PLBartForCausalLM            | 1  | 1215.993842  |
|             XGLMForCausalLM             | 1  |  834.208323  |
|           DebertaForMaskedLM            | 1  |  785.008139  |
|           RobertaForCausalLM            | 1  |  782.456554  |
|     M2M100ForConditionalGeneration      | 1  |  716.703394  |
|                CamemBert                | 1  |  693.122501  |
|           LayoutLMForMaskedLM           | 1  |  688.435133  |
|            YituTechConvBert             | 1  |  684.326803  |
|             BertForMaskedLM             | 1  |  683.677804  |
|     PegasusForConditionalGeneration     | 1  |  607.940906  |
|            TrOCRForCausalLM             | 1  |  588.217376  |
|       DebertaForQuestionAnswering       | 1  |  559.544001  |
|    LayoutLMForSequenceClassification    | 1  |  550.675415  |
|        BertForQuestionAnswering         | 1  |  546.957786  |
|       RobertaForQuestionAnswering       | 1  |  545.748539  |
|               DistillGPT2               | 1  |  503.870736  |
|               GoogleFnet                | 1  |  473.70625   |
|       MT5ForConditionalGeneration       | 1  |  301.956985  |
|           PegasusForCausalLM            | 1  |  300.664582  |
| BlenderbotSmallForConditionalGeneration | 1  |  146.245595  |
|           ElectraForCausalLM            | 1  |  138.072919  |
|          DistilBertForMaskedLM          | 1  |  100.561633  |
|       ElectraForQuestionAnswering       | 1  |  94.803308   |
|       BlenderbotSmallForCausalLM        | 1  |  85.147198   |
|          MobileBertForMaskedLM          | 1  |  67.322683   |
|     DistilBertForQuestionAnswering      | 1  |  64.414767   |
|     MobileBertForQuestionAnswering      | 1  |  40.989815   |
|         Speech2Text2ForCausalLM         | 1  |  19.050534   |
+-----------------------------------------+----+--------------+
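
A rough way to relate the speedup and absolute latency tables, assuming speedup is defined as eager latency divided by inductor latency (the summary only says speedup is normalized against native pytorch, so the exact formula is an assumption). The helper name `implied_eager_ms` is hypothetical and not part of the benchmark scripts. The sample values are the Speech2Text2ForCausalLM rows from the two huggingface tables above.

```python
def implied_eager_ms(speedup, inductor_ms):
    """Estimate eager latency from a speedup ratio and an inductor latency.

    Assumes speedup = eager_ms / inductor_ms. Rows reported as 0.0 appear to
    be placeholders for models without a measurement, so they yield None.
    """
    if speedup <= 0 or inductor_ms <= 0:
        return None
    return speedup * inductor_ms


# Speech2Text2ForCausalLM: speedup 1.309479, inductor latency 19.050534 ms
estimate = implied_eager_ms(1.309479, 19.050534)
print(f"implied eager latency: {estimate:.3f} ms")
```

This is only a consistency check for reading the tables together. It does not replace measuring eager latency directly.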

timm_models suite with float32 precision

Performance speedup

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          pnasnet5large          | 1  | 2.381097 |
|          inception_v3           | 1  | 2.220087 |
|       gluon_inception_v3        | 1  | 2.19664  |
|           dm_nfnet_f0           | 1  | 2.189084 |
|        adv_inception_v3         | 1  | 2.171995 |
|            nfnet_l0             | 1  | 2.168832 |
|            repvgg_a2            | 1  | 1.969213 |
|         mobilenetv2_100         | 1  | 1.944787 |
|            hrnet_w18            | 1  | 1.91416  |
|          spnasnet_100           | 1  | 1.895752 |
|           fbnetc_100            | 1  | 1.836531 |
|           mnasnet_100           | 1  | 1.832137 |
|            levit_128            | 1  | 1.79366  |
|          ghostnet_100           | 1  | 1.762216 |
|            lcnet_050            | 1  | 1.758917 |
|      mobilenetv3_large_100      | 1  | 1.738832 |
|           selecsls42b           | 1  | 1.738304 |
|           regnety_002           | 1  | 1.691764 |
|             dla102              | 1  | 1.681453 |
|          botnet26t_256          | 1  | 1.659955 |
|           rexnet_100            | 1  | 1.659379 |
|        ese_vovnet19b_dw         | 1  | 1.655671 |
|            fbnetv3_b            | 1  | 1.652915 |
|       tf_efficientnet_b0        | 1  | 1.638887 |
|       eca_botnext26ts_256       | 1  | 1.632749 |
|           resnest101e           | 1  | 1.608308 |
|          cspdarknet53           | 1  | 1.583423 |
|        eca_halonext26ts         | 1  | 1.530132 |
|           res2next50            | 1  | 1.51181  |
|         poolformer_m36          | 1  | 1.510069 |
|            tinynet_a            | 1  | 1.496994 |
|           volo_d1_224           | 1  | 1.47174  |
|        res2net50_14w_8s         | 1  | 1.471739 |
|        res2net101_26w_4s        | 1  | 1.455652 |
|           mobilevit_s           | 1  |  1.4245  |
|         visformer_small         | 1  | 1.392237 |
|           convit_base           | 1  | 1.380081 |
|          gmixer_24_224          | 1  | 1.349336 |
|     swsl_resnext101_32x16d      | 1  | 1.341092 |
|           tf_mixnet_l           | 1  | 1.326725 |
|        twins_pcpvt_base         | 1  | 1.297464 |
|            gernet_l             | 1  |  1.2963  |
|      beit_base_patch16_224      | 1  | 1.270286 |
|          resmlp_12_224          | 1  | 1.266475 |
|  swin_base_patch4_window7_224   | 1  | 1.255689 |
|        convmixer_768_32         | 1  | 1.236443 |
|          mixer_b16_224          | 1  | 1.207354 |
| deit_base_distilled_patch16_224 | 1  | 1.197651 |
|      vit_base_patch16_224       | 1  | 1.197608 |
|             dpn107              | 1  | 1.196379 |
|      xcit_large_24_p8_224       | 1  | 1.184584 |
|            mixnet_l             | 1  | 1.168176 |
|          jx_nest_base           | 1  | 1.157228 |
|            pit_b_224            | 1  | 1.150825 |
|        tnt_s_patch16_224        | 1  | 1.140854 |
|          convnext_base          | 1  | 1.140681 |
|         crossvit_9_240          | 1  | 1.134073 |
|          gmlp_s16_224           | 1  | 1.124914 |
|        sebotnet33ts_256         | 1  | 1.105773 |
|          cait_m36_384           | 1  | 0.978681 |
+---------------------------------+----+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 1  |     pass      |
|           dm_nfnet_f0           | 1  |     pass      |
|            mixnet_l             | 1  |     pass      |
|       eca_botnext26ts_256       | 1  |     pass      |
|         crossvit_9_240          | 1  |     pass      |
| deit_base_distilled_patch16_224 | 1  |     pass      |
|          convnext_base          | 1  |     pass      |
|           regnety_002           | 1  |     pass      |
|           convit_base           | 1  |     pass      |
|          gmlp_s16_224           | 1  |     pass      |
|          cait_m36_384           | 1  |     pass      |
|             dpn107              | 1  |     pass      |
|      vit_base_patch16_224       | 1  |     pass      |
|          cspdarknet53           | 1  |     pass      |
|        eca_halonext26ts         | 1  |     pass      |
|        ese_vovnet19b_dw         | 1  |     pass      |
|           fbnetc_100            | 1  |     pass      |
|            fbnetv3_b            | 1  |     pass      |
|            gernet_l             | 1  |     pass      |
|          botnet26t_256          | 1  |     pass      |
|       gluon_inception_v3        | 1  |     pass      |
|          gmixer_24_224          | 1  |     pass      |
|          jx_nest_base           | 1  |     pass      |
|        convmixer_768_32         | 1  |     pass      |
|        twins_pcpvt_base         | 1  |     pass      |
|         poolformer_m36          | 1  |     pass      |
|            lcnet_050            | 1  |     pass      |
|          mixer_b16_224          | 1  |     pass      |
|        tnt_s_patch16_224        | 1  |     pass      |
|           mnasnet_100           | 1  |     pass      |
|         mobilenetv2_100         | 1  |     pass      |
|      mobilenetv3_large_100      | 1  |     pass      |
|           mobilevit_s           | 1  |     pass      |
|            nfnet_l0             | 1  |     pass      |
|            pit_b_224            | 1  |     pass      |
|      beit_base_patch16_224      | 1  |     pass      |
|            repvgg_a2            | 1  |     pass      |
|  swin_base_patch4_window7_224   | 1  |     pass      |
|          inception_v3           | 1  |     pass      |
|           tf_mixnet_l           | 1  |     pass      |
|       tf_efficientnet_b0        | 1  |     pass      |
|            tinynet_a            | 1  |     pass      |
|          spnasnet_100           | 1  |     pass      |
|        sebotnet33ts_256         | 1  |     pass      |
|          resmlp_12_224          | 1  |     pass      |
|         coat_lite_mini          | 1  |  fail_to_run  |
|           resnest101e           | 1  | fail_accuracy |
|            levit_128            | 1  | fail_accuracy |
|          ghostnet_100           | 1  | fail_accuracy |
|          pnasnet5large          | 1  | fail_accuracy |
|        res2net101_26w_4s        | 1  | fail_accuracy |
|        res2net50_14w_8s         | 1  | fail_accuracy |
|             dla102              | 1  | fail_accuracy |
|     swsl_resnext101_32x16d      | 1  | fail_accuracy |
|           rexnet_100            | 1  | fail_accuracy |
|           selecsls42b           | 1  | fail_accuracy |
|           res2next50            | 1  | fail_accuracy |
|            hrnet_w18            | 1  | fail_accuracy |
|           volo_d1_224           | 1  | fail_accuracy |
|         visformer_small         | 1  | fail_accuracy |
|      xcit_large_24_p8_224       | 1  | fail_accuracy |
+---------------------------------+----+---------------+
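
The speedup table above still lists models that are marked fail_accuracy here (for example pnasnet5large), so it can help to join the two tables before aggregating. Below is a minimal sketch that parses tables in this ASCII layout, keeps only models whose accuracy status starts with `pass`, and takes a geometric mean of their speedups. The function names are hypothetical, and using the geometric mean as the aggregate is an assumption rather than something this report states. The three sample rows are copied from the timm_models tables above.

```python
import math


def parse_table(text):
    """Parse a '+---+' bordered ASCII table into a list of row dicts."""
    header, rows = None, []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines such as +-----+-----+
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def filter_passing(speedup_rows, accuracy_rows):
    """Return {name: speedup} for models whose accuracy status starts with 'pass'."""
    passing = {r["name"] for r in accuracy_rows if r["inductor"].startswith("pass")}
    return {r["name"]: float(r["inductor"]) for r in speedup_rows if r["name"] in passing}


def geomean(values):
    """Geometric mean of a non-empty collection of positive numbers."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


speedup_txt = """
+---------------+----+----------+
|     name      | bs | inductor |
+---------------+----+----------+
| pnasnet5large | 1  | 2.381097 |
| inception_v3  | 1  | 2.220087 |
| cait_m36_384  | 1  | 0.978681 |
+---------------+----+----------+
"""

accuracy_txt = """
+---------------+----+---------------+
|     name      | bs |   inductor    |
+---------------+----+---------------+
| inception_v3  | 1  |     pass      |
| cait_m36_384  | 1  |     pass      |
| pnasnet5large | 1  | fail_accuracy |
+---------------+----+---------------+
"""

kept = filter_passing(parse_table(speedup_txt), parse_table(accuracy_txt))
print(sorted(kept), round(geomean(kept.values()), 4))
```

Matching on the `pass` prefix also keeps `pass_due_to_skip` rows. Whether those should count is a judgment call left to the reader.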

Compilation latency (sec)

+---------------------------------+----+-----------+
|              name               | bs | inductor  |
+---------------------------------+----+-----------+
|          pnasnet5large          | 1  | 80.76115  |
|  swin_base_patch4_window7_224   | 1  | 80.225002 |
|           tf_mixnet_l           | 1  | 68.429688 |
|             dpn107              | 1  | 62.760632 |
|        twins_pcpvt_base         | 1  | 61.092084 |
|           mobilevit_s           | 1  | 58.452084 |
|          jx_nest_base           | 1  | 58.20862  |
|        res2net50_14w_8s         | 1  | 56.943018 |
|           rexnet_100            | 1  | 55.400644 |
|      xcit_large_24_p8_224       | 1  | 54.263465 |
|          cait_m36_384           | 1  | 53.748778 |
|          ghostnet_100           | 1  | 52.925419 |
|            mixnet_l             | 1  | 50.875319 |
|        sebotnet33ts_256         | 1  | 50.41345  |
|         poolformer_m36          | 1  | 50.280887 |
|            levit_128            | 1  | 49.153748 |
|           dm_nfnet_f0           | 1  | 48.337575 |
|        eca_halonext26ts         | 1  | 48.282175 |
|         crossvit_9_240          | 1  | 48.233365 |
|        tnt_s_patch16_224        | 1  | 46.943698 |
|           volo_d1_224           | 1  | 42.783759 |
|       eca_botnext26ts_256       | 1  | 41.13635  |
|            hrnet_w18            | 1  | 40.402214 |
|        res2net101_26w_4s        | 1  | 39.59125  |
|       tf_efficientnet_b0        | 1  | 39.266803 |
|            nfnet_l0             | 1  | 37.794782 |
|          convnext_base          | 1  | 37.756871 |
|           resnest101e           | 1  | 37.561384 |
|       gluon_inception_v3        | 1  | 35.929346 |
|          inception_v3           | 1  | 35.908614 |
|        adv_inception_v3         | 1  | 35.822729 |
|            tinynet_a            | 1  | 34.450079 |
|           res2next50            | 1  | 34.360219 |
|            pit_b_224            | 1  | 33.993781 |
|           convit_base           | 1  | 30.865617 |
|          botnet26t_256          | 1  | 30.148866 |
|          cspdarknet53           | 1  | 27.585245 |
|            fbnetv3_b            | 1  | 26.808077 |
|             dla102              | 1  | 26.75566  |
|          gmlp_s16_224           | 1  | 26.194823 |
|          gmixer_24_224          | 1  | 25.26632  |
|         visformer_small         | 1  | 24.506484 |
|        ese_vovnet19b_dw         | 1  | 23.478354 |
|      mobilenetv3_large_100      | 1  | 23.340678 |
|      vit_base_patch16_224       | 1  | 21.488075 |
| deit_base_distilled_patch16_224 | 1  | 20.48127  |
|      beit_base_patch16_224      | 1  | 20.230063 |
|          mixer_b16_224          | 1  | 19.311011 |
|           regnety_002           | 1  | 19.016759 |
|            repvgg_a2            | 1  | 17.31675  |
|          resmlp_12_224          | 1  | 17.154303 |
|        convmixer_768_32         | 1  | 17.113037 |
|           selecsls42b           | 1  | 16.520432 |
|            lcnet_050            | 1  | 14.103602 |
|     swsl_resnext101_32x16d      | 1  | 12.787054 |
|           fbnetc_100            | 1  | 10.929933 |
|            gernet_l             | 1  | 10.909782 |
|          spnasnet_100           | 1  | 10.849554 |
|         mobilenetv2_100         | 1  | 10.535822 |
|           mnasnet_100           | 1  | 10.354088 |
+---------------------------------+----+-----------+

Peak Memory Compression Ratio

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          cait_m36_384           | 1  | 0.945888 |
|          pnasnet5large          | 1  | 0.929163 |
|        convmixer_768_32         | 1  | 0.917859 |
|            nfnet_l0             | 1  | 0.899788 |
|      xcit_large_24_p8_224       | 1  | 0.890321 |
|        ese_vovnet19b_dw         | 1  | 0.88804  |
|           mnasnet_100           | 1  | 0.876734 |
|            fbnetv3_b            | 1  | 0.876015 |
|         mobilenetv2_100         | 1  | 0.87495  |
|       tf_efficientnet_b0        | 1  | 0.873951 |
|          spnasnet_100           | 1  | 0.872562 |
|       eca_botnext26ts_256       | 1  | 0.867933 |
|      mobilenetv3_large_100      | 1  | 0.867037 |
|           fbnetc_100            | 1  | 0.866354 |
|            tinynet_a            | 1  | 0.86242  |
|           rexnet_100            | 1  | 0.858229 |
|            lcnet_050            | 1  | 0.857903 |
|         poolformer_m36          | 1  | 0.855086 |
|           dm_nfnet_f0           | 1  | 0.853978 |
|        eca_halonext26ts         | 1  | 0.852152 |
|           mobilevit_s           | 1  | 0.848361 |
|          ghostnet_100           | 1  | 0.845823 |
|           tf_mixnet_l           | 1  | 0.844196 |
|           regnety_002           | 1  | 0.843466 |
|          botnet26t_256          | 1  | 0.841147 |
|            mixnet_l             | 1  | 0.828235 |
|          resmlp_12_224          | 1  | 0.828006 |
|         visformer_small         | 1  | 0.820248 |
|           res2next50            | 1  | 0.816721 |
|            levit_128            | 1  | 0.808835 |
|             dpn107              | 1  | 0.802714 |
|          convnext_base          | 1  | 0.801069 |
|            hrnet_w18            | 1  | 0.799519 |
|        sebotnet33ts_256         | 1  | 0.798552 |
|        res2net50_14w_8s         | 1  | 0.795927 |
|          gmlp_s16_224           | 1  | 0.79446  |
|          cspdarknet53           | 1  | 0.794122 |
|          gmixer_24_224          | 1  | 0.792504 |
|           volo_d1_224           | 1  | 0.786042 |
|           convit_base           | 1  | 0.780599 |
|        tnt_s_patch16_224        | 1  | 0.780469 |
|         crossvit_9_240          | 1  | 0.779771 |
|          mixer_b16_224          | 1  | 0.776464 |
|             dla102              | 1  | 0.776446 |
|        twins_pcpvt_base         | 1  | 0.775517 |
|          jx_nest_base           | 1  | 0.773135 |
|           resnest101e           | 1  | 0.772901 |
|      beit_base_patch16_224      | 1  | 0.769753 |
|       gluon_inception_v3        | 1  | 0.763236 |
|        adv_inception_v3         | 1  | 0.763003 |
| deit_base_distilled_patch16_224 | 1  | 0.762628 |
|          inception_v3           | 1  | 0.762492 |
|      vit_base_patch16_224       | 1  | 0.759774 |
|            pit_b_224            | 1  | 0.754698 |
|        res2net101_26w_4s        | 1  | 0.741231 |
|           selecsls42b           | 1  | 0.739849 |
|  swin_base_patch4_window7_224   | 1  | 0.739621 |
|            gernet_l             | 1  | 0.736236 |
|            repvgg_a2            | 1  | 0.691188 |
|     swsl_resnext101_32x16d      | 1  | 0.640419 |
+---------------------------------+----+----------+

Absolute latency (ms)

+---------------------------------+----+-------------+
|              name               | bs |  inductor   |
+---------------------------------+----+-------------+
|          cait_m36_384           | 1  | 3645.977216 |
|      xcit_large_24_p8_224       | 1  | 1540.513662 |
|     swsl_resnext101_32x16d      | 1  | 442.886099  |
|          pnasnet5large          | 1  | 370.147929  |
|          convnext_base          | 1  | 308.288985  |
|             dpn107              | 1  | 261.655408  |
|        convmixer_768_32         | 1  | 247.783438  |
|          jx_nest_base           | 1  | 236.911653  |
|  swin_base_patch4_window7_224   | 1  |  199.72355  |
|      beit_base_patch16_224      | 1  | 199.234012  |
| deit_base_distilled_patch16_224 | 1  | 197.255687  |
|      vit_base_patch16_224       | 1  | 197.040937  |
|           convit_base           | 1  |  195.89069  |
|            pit_b_224            | 1  | 169.956996  |
|           resnest101e           | 1  | 166.907789  |
|           dm_nfnet_f0           | 1  | 159.527092  |
|          mixer_b16_224          | 1  | 140.212566  |
|         poolformer_m36          | 1  | 139.249541  |
|        res2net101_26w_4s        | 1  | 113.786149  |
|        twins_pcpvt_base         | 1  | 108.667251  |
|        tnt_s_patch16_224        | 1  |  95.994844  |
|           volo_d1_224           | 1  |  95.777638  |
|            nfnet_l0             | 1  |  94.771416  |
|             dla102              | 1  |  91.423625  |
|            hrnet_w18            | 1  |  86.314184  |
|        sebotnet33ts_256         | 1  |  85.688711  |
|          cspdarknet53           | 1  |  83.786503  |
|        adv_inception_v3         | 1  |  73.88816   |
|       gluon_inception_v3        | 1  |  73.732656  |
|          inception_v3           | 1  |  73.605849  |
|          gmlp_s16_224           | 1  |  71.489622  |
|         visformer_small         | 1  |  67.294416  |
|        res2net50_14w_8s         | 1  |  65.55286   |
|          gmixer_24_224          | 1  |  63.649912  |
|            repvgg_a2            | 1  |  63.484558  |
|           res2next50            | 1  |  60.081544  |
|            gernet_l             | 1  |  57.084978  |
|           selecsls42b           | 1  |  45.478177  |
|          botnet26t_256          | 1  |  44.985911  |
|        eca_halonext26ts         | 1  |  44.540195  |
|           mobilevit_s           | 1  |  42.51702   |
|       eca_botnext26ts_256       | 1  |  40.882935  |
|          resmlp_12_224          | 1  |  35.895851  |
|         crossvit_9_240          | 1  |  34.330646  |
|            mixnet_l             | 1  |  32.483721  |
|        ese_vovnet19b_dw         | 1  |  31.759659  |
|           tf_mixnet_l           | 1  |  30.777974  |
|            fbnetv3_b            | 1  |  15.958931  |
|       tf_efficientnet_b0        | 1  |  13.860855  |
|           rexnet_100            | 1  |  13.736213  |
|            tinynet_a            | 1  |  12.592869  |
|           fbnetc_100            | 1  |  9.538893   |
|            levit_128            | 1  |  9.369658   |
|          ghostnet_100           | 1  |  9.251238   |
|          spnasnet_100           | 1  |  8.422946   |
|           mnasnet_100           | 1  |  7.848893   |
|         mobilenetv2_100         | 1  |  7.636463   |
|      mobilenetv3_large_100      | 1  |  7.471556   |
|           regnety_002           | 1  |  6.552157   |
|            lcnet_050            | 1  |  2.578891   |
+---------------------------------+----+-------------+

@zxd1997066
Contributor

[cppwrapper_dynamic_shape] Performance Dashboard for float32 precision -- Single-Socket Multi-threads (2024-04-28 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. For training, each experiment runs one iteration of the forward and backward passes; for inference, it runs the forward pass only. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. Experimental setup does not have optimizer.

SW information:

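+-------------------+--------+------------------------------------------+
| SW                | Branch | Commit                                   |
+-------------------+--------+------------------------------------------+
| Pytorch           | main   | 7478b7f1cac9686f00edf3db4667cf86d2421531 |
| Torchbench        | main   | d6015d42                                 |
| torchaudio        | main   | 2.2.0a0+ea437b3                          |
| torchtext         | main   | 0.16.0a0+b0ebddc                         |
| torchvision       | main   | 0.19.0a0+2c4665f                         |
| torchdata         | main   | 0.7.1a0+0790338                          |
| dynamo_benchmarks | main   | nightly                                  |
+-------------------+--------+------------------------------------------+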

HW information

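+------------------+----------------------------------------------------------------+
| Item             | Value                                                          |
+------------------+----------------------------------------------------------------+
| Manufacturer     | Amazon EC2                                                     |
| Product Name     | c6i.16xlarge                                                   |
| CPU Model        | Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz                  |
| Installed Memory | 128GB (1x128GB DDR4 3200 MT/s [Unknown])                       |
| OS               | Ubuntu 22.04.2 LTS                                             |
| Kernel           | 5.19.0-1022-aws                                                |
| Microcode        | 0xd000389                                                      |
| GCC              | gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0                      |
| GLIBC            | ldd (Ubuntu GLIBC 2.35-0ubuntu3.1) 2.35                        |
| Binutils         | GNU ld (GNU Binutils for Ubuntu) 2.38                          |
| Python           | Python 3.10.6                                                  |
| OpenSSL          | OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022) |
+------------------+----------------------------------------------------------------+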

Test command

export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
CORES=$(lscpu | grep Core | awk '{print $4}')
export OMP_NUM_THREADS=$CORES

python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--node_id 0" --devices=cpu --dtypes=float32 --inference --compilers=inductor --extra-args="--timeout 9000" 

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 79%, 62/78 | 100%, 46/46 | 75%, 45/60  |
+----------+------------+-------------+-------------+
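Each pass-rate cell pairs a percentage with a passed/total count. A minimal sketch of that formatting follows, assuming the percentage is the ratio rendered to zero decimal places; the helper name `format_pass_rate` is hypothetical and is not taken from the benchmark scripts.

```python
def format_pass_rate(passed: int, total: int) -> str:
    """Render a pass-rate cell in the 'NN%, passed/total' style used above."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    # The ':.0%' format multiplies by 100 and keeps zero decimal places.
    return f"{passed / total:.0%}, {passed}/{total}"


# Counts copied from the pass-rate table in this comment.
counts = {
    "torchbench": (62, 78),
    "huggingface": (46, 46),
    "timm_models": (45, 60),
}

for suite, (passed, total) in counts.items():
    print(f"{suite}: {format_pass_rate(passed, total)}")
```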

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.40x    |    1.29x    |    1.83x    |
+----------+------------+-------------+-------------+
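For readers who want to recompute a geometric mean from the per-model tables, here is a minimal sketch. It assumes that rows reported as 0.0 (models that failed to load) are dropped before averaging; the exact filtering used by the dashboard is not shown in this comment, so `drop_failed` is an illustrative helper only.

```python
import math


def drop_failed(speedups):
    """Keep only positive speedups; 0.0 rows mark models that did not run."""
    return [s for s in speedups if s > 0]


def geometric_mean(values):
    """Geometric mean computed in log space to avoid overflow on long lists."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean is only defined for positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# A small subset of per-model speedups, for illustration only.
# It is not the full suite, so the result will not match the summary table.
sample = [3.118098, 1.619161, 1.019191, 0.693837, 0.0]
print(f"{geometric_mean(drop_failed(sample)):.2f}x")
```

The geometric mean is the usual choice for averaging ratios because a 2x speedup and a 0.5x slowdown cancel to 1.0x, which an arithmetic mean would not do.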

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   47.81    |    45.42    |    44.65    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.90x    |    0.98x    |    0.99x    |
+----------+------------+-------------+-------------+
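The sketch below shows one way to read these ratios. It assumes the ratio is defined as eager peak memory divided by compiled peak memory, which is consistent with "higher is better" but is not stated in this comment. Under that assumption, a value below 1.0 means the compiled run used more memory than eager. Both function names are hypothetical.

```python
def compression_ratio(eager_peak_mb: float, compiled_peak_mb: float) -> float:
    """Assumed definition: eager peak memory / compiled peak memory."""
    if eager_peak_mb <= 0 or compiled_peak_mb <= 0:
        raise ValueError("peak memory values must be positive")
    return eager_peak_mb / compiled_peak_mb


def relative_memory_use(ratio: float) -> float:
    """Compiled peak memory as a multiple of eager peak memory."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / ratio


# Suite-level ratios copied from the table above.
for suite, ratio in {"torchbench": 0.90, "huggingface": 0.98, "timm_models": 0.99}.items():
    print(f"{suite}: compiled peak is about {relative_memory_use(ratio):.2f}x eager")
```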

torchbench suite with float32 precision

Performance speedup

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_equation_of_state     | 1048576 | 11.179114 |
|          squeezenet1_1          |   16    | 3.118098  |
|       mobilenet_v3_large        |   32    | 3.057882  |
|        timm_efficientnet        |   64    | 3.041233  |
|          mobilenet_v2           |   16    | 3.017871  |
|           mnasnet1_0            |   32    | 2.911596  |
|       shufflenet_v2_x1_0        |   64    | 2.473994  |
|          timm_resnest           |   32    | 2.329578  |
|            resnet50             |   32    | 2.245839  |
|        phlippe_densenet         |   128   | 2.089458  |
|           densenet121           |   64    | 2.028622  |
|            resnet152            |   32    | 2.023776  |
|       doctr_det_predictor       |    1    | 1.977255  |
|             hf_GPT2             |    1    | 1.957764  |
|           timm_regnet           |   32    | 1.892319  |
|           timm_nfnet            |   128   | 1.862052  |
|         resnext50_32x4d         |    8    | 1.846913  |
|         phlippe_resnet          |   128   | 1.835781  |
|            resnet18             |    8    |  1.72694  |
|           timm_vovnet           |   32    | 1.694703  |
|      doctr_reco_predictor       |    1    |  1.63348  |
|             alexnet             |   128   | 1.619161  |
|          hf_Bert_large          |    1    | 1.611278  |
|            hf_Albert            |    1    | 1.593442  |
|          fastNLP_Bert           |    1    |   1.552   |
|     functorch_maml_omniglot     |    1    | 1.548259  |
|             yolov3              |    8    | 1.538548  |
|            moondream            |    1    | 1.529398  |
|          hf_GPT2_large          |    1    |  1.51388  |
|             hf_Bert             |    1    |  1.51055  |
|          hf_Longformer          |    1    | 1.490801  |
|              dcgan              |   256   | 1.483835  |
|        basic_gnn_edgecnn        |    1    | 1.446573  |
|          basic_gnn_gcn          |    1    | 1.438588  |
|         LearningToPaint         |   96    | 1.429775  |
|          hf_DistilBert          |    1    | 1.359199  |
|              vgg16              |    4    | 1.312559  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 1.301895  |
|             hf_Bart             |    1    | 1.292789  |
|           hf_BigBird            |    1    | 1.277885  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1.234261  |
|        hf_distil_whisper        |    1    | 1.224015  |
|          BERT_pytorch           |    2    | 1.220817  |
|         basic_gnn_sage          |    1    | 1.217002  |
|           hf_T5_large           |    1    | 1.215574  |
|         pytorch_stargan         |   16    | 1.204433  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1.195958  |
|          pytorch_unet           |    1    | 1.183205  |
|          maml_omniglot          |    5    | 1.182224  |
|              dlrm               |  2048   | 1.178908  |
|    detectron2_fcos_r_50_fpn     |    1    | 1.171055  |
|          basic_gnn_gin          |    1    | 1.153495  |
|        soft_actor_critic        |   256   | 1.151744  |
|      torch_multimodal_clip      |   32    | 1.142202  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 1.138834  |
|              hf_T5              |    1    | 1.109956  |
|               drq               |    1    | 1.074692  |
| detectron2_fasterrcnn_r_101_dc5 |    1    |  1.0715   |
|     nvidia_deeprecommender      |   256   | 1.061675  |
|     timm_vision_transformer     |   32    | 1.060005  |
|       speech_transformer        |    1    | 1.058744  |
| detectron2_fasterrcnn_r_50_dc5  |    1    |  1.05255  |
|           hf_Reformer           |    1    | 1.044952  |
|          lennard_jones          |  1000   | 1.023766  |
|             demucs              |    1    | 1.019191  |
|  timm_vision_transformer_large  |   32    | 1.010085  |
|     resnet50_quantized_qat      |   32    | 1.006891  |
|           tts_angular           |   64    | 1.002087  |
|   mobilenet_v2_quantized_qat    |   96    | 0.994545  |
|       Background_Matting        |    1    | 0.808597  |
|           hf_T5_base            |    1    | 0.788802  |
|              maml               |    1    | 0.693837  |
|     pyhpc_isoneutral_mixing     | 1048576 | 0.683386  |
|         opacus_cifar10          |   64    | 0.659891  |
|      functorch_dp_cifar10       |   64    | 0.605297  |
|        timm_efficientdet        |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+
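The tables in this dashboard share one plain-text layout, so they can be loaded for further analysis with a few lines of Python. This is a minimal sketch that assumes every data line starts with `|` and that the first such line is the header; `parse_ascii_table` is a hypothetical helper, not part of the benchmark tooling.

```python
def parse_ascii_table(text: str) -> list[dict[str, str]]:
    """Parse a '+---+' bordered table into a list of row dictionaries."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and blanks
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


sample = """
+-------------------------+---------+-----------+
|          name           |   bs    | inductor  |
+-------------------------+---------+-----------+
| pyhpc_equation_of_state | 1048576 | 11.179114 |
|      squeezenet1_1      |   16    | 3.118098  |
|          moco           |    0    |    0.0    |
+-------------------------+---------+-----------+
"""

rows = parse_ascii_table(sample)
# Rows with a 0.0 value correspond to models that did not run.
ran = [r["name"] for r in rows if float(r["inductor"]) > 0]
print(ran)
```

All cell values are kept as strings so that the caller decides how to convert columns such as `bs` or `inductor`.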

Accuracy

+--------------------------------+---------+--------------------+
|              name              |   bs    |      inductor      |
+--------------------------------+---------+--------------------+
|          hf_T5_large           |    4    |  pass_due_to_skip  |
|       Background_Matting       |    1    |  pass_due_to_skip  |
| timm_vision_transformer_large  |    4    |  pass_due_to_skip  |
|         hf_GPT2_large          |    4    |  pass_due_to_skip  |
|              maml              |    1    |  pass_due_to_skip  |
|         basic_gnn_sage         |    1    |        pass        |
|           hf_T5_base           |    4    |        pass        |
|      doctr_det_predictor       |    4    |        pass        |
|              dlrm              |    4    |        pass        |
|    detectron2_fcos_r_50_fpn    |    4    |        pass        |
|             demucs             |    1    |        pass        |
|         basic_gnn_gcn          |    1    |        pass        |
|         basic_gnn_gin          |    1    |        pass        |
|              drq               |    1    |        pass        |
|       basic_gnn_edgecnn        |    1    |        pass        |
|        LearningToPaint         |    4    |        pass        |
|      functorch_dp_cifar10      |    4    |        pass        |
|      doctr_reco_predictor      |    4    |        pass        |
|             yolov3             |    4    |        pass        |
|          fastNLP_Bert          |    4    |        pass        |
|         maml_omniglot          |    5    |        pass        |
|    functorch_maml_omniglot     |    1    |        pass        |
|            hf_Bart             |    4    |        pass        |
|            hf_Bert             |    4    |        pass        |
|             hf_T5              |    4    |        pass        |
|         hf_Bert_large          |    4    |        pass        |
|          hf_Reformer           |    4    |        pass        |
|         hf_Longformer          |    4    |        pass        |
|           hf_BigBird           |    4    |        pass        |
|         hf_DistilBert          |    4    |        pass        |
|            hf_GPT2             |    2    |        pass        |
|           hf_Albert            |    4    |        pass        |
|       hf_distil_whisper        |    4    |        pass        |
|            alexnet             |    4    |        pass        |
|        pytorch_stargan         |   16    |        pass        |
|         lennard_jones          |    4    |        pass        |
|         opacus_cifar10         |    4    |        pass        |
|    pyhpc_isoneutral_mixing     |    4    |        pass        |
|    pyhpc_equation_of_state     |    4    |        pass        |
|         phlippe_resnet         |    4    |        pass        |
|        phlippe_densenet        |    4    |        pass        |
|       mobilenet_v3_large       |    4    |        pass        |
|     nvidia_deeprecommender     |    4    |        pass        |
|             vgg16              |    4    |        pass        |
|  pytorch_CycleGAN_and_pix2pix  |    1    |        pass        |
|   mobilenet_v2_quantized_qat   |    4    |        pass        |
|             llama              |    4    |        pass        |
| pyhpc_turbulent_kinetic_energy | 1048576 |        pass        |
|          BERT_pytorch          |    4    |        pass        |
|           moondream            |    4    |        pass        |
|          pytorch_unet          |    2    |        pass        |
|       soft_actor_critic        |   256   |        pass        |
|       speech_transformer       |    1    |        pass        |
|         squeezenet1_1          |    4    |        pass        |
|       timm_efficientnet        |    4    |        pass        |
|           timm_nfnet           |    4    |        pass        |
|          timm_regnet           |    4    |        pass        |
|          timm_resnest          |    4    |        pass        |
|    timm_vision_transformer     |    4    |        pass        |
|          timm_vovnet           |    4    |        pass        |
|     torch_multimodal_clip      |    4    |        pass        |
|          tts_angular           |    4    |        pass        |
|     resnet50_quantized_qat     |    4    |        pass        |
|       timm_efficientdet        |    0    | model_fail_to_load |
|              moco              |    0    | model_fail_to_load |
|         DALLE2_pytorch         |    0    | model_fail_to_load |
|          Super_SloMo           |    4    |    fail_to_run     |
|        vision_maskrcnn         |    1    |    fail_to_run     |
|             dcgan              |    4    |   fail_accuracy    |
|          densenet121           |    4    |   fail_accuracy    |
|       shufflenet_v2_x1_0       |    4    |   fail_accuracy    |
|          mobilenet_v2          |    4    |   fail_accuracy    |
|           mnasnet1_0           |    4    |   fail_accuracy    |
|        resnext50_32x4d         |    4    |   fail_accuracy    |
|            resnet50            |    4    |   fail_accuracy    |
|           resnet152            |    4    |   fail_accuracy    |
|            resnet18            |    4    |   fail_accuracy    |
+--------------------------------+---------+--------------------+

Compilation latency (sec)

+---------------------------------+---------+------------+
|              name               |   bs    |  inductor  |
+---------------------------------+---------+------------+
|           hf_BigBird            |    1    | 471.383926 |
|    detectron2_fcos_r_50_fpn     |    1    | 408.623695 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 232.12658  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 224.497121 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 217.491289 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 207.966294 |
|              maml               |    1    | 143.414828 |
|           hf_T5_large           |    1    | 107.620315 |
|       speech_transformer        |    1    | 90.634482  |
|          hf_Longformer          |    1    |  85.42802  |
|           hf_Reformer           |    1    | 80.468474  |
|  timm_vision_transformer_large  |   32    | 68.642023  |
|      torch_multimodal_clip      |   32    | 65.507888  |
|          basic_gnn_gcn          |    1    | 61.399549  |
|           densenet121           |   64    | 59.431631  |
|            resnet152            |   32    | 58.555315  |
|          fastNLP_Bert           |    1    | 55.518628  |
|          hf_GPT2_large          |    1    | 53.670393  |
|           hf_T5_base            |    1    | 50.784807  |
|            moondream            |    1    | 49.600337  |
|     pyhpc_isoneutral_mixing     | 1048576 | 48.175228  |
|       doctr_det_predictor       |    1    | 46.694142  |
|          hf_Bert_large          |    1    | 44.496222  |
|        hf_distil_whisper        |    1    | 43.619309  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 42.346922  |
|           timm_nfnet            |   128   | 38.164246  |
|           timm_regnet           |   32    | 37.823071  |
|          BERT_pytorch           |    2    | 36.701987  |
|             yolov3              |    8    | 35.871639  |
|             demucs              |    1    |  34.65403  |
|        timm_efficientnet        |   64    | 33.286116  |
|             hf_Bart             |    1    | 31.553728  |
|              hf_T5              |    1    | 31.026942  |
|        phlippe_densenet         |   128   | 29.356838  |
|       shufflenet_v2_x1_0        |   64    | 29.033587  |
|     timm_vision_transformer     |   32    | 28.813906  |
|       mobilenet_v3_large        |   32    | 27.870584  |
|             hf_Bert             |    1    | 27.156183  |
|         pytorch_stargan         |   16    | 25.911832  |
|         opacus_cifar10          |   64    | 25.872371  |
|            hf_Albert            |    1    | 25.767092  |
|      doctr_reco_predictor       |    1    | 25.527637  |
|             hf_GPT2             |    1    | 24.773516  |
|       Background_Matting        |    1    | 24.420854  |
|          mobilenet_v2           |   16    | 24.242856  |
|      functorch_dp_cifar10       |   64    | 24.193786  |
|          timm_resnest           |   32    | 23.728762  |
|           timm_vovnet           |   32    | 23.438831  |
|         resnext50_32x4d         |    8    | 23.212217  |
|            resnet50             |   32    | 23.109142  |
|           mnasnet1_0            |   32    | 21.735769  |
|          hf_DistilBert          |    1    | 21.058843  |
|          pytorch_unet           |    1    | 19.965117  |
|          squeezenet1_1          |   16    | 19.345133  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 19.319904  |
|            resnet18             |    8    |  17.91642  |
|         LearningToPaint         |   96    | 17.745057  |
|         phlippe_resnet          |   128   | 17.488919  |
|     pyhpc_equation_of_state     | 1048576 | 17.247382  |
|              vgg16              |    4    | 17.212579  |
|             alexnet             |   128   | 16.884894  |
|               drq               |    1    | 15.783213  |
|     functorch_maml_omniglot     |    1    | 15.670873  |
|          maml_omniglot          |    5    | 15.590346  |
|              dlrm               |  2048   |  15.26957  |
|              dcgan              |   256   | 15.011048  |
|        basic_gnn_edgecnn        |    1    | 14.812387  |
|     nvidia_deeprecommender      |   256   | 14.791309  |
|          basic_gnn_gin          |    1    | 14.783147  |
|         basic_gnn_sage          |    1    | 14.725238  |
|        soft_actor_critic        |   256   | 14.372834  |
|          lennard_jones          |  1000   | 13.960944  |
|           tts_angular           |   64    | 13.843544  |
|   mobilenet_v2_quantized_qat    |   96    |  0.132662  |
|     resnet50_quantized_qat      |   32    |  0.105036  |
|        timm_efficientdet        |    0    |    0.0     |
|              moco               |    0    |    0.0     |
|         DALLE2_pytorch          |    0    |    0.0     |
+---------------------------------+---------+------------+

Peak Memory Compression Ratio

+---------------------------------+---------+----------+
|              name               |   bs    | inductor |
+---------------------------------+---------+----------+
|           timm_nfnet            |   128   | 0.993202 |
|  timm_vision_transformer_large  |   32    | 0.991447 |
|           hf_T5_base            |    1    | 0.987937 |
|              dlrm               |  2048   | 0.987866 |
|        timm_efficientnet        |   64    | 0.983774 |
|       Background_Matting        |    1    | 0.983299 |
|           timm_regnet           |   32    | 0.982325 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 0.981466 |
|            resnet152            |   32    | 0.980824 |
|             yolov3              |    8    | 0.98072  |
|           densenet121           |   64    | 0.979412 |
|             demucs              |    1    | 0.979354 |
|          pytorch_unet           |    1    | 0.978475 |
|     nvidia_deeprecommender      |   256   | 0.978167 |
|      torch_multimodal_clip      |   32    | 0.977018 |
|           timm_vovnet           |   32    | 0.974991 |
|            resnet50             |   32    | 0.973453 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 0.973278 |
|          timm_resnest           |   32    | 0.973042 |
|          hf_GPT2_large          |    1    | 0.972522 |
|        basic_gnn_edgecnn        |    1    | 0.970143 |
|         LearningToPaint         |   96    | 0.969895 |
|       doctr_det_predictor       |    1    | 0.967896 |
|    detectron2_fcos_r_50_fpn     |    1    | 0.963506 |
| detectron2_fasterrcnn_r_50_dc5  |    1    |  0.963   |
|     timm_vision_transformer     |   32    | 0.962682 |
|           mnasnet1_0            |   32    | 0.962585 |
|   mobilenet_v2_quantized_qat    |   96    | 0.96146  |
|     resnet50_quantized_qat      |   32    | 0.961372 |
|          mobilenet_v2           |   16    | 0.959952 |
|       mobilenet_v3_large        |   32    | 0.959516 |
|       shufflenet_v2_x1_0        |   64    | 0.956553 |
|             alexnet             |   128   | 0.954961 |
|           hf_BigBird            |    1    | 0.949396 |
|        phlippe_densenet         |   128   | 0.945843 |
|         resnext50_32x4d         |    8    | 0.945304 |
|         pytorch_stargan         |   16    | 0.94445  |
|              vgg16              |    4    | 0.941964 |
|          basic_gnn_gcn          |    1    | 0.939874 |
|      doctr_reco_predictor       |    1    | 0.934946 |
|          BERT_pytorch           |    2    | 0.933933 |
|           tts_angular           |   64    | 0.929152 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 0.916698 |
|              dcgan              |   256   | 0.914266 |
|          squeezenet1_1          |   16    | 0.913856 |
|     pyhpc_equation_of_state     | 1048576 | 0.906048 |
|            resnet18             |    8    | 0.894911 |
|         phlippe_resnet          |   128   | 0.894574 |
|        hf_distil_whisper        |    1    | 0.892958 |
|         opacus_cifar10          |   64    | 0.890933 |
|        soft_actor_critic        |   256   | 0.888046 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 0.883045 |
|          lennard_jones          |  1000   | 0.870819 |
|         basic_gnn_sage          |    1    | 0.857897 |
|          maml_omniglot          |    5    | 0.857759 |
|     functorch_maml_omniglot     |    1    | 0.853921 |
|          basic_gnn_gin          |    1    | 0.853601 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.84815  |
|          fastNLP_Bert           |    1    | 0.838428 |
|      functorch_dp_cifar10       |   64    | 0.822974 |
|       speech_transformer        |    1    | 0.814938 |
|          hf_Bert_large          |    1    | 0.802906 |
|          hf_Longformer          |    1    | 0.799794 |
|            moondream            |    1    | 0.797805 |
|             hf_Bert             |    1    | 0.792562 |
|            hf_Albert            |    1    | 0.791912 |
|              maml               |    1    | 0.785945 |
|           hf_T5_large           |    1    | 0.78495  |
|               drq               |    1    | 0.76769  |
|          hf_DistilBert          |    1    | 0.763349 |
|             hf_GPT2             |    1    | 0.760335 |
|              hf_T5              |    1    | 0.754711 |
|             hf_Bart             |    1    | 0.737569 |
|           hf_Reformer           |    1    | 0.734875 |
|     pyhpc_isoneutral_mixing     | 1048576 | 0.686187 |
|        timm_efficientdet        |    0    |   0.0    |
|              moco               |    0    |   0.0    |
|         DALLE2_pytorch          |    0    |   0.0    |
+---------------------------------+---------+----------+

Absolute latency (ms)

+---------------------------------+---------+-------------+
|              name               |   bs    |  inductor   |
+---------------------------------+---------+-------------+
|  timm_vision_transformer_large  |   32    | 4426.852196 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1423.816767 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1289.413721 |
|           hf_T5_base            |    1    | 1240.226756 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1157.541601 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1078.410095 |
|          hf_GPT2_large          |    1    | 541.744295  |
|           timm_nfnet            |   128   | 523.821809  |
|           hf_T5_large           |    1    | 390.267343  |
|            moondream            |    1    | 379.703573  |
|        hf_distil_whisper        |    1    | 337.780335  |
|       Background_Matting        |    1    | 336.654516  |
|          pytorch_unet           |    1    | 226.359402  |
|           timm_regnet           |   32    | 213.568533  |
|           densenet121           |   64    | 186.152491  |
|            resnet152            |   32    | 184.276307  |
|    detectron2_fcos_r_50_fpn     |    1    | 173.978219  |
|      torch_multimodal_clip      |   32    |  163.1823   |
|             yolov3              |    8    | 158.797824  |
|             demucs              |    1    | 144.719195  |
|           hf_BigBird            |    1    | 117.266867  |
|           timm_vovnet           |   32    | 112.699488  |
|          hf_Bert_large          |    1    | 101.780478  |
|     timm_vision_transformer     |   32    | 100.567019  |
|         pytorch_stargan         |   16    |  93.755014  |
|       doctr_det_predictor       |    1    |  83.497948  |
|            resnet50             |   32    |  72.933184  |
|          hf_Longformer          |    1    |  68.988098  |
|          timm_resnest           |   32    |  52.18671   |
|             hf_Bart             |    1    |  51.622149  |
|       speech_transformer        |    1    |  51.265669  |
|              maml               |    1    |  49.060001  |
|        timm_efficientnet        |   64    |  48.87942   |
|              hf_T5              |    1    |  42.952917  |
|             alexnet             |   128   |  42.490097  |
|             hf_Bert             |    1    |  39.786085  |
|   mobilenet_v2_quantized_qat    |   96    |  39.744678  |
|           hf_Reformer           |    1    |  36.691959  |
|         LearningToPaint         |   96    |  36.509366  |
|              vgg16              |    4    |  35.478016  |
|            hf_Albert            |    1    |  34.274885  |
|     nvidia_deeprecommender      |   256   |  33.856517  |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  32.147247  |
|          fastNLP_Bert           |    1    |  32.033514  |
|     pyhpc_isoneutral_mixing     | 1048576 |  28.45956   |
|          BERT_pytorch           |    2    |  26.971543  |
|          hf_DistilBert          |    1    |  26.175986  |
|     resnet50_quantized_qat      |   32    |  25.036345  |
| pyhpc_turbulent_kinetic_energy  | 1048576 |  24.546892  |
|         resnext50_32x4d         |    8    |  23.687055  |
|             hf_GPT2             |    1    |  22.106189  |
|        basic_gnn_edgecnn        |    1    |  20.668467  |
|        phlippe_densenet         |   128   |  20.222354  |
|           tts_angular           |   64    |  19.44831   |
|              dcgan              |   256   |  18.369113  |
|       shufflenet_v2_x1_0        |   64    |  15.631917  |
|           mnasnet1_0            |   32    |  14.871918  |
|       mobilenet_v3_large        |   32    |  12.981058  |
|          basic_gnn_gcn          |    1    |  10.657612  |
|      functorch_dp_cifar10       |   64    |  10.403712  |
|         opacus_cifar10          |   64    |  10.349678  |
|            resnet18             |    8    |  9.380832   |
|          mobilenet_v2           |   16    |  9.355667   |
|              dlrm               |  2048   |  6.306495   |
|         basic_gnn_sage          |    1    |  5.640926   |
|          squeezenet1_1          |   16    |  5.488534   |
|          basic_gnn_gin          |    1    |  5.364772   |
|         phlippe_resnet          |   128   |  4.184868   |
|      doctr_reco_predictor       |    1    |  3.281984   |
|     pyhpc_equation_of_state     | 1048576 |  1.139611   |
|               drq               |    1    |  0.826106   |
|        soft_actor_critic        |   256   |  0.511134   |
|          maml_omniglot          |    5    |  0.490092   |
|     functorch_maml_omniglot     |    1    |  0.426093   |
|          lennard_jones          |  1000   |  0.221387   |
|        timm_efficientdet        |    0    |     0.0     |
|         DALLE2_pytorch          |    0    |     0.0     |
|              moco               |    0    |     0.0     |
+---------------------------------+---------+-------------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            XLNetLMHeadModel             |  8  | 5.510772 |
|     MobileBertForQuestionAnswering      | 128 | 1.823605 |
|      GPT2ForSequenceClassification      |  4  | 1.820679 |
|           ElectraForCausalLM            | 32  | 1.69396  |
|       ElectraForQuestionAnswering       | 64  | 1.646173 |
|          MobileBertForMaskedLM          | 128 | 1.620665 |
|               DistillGPT2               | 16  | 1.512353 |
|            YituTechConvBert             | 16  | 1.451142 |
|      DebertaV2ForQuestionAnswering      |  1  | 1.447027 |
|       RobertaForQuestionAnswering       | 16  | 1.395821 |
|    LayoutLMForSequenceClassification    | 16  | 1.38375  |
|        BertForQuestionAnswering         | 16  | 1.372545 |
|           RobertaForCausalLM            | 16  | 1.369523 |
|               GoogleFnet                | 16  | 1.364177 |
|          AllenaiLongformerBase          |  4  | 1.321561 |
|           LayoutLMForMaskedLM           | 16  | 1.314983 |
|                CamemBert                | 16  | 1.305592 |
|             BertForMaskedLM             | 16  | 1.304649 |
|    MegatronBertForQuestionAnswering     |  8  | 1.279441 |
|         MegatronBertForCausalLM         |  4  | 1.275984 |
|       DebertaForQuestionAnswering       | 16  | 1.256614 |
|     PLBartForConditionalGeneration      |  4  | 1.225823 |
|           DebertaForMaskedLM            |  8  | 1.205856 |
|      MBartForConditionalGeneration      |  2  | 1.187637 |
|                 T5Small                 |  4  | 1.185119 |
|             OPTForCausalLM              |  2  | 1.182613 |
|       MT5ForConditionalGeneration       | 16  | 1.181897 |
|       T5ForConditionalGeneration        |  4  | 1.176093 |
| BlenderbotSmallForConditionalGeneration | 64  | 1.156029 |
|       AlbertForQuestionAnswering        |  4  | 1.137244 |
|            AlbertForMaskedLM            |  4  | 1.131657 |
|          DebertaV2ForMaskedLM           |  2  | 1.107948 |
|          DistilBertForMaskedLM          | 128 | 1.102631 |
|         Speech2Text2ForCausalLM         | 256 | 1.090196 |
|       BlenderbotSmallForCausalLM        | 64  | 1.086478 |
|             XGLMForCausalLM             |  8  | 1.086291 |
|     DistilBertForQuestionAnswering      | 256 | 1.085314 |
|      BartForConditionalGeneration       |  2  | 1.083411 |
|     M2M100ForConditionalGeneration      | 16  | 1.083155 |
|            PLBartForCausalLM            |  8  | 1.054834 |
|     PegasusForConditionalGeneration     | 32  | 1.046404 |
|            MBartForCausalLM             |  4  | 1.045786 |
|            TrOCRForCausalLM             | 32  | 1.045713 |
|           PegasusForCausalLM            | 32  | 1.030797 |
|             BartForCausalLM             |  4  | 1.03003  |
|          BlenderbotForCausalLM          |  4  | 1.028432 |
+-----------------------------------------+-----+----------+
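
A single-number summary of the speedup table above can be computed as the geometric mean of the per-model speedups. The snippet below is a minimal sketch, not part of the dashboard tooling: `parse_ascii_table` and `geomean` are hypothetical helper names, and only a three-row excerpt of the table is embedded for illustration.

```python
import math


def parse_ascii_table(text):
    """Parse a '+---+'-bordered table into a list of dicts keyed by header cell."""
    header = None
    rows = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines such as +-----+-----+
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def geomean(values):
    """Geometric mean of positive numbers."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Three rows copied from the table above; paste the full table to summarize it all.
SAMPLE = """
+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            XLNetLMHeadModel             |  8  | 5.510772 |
|     MobileBertForQuestionAnswering      | 128 | 1.823605 |
|          BlenderbotForCausalLM          |  4  | 1.028432 |
+-----------------------------------------+-----+----------+
"""

rows = parse_ascii_table(SAMPLE)
speedups = [float(row["inductor"]) for row in rows]
print(f"{len(rows)} models, geomean speedup = {geomean(speedups):.3f}")
```

The geometric mean is used here because speedups are ratios; an arithmetic mean would be dominated by outliers such as XLNetLMHeadModel.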

Accuracy

+-----------------------------------------+----+------------------+
|                  name                   | bs |     inductor     |
+-----------------------------------------+----+------------------+
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip |
|                CamemBert                | 1  |       pass       |
|       AlbertForQuestionAnswering        | 1  |       pass       |
|           DebertaForMaskedLM            | 1  |       pass       |
|          AllenaiLongformerBase          | 1  |       pass       |
|             BartForCausalLM             | 1  |       pass       |
|      BartForConditionalGeneration       | 1  |       pass       |
|             BertForMaskedLM             | 1  |       pass       |
|        BertForQuestionAnswering         | 1  |       pass       |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |
|       DebertaForQuestionAnswering       | 1  |       pass       |
|           LayoutLMForMaskedLM           | 1  |       pass       |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |
|          DistilBertForMaskedLM          | 1  |       pass       |
|     DistilBertForQuestionAnswering      | 1  |       pass       |
|               DistillGPT2               | 1  |       pass       |
|           ElectraForCausalLM            | 1  |       pass       |
|       ElectraForQuestionAnswering       | 1  |       pass       |
|      GPT2ForSequenceClassification      | 1  |       pass       |
|               GoogleFnet                | 1  |       pass       |
|    LayoutLMForSequenceClassification    | 1  |       pass       |
|            MBartForCausalLM             | 1  |       pass       |
|            XLNetLMHeadModel             | 1  |       pass       |
|             XGLMForCausalLM             | 1  |       pass       |
|            AlbertForMaskedLM            | 1  |       pass       |
|      MBartForConditionalGeneration      | 1  |       pass       |
|       MT5ForConditionalGeneration       | 1  |       pass       |
|         MegatronBertForCausalLM         | 1  |       pass       |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |
|          MobileBertForMaskedLM          | 1  |       pass       |
|     MobileBertForQuestionAnswering      | 1  |       pass       |
|             OPTForCausalLM              | 1  |       pass       |
|            PLBartForCausalLM            | 1  |       pass       |
|     PLBartForConditionalGeneration      | 1  |       pass       |
|           PegasusForCausalLM            | 1  |       pass       |
|     M2M100ForConditionalGeneration      | 1  |       pass       |
|     PegasusForConditionalGeneration     | 1  |       pass       |
|           RobertaForCausalLM            | 1  |       pass       |
|       RobertaForQuestionAnswering       | 1  |       pass       |
|         Speech2Text2ForCausalLM         | 1  |       pass       |
|       T5ForConditionalGeneration        | 1  |       pass       |
|                 T5Small                 | 1  |       pass       |
|            TrOCRForCausalLM             | 1  |       pass       |
|            YituTechConvBert             | 1  |       pass       |
+-----------------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------------+-----+-----------+
|                  name                   | bs  | inductor  |
+-----------------------------------------+-----+-----------+
|          AllenaiLongformerBase          |  4  | 161.01395 |
|          MobileBertForMaskedLM          | 128 | 80.651817 |
|     MobileBertForQuestionAnswering      | 128 | 80.650202 |
|     PegasusForConditionalGeneration     | 32  | 73.512196 |
|      MBartForConditionalGeneration      |  2  | 73.151846 |
|     M2M100ForConditionalGeneration      | 16  | 71.876208 |
|          BlenderbotForCausalLM          |  4  | 62.200714 |
|             XGLMForCausalLM             |  8  | 60.687586 |
|      BartForConditionalGeneration       |  2  | 59.575826 |
|          DebertaV2ForMaskedLM           |  2  | 58.274433 |
|            XLNetLMHeadModel             |  8  | 57.344761 |
|       MT5ForConditionalGeneration       | 16  | 55.705091 |
| BlenderbotSmallForConditionalGeneration | 64  | 52.222204 |
|         MegatronBertForCausalLM         |  4  | 50.74737  |
|    MegatronBertForQuestionAnswering     |  8  | 50.574658 |
|            YituTechConvBert             | 16  | 49.529369 |
|      DebertaV2ForQuestionAnswering      |  1  | 48.232717 |
|     PLBartForConditionalGeneration      |  4  | 43.49648  |
|       T5ForConditionalGeneration        |  4  | 42.243153 |
|                 T5Small                 |  4  | 42.011399 |
|             OPTForCausalLM              |  2  | 37.237699 |
|            TrOCRForCausalLM             | 32  | 36.850221 |
|       DebertaForQuestionAnswering       | 16  | 36.790962 |
|            MBartForCausalLM             |  4  | 36.779168 |
|           PegasusForCausalLM            | 32  | 36.672921 |
|           DebertaForMaskedLM            |  8  | 36.504945 |
|           ElectraForCausalLM            | 32  | 32.483665 |
|           RobertaForCausalLM            | 16  | 32.477803 |
|       ElectraForQuestionAnswering       | 64  | 31.970015 |
|                CamemBert                | 16  | 31.840625 |
|           LayoutLMForMaskedLM           | 16  | 31.816737 |
|       RobertaForQuestionAnswering       | 16  | 31.797351 |
|        BertForQuestionAnswering         | 16  | 31.723929 |
|             BertForMaskedLM             | 16  | 31.623094 |
|    LayoutLMForSequenceClassification    | 16  | 31.345331 |
|       AlbertForQuestionAnswering        |  4  | 31.018645 |
|      GPT2ForSequenceClassification      |  4  | 30.934568 |
|             BartForCausalLM             |  4  | 30.930969 |
|            AlbertForMaskedLM            |  4  | 30.888291 |
|       BlenderbotSmallForCausalLM        | 64  | 29.420316 |
|     DistilBertForQuestionAnswering      | 256 | 26.832296 |
|          DistilBertForMaskedLM          | 128 | 26.477482 |
|            PLBartForCausalLM            |  8  | 26.015636 |
|         Speech2Text2ForCausalLM         | 256 | 25.80947  |
|               GoogleFnet                | 16  | 25.744492 |
|               DistillGPT2               | 16  | 23.56008  |
+-----------------------------------------+-----+-----------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            AlbertForMaskedLM            |  4  | 0.993732 |
|       AlbertForQuestionAnswering        |  4  | 0.993609 |
|     DistilBertForQuestionAnswering      | 256 | 0.993083 |
|           RobertaForCausalLM            | 16  | 0.992577 |
|            TrOCRForCausalLM             | 32  | 0.991833 |
|          DistilBertForMaskedLM          | 128 | 0.991794 |
|             OPTForCausalLM              |  2  | 0.991544 |
|           ElectraForCausalLM            | 32  | 0.991418 |
|               DistillGPT2               | 16  | 0.991125 |
|               GoogleFnet                | 16  | 0.991033 |
|             BertForMaskedLM             | 16  | 0.990814 |
|            PLBartForCausalLM            |  8  | 0.990808 |
|                CamemBert                | 16  | 0.99078  |
|       ElectraForQuestionAnswering       | 64  | 0.990709 |
|           LayoutLMForMaskedLM           | 16  | 0.990414 |
|            MBartForCausalLM             |  4  | 0.990058 |
|       DebertaForQuestionAnswering       | 16  | 0.988833 |
|            YituTechConvBert             | 16  | 0.988758 |
|        BertForQuestionAnswering         | 16  | 0.988596 |
|       RobertaForQuestionAnswering       | 16  | 0.988513 |
|     PegasusForConditionalGeneration     | 32  | 0.988467 |
|         Speech2Text2ForCausalLM         | 256 | 0.988069 |
| BlenderbotSmallForConditionalGeneration | 64  | 0.987888 |
|     PLBartForConditionalGeneration      |  4  | 0.987289 |
|           PegasusForCausalLM            | 32  | 0.987272 |
|    LayoutLMForSequenceClassification    | 16  | 0.987005 |
|             BartForCausalLM             |  4  | 0.986955 |
|      GPT2ForSequenceClassification      |  4  | 0.986779 |
|      MBartForConditionalGeneration      |  2  | 0.986511 |
|       BlenderbotSmallForCausalLM        | 64  | 0.985891 |
|          BlenderbotForCausalLM          |  4  | 0.985839 |
|         MegatronBertForCausalLM         |  4  | 0.98475  |
|          MobileBertForMaskedLM          | 128 | 0.984735 |
|           DebertaForMaskedLM            |  8  | 0.984166 |
|      BartForConditionalGeneration       |  2  | 0.982254 |
|            XLNetLMHeadModel             |  8  | 0.98204  |
|       MT5ForConditionalGeneration       | 16  | 0.98203  |
|       T5ForConditionalGeneration        |  4  | 0.981705 |
|                 T5Small                 |  4  | 0.981627 |
|    MegatronBertForQuestionAnswering     |  8  | 0.980851 |
|     MobileBertForQuestionAnswering      | 128 | 0.978353 |
|     M2M100ForConditionalGeneration      | 16  | 0.976622 |
|          DebertaV2ForMaskedLM           |  2  | 0.974576 |
|             XGLMForCausalLM             |  8  | 0.972887 |
|          AllenaiLongformerBase          |  4  | 0.972051 |
|      DebertaV2ForQuestionAnswering      |  1  | 0.869405 |
+-----------------------------------------+-----+----------+

Absolute latency (ms)

+-----------------------------------------+-----+-------------+
|                  name                   | bs  |  inductor   |
+-----------------------------------------+-----+-------------+
|            AlbertForMaskedLM            |  4  | 2646.846195 |
|       AlbertForQuestionAnswering        |  4  | 2633.222315 |
|            XLNetLMHeadModel             |  8  | 1284.756559 |
|     PegasusForConditionalGeneration     | 32  | 978.572002  |
|            TrOCRForCausalLM             | 32  | 964.093115  |
|     DistilBertForQuestionAnswering      | 256 |  877.97247  |
|    MegatronBertForQuestionAnswering     |  8  | 774.476911  |
|            MBartForCausalLM             |  4  | 668.949981  |
|      MBartForConditionalGeneration      |  2  | 667.030286  |
|          DistilBertForMaskedLM          | 128 | 664.068627  |
|          BlenderbotForCausalLM          |  4  | 662.894392  |
|           RobertaForCausalLM            | 16  | 598.996112  |
|          DebertaV2ForMaskedLM           |  2  | 597.884094  |
|     M2M100ForConditionalGeneration      | 16  | 592.842735  |
|             OPTForCausalLM              |  2  | 588.958449  |
|      BartForConditionalGeneration       |  2  | 586.886948  |
|            YituTechConvBert             | 16  |  563.56638  |
|                CamemBert                | 16  | 561.344741  |
|             BertForMaskedLM             | 16  | 557.623373  |
|           LayoutLMForMaskedLM           | 16  | 556.287718  |
|          AllenaiLongformerBase          |  4  | 528.786741  |
|       DebertaForQuestionAnswering       | 16  |  519.71066  |
|             BartForCausalLM             |  4  | 518.252698  |
|            PLBartForCausalLM            |  8  | 505.496544  |
|           PegasusForCausalLM            | 32  | 482.413586  |
| BlenderbotSmallForConditionalGeneration | 64  | 480.230535  |
|     PLBartForConditionalGeneration      |  4  | 468.344064  |
|         MegatronBertForCausalLM         |  4  | 446.932714  |
|        BertForQuestionAnswering         | 16  | 442.042175  |
|    LayoutLMForSequenceClassification    | 16  |  440.83545  |
|       RobertaForQuestionAnswering       | 16  | 426.826298  |
|               GoogleFnet                | 16  | 402.733794  |
|          MobileBertForMaskedLM          | 128 | 388.310613  |
|               DistillGPT2               | 16  | 378.920671  |
|             XGLMForCausalLM             |  8  |  374.78421  |
|           DebertaForMaskedLM            |  8  | 362.620739  |
|       ElectraForQuestionAnswering       | 64  |  332.63046  |
|       T5ForConditionalGeneration        |  4  | 329.386123  |
|                 T5Small                 |  4  | 326.946299  |
|         Speech2Text2ForCausalLM         | 256 |  276.42348  |
|       BlenderbotSmallForCausalLM        | 64  | 273.812846  |
|      GPT2ForSequenceClassification      |  4  | 266.720301  |
|           ElectraForCausalLM            | 32  | 251.872649  |
|     MobileBertForQuestionAnswering      | 128 | 240.703184  |
|       MT5ForConditionalGeneration       | 16  | 221.680814  |
|      DebertaV2ForQuestionAnswering      |  1  | 219.979084  |
+-----------------------------------------+-----+-------------+
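
The eager-mode latency is not listed, but it can be estimated from the two tables. The sketch below assumes that the reported speedup equals eager latency divided by inductor latency and that both tables come from the same run at the same batch size; neither is stated explicitly here, so treat the results as rough estimates. `estimated_eager_latency_ms` is a hypothetical helper name.

```python
def estimated_eager_latency_ms(speedup, inductor_latency_ms):
    """Estimate eager latency, assuming speedup = eager_ms / inductor_ms."""
    return speedup * inductor_latency_ms


# (name, speedup, inductor latency in ms) copied from the huggingface tables above.
ROWS = [
    ("XLNetLMHeadModel", 5.510772, 1284.756559),
    ("AlbertForMaskedLM", 1.131657, 2646.846195),
    ("DebertaV2ForQuestionAnswering", 1.447027, 219.979084),
]

for name, speedup, inductor_ms in ROWS:
    eager_ms = estimated_eager_latency_ms(speedup, inductor_ms)
    print(f"{name}: inductor {inductor_ms:.1f} ms, estimated eager {eager_ms:.1f} ms")
```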

timm_models suite with float32 precision

Performance speedup

+---------------------------------+------+----------+
|              name               |  bs  | inductor |
+---------------------------------+------+----------+
|           fbnetc_100            | 512  | 3.952388 |
|           mnasnet_100           | 512  | 3.884393 |
|            lcnet_050            | 256  | 3.858437 |
|         mobilenetv2_100         | 128  | 3.810909 |
|      mobilenetv3_large_100      | 512  | 3.659238 |
|          spnasnet_100           | 128  | 3.57311  |
|            fbnetv3_b            | 256  | 3.461789 |
|           regnety_002           | 1024 | 3.324812 |
|           rexnet_100            | 256  | 3.109822 |
|       tf_efficientnet_b0        | 128  | 2.92876  |
|            tinynet_a            | 128  | 2.843267 |
|        ese_vovnet19b_dw         | 256  | 2.630066 |
|          pnasnet5large          |  16  | 2.607626 |
|          botnet26t_256          | 128  | 2.600749 |
|            hrnet_w18            | 128  | 2.57582  |
|           res2next50            | 128  | 2.433028 |
|          ghostnet_100           | 512  | 2.364234 |
|       eca_botnext26ts_256       | 128  | 2.351986 |
|       gluon_inception_v3        | 256  | 2.325034 |
|           resnest101e           |  64  | 2.266448 |
|          inception_v3           | 128  | 2.26108  |
|        eca_halonext26ts         | 128  | 2.24815  |
|             dla102              | 128  | 2.232192 |
|        adv_inception_v3         | 128  | 2.228278 |
|        res2net50_14w_8s         | 128  | 2.156353 |
|        res2net101_26w_4s        | 128  | 2.145088 |
|            repvgg_a2            | 128  | 2.094623 |
|          cspdarknet53           |  64  | 2.080832 |
|            nfnet_l0             | 128  | 2.044982 |
|        convmixer_768_32         |  32  | 1.983102 |
|            gernet_l             | 128  | 1.922668 |
|           dm_nfnet_f0           | 128  | 1.880829 |
|           tf_mixnet_l           | 128  | 1.871691 |
|           selecsls42b           | 128  | 1.802346 |
|        sebotnet33ts_256         |  64  | 1.785599 |
|            mixnet_l             | 128  | 1.694154 |
|         visformer_small         | 128  | 1.690211 |
|         poolformer_m36          |  64  | 1.683834 |
|           volo_d1_224           |  64  | 1.645582 |
|     swsl_resnext101_32x16d      |  32  | 1.578957 |
|             dpn107              |  64  | 1.49831  |
|            levit_128            | 1024 | 1.469516 |
|           mobilevit_s           |  64  | 1.455712 |
|          gmlp_s16_224           | 128  | 1.261076 |
|          resmlp_12_224          | 128  | 1.203075 |
|      xcit_large_24_p8_224       |  16  | 1.181433 |
|           convit_base           |  64  | 1.169968 |
|          cait_m36_384           |  4   | 1.166095 |
|          gmixer_24_224          | 128  | 1.160239 |
|  swin_base_patch4_window7_224   |  64  | 1.130028 |
|        tnt_s_patch16_224        | 128  | 1.104216 |
|        twins_pcpvt_base         | 128  | 1.093495 |
|          convnext_base          |  64  | 1.063926 |
|          mixer_b16_224          | 128  | 1.062848 |
|      beit_base_patch16_224      |  64  | 1.055868 |
|          jx_nest_base           |  32  | 1.040056 |
|            pit_b_224            |  64  | 1.033449 |
|      vit_base_patch16_224       |  64  | 1.029389 |
| deit_base_distilled_patch16_224 |  64  | 1.02892  |
|         crossvit_9_240          | 256  | 1.006697 |
+---------------------------------+------+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 8  |     pass      |
|           dm_nfnet_f0           | 8  |     pass      |
|            mixnet_l             | 8  |     pass      |
|       eca_botnext26ts_256       | 8  |     pass      |
|         crossvit_9_240          | 8  |     pass      |
| deit_base_distilled_patch16_224 | 8  |     pass      |
|          convnext_base          | 8  |     pass      |
|           regnety_002           | 8  |     pass      |
|           convit_base           | 8  |     pass      |
|          gmlp_s16_224           | 8  |     pass      |
|          cait_m36_384           | 8  |     pass      |
|             dpn107              | 8  |     pass      |
|      vit_base_patch16_224       | 8  |     pass      |
|          cspdarknet53           | 8  |     pass      |
|        eca_halonext26ts         | 8  |     pass      |
|        ese_vovnet19b_dw         | 8  |     pass      |
|           fbnetc_100            | 8  |     pass      |
|            fbnetv3_b            | 8  |     pass      |
|            gernet_l             | 8  |     pass      |
|          botnet26t_256          | 8  |     pass      |
|       gluon_inception_v3        | 8  |     pass      |
|          gmixer_24_224          | 8  |     pass      |
|          jx_nest_base           | 8  |     pass      |
|        convmixer_768_32         | 8  |     pass      |
|        twins_pcpvt_base         | 8  |     pass      |
|         poolformer_m36          | 8  |     pass      |
|            lcnet_050            | 8  |     pass      |
|          mixer_b16_224          | 8  |     pass      |
|        tnt_s_patch16_224        | 8  |     pass      |
|           mnasnet_100           | 8  |     pass      |
|         mobilenetv2_100         | 8  |     pass      |
|      mobilenetv3_large_100      | 8  |     pass      |
|           mobilevit_s           | 8  |     pass      |
|            nfnet_l0             | 8  |     pass      |
|            pit_b_224            | 8  |     pass      |
|      beit_base_patch16_224      | 8  |     pass      |
|            repvgg_a2            | 8  |     pass      |
|  swin_base_patch4_window7_224   | 8  |     pass      |
|          inception_v3           | 8  |     pass      |
|           tf_mixnet_l           | 8  |     pass      |
|       tf_efficientnet_b0        | 8  |     pass      |
|            tinynet_a            | 8  |     pass      |
|          spnasnet_100           | 8  |     pass      |
|        sebotnet33ts_256         | 8  |     pass      |
|          resmlp_12_224          | 8  |     pass      |
|         coat_lite_mini          | 8  |  fail_to_run  |
|           resnest101e           | 8  | fail_accuracy |
|            levit_128            | 8  | fail_accuracy |
|          ghostnet_100           | 8  | fail_accuracy |
|          pnasnet5large          | 8  | fail_accuracy |
|        res2net101_26w_4s        | 8  | fail_accuracy |
|        res2net50_14w_8s         | 8  | fail_accuracy |
|             dla102              | 8  | fail_accuracy |
|     swsl_resnext101_32x16d      | 8  | fail_accuracy |
|           rexnet_100            | 8  | fail_accuracy |
|           selecsls42b           | 8  | fail_accuracy |
|           res2next50            | 8  | fail_accuracy |
|            hrnet_w18            | 8  | fail_accuracy |
|           volo_d1_224           | 8  | fail_accuracy |
|         visformer_small         | 8  | fail_accuracy |
|      xcit_large_24_p8_224       | 8  | fail_accuracy |
+---------------------------------+----+---------------+
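
One way to cross-reference the statuses above with the speedup table is to tally them and keep only the models that pass. This is a minimal sketch over a small excerpt of both tables; `passing_only` is a hypothetical helper, and treating any status that starts with `pass` (including `pass_due_to_skip`) as passing is an assumption.

```python
from collections import Counter

# Excerpt of statuses copied from the accuracy table above.
ACCURACY = {
    "fbnetc_100": "pass",
    "mnasnet_100": "pass",
    "coat_lite_mini": "fail_to_run",
    "resnest101e": "fail_accuracy",
    "levit_128": "fail_accuracy",
}

# Excerpt of values copied from the timm "Performance speedup" table.
SPEEDUP = {
    "fbnetc_100": 3.952388,
    "mnasnet_100": 3.884393,
    "resnest101e": 2.266448,
    "levit_128": 1.469516,
}


def passing_only(speedup, accuracy):
    """Keep speedup entries whose accuracy status starts with 'pass'."""
    return {
        name: value
        for name, value in speedup.items()
        if accuracy.get(name, "").startswith("pass")
    }


status_counts = Counter(ACCURACY.values())
filtered = passing_only(SPEEDUP, ACCURACY)

print(dict(status_counts))
print(sorted(filtered))
```

Models missing from the accuracy mapping are dropped as well, which is the conservative choice when a status is unknown.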

Compilation latency (sec)

+---------------------------------+------+------------+
|              name               |  bs  |  inductor  |
+---------------------------------+------+------------+
|          pnasnet5large          |  16  | 495.459254 |
|            hrnet_w18            | 128  | 277.921957 |
|          cait_m36_384           |  4   | 105.37828  |
|      xcit_large_24_p8_224       |  16  | 99.741911  |
|  swin_base_patch4_window7_224   |  64  |  99.15661  |
|        res2net101_26w_4s        | 128  |  90.90981  |
|           tf_mixnet_l           | 128  | 90.577519  |
|            mixnet_l             | 128  | 80.449282  |
|        res2net50_14w_8s         | 128  | 77.357209  |
|           mobilevit_s           |  64  | 76.838648  |
|         poolformer_m36          |  64  | 75.262533  |
|           resnest101e           |  64  | 74.919887  |
|        tnt_s_patch16_224        | 128  |  70.1829   |
|          jx_nest_base           |  32  | 69.093196  |
|        twins_pcpvt_base         | 128  | 68.465047  |
|             dpn107              |  64  | 65.869148  |
|           volo_d1_224           |  64  | 56.752828  |
|            fbnetv3_b            | 256  | 55.540977  |
|        eca_halonext26ts         | 128  | 51.408354  |
|          gmixer_24_224          | 128  | 49.068429  |
|            levit_128            | 1024 | 48.528028  |
|         crossvit_9_240          | 256  | 47.342976  |
|          gmlp_s16_224           | 128  | 47.311111  |
|          convnext_base          |  64  | 46.782066  |
|           convit_base           |  64  | 44.410066  |
|        sebotnet33ts_256         |  64  |  43.54762  |
|        adv_inception_v3         | 128  |  41.20362  |
|          inception_v3           | 128  |  41.17818  |
|             dla102              | 128  | 40.174795  |
|          ghostnet_100           | 512  | 40.023392  |
|       gluon_inception_v3        | 256  | 39.904596  |
|           res2next50            | 128  | 39.354153  |
|            tinynet_a            | 128  | 38.280659  |
|           rexnet_100            | 256  | 37.584606  |
|           dm_nfnet_f0           | 128  | 37.461004  |
|       tf_efficientnet_b0        | 128  | 36.734505  |
|     swsl_resnext101_32x16d      |  32  | 36.561838  |
|         visformer_small         | 128  | 36.372352  |
|       eca_botnext26ts_256       | 128  | 35.883121  |
|        convmixer_768_32         |  32  | 35.420126  |
|            nfnet_l0             | 128  | 34.420295  |
|          botnet26t_256          | 128  | 32.747864  |
|            pit_b_224            |  64  | 31.863616  |
|          mixer_b16_224          | 128  | 31.466835  |
|      beit_base_patch16_224      |  64  | 30.705143  |
| deit_base_distilled_patch16_224 |  64  | 29.601395  |
|      vit_base_patch16_224       |  64  | 29.528564  |
|          cspdarknet53           |  64  |  29.3034   |
|           regnety_002           | 1024 | 27.988828  |
|      mobilenetv3_large_100      | 512  | 27.892883  |
|          spnasnet_100           | 128  | 24.714522  |
|          resmlp_12_224          | 128  | 24.685431  |
|         mobilenetv2_100         | 128  | 24.670793  |
|           fbnetc_100            | 512  | 23.785131  |
|            gernet_l             | 128  | 23.481778  |
|            repvgg_a2            | 128  | 23.279658  |
|        ese_vovnet19b_dw         | 256  | 22.812278  |
|            lcnet_050            | 256  | 22.229583  |
|           mnasnet_100           | 512  | 21.203556  |
|           selecsls42b           | 128  | 20.769957  |
+---------------------------------+------+------------+
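
Compilation latency is a one-time cost, so it helps to ask how many iterations are needed before the per-iteration savings pay it back. The sketch below assumes that speedup equals eager latency divided by inductor latency, so each iteration saves `inductor_ms * (speedup - 1)`; this definition and the helper name `break_even_iterations` are assumptions, not something the dashboard reports. The inputs are copied from the timm compilation latency, speedup, and absolute latency tables.

```python
import math


def break_even_iterations(compile_sec, speedup, inductor_ms):
    """Iterations needed for the saved time to cover the compile time.

    Assumes speedup = eager_ms / inductor_ms. Returns math.inf when
    there is no per-iteration saving (speedup <= 1).
    """
    saved_ms = inductor_ms * (speedup - 1.0)
    if saved_ms <= 0:
        return math.inf
    return math.ceil(compile_sec * 1000.0 / saved_ms)


# (name, compile latency in s, speedup, inductor latency in ms)
ROWS = [
    ("pnasnet5large", 495.459254, 2.607626, 338.647611),
    ("hrnet_w18", 277.921957, 2.57582, 430.453063),
]

for name, compile_sec, speedup, inductor_ms in ROWS:
    n = break_even_iterations(compile_sec, speedup, inductor_ms)
    print(f"{name}: about {n} iterations to amortize compilation")
```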

Peak Memory Compression Ratio

+---------------------------------+------+----------+
|              name               |  bs  | inductor |
+---------------------------------+------+----------+
|        ese_vovnet19b_dw         | 256  | 0.997634 |
|           fbnetc_100            | 512  | 0.997048 |
|           mnasnet_100           | 512  | 0.996621 |
|      mobilenetv3_large_100      | 512  | 0.996247 |
|            fbnetv3_b            | 256  | 0.996066 |
|           regnety_002           | 1024 | 0.995844 |
|          ghostnet_100           | 512  | 0.99574  |
|          convnext_base          |  64  | 0.995706 |
|           dm_nfnet_f0           | 128  | 0.995212 |
|            levit_128            | 1024 | 0.994929 |
|        res2net101_26w_4s        | 128  | 0.994744 |
|       eca_botnext26ts_256       | 128  | 0.994193 |
|        eca_halonext26ts         | 128  | 0.994178 |
|             dpn107              |  64  | 0.994123 |
|       gluon_inception_v3        | 256  | 0.993945 |
|           rexnet_100            | 256  | 0.993701 |
|             dla102              | 128  | 0.993518 |
|           res2next50            | 128  | 0.993317 |
|          gmlp_s16_224           | 128  | 0.993177 |
|          mixer_b16_224          | 128  | 0.99317  |
|           tf_mixnet_l           | 128  | 0.993145 |
|        res2net50_14w_8s         | 128  | 0.993104 |
|      xcit_large_24_p8_224       |  16  | 0.993002 |
|        convmixer_768_32         |  32  | 0.992878 |
|          botnet26t_256          | 128  | 0.992768 |
|           convit_base           |  64  | 0.992752 |
|            mixnet_l             | 128  | 0.992735 |
|          gmixer_24_224          | 128  | 0.992461 |
|         visformer_small         | 128  | 0.992434 |
|       tf_efficientnet_b0        | 128  | 0.991951 |
|          pnasnet5large          |  16  | 0.991767 |
|            gernet_l             | 128  | 0.99166  |
|        twins_pcpvt_base         | 128  | 0.991527 |
|           resnest101e           |  64  | 0.991481 |
|        sebotnet33ts_256         |  64  | 0.990775 |
|      beit_base_patch16_224      |  64  | 0.990616 |
|           mobilevit_s           |  64  | 0.990508 |
|            nfnet_l0             | 128  | 0.990424 |
|         mobilenetv2_100         | 128  | 0.989634 |
|           selecsls42b           | 128  | 0.989615 |
|          spnasnet_100           | 128  | 0.98931  |
|        tnt_s_patch16_224        | 128  | 0.989098 |
|          resmlp_12_224          | 128  | 0.989084 |
|            pit_b_224            |  64  | 0.988733 |
|          cait_m36_384           |  4   | 0.988655 |
|         poolformer_m36          |  64  | 0.988413 |
|  swin_base_patch4_window7_224   |  64  | 0.988367 |
|          inception_v3           | 128  | 0.988199 |
|        adv_inception_v3         | 128  | 0.988189 |
| deit_base_distilled_patch16_224 |  64  | 0.988067 |
|            tinynet_a            | 128  | 0.987783 |
|      vit_base_patch16_224       |  64  | 0.987674 |
|     swsl_resnext101_32x16d      |  32  | 0.987611 |
|            hrnet_w18            | 128  | 0.986795 |
|            lcnet_050            | 256  | 0.985402 |
|            repvgg_a2            | 128  | 0.984322 |
|          jx_nest_base           |  32  | 0.983575 |
|           volo_d1_224           |  64  | 0.983544 |
|          cspdarknet53           |  64  | 0.977757 |
|         crossvit_9_240          | 256  | 0.973647 |
+---------------------------------+------+----------+

Absolute latency (ms)

+---------------------------------+------+-------------+
|              name               |  bs  |  inductor   |
+---------------------------------+------+-------------+
|      xcit_large_24_p8_224       |  16  | 1459.606559 |
|          convnext_base          |  64  |  1146.603   |
|          cait_m36_384           |  4   | 1087.346187 |
|          mixer_b16_224          | 128  | 1025.801989 |
|           dm_nfnet_f0           | 128  | 930.818047  |
|           convit_base           |  64  | 929.008464  |
|             dpn107              |  64  | 912.644734  |
|  swin_base_patch4_window7_224   |  64  | 822.454356  |
|        twins_pcpvt_base         | 128  | 818.163343  |
|        tnt_s_patch16_224        | 128  | 812.855775  |
|       gluon_inception_v3        | 256  | 793.322942  |
| deit_base_distilled_patch16_224 |  64  | 681.902398  |
|      vit_base_patch16_224       |  64  | 679.606736  |
|      beit_base_patch16_224      |  64  | 672.240119  |
|        res2net101_26w_4s        | 128  | 629.780914  |
|     swsl_resnext101_32x16d      |  32  | 627.040886  |
|            nfnet_l0             | 128  | 601.374926  |
|            levit_128            | 1024 | 559.451488  |
|          gmlp_s16_224           | 128  | 555.958576  |
|          gmixer_24_224          | 128  | 552.858474  |
|            pit_b_224            |  64  | 546.970362  |
|        ese_vovnet19b_dw         | 256  | 543.380825  |
|             dla102              | 128  | 515.909782  |
|          jx_nest_base           |  32  | 514.443077  |
|         crossvit_9_240          | 256  | 503.020774  |
|           resnest101e           |  64  | 471.817802  |
|         poolformer_m36          |  64  |  461.68006  |
|        convmixer_768_32         |  32  | 433.274718  |
|            hrnet_w18            | 128  | 430.453063  |
|           volo_d1_224           |  64  | 425.747378  |
|          inception_v3           | 128  | 398.866326  |
|        adv_inception_v3         | 128  | 398.435494  |
|        res2net50_14w_8s         | 128  | 390.132011  |
|         visformer_small         | 128  | 381.801008  |
|            mixnet_l             | 128  | 360.432386  |
|          ghostnet_100           | 512  | 355.034371  |
|           res2next50            | 128  | 350.885226  |
|           tf_mixnet_l           | 128  | 348.222591  |
|          pnasnet5large          |  16  | 338.647611  |
|            repvgg_a2            | 128  | 327.597603  |
|        eca_halonext26ts         | 128  | 307.635036  |
|           fbnetc_100            | 512  | 303.034262  |
|       eca_botnext26ts_256       | 128  |  287.72456  |
|            gernet_l             | 128  | 286.466391  |
|           regnety_002           | 1024 | 276.180951  |
|        sebotnet33ts_256         |  64  | 275.865391  |
|          botnet26t_256          | 128  | 271.902815  |
|          resmlp_12_224          | 128  | 259.200148  |
|           mobilevit_s           |  64  | 256.976974  |
|           mnasnet_100           | 512  | 255.956321  |
|          cspdarknet53           |  64  | 255.068093  |
|            fbnetv3_b            | 256  | 240.567863  |
|      mobilenetv3_large_100      | 512  | 226.631603  |
|           selecsls42b           | 128  | 226.140866  |
|           rexnet_100            | 256  | 223.104845  |
|       tf_efficientnet_b0        | 128  | 118.756751  |
|            tinynet_a            | 128  |  82.692174  |
|         mobilenetv2_100         | 128  |  71.664885  |
|          spnasnet_100           | 128  |  65.072174  |
|            lcnet_050            | 256  |   26.8179   |
+---------------------------------+------+-------------+

@zxd1997066
Contributor

[cppwrapper_dynamic_shape] Performance Dashboard for float32 precision -- Single-core Single-thread (2024-04-28 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. For training, each experiment runs one iteration of the forward and backward passes; for inference, it runs the forward pass only. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. Experimental setup does not have optimizer.

SW information:

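+-------------------+--------+------------------------------------------+
| SW                | Branch | Commit                                   |
+-------------------+--------+------------------------------------------+
| Pytorch           | main   | 7478b7f1cac9686f00edf3db4667cf86d2421531 |
| Torchbench        | main   | d6015d42                                 |
| torchaudio        | main   | 2.2.0a0+ea437b3                          |
| torchtext         | main   | 0.16.0a0+b0ebddc                         |
| torchvision       | main   | 0.19.0a0+2c4665f                         |
| torchdata         | main   | 0.7.1a0+0790338                          |
| dynamo_benchmarks | main   | nightly                                  |
+-------------------+--------+------------------------------------------+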

HW information

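+------------------+----------------------------------------------------------------+
| Item             | Value                                                          |
+------------------+----------------------------------------------------------------+
| Manufacturer     | Amazon EC2                                                     |
| Product Name     | c6i.16xlarge                                                   |
| CPU Model        | Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz                  |
| Installed Memory | 128GB (1x128GB DDR4 3200 MT/s [Unknown])                       |
| OS               | Ubuntu 22.04.2 LTS                                             |
| Kernel           | 5.19.0-1022-aws                                                |
| Microcode        | 0xd000389                                                      |
| GCC              | gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0                      |
| GLIBC            | ldd (Ubuntu GLIBC 2.35-0ubuntu3.1) 2.35                        |
| Binutils         | GNU ld (GNU Binutils for Ubuntu) 2.38                          |
| Python           | Python 3.10.6                                                  |
| OpenSSL          | OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022) |
+------------------+----------------------------------------------------------------+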

Test command

export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
export OMP_NUM_THREADS=1

python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--core_list 0 --ncores_per_instance 1" --devices=cpu --dtypes=float32 --inference --compilers=inductor --batch_size=1 --threads 1 --extra-args="--timeout 9000"
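
For scripted runs, the same invocation could be assembled in Python. This is a minimal sketch, not part of the dashboard tooling: the function name `build_benchmark_invocation` and its `conda_prefix` parameter are hypothetical (the parameter stands in for `$CONDA_PREFIX`), and only the environment variables and flags quoted in the test command above are used.

```python
import os


def build_benchmark_invocation(conda_prefix: str):
    """Return (env, argv) mirroring the shell test command.

    `conda_prefix` is a hypothetical parameter standing in for $CONDA_PREFIX.
    """
    env = dict(os.environ)
    env.update({
        # Preload Intel OpenMP and jemalloc, as in the exported LD_PRELOAD.
        "LD_PRELOAD": f"{conda_prefix}/lib/libiomp5.so:{conda_prefix}/lib/libjemalloc.so",
        "MALLOC_CONF": (
            "oversize_threshold:1,background_thread:true,"
            "metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
        ),
        "TORCHINDUCTOR_FREEZING": "1",
        "OMP_NUM_THREADS": "1",  # single-core, single-thread configuration
    })
    argv = [
        "python", "benchmarks/dynamo/runner.py",
        "--enable_cpu_launcher",
        "--cpu_launcher_args", "--core_list 0 --ncores_per_instance 1",
        "--devices=cpu", "--dtypes=float32", "--inference",
        "--compilers=inductor", "--batch_size=1",
        "--threads", "1",
        "--extra-args=--timeout 9000",
    ]
    return env, argv


env, argv = build_benchmark_invocation("/opt/conda")
print(" ".join(argv))
```

The returned pair can be handed to `subprocess.run(argv, env=env, check=True)` from the root of a PyTorch checkout; `/opt/conda` is only a placeholder path.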

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 86%, 68/79 | 100%, 46/46 | 75%, 45/60  |
+----------+------------+-------------+-------------+

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.57x    |    1.20x    |    1.53x    |
+----------+------------+-------------+-------------+
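
As a reminder of what a geometric mean speedup summarizes, here is a minimal sketch in Python. The helper name `geomean_speedup` is hypothetical, and skipping non-positive entries (such as the 0.0 placeholders for models that failed to load) is an assumption about how the dashboard filters its inputs. The sample values are a handful of rows from the torchbench table, so the result is illustrative and is not expected to match the 1.57x reported above.

```python
import math


def geomean_speedup(speedups):
    """Geometric mean of per-model speedups.

    Entries <= 0 are skipped (assumption: they mark models that did not run).
    """
    valid = [s for s in speedups if s > 0]
    if not valid:
        raise ValueError("no positive speedups to average")
    return math.exp(sum(math.log(s) for s in valid) / len(valid))


# A few rows from the torchbench speedup table, for illustration only.
sample = [56.191561, 3.672301, 1.001201, 0.58683, 0.0]
print(f"{geomean_speedup(sample):.2f}x")
```

The geometric mean is used rather than the arithmetic mean because speedups are ratios: a 2x gain and a 0.5x regression should cancel out to 1x.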

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   55.07    |    36.98    |    33.94    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.86x    |    0.81x    |    0.82x    |
+----------+------------+-------------+-------------+
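
To read these ratios, it can help to invert them. The sketch below assumes the ratio is defined as eager peak memory divided by compiled peak memory, which is consistent with "higher is better" but is not stated explicitly in this comment; the helper name is hypothetical.

```python
def compiled_peak_relative_to_eager(compression_ratio: float) -> float:
    """Compiled peak memory as a multiple of eager peak memory.

    Assumes compression_ratio = eager_peak / compiled_peak.
    """
    if compression_ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return 1.0 / compression_ratio


# Suite-level ratios from the table above.
suite_ratios = {"torchbench": 0.86, "huggingface": 0.81, "timm_models": 0.82}
for suite, ratio in suite_ratios.items():
    multiple = compiled_peak_relative_to_eager(ratio)
    print(f"{suite}: compiled peak is about {multiple:.2f}x the eager peak")
```

Under that assumption, a ratio below 1.0 means the compiled run peaks higher than eager mode.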

torchbench suite with float32 precision

Performance speedup

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_isoneutral_mixing     |    1    | 56.191561 |
|     pyhpc_equation_of_state     |    1    | 29.599591 |
|          squeezenet1_1          |    1    | 3.672301  |
|         basic_gnn_sage          |    1    | 3.609804  |
|     functorch_maml_omniglot     |    1    | 3.593359  |
|          basic_gnn_gin          |    1    | 3.483006  |
|          maml_omniglot          |    5    | 2.907588  |
|          basic_gnn_gcn          |    1    | 2.846698  |
|           timm_nfnet            |    1    | 2.798752  |
|         opacus_cifar10          |    1    | 2.506073  |
|       shufflenet_v2_x1_0        |    1    |  2.4283   |
|      functorch_dp_cifar10       |    1    | 2.331282  |
|              dcgan              |    1    | 2.284399  |
|            resnet18             |    1    | 2.243594  |
|          lennard_jones          |    1    | 2.154372  |
|          mobilenet_v2           |    1    | 2.044815  |
|          timm_resnest           |    1    | 2.016659  |
|           mnasnet1_0            |    1    | 1.921744  |
|       mobilenet_v3_large        |    1    | 1.897801  |
|        phlippe_densenet         |    1    | 1.877703  |
|         phlippe_resnet          |    1    | 1.859631  |
|            resnet50             |    1    | 1.795846  |
|           densenet121           |    1    | 1.787696  |
|        timm_efficientnet        |    1    | 1.714058  |
|            resnet152            |    1    | 1.676032  |
|         LearningToPaint         |    1    | 1.609237  |
|           timm_vovnet           |    1    | 1.607276  |
|              llama              |    1    | 1.559957  |
|              dlrm               |    1    | 1.511886  |
|      doctr_reco_predictor       |    1    | 1.507336  |
|         resnext50_32x4d         |    1    |  1.50056  |
|           timm_regnet           |    1    |  1.47564  |
|              vgg16              |    1    | 1.447845  |
|        basic_gnn_edgecnn        |    1    |  1.39794  |
|             yolov3              |    1    | 1.376217  |
|             alexnet             |    1    | 1.365335  |
|       doctr_det_predictor       |    1    |  1.3073   |
|          BERT_pytorch           |    1    | 1.306172  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1.300593  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1.294998  |
|               drq               |    1    | 1.288766  |
|            hf_Albert            |    1    |  1.26478  |
|              maml               |    1    | 1.247563  |
|             hf_GPT2             |    1    | 1.241112  |
|          hf_GPT2_large          |    1    | 1.221313  |
|            moondream            |    1    | 1.215726  |
|     timm_vision_transformer     |    1    | 1.212806  |
|          fastNLP_Bert           |    1    | 1.194639  |
|         pytorch_stargan         |   16    | 1.190242  |
|  timm_vision_transformer_large  |    1    | 1.167155  |
|          hf_Bert_large          |    1    | 1.158865  |
|             hf_Bert             |    1    | 1.152114  |
|           hf_BigBird            |    1    | 1.149456  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 1.148891  |
|          hf_DistilBert          |    1    | 1.138217  |
|      torch_multimodal_clip      |    1    |  1.10789  |
|             hf_Bart             |    1    | 1.092562  |
|       speech_transformer        |    1    | 1.073361  |
|        hf_distil_whisper        |    1    | 1.067372  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1.065108  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1.057734  |
|        soft_actor_critic        |   256   |  1.04521  |
|          pytorch_unet           |    1    | 1.044531  |
|    detectron2_fcos_r_50_fpn     |    1    | 1.038579  |
|          hf_Longformer          |    1    | 1.011818  |
|             demucs              |    1    | 1.001201  |
|           tts_angular           |    1    | 1.000008  |
|     resnet50_quantized_qat      |    1    | 0.994901  |
|   mobilenet_v2_quantized_qat    |    1    | 0.985912  |
|     nvidia_deeprecommender      |    1    | 0.929633  |
| pyhpc_turbulent_kinetic_energy  | 1048576 |  0.87649  |
|           hf_Reformer           |    1    | 0.844847  |
|       Background_Matting        |    1    | 0.809305  |
|           hf_T5_large           |    1    |  0.7828   |
|              hf_T5              |    1    | 0.704225  |
|           hf_T5_base            |    1    |  0.58683  |
|        timm_efficientdet        |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+
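
Tables in this format are easy to post-process. The following sketch parses a bordered table into records and lists the models that run slower than eager mode. The function name `parse_ascii_table` is hypothetical, the excerpt is a few rows copied from the table above, and treating 0.0 rows as "did not run" rather than as regressions is an assumption.

```python
def parse_ascii_table(text):
    """Parse a '+---+' bordered table into a list of dicts keyed by header."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip border lines and blanks
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


excerpt = """
+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_isoneutral_mixing     |    1    | 56.191561 |
|     resnet50_quantized_qat      |    1    | 0.994901  |
|           hf_T5_base            |    1    |  0.58683  |
|              moco               |    0    |    0.0    |
+---------------------------------+---------+-----------+
"""

records = parse_ascii_table(excerpt)
# Slower than eager: 0 < speedup < 1. Rows reported as 0.0 are skipped.
regressions = [r["name"] for r in records if 0.0 < float(r["inductor"]) < 1.0]
print(regressions)
```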

Accuracy

+---------------------------------+---------+--------------------+
|              name               |   bs    |      inductor      |
+---------------------------------+---------+--------------------+
|           hf_T5_large           |    1    |  pass_due_to_skip  |
|       Background_Matting        |    1    |  pass_due_to_skip  |
|  timm_vision_transformer_large  |    1    |  pass_due_to_skip  |
|              maml               |    1    |  pass_due_to_skip  |
|          hf_GPT2_large          |    1    |  pass_due_to_skip  |
|         basic_gnn_sage          |    1    |        pass        |
|          hf_Longformer          |    1    |        pass        |
| detectron2_fasterrcnn_r_101_dc5 |    1    |        pass        |
| detectron2_fasterrcnn_r_101_c4  |    1    |        pass        |
|             demucs              |    1    |        pass        |
|              dcgan              |    1    |        pass        |
|        basic_gnn_edgecnn        |    1    |        pass        |
|          basic_gnn_gin          |    1    |        pass        |
|          basic_gnn_gcn          |    1    |        pass        |
|    detectron2_fcos_r_50_fpn     |    1    |        pass        |
|         LearningToPaint         |    1    |        pass        |
|       doctr_det_predictor       |    1    |        pass        |
|           hf_Reformer           |    1    |        pass        |
| detectron2_fasterrcnn_r_50_dc5  |    1    |        pass        |
|  detectron2_fasterrcnn_r_50_c4  |    1    |        pass        |
|              dlrm               |    1    |        pass        |
|           hf_T5_base            |    1    |        pass        |
|      doctr_reco_predictor       |    1    |        pass        |
|            hf_Albert            |    1    |        pass        |
|             hf_GPT2             |    1    |        pass        |
|          hf_DistilBert          |    1    |        pass        |
|           hf_BigBird            |    1    |        pass        |
|          hf_Bert_large          |    1    |        pass        |
|             hf_Bert             |    1    |        pass        |
|             hf_Bart             |    1    |        pass        |
|               drq               |    1    |        pass        |
|             yolov3              |    1    |        pass        |
|      functorch_dp_cifar10       |    1    |        pass        |
|          fastNLP_Bert           |    1    |        pass        |
|     functorch_maml_omniglot     |    1    |        pass        |
|          lennard_jones          |    1    |        pass        |
|             alexnet             |    1    |        pass        |
|          pytorch_unet           |    1    |        pass        |
|  pytorch_CycleGAN_and_pix2pix   |    1    |        pass        |
|          BERT_pytorch           |    1    |        pass        |
|            moondream            |    1    |        pass        |
|     pyhpc_equation_of_state     |    1    |        pass        |
|         phlippe_resnet          |    1    |        pass        |
|        phlippe_densenet         |    1    |        pass        |
|         opacus_cifar10          |    1    |        pass        |
|     nvidia_deeprecommender      |    1    |        pass        |
|     resnet50_quantized_qat      |    1    |        pass        |
|       mobilenet_v3_large        |    1    |        pass        |
|   mobilenet_v2_quantized_qat    |    1    |        pass        |
| pyhpc_turbulent_kinetic_energy  | 1048576 |        pass        |
|              llama              |    1    |        pass        |
|        hf_distil_whisper        |    1    |        pass        |
|     pyhpc_isoneutral_mixing     |    1    |        pass        |
|          maml_omniglot          |    5    |        pass        |
|              hf_T5              |    1    |        pass        |
|              vgg16              |    1    |        pass        |
|           tts_angular           |    1    |        pass        |
|      torch_multimodal_clip      |    1    |        pass        |
|           timm_vovnet           |    1    |        pass        |
|     timm_vision_transformer     |    1    |        pass        |
|          timm_resnest           |    1    |        pass        |
|           timm_nfnet            |    1    |        pass        |
|         pytorch_stargan         |   16    |        pass        |
|        timm_efficientnet        |    1    |        pass        |
|          squeezenet1_1          |    1    |        pass        |
|       speech_transformer        |    1    |        pass        |
|        soft_actor_critic        |   256   |        pass        |
|           timm_regnet           |    1    |        pass        |
|         DALLE2_pytorch          |    0    | model_fail_to_load |
|              moco               |    0    | model_fail_to_load |
|        timm_efficientdet        |    0    | model_fail_to_load |
| detectron2_fasterrcnn_r_101_fpn |    1    |    fail_to_run     |
|         vision_maskrcnn         |    1    |    fail_to_run     |
|           Super_SloMo           |    1    |    fail_to_run     |
| detectron2_fasterrcnn_r_50_fpn  |    1    |    fail_to_run     |
|         resnext50_32x4d         |    1    |   fail_accuracy    |
|            resnet152            |    1    |   fail_accuracy    |
|       shufflenet_v2_x1_0        |    1    |   fail_accuracy    |
|           mnasnet1_0            |    1    |   fail_accuracy    |
|          mobilenet_v2           |    1    |   fail_accuracy    |
|            resnet18             |    1    |   fail_accuracy    |
|            resnet50             |    1    |   fail_accuracy    |
|           densenet121           |    1    |   fail_accuracy    |
+---------------------------------+---------+--------------------+
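
The status column can be tallied to cross-check the summary. In this sketch the per-status counts are transcribed by hand from the accuracy table above, and counting `pass_due_to_skip` as passing is an assumption; it does reproduce the 68 in the torchbench Passrate entry (68/79). How the denominator of 79 is derived is not stated in the source and is not modeled here.

```python
from collections import Counter

# Status counts transcribed from the torchbench accuracy table.
statuses = (
    ["pass_due_to_skip"] * 5
    + ["pass"] * 63
    + ["model_fail_to_load"] * 3
    + ["fail_to_run"] * 4
    + ["fail_accuracy"] * 8
)

tally = Counter(statuses)
# Assumption: both "pass" and "pass_due_to_skip" count toward the pass rate.
passing = tally["pass"] + tally["pass_due_to_skip"]

for status, count in tally.most_common():
    print(f"{status:>20}: {count}")
print(f"passing models: {passing}")
```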

Compilation latency (sec)

+---------------------------------+---------+------------+
|              name               |   bs    |  inductor  |
+---------------------------------+---------+------------+
|           hf_BigBird            |    1    | 474.770497 |
|    detectron2_fcos_r_50_fpn     |    1    | 410.808613 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 223.823572 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 218.584823 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 207.677013 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 203.162144 |
|              maml               |    1    | 143.367765 |
|           hf_T5_large           |    1    | 110.994543 |
|           hf_T5_base            |    1    | 105.894506 |
|       speech_transformer        |    1    | 89.799056  |
|          hf_Longformer          |    1    | 84.226326  |
|           hf_Reformer           |    1    | 80.585431  |
|          basic_gnn_gcn          |    1    | 61.581089  |
|           densenet121           |    1    | 56.300751  |
|          fastNLP_Bert           |    1    | 54.145254  |
|            resnet152            |    1    |  52.53662  |
|  timm_vision_transformer_large  |    1    | 49.144924  |
|       doctr_det_predictor       |    1    | 44.953027  |
|          hf_GPT2_large          |    1    | 43.021362  |
|          hf_Bert_large          |    1    | 42.256751  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 42.176367  |
|            moondream            |    1    | 41.656171  |
|        hf_distil_whisper        |    1    |  40.50431  |
|      torch_multimodal_clip      |    1    | 36.347449  |
|             demucs              |    1    | 34.681521  |
|           timm_regnet           |    1    | 33.030003  |
|           timm_nfnet            |    1    | 32.016997  |
|              hf_T5              |    1    | 31.392825  |
|             hf_Bart             |    1    | 30.274784  |
|       Background_Matting        |    1    | 29.938999  |
|             yolov3              |    1    | 28.925248  |
|          BERT_pytorch           |    1    |  27.9014   |
|        timm_efficientnet        |    1    | 27.848488  |
|             hf_Bert             |    1    | 26.301712  |
|        phlippe_densenet         |    1    | 25.796213  |
|      doctr_reco_predictor       |    1    | 25.592925  |
|       shufflenet_v2_x1_0        |    1    | 25.029842  |
|            hf_Albert            |    1    | 24.499825  |
|             hf_GPT2             |    1    | 23.943083  |
|         pytorch_stargan         |   16    | 23.601412  |
|       mobilenet_v3_large        |    1    |  23.17579  |
|              llama              |    1    | 23.042124  |
|     timm_vision_transformer     |    1    |  22.97059  |
|         opacus_cifar10          |    1    | 21.410342  |
|         resnext50_32x4d         |    1    | 21.268478  |
|            resnet50             |    1    | 21.243381  |
|           timm_vovnet           |    1    | 21.132548  |
|          timm_resnest           |    1    | 20.833065  |
|          mobilenet_v2           |    1    | 20.640428  |
|          hf_DistilBert          |    1    | 20.438238  |
|           mnasnet1_0            |    1    |  20.3382   |
|     pyhpc_isoneutral_mixing     |    1    | 20.232531  |
|      functorch_dp_cifar10       |    1    | 20.090538  |
|          pytorch_unet           |    1    | 19.126926  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 18.408245  |
|          squeezenet1_1          |    1    | 17.603644  |
|            resnet18             |    1    | 17.002953  |
|     pyhpc_equation_of_state     |    1    | 16.990044  |
|         LearningToPaint         |    1    | 16.871567  |
|              vgg16              |    1    |  16.77236  |
|         phlippe_resnet          |    1    | 16.692788  |
|             alexnet             |    1    | 16.386024  |
|               drq               |    1    | 15.792763  |
|     functorch_maml_omniglot     |    1    | 15.610817  |
|          maml_omniglot          |    5    | 15.525511  |
|     nvidia_deeprecommender      |    1    | 15.377347  |
|              dlrm               |    1    | 14.962855  |
|              dcgan              |    1    |  14.72219  |
|          basic_gnn_gin          |    1    | 14.414776  |
|        basic_gnn_edgecnn        |    1    | 14.401619  |
|         basic_gnn_sage          |    1    | 14.373441  |
|        soft_actor_critic        |   256   | 14.361048  |
|          lennard_jones          |    1    | 13.925968  |
|           tts_angular           |    1    | 13.724201  |
|   mobilenet_v2_quantized_qat    |    1    |  0.096001  |
|     resnet50_quantized_qat      |    1    |  0.07013   |
|        timm_efficientdet        |    0    |    0.0     |
|              moco               |    0    |    0.0     |
|         DALLE2_pytorch          |    0    |    0.0     |
+---------------------------------+---------+------------+

Peak Memory Compression Ratio

+---------------------------------+---------+----------+
|              name               |   bs    | inductor |
+---------------------------------+---------+----------+
|           hf_T5_base            |    1    | 0.987976 |
|              dlrm               |    1    | 0.987904 |
|       Background_Matting        |    1    | 0.982802 |
|             demucs              |    1    | 0.982643 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 0.982567 |
|          pytorch_unet           |    1    | 0.978331 |
|          hf_GPT2_large          |    1    | 0.973866 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 0.972163 |
|        basic_gnn_edgecnn        |    1    | 0.970332 |
|       doctr_det_predictor       |    1    | 0.969457 |
|    detectron2_fcos_r_50_fpn     |    1    | 0.961627 |
|     resnet50_quantized_qat      |    1    | 0.955938 |
|         LearningToPaint         |    1    | 0.948411 |
|           hf_BigBird            |    1    | 0.945747 |
|         pytorch_stargan         |   16    | 0.943683 |
|      doctr_reco_predictor       |    1    | 0.94193  |
|          basic_gnn_gin          |    1    | 0.938456 |
|         basic_gnn_sage          |    1    | 0.935578 |
|          basic_gnn_gcn          |    1    | 0.933907 |
|   mobilenet_v2_quantized_qat    |    1    | 0.929206 |
|              llama              |    1    | 0.91953  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 0.917431 |
|      torch_multimodal_clip      |    1    | 0.899875 |
|        hf_distil_whisper        |    1    | 0.89363  |
|        soft_actor_critic        |   256   | 0.887324 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 0.886176 |
|           tts_angular           |    1    | 0.885609 |
|         opacus_cifar10          |    1    | 0.882948 |
|        timm_efficientnet        |    1    | 0.87258  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 0.870925 |
|          mobilenet_v2           |    1    | 0.865801 |
|          lennard_jones          |    1    | 0.860412 |
|           mnasnet1_0            |    1    | 0.858787 |
|          maml_omniglot          |    5    | 0.858531 |
|          squeezenet1_1          |    1    | 0.857728 |
|          timm_resnest           |    1    | 0.855282 |
|     functorch_maml_omniglot     |    1    | 0.854995 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.851944 |
|          fastNLP_Bert           |    1    | 0.847661 |
|       mobilenet_v3_large        |    1    | 0.845903 |
|              dcgan              |    1    | 0.844601 |
|       shufflenet_v2_x1_0        |    1    | 0.839385 |
|         phlippe_resnet          |    1    | 0.834066 |
|     pyhpc_equation_of_state     |    1    | 0.83013  |
|       speech_transformer        |    1    | 0.821441 |
|        phlippe_densenet         |    1    | 0.818715 |
|           timm_nfnet            |    1    | 0.813917 |
|         resnext50_32x4d         |    1    | 0.812014 |
|     pyhpc_isoneutral_mixing     |    1    | 0.809988 |
|           hf_T5_large           |    1    | 0.807228 |
|          hf_Bert_large          |    1    | 0.806483 |
|          hf_Longformer          |    1    | 0.805321 |
|            hf_Albert            |    1    | 0.801764 |
|     timm_vision_transformer     |    1    | 0.799052 |
|             hf_Bert             |    1    | 0.797286 |
|            moondream            |    1    | 0.796221 |
|              maml               |    1    | 0.791957 |
|             yolov3              |    1    | 0.779805 |
|          BERT_pytorch           |    1    | 0.777728 |
|          hf_DistilBert          |    1    | 0.776699 |
|           densenet121           |    1    | 0.774784 |
|            resnet50             |    1    | 0.769429 |
|             hf_GPT2             |    1    | 0.765627 |
|           timm_regnet           |    1    | 0.761363 |
|               drq               |    1    | 0.75941  |
|            resnet18             |    1    | 0.758696 |
|             hf_Bart             |    1    | 0.758139 |
|           timm_vovnet           |    1    | 0.758086 |
|              hf_T5              |    1    | 0.750754 |
|      functorch_dp_cifar10       |    1    | 0.742879 |
|           hf_Reformer           |    1    | 0.741965 |
|             alexnet             |    1    | 0.738651 |
|  timm_vision_transformer_large  |    1    | 0.732516 |
|              vgg16              |    1    | 0.719484 |
|            resnet152            |    1    | 0.691569 |
|     nvidia_deeprecommender      |    1    | 0.672272 |
|        timm_efficientdet        |    0    |   0.0    |
|              moco               |    0    |   0.0    |
|         DALLE2_pytorch          |    0    |   0.0    |
+---------------------------------+---------+----------+

Absolute latency (ms)

+---------------------------------+---------+--------------+
|              name               |   bs    |   inductor   |
+---------------------------------+---------+--------------+
|           hf_T5_base            |    1    | 26196.909723 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 11567.884678 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 10983.294564 |
|          hf_GPT2_large          |    1    | 10031.17065  |
|           hf_T5_large           |    1    | 7435.202884  |
|            moondream            |    1    | 7307.206381  |
|        hf_distil_whisper        |    1    | 6910.664777  |
|       Background_Matting        |    1    | 6834.642809  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 5608.384747  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 4979.620995  |
|          pytorch_unet           |    1    |  4806.40363  |
|  timm_vision_transformer_large  |    1    |  2779.50541  |
|    detectron2_fcos_r_50_fpn     |    1    | 2512.823959  |
|             demucs              |    1    | 2155.850937  |
|         pytorch_stargan         |   16    | 2004.840891  |
|          hf_Bert_large          |    1    | 1753.069766  |
|       doctr_det_predictor       |    1    | 1706.103518  |
|           hf_BigBird            |    1    | 1457.283861  |
|      torch_multimodal_clip      |    1    | 1261.523703  |
|          hf_Longformer          |    1    | 1104.311123  |
|             hf_Bart             |    1    |  880.526896  |
|              hf_T5              |    1    |  759.692829  |
|             hf_Bert             |    1    |  674.359036  |
|       speech_transformer        |    1    |  666.676777  |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  628.014834  |
|            hf_Albert            |    1    |  569.875112  |
|          fastNLP_Bert           |    1    |  521.464578  |
|             yolov3              |    1    |  425.091135  |
|          hf_DistilBert          |    1    |  413.133486  |
|           hf_Reformer           |    1    |  408.09122   |
|             hf_GPT2             |    1    |  353.562696  |
|        basic_gnn_edgecnn        |    1    |  225.602724  |
| pyhpc_turbulent_kinetic_energy  | 1048576 |  208.245809  |
|              vgg16              |    1    |  188.016978  |
|           timm_regnet           |    1    |  147.369495  |
|          BERT_pytorch           |    1    |  138.422522  |
|            resnet152            |    1    |  134.375077  |
|           timm_nfnet            |    1    |  94.349243   |
|           timm_vovnet           |    1    |  78.403929   |
|              maml               |    1    |   73.33697   |
|     timm_vision_transformer     |    1    |  57.632684   |
|     nvidia_deeprecommender      |    1    |  57.001363   |
|         resnext50_32x4d         |    1    |  55.988204   |
|           tts_angular           |    1    |  50.214243   |
|            resnet50             |    1    |  49.471628   |
|           densenet121           |    1    |  41.251744   |
|          basic_gnn_gcn          |    1    |  34.663503   |
|          timm_resnest           |    1    |   32.36937   |
|      doctr_reco_predictor       |    1    |  22.743092   |
|            resnet18             |    1    |  21.656792   |
|             alexnet             |    1    |  21.517529   |
|              llama              |    1    |  19.843364   |
|     resnet50_quantized_qat      |    1    |  17.915748   |
|          basic_gnn_gin          |    1    |  16.083034   |
|         basic_gnn_sage          |    1    |  15.853184   |
|        timm_efficientnet        |    1    |  12.157734   |
|         LearningToPaint         |    1    |   9.484408   |
|           mnasnet1_0            |    1    |   7.000135   |
|          mobilenet_v2           |    1    |   6.837552   |
|   mobilenet_v2_quantized_qat    |    1    |   6.797219   |
|       mobilenet_v3_large        |    1    |   6.463553   |
|          squeezenet1_1          |    1    |   5.417961   |
|       shufflenet_v2_x1_0        |    1    |   4.69953    |
|        soft_actor_critic        |   256   |   3.353769   |
|        phlippe_densenet         |    1    |   2.678728   |
|      functorch_dp_cifar10       |    1    |   2.133693   |
|         opacus_cifar10          |    1    |   2.090726   |
|               drq               |    1    |   1.752174   |
|              dcgan              |    1    |   1.597653   |
|         phlippe_resnet          |    1    |   1.187294   |
|     functorch_maml_omniglot     |    1    |   0.798728   |
|          maml_omniglot          |    5    |   0.74576    |
|              dlrm               |    1    |   0.512985   |
|     pyhpc_isoneutral_mixing     |    1    |   0.048012   |
|     pyhpc_equation_of_state     |    1    |   0.036439   |
|          lennard_jones          |    1    |   0.031538   |
|              moco               |    0    |     0.0      |
|         DALLE2_pytorch          |    0    |     0.0      |
|        timm_efficientdet        |    0    |     0.0      |
+---------------------------------+---------+--------------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|          MobileBertForMaskedLM          | 1  | 2.144507 |
|     MobileBertForQuestionAnswering      | 1  | 1.729305 |
|            XLNetLMHeadModel             | 1  |  1.3812  |
|         Speech2Text2ForCausalLM         | 1  | 1.366808 |
|      GPT2ForSequenceClassification      | 1  | 1.302406 |
|          DistilBertForMaskedLM          | 1  | 1.298827 |
|       BlenderbotSmallForCausalLM        | 1  | 1.298316 |
| BlenderbotSmallForConditionalGeneration | 1  | 1.296557 |
|            YituTechConvBert             | 1  | 1.291725 |
|       DebertaForQuestionAnswering       | 1  | 1.290944 |
|     DistilBertForQuestionAnswering      | 1  | 1.285006 |
|           DebertaForMaskedLM            | 1  | 1.255148 |
|       MT5ForConditionalGeneration       | 1  | 1.250383 |
|     M2M100ForConditionalGeneration      | 1  | 1.250151 |
|          BlenderbotForCausalLM          | 1  | 1.244697 |
|     PegasusForConditionalGeneration     | 1  | 1.241261 |
|           PegasusForCausalLM            | 1  | 1.232838 |
|             XGLMForCausalLM             | 1  | 1.228677 |
|               GoogleFnet                | 1  | 1.213367 |
|            AlbertForMaskedLM            | 1  | 1.196531 |
|       AlbertForQuestionAnswering        | 1  | 1.196267 |
|           ElectraForCausalLM            | 1  | 1.188228 |
|      DebertaV2ForQuestionAnswering      | 1  | 1.172365 |
|               DistillGPT2               | 1  | 1.170047 |
|         MegatronBertForCausalLM         | 1  | 1.165743 |
|        BertForQuestionAnswering         | 1  | 1.16537  |
|    MegatronBertForQuestionAnswering     | 1  | 1.16457  |
|          DebertaV2ForMaskedLM           | 1  | 1.164294 |
|       RobertaForQuestionAnswering       | 1  | 1.157736 |
|           RobertaForCausalLM            | 1  | 1.151564 |
|           LayoutLMForMaskedLM           | 1  | 1.150944 |
|             BertForMaskedLM             | 1  | 1.149539 |
|            TrOCRForCausalLM             | 1  | 1.148396 |
|    LayoutLMForSequenceClassification    | 1  | 1.145165 |
|                CamemBert                | 1  | 1.140417 |
|       ElectraForQuestionAnswering       | 1  | 1.139313 |
|     PLBartForConditionalGeneration      | 1  | 1.075575 |
|      MBartForConditionalGeneration      | 1  | 1.064006 |
|             BartForCausalLM             | 1  | 1.058616 |
|      BartForConditionalGeneration       | 1  | 1.057183 |
|             OPTForCausalLM              | 1  | 1.027033 |
|            PLBartForCausalLM            | 1  | 1.025707 |
|            MBartForCausalLM             | 1  | 1.01223  |
|          AllenaiLongformerBase          | 1  | 0.965358 |
|                 T5Small                 | 1  | 0.611378 |
|       T5ForConditionalGeneration        | 1  | 0.611289 |
+-----------------------------------------+----+----------+

Accuracy

+-----------------------------------------+----+------------------+
|                  name                   | bs |     inductor     |
+-----------------------------------------+----+------------------+
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip |
|                CamemBert                | 1  |       pass       |
|       AlbertForQuestionAnswering        | 1  |       pass       |
|           DebertaForMaskedLM            | 1  |       pass       |
|          AllenaiLongformerBase          | 1  |       pass       |
|             BartForCausalLM             | 1  |       pass       |
|      BartForConditionalGeneration       | 1  |       pass       |
|             BertForMaskedLM             | 1  |       pass       |
|        BertForQuestionAnswering         | 1  |       pass       |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |
|       DebertaForQuestionAnswering       | 1  |       pass       |
|           LayoutLMForMaskedLM           | 1  |       pass       |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |
|          DistilBertForMaskedLM          | 1  |       pass       |
|     DistilBertForQuestionAnswering      | 1  |       pass       |
|               DistillGPT2               | 1  |       pass       |
|           ElectraForCausalLM            | 1  |       pass       |
|       ElectraForQuestionAnswering       | 1  |       pass       |
|      GPT2ForSequenceClassification      | 1  |       pass       |
|               GoogleFnet                | 1  |       pass       |
|    LayoutLMForSequenceClassification    | 1  |       pass       |
|            MBartForCausalLM             | 1  |       pass       |
|            XLNetLMHeadModel             | 1  |       pass       |
|             XGLMForCausalLM             | 1  |       pass       |
|            AlbertForMaskedLM            | 1  |       pass       |
|      MBartForConditionalGeneration      | 1  |       pass       |
|       MT5ForConditionalGeneration       | 1  |       pass       |
|         MegatronBertForCausalLM         | 1  |       pass       |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |
|          MobileBertForMaskedLM          | 1  |       pass       |
|     MobileBertForQuestionAnswering      | 1  |       pass       |
|             OPTForCausalLM              | 1  |       pass       |
|            PLBartForCausalLM            | 1  |       pass       |
|     PLBartForConditionalGeneration      | 1  |       pass       |
|           PegasusForCausalLM            | 1  |       pass       |
|     M2M100ForConditionalGeneration      | 1  |       pass       |
|     PegasusForConditionalGeneration     | 1  |       pass       |
|           RobertaForCausalLM            | 1  |       pass       |
|       RobertaForQuestionAnswering       | 1  |       pass       |
|         Speech2Text2ForCausalLM         | 1  |       pass       |
|       T5ForConditionalGeneration        | 1  |       pass       |
|                 T5Small                 | 1  |       pass       |
|            TrOCRForCausalLM             | 1  |       pass       |
|            YituTechConvBert             | 1  |       pass       |
+-----------------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------------+----+------------+
|                  name                   | bs |  inductor  |
+-----------------------------------------+----+------------+
|          MobileBertForMaskedLM          | 1  | 120.032952 |
|     MobileBertForQuestionAnswering      | 1  | 119.022626 |
|          AllenaiLongformerBase          | 1  | 91.705526  |
|      BartForConditionalGeneration       | 1  | 56.029726  |
|     PegasusForConditionalGeneration     | 1  | 49.309155  |
|     M2M100ForConditionalGeneration      | 1  |  49.02957  |
|      MBartForConditionalGeneration      | 1  | 48.997697  |
|          BlenderbotForCausalLM          | 1  | 47.591202  |
|             XGLMForCausalLM             | 1  | 45.953655  |
|            XLNetLMHeadModel             | 1  | 45.912344  |
|         MegatronBertForCausalLM         | 1  | 43.356689  |
|          DebertaV2ForMaskedLM           | 1  | 43.164039  |
|      DebertaV2ForQuestionAnswering      | 1  | 43.060354  |
|    MegatronBertForQuestionAnswering     | 1  | 43.004287  |
|       MT5ForConditionalGeneration       | 1  | 41.735918  |
| BlenderbotSmallForConditionalGeneration | 1  |  37.99942  |
|                 T5Small                 | 1  | 35.988632  |
|       T5ForConditionalGeneration        | 1  | 35.930358  |
|            YituTechConvBert             | 1  | 35.345325  |
|     PLBartForConditionalGeneration      | 1  |  31.52859  |
|            MBartForCausalLM             | 1  | 28.526369  |
|            TrOCRForCausalLM             | 1  | 28.456626  |
|             OPTForCausalLM              | 1  | 28.098838  |
|           PegasusForCausalLM            | 1  |  27.98899  |
|           ElectraForCausalLM            | 1  | 26.803388  |
|           LayoutLMForMaskedLM           | 1  | 26.620117  |
|           RobertaForCausalLM            | 1  | 26.609499  |
|                CamemBert                | 1  |  26.57865  |
|       ElectraForQuestionAnswering       | 1  | 26.562747  |
|       RobertaForQuestionAnswering       | 1  | 26.545229  |
|             BertForMaskedLM             | 1  | 26.459566  |
|        BertForQuestionAnswering         | 1  | 26.346667  |
|           DebertaForMaskedLM            | 1  | 26.318779  |
|    LayoutLMForSequenceClassification    | 1  | 26.249207  |
|       DebertaForQuestionAnswering       | 1  | 26.236647  |
|             BartForCausalLM             | 1  | 25.884689  |
|      GPT2ForSequenceClassification      | 1  | 22.961453  |
|       BlenderbotSmallForCausalLM        | 1  | 22.900092  |
|               GoogleFnet                | 1  | 21.386115  |
|            PLBartForCausalLM            | 1  | 21.184279  |
|          DistilBertForMaskedLM          | 1  | 21.002147  |
|     DistilBertForQuestionAnswering      | 1  | 20.864018  |
|         Speech2Text2ForCausalLM         | 1  | 20.518038  |
|               DistillGPT2               | 1  | 19.135186  |
|            AlbertForMaskedLM            | 1  | 18.382055  |
|       AlbertForQuestionAnswering        | 1  | 17.963982  |
+-----------------------------------------+----+------------+

Peak Memory Compression Ratio

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|             OPTForCausalLM              | 1  | 0.986858 |
|      MBartForConditionalGeneration      | 1  | 0.976906 |
|      GPT2ForSequenceClassification      | 1  | 0.953748 |
|          AllenaiLongformerBase          | 1  | 0.94867  |
|            MBartForCausalLM             | 1  | 0.924819 |
|            XLNetLMHeadModel             | 1  | 0.910783 |
|     PLBartForConditionalGeneration      | 1  | 0.905148 |
|                 T5Small                 | 1  | 0.90514  |
|       T5ForConditionalGeneration        | 1  | 0.904926 |
|            PLBartForCausalLM            | 1  | 0.90408  |
|       DebertaForQuestionAnswering       | 1  | 0.873051 |
|               GoogleFnet                | 1  | 0.855642 |
|       RobertaForQuestionAnswering       | 1  | 0.849057 |
|       ElectraForQuestionAnswering       | 1  | 0.841788 |
|    LayoutLMForSequenceClassification    | 1  | 0.840897 |
|        BertForQuestionAnswering         | 1  | 0.840303 |
|      DebertaV2ForQuestionAnswering      | 1  | 0.835321 |
|    MegatronBertForQuestionAnswering     | 1  | 0.832182 |
|               DistillGPT2               | 1  | 0.828327 |
|           DebertaForMaskedLM            | 1  | 0.815856 |
|           LayoutLMForMaskedLM           | 1  | 0.813654 |
|         Speech2Text2ForCausalLM         | 1  | 0.812792 |
|           RobertaForCausalLM            | 1  | 0.811546 |
|         MegatronBertForCausalLM         | 1  | 0.810718 |
|                CamemBert                | 1  | 0.808638 |
|             BertForMaskedLM             | 1  | 0.806918 |
|           ElectraForCausalLM            | 1  | 0.805006 |
|            YituTechConvBert             | 1  | 0.801227 |
|     DistilBertForQuestionAnswering      | 1  | 0.80017  |
|          BlenderbotForCausalLM          | 1  | 0.798237 |
|          DebertaV2ForMaskedLM           | 1  | 0.796965 |
|             BartForCausalLM             | 1  | 0.793964 |
|       MT5ForConditionalGeneration       | 1  | 0.787741 |
|            TrOCRForCausalLM             | 1  | 0.786705 |
|      BartForConditionalGeneration       | 1  | 0.778227 |
|       BlenderbotSmallForCausalLM        | 1  | 0.763612 |
|           PegasusForCausalLM            | 1  | 0.750139 |
|          DistilBertForMaskedLM          | 1  | 0.745852 |
| BlenderbotSmallForConditionalGeneration | 1  | 0.736823 |
|     MobileBertForQuestionAnswering      | 1  | 0.731196 |
|     PegasusForConditionalGeneration     | 1  | 0.715542 |
|     M2M100ForConditionalGeneration      | 1  | 0.705035 |
|          MobileBertForMaskedLM          | 1  | 0.701732 |
|             XGLMForCausalLM             | 1  | 0.697753 |
|            AlbertForMaskedLM            | 1  | 0.447797 |
|       AlbertForQuestionAnswering        | 1  | 0.442715 |
+-----------------------------------------+----+----------+

Absolute latency (ms)

+-----------------------------------------+----+--------------+
|                  name                   | bs |   inductor   |
+-----------------------------------------+----+--------------+
|            AlbertForMaskedLM            | 1  | 12718.422196 |
|       AlbertForQuestionAnswering        | 1  | 12678.53272  |
|      MBartForConditionalGeneration      | 1  | 6101.789945  |
|      BartForConditionalGeneration       | 1  | 5694.716895  |
|             OPTForCausalLM              | 1  | 5174.190512  |
|          DebertaV2ForMaskedLM           | 1  | 5034.246952  |
|      DebertaV2ForQuestionAnswering      | 1  | 3951.844118  |
|            XLNetLMHeadModel             | 1  | 3092.230518  |
|            MBartForCausalLM             | 1  | 3019.465758  |
|          BlenderbotForCausalLM          | 1  | 2616.816748  |
|             BartForCausalLM             | 1  | 2552.822487  |
|                 T5Small                 | 1  | 2472.539461  |
|       T5ForConditionalGeneration        | 1  | 2469.993618  |
|          AllenaiLongformerBase          | 1  | 2396.315765  |
|     PLBartForConditionalGeneration      | 1  | 2177.828528  |
|         MegatronBertForCausalLM         | 1  | 2028.144752  |
|    MegatronBertForQuestionAnswering     | 1  | 1851.199026  |
|      GPT2ForSequenceClassification      | 1  | 1307.926782  |
|            PLBartForCausalLM            | 1  | 1205.958065  |
|             XGLMForCausalLM             | 1  |  829.331677  |
|           DebertaForMaskedLM            | 1  |  785.216572  |
|           RobertaForCausalLM            | 1  |  778.160448  |
|     M2M100ForConditionalGeneration      | 1  |  708.625557  |
|                CamemBert                | 1  |  697.592102  |
|           LayoutLMForMaskedLM           | 1  |  685.39113   |
|             BertForMaskedLM             | 1  |  683.351743  |
|            YituTechConvBert             | 1  |  681.17796   |
|     PegasusForConditionalGeneration     | 1  |  599.102247  |
|            TrOCRForCausalLM             | 1  |  583.480383  |
|       DebertaForQuestionAnswering       | 1  |  554.961169  |
|        BertForQuestionAnswering         | 1  |  543.78092   |
|       RobertaForQuestionAnswering       | 1  |  543.635728  |
|    LayoutLMForSequenceClassification    | 1  |  541.794892  |
|               DistillGPT2               | 1  |  503.991208  |
|               GoogleFnet                | 1  |  470.679256  |
|       MT5ForConditionalGeneration       | 1  |  297.869543  |
|           PegasusForCausalLM            | 1  |  297.661061  |
| BlenderbotSmallForConditionalGeneration | 1  |  143.454061  |
|           ElectraForCausalLM            | 1  |  132.991081  |
|          DistilBertForMaskedLM          | 1  |  98.636709   |
|       ElectraForQuestionAnswering       | 1  |  93.497692   |
|       BlenderbotSmallForCausalLM        | 1  |  83.190226   |
|     DistilBertForQuestionAnswering      | 1  |   63.43946   |
|          MobileBertForMaskedLM          | 1  |  62.692236   |
|     MobileBertForQuestionAnswering      | 1  |  36.506838   |
|         Speech2Text2ForCausalLM         | 1  |  18.088095   |
+-----------------------------------------+----+--------------+
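
The speedup and absolute-latency tables can be combined to estimate what native PyTorch took per iteration. The following is a minimal sketch, assuming that speedup is defined as eager latency divided by inductor latency (the summary only says speedup is normalized against native PyTorch). The helper `parse_ascii_table` is hypothetical and not part of the benchmark tooling; the three rows are copied from the tables above.

```python
def parse_ascii_table(text):
    """Parse a '+---+' bordered table into a list of dicts keyed by header."""
    rows, header = [], None
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +-----+ border lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' row is the header
        else:
            rows.append(dict(zip(header, cells)))
    return rows


SPEEDUP = """
+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|          MobileBertForMaskedLM          | 1  | 2.144507 |
|          AllenaiLongformerBase          | 1  | 0.965358 |
|                 T5Small                 | 1  | 0.611378 |
+-----------------------------------------+----+----------+
"""

LATENCY_MS = """
+-----------------------------------------+----+--------------+
|                  name                   | bs |   inductor   |
+-----------------------------------------+----+--------------+
|                 T5Small                 | 1  | 2472.539461  |
|          AllenaiLongformerBase          | 1  | 2396.315765  |
|          MobileBertForMaskedLM          | 1  |  62.692236   |
+-----------------------------------------+----+--------------+
"""

speedup = {r["name"]: float(r["inductor"]) for r in parse_ascii_table(SPEEDUP)}
latency = {r["name"]: float(r["inductor"]) for r in parse_ascii_table(LATENCY_MS)}

# Assumption: speedup = eager_latency / inductor_latency,
# so eager_latency = speedup * inductor_latency.
implied_eager_ms = {name: speedup[name] * latency[name] for name in speedup}

for name, ms in implied_eager_ms.items():
    print(f"{name}: ~{ms:.1f} ms eager (implied)")
```

Under that assumption, a speedup below 1.0 (for example T5Small) means the implied eager latency is lower than the inductor latency, i.e. a regression.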

timm_models suite with float32 precision

Performance speedup

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          pnasnet5large          | 1  | 2.438565 |
|          inception_v3           | 1  | 2.308357 |
|       gluon_inception_v3        | 1  | 2.26615  |
|        adv_inception_v3         | 1  | 2.241109 |
|            nfnet_l0             | 1  | 2.215267 |
|           dm_nfnet_f0           | 1  | 2.212433 |
|         mobilenetv2_100         | 1  | 2.140964 |
|          ghostnet_100           | 1  | 2.118815 |
|          spnasnet_100           | 1  | 2.058835 |
|            levit_128            | 1  | 2.054534 |
|            lcnet_050            | 1  | 2.020239 |
|           mnasnet_100           | 1  | 2.011081 |
|           fbnetc_100            | 1  | 1.99156  |
|            hrnet_w18            | 1  | 1.988617 |
|            repvgg_a2            | 1  | 1.981623 |
|      mobilenetv3_large_100      | 1  | 1.963036 |
|           regnety_002           | 1  | 1.913447 |
|            fbnetv3_b            | 1  | 1.816423 |
|       tf_efficientnet_b0        | 1  | 1.791008 |
|           selecsls42b           | 1  | 1.782373 |
|           rexnet_100            | 1  | 1.770335 |
|             dla102              | 1  | 1.727665 |
|        ese_vovnet19b_dw         | 1  | 1.701149 |
|       eca_botnext26ts_256       | 1  | 1.678613 |
|          botnet26t_256          | 1  | 1.668669 |
|            tinynet_a            | 1  | 1.651359 |
|           resnest101e           | 1  | 1.637764 |
|          cspdarknet53           | 1  | 1.622124 |
|        eca_halonext26ts         | 1  | 1.560703 |
|           res2next50            | 1  | 1.550184 |
|        res2net50_14w_8s         | 1  | 1.546854 |
|         poolformer_m36          | 1  | 1.53804  |
|        res2net101_26w_4s        | 1  | 1.499462 |
|           mobilevit_s           | 1  | 1.492835 |
|           volo_d1_224           | 1  | 1.492783 |
|           tf_mixnet_l           | 1  | 1.453737 |
|         visformer_small         | 1  | 1.42957  |
|           convit_base           | 1  | 1.371577 |
|          gmixer_24_224          | 1  | 1.371143 |
|     swsl_resnext101_32x16d      | 1  | 1.348304 |
|        twins_pcpvt_base         | 1  | 1.333138 |
|            gernet_l             | 1  | 1.308863 |
|            mixnet_l             | 1  | 1.298338 |
|          resmlp_12_224          | 1  | 1.282873 |
|  swin_base_patch4_window7_224   | 1  | 1.274929 |
|      beit_base_patch16_224      | 1  | 1.273017 |
|        convmixer_768_32         | 1  | 1.254603 |
|             dpn107              | 1  | 1.210575 |
|         crossvit_9_240          | 1  | 1.201489 |
| deit_base_distilled_patch16_224 | 1  | 1.197246 |
|      vit_base_patch16_224       | 1  | 1.195988 |
|          mixer_b16_224          | 1  | 1.189784 |
|      xcit_large_24_p8_224       | 1  | 1.184612 |
|          jx_nest_base           | 1  | 1.174839 |
|        tnt_s_patch16_224        | 1  | 1.172103 |
|          gmlp_s16_224           | 1  | 1.161902 |
|            pit_b_224            | 1  | 1.161716 |
|          convnext_base          | 1  | 1.143382 |
|        sebotnet33ts_256         | 1  | 1.101194 |
|          cait_m36_384           | 1  | 0.991849 |
+---------------------------------+----+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 1  |     pass      |
|           dm_nfnet_f0           | 1  |     pass      |
|            mixnet_l             | 1  |     pass      |
|       eca_botnext26ts_256       | 1  |     pass      |
|         crossvit_9_240          | 1  |     pass      |
| deit_base_distilled_patch16_224 | 1  |     pass      |
|          convnext_base          | 1  |     pass      |
|           regnety_002           | 1  |     pass      |
|           convit_base           | 1  |     pass      |
|          gmlp_s16_224           | 1  |     pass      |
|          cait_m36_384           | 1  |     pass      |
|             dpn107              | 1  |     pass      |
|      vit_base_patch16_224       | 1  |     pass      |
|          cspdarknet53           | 1  |     pass      |
|        eca_halonext26ts         | 1  |     pass      |
|        ese_vovnet19b_dw         | 1  |     pass      |
|           fbnetc_100            | 1  |     pass      |
|            fbnetv3_b            | 1  |     pass      |
|            gernet_l             | 1  |     pass      |
|          botnet26t_256          | 1  |     pass      |
|       gluon_inception_v3        | 1  |     pass      |
|          gmixer_24_224          | 1  |     pass      |
|          jx_nest_base           | 1  |     pass      |
|        convmixer_768_32         | 1  |     pass      |
|        twins_pcpvt_base         | 1  |     pass      |
|         poolformer_m36          | 1  |     pass      |
|            lcnet_050            | 1  |     pass      |
|          mixer_b16_224          | 1  |     pass      |
|        tnt_s_patch16_224        | 1  |     pass      |
|           mnasnet_100           | 1  |     pass      |
|         mobilenetv2_100         | 1  |     pass      |
|      mobilenetv3_large_100      | 1  |     pass      |
|           mobilevit_s           | 1  |     pass      |
|            nfnet_l0             | 1  |     pass      |
|            pit_b_224            | 1  |     pass      |
|      beit_base_patch16_224      | 1  |     pass      |
|            repvgg_a2            | 1  |     pass      |
|  swin_base_patch4_window7_224   | 1  |     pass      |
|          inception_v3           | 1  |     pass      |
|           tf_mixnet_l           | 1  |     pass      |
|       tf_efficientnet_b0        | 1  |     pass      |
|            tinynet_a            | 1  |     pass      |
|          spnasnet_100           | 1  |     pass      |
|        sebotnet33ts_256         | 1  |     pass      |
|          resmlp_12_224          | 1  |     pass      |
|         coat_lite_mini          | 1  |  fail_to_run  |
|           resnest101e           | 1  | fail_accuracy |
|            levit_128            | 1  | fail_accuracy |
|          ghostnet_100           | 1  | fail_accuracy |
|          pnasnet5large          | 1  | fail_accuracy |
|        res2net101_26w_4s        | 1  | fail_accuracy |
|        res2net50_14w_8s         | 1  | fail_accuracy |
|             dla102              | 1  | fail_accuracy |
|     swsl_resnext101_32x16d      | 1  | fail_accuracy |
|           rexnet_100            | 1  | fail_accuracy |
|           selecsls42b           | 1  | fail_accuracy |
|           res2next50            | 1  | fail_accuracy |
|            hrnet_w18            | 1  | fail_accuracy |
|           volo_d1_224           | 1  | fail_accuracy |
|         visformer_small         | 1  | fail_accuracy |
|      xcit_large_24_p8_224       | 1  | fail_accuracy |
+---------------------------------+----+---------------+
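
Several models marked `fail_accuracy` here (for example pnasnet5large and levit_128) also appear in the speedup table above, so a reader aggregating these numbers may want to drop them first. The following is a rough sketch of one way to do that. It rests on two assumptions that the report does not confirm: that `pass_due_to_skip` counts as passing, and that the geometric mean is the aggregate of interest. The five rows are copied from the timm tables above.

```python
import math

# Speedup and accuracy status for a few timm models, copied from the tables.
speedup = {
    "pnasnet5large": 2.438565,
    "inception_v3": 2.308357,
    "levit_128": 2.054534,
    "vit_base_patch16_224": 1.195988,
    "cait_m36_384": 0.991849,
}
accuracy = {
    "pnasnet5large": "fail_accuracy",
    "inception_v3": "pass",
    "levit_128": "fail_accuracy",
    "vit_base_patch16_224": "pass",
    "cait_m36_384": "pass",
}

# Assumption: both of these statuses count as a passing accuracy check.
PASSING = {"pass", "pass_due_to_skip"}


def geomean(values):
    """Geometric mean of positive numbers, computed in log space."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Keep only models whose accuracy check passed.
kept = {name: s for name, s in speedup.items() if accuracy.get(name) in PASSING}

print(f"all models:     {len(speedup)} entries, geomean {geomean(speedup.values()):.3f}")
print(f"passing models: {len(kept)} entries, geomean {geomean(kept.values()):.3f}")
```

For this small sample, filtering lowers the aggregate because the two removed models have the highest or near-highest speedups.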

Compilation latency (sec)

+---------------------------------+----+------------+
|              name               | bs |  inductor  |
+---------------------------------+----+------------+
|          pnasnet5large          | 1  | 482.184614 |
|            hrnet_w18            | 1  | 263.215205 |
|        res2net101_26w_4s        | 1  | 78.655752  |
|           tf_mixnet_l           | 1  | 71.457458  |
|           resnest101e           | 1  | 70.296897  |
|            mixnet_l             | 1  | 67.943105  |
|          cait_m36_384           | 1  | 66.404378  |
|        res2net50_14w_8s         | 1  | 63.859601  |
|        twins_pcpvt_base         | 1  | 60.390066  |
|      xcit_large_24_p8_224       | 1  | 57.725947  |
|         poolformer_m36          | 1  | 55.961703  |
|  swin_base_patch4_window7_224   | 1  | 55.489283  |
|             dpn107              | 1  | 50.456311  |
|        tnt_s_patch16_224        | 1  |  45.21752  |
|          jx_nest_base           | 1  | 44.473884  |
|           mobilevit_s           | 1  | 42.936526  |
|            fbnetv3_b            | 1  | 42.092633  |
|          convnext_base          | 1  | 37.571136  |
|             dla102              | 1  | 36.638368  |
|          inception_v3           | 1  | 36.390601  |
|        adv_inception_v3         | 1  | 36.332648  |
|       gluon_inception_v3        | 1  | 36.319665  |
|          gmlp_s16_224           | 1  | 34.687811  |
|           volo_d1_224           | 1  | 34.549127  |
|          ghostnet_100           | 1  | 34.258216  |
|          gmixer_24_224          | 1  | 34.012827  |
|           res2next50            | 1  | 33.856715  |
|         crossvit_9_240          | 1  | 32.686803  |
|            tinynet_a            | 1  | 32.093968  |
|           dm_nfnet_f0           | 1  |  31.75281  |
|     swsl_resnext101_32x16d      | 1  | 31.682602  |
|        eca_halonext26ts         | 1  | 31.497655  |
|        sebotnet33ts_256         | 1  |  31.4166   |
|            levit_128            | 1  | 30.670489  |
|           rexnet_100            | 1  | 30.069825  |
|        convmixer_768_32         | 1  | 29.639141  |
|            nfnet_l0             | 1  | 29.635653  |
|       tf_efficientnet_b0        | 1  | 29.286714  |
|           convit_base           | 1  | 26.824602  |
|         visformer_small         | 1  | 26.046716  |
|       eca_botnext26ts_256       | 1  | 26.034714  |
|          cspdarknet53           | 1  | 25.210911  |
|            pit_b_224            | 1  | 25.094791  |
|           regnety_002           | 1  | 25.078664  |
|          botnet26t_256          | 1  | 25.012623  |
|      beit_base_patch16_224      | 1  |  24.10631  |
|      mobilenetv3_large_100      | 1  | 23.471417  |
| deit_base_distilled_patch16_224 | 1  | 23.211042  |
|           fbnetc_100            | 1  | 23.105157  |
|      vit_base_patch16_224       | 1  | 23.094378  |
|          spnasnet_100           | 1  | 22.871575  |
|          mixer_b16_224          | 1  | 22.835365  |
|            gernet_l             | 1  |  22.21059  |
|            repvgg_a2            | 1  | 22.095855  |
|         mobilenetv2_100         | 1  | 21.078016  |
|        ese_vovnet19b_dw         | 1  | 21.014606  |
|           mnasnet_100           | 1  | 20.825864  |
|          resmlp_12_224          | 1  | 19.955391  |
|           selecsls42b           | 1  | 19.321771  |
|            lcnet_050            | 1  | 18.212933  |
+---------------------------------+----+------------+

Peak Memory Compression Ratio

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          cait_m36_384           | 1  | 0.945841 |
|          pnasnet5large          | 1  | 0.930089 |
|        convmixer_768_32         | 1  | 0.917727 |
|            nfnet_l0             | 1  | 0.902126 |
|      xcit_large_24_p8_224       | 1  | 0.890378 |
|        ese_vovnet19b_dw         | 1  | 0.887897 |
|         mobilenetv2_100         | 1  | 0.876339 |
|           mnasnet_100           | 1  | 0.876281 |
|          spnasnet_100           | 1  | 0.875186 |
|       tf_efficientnet_b0        | 1  | 0.874833 |
|            fbnetv3_b            | 1  | 0.874341 |
|       eca_botnext26ts_256       | 1  | 0.868609 |
|      mobilenetv3_large_100      | 1  | 0.866651 |
|           fbnetc_100            | 1  | 0.865194 |
|            tinynet_a            | 1  | 0.86304  |
|           rexnet_100            | 1  | 0.863005 |
|            lcnet_050            | 1  | 0.85799  |
|         poolformer_m36          | 1  | 0.855753 |
|           dm_nfnet_f0           | 1  | 0.854098 |
|        eca_halonext26ts         | 1  | 0.852411 |
|           tf_mixnet_l           | 1  | 0.848675 |
|           mobilevit_s           | 1  | 0.848352 |
|          ghostnet_100           | 1  | 0.845942 |
|          botnet26t_256          | 1  | 0.841232 |
|           regnety_002           | 1  | 0.840761 |
|            mixnet_l             | 1  | 0.839821 |
|          resmlp_12_224          | 1  | 0.827669 |
|         visformer_small         | 1  | 0.81842  |
|           res2next50            | 1  | 0.816348 |
|            levit_128            | 1  | 0.808609 |
|             dpn107              | 1  | 0.803928 |
|          convnext_base          | 1  | 0.801446 |
|        sebotnet33ts_256         | 1  | 0.799343 |
|        res2net50_14w_8s         | 1  | 0.799161 |
|            hrnet_w18            | 1  | 0.798254 |
|          cspdarknet53           | 1  | 0.793925 |
|          gmlp_s16_224           | 1  | 0.792821 |
|          gmixer_24_224          | 1  | 0.790565 |
|           volo_d1_224           | 1  | 0.787197 |
|        tnt_s_patch16_224        | 1  | 0.782104 |
|           convit_base           | 1  | 0.781327 |
|         crossvit_9_240          | 1  | 0.781169 |
|          mixer_b16_224          | 1  | 0.776418 |
|             dla102              | 1  | 0.776309 |
|           resnest101e           | 1  | 0.775845 |
|          jx_nest_base           | 1  | 0.773852 |
|        twins_pcpvt_base         | 1  | 0.772576 |
|      beit_base_patch16_224      | 1  | 0.769802 |
|        adv_inception_v3         | 1  | 0.763479 |
|       gluon_inception_v3        | 1  | 0.762764 |
| deit_base_distilled_patch16_224 | 1  | 0.762615 |
|          inception_v3           | 1  | 0.762549 |
|      vit_base_patch16_224       | 1  | 0.759674 |
|            pit_b_224            | 1  | 0.749565 |
|           selecsls42b           | 1  | 0.740793 |
|  swin_base_patch4_window7_224   | 1  | 0.739245 |
|        res2net101_26w_4s        | 1  | 0.737728 |
|            gernet_l             | 1  | 0.734749 |
|            repvgg_a2            | 1  | 0.692146 |
|     swsl_resnext101_32x16d      | 1  | 0.640265 |
+---------------------------------+----+----------+

Absolute latency (ms)

+---------------------------------+----+-------------+
|              name               | bs |  inductor   |
+---------------------------------+----+-------------+
|          cait_m36_384           | 1  | 3528.147593 |
|      xcit_large_24_p8_224       | 1  | 1526.366493 |
|     swsl_resnext101_32x16d      | 1  | 435.788041  |
|          pnasnet5large          | 1  | 358.054079  |
|          convnext_base          | 1  | 304.115494  |
|             dpn107              | 1  | 254.627365  |
|        convmixer_768_32         | 1  |  241.43949  |
|          jx_nest_base           | 1  | 230.074328  |
|      beit_base_patch16_224      | 1  | 196.751472  |
| deit_base_distilled_patch16_224 | 1  | 195.576812  |
|      vit_base_patch16_224       | 1  | 194.802638  |
|           convit_base           | 1  | 194.161864  |
|  swin_base_patch4_window7_224   | 1  | 193.273987  |
|            pit_b_224            | 1  | 166.709859  |
|           resnest101e           | 1  | 161.332575  |
|           dm_nfnet_f0           | 1  | 156.367501  |
|          mixer_b16_224          | 1  | 140.491804  |
|         poolformer_m36          | 1  | 135.131396  |
|        res2net101_26w_4s        | 1  |  109.05595  |
|        twins_pcpvt_base         | 1  |  103.71367  |
|           volo_d1_224           | 1  |  93.247267  |
|            nfnet_l0             | 1  |  92.155444  |
|        tnt_s_patch16_224        | 1  |  91.926868  |
|             dla102              | 1  |  87.897577  |
|        sebotnet33ts_256         | 1  |  83.445401  |
|            hrnet_w18            | 1  |  82.041429  |
|          cspdarknet53           | 1  |  81.653138  |
|       gluon_inception_v3        | 1  |  70.885623  |
|        adv_inception_v3         | 1  |  70.799252  |
|          inception_v3           | 1  |  70.720971  |
|          gmlp_s16_224           | 1  |  68.165962  |
|         visformer_small         | 1  |  64.698702  |
|            repvgg_a2            | 1  |  62.050651  |
|          gmixer_24_224          | 1  |  61.601975  |
|        res2net50_14w_8s         | 1  |  61.507068  |
|           res2next50            | 1  |  57.422505  |
|            gernet_l             | 1  |  55.436346  |
|          botnet26t_256          | 1  |  43.648344  |
|           selecsls42b           | 1  |  43.323374  |
|        eca_halonext26ts         | 1  |  42.864208  |
|           mobilevit_s           | 1  |  40.592861  |
|       eca_botnext26ts_256       | 1  |  39.323375  |
|          resmlp_12_224          | 1  |  34.995516  |
|         crossvit_9_240          | 1  |  32.394117  |
|        ese_vovnet19b_dw         | 1  |  30.448777  |
|            mixnet_l             | 1  |  27.677737  |
|           tf_mixnet_l           | 1  |   26.916    |
|            fbnetv3_b            | 1  |  14.151157  |
|       tf_efficientnet_b0        | 1  |  12.461897  |
|           rexnet_100            | 1  |  12.386916  |
|            tinynet_a            | 1  |  11.103746  |
|           fbnetc_100            | 1  |  8.498529   |
|            levit_128            | 1  |  8.049356   |
|          ghostnet_100           | 1  |  7.595638   |
|          spnasnet_100           | 1  |  7.562486   |
|           mnasnet_100           | 1  |  6.971507   |
|         mobilenetv2_100         | 1  |  6.786079   |
|      mobilenetv3_large_100      | 1  |  6.458055   |
|           regnety_002           | 1  |  5.700907   |
|            lcnet_050            | 1  |  2.196947   |
+---------------------------------+----+-------------+

@zxd1997066
Contributor

[default] Performance Dashboard for float32 precision -- Single-Socket Multi-threads (2024-04-28 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward and backward passes for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of the forward pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

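SW information

+-------------------+--------+------------------------------------------+
|        SW         | Branch |                  Commit                  |
+-------------------+--------+------------------------------------------+
|      Pytorch      |  main  | 7478b7f1cac9686f00edf3db4667cf86d2421531 |
|    Torchbench     |  main  |                 d6015d42                 |
|    torchaudio     |  main  |             2.2.0a0+ea437b3              |
|     torchtext     |  main  |             0.16.0a0+b0ebddc             |
|    torchvision    |  main  |             0.19.0a0+2c4665f             |
|     torchdata     |  main  |             0.7.1a0+0790338              |
| dynamo_benchmarks |  main  |                 nightly                  |
+-------------------+--------+------------------------------------------+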

HW information

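+------------------+----------------------------------------------------------------+
|       Item       |                             Value                              |
+------------------+----------------------------------------------------------------+
|   Manufacturer   |                           Amazon EC2                           |
|   Product Name   |                          c6i.16xlarge                          |
|    CPU Model     |         Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz          |
| Installed Memory |            128GB (1x128GB DDR4 3200 MT/s [Unknown])            |
|        OS        |                       Ubuntu 22.04.2 LTS                       |
|      Kernel      |                        5.19.0-1022-aws                         |
|    Microcode     |                           0xd000389                            |
|       GCC        |           gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0            |
|      GLIBC       |            ldd (Ubuntu GLIBC 2.35-0ubuntu3.1) 2.35             |
|     Binutils     |             GNU ld (GNU Binutils for Ubuntu) 2.38              |
|      Python      |                         Python 3.10.6                          |
|     OpenSSL      | OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022) |
+------------------+----------------------------------------------------------------+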

Test command

export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
CORES=$(lscpu | grep Core | awk '{print $4}')
export OMP_NUM_THREADS=$CORES

python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--node_id 0" --devices=cpu --dtypes=float32 --inference --compilers=inductor --extra-args="--timeout 9000" 
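
For readers unfamiliar with the `CORES=$(lscpu | grep Core | awk '{print $4}')` line above: on this kind of machine it picks the fourth whitespace-separated field of the `Core(s) per socket:` line. A rough Python sketch of the same extraction follows. The function name `cores_per_socket` and the sample text are hypothetical and only for illustration; real `lscpu` output varies between systems.

```python
import re


def cores_per_socket(lscpu_text: str) -> int:
    """Return the 'Core(s) per socket' value found in lscpu-style text."""
    for line in lscpu_text.splitlines():
        match = re.match(r"\s*Core\(s\) per socket:\s*(\d+)", line)
        if match:
            return int(match.group(1))
    raise ValueError("no 'Core(s) per socket' line found")


# Hypothetical lscpu excerpt, not captured from the benchmark machine.
sample = """\
Thread(s) per core:                 2
Core(s) per socket:                 32
Socket(s):                          1
"""

print(cores_per_socket(sample))
```

Matching the full label instead of the bare word `Core` avoids picking up other lines that happen to contain it, such as a CPU model name.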

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
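
The filtering step described here might look roughly like the sketch below. It assumes that `pass` and `pass_due_to_skip` both count as passing, which the post does not state explicitly; the helper name `drop_failing_models` is hypothetical. The sample values are copied from the torchbench tables in this report.

```python
# Statuses treated as passing. This set is an assumption, not stated in the post.
PASSING_STATUSES = {"pass", "pass_due_to_skip"}


def drop_failing_models(metrics, accuracy):
    """Keep only the models whose accuracy status counts as passing."""
    return {
        name: value
        for name, value in metrics.items()
        if accuracy.get(name) in PASSING_STATUSES
    }


# Speedups and accuracy statuses for three torchbench models from this report.
speedups = {"hf_GPT2": 1.802883, "resnet50": 2.167645, "maml": 0.698416}
accuracy = {
    "hf_GPT2": "pass",
    "resnet50": "fail_accuracy",
    "maml": "pass_due_to_skip",
}

kept = drop_failing_models(speedups, accuracy)
print(sorted(kept))
```

Models missing from the accuracy mapping are dropped as well, since `accuracy.get(name)` returns `None` for them.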

Pass rate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 85%, 67/79 | 100%, 46/46 | 75%, 45/60  |
+----------+------------+-------------+-------------+
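
As a rough illustration of how an entry such as `85%, 67/79` can be formed from a passed count and a total, here is a minimal sketch. The function name `format_pass_rate` is hypothetical, and which models enter the denominator is not spelled out in the post, so the counts are simply taken from the table above.

```python
def format_pass_rate(passed: int, total: int) -> str:
    """Format a pass rate as a rounded percentage followed by passed/total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return f"{passed / total:.0%}, {passed}/{total}"


# Counts taken from the pass rate table above.
for suite, passed, total in [
    ("torchbench", 67, 79),
    ("huggingface", 46, 46),
    ("timm_models", 45, 60),
]:
    print(suite, format_pass_rate(passed, total))
```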

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.39x    |    1.35x    |    1.87x    |
+----------+------------+-------------+-------------+
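
For reference, a geometric mean of per-model speedups can be computed as in the sketch below. This is an illustration of the statistic, not the dashboard's own script. The function name `geomean_speedup` is hypothetical, and dropping non-positive entries (such as the `0.0` rows reported for models that failed to load) is an assumption made here because the logarithm is undefined for them.

```python
import math


def geomean_speedup(speedups):
    """Geometric mean of the positive speedups, ignoring non-positive entries."""
    valid = [s for s in speedups if s > 0]
    if not valid:
        raise ValueError("need at least one positive speedup")
    return math.exp(sum(math.log(s) for s in valid) / len(valid))


# Illustrative values: a 2x speedup and a 2x slowdown cancel out,
# and the 0.0 placeholder is skipped.
print(geomean_speedup([2.0, 0.5, 0.0]))
```

Unlike the arithmetic mean, the geometric mean treats a 2x speedup and a 2x slowdown symmetrically, which is why it is the usual choice for averaging ratios.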

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   26.90    |    28.90    |    40.53    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.89x    |    0.98x    |    0.99x    |
+----------+------------+-------------+-------------+
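
One way to read these numbers, assuming the ratio is eager peak memory divided by compiled peak memory (consistent with "higher is better", but not defined in this post), is sketched below. The function name and the byte counts are hypothetical.

```python
def compression_ratio(eager_peak_bytes: float, compiled_peak_bytes: float) -> float:
    """Eager peak memory divided by compiled peak memory (assumed definition)."""
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled peak memory must be positive")
    return eager_peak_bytes / compiled_peak_bytes


# Hypothetical measurements: the compiled run peaks higher than the eager run,
# so the ratio falls below 1.0.
ratio = compression_ratio(eager_peak_bytes=890.0, compiled_peak_bytes=1000.0)
print(ratio)
```

Under this reading, a ratio below 1.0 means the compiled model used more peak memory than the eager baseline.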

torchbench suite with float32 precision

Performance speedup

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_equation_of_state     | 1048576 | 11.015947 |
|       mobilenet_v3_large        |   32    | 2.930317  |
|        timm_efficientnet        |   64    | 2.850725  |
|          squeezenet1_1          |   16    | 2.839942  |
|          mobilenet_v2           |   16    | 2.807273  |
|           mnasnet1_0            |   32    | 2.743769  |
|       shufflenet_v2_x1_0        |   64    |  2.51896  |
|          timm_resnest           |   32    | 2.297859  |
|            resnet50             |   32    | 2.167645  |
|        phlippe_densenet         |   128   | 2.058647  |
|        soft_actor_critic        |   256   | 2.024655  |
|         resnext50_32x4d         |    8    | 1.966747  |
|            resnet152            |   32    | 1.962061  |
|           densenet121           |   64    |  1.93304  |
|         phlippe_resnet          |   128   | 1.917847  |
|       doctr_det_predictor       |    1    | 1.900573  |
|            resnet18             |    8    | 1.887874  |
|           timm_regnet           |   32    | 1.871451  |
|             hf_GPT2             |    1    | 1.802883  |
|           timm_nfnet            |   128   | 1.768134  |
|           timm_vovnet           |   32    | 1.652083  |
|             alexnet             |   128   | 1.601059  |
|             yolov3              |    8    | 1.573574  |
|          BERT_pytorch           |    2    | 1.572743  |
|          maml_omniglot          |    5    | 1.570102  |
|        basic_gnn_edgecnn        |    1    | 1.538316  |
|            moondream            |    1    | 1.527308  |
|      doctr_reco_predictor       |    1    | 1.517186  |
|            hf_Albert            |    1    |  1.51364  |
|          hf_Bert_large          |    1    | 1.496357  |
|          hf_GPT2_large          |    1    |  1.4647   |
|          fastNLP_Bert           |    1    | 1.452391  |
|          hf_Longformer          |    1    | 1.434061  |
|             hf_Bert             |    1    | 1.427604  |
|              vgg16              |    4    | 1.425166  |
|     functorch_maml_omniglot     |    1    | 1.404023  |
|         LearningToPaint         |   96    |  1.38914  |
|              llama              |   32    | 1.364385  |
|              dcgan              |   256   | 1.347897  |
|          basic_gnn_gcn          |    1    | 1.322528  |
|          hf_DistilBert          |    1    | 1.313329  |
|      torch_multimodal_clip      |   32    | 1.297788  |
| pyhpc_turbulent_kinetic_energy  | 1048576 |  1.29231  |
|          lennard_jones          |  1000   | 1.237514  |
|             hf_Bart             |    1    | 1.235106  |
|     timm_vision_transformer     |   32    |  1.23326  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1.228333  |
|        hf_distil_whisper        |    1    |  1.21584  |
|     nvidia_deeprecommender      |   256   | 1.213797  |
|         basic_gnn_sage          |    1    | 1.204837  |
|         pytorch_stargan         |   16    | 1.200396  |
|           hf_BigBird            |    1    | 1.190127  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1.189485  |
|           hf_T5_large           |    1    | 1.185928  |
|          pytorch_unet           |    1    | 1.162966  |
|    detectron2_fcos_r_50_fpn     |    1    | 1.144773  |
|              hf_T5              |    1    | 1.131357  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 1.121873  |
|              dlrm               |  2048   |  1.11339  |
|          basic_gnn_gin          |    1    | 1.103752  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1.070058  |
|  timm_vision_transformer_large  |   32    | 1.051622  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1.043603  |
|       speech_transformer        |    1    | 1.021227  |
|             demucs              |    1    | 1.011915  |
|     resnet50_quantized_qat      |   32    |  1.00906  |
|           tts_angular           |   64    | 0.996654  |
|   mobilenet_v2_quantized_qat    |   96    | 0.995579  |
|           hf_Reformer           |    1    | 0.993709  |
|               drq               |    1    | 0.933362  |
|           hf_T5_base            |    1    | 0.802035  |
|       Background_Matting        |    1    | 0.792929  |
|         opacus_cifar10          |   64    | 0.720085  |
|              maml               |    1    | 0.698416  |
|     pyhpc_isoneutral_mixing     | 1048576 | 0.679507  |
|      functorch_dp_cifar10       |   64    | 0.665902  |
|        timm_efficientdet        |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
|              moco               |    0    |    0.0    |
+---------------------------------+---------+-----------+
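
To post-process tables like the one above, a small parser for this bordered layout can help. The sketch below is a minimal example that assumes every data row starts with `|` and that cell values contain no `|` characters; `parse_table` is a hypothetical helper, not part of the benchmark tooling.

```python
def parse_table(text):
    """Parse a '+---+' bordered ASCII table into a list of row dictionaries."""
    header = None
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # skip border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first '|' row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


# Excerpt of the torchbench speedup table above.
excerpt = """\
+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_equation_of_state     | 1048576 | 11.015947 |
|       mobilenet_v3_large        |   32    | 2.930317  |
+---------------------------------+---------+-----------+
"""

rows = parse_table(excerpt)
print(rows[0]["name"], float(rows[0]["inductor"]))
```

All cells come back as strings, so numeric columns need an explicit `int` or `float` conversion.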

Accuracy

+---------------------------------+---------+--------------------+
|              name               |   bs    |      inductor      |
+---------------------------------+---------+--------------------+
|           hf_T5_large           |    4    |  pass_due_to_skip  |
|       Background_Matting        |    1    |  pass_due_to_skip  |
|  timm_vision_transformer_large  |    4    |  pass_due_to_skip  |
|              maml               |    1    |  pass_due_to_skip  |
|          hf_GPT2_large          |    4    |  pass_due_to_skip  |
|         basic_gnn_sage          |    1    |        pass        |
| detectron2_fasterrcnn_r_50_dc5  |    4    |        pass        |
|          hf_Longformer          |    4    |        pass        |
| detectron2_fasterrcnn_r_101_dc5 |    4    |        pass        |
| detectron2_fasterrcnn_r_101_c4  |    4    |        pass        |
|             demucs              |    1    |        pass        |
|        basic_gnn_edgecnn        |    1    |        pass        |
|          basic_gnn_gin          |    1    |        pass        |
|          basic_gnn_gcn          |    1    |        pass        |
|              dlrm               |    4    |        pass        |
|         LearningToPaint         |    4    |        pass        |
|      doctr_reco_predictor       |    4    |        pass        |
|           hf_Reformer           |    4    |        pass        |
|    detectron2_fcos_r_50_fpn     |    4    |        pass        |
|  detectron2_fasterrcnn_r_50_c4  |    4    |        pass        |
|       doctr_det_predictor       |    4    |        pass        |
|           hf_T5_base            |    4    |        pass        |
|               drq               |    1    |        pass        |
|            hf_Albert            |    4    |        pass        |
|             hf_GPT2             |    2    |        pass        |
|          hf_DistilBert          |    4    |        pass        |
|           hf_BigBird            |    4    |        pass        |
|          hf_Bert_large          |    4    |        pass        |
|             hf_Bert             |    4    |        pass        |
|          fastNLP_Bert           |    4    |        pass        |
|             hf_Bart             |    4    |        pass        |
|             yolov3              |    4    |        pass        |
|      functorch_dp_cifar10       |    4    |        pass        |
|     functorch_maml_omniglot     |    1    |        pass        |
|          lennard_jones          |    4    |        pass        |
|             alexnet             |    4    |        pass        |
|          pytorch_unet           |    2    |        pass        |
|  pytorch_CycleGAN_and_pix2pix   |    1    |        pass        |
|          BERT_pytorch           |    4    |        pass        |
|            moondream            |    4    |        pass        |
|     pyhpc_equation_of_state     |    4    |        pass        |
|         phlippe_resnet          |    4    |        pass        |
|        phlippe_densenet         |    4    |        pass        |
|         opacus_cifar10          |    4    |        pass        |
|     nvidia_deeprecommender      |    4    |        pass        |
|     resnet50_quantized_qat      |    4    |        pass        |
|       mobilenet_v3_large        |    4    |        pass        |
|   mobilenet_v2_quantized_qat    |    4    |        pass        |
| pyhpc_turbulent_kinetic_energy  | 1048576 |        pass        |
|              llama              |    4    |        pass        |
|        hf_distil_whisper        |    4    |        pass        |
|     pyhpc_isoneutral_mixing     |    4    |        pass        |
|          maml_omniglot          |    5    |        pass        |
|              hf_T5              |    4    |        pass        |
|           timm_regnet           |    4    |        pass        |
|              vgg16              |    4    |        pass        |
|           tts_angular           |    4    |        pass        |
|      torch_multimodal_clip      |    4    |        pass        |
|           timm_vovnet           |    4    |        pass        |
|     timm_vision_transformer     |    4    |        pass        |
|          timm_resnest           |    4    |        pass        |
|           timm_nfnet            |    4    |        pass        |
|         pytorch_stargan         |   16    |        pass        |
|        timm_efficientnet        |    4    |        pass        |
|          squeezenet1_1          |    4    |        pass        |
|       speech_transformer        |    1    |        pass        |
|        soft_actor_critic        |   256   |        pass        |
|        timm_efficientdet        |    0    | model_fail_to_load |
|         DALLE2_pytorch          |    0    | model_fail_to_load |
|              moco               |    0    | model_fail_to_load |
| detectron2_fasterrcnn_r_101_fpn |    4    |    fail_to_run     |
|         vision_maskrcnn         |    1    |    fail_to_run     |
|           Super_SloMo           |    4    |    fail_to_run     |
| detectron2_fasterrcnn_r_50_fpn  |    4    |    fail_to_run     |
|         resnext50_32x4d         |    4    |   fail_accuracy    |
|            resnet152            |    4    |   fail_accuracy    |
|       shufflenet_v2_x1_0        |    4    |   fail_accuracy    |
|           mnasnet1_0            |    4    |   fail_accuracy    |
|              dcgan              |    4    |   fail_accuracy    |
|          mobilenet_v2           |    4    |   fail_accuracy    |
|            resnet18             |    4    |   fail_accuracy    |
|            resnet50             |    4    |   fail_accuracy    |
|           densenet121           |    4    |   fail_accuracy    |
+---------------------------------+---------+--------------------+

Compilation latency (sec)

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|           densenet121           |   64    | 98.494457 |
|           hf_BigBird            |    1    | 76.737414 |
|    detectron2_fcos_r_50_fpn     |    1    | 70.233678 |
|           hf_T5_large           |    1    | 62.309282 |
|              maml               |    1    | 53.805668 |
|           timm_nfnet            |   128   | 53.253739 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 52.902905 |
|          hf_Longformer          |    1    | 51.363855 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 50.668187 |
|           hf_Reformer           |    1    | 47.232192 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 46.283298 |
|           hf_T5_base            |    1    | 44.927798 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 43.985708 |
|        phlippe_densenet         |   128   | 43.444327 |
|  timm_vision_transformer_large  |   32    | 42.44337  |
|     pyhpc_isoneutral_mixing     | 1048576 | 39.366775 |
|       speech_transformer        |    1    | 39.343085 |
|      torch_multimodal_clip      |   32    | 39.28281  |
|          hf_GPT2_large          |    1    | 36.252494 |
|        timm_efficientnet        |   64    | 36.104905 |
|             demucs              |    1    | 34.828991 |
|            moondream            |    1    | 33.561789 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 33.45046  |
|              hf_T5              |    1    | 31.256867 |
|             yolov3              |    8    | 30.853204 |
|        hf_distil_whisper        |    1    | 30.418491 |
|         opacus_cifar10          |   64    | 29.163972 |
|      functorch_dp_cifar10       |   64    | 27.828719 |
|          timm_resnest           |   32    | 27.272209 |
|       doctr_det_predictor       |    1    | 26.674194 |
|          hf_Bert_large          |    1    | 26.401494 |
|       shufflenet_v2_x1_0        |   64    | 25.219764 |
|       mobilenet_v3_large        |   32    | 24.749537 |
|       Background_Matting        |    1    | 24.044689 |
|          BERT_pytorch           |    2    |  23.7816  |
|          fastNLP_Bert           |    1    | 22.759879 |
|           timm_vovnet           |   32    | 22.682673 |
|             hf_Bart             |    1    | 22.503363 |
|              llama              |   32    | 22.368952 |
|           timm_regnet           |   32    | 22.329015 |
|     timm_vision_transformer     |   32    | 21.735037 |
|          pytorch_unet           |    1    | 21.529761 |
|            hf_Albert            |    1    | 20.445236 |
|             hf_GPT2             |    1    | 19.764164 |
|          hf_DistilBert          |    1    | 19.099596 |
|             hf_Bert             |    1    | 19.006902 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 18.321518 |
|          squeezenet1_1          |   16    | 16.557016 |
|         pytorch_stargan         |   16    | 16.496867 |
|            resnet152            |   32    | 16.014458 |
|              vgg16              |    4    | 14.936582 |
|      doctr_reco_predictor       |    1    | 13.563714 |
|             alexnet             |   128   | 12.909701 |
|          basic_gnn_gcn          |    1    | 12.132502 |
|            resnet50             |   32    | 11.794288 |
|         resnext50_32x4d         |    8    | 11.760487 |
|          basic_gnn_gin          |    1    | 11.398157 |
|         basic_gnn_sage          |    1    | 11.374456 |
|               drq               |    1    | 11.001211 |
|              dlrm               |  2048   | 10.873414 |
|            resnet18             |    8    | 10.427719 |
|          mobilenet_v2           |   16    | 10.195256 |
|           mnasnet1_0            |   32    | 9.984101  |
|     functorch_maml_omniglot     |    1    | 9.793475  |
|          maml_omniglot          |    5    | 9.433639  |
|        basic_gnn_edgecnn        |    1    | 9.388274  |
|     pyhpc_equation_of_state     | 1048576 | 9.286728  |
|         LearningToPaint         |   96    | 9.197877  |
|     nvidia_deeprecommender      |   256   | 8.780251  |
|         phlippe_resnet          |   128   | 8.726479  |
|        soft_actor_critic        |   256   | 6.902728  |
|          lennard_jones          |  1000   | 6.816783  |
|              dcgan              |   256   | 5.800233  |
|           tts_angular           |   64    | 5.664828  |
|   mobilenet_v2_quantized_qat    |   96    |  0.10795  |
|     resnet50_quantized_qat      |   32    | 0.071165  |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
|        timm_efficientdet        |    0    |    0.0    |
+---------------------------------+---------+-----------+

Peak Memory Compression Ratio

+---------------------------------+---------+----------+
|              name               |   bs    | inductor |
+---------------------------------+---------+----------+
|           timm_nfnet            |   128   | 0.993136 |
|              dlrm               |  2048   | 0.988084 |
|           hf_T5_base            |    1    | 0.987871 |
|        timm_efficientnet        |   64    | 0.98484  |
|       Background_Matting        |    1    | 0.983367 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 0.981986 |
|  timm_vision_transformer_large  |   32    | 0.979659 |
|          pytorch_unet           |    1    | 0.978731 |
|             demucs              |    1    | 0.978608 |
|          hf_GPT2_large          |    1    | 0.978225 |
|           densenet121           |   64    | 0.976982 |
|             yolov3              |    8    | 0.975313 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 0.974315 |
|            resnet50             |   32    | 0.971733 |
|            resnet152            |   32    | 0.970816 |
|          timm_resnest           |   32    | 0.970413 |
|         LearningToPaint         |   96    | 0.97019  |
|        basic_gnn_edgecnn        |    1    | 0.969303 |
|           timm_vovnet           |   32    | 0.969155 |
|       doctr_det_predictor       |    1    | 0.965913 |
|      torch_multimodal_clip      |   32    | 0.964984 |
|    detectron2_fcos_r_50_fpn     |    1    | 0.963668 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 0.963015 |
|   mobilenet_v2_quantized_qat    |   96    | 0.962987 |
|     resnet50_quantized_qat      |   32    | 0.962526 |
|     timm_vision_transformer     |   32    | 0.962364 |
|           timm_regnet           |   32    | 0.962203 |
|           mnasnet1_0            |   32    | 0.960182 |
|          mobilenet_v2           |   16    | 0.956652 |
|       shufflenet_v2_x1_0        |   64    | 0.95516  |
|       mobilenet_v3_large        |   32    | 0.953417 |
|           hf_BigBird            |    1    | 0.950545 |
|         pytorch_stargan         |   16    | 0.948365 |
|         resnext50_32x4d         |    8    | 0.945516 |
|        phlippe_densenet         |   128   | 0.944083 |
|          basic_gnn_gcn          |    1    | 0.940614 |
|      doctr_reco_predictor       |    1    | 0.937347 |
|              llama              |   32    | 0.931905 |
|           tts_angular           |   64    | 0.928613 |
|          squeezenet1_1          |   16    | 0.91727  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 0.916725 |
|              dcgan              |   256   | 0.916252 |
|        hf_distil_whisper        |    1    | 0.911451 |
|     pyhpc_equation_of_state     | 1048576 | 0.907668 |
|            resnet18             |    8    | 0.898506 |
|             alexnet             |   128   | 0.896076 |
|         phlippe_resnet          |   128   | 0.895468 |
|         opacus_cifar10          |   64    | 0.890159 |
|        soft_actor_critic        |   256   | 0.883736 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 0.881858 |
|          lennard_jones          |  1000   | 0.861556 |
|          maml_omniglot          |    5    | 0.856835 |
|         basic_gnn_sage          |    1    | 0.855805 |
|     functorch_maml_omniglot     |    1    | 0.854077 |
|          basic_gnn_gin          |    1    | 0.85292  |
|          fastNLP_Bert           |    1    | 0.84667  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.845888 |
|            moondream            |    1    | 0.824356 |
|       speech_transformer        |    1    | 0.815768 |
|          hf_Bert_large          |    1    | 0.80821  |
|          BERT_pytorch           |    2    | 0.800015 |
|          hf_Longformer          |    1    | 0.798231 |
|             hf_Bert             |    1    | 0.796648 |
|      functorch_dp_cifar10       |   64    | 0.795457 |
|              maml               |    1    | 0.792945 |
|            hf_Albert            |    1    | 0.791531 |
|           hf_T5_large           |    1    | 0.790352 |
|     nvidia_deeprecommender      |   256   | 0.775716 |
|              vgg16              |    4    | 0.772127 |
|               drq               |    1    | 0.767795 |
|             hf_Bart             |    1    | 0.766459 |
|          hf_DistilBert          |    1    | 0.762543 |
|             hf_GPT2             |    1    | 0.760201 |
|              hf_T5              |    1    | 0.752366 |
|           hf_Reformer           |    1    | 0.738062 |
|     pyhpc_isoneutral_mixing     | 1048576 | 0.686895 |
|        timm_efficientdet        |    0    |   0.0    |
|         DALLE2_pytorch          |    0    |   0.0    |
|              moco               |    0    |   0.0    |
+---------------------------------+---------+----------+

Absolute latency (ms)

+---------------------------------+---------+-------------+
|              name               |   bs    |  inductor   |
+---------------------------------+---------+-------------+
|  timm_vision_transformer_large  |   32    | 4432.140179 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1443.600725 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1310.064652 |
|           hf_T5_base            |    1    | 1267.770798 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1175.875994 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1090.437398 |
|          hf_GPT2_large          |    1    | 572.681075  |
|           timm_nfnet            |   128   | 558.614786  |
|           hf_T5_large           |    1    | 411.552971  |
|            moondream            |    1    |  397.45585  |
|        hf_distil_whisper        |    1    | 355.800881  |
|       Background_Matting        |    1    | 349.988018  |
|          pytorch_unet           |    1    | 233.416975  |
|           timm_regnet           |   32    | 219.007455  |
|            resnet152            |   32    | 192.383544  |
|           densenet121           |   64    | 190.409325  |
|    detectron2_fcos_r_50_fpn     |    1    | 180.614852  |
|             yolov3              |    8    | 159.255668  |
|      torch_multimodal_clip      |   32    | 150.860219  |
|             demucs              |    1    |  143.9542   |
|           hf_BigBird            |    1    | 129.603945  |
|           timm_vovnet           |   32    |  117.97214  |
|          hf_Bert_large          |    1    | 111.049598  |
|         pytorch_stargan         |   16    |  95.034851  |
|     timm_vision_transformer     |   32    |  90.278134  |
|       doctr_det_predictor       |    1    |  87.793389  |
|            resnet50             |   32    |  76.482346  |
|          hf_Longformer          |    1    |  72.299142  |
|             hf_Bart             |    1    |  54.638343  |
|       speech_transformer        |    1    |  54.008567  |
|          timm_resnest           |   32    |  53.272446  |
|        timm_efficientnet        |   64    |  49.463081  |
|              maml               |    1    |  48.796915  |
|              hf_T5              |    1    |  43.848198  |
|             alexnet             |   128   |  43.321461  |
|             hf_Bert             |    1    |  41.526468  |
|   mobilenet_v2_quantized_qat    |   96    |  38.445152  |
|         LearningToPaint         |   96    |  37.587662  |
|           hf_Reformer           |    1    |  36.943478  |
|            hf_Albert            |    1    |  36.494097  |
|          fastNLP_Bert           |    1    |  34.244194  |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  33.536572  |
|              vgg16              |    4    |  32.77807   |
|     nvidia_deeprecommender      |   256   |  30.04258   |
|     pyhpc_isoneutral_mixing     | 1048576 |  28.629106  |
|          hf_DistilBert          |    1    |  27.69711   |
|     resnet50_quantized_qat      |   32    |  25.18818   |
| pyhpc_turbulent_kinetic_energy  | 1048576 |  24.985439  |
|             hf_GPT2             |    1    |  24.24909   |
|              llama              |   32    |  22.947392  |
|         resnext50_32x4d         |    8    |   22.5768   |
|          BERT_pytorch           |    2    |  20.927238  |
|           tts_angular           |   64    |  20.184839  |
|        phlippe_densenet         |   128   |  19.565492  |
|              dcgan              |   256   |  18.874339  |
|        basic_gnn_edgecnn        |    1    |  18.04342   |
|       shufflenet_v2_x1_0        |   64    |  15.650006  |
|           mnasnet1_0            |   32    |  14.57711   |
|       mobilenet_v3_large        |   32    |  13.06538   |
|      functorch_dp_cifar10       |   64    |  9.507316   |
|         opacus_cifar10          |   64    |  9.469891   |
|          basic_gnn_gcn          |    1    |  9.332949   |
|          mobilenet_v2           |   16    |  8.972687   |
|            resnet18             |    8    |  8.858536   |
|              dlrm               |  2048   |  6.742726   |
|          squeezenet1_1          |   16    |  5.947557   |
|         basic_gnn_sage          |    1    |  5.008777   |
|          basic_gnn_gin          |    1    |  4.711308   |
|         phlippe_resnet          |   128   |  4.024972   |
|      doctr_reco_predictor       |    1    |  3.510442   |
|     pyhpc_equation_of_state     | 1048576 |  1.118697   |
|               drq               |    1    |   0.96271   |
|     functorch_maml_omniglot     |    1    |  0.483913   |
|          maml_omniglot          |    5    |  0.371447   |
|        soft_actor_critic        |   256   |  0.305675   |
|          lennard_jones          |  1000   |  0.188238   |
|        timm_efficientdet        |    0    |     0.0     |
|         DALLE2_pytorch          |    0    |     0.0     |
|              moco               |    0    |     0.0     |
+---------------------------------+---------+-------------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            XLNetLMHeadModel             |  8  | 5.702938 |
|     MobileBertForQuestionAnswering      | 128 | 1.980719 |
|      GPT2ForSequenceClassification      |  4  | 1.951068 |
|           ElectraForCausalLM            | 32  | 1.890224 |
|       ElectraForQuestionAnswering       | 64  | 1.767578 |
|          MobileBertForMaskedLM          | 128 | 1.709145 |
|               DistillGPT2               | 16  | 1.522493 |
|       RobertaForQuestionAnswering       | 16  | 1.436455 |
|               GoogleFnet                | 16  | 1.423068 |
|    LayoutLMForSequenceClassification    | 16  | 1.413482 |
|            YituTechConvBert             | 16  | 1.40768  |
|        BertForQuestionAnswering         | 16  | 1.395873 |
|      DebertaV2ForQuestionAnswering      |  1  | 1.386455 |
|           RobertaForCausalLM            | 16  | 1.385399 |
|    MegatronBertForQuestionAnswering     |  8  | 1.372278 |
|           LayoutLMForMaskedLM           | 16  | 1.365917 |
|          AllenaiLongformerBase          |  4  | 1.364453 |
|         MegatronBertForCausalLM         |  4  | 1.36282  |
|                CamemBert                | 16  | 1.352879 |
|             BertForMaskedLM             | 16  | 1.351026 |
|           DebertaForMaskedLM            |  8  | 1.298962 |
|             XGLMForCausalLM             |  8  | 1.290617 |
|       DebertaForQuestionAnswering       | 16  | 1.286267 |
|     PLBartForConditionalGeneration      |  4  | 1.280044 |
|       AlbertForQuestionAnswering        |  4  | 1.246695 |
|            AlbertForMaskedLM            |  4  | 1.244913 |
|      MBartForConditionalGeneration      |  2  | 1.238134 |
| BlenderbotSmallForConditionalGeneration | 64  | 1.231734 |
|          BlenderbotForCausalLM          |  4  | 1.21882  |
|             OPTForCausalLM              |  2  | 1.217794 |
|          DebertaV2ForMaskedLM           |  2  | 1.216144 |
|         Speech2Text2ForCausalLM         | 256 | 1.201275 |
|          DistilBertForMaskedLM          | 128 | 1.200368 |
|     DistilBertForQuestionAnswering      | 256 | 1.175695 |
|     M2M100ForConditionalGeneration      | 16  | 1.164625 |
|       MT5ForConditionalGeneration       | 16  | 1.15975  |
|     PegasusForConditionalGeneration     | 32  | 1.147377 |
|       BlenderbotSmallForCausalLM        | 64  | 1.144147 |
|             BartForCausalLM             |  4  | 1.141104 |
|      BartForConditionalGeneration       |  2  | 1.129724 |
|           PegasusForCausalLM            | 32  | 1.124311 |
|            MBartForCausalLM             |  4  | 1.105761 |
|            TrOCRForCausalLM             | 32  | 1.072669 |
|            PLBartForCausalLM            |  8  | 1.057296 |
|       T5ForConditionalGeneration        |  4  | 0.999381 |
|                 T5Small                 |  4  | 0.99707  |
+-----------------------------------------+-----+----------+

Accuracy

+-----------------------------------------+----+------------------+
|                  name                   | bs |     inductor     |
+-----------------------------------------+----+------------------+
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip |
|                CamemBert                | 1  |       pass       |
|       AlbertForQuestionAnswering        | 1  |       pass       |
|           DebertaForMaskedLM            | 1  |       pass       |
|          AllenaiLongformerBase          | 1  |       pass       |
|             BartForCausalLM             | 1  |       pass       |
|      BartForConditionalGeneration       | 1  |       pass       |
|             BertForMaskedLM             | 1  |       pass       |
|        BertForQuestionAnswering         | 1  |       pass       |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |
|       DebertaForQuestionAnswering       | 1  |       pass       |
|           LayoutLMForMaskedLM           | 1  |       pass       |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |
|          DistilBertForMaskedLM          | 1  |       pass       |
|     DistilBertForQuestionAnswering      | 1  |       pass       |
|               DistillGPT2               | 1  |       pass       |
|           ElectraForCausalLM            | 1  |       pass       |
|       ElectraForQuestionAnswering       | 1  |       pass       |
|      GPT2ForSequenceClassification      | 1  |       pass       |
|               GoogleFnet                | 1  |       pass       |
|    LayoutLMForSequenceClassification    | 1  |       pass       |
|            MBartForCausalLM             | 1  |       pass       |
|            XLNetLMHeadModel             | 1  |       pass       |
|             XGLMForCausalLM             | 1  |       pass       |
|            AlbertForMaskedLM            | 1  |       pass       |
|      MBartForConditionalGeneration      | 1  |       pass       |
|       MT5ForConditionalGeneration       | 1  |       pass       |
|         MegatronBertForCausalLM         | 1  |       pass       |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |
|          MobileBertForMaskedLM          | 1  |       pass       |
|     MobileBertForQuestionAnswering      | 1  |       pass       |
|             OPTForCausalLM              | 1  |       pass       |
|            PLBartForCausalLM            | 1  |       pass       |
|     PLBartForConditionalGeneration      | 1  |       pass       |
|           PegasusForCausalLM            | 1  |       pass       |
|     M2M100ForConditionalGeneration      | 1  |       pass       |
|     PegasusForConditionalGeneration     | 1  |       pass       |
|           RobertaForCausalLM            | 1  |       pass       |
|       RobertaForQuestionAnswering       | 1  |       pass       |
|         Speech2Text2ForCausalLM         | 1  |       pass       |
|       T5ForConditionalGeneration        | 1  |       pass       |
|                 T5Small                 | 1  |       pass       |
|            TrOCRForCausalLM             | 1  |       pass       |
|            YituTechConvBert             | 1  |       pass       |
+-----------------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------------+-----+-----------+
|                  name                   | bs  | inductor  |
+-----------------------------------------+-----+-----------+
|          AllenaiLongformerBase          |  4  | 94.314031 |
|          MobileBertForMaskedLM          | 128 | 46.846754 |
|     MobileBertForQuestionAnswering      | 128 | 44.536488 |
|     PegasusForConditionalGeneration     | 32  | 43.623338 |
|     M2M100ForConditionalGeneration      | 16  | 43.151178 |
|       MT5ForConditionalGeneration       | 16  | 42.761063 |
|      MBartForConditionalGeneration      |  2  | 42.073225 |
|                 T5Small                 |  4  | 38.779013 |
|          BlenderbotForCausalLM          |  4  | 38.729638 |
|       T5ForConditionalGeneration        |  4  | 38.728983 |
|             XGLMForCausalLM             |  8  | 34.80141  |
|          DebertaV2ForMaskedLM           |  2  | 34.237516 |
| BlenderbotSmallForConditionalGeneration | 64  | 33.629645 |
|      DebertaV2ForQuestionAnswering      |  1  | 32.009731 |
|     PLBartForConditionalGeneration      |  4  | 31.310726 |
|      BartForConditionalGeneration       |  2  | 30.399495 |
|            YituTechConvBert             | 16  | 30.191817 |
|         MegatronBertForCausalLM         |  4  | 29.862527 |
|    MegatronBertForQuestionAnswering     |  8  | 28.47861  |
|             OPTForCausalLM              |  2  | 27.559969 |
|           PegasusForCausalLM            | 32  | 25.997468 |
|            MBartForCausalLM             |  4  | 25.381748 |
|            TrOCRForCausalLM             | 32  | 24.884318 |
|           DebertaForMaskedLM            |  8  | 24.283957 |
|       DebertaForQuestionAnswering       | 16  | 23.377701 |
|           RobertaForCausalLM            | 16  | 21.739817 |
|           ElectraForCausalLM            | 32  | 21.587745 |
|          DistilBertForMaskedLM          | 128 | 21.412805 |
|      GPT2ForSequenceClassification      |  4  | 21.399501 |
|            AlbertForMaskedLM            |  4  | 21.137205 |
|                CamemBert                | 16  | 21.102858 |
|       BlenderbotSmallForCausalLM        | 64  | 21.002368 |
|            PLBartForCausalLM            |  8  | 20.362671 |
|     DistilBertForQuestionAnswering      | 256 | 20.354048 |
|         Speech2Text2ForCausalLM         | 256 | 20.328655 |
|           LayoutLMForMaskedLM           | 16  | 19.891766 |
|             BertForMaskedLM             | 16  | 19.856439 |
|       ElectraForQuestionAnswering       | 64  |  19.6719  |
|       AlbertForQuestionAnswering        |  4  | 19.638167 |
|    LayoutLMForSequenceClassification    | 16  | 19.616465 |
|       RobertaForQuestionAnswering       | 16  | 19.61531  |
|             BartForCausalLM             |  4  | 19.264221 |
|               DistillGPT2               | 16  | 19.247732 |
|               GoogleFnet                | 16  | 18.744806 |
|        BertForQuestionAnswering         | 16  | 18.402236 |
|            XLNetLMHeadModel             |  8  | 15.020554 |
+-----------------------------------------+-----+-----------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            AlbertForMaskedLM            |  4  | 0.994427 |
|       AlbertForQuestionAnswering        |  4  | 0.994342 |
|     DistilBertForQuestionAnswering      | 256 | 0.993253 |
|             OPTForCausalLM              |  2  | 0.993235 |
|           RobertaForCausalLM            | 16  | 0.992306 |
|            TrOCRForCausalLM             | 32  | 0.992104 |
|               DistillGPT2               | 16  | 0.99208  |
|          DistilBertForMaskedLM          | 128 | 0.991766 |
|               GoogleFnet                | 16  | 0.991348 |
|           ElectraForCausalLM            | 32  | 0.991093 |
|       ElectraForQuestionAnswering       | 64  | 0.990904 |
|                CamemBert                | 16  | 0.99083  |
|            PLBartForCausalLM            |  8  | 0.990733 |
|             BertForMaskedLM             | 16  | 0.990328 |
|           LayoutLMForMaskedLM           | 16  | 0.99024  |
|            MBartForCausalLM             |  4  | 0.990077 |
| BlenderbotSmallForConditionalGeneration | 64  | 0.988982 |
|       RobertaForQuestionAnswering       | 16  | 0.988835 |
|            YituTechConvBert             | 16  | 0.988623 |
|    MegatronBertForQuestionAnswering     |  8  | 0.988572 |
|        BertForQuestionAnswering         | 16  | 0.988547 |
|         Speech2Text2ForCausalLM         | 256 | 0.988534 |
|    LayoutLMForSequenceClassification    | 16  | 0.98852  |
|       DebertaForQuestionAnswering       | 16  | 0.988499 |
|     PLBartForConditionalGeneration      |  4  | 0.98808  |
|           PegasusForCausalLM            | 32  | 0.987259 |
|      GPT2ForSequenceClassification      |  4  | 0.987197 |
|          MobileBertForMaskedLM          | 128 | 0.986922 |
|       BlenderbotSmallForCausalLM        | 64  | 0.986501 |
|           DebertaForMaskedLM            |  8  | 0.985661 |
|             BartForCausalLM             |  4  | 0.98559  |
|            XLNetLMHeadModel             |  8  | 0.985484 |
|         MegatronBertForCausalLM         |  4  | 0.984402 |
|       T5ForConditionalGeneration        |  4  | 0.983666 |
|                 T5Small                 |  4  | 0.983232 |
|     MobileBertForQuestionAnswering      | 128 | 0.983206 |
|          AllenaiLongformerBase          |  4  | 0.979046 |
|     PegasusForConditionalGeneration     | 32  | 0.975303 |
|      BartForConditionalGeneration       |  2  | 0.968915 |
|      MBartForConditionalGeneration      |  2  | 0.967072 |
|       MT5ForConditionalGeneration       | 16  | 0.962518 |
|             XGLMForCausalLM             |  8  | 0.931839 |
|     M2M100ForConditionalGeneration      | 16  | 0.927241 |
|          DebertaV2ForMaskedLM           |  2  | 0.915272 |
|      DebertaV2ForQuestionAnswering      |  1  | 0.869559 |
|          BlenderbotForCausalLM          |  4  | 0.843867 |
+-----------------------------------------+-----+----------+

Absolute latency (ms)

+-----------------------------------------+-----+-------------+
|                  name                   | bs  |  inductor   |
+-----------------------------------------+-----+-------------+
|            AlbertForMaskedLM            |  4  | 2508.681596 |
|       AlbertForQuestionAnswering        |  4  | 2497.588218 |
|            XLNetLMHeadModel             |  8  | 1282.569728 |
|            TrOCRForCausalLM             | 32  | 970.828917  |
|     PegasusForConditionalGeneration     | 32  | 930.303114  |
|     DistilBertForQuestionAnswering      | 256 | 838.943726  |
|    MegatronBertForQuestionAnswering     |  8  | 738.525677  |
|      MBartForConditionalGeneration      |  2  | 659.300126  |
|            MBartForCausalLM             |  4  | 648.660701  |
|          DistilBertForMaskedLM          | 128 | 629.210634  |
|           RobertaForCausalLM            | 16  |  605.3667   |
|            YituTechConvBert             | 16  | 589.034856  |
|          BlenderbotForCausalLM          |  4  |  587.58759  |
|             OPTForCausalLM              |  2  | 583.921433  |
|      BartForConditionalGeneration       |  2  | 581.941604  |
|     M2M100ForConditionalGeneration      | 16  | 575.880103  |
|          DebertaV2ForMaskedLM           |  2  |  571.06474  |
|                CamemBert                | 16  | 557.296864  |
|             BertForMaskedLM             | 16  | 552.823992  |
|           LayoutLMForMaskedLM           | 16  | 549.581912  |
|          AllenaiLongformerBase          |  4  | 521.124475  |
|       DebertaForQuestionAnswering       | 16  | 519.099405  |
|            PLBartForCausalLM            |  8  | 515.962567  |
|             BartForCausalLM             |  4  | 490.897014  |
| BlenderbotSmallForConditionalGeneration | 64  | 462.119023  |
|     PLBartForConditionalGeneration      |  4  | 459.998977  |
|           PegasusForCausalLM            | 32  | 459.495107  |
|        BertForQuestionAnswering         | 16  | 445.882064  |
|    LayoutLMForSequenceClassification    | 16  | 442.960743  |
|         MegatronBertForCausalLM         |  4  | 430.785051  |
|       RobertaForQuestionAnswering       | 16  | 424.373607  |
|               GoogleFnet                | 16  | 397.256033  |
|                 T5Small                 |  4  | 395.979269  |
|       T5ForConditionalGeneration        |  4  |  395.20598  |
|               DistillGPT2               | 16  | 385.555844  |
|          MobileBertForMaskedLM          | 128 | 377.942497  |
|           DebertaForMaskedLM            |  8  | 348.102113  |
|             XGLMForCausalLM             |  8  |  328.01744  |
|       ElectraForQuestionAnswering       | 64  | 314.982875  |
|       BlenderbotSmallForCausalLM        | 64  | 268.540688  |
|         Speech2Text2ForCausalLM         | 256 | 254.738846  |
|      GPT2ForSequenceClassification      |  4  | 253.702355  |
|      DebertaV2ForQuestionAnswering      |  1  | 236.243338  |
|       MT5ForConditionalGeneration       | 16  | 234.686566  |
|           ElectraForCausalLM            | 32  | 229.839693  |
|     MobileBertForQuestionAnswering      | 128 | 221.743161  |
+-----------------------------------------+-----+-------------+

timm_models suite with float32 precision

Performance speedup

+---------------------------------+------+----------+
|              name               |  bs  | inductor |
+---------------------------------+------+----------+
|           fbnetc_100            | 512  | 3.905662 |
|            lcnet_050            | 256  | 3.851044 |
|           mnasnet_100           | 512  | 3.84571  |
|         mobilenetv2_100         | 128  | 3.730093 |
|          spnasnet_100           | 128  | 3.525951 |
|      mobilenetv3_large_100      | 512  | 3.518504 |
|            fbnetv3_b            | 256  | 3.386666 |
|           regnety_002           | 1024 | 3.270549 |
|           rexnet_100            | 256  | 3.008808 |
|       tf_efficientnet_b0        | 128  | 2.929328 |
|            tinynet_a            | 128  | 2.846988 |
|          pnasnet5large          |  16  | 2.604641 |
|        ese_vovnet19b_dw         | 256  | 2.56849  |
|            hrnet_w18            | 128  | 2.547749 |
|          botnet26t_256          | 128  | 2.520026 |
|           res2next50            | 128  | 2.379491 |
|          ghostnet_100           | 512  | 2.315653 |
|       eca_botnext26ts_256       | 128  | 2.310314 |
|       gluon_inception_v3        | 256  | 2.249451 |
|           resnest101e           |  64  | 2.198025 |
|          inception_v3           | 128  | 2.186552 |
|             dla102              | 128  | 2.180703 |
|        adv_inception_v3         | 128  | 2.160029 |
|        res2net50_14w_8s         | 128  | 2.116133 |
|        eca_halonext26ts         | 128  | 2.090014 |
|        res2net101_26w_4s        | 128  |  2.0834  |
|            repvgg_a2            | 128  | 2.03864  |
|          cspdarknet53           |  64  | 2.026277 |
|            nfnet_l0             | 128  | 1.954701 |
|        convmixer_768_32         |  32  | 1.928518 |
|           tf_mixnet_l           | 128  | 1.878807 |
|            gernet_l             | 128  | 1.854627 |
|           selecsls42b           | 128  | 1.778606 |
|           dm_nfnet_f0           | 128  | 1.775026 |
|        sebotnet33ts_256         |  64  | 1.752837 |
|            mixnet_l             | 128  | 1.715163 |
|           volo_d1_224           |  64  | 1.699727 |
|           mobilevit_s           |  64  | 1.669762 |
|         visformer_small         | 128  | 1.633629 |
|         poolformer_m36          |  64  | 1.630234 |
|     swsl_resnext101_32x16d      |  32  | 1.587078 |
|           convit_base           |  64  | 1.53329  |
|             dpn107              |  64  | 1.488433 |
|            levit_128            | 1024 | 1.462566 |
|          gmlp_s16_224           | 128  | 1.400294 |
|      xcit_large_24_p8_224       |  16  | 1.337805 |
|          gmixer_24_224          | 128  | 1.328443 |
|  swin_base_patch4_window7_224   |  64  | 1.282847 |
|        twins_pcpvt_base         | 128  | 1.222098 |
|          mixer_b16_224          | 128  | 1.216327 |
|        tnt_s_patch16_224        | 128  | 1.198079 |
|          convnext_base          |  64  | 1.188095 |
|      beit_base_patch16_224      |  64  | 1.172851 |
| deit_base_distilled_patch16_224 |  64  |  1.1523  |
|      vit_base_patch16_224       |  64  | 1.15135  |
|          cait_m36_384           |  4   | 1.134452 |
|            pit_b_224            |  64  | 1.109589 |
|          jx_nest_base           |  32  | 1.076194 |
|         crossvit_9_240          | 256  | 1.058379 |
|          resmlp_12_224          | 128  | 0.748714 |
+---------------------------------+------+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 8  |     pass      |
|           dm_nfnet_f0           | 8  |     pass      |
|            mixnet_l             | 8  |     pass      |
|       eca_botnext26ts_256       | 8  |     pass      |
|         crossvit_9_240          | 8  |     pass      |
| deit_base_distilled_patch16_224 | 8  |     pass      |
|          convnext_base          | 8  |     pass      |
|           regnety_002           | 8  |     pass      |
|           convit_base           | 8  |     pass      |
|          gmlp_s16_224           | 8  |     pass      |
|          cait_m36_384           | 8  |     pass      |
|             dpn107              | 8  |     pass      |
|      vit_base_patch16_224       | 8  |     pass      |
|          cspdarknet53           | 8  |     pass      |
|        eca_halonext26ts         | 8  |     pass      |
|        ese_vovnet19b_dw         | 8  |     pass      |
|           fbnetc_100            | 8  |     pass      |
|            fbnetv3_b            | 8  |     pass      |
|            gernet_l             | 8  |     pass      |
|          botnet26t_256          | 8  |     pass      |
|       gluon_inception_v3        | 8  |     pass      |
|          gmixer_24_224          | 8  |     pass      |
|          jx_nest_base           | 8  |     pass      |
|        convmixer_768_32         | 8  |     pass      |
|        twins_pcpvt_base         | 8  |     pass      |
|         poolformer_m36          | 8  |     pass      |
|            lcnet_050            | 8  |     pass      |
|          mixer_b16_224          | 8  |     pass      |
|        tnt_s_patch16_224        | 8  |     pass      |
|           mnasnet_100           | 8  |     pass      |
|         mobilenetv2_100         | 8  |     pass      |
|      mobilenetv3_large_100      | 8  |     pass      |
|           mobilevit_s           | 8  |     pass      |
|            nfnet_l0             | 8  |     pass      |
|            pit_b_224            | 8  |     pass      |
|      beit_base_patch16_224      | 8  |     pass      |
|            repvgg_a2            | 8  |     pass      |
|  swin_base_patch4_window7_224   | 8  |     pass      |
|          inception_v3           | 8  |     pass      |
|           tf_mixnet_l           | 8  |     pass      |
|       tf_efficientnet_b0        | 8  |     pass      |
|            tinynet_a            | 8  |     pass      |
|          spnasnet_100           | 8  |     pass      |
|        sebotnet33ts_256         | 8  |     pass      |
|          resmlp_12_224          | 8  |     pass      |
|         coat_lite_mini          | 8  |  fail_to_run  |
|           resnest101e           | 8  | fail_accuracy |
|            levit_128            | 8  | fail_accuracy |
|          ghostnet_100           | 8  | fail_accuracy |
|          pnasnet5large          | 8  | fail_accuracy |
|        res2net101_26w_4s        | 8  | fail_accuracy |
|        res2net50_14w_8s         | 8  | fail_accuracy |
|             dla102              | 8  | fail_accuracy |
|     swsl_resnext101_32x16d      | 8  | fail_accuracy |
|           rexnet_100            | 8  | fail_accuracy |
|           selecsls42b           | 8  | fail_accuracy |
|           res2next50            | 8  | fail_accuracy |
|            hrnet_w18            | 8  | fail_accuracy |
|           volo_d1_224           | 8  | fail_accuracy |
|         visformer_small         | 8  | fail_accuracy |
|      xcit_large_24_p8_224       | 8  | fail_accuracy |
+---------------------------------+----+---------------+

Compilation latency (sec)

+---------------------------------+------+-----------+
|              name               |  bs  | inductor  |
+---------------------------------+------+-----------+
|          pnasnet5large          |  16  | 94.944599 |
|  swin_base_patch4_window7_224   |  64  | 90.846361 |
|           mobilevit_s           |  64  | 84.978973 |
|           tf_mixnet_l           | 128  | 79.557686 |
|             dpn107              |  64  | 78.343963 |
|        twins_pcpvt_base         | 128  | 71.628226 |
|           rexnet_100            | 256  | 70.531178 |
|        eca_halonext26ts         | 128  | 67.938026 |
|        res2net50_14w_8s         | 128  | 65.238746 |
|        sebotnet33ts_256         |  64  | 65.100958 |
|      xcit_large_24_p8_224       |  16  | 64.739459 |
|          jx_nest_base           |  32  | 64.712587 |
|          ghostnet_100           | 512  | 64.074625 |
|            levit_128            | 1024 | 62.962893 |
|          cait_m36_384           |  4   | 62.043446 |
|        tnt_s_patch16_224        | 128  | 59.309961 |
|            mixnet_l             | 128  | 57.934545 |
|         crossvit_9_240          | 256  | 57.370385 |
|         poolformer_m36          |  64  | 56.359204 |
|           dm_nfnet_f0           | 128  | 56.230951 |
|       eca_botnext26ts_256       | 128  | 53.677671 |
|           volo_d1_224           |  64  | 51.145914 |
|          convnext_base          |  64  | 44.939924 |
|        res2net101_26w_4s        | 128  | 44.857256 |
|       tf_efficientnet_b0        | 128  | 44.369708 |
|            nfnet_l0             | 128  | 43.655371 |
|            hrnet_w18            | 128  | 42.86888  |
|       gluon_inception_v3        | 256  | 42.081577 |
|           convit_base           |  64  | 40.609391 |
|          botnet26t_256          | 128  | 40.137726 |
|          inception_v3           | 128  | 40.029379 |
|        adv_inception_v3         | 128  | 39.986701 |
|           res2next50            | 128  | 38.788741 |
|            tinynet_a            | 128  | 38.098925 |
|            pit_b_224            |  64  | 36.510815 |
|           resnest101e           |  64  | 32.734735 |
|            fbnetv3_b            | 256  | 32.129595 |
|         visformer_small         | 128  | 30.269662 |
|        ese_vovnet19b_dw         | 256  | 29.63955  |
|          cspdarknet53           |  64  | 29.591778 |
|          gmlp_s16_224           | 128  | 29.483197 |
|             dla102              | 128  | 29.056539 |
|          gmixer_24_224          | 128  | 28.222921 |
|      mobilenetv3_large_100      | 512  | 27.364888 |
|      vit_base_patch16_224       |  64  | 22.925086 |
|      beit_base_patch16_224      |  64  | 21.887868 |
|          mixer_b16_224          | 128  | 21.744917 |
| deit_base_distilled_patch16_224 |  64  | 21.671923 |
|           regnety_002           | 1024 | 20.949398 |
|          resmlp_12_224          | 128  | 19.436221 |
|        convmixer_768_32         |  32  | 17.382929 |
|           selecsls42b           | 128  | 17.055555 |
|            repvgg_a2            | 128  | 16.828826 |
|            lcnet_050            | 256  | 14.718665 |
|     swsl_resnext101_32x16d      |  32  | 13.924056 |
|         mobilenetv2_100         | 128  | 11.056285 |
|          spnasnet_100           | 128  | 11.019862 |
|            gernet_l             | 128  | 10.693624 |
|           fbnetc_100            | 512  | 10.52508  |
|           mnasnet_100           | 512  | 10.047296 |
+---------------------------------+------+-----------+

Peak Memory Compression Ratio

+---------------------------------+------+----------+
|              name               |  bs  | inductor |
+---------------------------------+------+----------+
|        ese_vovnet19b_dw         | 256  | 0.997478 |
|           fbnetc_100            | 512  | 0.996674 |
|      mobilenetv3_large_100      | 512  | 0.996413 |
|            fbnetv3_b            | 256  | 0.99581  |
|           mnasnet_100           | 512  | 0.995612 |
|          ghostnet_100           | 512  | 0.995602 |
|           regnety_002           | 1024 | 0.995531 |
|           dm_nfnet_f0           | 128  | 0.995132 |
|       eca_botnext26ts_256       | 128  | 0.994291 |
|          convnext_base          |  64  | 0.994014 |
|            levit_128            | 1024 | 0.99389  |
|        eca_halonext26ts         | 128  | 0.993506 |
|           rexnet_100            | 256  | 0.993401 |
|        res2net101_26w_4s        | 128  | 0.993115 |
|           res2next50            | 128  | 0.992798 |
|          botnet26t_256          | 128  | 0.992545 |
|       tf_efficientnet_b0        | 128  | 0.992544 |
|        convmixer_768_32         |  32  | 0.992538 |
|           tf_mixnet_l           | 128  | 0.992264 |
|          cspdarknet53           |  64  | 0.992161 |
|          gmlp_s16_224           | 128  | 0.991991 |
|        twins_pcpvt_base         | 128  | 0.991839 |
|            gernet_l             | 128  | 0.991833 |
|       gluon_inception_v3        | 256  | 0.991666 |
|            nfnet_l0             | 128  | 0.991623 |
|         visformer_small         | 128  | 0.991461 |
|            mixnet_l             | 128  | 0.991372 |
|        sebotnet33ts_256         |  64  | 0.991363 |
|          mixer_b16_224          | 128  | 0.990907 |
|           mobilevit_s           |  64  | 0.990715 |
|          gmixer_24_224          | 128  | 0.990708 |
|         mobilenetv2_100         | 128  | 0.990624 |
|        res2net50_14w_8s         | 128  | 0.990426 |
|      xcit_large_24_p8_224       |  16  | 0.990335 |
|             dla102              | 128  | 0.989444 |
|           selecsls42b           | 128  | 0.989279 |
|  swin_base_patch4_window7_224   |  64  | 0.98867  |
|           convit_base           |  64  | 0.98861  |
|      beit_base_patch16_224      |  64  | 0.988263 |
|          pnasnet5large          |  16  | 0.987992 |
|            tinynet_a            | 128  | 0.987742 |
|          spnasnet_100           | 128  | 0.987599 |
|        tnt_s_patch16_224        | 128  | 0.987167 |
|         poolformer_m36          |  64  | 0.987146 |
|      vit_base_patch16_224       |  64  | 0.986384 |
|          resmlp_12_224          | 128  | 0.986357 |
|          inception_v3           | 128  | 0.986241 |
|        adv_inception_v3         | 128  | 0.986203 |
| deit_base_distilled_patch16_224 |  64  | 0.985902 |
|           resnest101e           |  64  | 0.985559 |
|             dpn107              |  64  | 0.985441 |
|            hrnet_w18            | 128  | 0.985068 |
|            lcnet_050            | 256  | 0.984596 |
|            pit_b_224            |  64  | 0.983211 |
|            repvgg_a2            | 128  | 0.981973 |
|           volo_d1_224           |  64  | 0.981205 |
|          jx_nest_base           |  32  | 0.980014 |
|     swsl_resnext101_32x16d      |  32  | 0.979966 |
|          cait_m36_384           |  4   | 0.978969 |
|         crossvit_9_240          | 256  | 0.968184 |
+---------------------------------+------+----------+

Absolute latency (ms)

+---------------------------------+------+-------------+
|              name               |  bs  |  inductor   |
+---------------------------------+------+-------------+
|      xcit_large_24_p8_224       |  16  | 1322.165898 |
|          cait_m36_384           |  4   | 1138.710058 |
|          convnext_base          |  64  | 1050.835249 |
|           dm_nfnet_f0           | 128  | 997.442125  |
|          mixer_b16_224          | 128  | 931.368192  |
|             dpn107              |  64  |  930.1308   |
|       gluon_inception_v3        | 256  | 832.575107  |
|        tnt_s_patch16_224        | 128  | 759.138715  |
|        twins_pcpvt_base         | 128  | 738.074015  |
|  swin_base_patch4_window7_224   |  64  | 732.471625  |
|           convit_base           |  64  | 724.696153  |
|        res2net101_26w_4s        | 128  | 655.752739  |
|     swsl_resnext101_32x16d      |  32  | 644.417451  |
| deit_base_distilled_patch16_224 |  64  | 631.061761  |
|      vit_base_patch16_224       |  64  |   628.882   |
|      beit_base_patch16_224      |  64  | 623.807663  |
|            nfnet_l0             | 128  |  621.40099  |
|            levit_128            | 1024 | 566.691227  |
|        ese_vovnet19b_dw         | 256  | 560.812308  |
|             dla102              | 128  | 533.959859  |
|            pit_b_224            |  64  | 527.331067  |
|          gmlp_s16_224           | 128  | 508.904173  |
|          jx_nest_base           |  32  | 507.020461  |
|           resnest101e           |  64  | 491.726246  |
|          gmixer_24_224          | 128  |  490.11529  |
|         crossvit_9_240          | 256  | 486.984759  |
|         poolformer_m36          |  64  |  481.76517  |
|        convmixer_768_32         |  32  | 448.244379  |
|            hrnet_w18            | 128  | 438.040797  |
|          resmlp_12_224          | 128  | 427.487144  |
|          inception_v3           | 128  | 419.620888  |
|        adv_inception_v3         | 128  | 418.777076  |
|           volo_d1_224           |  64  | 416.918957  |
|        res2net50_14w_8s         | 128  | 403.804338  |
|         visformer_small         | 128  | 399.905549  |
|           res2next50            | 128  | 363.689449  |
|          ghostnet_100           | 512  |  363.01738  |
|            mixnet_l             | 128  | 357.836796  |
|           tf_mixnet_l           | 128  | 346.335629  |
|          pnasnet5large          |  16  | 342.712782  |
|            repvgg_a2            | 128  | 340.168062  |
|        eca_halonext26ts         | 128  | 333.642418  |
|           fbnetc_100            | 512  | 307.392094  |
|            gernet_l             | 128  | 301.309821  |
|       eca_botnext26ts_256       | 128  | 295.588887  |
|        sebotnet33ts_256         |  64  | 284.512395  |
|          botnet26t_256          | 128  | 283.246054  |
|           regnety_002           | 1024 | 282.717661  |
|          cspdarknet53           |  64  | 265.155717  |
|           mnasnet_100           | 512  | 259.059748  |
|            fbnetv3_b            | 256  | 246.455677  |
|      mobilenetv3_large_100      | 512  | 235.962603  |
|           selecsls42b           | 128  | 232.686812  |
|           rexnet_100            | 256  | 231.103495  |
|           mobilevit_s           |  64  | 225.255299  |
|       tf_efficientnet_b0        | 128  | 119.103886  |
|            tinynet_a            | 128  |  82.798024  |
|         mobilenetv2_100         | 128  |  72.992213  |
|          spnasnet_100           | 128  |  66.070324  |
|            lcnet_050            | 256  |  26.915281  |
+---------------------------------+------+-------------+

@zxd1997066
Contributor

[default] Performance Dashboard for float32 precision -- Single-core Single-thread (2024-04-28 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward and backward passes for training, and of the forward pass only for inference. For accuracy, we check the numerical correctness of the forward pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

SW information:

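+-------------------+--------+------------------------------------------+
| SW                | Branch | Commit                                   |
+-------------------+--------+------------------------------------------+
| Pytorch           | main   | 7478b7f1cac9686f00edf3db4667cf86d2421531 |
| Torchbench        | main   | d6015d42                                 |
| torchaudio        | main   | 2.2.0a0+ea437b3                          |
| torchtext         | main   | 0.16.0a0+b0ebddc                         |
| torchvision       | main   | 0.19.0a0+2c4665f                         |
| torchdata         | main   | 0.7.1a0+0790338                          |
| dynamo_benchmarks | main   | nightly                                  |
+-------------------+--------+------------------------------------------+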

HW information

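+------------------+----------------------------------------------------------------+
| Item             | Value                                                          |
+------------------+----------------------------------------------------------------+
| Manufacturer     | Amazon EC2                                                     |
| Product Name     | c6i.16xlarge                                                   |
| CPU Model        | Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz                  |
| Installed Memory | 128GB (1x128GB DDR4 3200 MT/s [Unknown])                       |
| OS               | Ubuntu 22.04.2 LTS                                             |
| Kernel           | 5.19.0-1022-aws                                                |
| Microcode        | 0xd000389                                                      |
| GCC              | gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0                      |
| GLIBC            | ldd (Ubuntu GLIBC 2.35-0ubuntu3.1) 2.35                        |
| Binutils         | GNU ld (GNU Binutils for Ubuntu) 2.38                          |
| Python           | Python 3.10.6                                                  |
| OpenSSL          | OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022) |
+------------------+----------------------------------------------------------------+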

Test command

export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
export OMP_NUM_THREADS=1

python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--core_list 0 --ncores_per_instance 1" --devices=cpu --dtypes=float32 --inference --compilers=inductor --batch_size=1 --threads 1 --extra-args="--timeout 9000"

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 86%, 68/79 | 100%, 46/46 | 75%, 45/60  |
+----------+------------+-------------+-------------+
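Each cell in the pass-rate table pairs a percentage with the underlying count. A minimal sketch of that formatting, assuming the percentage is simply passed/total rounded to the nearest integer (the helper name `format_passrate` is hypothetical and is not taken from the benchmark scripts):

```python
def format_passrate(passed: int, total: int) -> str:
    """Format a pass rate as 'NN%, passed/total' (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    percent = round(100 * passed / total)
    return f"{percent}%, {passed}/{total}"


# Counts copied from the pass-rate table above.
counts = {
    "torchbench": (68, 79),
    "huggingface": (46, 46),
    "timm_models": (45, 60),
}
for suite, (passed, total) in counts.items():
    print(suite, format_passrate(passed, total))
```

Under this assumption the three cells in the table are reproduced from their counts.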

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.53x    |    1.20x    |    1.47x    |
+----------+------------+-------------+-------------+
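For readers unfamiliar with the metric, here is a minimal sketch of a geometric mean over per-model speedups. It assumes that non-positive entries (such as the 0.0 placeholders for models that did not run) are dropped before averaging. The function name and the sample values are illustrative only and are not taken from the benchmark scripts.

```python
import math


def geometric_mean(speedups):
    """Geometric mean of the positive speedup ratios in `speedups`."""
    # Assumption: 0.0 placeholders for models that did not run are excluded.
    values = [s for s in speedups if s > 0]
    if not values:
        raise ValueError("no positive speedups to average")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Illustrative values, not a row subset of any table in this report.
sample = [2.0, 0.5, 1.0]
print(f"{geometric_mean(sample):.2f}x")
```

The geometric mean suits ratios because a 2x speedup and a 0.5x slowdown cancel out, whereas an arithmetic mean would report a net gain.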

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   25.90    |    26.76    |    34.45    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.86x    |    0.81x    |    0.82x    |
+----------+------------+-------------+-------------+
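A minimal sketch of how to read these ratios, assuming the compression ratio is defined as eager peak memory divided by compiled peak memory. That definition is an assumption consistent with the "higher is better" label. The function names and byte counts below are hypothetical.

```python
def compression_ratio(eager_peak_bytes: float, compiled_peak_bytes: float) -> float:
    """Peak memory compression ratio, assumed to be eager / compiled."""
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled peak memory must be positive")
    return eager_peak_bytes / compiled_peak_bytes


def relative_footprint(ratio: float) -> float:
    """Compiled peak memory as a multiple of the eager peak memory."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / ratio


# Hypothetical byte counts that would yield a ratio of 0.86.
print(f"{compression_ratio(860.0, 1000.0):.2f}x")
# Footprint multiple implied by the torchbench figure in the table above.
print(f"{relative_footprint(0.86):.2f}x eager peak memory")
```

Under this assumption, a ratio below 1.0 means the compiled run peaked at more memory than the eager run.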

torchbench suite with float32 precision


Performance speedup

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_isoneutral_mixing     |    1    | 63.262442 |
|     pyhpc_equation_of_state     |    1    | 24.500432 |
|          maml_omniglot          |    5    | 3.868692  |
|         basic_gnn_sage          |    1    | 3.578566  |
|          basic_gnn_gin          |    1    | 3.456577  |
|          squeezenet1_1          |    1    | 3.345799  |
|     functorch_maml_omniglot     |    1    | 3.310978  |
|          basic_gnn_gcn          |    1    | 2.851753  |
|           timm_nfnet            |    1    | 2.759761  |
|         opacus_cifar10          |    1    | 2.245315  |
|            resnet18             |    1    | 2.194495  |
|              dcgan              |    1    | 2.170345  |
|      functorch_dp_cifar10       |    1    | 2.077488  |
|       shufflenet_v2_x1_0        |    1    | 2.068728  |
|          timm_resnest           |    1    | 1.991185  |
|          mobilenet_v2           |    1    | 1.872382  |
|          lennard_jones          |    1    |  1.79801  |
|            resnet50             |    1    | 1.776776  |
|           mnasnet1_0            |    1    | 1.753838  |
|       mobilenet_v3_large        |    1    | 1.707462  |
|            resnet152            |    1    | 1.653877  |
|           densenet121           |    1    | 1.650925  |
|         phlippe_resnet          |    1    | 1.621616  |
|           timm_vovnet           |    1    | 1.600889  |
|        timm_efficientnet        |    1    | 1.596071  |
|         LearningToPaint         |    1    | 1.567731  |
|      doctr_reco_predictor       |    1    | 1.502322  |
|         resnext50_32x4d         |    1    | 1.483361  |
|           timm_regnet           |    1    | 1.471197  |
|        phlippe_densenet         |    1    | 1.446083  |
|              vgg16              |    1    | 1.443946  |
|        basic_gnn_edgecnn        |    1    | 1.401205  |
|              llama              |    1    | 1.383997  |
|             yolov3              |    1    | 1.376947  |
|             alexnet             |    1    | 1.349291  |
|          BERT_pytorch           |    1    | 1.297712  |
|       doctr_det_predictor       |    1    | 1.288907  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1.281898  |
| detectron2_fasterrcnn_r_101_c4  |    1    |  1.27779  |
|            hf_Albert            |    1    | 1.274382  |
|              maml               |    1    | 1.250008  |
|             hf_GPT2             |    1    | 1.248956  |
|               drq               |    1    | 1.221048  |
|          hf_GPT2_large          |    1    | 1.220003  |
|            moondream            |    1    | 1.211574  |
|          fastNLP_Bert           |    1    | 1.192918  |
|     timm_vision_transformer     |    1    | 1.192438  |
|         pytorch_stargan         |   16    | 1.191932  |
|        soft_actor_critic        |   256   | 1.184417  |
|  timm_vision_transformer_large  |    1    | 1.182843  |
|          hf_Bert_large          |    1    | 1.171868  |
|           hf_BigBird            |    1    | 1.153824  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 1.153818  |
|             hf_Bert             |    1    |  1.15107  |
|              dlrm               |    1    | 1.144017  |
|          hf_DistilBert          |    1    | 1.138331  |
|      torch_multimodal_clip      |    1    | 1.133059  |
|             hf_Bart             |    1    |  1.10074  |
|       speech_transformer        |    1    | 1.074328  |
|        hf_distil_whisper        |    1    | 1.062453  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1.060951  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1.050063  |
|    detectron2_fcos_r_50_fpn     |    1    | 1.048379  |
|          pytorch_unet           |    1    | 1.040479  |
|          hf_Longformer          |    1    | 1.025126  |
|             demucs              |    1    | 1.001397  |
|           tts_angular           |    1    | 1.000037  |
|     resnet50_quantized_qat      |    1    | 0.995507  |
|   mobilenet_v2_quantized_qat    |    1    | 0.984789  |
|     nvidia_deeprecommender      |    1    | 0.954459  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.913991  |
|           hf_Reformer           |    1    | 0.847059  |
|       Background_Matting        |    1    | 0.799993  |
|           hf_T5_large           |    1    | 0.795817  |
|              hf_T5              |    1    | 0.710098  |
|           hf_T5_base            |    1    |  0.60092  |
|        timm_efficientdet        |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+

Accuracy

+---------------------------------+---------+--------------------+
|              name               |   bs    |      inductor      |
+---------------------------------+---------+--------------------+
|           hf_T5_large           |    1    |  pass_due_to_skip  |
|       Background_Matting        |    1    |  pass_due_to_skip  |
|  timm_vision_transformer_large  |    1    |  pass_due_to_skip  |
|              maml               |    1    |  pass_due_to_skip  |
|          hf_GPT2_large          |    1    |  pass_due_to_skip  |
|         basic_gnn_sage          |    1    |        pass        |
|          hf_Longformer          |    1    |        pass        |
| detectron2_fasterrcnn_r_101_dc5 |    1    |        pass        |
| detectron2_fasterrcnn_r_101_c4  |    1    |        pass        |
|             demucs              |    1    |        pass        |
|              dcgan              |    1    |        pass        |
|        basic_gnn_edgecnn        |    1    |        pass        |
|          basic_gnn_gin          |    1    |        pass        |
|          basic_gnn_gcn          |    1    |        pass        |
|    detectron2_fcos_r_50_fpn     |    1    |        pass        |
|         LearningToPaint         |    1    |        pass        |
|       doctr_det_predictor       |    1    |        pass        |
|           hf_Reformer           |    1    |        pass        |
| detectron2_fasterrcnn_r_50_dc5  |    1    |        pass        |
|  detectron2_fasterrcnn_r_50_c4  |    1    |        pass        |
|              dlrm               |    1    |        pass        |
|           hf_T5_base            |    1    |        pass        |
|      doctr_reco_predictor       |    1    |        pass        |
|            hf_Albert            |    1    |        pass        |
|             hf_GPT2             |    1    |        pass        |
|          hf_DistilBert          |    1    |        pass        |
|           hf_BigBird            |    1    |        pass        |
|          hf_Bert_large          |    1    |        pass        |
|             hf_Bert             |    1    |        pass        |
|             hf_Bart             |    1    |        pass        |
|               drq               |    1    |        pass        |
|             yolov3              |    1    |        pass        |
|      functorch_dp_cifar10       |    1    |        pass        |
|          fastNLP_Bert           |    1    |        pass        |
|     functorch_maml_omniglot     |    1    |        pass        |
|          lennard_jones          |    1    |        pass        |
|             alexnet             |    1    |        pass        |
|          pytorch_unet           |    1    |        pass        |
|  pytorch_CycleGAN_and_pix2pix   |    1    |        pass        |
|          BERT_pytorch           |    1    |        pass        |
|            moondream            |    1    |        pass        |
|     pyhpc_equation_of_state     |    1    |        pass        |
|         phlippe_resnet          |    1    |        pass        |
|        phlippe_densenet         |    1    |        pass        |
|         opacus_cifar10          |    1    |        pass        |
|     nvidia_deeprecommender      |    1    |        pass        |
|     resnet50_quantized_qat      |    1    |        pass        |
|       mobilenet_v3_large        |    1    |        pass        |
|   mobilenet_v2_quantized_qat    |    1    |        pass        |
| pyhpc_turbulent_kinetic_energy  | 1048576 |        pass        |
|              llama              |    1    |        pass        |
|        hf_distil_whisper        |    1    |        pass        |
|     pyhpc_isoneutral_mixing     |    1    |        pass        |
|          maml_omniglot          |    5    |        pass        |
|              hf_T5              |    1    |        pass        |
|              vgg16              |    1    |        pass        |
|           tts_angular           |    1    |        pass        |
|      torch_multimodal_clip      |    1    |        pass        |
|           timm_vovnet           |    1    |        pass        |
|     timm_vision_transformer     |    1    |        pass        |
|          timm_resnest           |    1    |        pass        |
|           timm_nfnet            |    1    |        pass        |
|         pytorch_stargan         |   16    |        pass        |
|        timm_efficientnet        |    1    |        pass        |
|          squeezenet1_1          |    1    |        pass        |
|       speech_transformer        |    1    |        pass        |
|        soft_actor_critic        |   256   |        pass        |
|           timm_regnet           |    1    |        pass        |
|         DALLE2_pytorch          |    0    | model_fail_to_load |
|              moco               |    0    | model_fail_to_load |
|        timm_efficientdet        |    0    | model_fail_to_load |
| detectron2_fasterrcnn_r_101_fpn |    1    |    fail_to_run     |
|         vision_maskrcnn         |    1    |    fail_to_run     |
|           Super_SloMo           |    1    |    fail_to_run     |
| detectron2_fasterrcnn_r_50_fpn  |    1    |    fail_to_run     |
|         resnext50_32x4d         |    1    |   fail_accuracy    |
|            resnet152            |    1    |   fail_accuracy    |
|       shufflenet_v2_x1_0        |    1    |   fail_accuracy    |
|           mnasnet1_0            |    1    |   fail_accuracy    |
|          mobilenet_v2           |    1    |   fail_accuracy    |
|            resnet18             |    1    |   fail_accuracy    |
|            resnet50             |    1    |   fail_accuracy    |
|           densenet121           |    1    |   fail_accuracy    |
+---------------------------------+---------+--------------------+

Compilation latency (sec)

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|           hf_T5_base            |    1    | 98.309093 |
|           densenet121           |    1    | 90.455437 |
|           hf_BigBird            |    1    | 75.480233 |
|           hf_T5_large           |    1    | 71.380516 |
|    detectron2_fcos_r_50_fpn     |    1    | 69.475394 |
|              maml               |    1    | 53.754721 |
|          hf_Longformer          |    1    | 52.030002 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 50.014363 |
|           timm_nfnet            |    1    | 48.376334 |
|           hf_Reformer           |    1    | 47.684787 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 43.58856  |
|        phlippe_densenet         |    1    | 42.875657 |
|       speech_transformer        |    1    | 39.311693 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 39.033303 |
|  timm_vision_transformer_large  |    1    | 36.133595 |
|      torch_multimodal_clip      |    1    | 35.703585 |
|             demucs              |    1    | 35.404881 |
|        timm_efficientnet        |    1    | 33.582402 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 33.417886 |
|              hf_T5              |    1    | 32.61885  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 32.043927 |
|       Background_Matting        |    1    | 30.303982 |
|             yolov3              |    1    | 28.850848 |
|        hf_distil_whisper        |    1    | 28.650441 |
|         opacus_cifar10          |    1    | 28.619037 |
|            moondream            |    1    | 27.664421 |
|      functorch_dp_cifar10       |    1    | 27.222765 |
|          hf_GPT2_large          |    1    | 26.92861  |
|          hf_Bert_large          |    1    | 25.682672 |
|          timm_resnest           |    1    | 25.587136 |
|       doctr_det_predictor       |    1    | 24.677588 |
|       shufflenet_v2_x1_0        |    1    | 24.094917 |
|       mobilenet_v3_large        |    1    | 24.072805 |
|              llama              |    1    | 23.898589 |
|          BERT_pytorch           |    1    | 22.86616  |
|          fastNLP_Bert           |    1    | 22.381119 |
|             hf_Bart             |    1    | 22.338378 |
|           timm_vovnet           |    1    | 21.628362 |
|           timm_regnet           |    1    | 21.329989 |
|     timm_vision_transformer     |    1    | 21.218738 |
|          pytorch_unet           |    1    | 20.899623 |
|            hf_Albert            |    1    | 19.97451  |
|             hf_GPT2             |    1    | 19.503265 |
|          hf_DistilBert          |    1    | 19.052416 |
|             hf_Bert             |    1    | 18.828055 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 17.929882 |
|          squeezenet1_1          |    1    | 16.058009 |
|            resnet152            |    1    | 15.667878 |
|              vgg16              |    1    | 15.02801  |
|         pytorch_stargan         |   16    | 14.651159 |
|      doctr_reco_predictor       |    1    | 13.629644 |
|             alexnet             |    1    | 12.652098 |
|     pyhpc_isoneutral_mixing     |    1    | 12.113204 |
|         resnext50_32x4d         |    1    | 11.511932 |
|            resnet50             |    1    | 11.506502 |
|               drq               |    1    | 11.055464 |
|              dlrm               |    1    | 10.677767 |
|            resnet18             |    1    | 10.163143 |
|          mobilenet_v2           |    1    | 10.023548 |
|           mnasnet1_0            |    1    |  9.83509  |
|     functorch_maml_omniglot     |    1    | 9.793684  |
|          basic_gnn_gcn          |    1    | 9.521682  |
|          maml_omniglot          |    5    | 9.456393  |
|     nvidia_deeprecommender      |    1    |  9.31537  |
|          basic_gnn_gin          |    1    | 9.084035  |
|        basic_gnn_edgecnn        |    1    | 9.027598  |
|         LearningToPaint         |    1    | 8.968833  |
|     pyhpc_equation_of_state     |    1    | 8.826524  |
|         phlippe_resnet          |    1    |  8.67618  |
|         basic_gnn_sage          |    1    | 7.963586  |
|        soft_actor_critic        |   256   | 6.921197  |
|          lennard_jones          |    1    | 5.850488  |
|              dcgan              |    1    | 5.773668  |
|           tts_angular           |    1    | 5.607236  |
|   mobilenet_v2_quantized_qat    |    1    | 0.096576  |
|     resnet50_quantized_qat      |    1    | 0.071421  |
|        timm_efficientdet        |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+

Peak Memory Compression Ratio

+---------------------------------+---------+----------+
|              name               |   bs    | inductor |
+---------------------------------+---------+----------+
|              dlrm               |    1    | 0.988125 |
|           hf_T5_base            |    1    | 0.987353 |
|             demucs              |    1    | 0.984633 |
|       Background_Matting        |    1    | 0.982807 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 0.979031 |
|          pytorch_unet           |    1    | 0.978084 |
|          hf_GPT2_large          |    1    | 0.97793  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 0.971966 |
|        basic_gnn_edgecnn        |    1    | 0.969496 |
|       doctr_det_predictor       |    1    | 0.969435 |
|    detectron2_fcos_r_50_fpn     |    1    | 0.963824 |
|     resnet50_quantized_qat      |    1    | 0.955935 |
|         LearningToPaint         |    1    | 0.948541 |
|         pytorch_stargan         |   16    | 0.947986 |
|           hf_BigBird            |    1    | 0.946416 |
|      doctr_reco_predictor       |    1    | 0.94279  |
|          basic_gnn_gin          |    1    | 0.940215 |
|          basic_gnn_gcn          |    1    | 0.936155 |
|         basic_gnn_sage          |    1    | 0.935692 |
|   mobilenet_v2_quantized_qat    |    1    | 0.929419 |
|      torch_multimodal_clip      |    1    | 0.924298 |
|              llama              |    1    | 0.91977  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 0.916549 |
|        hf_distil_whisper        |    1    | 0.916114 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 0.88686  |
|           tts_angular           |    1    | 0.88582  |
|        soft_actor_critic        |   256   | 0.884724 |
|         opacus_cifar10          |    1    | 0.882629 |
|        timm_efficientnet        |    1    | 0.876009 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 0.875476 |
|          mobilenet_v2           |    1    | 0.865361 |
|          lennard_jones          |    1    | 0.859429 |
|          squeezenet1_1          |    1    | 0.858002 |
|          maml_omniglot          |    5    | 0.857605 |
|           mnasnet1_0            |    1    | 0.857363 |
|     functorch_maml_omniglot     |    1    | 0.854839 |
|          fastNLP_Bert           |    1    | 0.85406  |
|          timm_resnest           |    1    | 0.85302  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.851286 |
|       mobilenet_v3_large        |    1    | 0.846837 |
|              dcgan              |    1    | 0.84561  |
|       shufflenet_v2_x1_0        |    1    | 0.839098 |
|         phlippe_resnet          |    1    | 0.83548  |
|     pyhpc_equation_of_state     |    1    | 0.832318 |
|            moondream            |    1    | 0.825172 |
|       speech_transformer        |    1    | 0.821385 |
|        phlippe_densenet         |    1    | 0.818071 |
|           timm_nfnet            |    1    |  0.8149  |
|          hf_Bert_large          |    1    | 0.812817 |
|         resnext50_32x4d         |    1    | 0.810998 |
|     pyhpc_isoneutral_mixing     |    1    | 0.808527 |
|           hf_T5_large           |    1    |  0.8074  |
|          hf_Longformer          |    1    | 0.804716 |
|             hf_Bert             |    1    | 0.803303 |
|     timm_vision_transformer     |    1    | 0.803244 |
|            hf_Albert            |    1    | 0.801546 |
|              maml               |    1    | 0.790031 |
|             hf_Bart             |    1    | 0.783455 |
|             yolov3              |    1    | 0.779139 |
|          BERT_pytorch           |    1    | 0.777332 |
|          hf_DistilBert          |    1    | 0.776254 |
|            resnet50             |    1    | 0.769538 |
|             hf_GPT2             |    1    | 0.768425 |
|           timm_regnet           |    1    | 0.76139  |
|               drq               |    1    | 0.759793 |
|            resnet18             |    1    | 0.758696 |
|           timm_vovnet           |    1    | 0.75816  |
|           densenet121           |    1    | 0.756266 |
|              hf_T5              |    1    | 0.749775 |
|           hf_Reformer           |    1    | 0.744367 |
|      functorch_dp_cifar10       |    1    | 0.741531 |
|             alexnet             |    1    | 0.739813 |
|  timm_vision_transformer_large  |    1    | 0.732592 |
|              vgg16              |    1    | 0.720029 |
|            resnet152            |    1    | 0.692309 |
|     nvidia_deeprecommender      |    1    | 0.672518 |
|        timm_efficientdet        |    0    |   0.0    |
|              moco               |    0    |   0.0    |
|         DALLE2_pytorch          |    0    |   0.0    |
+---------------------------------+---------+----------+

Absolute latency (ms)

+---------------------------------+---------+--------------+
|              name               |   bs    |   inductor   |
+---------------------------------+---------+--------------+
|           hf_T5_base            |    1    | 26137.881712 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 11777.230781 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 11190.609251 |
|          hf_GPT2_large          |    1    | 10129.528218 |
|           hf_T5_large           |    1    | 7472.327057  |
|            moondream            |    1    | 7358.782113  |
|        hf_distil_whisper        |    1    | 7037.755077  |
|       Background_Matting        |    1    | 6976.740222  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 5670.871523  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 5043.688544  |
|          pytorch_unet           |    1    | 4859.208827  |
|  timm_vision_transformer_large  |    1    | 2781.707397  |
|    detectron2_fcos_r_50_fpn     |    1    | 2534.275264  |
|             demucs              |    1    | 2422.112907  |
|         pytorch_stargan         |   16    | 2016.282484  |
|          hf_Bert_large          |    1    | 1762.922235  |
|       doctr_det_predictor       |    1    | 1743.995302  |
|           hf_BigBird            |    1    | 1474.281692  |
|      torch_multimodal_clip      |    1    | 1246.833932  |
|          hf_Longformer          |    1    | 1113.843301  |
|             hf_Bart             |    1    |  881.152666  |
|              hf_T5              |    1    |  762.84413   |
|             hf_Bert             |    1    |  679.665028  |
|       speech_transformer        |    1    |  674.885484  |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  628.985311  |
|            hf_Albert            |    1    |  575.250829  |
|          fastNLP_Bert           |    1    |  528.39548   |
|             yolov3              |    1    |  430.729258  |
|          hf_DistilBert          |    1    |  418.038052  |
|           hf_Reformer           |    1    |  415.382759  |
|             hf_GPT2             |    1    |  354.954846  |
|        basic_gnn_edgecnn        |    1    |  231.924366  |
| pyhpc_turbulent_kinetic_energy  | 1048576 |  209.949031  |
|              vgg16              |    1    |  190.221336  |
|           timm_regnet           |    1    |  150.633606  |
|          BERT_pytorch           |    1    |  141.568119  |
|            resnet152            |    1    |  137.844515  |
|           timm_nfnet            |    1    |  96.880245   |
|           timm_vovnet           |    1    |  79.827295   |
|              maml               |    1    |  74.224129   |
|     timm_vision_transformer     |    1    |  59.017569   |
|     nvidia_deeprecommender      |    1    |   58.19148   |
|         resnext50_32x4d         |    1    |  57.251568   |
|           tts_angular           |    1    |  54.159359   |
|            resnet50             |    1    |  51.005111   |
|           densenet121           |    1    |  45.512955   |
|          basic_gnn_gcn          |    1    |  35.467134   |
|          timm_resnest           |    1    |  33.492335   |
|      doctr_reco_predictor       |    1    |  23.378332   |
|              llama              |    1    |  22.807483   |
|            resnet18             |    1    |  22.403165   |
|             alexnet             |    1    |  22.225803   |
|     resnet50_quantized_qat      |    1    |  18.185335   |
|          basic_gnn_gin          |    1    |  16.810777   |
|         basic_gnn_sage          |    1    |  16.516789   |
|        timm_efficientnet        |    1    |  13.532603   |
|         LearningToPaint         |    1    |   9.835196   |
|           mnasnet1_0            |    1    |   7.829349   |
|          mobilenet_v2           |    1    |   7.670396   |
|       mobilenet_v3_large        |    1    |   7.403772   |
|   mobilenet_v2_quantized_qat    |    1    |   6.935995   |
|          squeezenet1_1          |    1    |   5.969277   |
|       shufflenet_v2_x1_0        |    1    |   5.573553   |
|        phlippe_densenet         |    1    |   3.369035   |
|        soft_actor_critic        |   256   |   2.997543   |
|      functorch_dp_cifar10       |    1    |   2.459026   |
|         opacus_cifar10          |    1    |   2.404073   |
|               drq               |    1    |    1.9016    |
|              dcgan              |    1    |   1.711664   |
|         phlippe_resnet          |    1    |   1.363548   |
|     functorch_maml_omniglot     |    1    |   0.869131   |
|              dlrm               |    1    |   0.700079   |
|          maml_omniglot          |    5    |   0.565751   |
|     pyhpc_equation_of_state     |    1    |   0.044016   |
|     pyhpc_isoneutral_mixing     |    1    |   0.042489   |
|          lennard_jones          |    1    |   0.038992   |
|              moco               |    0    |     0.0      |
|         DALLE2_pytorch          |    0    |     0.0      |
|        timm_efficientdet        |    0    |     0.0      |
+---------------------------------+---------+--------------+
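The speedup and absolute latency tables can be combined to estimate the eager baseline. This is a rough sketch that assumes speedup is eager latency divided by inductor latency, and that both tables come from the same run. The helper name is hypothetical.

```python
def implied_eager_latency_ms(speedup: float, inductor_latency_ms: float) -> float:
    """Back out the eager latency, assuming speedup = eager / inductor."""
    return speedup * inductor_latency_ms


# (speedup, inductor latency in ms) copied from the torchbench tables above.
rows = {
    "hf_T5_base": (0.60092, 26137.881712),
    "pyhpc_isoneutral_mixing": (63.262442, 0.042489),
}
for name, (speedup, latency_ms) in rows.items():
    eager_ms = implied_eager_latency_ms(speedup, latency_ms)
    print(f"{name}: inductor {latency_ms:.3f} ms, implied eager {eager_ms:.3f} ms")
```

Read this way, the largest speedup in the suite applies to a sub-millisecond workload, while the slowdown on hf_T5_base costs several seconds per iteration.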

huggingface suite with float32 precision


Performance speedup

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|          MobileBertForMaskedLM          | 1  | 2.007203 |
|     MobileBertForQuestionAnswering      | 1  | 1.560904 |
|            XLNetLMHeadModel             | 1  | 1.388225 |
|            YituTechConvBert             | 1  | 1.318027 |
|         Speech2Text2ForCausalLM         | 1  | 1.31403  |
|          DistilBertForMaskedLM          | 1  | 1.301493 |
|      GPT2ForSequenceClassification      | 1  | 1.296008 |
|       DebertaForQuestionAnswering       | 1  | 1.295465 |
|     DistilBertForQuestionAnswering      | 1  | 1.290527 |
| BlenderbotSmallForConditionalGeneration | 1  | 1.287108 |
|       BlenderbotSmallForCausalLM        | 1  | 1.276341 |
|           DebertaForMaskedLM            | 1  | 1.266231 |
|          BlenderbotForCausalLM          | 1  | 1.248739 |
|     M2M100ForConditionalGeneration      | 1  | 1.239431 |
|           PegasusForCausalLM            | 1  | 1.237531 |
|       MT5ForConditionalGeneration       | 1  | 1.236468 |
|     PegasusForConditionalGeneration     | 1  | 1.235343 |
|             XGLMForCausalLM             | 1  | 1.234946 |
|               GoogleFnet                | 1  | 1.226771 |
|            AlbertForMaskedLM            | 1  | 1.205212 |
|       AlbertForQuestionAnswering        | 1  | 1.203322 |
|               DistillGPT2               | 1  | 1.193686 |
|            TrOCRForCausalLM             | 1  | 1.177426 |
|      DebertaV2ForQuestionAnswering      | 1  | 1.174321 |
|           ElectraForCausalLM            | 1  | 1.172597 |
|        BertForQuestionAnswering         | 1  | 1.171756 |
|          DebertaV2ForMaskedLM           | 1  | 1.166774 |
|           RobertaForCausalLM            | 1  | 1.165789 |
|                CamemBert                | 1  | 1.164245 |
|             BertForMaskedLM             | 1  | 1.164178 |
|         MegatronBertForCausalLM         | 1  | 1.163988 |
|       RobertaForQuestionAnswering       | 1  | 1.163579 |
|    LayoutLMForSequenceClassification    | 1  | 1.156027 |
|    MegatronBertForQuestionAnswering     | 1  | 1.155966 |
|       ElectraForQuestionAnswering       | 1  | 1.143751 |
|           LayoutLMForMaskedLM           | 1  | 1.143264 |
|     PLBartForConditionalGeneration      | 1  | 1.086122 |
|      MBartForConditionalGeneration      | 1  | 1.064002 |
|             BartForCausalLM             | 1  | 1.053885 |
|      BartForConditionalGeneration       | 1  | 1.052465 |
|             OPTForCausalLM              | 1  | 1.03112  |
|            PLBartForCausalLM            | 1  | 1.018992 |
|            MBartForCausalLM             | 1  | 1.012363 |
|          AllenaiLongformerBase          | 1  | 0.972227 |
|       T5ForConditionalGeneration        | 1  | 0.617536 |
|                 T5Small                 | 1  | 0.616468 |
+-----------------------------------------+----+----------+

Accuracy

+-----------------------------------------+----+------------------+
|                  name                   | bs |     inductor     |
+-----------------------------------------+----+------------------+
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip |
|                CamemBert                | 1  |       pass       |
|       AlbertForQuestionAnswering        | 1  |       pass       |
|           DebertaForMaskedLM            | 1  |       pass       |
|          AllenaiLongformerBase          | 1  |       pass       |
|             BartForCausalLM             | 1  |       pass       |
|      BartForConditionalGeneration       | 1  |       pass       |
|             BertForMaskedLM             | 1  |       pass       |
|        BertForQuestionAnswering         | 1  |       pass       |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |
|       DebertaForQuestionAnswering       | 1  |       pass       |
|           LayoutLMForMaskedLM           | 1  |       pass       |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |
|          DistilBertForMaskedLM          | 1  |       pass       |
|     DistilBertForQuestionAnswering      | 1  |       pass       |
|               DistillGPT2               | 1  |       pass       |
|           ElectraForCausalLM            | 1  |       pass       |
|       ElectraForQuestionAnswering       | 1  |       pass       |
|      GPT2ForSequenceClassification      | 1  |       pass       |
|               GoogleFnet                | 1  |       pass       |
|    LayoutLMForSequenceClassification    | 1  |       pass       |
|            MBartForCausalLM             | 1  |       pass       |
|            XLNetLMHeadModel             | 1  |       pass       |
|             XGLMForCausalLM             | 1  |       pass       |
|            AlbertForMaskedLM            | 1  |       pass       |
|      MBartForConditionalGeneration      | 1  |       pass       |
|       MT5ForConditionalGeneration       | 1  |       pass       |
|         MegatronBertForCausalLM         | 1  |       pass       |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |
|          MobileBertForMaskedLM          | 1  |       pass       |
|     MobileBertForQuestionAnswering      | 1  |       pass       |
|             OPTForCausalLM              | 1  |       pass       |
|            PLBartForCausalLM            | 1  |       pass       |
|     PLBartForConditionalGeneration      | 1  |       pass       |
|           PegasusForCausalLM            | 1  |       pass       |
|     M2M100ForConditionalGeneration      | 1  |       pass       |
|     PegasusForConditionalGeneration     | 1  |       pass       |
|           RobertaForCausalLM            | 1  |       pass       |
|       RobertaForQuestionAnswering       | 1  |       pass       |
|         Speech2Text2ForCausalLM         | 1  |       pass       |
|       T5ForConditionalGeneration        | 1  |       pass       |
|                 T5Small                 | 1  |       pass       |
|            TrOCRForCausalLM             | 1  |       pass       |
|            YituTechConvBert             | 1  |       pass       |
+-----------------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------------+----+-----------+
|                  name                   | bs | inductor  |
+-----------------------------------------+----+-----------+
|          AllenaiLongformerBase          | 1  | 58.917166 |
|          MobileBertForMaskedLM          | 1  | 44.228115 |
|     MobileBertForQuestionAnswering      | 1  | 43.087302 |
|     PegasusForConditionalGeneration     | 1  | 40.895582 |
|     M2M100ForConditionalGeneration      | 1  | 40.243759 |
|      MBartForConditionalGeneration      | 1  | 39.521396 |
|                 T5Small                 | 1  | 38.579033 |
|       T5ForConditionalGeneration        | 1  | 38.44578  |
|          BlenderbotForCausalLM          | 1  | 37.957656 |
|       MT5ForConditionalGeneration       | 1  | 36.112453 |
|            XLNetLMHeadModel             | 1  | 34.814988 |
|             XGLMForCausalLM             | 1  | 34.046073 |
|          DebertaV2ForMaskedLM           | 1  | 31.280951 |
| BlenderbotSmallForConditionalGeneration | 1  | 30.896981 |
|      DebertaV2ForQuestionAnswering      | 1  | 29.96944  |
|      BartForConditionalGeneration       | 1  | 29.438207 |
|         MegatronBertForCausalLM         | 1  | 28.738519 |
|            YituTechConvBert             | 1  | 28.57901  |
|     PLBartForConditionalGeneration      | 1  | 28.07441  |
|    MegatronBertForQuestionAnswering     | 1  | 27.484594 |
|             OPTForCausalLM              | 1  | 25.302815 |
|           PegasusForCausalLM            | 1  | 24.925697 |
|            MBartForCausalLM             | 1  | 24.397074 |
|            TrOCRForCausalLM             | 1  | 22.651115 |
|           DebertaForMaskedLM            | 1  | 22.36294  |
|       DebertaForQuestionAnswering       | 1  | 21.184516 |
|           RobertaForCausalLM            | 1  | 20.427098 |
|           ElectraForCausalLM            | 1  | 20.298785 |
|                CamemBert                | 1  | 20.268372 |
|          DistilBertForMaskedLM          | 1  | 19.652204 |
|       BlenderbotSmallForCausalLM        | 1  | 19.60927  |
|      GPT2ForSequenceClassification      | 1  | 19.384371 |
|           LayoutLMForMaskedLM           | 1  | 19.121758 |
|             BertForMaskedLM             | 1  | 19.102866 |
|       RobertaForQuestionAnswering       | 1  | 19.020733 |
|       ElectraForQuestionAnswering       | 1  | 19.004816 |
|         Speech2Text2ForCausalLM         | 1  | 18.858856 |
|            PLBartForCausalLM            | 1  | 18.801034 |
|    LayoutLMForSequenceClassification    | 1  | 18.77483  |
|             BartForCausalLM             | 1  | 18.55864  |
|     DistilBertForQuestionAnswering      | 1  | 18.393928 |
|        BertForQuestionAnswering         | 1  | 17.832358 |
|               GoogleFnet                | 1  | 17.540904 |
|               DistillGPT2               | 1  | 17.334439 |
|            AlbertForMaskedLM            | 1  | 14.128212 |
|       AlbertForQuestionAnswering        | 1  | 12.907623 |
+-----------------------------------------+----+-----------+

Peak Memory Compression Ratio

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|             OPTForCausalLM              | 1  | 0.986972 |
|      MBartForConditionalGeneration      | 1  | 0.975945 |
|      GPT2ForSequenceClassification      | 1  | 0.955298 |
|          AllenaiLongformerBase          | 1  | 0.948139 |
|            MBartForCausalLM             | 1  | 0.926038 |
|            XLNetLMHeadModel             | 1  | 0.910618 |
|     PLBartForConditionalGeneration      | 1  | 0.909529 |
|       T5ForConditionalGeneration        | 1  | 0.904741 |
|                 T5Small                 | 1  | 0.904527 |
|            PLBartForCausalLM            | 1  | 0.903848 |
|       DebertaForQuestionAnswering       | 1  | 0.872822 |
|      BartForConditionalGeneration       | 1  | 0.864627 |
|       RobertaForQuestionAnswering       | 1  | 0.856448 |
|               GoogleFnet                | 1  | 0.855695 |
|    LayoutLMForSequenceClassification    | 1  | 0.848457 |
|        BertForQuestionAnswering         | 1  | 0.847614 |
|       ElectraForQuestionAnswering       | 1  | 0.841415 |
|    MegatronBertForQuestionAnswering     | 1  | 0.840212 |
|      DebertaV2ForQuestionAnswering      | 1  | 0.835164 |
|               DistillGPT2               | 1  | 0.828741 |
|         MegatronBertForCausalLM         | 1  | 0.818679 |
|           LayoutLMForMaskedLM           | 1  | 0.817854 |
|           DebertaForMaskedLM            | 1  | 0.815492 |
|           RobertaForCausalLM            | 1  | 0.81433  |
|             BertForMaskedLM             | 1  | 0.813219 |
|         Speech2Text2ForCausalLM         | 1  | 0.812854 |
|                CamemBert                | 1  | 0.812366 |
|            YituTechConvBert             | 1  | 0.81203  |
|           ElectraForCausalLM            | 1  | 0.805305 |
|             BartForCausalLM             | 1  | 0.800897 |
|     DistilBertForQuestionAnswering      | 1  | 0.799986 |
|          BlenderbotForCausalLM          | 1  | 0.798203 |
|          DebertaV2ForMaskedLM           | 1  | 0.79693  |
|       MT5ForConditionalGeneration       | 1  | 0.787641 |
|            TrOCRForCausalLM             | 1  | 0.787359 |
|       BlenderbotSmallForCausalLM        | 1  | 0.763155 |
|           PegasusForCausalLM            | 1  | 0.750234 |
|          DistilBertForMaskedLM          | 1  | 0.745749 |
| BlenderbotSmallForConditionalGeneration | 1  | 0.738604 |
|     MobileBertForQuestionAnswering      | 1  | 0.732979 |
|     PegasusForConditionalGeneration     | 1  | 0.715129 |
|     M2M100ForConditionalGeneration      | 1  | 0.706246 |
|          MobileBertForMaskedLM          | 1  | 0.702934 |
|             XGLMForCausalLM             | 1  | 0.698073 |
|            AlbertForMaskedLM            | 1  | 0.448366 |
|       AlbertForQuestionAnswering        | 1  | 0.442911 |
+-----------------------------------------+----+----------+

Absolute latency (ms)

+-----------------------------------------+----+--------------+
|                  name                   | bs |   inductor   |
+-----------------------------------------+----+--------------+
|            AlbertForMaskedLM            | 1  | 12761.437502 |
|       AlbertForQuestionAnswering        | 1  | 12705.225061 |
|      MBartForConditionalGeneration      | 1  | 6134.564803  |
|      BartForConditionalGeneration       | 1  | 5671.367795  |
|             OPTForCausalLM              | 1  | 5223.490498  |
|          DebertaV2ForMaskedLM           | 1  | 5052.843121  |
|      DebertaV2ForQuestionAnswering      | 1  |  3969.45806  |
|            XLNetLMHeadModel             | 1  | 3115.640729  |
|            MBartForCausalLM             | 1  | 3045.411525  |
|          BlenderbotForCausalLM          | 1  | 2630.561899  |
|             BartForCausalLM             | 1  | 2554.274487  |
|                 T5Small                 | 1  | 2498.964583  |
|       T5ForConditionalGeneration        | 1  | 2490.276333  |
|          AllenaiLongformerBase          | 1  | 2409.531974  |
|     PLBartForConditionalGeneration      | 1  | 2180.273879  |
|         MegatronBertForCausalLM         | 1  | 2038.305949  |
|    MegatronBertForQuestionAnswering     | 1  | 1874.726858  |
|      GPT2ForSequenceClassification      | 1  | 1327.222911  |
|            PLBartForCausalLM            | 1  | 1224.548807  |
|             XGLMForCausalLM             | 1  |  834.494703  |
|           DebertaForMaskedLM            | 1  |  786.295053  |
|           RobertaForCausalLM            | 1  |  777.854548  |
|     M2M100ForConditionalGeneration      | 1  |  716.713398  |
|           LayoutLMForMaskedLM           | 1  |  697.199825  |
|                CamemBert                | 1  |  694.529811  |
|            YituTechConvBert             | 1  |  685.397423  |
|             BertForMaskedLM             | 1  |  685.098671  |
|     PegasusForConditionalGeneration     | 1  |  608.487733  |
|            TrOCRForCausalLM             | 1  |  587.688411  |
|       DebertaForQuestionAnswering       | 1  |  559.636724  |
|    LayoutLMForSequenceClassification    | 1  |  544.794512  |
|        BertForQuestionAnswering         | 1  |  544.76337   |
|       RobertaForQuestionAnswering       | 1  |  543.520431  |
|               DistillGPT2               | 1  |  502.577963  |
|               GoogleFnet                | 1  |  474.601263  |
|       MT5ForConditionalGeneration       | 1  |  302.538529  |
|           PegasusForCausalLM            | 1  |  301.102374  |
| BlenderbotSmallForConditionalGeneration | 1  |  146.344606  |
|           ElectraForCausalLM            | 1  |  137.428577  |
|          DistilBertForMaskedLM          | 1  |  100.481107  |
|       ElectraForQuestionAnswering       | 1  |  95.037593   |
|       BlenderbotSmallForCausalLM        | 1  |  85.301327   |
|          MobileBertForMaskedLM          | 1  |  67.223891   |
|     DistilBertForQuestionAnswering      | 1  |  64.375724   |
|     MobileBertForQuestionAnswering      | 1  |  41.134882   |
|         Speech2Text2ForCausalLM         | 1  |  19.029331   |
+-----------------------------------------+----+--------------+

timm_models suite with float32 precision

Performance speedup

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          pnasnet5large          | 1  | 2.38326  |
|          inception_v3           | 1  | 2.21026  |
|           dm_nfnet_f0           | 1  | 2.194079 |
|       gluon_inception_v3        | 1  | 2.191075 |
|        adv_inception_v3         | 1  | 2.163989 |
|            nfnet_l0             | 1  | 2.159925 |
|            repvgg_a2            | 1  | 1.963008 |
|         mobilenetv2_100         | 1  | 1.937837 |
|            hrnet_w18            | 1  | 1.912222 |
|          spnasnet_100           | 1  | 1.880774 |
|           mnasnet_100           | 1  | 1.835532 |
|           fbnetc_100            | 1  | 1.817924 |
|            levit_128            | 1  | 1.800316 |
|          ghostnet_100           | 1  | 1.785819 |
|            lcnet_050            | 1  | 1.769484 |
|           selecsls42b           | 1  | 1.760683 |
|      mobilenetv3_large_100      | 1  | 1.747816 |
|           regnety_002           | 1  | 1.690719 |
|             dla102              | 1  | 1.680457 |
|        ese_vovnet19b_dw         | 1  | 1.665581 |
|          botnet26t_256          | 1  | 1.657728 |
|           rexnet_100            | 1  | 1.654606 |
|       tf_efficientnet_b0        | 1  | 1.654026 |
|       eca_botnext26ts_256       | 1  | 1.648908 |
|            fbnetv3_b            | 1  | 1.632637 |
|           resnest101e           | 1  | 1.61466  |
|          cspdarknet53           | 1  | 1.587778 |
|            tinynet_a            | 1  | 1.538293 |
|        eca_halonext26ts         | 1  | 1.52184  |
|         poolformer_m36          | 1  | 1.509878 |
|           res2next50            | 1  | 1.505273 |
|        res2net50_14w_8s         | 1  | 1.480221 |
|           volo_d1_224           | 1  | 1.467423 |
|        res2net101_26w_4s        | 1  | 1.455862 |
|           mobilevit_s           | 1  | 1.427631 |
|         visformer_small         | 1  | 1.390189 |
|           convit_base           | 1  | 1.377563 |
|          gmixer_24_224          | 1  | 1.349723 |
|     swsl_resnext101_32x16d      | 1  | 1.340148 |
|           tf_mixnet_l           | 1  | 1.325482 |
|        twins_pcpvt_base         | 1  | 1.296659 |
|            gernet_l             | 1  | 1.288951 |
|      beit_base_patch16_224      | 1  | 1.272811 |
|  swin_base_patch4_window7_224   | 1  | 1.262839 |
|          resmlp_12_224          | 1  | 1.247721 |
|        convmixer_768_32         | 1  | 1.239753 |
|          mixer_b16_224          | 1  | 1.207861 |
|      vit_base_patch16_224       | 1  | 1.197221 |
| deit_base_distilled_patch16_224 | 1  | 1.190128 |
|      xcit_large_24_p8_224       | 1  | 1.186704 |
|             dpn107              | 1  | 1.185392 |
|            mixnet_l             | 1  | 1.173266 |
|          jx_nest_base           | 1  | 1.162481 |
|            pit_b_224            | 1  | 1.150281 |
|          convnext_base          | 1  | 1.137985 |
|          gmlp_s16_224           | 1  | 1.133796 |
|        tnt_s_patch16_224        | 1  | 1.131366 |
|         crossvit_9_240          | 1  | 1.128546 |
|        sebotnet33ts_256         | 1  | 1.081858 |
|          cait_m36_384           | 1  | 0.979508 |
+---------------------------------+----+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 1  |     pass      |
|           dm_nfnet_f0           | 1  |     pass      |
|            mixnet_l             | 1  |     pass      |
|       eca_botnext26ts_256       | 1  |     pass      |
|         crossvit_9_240          | 1  |     pass      |
| deit_base_distilled_patch16_224 | 1  |     pass      |
|          convnext_base          | 1  |     pass      |
|           regnety_002           | 1  |     pass      |
|           convit_base           | 1  |     pass      |
|          gmlp_s16_224           | 1  |     pass      |
|          cait_m36_384           | 1  |     pass      |
|             dpn107              | 1  |     pass      |
|      vit_base_patch16_224       | 1  |     pass      |
|          cspdarknet53           | 1  |     pass      |
|        eca_halonext26ts         | 1  |     pass      |
|        ese_vovnet19b_dw         | 1  |     pass      |
|           fbnetc_100            | 1  |     pass      |
|            fbnetv3_b            | 1  |     pass      |
|            gernet_l             | 1  |     pass      |
|          botnet26t_256          | 1  |     pass      |
|       gluon_inception_v3        | 1  |     pass      |
|          gmixer_24_224          | 1  |     pass      |
|          jx_nest_base           | 1  |     pass      |
|        convmixer_768_32         | 1  |     pass      |
|        twins_pcpvt_base         | 1  |     pass      |
|         poolformer_m36          | 1  |     pass      |
|            lcnet_050            | 1  |     pass      |
|          mixer_b16_224          | 1  |     pass      |
|        tnt_s_patch16_224        | 1  |     pass      |
|           mnasnet_100           | 1  |     pass      |
|         mobilenetv2_100         | 1  |     pass      |
|      mobilenetv3_large_100      | 1  |     pass      |
|           mobilevit_s           | 1  |     pass      |
|            nfnet_l0             | 1  |     pass      |
|            pit_b_224            | 1  |     pass      |
|      beit_base_patch16_224      | 1  |     pass      |
|            repvgg_a2            | 1  |     pass      |
|  swin_base_patch4_window7_224   | 1  |     pass      |
|          inception_v3           | 1  |     pass      |
|           tf_mixnet_l           | 1  |     pass      |
|       tf_efficientnet_b0        | 1  |     pass      |
|            tinynet_a            | 1  |     pass      |
|          spnasnet_100           | 1  |     pass      |
|        sebotnet33ts_256         | 1  |     pass      |
|          resmlp_12_224          | 1  |     pass      |
|         coat_lite_mini          | 1  |  fail_to_run  |
|           resnest101e           | 1  | fail_accuracy |
|            levit_128            | 1  | fail_accuracy |
|          ghostnet_100           | 1  | fail_accuracy |
|          pnasnet5large          | 1  | fail_accuracy |
|        res2net101_26w_4s        | 1  | fail_accuracy |
|        res2net50_14w_8s         | 1  | fail_accuracy |
|             dla102              | 1  | fail_accuracy |
|     swsl_resnext101_32x16d      | 1  | fail_accuracy |
|           rexnet_100            | 1  | fail_accuracy |
|           selecsls42b           | 1  | fail_accuracy |
|           res2next50            | 1  | fail_accuracy |
|            hrnet_w18            | 1  | fail_accuracy |
|           volo_d1_224           | 1  | fail_accuracy |
|         visformer_small         | 1  | fail_accuracy |
|      xcit_large_24_p8_224       | 1  | fail_accuracy |
+---------------------------------+----+---------------+

Compilation latency (sec)

+---------------------------------+----+-----------+
|              name               | bs | inductor  |
+---------------------------------+----+-----------+
|          pnasnet5large          | 1  | 80.787573 |
|  swin_base_patch4_window7_224   | 1  | 80.299407 |
|           tf_mixnet_l           | 1  | 68.432707 |
|             dpn107              | 1  | 62.652177 |
|        twins_pcpvt_base         | 1  | 60.978817 |
|           mobilevit_s           | 1  | 58.355296 |
|          jx_nest_base           | 1  | 58.087115 |
|        res2net50_14w_8s         | 1  | 56.85012  |
|           rexnet_100            | 1  | 55.354391 |
|      xcit_large_24_p8_224       | 1  | 54.201428 |
|          cait_m36_384           | 1  | 53.475792 |
|          ghostnet_100           | 1  | 52.903179 |
|            mixnet_l             | 1  | 50.838875 |
|        sebotnet33ts_256         | 1  | 50.385564 |
|         poolformer_m36          | 1  | 50.189673 |
|            levit_128            | 1  | 49.122267 |
|           dm_nfnet_f0           | 1  | 48.346148 |
|        eca_halonext26ts         | 1  | 48.232973 |
|         crossvit_9_240          | 1  | 48.058312 |
|        tnt_s_patch16_224        | 1  | 47.022488 |
|           volo_d1_224           | 1  | 42.640768 |
|       eca_botnext26ts_256       | 1  | 41.094577 |
|            hrnet_w18            | 1  | 40.280803 |
|        res2net101_26w_4s        | 1  | 39.600658 |
|       tf_efficientnet_b0        | 1  | 39.339729 |
|            nfnet_l0             | 1  | 37.75466  |
|          convnext_base          | 1  | 37.703722 |
|           resnest101e           | 1  | 37.439741 |
|          inception_v3           | 1  | 35.888554 |
|       gluon_inception_v3        | 1  | 35.874847 |
|        adv_inception_v3         | 1  | 35.815695 |
|            tinynet_a            | 1  | 34.407138 |
|           res2next50            | 1  | 34.260929 |
|            pit_b_224            | 1  | 33.964773 |
|           convit_base           | 1  | 30.915312 |
|          botnet26t_256          | 1  | 30.14754  |
|          cspdarknet53           | 1  | 27.583846 |
|            fbnetv3_b            | 1  | 26.778388 |
|             dla102              | 1  | 26.766052 |
|          gmlp_s16_224           | 1  | 26.181624 |
|          gmixer_24_224          | 1  | 25.251187 |
|         visformer_small         | 1  | 24.468452 |
|        ese_vovnet19b_dw         | 1  | 23.467762 |
|      mobilenetv3_large_100      | 1  | 23.284573 |
|      vit_base_patch16_224       | 1  | 21.54108  |
| deit_base_distilled_patch16_224 | 1  | 20.453949 |
|      beit_base_patch16_224      | 1  | 20.219415 |
|          mixer_b16_224          | 1  | 19.28561  |
|           regnety_002           | 1  | 19.007273 |
|            repvgg_a2            | 1  | 17.315887 |
|          resmlp_12_224          | 1  | 17.100428 |
|        convmixer_768_32         | 1  | 17.092036 |
|           selecsls42b           | 1  | 16.510381 |
|            lcnet_050            | 1  | 14.088413 |
|     swsl_resnext101_32x16d      | 1  | 12.786179 |
|            gernet_l             | 1  | 10.911341 |
|           fbnetc_100            | 1  | 10.897657 |
|          spnasnet_100           | 1  | 10.843895 |
|         mobilenetv2_100         | 1  | 10.511701 |
|           mnasnet_100           | 1  | 10.328881 |
+---------------------------------+----+-----------+

Peak Memory Compression Ratio

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          cait_m36_384           | 1  | 0.945877 |
|          pnasnet5large          | 1  | 0.928816 |
|        convmixer_768_32         | 1  | 0.918431 |
|            nfnet_l0             | 1  | 0.900557 |
|      xcit_large_24_p8_224       | 1  | 0.890424 |
|        ese_vovnet19b_dw         | 1  | 0.88765  |
|         mobilenetv2_100         | 1  | 0.883335 |
|           mnasnet_100           | 1  | 0.87652  |
|            fbnetv3_b            | 1  | 0.876491 |
|       tf_efficientnet_b0        | 1  | 0.874393 |
|          spnasnet_100           | 1  | 0.874351 |
|           fbnetc_100            | 1  | 0.867748 |
|      mobilenetv3_large_100      | 1  | 0.867549 |
|            tinynet_a            | 1  |  0.8632  |
|       eca_botnext26ts_256       | 1  | 0.862757 |
|           rexnet_100            | 1  | 0.858443 |
|            lcnet_050            | 1  | 0.858215 |
|         poolformer_m36          | 1  | 0.85436  |
|        eca_halonext26ts         | 1  | 0.853819 |
|           dm_nfnet_f0           | 1  | 0.85375  |
|           tf_mixnet_l           | 1  | 0.849728 |
|           mobilevit_s           | 1  | 0.849173 |
|            mixnet_l             | 1  | 0.848645 |
|          ghostnet_100           | 1  | 0.846787 |
|           regnety_002           | 1  | 0.842721 |
|          botnet26t_256          | 1  | 0.840106 |
|          resmlp_12_224          | 1  | 0.828116 |
|         visformer_small         | 1  | 0.819979 |
|           res2next50            | 1  | 0.816783 |
|            levit_128            | 1  | 0.808691 |
|             dpn107              | 1  | 0.802665 |
|          convnext_base          | 1  | 0.801016 |
|        sebotnet33ts_256         | 1  | 0.798647 |
|            hrnet_w18            | 1  | 0.798311 |
|          cspdarknet53           | 1  | 0.794315 |
|        res2net50_14w_8s         | 1  | 0.793889 |
|          gmlp_s16_224           | 1  | 0.792238 |
|          gmixer_24_224          | 1  | 0.791797 |
|           volo_d1_224           | 1  | 0.785314 |
|        tnt_s_patch16_224        | 1  | 0.78198  |
|           convit_base           | 1  | 0.781171 |
|         crossvit_9_240          | 1  | 0.780221 |
|          mixer_b16_224          | 1  | 0.776678 |
|             dla102              | 1  | 0.775931 |
|        twins_pcpvt_base         | 1  | 0.772573 |
|           resnest101e           | 1  | 0.771929 |
|      beit_base_patch16_224      | 1  | 0.771707 |
|          jx_nest_base           | 1  |  0.7681  |
| deit_base_distilled_patch16_224 | 1  | 0.762769 |
|       gluon_inception_v3        | 1  | 0.762458 |
|          inception_v3           | 1  | 0.762037 |
|        adv_inception_v3         | 1  | 0.761912 |
|      vit_base_patch16_224       | 1  | 0.759769 |
|            pit_b_224            | 1  | 0.750239 |
|           selecsls42b           | 1  | 0.739746 |
|  swin_base_patch4_window7_224   | 1  | 0.738971 |
|        res2net101_26w_4s        | 1  | 0.738318 |
|            gernet_l             | 1  | 0.735659 |
|            repvgg_a2            | 1  | 0.691912 |
|     swsl_resnext101_32x16d      | 1  | 0.64072  |
+---------------------------------+----+----------+

Absolute latency (ms)

+---------------------------------+----+-------------+
|              name               | bs |  inductor   |
+---------------------------------+----+-------------+
|          cait_m36_384           | 1  | 3649.67282  |
|      xcit_large_24_p8_224       | 1  | 1541.409279 |
|     swsl_resnext101_32x16d      | 1  | 443.514396  |
|          pnasnet5large          | 1  | 370.567332  |
|          convnext_base          | 1  | 308.760887  |
|             dpn107              | 1  | 264.056469  |
|        convmixer_768_32         | 1  | 247.478925  |
|          jx_nest_base           | 1  |  236.06264  |
|      beit_base_patch16_224      | 1  | 199.397503  |
| deit_base_distilled_patch16_224 | 1  | 198.646907  |
|  swin_base_patch4_window7_224   | 1  | 198.554951  |
|           convit_base           | 1  | 196.547249  |
|      vit_base_patch16_224       | 1  | 196.486835  |
|            pit_b_224            | 1  | 171.964088  |
|           resnest101e           | 1  | 165.691113  |
|           dm_nfnet_f0           | 1  | 159.314015  |
|          mixer_b16_224          | 1  | 140.379452  |
|         poolformer_m36          | 1  | 139.255792  |
|        res2net101_26w_4s        | 1  | 114.200643  |
|        twins_pcpvt_base         | 1  | 108.524147  |
|        tnt_s_patch16_224        | 1  |  96.648139  |
|           volo_d1_224           | 1  |  96.198759  |
|            nfnet_l0             | 1  |  95.384013  |
|             dla102              | 1  |  91.542328  |
|            hrnet_w18            | 1  |  86.250383  |
|        sebotnet33ts_256         | 1  |  85.933157  |
|          cspdarknet53           | 1  |  83.798467  |
|        adv_inception_v3         | 1  |  74.051055  |
|          inception_v3           | 1  |  74.022911  |
|       gluon_inception_v3        | 1  |  73.764675  |
|          gmlp_s16_224           | 1  |  70.765807  |
|         visformer_small         | 1  |  67.489728  |
|        res2net50_14w_8s         | 1  |  65.591764  |
|          gmixer_24_224          | 1  |  63.578645  |
|            repvgg_a2            | 1  |  63.57077   |
|           res2next50            | 1  |  60.25833   |
|            gernet_l             | 1  |  57.248432  |
|          botnet26t_256          | 1  |  44.895153  |
|           selecsls42b           | 1  |  44.885264  |
|        eca_halonext26ts         | 1  |  44.573452  |
|           mobilevit_s           | 1  |  42.534105  |
|       eca_botnext26ts_256       | 1  |  40.777776  |
|          resmlp_12_224          | 1  |  36.413883  |
|         crossvit_9_240          | 1  |  34.626162  |
|            mixnet_l             | 1  |  31.850689  |
|        ese_vovnet19b_dw         | 1  |  31.674395  |
|           tf_mixnet_l           | 1  |  30.95598   |
|            fbnetv3_b            | 1  |  16.09031   |
|       tf_efficientnet_b0        | 1  |  13.883467  |
|           rexnet_100            | 1  |  13.850127  |
|            tinynet_a            | 1  |  12.663388  |
|           fbnetc_100            | 1  |  9.535886   |
|            levit_128            | 1  |  9.452213   |
|          ghostnet_100           | 1  |  9.210042   |
|          spnasnet_100           | 1  |   8.46664   |
|           mnasnet_100           | 1  |  7.850726   |
|         mobilenetv2_100         | 1  |  7.670022   |
|      mobilenetv3_large_100      | 1  |  7.443259   |
|           regnety_002           | 1  |  6.553789   |
|            lcnet_050            | 1  |  2.571441   |
+---------------------------------+----+-------------+

@WeizhuoZhang-intel
Contributor

[amp] Performance Dashboard for amp precision -- Single-Socket Multi-threads (2024-05-05 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8488C. Each experiment runs one iteration of the forward and backward passes for training, and of the forward pass only for inference. For accuracy, we check the numerical correctness of the forward pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint reduction ratio.
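As a rough illustration of "normalizing against the performance of native PyTorch", the speedup in the tables can be read as baseline latency divided by compiled latency. The sketch below is not the benchmark harness's code: the helper names `median_latency_ms` and `speedup`, the use of a median, and the repeat count are assumptions made for illustration.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(fn: Callable[[], object], repeats: int = 5) -> float:
    """Call fn `repeats` times and return the median wall-clock latency in ms.

    Hypothetical helper: the real harness may warm up, repeat, and
    aggregate differently.
    """
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(baseline_ms: float, compiled_ms: float) -> float:
    """Speedup of the compiled model, normalized against the baseline latency."""
    if compiled_ms <= 0:
        raise ValueError("compiled latency must be positive")
    return baseline_ms / compiled_ms
```

Under this reading, a value above 1 means the compiled model is faster than the baseline, and a value below 1 means it is slower. For example, `speedup(200.0, 100.0)` returns `2.0`.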

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

SW information:

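+-------------------+--------+------------------------------------------+
| SW                | Branch | Commit                                   |
+-------------------+--------+------------------------------------------+
| Pytorch           | main   | 6d30803d64953955df63da56833bf4eb52249aae |
| Torchbench        | main   | d6015d42                                 |
| torchaudio        | main   | 2.2.0a0+ea437b3                          |
| torchtext         | main   | 0.16.0a0+b0ebddc                         |
| torchvision       | main   | 0.19.0a0+06ad737                         |
| torchdata         | main   | 0.7.1a0+0790338                          |
| dynamo_benchmarks | main   | nightly                                  |
+-------------------+--------+------------------------------------------+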

HW information

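+------------------+----------------------------------------------------------------+
| Item             | Value                                                          |
+------------------+----------------------------------------------------------------+
| Manufacturer     | Amazon EC2                                                     |
| Product Name     | c7i.metal-24xl                                                 |
| CPU Model        | Intel(R) Xeon(R) Platinum 8488C CPU @ 2.40GHz                  |
| Installed Memory | 192GB (8x24GB DDR5 4800 MT/s [4800 MT/s])                      |
| OS               | Ubuntu 22.04.3 LTS                                             |
| Kernel           | 6.2.0-1017-aws                                                 |
| Microcode        | 0x2b0004d0                                                     |
| GCC              | gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0                      |
| GLIBC            | ldd (Ubuntu GLIBC 2.35-0ubuntu3.4) 2.35                        |
| Binutils         | GNU ld (GNU Binutils for Ubuntu) 2.38                          |
| Python           | Python 3.8.18                                                  |
| OpenSSL          | OpenSSL 3.2.0 23 Nov 2023 (Library: OpenSSL 3.2.0 23 Nov 2023) |
+------------------+----------------------------------------------------------------+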

Test command

export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
CORES=$(lscpu | grep Core | awk '{print $4}')
export OMP_NUM_THREADS=$CORES

python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--node_id 0" --devices=cpu --dtypes=float32 --inference --compilers=inductor --extra-args="--timeout 9000" 

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 89%, 70/79 | 100%, 46/46 | 98%, 59/60  |
+----------+------------+-------------+-------------+
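
For readers who want to reproduce the "89%, 70/79" style figures, here is a minimal sketch of one way to derive a pass rate from per-model status strings. The `PASSING` and `EXCLUDED` sets and the `pass_rate` helper are assumptions made for illustration, not the benchmark runner's actual code. They were chosen because they reproduce the torchbench figure from the status counts in the torchbench accuracy table further down in this comment (5 pass_due_to_skip, 65 pass, 3 model_fail_to_load, 4 fail_to_run, 5 fail_accuracy, 1 eager_fail_to_run).

```python
from collections import Counter

# Assumption: these statuses count as passing.
PASSING = {"pass", "pass_due_to_skip"}
# Assumption: models that never load or fail in eager mode are left out of the total.
EXCLUDED = {"model_fail_to_load", "eager_fail_to_run"}


def pass_rate(statuses):
    """Return (passed, total, label) for a list of per-model status strings."""
    counts = Counter(statuses)
    total = sum(n for status, n in counts.items() if status not in EXCLUDED)
    passed = sum(n for status, n in counts.items() if status in PASSING)
    pct = round(100 * passed / total) if total else 0
    return passed, total, f"{pct}%, {passed}/{total}"


# Status counts read off the torchbench accuracy table in this comment.
torchbench = (
    ["pass_due_to_skip"] * 5
    + ["pass"] * 65
    + ["model_fail_to_load"] * 3
    + ["fail_to_run"] * 4
    + ["fail_accuracy"] * 5
    + ["eager_fail_to_run"] * 1
)
print(pass_rate(torchbench)[2])
```

Under these assumptions the denominator is 79 rather than the 83 rows listed in the accuracy table, which matches the reported torchbench total.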

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.94x    |    2.05x    |    2.45x    |
+----------+------------+-------------+-------------+
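
As a rough illustration of how a per-suite geometric mean can be obtained from per-model speedups, the sketch below averages the logarithms of the speedups and exponentiates the result. The helper name `geomean_speedup` is hypothetical, and skipping non-positive entries (such as the 0.0 rows reported for models that did not run) is an assumption, not a statement about what the dashboard scripts do.

```python
import math


def geomean_speedup(speedups):
    """Geometric mean of the positive speedups; non-positive entries are skipped."""
    valid = [s for s in speedups if s > 0]
    if not valid:
        return float("nan")
    return math.exp(sum(math.log(s) for s in valid) / len(valid))


# Illustrative values only: 0.0 stands in for a model that did not run.
sample = [4.0, 1.0, 0.25, 0.0]
print(geomean_speedup(sample))
```

The geometric mean is the usual choice for ratios like these because a 2x speedup and a 0.5x slowdown cancel out, which an arithmetic mean would not reflect.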

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   20.45    |    23.32    |    33.02    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.92x    |    0.97x    |    0.98x    |
+----------+------------+-------------+-------------+
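
To make the "higher is better" reading concrete, here is a small sketch that assumes the ratio is defined as eager peak memory divided by compiled peak memory. That definition and the helper name `compression_ratio` are assumptions for illustration. Under this assumption, a value below 1.0 means the compiled model used more peak memory than eager mode.

```python
def compression_ratio(eager_peak_bytes, compiled_peak_bytes):
    """Eager peak memory divided by compiled peak memory (assumed definition)."""
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled peak memory must be positive")
    return eager_peak_bytes / compiled_peak_bytes


# Illustrative numbers only: eager peaks at 920 MB, the compiled model at 1000 MB.
ratio = compression_ratio(920, 1000)
print(f"{ratio:.2f}x")
```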

torchbench suite with amp precision

Performance speedup

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_equation_of_state     | 1048576 | 11.576724 |
|          timm_resnest           |   32    | 4.764342  |
|            resnet50             |   32    | 4.608626  |
|          squeezenet1_1          |   16    | 4.483705  |
|         resnext50_32x4d         |    8    | 4.247424  |
|          mobilenet_v2           |   16    | 4.082127  |
|         phlippe_resnet          |   128   | 4.036549  |
|            resnet18             |    8    | 3.931702  |
|              vgg16              |    4    | 3.925173  |
|           mnasnet1_0            |   32    | 3.837075  |
|            resnet152            |   32    | 3.615916  |
|             yolov3              |    8    | 3.385664  |
|       mobilenet_v3_large        |   32    | 3.345152  |
|           timm_vovnet           |   32    | 3.297458  |
|             alexnet             |   128   | 3.213263  |
|       shufflenet_v2_x1_0        |   64    | 2.783262  |
|          maml_omniglot          |    5    | 2.584624  |
|           timm_regnet           |   32    |  2.51227  |
|        phlippe_densenet         |   128   | 2.493329  |
|        soft_actor_critic        |   256   | 2.403879  |
|             hf_GPT2             |    1    | 2.397515  |
|          lennard_jones          |  1000   | 2.260501  |
|              llama              |   32    |  2.23287  |
|           timm_nfnet            |   128   | 2.209671  |
|        timm_efficientnet        |   64    |  2.1757   |
|           densenet121           |   64    | 2.171094  |
|             hf_Bert             |    1    | 2.115558  |
|          pytorch_unet           |    1    | 2.068422  |
|              dcgan              |   256   | 2.029361  |
|     functorch_maml_omniglot     |    1    | 2.028442  |
|          hf_DistilBert          |    1    | 2.024956  |
|          hf_Bert_large          |    1    | 1.986563  |
|               drq               |    1    | 1.941345  |
|           hf_T5_base            |    1    | 1.916479  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 1.914229  |
|         LearningToPaint         |   96    | 1.905128  |
|          BERT_pytorch           |    2    | 1.900527  |
|             hf_Bart             |    1    | 1.880913  |
|      doctr_reco_predictor       |    1    |  1.81104  |
|          fastNLP_Bert           |    1    | 1.747143  |
|              hf_T5              |    1    | 1.714274  |
|            moondream            |    1    | 1.684984  |
|       Background_Matting        |    1    | 1.626241  |
|           hf_T5_large           |    1    | 1.610611  |
|            hf_Albert            |    1    | 1.600384  |
|          hf_GPT2_large          |    1    | 1.578175  |
|        basic_gnn_edgecnn        |    1    | 1.506559  |
|     timm_vision_transformer     |   32    | 1.487965  |
|        hf_distil_whisper        |    1    | 1.422397  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 1.394632  |
|         pytorch_stargan         |   16    | 1.368486  |
|          hf_Longformer          |    1    | 1.348252  |
|          basic_gnn_gin          |    1    | 1.270381  |
|       speech_transformer        |    1    | 1.263245  |
|           hf_Reformer           |    1    | 1.245623  |
|    detectron2_fcos_r_50_fpn     |    1    | 1.229075  |
|         basic_gnn_sage          |    1    |  1.21835  |
| detectron2_fasterrcnn_r_101_c4  |    1    |  1.20278  |
|          basic_gnn_gcn          |    1    | 1.179485  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1.156877  |
|     nvidia_deeprecommender      |   256   | 1.138242  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1.105604  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1.063052  |
|  timm_vision_transformer_large  |   32    |  1.06185  |
|              dlrm               |  2048   | 1.056382  |
|             demucs              |    1    | 1.048215  |
|           hf_BigBird            |    1    | 1.036983  |
|   mobilenet_v2_quantized_qat    |   96    | 1.003567  |
|           tts_angular           |   64    | 0.988215  |
|     resnet50_quantized_qat      |   32    | 0.974057  |
|     pyhpc_isoneutral_mixing     | 1048576 | 0.859968  |
|      torch_multimodal_clip      |   32    | 0.790673  |
|              maml               |    1    | 0.749597  |
|         opacus_cifar10          |   64    | 0.469995  |
|      functorch_dp_cifar10       |   64    |  0.45181  |
|              moco               |    0    |    0.0    |
|        timm_efficientdet        |    0    |    0.0    |
|       doctr_det_predictor       |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+

Accuracy

+---------------------------------+---------+--------------------+
|              name               |   bs    |      inductor      |
+---------------------------------+---------+--------------------+
|       Background_Matting        |    1    |  pass_due_to_skip  |
|              maml               |    1    |  pass_due_to_skip  |
|  timm_vision_transformer_large  |    4    |  pass_due_to_skip  |
|          hf_GPT2_large          |    4    |  pass_due_to_skip  |
|           hf_T5_large           |    4    |  pass_due_to_skip  |
|        basic_gnn_edgecnn        |    1    |        pass        |
|           densenet121           |    4    |        pass        |
|             demucs              |    1    |        pass        |
|              dcgan              |    4    |        pass        |
|         basic_gnn_sage          |    1    |        pass        |
|          basic_gnn_gin          |    1    |        pass        |
|          basic_gnn_gcn          |    1    |        pass        |
|         LearningToPaint         |    4    |        pass        |
|             alexnet             |    4    |        pass        |
|              dlrm               |    4    |        pass        |
|          fastNLP_Bert           |    4    |        pass        |
|        hf_distil_whisper        |    4    |        pass        |
|              llama              |    4    |        pass        |
|            resnet50             |    4    |        pass        |
|           hf_T5_base            |    4    |        pass        |
|    detectron2_fcos_r_50_fpn     |    4    |        pass        |
|               drq               |    1    |        pass        |
|         resnext50_32x4d         |    4    |        pass        |
|      functorch_dp_cifar10       |    4    |        pass        |
|             hf_Bert             |    4    |        pass        |
|              hf_T5              |    4    |        pass        |
|           hf_Reformer           |    4    |        pass        |
|          hf_Longformer          |    4    |        pass        |
|             hf_GPT2             |    2    |        pass        |
|          hf_DistilBert          |    4    |        pass        |
|           hf_BigBird            |    4    |        pass        |
|          hf_Bert_large          |    4    |        pass        |
|     functorch_maml_omniglot     |    1    |        pass        |
|             yolov3              |    4    |        pass        |
|             hf_Bart             |    4    |        pass        |
|            hf_Albert            |    4    |        pass        |
|     resnet50_quantized_qat      |    4    |        pass        |
|          lennard_jones          |    4    |        pass        |
|       shufflenet_v2_x1_0        |    4    |        pass        |
|          maml_omniglot          |    5    |        pass        |
|         opacus_cifar10          |    4    |        pass        |
|  pytorch_CycleGAN_and_pix2pix   |    1    |        pass        |
| pyhpc_turbulent_kinetic_energy  | 1048576 |        pass        |
|     pyhpc_isoneutral_mixing     |    4    |        pass        |
|     pyhpc_equation_of_state     |    4    |        pass        |
|         phlippe_resnet          |    4    |        pass        |
|        phlippe_densenet         |    4    |        pass        |
|          BERT_pytorch           |    4    |        pass        |
|     nvidia_deeprecommender      |    4    |        pass        |
|        soft_actor_critic        |   256   |        pass        |
|         pytorch_stargan         |   16    |        pass        |
|   mobilenet_v2_quantized_qat    |    4    |        pass        |
|          mobilenet_v2           |    4    |        pass        |
|           mnasnet1_0            |    4    |        pass        |
|            moondream            |    4    |        pass        |
|       mobilenet_v3_large        |    4    |        pass        |
|     timm_vision_transformer     |    4    |        pass        |
|            resnet152            |    4    |        pass        |
|          squeezenet1_1          |    4    |        pass        |
|            resnet18             |    4    |        pass        |
|        timm_efficientnet        |    4    |        pass        |
|           timm_regnet           |    4    |        pass        |
|          timm_resnest           |    4    |        pass        |
|       speech_transformer        |    1    |        pass        |
|           timm_vovnet           |    4    |        pass        |
|           tts_angular           |    4    |        pass        |
|              vgg16              |    4    |        pass        |
|           timm_nfnet            |    4    |        pass        |
|          pytorch_unet           |    2    |        pass        |
|      torch_multimodal_clip      |    4    |        pass        |
|              moco               |    0    | model_fail_to_load |
|         DALLE2_pytorch          |    0    | model_fail_to_load |
|        timm_efficientdet        |    0    | model_fail_to_load |
|           Super_SloMo           |    4    |    fail_to_run     |
|         vision_maskrcnn         |    1    |    fail_to_run     |
| detectron2_fasterrcnn_r_101_fpn |    4    |    fail_to_run     |
| detectron2_fasterrcnn_r_50_fpn  |    4    |    fail_to_run     |
| detectron2_fasterrcnn_r_101_c4  |    4    |   fail_accuracy    |
| detectron2_fasterrcnn_r_101_dc5 |    4    |   fail_accuracy    |
|  detectron2_fasterrcnn_r_50_c4  |    4    |   fail_accuracy    |
| detectron2_fasterrcnn_r_50_dc5  |    4    |   fail_accuracy    |
|      doctr_reco_predictor       |    4    |   fail_accuracy    |
|       doctr_det_predictor       |    0    | eager_fail_to_run  |
+---------------------------------+---------+--------------------+

Compilation latency (sec)

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|           densenet121           |   64    | 76.273116 |
|           hf_BigBird            |    1    | 67.904981 |
|    detectron2_fcos_r_50_fpn     |    1    | 58.646024 |
|           hf_T5_large           |    1    | 51.639702 |
|              maml               |    1    | 48.722071 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 42.911971 |
|           timm_nfnet            |   128   | 41.644396 |
|          hf_Longformer          |    1    | 40.344601 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 40.245228 |
|            moondream            |    1    | 38.78429  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 37.794033 |
|           hf_Reformer           |    1    | 37.078923 |
|      torch_multimodal_clip      |   32    | 37.054972 |
|          hf_GPT2_large          |    1    | 35.912779 |
|  timm_vision_transformer_large  |   32    | 35.695378 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 35.051187 |
|       speech_transformer        |    1    | 34.216882 |
|        phlippe_densenet         |   128   | 34.178822 |
|           hf_T5_base            |    1    | 34.06111  |
|             demucs              |    1    | 29.911709 |
|        timm_efficientnet        |   64    | 28.111696 |
|        hf_distil_whisper        |    1    | 27.815134 |
|             yolov3              |    8    | 25.206209 |
|              hf_T5              |    1    | 24.29671  |
|         opacus_cifar10          |   64    | 22.665683 |
|     pyhpc_isoneutral_mixing     | 1048576 | 21.976239 |
|      functorch_dp_cifar10       |   64    | 21.558959 |
|          timm_resnest           |   32    | 21.260488 |
|          hf_Bert_large          |    1    | 21.124902 |
|              llama              |   32    | 20.785759 |
|          BERT_pytorch           |    2    | 20.11973  |
|             hf_GPT2             |    1    | 19.976654 |
|       shufflenet_v2_x1_0        |   64    | 19.914661 |
|       Background_Matting        |    1    | 19.20081  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 19.028382 |
|       mobilenet_v3_large        |   32    | 18.475112 |
|     timm_vision_transformer     |   32    | 18.377759 |
|           timm_regnet           |   32    | 17.535691 |
|           timm_vovnet           |   32    | 17.412655 |
|          fastNLP_Bert           |    1    | 17.282513 |
|          pytorch_unet           |    1    | 17.025836 |
|             hf_Bart             |    1    | 16.982549 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 15.499249 |
|            hf_Albert            |    1    | 14.970149 |
|             hf_Bert             |    1    | 14.69258  |
|          squeezenet1_1          |   16    | 13.482797 |
|         pytorch_stargan         |   16    | 13.374562 |
|          hf_DistilBert          |    1    | 12.483495 |
|            resnet152            |   32    | 12.136101 |
|      doctr_reco_predictor       |    1    | 11.216162 |
|          basic_gnn_gcn          |    1    | 10.775039 |
|              vgg16              |    4    | 10.428032 |
|         basic_gnn_sage          |    1    | 10.052773 |
|               drq               |    1    | 8.943086  |
|             alexnet             |   128   | 8.926685  |
|            resnet50             |   32    |  8.78027  |
|         resnext50_32x4d         |    8    | 8.715822  |
|     nvidia_deeprecommender      |   256   |  7.97669  |
|              dlrm               |  2048   | 7.953215  |
|     functorch_maml_omniglot     |    1    | 7.949695  |
|            resnet18             |    8    | 7.637055  |
|          maml_omniglot          |    5    | 7.608347  |
|          mobilenet_v2           |   16    |  7.60528  |
|          basic_gnn_gin          |    1    | 7.497693  |
|           mnasnet1_0            |   32    | 7.331612  |
|     pyhpc_equation_of_state     | 1048576 | 7.025399  |
|        basic_gnn_edgecnn        |    1    | 6.731152  |
|         LearningToPaint         |   96    | 6.721809  |
|        soft_actor_critic        |   256   | 6.440044  |
|         phlippe_resnet          |   128   | 6.185603  |
|          lennard_jones          |  1000   | 5.374627  |
|              dcgan              |   256   | 4.800045  |
|           tts_angular           |   64    | 4.675723  |
|   mobilenet_v2_quantized_qat    |   96    | 0.202013  |
|     resnet50_quantized_qat      |   32    | 0.180673  |
|        timm_efficientdet        |    0    |    0.0    |
|       doctr_det_predictor       |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+

Peak Memory Compression Ratio

+---------------------------------+---------+----------+
|              name               |   bs    | inductor |
+---------------------------------+---------+----------+
|              dlrm               |  2048   | 0.988068 |
|           timm_nfnet            |   128   | 0.986952 |
|           hf_T5_base            |    1    | 0.982697 |
|             demucs              |    1    | 0.982475 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 0.978335 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 0.97737  |
|        timm_efficientnet        |   64    | 0.976659 |
|           timm_regnet           |   32    | 0.975012 |
|             yolov3              |    8    | 0.971834 |
|              llama              |   32    | 0.970224 |
|    detectron2_fcos_r_50_fpn     |    1    | 0.970082 |
|          pytorch_unet           |    1    | 0.968712 |
|           densenet121           |   64    | 0.968304 |
|         LearningToPaint         |   96    | 0.968216 |
|            resnet152            |   32    | 0.964083 |
|             alexnet             |   128   | 0.963211 |
|           timm_vovnet           |   32    | 0.962365 |
|            resnet50             |   32    | 0.961949 |
|        basic_gnn_edgecnn        |    1    | 0.958886 |
|           hf_BigBird            |    1    | 0.958795 |
|     resnet50_quantized_qat      |   32    | 0.956915 |
|   mobilenet_v2_quantized_qat    |   96    | 0.955676 |
|     timm_vision_transformer     |   32    | 0.955083 |
|       Background_Matting        |    1    | 0.953343 |
|          timm_resnest           |   32    | 0.95277  |
|           mnasnet1_0            |   32    | 0.946242 |
|          basic_gnn_gcn          |    1    | 0.94363  |
|      torch_multimodal_clip      |   32    | 0.94251  |
|          mobilenet_v2           |   16    | 0.940123 |
|  timm_vision_transformer_large  |   32    | 0.938467 |
|       shufflenet_v2_x1_0        |   64    | 0.937576 |
|         resnext50_32x4d         |    8    | 0.936488 |
|     pyhpc_equation_of_state     | 1048576 | 0.935536 |
|       mobilenet_v3_large        |   32    | 0.93507  |
|          basic_gnn_gin          |    1    | 0.932359 |
|      doctr_reco_predictor       |    1    | 0.93104  |
|         basic_gnn_sage          |    1    | 0.931003 |
|       speech_transformer        |    1    | 0.928576 |
|             hf_Bert             |    1    | 0.928537 |
|         pytorch_stargan         |   16    | 0.927682 |
|     nvidia_deeprecommender      |   256   | 0.92207  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 0.920684 |
|          fastNLP_Bert           |    1    | 0.920182 |
|        phlippe_densenet         |   128   | 0.920121 |
|            moondream            |    1    | 0.917501 |
|          BERT_pytorch           |    2    | 0.916435 |
|        hf_distil_whisper        |    1    | 0.913744 |
|            hf_Albert            |    1    | 0.910993 |
|              hf_T5              |    1    | 0.909378 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 0.906768 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 0.90561  |
|              dcgan              |   256   | 0.905101 |
|             hf_GPT2             |    1    | 0.904008 |
|               drq               |    1    | 0.901408 |
|          squeezenet1_1          |   16    | 0.900169 |
|          hf_DistilBert          |    1    | 0.896975 |
|          hf_Longformer          |    1    | 0.896254 |
|         opacus_cifar10          |   64    | 0.89127  |
|          hf_GPT2_large          |    1    | 0.89125  |
|          hf_Bert_large          |    1    | 0.889963 |
|             hf_Bart             |    1    | 0.889359 |
|        soft_actor_critic        |   256   | 0.885675 |
|           tts_angular           |   64    | 0.88427  |
|              vgg16              |    4    | 0.882075 |
|         phlippe_resnet          |   128   | 0.878116 |
|          lennard_jones          |  1000   | 0.866102 |
|     functorch_maml_omniglot     |    1    | 0.865913 |
|          maml_omniglot          |    5    | 0.864734 |
|      functorch_dp_cifar10       |   64    | 0.863462 |
|           hf_T5_large           |    1    | 0.847386 |
|            resnet18             |    8    | 0.827442 |
|           hf_Reformer           |    1    | 0.820867 |
|              maml               |    1    | 0.782127 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.756744 |
|     pyhpc_isoneutral_mixing     | 1048576 | 0.568589 |
|        timm_efficientdet        |    0    |   0.0    |
|       doctr_det_predictor       |    0    |   0.0    |
|         DALLE2_pytorch          |    0    |   0.0    |
|              moco               |    0    |   0.0    |
+---------------------------------+---------+----------+

Absolute latency (ms)

+---------------------------------+---------+------------+
|              name               |   bs    |  inductor  |
+---------------------------------+---------+------------+
|  timm_vision_transformer_large  |   32    | 865.06952  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 790.80172  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 780.925685 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 695.540222 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 669.97729  |
|           hf_T5_base            |    1    | 428.387525 |
|          hf_GPT2_large          |    1    |  149.5138  |
|           hf_T5_large           |    1    | 126.137914 |
|           timm_nfnet            |   128   | 119.390418 |
|            moondream            |    1    | 114.122232 |
|        hf_distil_whisper        |    1    | 92.600931  |
|    detectron2_fcos_r_50_fpn     |    1    | 82.757948  |
|           hf_BigBird            |    1    | 80.501432  |
|      torch_multimodal_clip      |   32    | 79.469548  |
|       Background_Matting        |    1    | 66.591125  |
|          pytorch_unet           |    1    | 57.045238  |
|           densenet121           |   64    | 52.713227  |
|             demucs              |    1    | 50.357367  |
|              maml               |    1    |  42.9884   |
|           timm_regnet           |   32    | 41.134142  |
|          hf_Longformer          |    1    |  38.50499  |
|          hf_Bert_large          |    1    | 34.423519  |
|   mobilenet_v2_quantized_qat    |   96    | 31.267087  |
|     pyhpc_isoneutral_mixing     | 1048576 |  29.63517  |
|            resnet152            |   32    | 29.000964  |
| pyhpc_turbulent_kinetic_energy  | 1048576 |  28.1544   |
|             yolov3              |    8    | 28.001492  |
|        timm_efficientnet        |   64    |  27.36108  |
|       speech_transformer        |    1    | 23.321511  |
|           hf_Reformer           |    1    | 21.264351  |
|     nvidia_deeprecommender      |   256   |  21.06555  |
|     timm_vision_transformer     |   32    | 20.412461  |
|              hf_T5              |    1    | 18.293909  |
|             hf_Bart             |    1    | 18.248481  |
|          fastNLP_Bert           |    1    | 17.223142  |
|         pytorch_stargan         |   16    | 17.162118  |
|           timm_vovnet           |   32    | 16.474183  |
|            hf_Albert            |    1    | 14.581165  |
|         opacus_cifar10          |   64    | 14.043928  |
|             hf_Bert             |    1    |  14.02422  |
|      functorch_dp_cifar10       |   64    | 14.010162  |
|          BERT_pytorch           |    2    | 12.170417  |
|            resnet50             |   32    | 11.122274  |
|             hf_GPT2             |    1    | 11.077114  |
|     resnet50_quantized_qat      |   32    |  10.16223  |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  9.584763  |
|          timm_resnest           |   32    |  8.829537  |
|              llama              |   32    |  8.817637  |
|         LearningToPaint         |   96    |  8.149741  |
|          hf_DistilBert          |    1    |  8.106487  |
|       shufflenet_v2_x1_0        |   64    |  8.069742  |
|           tts_angular           |   64    |  7.84651   |
|          basic_gnn_gcn          |    1    |  7.133902  |
|        basic_gnn_edgecnn        |    1    |  6.745735  |
|       mobilenet_v3_large        |   32    |   6.6029   |
|        phlippe_densenet         |   128   |  6.568699  |
|              vgg16              |    4    |  5.987481  |
|         resnext50_32x4d         |    8    |  5.908952  |
|             alexnet             |   128   |  5.873635  |
|           mnasnet1_0            |   32    |  5.185941  |
|              dlrm               |  2048   |  4.148682  |
|         basic_gnn_sage          |    1    |  4.044699  |
|          mobilenet_v2           |   16    |  3.713154  |
|          basic_gnn_gin          |    1    |  3.677306  |
|      doctr_reco_predictor       |    1    |  2.857985  |
|          squeezenet1_1          |   16    |  2.576863  |
|              dcgan              |   256   |  2.526371  |
|            resnet18             |    8    |  2.184933  |
|     pyhpc_equation_of_state     | 1048576 |  1.556493  |
|         phlippe_resnet          |   128   |  1.271128  |
|               drq               |    1    |  0.75756   |
|     functorch_maml_omniglot     |    1    |  0.36311   |
|          maml_omniglot          |    5    |  0.253804  |
|        soft_actor_critic        |   256   |  0.217096  |
|          lennard_jones          |  1000   |  0.13285   |
|        timm_efficientdet        |    0    |    0.0     |
|       doctr_det_predictor       |    0    |    0.0     |
|              moco               |    0    |    0.0     |
|         DALLE2_pytorch          |    0    |    0.0     |
+---------------------------------+---------+------------+
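
The speedup and absolute latency tables can be combined to estimate the eager baseline latency. The sketch below assumes that the absolute latency column is the inductor latency and that speedup is eager latency divided by inductor latency; both are assumptions, and the helper name `eager_latency_ms` is hypothetical. The two input rows are copied from the torchbench tables above.

```python
def eager_latency_ms(speedup, inductor_ms):
    """Estimate eager latency, assuming speedup = eager_ms / inductor_ms."""
    return speedup * inductor_ms


# (speedup, inductor latency in ms) copied from the torchbench tables above.
rows = {
    "resnet50": (4.608626, 11.122274),
    "hf_Bert": (2.115558, 14.02422),
}
for name, (speedup, inductor_ms) in rows.items():
    print(f"{name}: ~{eager_latency_ms(speedup, inductor_ms):.1f} ms eager (estimated)")
```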

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+-----+-----------+
|                  name                   | bs  | inductor  |
+-----------------------------------------+-----+-----------+
|            XLNetLMHeadModel             |  8  | 11.731461 |
|       ElectraForQuestionAnswering       | 64  | 4.259904  |
|           ElectraForCausalLM            | 32  | 3.645639  |
|          MobileBertForMaskedLM          | 128 | 3.305681  |
|     MobileBertForQuestionAnswering      | 128 | 3.189206  |
|    LayoutLMForSequenceClassification    | 16  | 3.097378  |
|       RobertaForQuestionAnswering       | 16  | 3.075242  |
|           LayoutLMForMaskedLM           | 16  | 3.028338  |
|        BertForQuestionAnswering         | 16  | 3.015915  |
|                CamemBert                | 16  | 2.927482  |
|             BertForMaskedLM             | 16  | 2.870373  |
|           RobertaForCausalLM            | 16  | 2.824676  |
|    MegatronBertForQuestionAnswering     |  8  | 2.666447  |
|            YituTechConvBert             | 16  | 2.483012  |
|         MegatronBertForCausalLM         |  4  | 2.198613  |
|               DistillGPT2               | 16  | 2.071661  |
|             OPTForCausalLM              |  2  |  1.94862  |
|      DebertaV2ForQuestionAnswering      |  1  | 1.936367  |
|       MT5ForConditionalGeneration       | 16  | 1.915347  |
|         Speech2Text2ForCausalLM         | 256 | 1.893171  |
|       BlenderbotSmallForCausalLM        | 64  | 1.865045  |
|                 T5Small                 |  4  | 1.859605  |
|       T5ForConditionalGeneration        |  4  | 1.838504  |
|          DistilBertForMaskedLM          | 128 | 1.838068  |
|             XGLMForCausalLM             |  8  | 1.772896  |
|           DebertaForMaskedLM            |  8  | 1.759878  |
|            PLBartForCausalLM            |  8  | 1.709524  |
|      GPT2ForSequenceClassification      |  4  |  1.70584  |
|          BlenderbotForCausalLM          |  4  |  1.67751  |
|            MBartForCausalLM             |  4  | 1.662302  |
| BlenderbotSmallForConditionalGeneration | 64  | 1.652319  |
|     DistilBertForQuestionAnswering      | 256 | 1.640947  |
|     PLBartForConditionalGeneration      |  4  | 1.610209  |
|            TrOCRForCausalLM             | 32  | 1.605923  |
|       DebertaForQuestionAnswering       | 16  | 1.565622  |
|          DebertaV2ForMaskedLM           |  2  | 1.562397  |
|           PegasusForCausalLM            | 32  | 1.513355  |
|     M2M100ForConditionalGeneration      | 16  | 1.499088  |
|     PegasusForConditionalGeneration     | 32  |  1.49525  |
|      MBartForConditionalGeneration      |  2  | 1.487402  |
|             BartForCausalLM             |  4  | 1.432271  |
|      BartForConditionalGeneration       |  2  | 1.430216  |
|               GoogleFnet                | 16  | 1.389237  |
|       AlbertForQuestionAnswering        |  4  | 1.311264  |
|            AlbertForMaskedLM            |  4  | 1.308082  |
|          AllenaiLongformerBase          |  4  | 1.055923  |
+-----------------------------------------+-----+-----------+

Accuracy

+-----------------------------------------+----+------------------+
|                  name                   | bs |     inductor     |
+-----------------------------------------+----+------------------+
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip |
|                CamemBert                | 1  |       pass       |
|       AlbertForQuestionAnswering        | 1  |       pass       |
|           DebertaForMaskedLM            | 1  |       pass       |
|          AllenaiLongformerBase          | 1  |       pass       |
|             BartForCausalLM             | 1  |       pass       |
|      BartForConditionalGeneration       | 1  |       pass       |
|             BertForMaskedLM             | 1  |       pass       |
|        BertForQuestionAnswering         | 1  |       pass       |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |
|       DebertaForQuestionAnswering       | 1  |       pass       |
|           LayoutLMForMaskedLM           | 1  |       pass       |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |
|          DistilBertForMaskedLM          | 1  |       pass       |
|     DistilBertForQuestionAnswering      | 1  |       pass       |
|               DistillGPT2               | 1  |       pass       |
|           ElectraForCausalLM            | 1  |       pass       |
|       ElectraForQuestionAnswering       | 1  |       pass       |
|      GPT2ForSequenceClassification      | 1  |       pass       |
|               GoogleFnet                | 1  |       pass       |
|    LayoutLMForSequenceClassification    | 1  |       pass       |
|            MBartForCausalLM             | 1  |       pass       |
|            XLNetLMHeadModel             | 1  |       pass       |
|             XGLMForCausalLM             | 1  |       pass       |
|            AlbertForMaskedLM            | 1  |       pass       |
|      MBartForConditionalGeneration      | 1  |       pass       |
|       MT5ForConditionalGeneration       | 1  |       pass       |
|         MegatronBertForCausalLM         | 1  |       pass       |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |
|          MobileBertForMaskedLM          | 1  |       pass       |
|     MobileBertForQuestionAnswering      | 1  |       pass       |
|             OPTForCausalLM              | 1  |       pass       |
|            PLBartForCausalLM            | 1  |       pass       |
|     PLBartForConditionalGeneration      | 1  |       pass       |
|           PegasusForCausalLM            | 1  |       pass       |
|     M2M100ForConditionalGeneration      | 1  |       pass       |
|     PegasusForConditionalGeneration     | 1  |       pass       |
|           RobertaForCausalLM            | 1  |       pass       |
|       RobertaForQuestionAnswering       | 1  |       pass       |
|         Speech2Text2ForCausalLM         | 1  |       pass       |
|       T5ForConditionalGeneration        | 1  |       pass       |
|                 T5Small                 | 1  |       pass       |
|            TrOCRForCausalLM             | 1  |       pass       |
|            YituTechConvBert             | 1  |       pass       |
+-----------------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------------+-----+-----------+
|                  name                   | bs  | inductor  |
+-----------------------------------------+-----+-----------+
|          AllenaiLongformerBase          |  4  | 71.907724 |
|     PegasusForConditionalGeneration     | 32  | 38.868429 |
|     M2M100ForConditionalGeneration      | 16  | 38.084938 |
|      MBartForConditionalGeneration      |  2  | 36.66196  |
|          BlenderbotForCausalLM          |  4  | 36.595269 |
|          MobileBertForMaskedLM          | 128 | 36.450882 |
|     MobileBertForQuestionAnswering      | 128 | 35.421825 |
|       MT5ForConditionalGeneration       | 16  | 35.335711 |
|             XGLMForCausalLM             |  8  | 32.009945 |
|                 T5Small                 |  4  | 30.608282 |
|       T5ForConditionalGeneration        |  4  | 30.477745 |
|          DebertaV2ForMaskedLM           |  2  | 27.570767 |
|             OPTForCausalLM              |  2  | 26.800465 |
|      DebertaV2ForQuestionAnswering      |  1  | 26.348871 |
| BlenderbotSmallForConditionalGeneration | 64  | 25.76737  |
|         MegatronBertForCausalLM         |  4  | 24.669193 |
|      BartForConditionalGeneration       |  2  | 24.330731 |
|           PegasusForCausalLM            | 32  | 24.253982 |
|     PLBartForConditionalGeneration      |  4  | 24.102971 |
|    MegatronBertForQuestionAnswering     |  8  | 23.707595 |
|            MBartForCausalLM             |  4  | 22.994141 |
|      GPT2ForSequenceClassification      |  4  | 21.935138 |
|            YituTechConvBert             | 16  | 21.75016  |
|               DistillGPT2               | 16  | 20.274924 |
|            TrOCRForCausalLM             | 32  | 19.049237 |
|       DebertaForQuestionAnswering       | 16  | 18.774606 |
|           DebertaForMaskedLM            |  8  | 18.507807 |
|          DistilBertForMaskedLM          | 128 | 16.289904 |
|     DistilBertForQuestionAnswering      | 256 | 16.133248 |
|       BlenderbotSmallForCausalLM        | 64  | 15.882032 |
|         Speech2Text2ForCausalLM         | 256 | 15.597605 |
|            PLBartForCausalLM            |  8  | 15.407656 |
|               GoogleFnet                | 16  | 15.277267 |
|           RobertaForCausalLM            | 16  | 15.230578 |
|            AlbertForMaskedLM            |  4  | 14.83762  |
|       AlbertForQuestionAnswering        |  4  | 14.665031 |
|       RobertaForQuestionAnswering       | 16  | 14.577618 |
|                CamemBert                | 16  | 14.548686 |
|            XLNetLMHeadModel             |  8  | 14.48266  |
|             BartForCausalLM             |  4  | 14.445873 |
|    LayoutLMForSequenceClassification    | 16  | 14.331202 |
|           ElectraForCausalLM            | 32  | 13.798814 |
|           LayoutLMForMaskedLM           | 16  | 13.763145 |
|             BertForMaskedLM             | 16  | 13.580599 |
|        BertForQuestionAnswering         | 16  | 13.440227 |
|       ElectraForQuestionAnswering       | 64  | 13.172734 |
+-----------------------------------------+-----+-----------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            AlbertForMaskedLM            |  4  | 0.992814 |
|       AlbertForQuestionAnswering        |  4  | 0.992348 |
|            PLBartForCausalLM            |  8  | 0.992041 |
|               GoogleFnet                | 16  | 0.991072 |
|          DistilBertForMaskedLM          | 128 | 0.99067  |
|           ElectraForCausalLM            | 32  | 0.990456 |
|               DistillGPT2               | 16  | 0.990452 |
|             BertForMaskedLM             | 16  | 0.989063 |
|       ElectraForQuestionAnswering       | 64  | 0.988992 |
|            YituTechConvBert             | 16  | 0.988625 |
|             OPTForCausalLM              |  2  | 0.988037 |
|                CamemBert                | 16  | 0.987952 |
|       BlenderbotSmallForCausalLM        | 64  | 0.987499 |
| BlenderbotSmallForConditionalGeneration | 64  | 0.987316 |
|         Speech2Text2ForCausalLM         | 256 | 0.986576 |
|           LayoutLMForMaskedLM           | 16  | 0.986129 |
|           RobertaForCausalLM            | 16  | 0.985857 |
|        BertForQuestionAnswering         | 16  | 0.985619 |
|       MT5ForConditionalGeneration       | 16  | 0.98458  |
|       DebertaForQuestionAnswering       | 16  | 0.982879 |
|     DistilBertForQuestionAnswering      | 256 | 0.982682 |
|          MobileBertForMaskedLM          | 128 | 0.982568 |
|            MBartForCausalLM             |  4  | 0.981501 |
|       T5ForConditionalGeneration        |  4  | 0.981475 |
|            TrOCRForCausalLM             | 32  | 0.981298 |
|    LayoutLMForSequenceClassification    | 16  | 0.980968 |
|       RobertaForQuestionAnswering       | 16  | 0.980588 |
|          AllenaiLongformerBase          |  4  | 0.98012  |
|            XLNetLMHeadModel             |  8  | 0.980029 |
|                 T5Small                 |  4  | 0.973196 |
|     MobileBertForQuestionAnswering      | 128 | 0.973092 |
|      GPT2ForSequenceClassification      |  4  | 0.972589 |
|           PegasusForCausalLM            | 32  | 0.969202 |
|           DebertaForMaskedLM            |  8  | 0.968556 |
|             BartForCausalLM             |  4  | 0.966973 |
|     PLBartForConditionalGeneration      |  4  | 0.964322 |
|     M2M100ForConditionalGeneration      | 16  | 0.955174 |
|         MegatronBertForCausalLM         |  4  | 0.948958 |
|    MegatronBertForQuestionAnswering     |  8  | 0.942798 |
|             XGLMForCausalLM             |  8  | 0.939147 |
|          BlenderbotForCausalLM          |  4  | 0.936138 |
|      DebertaV2ForQuestionAnswering      |  1  | 0.93256  |
|      MBartForConditionalGeneration      |  2  | 0.92772  |
|      BartForConditionalGeneration       |  2  | 0.92567  |
|     PegasusForConditionalGeneration     | 32  | 0.919104 |
|          DebertaV2ForMaskedLM           |  2  | 0.888664 |
+-----------------------------------------+-----+----------+

Absolute latency (ms)

+-----------------------------------------+-----+------------+
|                  name                   | bs  |  inductor  |
+-----------------------------------------+-----+------------+
|          AllenaiLongformerBase          |  4  | 947.738024 |
|            AlbertForMaskedLM            |  4  | 539.854808 |
|       AlbertForQuestionAnswering        |  4  | 530.524399 |
|            XLNetLMHeadModel             |  8  | 395.727774 |
|               GoogleFnet                | 16  | 266.704853 |
|             OPTForCausalLM              |  2  | 205.979326 |
|            TrOCRForCausalLM             | 32  | 173.958574 |
|      MBartForConditionalGeneration      |  2  | 173.042085 |
|            MBartForCausalLM             |  4  | 172.87073  |
|     PegasusForConditionalGeneration     | 32  | 167.168037 |
|                 T5Small                 |  4  | 152.462714 |
|       T5ForConditionalGeneration        |  4  | 151.986563 |
|     DistilBertForQuestionAnswering      | 256 | 144.585555 |
|            PLBartForCausalLM            |  8  | 135.371546 |
|      BartForConditionalGeneration       |  2  | 132.305359 |
|    MegatronBertForQuestionAnswering     |  8  | 124.467084 |
|     PLBartForConditionalGeneration      |  4  | 122.915767 |
|          BlenderbotForCausalLM          |  4  | 121.71309  |
|          DebertaV2ForMaskedLM           |  2  | 120.668046 |
|     M2M100ForConditionalGeneration      | 16  | 118.268229 |
|            YituTechConvBert             | 16  | 118.128083 |
|       DebertaForQuestionAnswering       | 16  | 117.483857 |
|           RobertaForCausalLM            | 16  | 109.91929  |
| BlenderbotSmallForConditionalGeneration | 64  | 106.963648 |
|          DistilBertForMaskedLM          | 128 | 104.209272 |
|      GPT2ForSequenceClassification      |  4  | 99.625597  |
|             BartForCausalLM             |  4  | 96.403913  |
|             BertForMaskedLM             | 16  | 93.590347  |
|                CamemBert                | 16  | 93.035356  |
|           LayoutLMForMaskedLM           | 16  | 91.086983  |
|          MobileBertForMaskedLM          | 128 | 86.568668  |
|         MegatronBertForCausalLM         |  4  | 83.703887  |
|               DistillGPT2               | 16  |  82.77152  |
|           PegasusForCausalLM            | 32  | 80.352395  |
|             XGLMForCausalLM             |  8  | 75.701557  |
|        BertForQuestionAnswering         | 16  |  75.06484  |
|           DebertaForMaskedLM            |  8  | 73.877835  |
|    LayoutLMForSequenceClassification    | 16  |  72.85233  |
|       RobertaForQuestionAnswering       | 16  | 72.373946  |
|       MT5ForConditionalGeneration       | 16  | 65.583783  |
|       ElectraForQuestionAnswering       | 64  | 60.153373  |
|         Speech2Text2ForCausalLM         | 256 | 58.895364  |
|      DebertaV2ForQuestionAnswering      |  1  | 56.283371  |
|           ElectraForCausalLM            | 32  | 55.951632  |
|       BlenderbotSmallForCausalLM        | 64  | 53.581415  |
|     MobileBertForQuestionAnswering      | 128 | 50.770336  |
+-----------------------------------------+-----+------------+

timm_models suite with amp precision

Performance speedup

+---------------------------------+------+----------+
|              name               |  bs  | inductor |
+---------------------------------+------+----------+
|           resnest101e           |  64  | 4.883251 |
|           fbnetc_100            | 512  | 4.632514 |
|           mnasnet_100           | 512  | 4.616444 |
|            lcnet_050            | 256  | 4.447456 |
|           regnety_002           | 1024 | 4.258113 |
|      mobilenetv3_large_100      | 512  |  4.2573  |
|         mobilenetv2_100         | 128  | 4.227495 |
|          cspdarknet53           |  64  | 4.198949 |
|        ese_vovnet19b_dw         | 256  | 4.112927 |
|          spnasnet_100           | 128  | 3.959384 |
|          botnet26t_256          | 128  | 3.93405  |
|           res2next50            | 128  | 3.921843 |
|            hrnet_w18            | 128  | 3.811831 |
|        res2net101_26w_4s        | 128  | 3.76801  |
|             dla102              | 128  | 3.756934 |
|          pnasnet5large          |  16  | 3.747112 |
|          inception_v3           | 128  | 3.738538 |
|        res2net50_14w_8s         | 128  | 3.724072 |
|       gluon_inception_v3        | 256  | 3.702429 |
|        adv_inception_v3         | 128  | 3.637835 |
|            gernet_l             | 128  | 3.623423 |
|            fbnetv3_b            | 256  | 3.620281 |
|     swsl_resnext101_32x16d      |  32  | 3.576697 |
|           rexnet_100            | 256  | 3.560384 |
|            nfnet_l0             | 128  | 3.196143 |
|       eca_botnext26ts_256       | 128  | 3.112879 |
|        eca_halonext26ts         | 128  | 2.763726 |
|           volo_d1_224           |  64  | 2.61139  |
|           selecsls42b           | 128  | 2.595702 |
|            repvgg_a2            | 128  | 2.552209 |
|           dm_nfnet_f0           | 128  | 2.508455 |
|            tinynet_a            | 128  | 2.496535 |
|       tf_efficientnet_b0        | 128  | 2.409386 |
|          ghostnet_100           | 512  | 2.374343 |
|         visformer_small         | 128  | 2.356535 |
|           convit_base           |  64  | 2.112764 |
|         poolformer_m36          |  64  | 2.06956  |
|        convmixer_768_32         |  32  | 2.050205 |
|             dpn107              |  64  | 1.883982 |
|          convnext_base          |  64  | 1.875573 |
|            levit_128            | 1024 | 1.851719 |
|           tf_mixnet_l           | 128  | 1.823226 |
|      xcit_large_24_p8_224       |  16  | 1.802163 |
|            mixnet_l             | 128  | 1.771799 |
|          gmlp_s16_224           | 128  | 1.643845 |
|        twins_pcpvt_base         | 128  | 1.623703 |
|  swin_base_patch4_window7_224   |  64  | 1.606819 |
|        sebotnet33ts_256         |  64  | 1.514286 |
| deit_base_distilled_patch16_224 |  64  | 1.468273 |
|      beit_base_patch16_224      |  64  | 1.442093 |
|           mobilevit_s           |  64  | 1.43308  |
|          mixer_b16_224          | 128  | 1.395517 |
|          gmixer_24_224          | 128  | 1.346142 |
|      vit_base_patch16_224       |  64  | 1.341246 |
|            pit_b_224            |  64  | 1.334035 |
|         crossvit_9_240          | 256  | 1.330054 |
|        tnt_s_patch16_224        | 128  | 1.201187 |
|          jx_nest_base           |  32  | 1.097549 |
|          resmlp_12_224          | 128  | 1.073324 |
|          cait_m36_384           |  4   | 0.825905 |
+---------------------------------+------+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 8  |     pass      |
|           mnasnet_100           | 8  |     pass      |
|          mixer_b16_224          | 8  |     pass      |
|          botnet26t_256          | 8  |     pass      |
|          cait_m36_384           | 8  |     pass      |
|           convit_base           | 8  |     pass      |
|        convmixer_768_32         | 8  |     pass      |
|          convnext_base          | 8  |     pass      |
|         crossvit_9_240          | 8  |     pass      |
|          cspdarknet53           | 8  |     pass      |
| deit_base_distilled_patch16_224 | 8  |     pass      |
|             dla102              | 8  |     pass      |
|           dm_nfnet_f0           | 8  |     pass      |
|             dpn107              | 8  |     pass      |
|       eca_botnext26ts_256       | 8  |     pass      |
|        eca_halonext26ts         | 8  |     pass      |
|        ese_vovnet19b_dw         | 8  |     pass      |
|           fbnetc_100            | 8  |     pass      |
|            fbnetv3_b            | 8  |     pass      |
|            gernet_l             | 8  |     pass      |
|          ghostnet_100           | 8  |     pass      |
|       gluon_inception_v3        | 8  |     pass      |
|          gmixer_24_224          | 8  |     pass      |
|          gmlp_s16_224           | 8  |     pass      |
|            hrnet_w18            | 8  |     pass      |
|          inception_v3           | 8  |     pass      |
|          jx_nest_base           | 8  |     pass      |
|            levit_128            | 8  |     pass      |
|      xcit_large_24_p8_224       | 8  |     pass      |
|      beit_base_patch16_224      | 8  |     pass      |
|            mixnet_l             | 8  |     pass      |
|         mobilenetv2_100         | 8  |     pass      |
|           volo_d1_224           | 8  |     pass      |
|      mobilenetv3_large_100      | 8  |     pass      |
|           mobilevit_s           | 8  |     pass      |
|            nfnet_l0             | 8  |     pass      |
|            pit_b_224            | 8  |     pass      |
|          pnasnet5large          | 8  |     pass      |
|         poolformer_m36          | 8  |     pass      |
|           regnety_002           | 8  |     pass      |
|            repvgg_a2            | 8  |     pass      |
|        res2net101_26w_4s        | 8  |     pass      |
|        res2net50_14w_8s         | 8  |     pass      |
|           res2next50            | 8  |     pass      |
|          resmlp_12_224          | 8  |     pass      |
|           resnest101e           | 8  |     pass      |
|           rexnet_100            | 8  |     pass      |
|        sebotnet33ts_256         | 8  |     pass      |
|           selecsls42b           | 8  |     pass      |
|          spnasnet_100           | 8  |     pass      |
|  swin_base_patch4_window7_224   | 8  |     pass      |
|     swsl_resnext101_32x16d      | 8  |     pass      |
|       tf_efficientnet_b0        | 8  |     pass      |
|           tf_mixnet_l           | 8  |     pass      |
|            tinynet_a            | 8  |     pass      |
|        tnt_s_patch16_224        | 8  |     pass      |
|        twins_pcpvt_base         | 8  |     pass      |
|         visformer_small         | 8  |     pass      |
|      vit_base_patch16_224       | 8  |     pass      |
|         coat_lite_mini          | 8  |  fail_to_run  |
|            lcnet_050            | 8  | fail_accuracy |
+---------------------------------+----+---------------+

Compilation latency (sec)

+---------------------------------+------+-----------+
|              name               |  bs  | inductor  |
+---------------------------------+------+-----------+
|          pnasnet5large          |  16  | 74.430479 |
|  swin_base_patch4_window7_224   |  64  | 70.588305 |
|           tf_mixnet_l           | 128  | 61.468459 |
|             dpn107              |  64  | 61.174648 |
|           rexnet_100            | 256  | 53.947869 |
|           mobilevit_s           |  64  | 53.017682 |
|        eca_halonext26ts         | 128  | 51.794476 |
|        sebotnet33ts_256         |  64  | 50.573972 |
|          jx_nest_base           |  32  | 49.835955 |
|            levit_128            | 1024 | 49.753272 |
|         crossvit_9_240          | 256  | 49.123563 |
|        res2net50_14w_8s         | 128  | 48.967352 |
|          ghostnet_100           | 512  | 48.509307 |
|        tnt_s_patch16_224        | 128  | 47.791633 |
|          cait_m36_384           |  4   | 45.865199 |
|      xcit_large_24_p8_224       |  16  | 45.215845 |
|            mixnet_l             | 128  | 44.45681  |
|         poolformer_m36          |  64  | 43.572693 |
|        twins_pcpvt_base         | 128  | 43.301396 |
|           dm_nfnet_f0           | 128  | 42.886943 |
|         visformer_small         | 128  | 41.953368 |
|       eca_botnext26ts_256       | 128  | 40.737062 |
|           volo_d1_224           |  64  | 39.380056 |
|           resnest101e           |  64  | 36.191577 |
|       tf_efficientnet_b0        | 128  | 35.399511 |
|           convit_base           |  64  | 34.095105 |
|        res2net101_26w_4s        | 128  | 33.702782 |
|            hrnet_w18            | 128  | 33.467308 |
|          convnext_base          |  64  | 33.392463 |
|            nfnet_l0             | 128  | 32.758562 |
|       gluon_inception_v3        | 256  | 32.042656 |
|          botnet26t_256          | 128  | 30.694098 |
|        adv_inception_v3         | 128  | 30.693973 |
|          inception_v3           | 128  | 30.662742 |
|            tinynet_a            | 128  | 29.287359 |
|           res2next50            | 128  | 29.221241 |
|            pit_b_224            |  64  | 25.584367 |
|            fbnetv3_b            | 256  | 24.492655 |
|             dla102              | 128  | 23.459272 |
|          cspdarknet53           |  64  | 22.575639 |
|        ese_vovnet19b_dw         | 256  | 21.314193 |
| deit_base_distilled_patch16_224 |  64  | 21.142034 |
|      mobilenetv3_large_100      | 512  | 20.581346 |
|          gmixer_24_224          | 128  | 20.234236 |
|      vit_base_patch16_224       |  64  | 19.28201  |
|          gmlp_s16_224           | 128  | 18.609528 |
|           regnety_002           | 1024 | 15.86641  |
|          resmlp_12_224          | 128  | 15.171368 |
|      beit_base_patch16_224      |  64  | 15.104785 |
|          mixer_b16_224          | 128  | 13.912548 |
|        convmixer_768_32         |  32  | 13.584186 |
|            repvgg_a2            | 128  | 13.324626 |
|           selecsls42b           | 128  | 12.848073 |
|            lcnet_050            | 256  | 11.004022 |
|     swsl_resnext101_32x16d      |  32  | 10.699564 |
|         mobilenetv2_100         | 128  |   8.277   |
|           fbnetc_100            | 512  | 8.262722  |
|          spnasnet_100           | 128  | 8.203045  |
|            gernet_l             | 128  | 8.124881  |
|           mnasnet_100           | 512  |  7.77001  |
+---------------------------------+------+-----------+

Peak Memory Compression Ratio

+---------------------------------+------+----------+
|              name               |  bs  | inductor |
+---------------------------------+------+----------+
|        ese_vovnet19b_dw         | 256  | 0.995948 |
|           fbnetc_100            | 512  | 0.993854 |
|       gluon_inception_v3        | 256  | 0.993338 |
|            fbnetv3_b            | 256  | 0.993275 |
|           rexnet_100            | 256  | 0.993235 |
|      mobilenetv3_large_100      | 512  | 0.993159 |
|           regnety_002           | 1024 | 0.993139 |
|           mnasnet_100           | 512  | 0.99273  |
|            levit_128            | 1024 | 0.992085 |
|          ghostnet_100           | 512  | 0.991367 |
|           dm_nfnet_f0           | 128  | 0.99063  |
|       eca_botnext26ts_256       | 128  | 0.990067 |
|      xcit_large_24_p8_224       |  16  | 0.989599 |
|          mixer_b16_224          | 128  | 0.989568 |
|           convit_base           |  64  | 0.989554 |
|        twins_pcpvt_base         | 128  | 0.989097 |
|          gmlp_s16_224           | 128  | 0.989037 |
|        eca_halonext26ts         | 128  | 0.988961 |
|           res2next50            | 128  | 0.988784 |
|             dla102              | 128  | 0.988611 |
|           resnest101e           |  64  | 0.988599 |
|        res2net101_26w_4s        | 128  | 0.988509 |
|          botnet26t_256          | 128  | 0.988468 |
|           tf_mixnet_l           | 128  | 0.987887 |
|          gmixer_24_224          | 128  | 0.987761 |
|         visformer_small         | 128  | 0.987639 |
|            nfnet_l0             | 128  | 0.987525 |
|         mobilenetv2_100         | 128  | 0.987312 |
|            mixnet_l             | 128  | 0.987293 |
|       tf_efficientnet_b0        | 128  | 0.987181 |
|        adv_inception_v3         | 128  | 0.987171 |
|        res2net50_14w_8s         | 128  | 0.986991 |
|        tnt_s_patch16_224        | 128  | 0.986966 |
|          inception_v3           | 128  | 0.986699 |
|            gernet_l             | 128  | 0.985965 |
|          convnext_base          |  64  | 0.985879 |
|          pnasnet5large          |  16  | 0.985872 |
|        convmixer_768_32         |  32  | 0.985756 |
|          cspdarknet53           |  64  | 0.985564 |
|      beit_base_patch16_224      |  64  | 0.985345 |
|            hrnet_w18            | 128  | 0.985333 |
|      vit_base_patch16_224       |  64  | 0.985201 |
|            tinynet_a            | 128  | 0.984847 |
|         crossvit_9_240          | 256  | 0.98475  |
| deit_base_distilled_patch16_224 |  64  |  0.9842  |
|             dpn107              |  64  | 0.984164 |
|           selecsls42b           | 128  | 0.98374  |
|  swin_base_patch4_window7_224   |  64  | 0.983646 |
|          resmlp_12_224          | 128  | 0.982888 |
|           mobilevit_s           |  64  | 0.982792 |
|            pit_b_224            |  64  | 0.982696 |
|         poolformer_m36          |  64  | 0.981935 |
|          spnasnet_100           | 128  | 0.981273 |
|           volo_d1_224           |  64  | 0.978917 |
|          cait_m36_384           |  4   | 0.976957 |
|            lcnet_050            | 256  | 0.975947 |
|          jx_nest_base           |  32  | 0.974672 |
|            repvgg_a2            | 128  | 0.969074 |
|     swsl_resnext101_32x16d      |  32  | 0.957498 |
|        sebotnet33ts_256         |  64  | 0.83594  |
+---------------------------------+------+----------+

Absolute latency (ms)

+---------------------------------+------+------------+
|              name               |  bs  |  inductor  |
+---------------------------------+------+------------+
|          cait_m36_384           |  4   | 469.892351 |
|      xcit_large_24_p8_224       |  16  | 270.138714 |
|        tnt_s_patch16_224        | 128  | 243.472365 |
|             dpn107              |  64  | 224.546585 |
|           dm_nfnet_f0           | 128  | 213.470542 |
|            levit_128            | 1024 | 213.426553 |
|       gluon_inception_v3        | 256  | 181.556139 |
|        ese_vovnet19b_dw         | 256  | 173.332669 |
|          convnext_base          |  64  | 169.868981 |
|          mixer_b16_224          | 128  | 165.747147 |
|           convit_base           |  64  | 159.306167 |
|            nfnet_l0             | 128  | 156.890289 |
|         poolformer_m36          |  64  | 153.820172 |
|        twins_pcpvt_base         | 128  | 145.661825 |
|  swin_base_patch4_window7_224   |  64  | 140.996165 |
|         crossvit_9_240          | 256  | 136.574569 |
|           tf_mixnet_l           | 128  | 130.732868 |
|            mixnet_l             | 128  |  128.4712  |
|          ghostnet_100           | 512  | 125.317732 |
|           volo_d1_224           |  64  | 125.076322 |
|          jx_nest_base           |  32  | 123.322597 |
|           mobilevit_s           |  64  | 115.296688 |
|        sebotnet33ts_256         |  64  | 114.205967 |
|          gmixer_24_224          | 128  | 111.62468  |
|      vit_base_patch16_224       |  64  | 110.887535 |
|      beit_base_patch16_224      |  64  | 107.369584 |
|          resmlp_12_224          | 128  | 106.248074 |
|        eca_halonext26ts         | 128  | 105.780875 |
|            pit_b_224            |  64  | 103.576115 |
|        res2net101_26w_4s        | 128  | 102.687175 |
|          gmlp_s16_224           | 128  | 102.403483 |
| deit_base_distilled_patch16_224 |  64  | 102.027213 |
|          pnasnet5large          |  16  | 101.605014 |
|        convmixer_768_32         |  32  | 100.96556  |
|           fbnetc_100            | 512  | 98.726923  |
|       eca_botnext26ts_256       | 128  | 91.818681  |
|        adv_inception_v3         | 128  | 86.915036  |
|          inception_v3           | 128  | 86.903941  |
|           regnety_002           | 1024 | 86.441212  |
|         visformer_small         | 128  | 85.353579  |
|     swsl_resnext101_32x16d      |  32  | 85.292685  |
|             dla102              | 128  | 84.944324  |
|            fbnetv3_b            | 256  | 83.086997  |
|           mnasnet_100           | 512  | 82.460066  |
|           rexnet_100            | 256  |  81.96601  |
|            hrnet_w18            | 128  | 81.398485  |
|        res2net50_14w_8s         | 128  | 81.064961  |
|      mobilenetv3_large_100      | 512  | 80.789673  |
|           res2next50            | 128  | 80.003922  |
|           resnest101e           |  64  | 72.946749  |
|          botnet26t_256          | 128  | 72.242132  |
|       tf_efficientnet_b0        | 128  | 55.439402  |
|          cspdarknet53           |  64  | 48.321753  |
|            repvgg_a2            | 128  | 46.819861  |
|            tinynet_a            | 128  | 36.270293  |
|            gernet_l             | 128  | 36.228444  |
|           selecsls42b           | 128  |  32.75625  |
|         mobilenetv2_100         | 128  |  22.12019  |
|          spnasnet_100           | 128  | 19.092037  |
|            lcnet_050            | 256  |  7.779692  |
+---------------------------------+------+------------+

@WeizhuoZhang-intel
Contributor

[amp] Performance Dashboard for amp precision -- Single-Socket Multi-threads (2024-05-05 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. The experiments run on an Intel(R) Xeon(R) Platinum 8488C. Each experiment runs one iteration: a forward and backward pass for training, or a forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We also report the mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.
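
As a rough illustration of the metrics described above, the sketch below derives a per-model speedup and a peak memory compression ratio from eager and compiled measurements. The function names and sample numbers are hypothetical. Defining the compression ratio as eager peak memory divided by compiled peak memory is an assumption that matches the "higher is better" reading; it is not copied from the benchmark harness.

```python
def speedup(eager_latency_ms: float, compiled_latency_ms: float) -> float:
    """Speedup normalized against native (eager) PyTorch; > 1 means the compiled model is faster."""
    if compiled_latency_ms <= 0:
        raise ValueError("compiled latency must be positive")
    return eager_latency_ms / compiled_latency_ms


def compression_ratio(eager_peak_mb: float, compiled_peak_mb: float) -> float:
    """Assumed definition: eager peak memory / compiled peak memory (higher is better)."""
    if compiled_peak_mb <= 0:
        raise ValueError("compiled peak memory must be positive")
    return eager_peak_mb / compiled_peak_mb


if __name__ == "__main__":
    # Hypothetical measurements for a single model.
    print(speedup(200.0, 100.0))             # 2.0
    print(compression_ratio(950.0, 1000.0))  # 0.95
```

Under this definition, a ratio slightly below 1.0 means the compiled run used somewhat more peak memory than eager mode.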

SW information:

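+-------------------+--------+------------------------------------------+
|        SW         | Branch |                  Commit                  |
+-------------------+--------+------------------------------------------+
|      Pytorch      |  main  | 6d30803d64953955df63da56833bf4eb52249aae |
|    Torchbench     |  main  |                 d6015d42                 |
|    torchaudio     |  main  |             2.2.0a0+ea437b3              |
|     torchtext     |  main  |             0.16.0a0+b0ebddc             |
|    torchvision    |  main  |             0.19.0a0+06ad737             |
|     torchdata     |  main  |             0.7.1a0+0790338              |
| dynamo_benchmarks |  main  |                 nightly                  |
+-------------------+--------+------------------------------------------+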

HW information

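+------------------+----------------------------------------------------------------+
|       Item       |                             Value                              |
+------------------+----------------------------------------------------------------+
|   Manufacturer   |                           Amazon EC2                           |
|   Product Name   |                         c7i.metal-24xl                         |
|    CPU Model     |         Intel(R) Xeon(R) Platinum 8488C CPU @ 2.40GHz          |
| Installed Memory |           192GB (8x24GB DDR5 4800 MT/s [4800 MT/s])            |
|        OS        |                       Ubuntu 22.04.3 LTS                       |
|      Kernel      |                         6.2.0-1017-aws                         |
|    Microcode     |                           0x2b0004d0                           |
|       GCC        |           gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0            |
|      GLIBC       |            ldd (Ubuntu GLIBC 2.35-0ubuntu3.4) 2.35             |
|     Binutils     |             GNU ld (GNU Binutils for Ubuntu) 2.38              |
|      Python      |                         Python 3.8.18                          |
|     OpenSSL      | OpenSSL 3.2.0 23 Nov 2023 (Library: OpenSSL 3.2.0 23 Nov 2023) |
+------------------+----------------------------------------------------------------+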

Test command

export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
CORES=$(lscpu | grep Core | awk '{print $4}')
export OMP_NUM_THREADS=$CORES

python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--node_id 0" --devices=cpu --dtypes=float32 --inference --compilers=inductor --extra-args="--timeout 9000" 

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 88%, 69/78 | 100%, 46/46 | 98%, 59/60  |
+----------+------------+-------------+-------------+
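
Each pass rate cell pairs a percentage with the raw counts. A minimal sketch of that formatting follows, assuming (the report does not say so) that the percentage is rounded to the nearest integer. The helper name `format_passrate` is hypothetical and is not part of the benchmark scripts.

```python
def format_passrate(passed: int, total: int) -> str:
    """Format a pass rate as '<percent>%, <passed>/<total>'.

    Assumption: the percentage is rounded to the nearest integer.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    percent = int(passed / total * 100 + 0.5)  # round half up
    return f"{percent}%, {passed}/{total}"


# Counts taken from the pass rate table above.
counts = {
    "torchbench": (69, 78),
    "huggingface": (46, 46),
    "timm_models": (59, 60),
}
for suite, (passed, total) in counts.items():
    print(suite, format_passrate(passed, total))
```

With the counts from the table, this reproduces the three cells shown above.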

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.85x    |    2.07x    |    2.37x    |
+----------+------------+-------------+-------------+
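
The geometric mean of n per-model speedups is the n-th root of their product. The sketch below computes it through logarithms and drops entries reported as 0.0, on the assumption that these mark models that failed to load or run and are excluded, as stated above. The function name `geomean_speedup` and the choice of sample rows are illustrative only, so the printed value is not expected to match the suite-level number.

```python
import math


def geomean_speedup(speedups):
    """Geometric mean of the positive entries in `speedups`.

    Assumption: a value of 0.0 marks a model that did not run and is skipped.
    """
    valid = [s for s in speedups if s > 0.0]
    if not valid:
        raise ValueError("no valid speedups to aggregate")
    # exp(mean(log(x))) equals the n-th root of the product, without overflow.
    return math.exp(sum(math.log(s) for s in valid) / len(valid))


# A few rows from the torchbench speedup table, plus one failed model (0.0).
sample = [4.361233, 3.948862, 0.997831, 0.0]
print(f"{geomean_speedup(sample):.2f}x")
```

The geometric mean is used rather than the arithmetic mean because speedups are ratios: a 2x gain and a 0.5x slowdown then average to 1x.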

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   21.83    |    33.47    |    41.33    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.92x    |    0.97x    |    0.98x    |
+----------+------------+-------------+-------------+
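
One way to read these ratios, assuming the ratio is defined as native PyTorch peak memory divided by inductor peak memory (an assumption consistent with "higher is better", not something this report states): a value below 1.0 means the compiled model peaks higher than eager. The helper names and the megabyte figures in the sketch are hypothetical.

```python
def compression_ratio(eager_peak_mb: float, compiled_peak_mb: float) -> float:
    """Assumed definition: eager peak memory divided by compiled peak memory."""
    if compiled_peak_mb <= 0:
        raise ValueError("compiled_peak_mb must be positive")
    return eager_peak_mb / compiled_peak_mb


def extra_memory_percent(ratio: float) -> float:
    """Extra peak memory of the compiled model relative to eager, in percent."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return (1.0 / ratio - 1.0) * 100.0


# Hypothetical measurement: eager peaks at 920 MB, compiled at 1000 MB.
print(compression_ratio(920.0, 1000.0))

# Suite-level ratios from the table above.
for suite, ratio in {"torchbench": 0.92, "huggingface": 0.97, "timm_models": 0.98}.items():
    print(f"{suite}: about {extra_memory_percent(ratio):.1f}% more peak memory")
```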

torchbench suite with amp precision


Performance speedup

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_equation_of_state     | 1048576 | 11.583372 |
|          timm_resnest           |   32    | 4.361233  |
|            resnet50             |   32    | 3.948862  |
|          squeezenet1_1          |   16    | 3.882835  |
|          mobilenet_v2           |   16    | 3.372006  |
|              vgg16              |    4    | 3.366905  |
|         phlippe_resnet          |   128   | 3.326193  |
|           mnasnet1_0            |   32    | 3.262091  |
|            resnet152            |   32    | 3.122944  |
|         resnext50_32x4d         |    8    | 3.114623  |
|           timm_vovnet           |   32    | 3.088671  |
|             alexnet             |   128   | 3.087701  |
|             yolov3              |    8    | 3.045542  |
|            resnet18             |    8    | 3.022662  |
|       mobilenet_v3_large        |   32    | 2.856747  |
|       shufflenet_v2_x1_0        |   64    | 2.527337  |
|             hf_GPT2             |    1    | 2.385541  |
|        soft_actor_critic        |   256   | 2.314837  |
|           timm_regnet           |   32    | 2.306777  |
|        phlippe_densenet         |   128   | 2.281088  |
|        timm_efficientnet        |   64    | 2.233593  |
|           densenet121           |   64    | 2.131078  |
|          lennard_jones          |  1000   | 2.124986  |
|           timm_nfnet            |   128   | 2.109797  |
|             hf_Bert             |    1    | 2.084506  |
|          pytorch_unet           |    1    | 2.079293  |
|          hf_DistilBert          |    1    | 2.032485  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 2.009912  |
|     functorch_maml_omniglot     |    1    |  2.00064  |
|               drq               |    1    | 1.974632  |
|              dcgan              |   256   | 1.970783  |
|          hf_Bert_large          |    1    | 1.964507  |
|           hf_T5_base            |    1    | 1.919217  |
|             hf_Bart             |    1    | 1.901409  |
|          BERT_pytorch           |    2    | 1.885912  |
|         LearningToPaint         |   96    |  1.85457  |
|      doctr_reco_predictor       |    1    |  1.79794  |
|          fastNLP_Bert           |    1    | 1.760527  |
|              hf_T5              |    1    | 1.738905  |
|            moondream            |    1    | 1.659057  |
|       Background_Matting        |    1    | 1.635236  |
|            hf_Albert            |    1    | 1.588267  |
|           hf_T5_large           |    1    | 1.586425  |
|          hf_GPT2_large          |    1    | 1.563343  |
|        basic_gnn_edgecnn        |    1    | 1.526828  |
|     timm_vision_transformer     |   32    | 1.484306  |
|        hf_distil_whisper        |    1    |  1.42738  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 1.400864  |
|          hf_Longformer          |    1    |  1.35521  |
|          maml_omniglot          |    5    | 1.354287  |
|          basic_gnn_gcn          |    1    | 1.328639  |
|         pytorch_stargan         |   16    | 1.319267  |
|          basic_gnn_gin          |    1    | 1.313866  |
|       speech_transformer        |    1    | 1.258736  |
|         basic_gnn_sage          |    1    | 1.232662  |
|    detectron2_fcos_r_50_fpn     |    1    |  1.23015  |
|           hf_Reformer           |    1    |  1.21274  |
| detectron2_fasterrcnn_r_101_c4  |    1    |  1.20492  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1.158331  |
|     nvidia_deeprecommender      |   256   | 1.132359  |
| detectron2_fasterrcnn_r_101_dc5 |    1    |  1.1068   |
|      torch_multimodal_clip      |   32    | 1.083205  |
|  timm_vision_transformer_large  |   32    | 1.077413  |
| detectron2_fasterrcnn_r_50_dc5  |    1    |   1.063   |
|           hf_BigBird            |    1    |  1.05471  |
|             demucs              |    1    | 1.048042  |
|              dlrm               |  2048   | 1.004158  |
|   mobilenet_v2_quantized_qat    |   96    | 0.997831  |
|           tts_angular           |   64    | 0.978837  |
|     resnet50_quantized_qat      |   32    | 0.974717  |
|     pyhpc_isoneutral_mixing     | 1048576 | 0.862534  |
|              maml               |    1    | 0.752371  |
|         opacus_cifar10          |   64    | 0.473516  |
|      functorch_dp_cifar10       |   64    | 0.435264  |
|        timm_efficientdet        |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
|       doctr_det_predictor       |    0    |    0.0    |
+---------------------------------+---------+-----------+

Accuracy

+--------------------------------+---------+--------------------+
|              name              |   bs    |      inductor      |
+--------------------------------+---------+--------------------+
|       Background_Matting       |    1    |  pass_due_to_skip  |
| timm_vision_transformer_large  |    4    |  pass_due_to_skip  |
|          hf_T5_large           |    4    |  pass_due_to_skip  |
|         hf_GPT2_large          |    4    |  pass_due_to_skip  |
|              maml              |    1    |  pass_due_to_skip  |
|         lennard_jones          |    4    |        pass        |
|         maml_omniglot          |    5    |        pass        |
|             llama              |    4    |        pass        |
|      functorch_dp_cifar10      |    4    |        pass        |
|        LearningToPaint         |    4    |        pass        |
|            alexnet             |    4    |        pass        |
|       basic_gnn_edgecnn        |    1    |        pass        |
|         basic_gnn_gcn          |    1    |        pass        |
|         basic_gnn_gin          |    1    |        pass        |
|         basic_gnn_sage         |    1    |        pass        |
|             dcgan              |    4    |        pass        |
|             demucs             |    1    |        pass        |
|          densenet121           |    4    |        pass        |
|    detectron2_fcos_r_50_fpn    |    4    |        pass        |
|              dlrm              |    4    |        pass        |
|          fastNLP_Bert          |    4    |        pass        |
|             yolov3             |    4    |        pass        |
|              drq               |    1    |        pass        |
|     resnet50_quantized_qat     |    4    |        pass        |
|         hf_DistilBert          |    4    |        pass        |
|       hf_distil_whisper        |    4    |        pass        |
|           hf_T5_base           |    4    |        pass        |
|             hf_T5              |    4    |        pass        |
|          hf_Reformer           |    4    |        pass        |
|         hf_Longformer          |    4    |        pass        |
|           hf_Albert            |    4    |        pass        |
|            hf_GPT2             |    2    |        pass        |
|           hf_BigBird           |    4    |        pass        |
|         hf_Bert_large          |    4    |        pass        |
|            hf_Bert             |    4    |        pass        |
|            hf_Bart             |    4    |        pass        |
|    functorch_maml_omniglot     |    1    |        pass        |
|            resnet18            |    4    |        pass        |
|        resnext50_32x4d         |    4    |        pass        |
|          pytorch_unet          |    2    |        pass        |
|        pytorch_stargan         |   16    |        pass        |
|  pytorch_CycleGAN_and_pix2pix  |    1    |        pass        |
| pyhpc_turbulent_kinetic_energy | 1048576 |        pass        |
|    pyhpc_isoneutral_mixing     |    4    |        pass        |
|    pyhpc_equation_of_state     |    4    |        pass        |
|         phlippe_resnet         |    4    |        pass        |
|         opacus_cifar10         |    4    |        pass        |
|        phlippe_densenet        |    4    |        pass        |
|           resnet152            |    4    |        pass        |
|     nvidia_deeprecommender     |    4    |        pass        |
|           moondream            |    4    |        pass        |
|       mobilenet_v3_large       |    4    |        pass        |
|       shufflenet_v2_x1_0       |    4    |        pass        |
|          mobilenet_v2          |    4    |        pass        |
|   mobilenet_v2_quantized_qat   |    4    |        pass        |
|          BERT_pytorch          |    4    |        pass        |
|          timm_resnest          |    4    |        pass        |
|       soft_actor_critic        |   256   |        pass        |
|       speech_transformer       |    1    |        pass        |
|         squeezenet1_1          |    4    |        pass        |
|           mnasnet1_0           |    4    |        pass        |
|           timm_nfnet           |    4    |        pass        |
|          timm_regnet           |    4    |        pass        |
|       timm_efficientnet        |    4    |        pass        |
|    timm_vision_transformer     |    4    |        pass        |
|          timm_vovnet           |    4    |        pass        |
|     torch_multimodal_clip      |    4    |        pass        |
|          tts_angular           |    4    |        pass        |
|             vgg16              |    4    |        pass        |
|            resnet50            |    4    |        pass        |
|       timm_efficientdet        |    0    | model_fail_to_load |
|              moco              |    0    | model_fail_to_load |
|         DALLE2_pytorch         |    0    | model_fail_to_load |
|          Super_SloMo           |    4    |    fail_to_run     |
|        vision_maskrcnn         |    1    |    fail_to_run     |
|      doctr_reco_predictor      |    4    |   fail_accuracy    |
|      doctr_det_predictor       |    0    | eager_fail_to_run  |
+--------------------------------+---------+--------------------+

Compilation latency (sec)

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|           densenet121           |   64    | 84.198568 |
|           hf_BigBird            |    1    | 67.737083 |
|    detectron2_fcos_r_50_fpn     |    1    | 58.451083 |
|  timm_vision_transformer_large  |   32    | 52.035176 |
|           hf_T5_large           |    1    | 51.742338 |
|              maml               |    1    | 48.884124 |
|           timm_nfnet            |   128   | 46.760059 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 42.941179 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 40.416683 |
|          hf_Longformer          |    1    | 40.412099 |
|            moondream            |    1    | 38.834537 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 37.79302  |
|        phlippe_densenet         |   128   | 37.561973 |
|           hf_Reformer           |    1    | 37.108778 |
|          hf_GPT2_large          |    1    | 35.807035 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 35.005555 |
|           hf_T5_base            |    1    | 34.167135 |
|       speech_transformer        |    1    | 34.077379 |
|      torch_multimodal_clip      |   32    | 33.245772 |
|        timm_efficientnet        |   64    | 32.298047 |
|             yolov3              |    8    | 30.206305 |
|             demucs              |    1    | 29.889138 |
|        hf_distil_whisper        |    1    | 27.956843 |
|          BERT_pytorch           |    2    | 27.520656 |
|         opacus_cifar10          |   64    | 25.974508 |
|      functorch_dp_cifar10       |   64    | 24.590223 |
|              hf_T5              |    1    | 24.334326 |
|     timm_vision_transformer     |   32    | 23.531623 |
|          timm_resnest           |   32    | 22.859162 |
|       mobilenet_v3_large        |   32    | 22.428692 |
|       shufflenet_v2_x1_0        |   64    | 22.209868 |
|     pyhpc_isoneutral_mixing     | 1048576 | 21.985607 |
|           timm_regnet           |   32    | 21.801644 |
|          hf_Bert_large          |    1    | 21.096453 |
|             hf_GPT2             |    1    | 19.909908 |
|       Background_Matting        |    1    | 19.129808 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 18.988511 |
|           timm_vovnet           |   32    | 18.929105 |
|            resnet152            |   32    | 18.166425 |
|          fastNLP_Bert           |    1    | 17.203739 |
|          pytorch_unet           |    1    | 17.068977 |
|         pytorch_stargan         |   16    | 17.029306 |
|             hf_Bart             |    1    | 16.968171 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 15.506289 |
|            hf_Albert            |    1    | 14.994117 |
|             hf_Bert             |    1    | 14.715039 |
|          squeezenet1_1          |   16    | 14.517161 |
|          hf_DistilBert          |    1    | 12.483401 |
|      doctr_reco_predictor       |    1    | 11.23174  |
|              vgg16              |    4    | 11.020156 |
|            resnet50             |   32    | 10.895577 |
|          mobilenet_v2           |   16    | 10.887711 |
|         resnext50_32x4d         |    8    | 10.824012 |
|          basic_gnn_gcn          |    1    | 10.743284 |
|         basic_gnn_sage          |    1    | 10.092439 |
|             alexnet             |   128   | 9.274418  |
|               drq               |    1    | 8.986029  |
|           mnasnet1_0            |   32    | 8.875712  |
|     nvidia_deeprecommender      |   256   | 8.501068  |
|            resnet18             |    8    | 8.436338  |
|              dlrm               |  2048   | 7.994428  |
|     functorch_maml_omniglot     |    1    | 7.986088  |
|          maml_omniglot          |    5    | 7.956193  |
|         LearningToPaint         |   96    | 7.513074  |
|          basic_gnn_gin          |    1    | 7.495806  |
|         phlippe_resnet          |   128   | 7.134773  |
|     pyhpc_equation_of_state     | 1048576 | 7.079113  |
|        basic_gnn_edgecnn        |    1    | 6.729437  |
|        soft_actor_critic        |   256   | 6.652032  |
|          lennard_jones          |  1000   | 5.568611  |
|              dcgan              |   256   | 5.075793  |
|           tts_angular           |   64    | 4.640846  |
|   mobilenet_v2_quantized_qat    |   96    | 0.225776  |
|     resnet50_quantized_qat      |   32    | 0.202484  |
|       doctr_det_predictor       |    0    |    0.0    |
|        timm_efficientdet        |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+

Peak Memory Compression Ratio

+---------------------------------+---------+----------+
|              name               |   bs    | inductor |
+---------------------------------+---------+----------+
|              dlrm               |  2048   | 0.987928 |
|           timm_nfnet            |   128   | 0.987445 |
|           hf_T5_base            |    1    | 0.982663 |
|             demucs              |    1    | 0.981871 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 0.978207 |
|        timm_efficientnet        |   64    | 0.97713  |
|           timm_regnet           |   32    | 0.976961 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 0.976353 |
|      torch_multimodal_clip      |   32    | 0.975773 |
|          pytorch_unet           |    1    | 0.970077 |
|            resnet152            |   32    | 0.969951 |
|    detectron2_fcos_r_50_fpn     |    1    | 0.969818 |
|           densenet121           |   64    | 0.968776 |
|         LearningToPaint         |   96    | 0.968166 |
|             yolov3              |    8    | 0.967182 |
|            resnet50             |   32    | 0.964237 |
|             alexnet             |   128   | 0.963941 |
|           timm_vovnet           |   32    | 0.961327 |
|           hf_BigBird            |    1    | 0.961083 |
|        basic_gnn_edgecnn        |    1    | 0.959658 |
|          timm_resnest           |   32    | 0.955303 |
|     resnet50_quantized_qat      |   32    | 0.955189 |
|       Background_Matting        |    1    | 0.954874 |
|     timm_vision_transformer     |   32    | 0.954296 |
|   mobilenet_v2_quantized_qat    |   96    | 0.953278 |
|          basic_gnn_gcn          |    1    | 0.942568 |
|           mnasnet1_0            |   32    | 0.942544 |
|          mobilenet_v2           |   16    | 0.940672 |
|  timm_vision_transformer_large  |   32    | 0.938252 |
|       mobilenet_v3_large        |   32    | 0.937504 |
|         resnext50_32x4d         |    8    | 0.936488 |
|       shufflenet_v2_x1_0        |   64    | 0.935911 |
|     pyhpc_equation_of_state     | 1048576 | 0.935548 |
|         basic_gnn_sage          |    1    | 0.93111  |
|          basic_gnn_gin          |    1    | 0.931023 |
|       speech_transformer        |    1    | 0.930219 |
|             hf_Bert             |    1    | 0.929717 |
|      doctr_reco_predictor       |    1    | 0.926426 |
|         pytorch_stargan         |   16    | 0.923429 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 0.920727 |
|     nvidia_deeprecommender      |   256   | 0.920577 |
|            moondream            |    1    | 0.917097 |
|          fastNLP_Bert           |    1    | 0.917028 |
|          BERT_pytorch           |    2    | 0.915502 |
|        phlippe_densenet         |   128   | 0.915481 |
|            hf_Albert            |    1    | 0.914207 |
|        hf_distil_whisper        |    1    | 0.913258 |
|              hf_T5              |    1    | 0.909365 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 0.906175 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 0.905722 |
|              dcgan              |   256   | 0.905101 |
|               drq               |    1    | 0.90508  |
|             hf_GPT2             |    1    | 0.903395 |
|          squeezenet1_1          |   16    | 0.900306 |
|          hf_DistilBert          |    1    | 0.899002 |
|          hf_Longformer          |    1    | 0.896426 |
|         opacus_cifar10          |   64    | 0.892197 |
|          hf_Bert_large          |    1    | 0.891882 |
|          hf_GPT2_large          |    1    | 0.891134 |
|             hf_Bart             |    1    | 0.890377 |
|        soft_actor_critic        |   256   | 0.887208 |
|              vgg16              |    4    | 0.882749 |
|           tts_angular           |   64    | 0.881166 |
|         phlippe_resnet          |   128   | 0.876858 |
|      functorch_dp_cifar10       |   64    | 0.870013 |
|          lennard_jones          |  1000   | 0.86802  |
|          maml_omniglot          |    5    | 0.866129 |
|     functorch_maml_omniglot     |    1    | 0.864516 |
|           hf_T5_large           |    1    | 0.847176 |
|            resnet18             |    8    | 0.82743  |
|           hf_Reformer           |    1    | 0.80901  |
|              maml               |    1    | 0.785069 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.757376 |
|     pyhpc_isoneutral_mixing     | 1048576 | 0.568339 |
|        timm_efficientdet        |    0    |   0.0    |
|              moco               |    0    |   0.0    |
|       doctr_det_predictor       |    0    |   0.0    |
|         DALLE2_pytorch          |    0    |   0.0    |
+---------------------------------+---------+----------+

Absolute latency (ms)

+---------------------------------+---------+------------+
|              name               |   bs    |  inductor  |
+---------------------------------+---------+------------+
|  timm_vision_transformer_large  |   32    | 850.407533 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 791.294756 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 780.239354 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 695.574738 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 670.091203 |
|           hf_T5_base            |    1    | 428.443316 |
|          hf_GPT2_large          |    1    | 152.018564 |
|           hf_T5_large           |    1    | 127.386011 |
|           timm_nfnet            |   128   | 127.373027 |
|            moondream            |    1    | 116.227723 |
|        hf_distil_whisper        |    1    | 92.566873  |
|    detectron2_fcos_r_50_fpn     |    1    | 83.075942  |
|           hf_BigBird            |    1    | 79.198507  |
|       Background_Matting        |    1    | 66.460288  |
|      torch_multimodal_clip      |   32    |  57.25899  |
|          pytorch_unet           |    1    | 57.205701  |
|           densenet121           |   64    | 55.274099  |
|             demucs              |    1    | 50.805395  |
|           timm_regnet           |   32    |  44.9329   |
|              maml               |    1    | 42.866713  |
|          hf_Longformer          |    1    | 38.441431  |
|          hf_Bert_large          |    1    | 34.909026  |
|            resnet152            |   32    | 34.310001  |
|             yolov3              |    8    | 31.128547  |
|   mobilenet_v2_quantized_qat    |   96    | 30.551314  |
|     pyhpc_isoneutral_mixing     | 1048576 | 29.632544  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 28.309924  |
|        timm_efficientnet        |   64    | 26.473346  |
|       speech_transformer        |    1    | 23.420975  |
|           hf_Reformer           |    1    | 21.559652  |
|     nvidia_deeprecommender      |   256   | 21.119365  |
|     timm_vision_transformer     |   32    | 20.571043  |
|              hf_T5              |    1    | 18.242068  |
|             hf_Bart             |    1    | 18.140249  |
|         pytorch_stargan         |   16    | 18.075064  |
|           timm_vovnet           |   32    | 17.413034  |
|          fastNLP_Bert           |    1    | 17.089063  |
|         opacus_cifar10          |   64    | 14.771828  |
|      functorch_dp_cifar10       |   64    | 14.642308  |
|            hf_Albert            |    1    | 14.637021  |
|             hf_Bert             |    1    | 14.018614  |
|            resnet50             |   32    | 12.720847  |
|          BERT_pytorch           |    2    | 12.192282  |
|             hf_GPT2             |    1    | 10.925039  |
|     resnet50_quantized_qat      |   32    | 10.245347  |
|          timm_resnest           |   32    |  9.571667  |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  9.32194   |
|       shufflenet_v2_x1_0        |   64    |  8.818475  |
|         LearningToPaint         |   96    |  8.141154  |
|          hf_DistilBert          |    1    |  8.097709  |
|           tts_angular           |   64    |  7.929387  |
|         resnext50_32x4d         |    8    |  7.898877  |
|       mobilenet_v3_large        |   32    |  7.85657   |
|          basic_gnn_gcn          |    1    |  7.472539  |
|        basic_gnn_edgecnn        |    1    |  7.458652  |
|        phlippe_densenet         |   128   |  7.16985   |
|              vgg16              |    4    |  6.84687   |
|             alexnet             |   128   |  6.111262  |
|           mnasnet1_0            |   32    |  5.955141  |
|          mobilenet_v2           |   16    |  4.509189  |
|              dlrm               |  2048   |  4.391598  |
|         basic_gnn_sage          |    1    |  3.79665   |
|          basic_gnn_gin          |    1    |  3.359998  |
|          squeezenet1_1          |   16    |  3.028482  |
|            resnet18             |    8    |  2.956476  |
|      doctr_reco_predictor       |    1    |  2.884811  |
|              dcgan              |   256   |  2.75487   |
|         phlippe_resnet          |   128   |  1.580226  |
|     pyhpc_equation_of_state     | 1048576 |  1.55696   |
|               drq               |    1    |  0.750833  |
|          maml_omniglot          |    5    |  0.496891  |
|     functorch_maml_omniglot     |    1    |  0.361724  |
|        soft_actor_critic        |   256   |  0.235942  |
|          lennard_jones          |  1000   |  0.144324  |
|       doctr_det_predictor       |    0    |    0.0     |
|        timm_efficientdet        |    0    |    0.0     |
|         DALLE2_pytorch          |    0    |    0.0     |
|              moco               |    0    |    0.0     |
+---------------------------------+---------+------------+
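
The speedup and absolute latency tables can be combined to estimate the native PyTorch latency of a model. The sketch assumes that speedup is eager latency divided by inductor latency and that the absolute latency column is the inductor latency; neither definition is spelled out in this report. The function name `implied_eager_latency_ms` is hypothetical.

```python
def implied_eager_latency_ms(speedup: float, inductor_latency_ms: float) -> float:
    """Estimate eager latency, assuming speedup = eager / inductor."""
    if speedup <= 0 or inductor_latency_ms <= 0:
        raise ValueError("speedup and latency must be positive")
    return speedup * inductor_latency_ms


# (speedup, inductor latency in ms) copied from the two torchbench tables.
rows = {
    "resnet50": (3.948862, 12.720847),
    "opacus_cifar10": (0.473516, 14.771828),
}
for name, (speedup, latency_ms) in rows.items():
    eager_ms = implied_eager_latency_ms(speedup, latency_ms)
    print(f"{name}: inductor {latency_ms:.2f} ms, implied eager {eager_ms:.2f} ms")
```

Rows with a speedup below 1.0, such as opacus_cifar10, come out with an implied eager latency lower than the inductor latency, which is what a regression looks like in these tables.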

huggingface suite with amp precision


Performance speedup

+-----------------------------------------+-----+-----------+
|                  name                   | bs  | inductor  |
+-----------------------------------------+-----+-----------+
|            XLNetLMHeadModel             |  8  | 11.859923 |
|           ElectraForCausalLM            | 32  | 3.591992  |
|       ElectraForQuestionAnswering       | 64  | 3.281866  |
|          MobileBertForMaskedLM          | 128 | 3.269875  |
|     MobileBertForQuestionAnswering      | 128 |  3.12112  |
|           RobertaForCausalLM            | 16  | 3.057315  |
|    LayoutLMForSequenceClassification    | 16  |  3.03949  |
|       RobertaForQuestionAnswering       | 16  | 2.988523  |
|           LayoutLMForMaskedLM           | 16  | 2.987315  |
|        BertForQuestionAnswering         | 16  | 2.977138  |
|             BertForMaskedLM             | 16  | 2.935513  |
|                CamemBert                | 16  | 2.898633  |
|       T5ForConditionalGeneration        |  4  |  2.66707  |
|    MegatronBertForQuestionAnswering     |  8  | 2.651984  |
|                 T5Small                 |  4  |  2.63706  |
|            YituTechConvBert             | 16  | 2.366956  |
|         MegatronBertForCausalLM         |  4  | 2.176205  |
|       MT5ForConditionalGeneration       | 16  | 2.168602  |
|               DistillGPT2               | 16  | 2.087249  |
|             OPTForCausalLM              |  2  | 1.953445  |
|      DebertaV2ForQuestionAnswering      |  1  |  1.92509  |
|         Speech2Text2ForCausalLM         | 256 | 1.891054  |
|      GPT2ForSequenceClassification      |  4  | 1.847469  |
|          DistilBertForMaskedLM          | 128 | 1.827761  |
|       BlenderbotSmallForCausalLM        | 64  | 1.780492  |
|             XGLMForCausalLM             |  8  | 1.757534  |
|           DebertaForMaskedLM            |  8  | 1.750069  |
|            PLBartForCausalLM            |  8  | 1.731922  |
| BlenderbotSmallForConditionalGeneration | 64  | 1.684321  |
|          BlenderbotForCausalLM          |  4  | 1.670393  |
|            MBartForCausalLM             |  4  | 1.654168  |
|            TrOCRForCausalLM             | 32  | 1.612209  |
|     PLBartForConditionalGeneration      |  4  | 1.609235  |
|     DistilBertForQuestionAnswering      | 256 |  1.60713  |
|       DebertaForQuestionAnswering       | 16  | 1.564509  |
|          DebertaV2ForMaskedLM           |  2  | 1.511558  |
|     M2M100ForConditionalGeneration      | 16  | 1.510332  |
|      MBartForConditionalGeneration      |  2  | 1.498462  |
|     PegasusForConditionalGeneration     | 32  | 1.495397  |
|           PegasusForCausalLM            | 32  | 1.495185  |
|      BartForConditionalGeneration       |  2  | 1.437108  |
|             BartForCausalLM             |  4  | 1.410747  |
|               GoogleFnet                | 16  | 1.348543  |
|            AlbertForMaskedLM            |  4  | 1.319431  |
|       AlbertForQuestionAnswering        |  4  | 1.315974  |
|          AllenaiLongformerBase          |  4  | 1.049412  |
+-----------------------------------------+-----+-----------+

Accuracy

+-----------------------------------------+----+------------------+
|                  name                   | bs |     inductor     |
+-----------------------------------------+----+------------------+
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip |
|                CamemBert                | 1  |       pass       |
|       AlbertForQuestionAnswering        | 1  |       pass       |
|           DebertaForMaskedLM            | 1  |       pass       |
|          AllenaiLongformerBase          | 1  |       pass       |
|             BartForCausalLM             | 1  |       pass       |
|      BartForConditionalGeneration       | 1  |       pass       |
|             BertForMaskedLM             | 1  |       pass       |
|        BertForQuestionAnswering         | 1  |       pass       |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |
|       DebertaForQuestionAnswering       | 1  |       pass       |
|           LayoutLMForMaskedLM           | 1  |       pass       |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |
|          DistilBertForMaskedLM          | 1  |       pass       |
|     DistilBertForQuestionAnswering      | 1  |       pass       |
|               DistillGPT2               | 1  |       pass       |
|           ElectraForCausalLM            | 1  |       pass       |
|       ElectraForQuestionAnswering       | 1  |       pass       |
|      GPT2ForSequenceClassification      | 1  |       pass       |
|               GoogleFnet                | 1  |       pass       |
|    LayoutLMForSequenceClassification    | 1  |       pass       |
|            MBartForCausalLM             | 1  |       pass       |
|            XLNetLMHeadModel             | 1  |       pass       |
|             XGLMForCausalLM             | 1  |       pass       |
|            AlbertForMaskedLM            | 1  |       pass       |
|      MBartForConditionalGeneration      | 1  |       pass       |
|       MT5ForConditionalGeneration       | 1  |       pass       |
|         MegatronBertForCausalLM         | 1  |       pass       |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |
|          MobileBertForMaskedLM          | 1  |       pass       |
|     MobileBertForQuestionAnswering      | 1  |       pass       |
|             OPTForCausalLM              | 1  |       pass       |
|            PLBartForCausalLM            | 1  |       pass       |
|     PLBartForConditionalGeneration      | 1  |       pass       |
|           PegasusForCausalLM            | 1  |       pass       |
|     M2M100ForConditionalGeneration      | 1  |       pass       |
|     PegasusForConditionalGeneration     | 1  |       pass       |
|           RobertaForCausalLM            | 1  |       pass       |
|       RobertaForQuestionAnswering       | 1  |       pass       |
|         Speech2Text2ForCausalLM         | 1  |       pass       |
|       T5ForConditionalGeneration        | 1  |       pass       |
|                 T5Small                 | 1  |       pass       |
|            TrOCRForCausalLM             | 1  |       pass       |
|            YituTechConvBert             | 1  |       pass       |
+-----------------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------------+-----+------------+
|                  name                   | bs  |  inductor  |
+-----------------------------------------+-----+------------+
|          AllenaiLongformerBase          |  4  | 101.887317 |
|          MobileBertForMaskedLM          | 128 | 57.745355  |
|     MobileBertForQuestionAnswering      | 128 | 56.987086  |
|     PegasusForConditionalGeneration     | 32  | 56.485367  |
|     M2M100ForConditionalGeneration      | 16  | 55.690326  |
|      MBartForConditionalGeneration      |  2  | 54.711339  |
|          BlenderbotForCausalLM          |  4  | 50.663457  |
|       MT5ForConditionalGeneration       | 16  | 49.254908  |
|             XGLMForCausalLM             |  8  | 46.201877  |
|          DebertaV2ForMaskedLM           |  2  | 41.877904  |
|       T5ForConditionalGeneration        |  4  | 40.575563  |
|                 T5Small                 |  4  | 40.320006  |
|            XLNetLMHeadModel             |  8  | 38.258324  |
|      BartForConditionalGeneration       |  2  |  38.19818  |
|         MegatronBertForCausalLM         |  4  | 37.894998  |
| BlenderbotSmallForConditionalGeneration | 64  | 37.709196  |
|    MegatronBertForQuestionAnswering     |  8  | 37.203103  |
|            YituTechConvBert             | 16  | 36.508184  |
|     PLBartForConditionalGeneration      |  4  | 33.414016  |
|             OPTForCausalLM              |  2  | 33.325073  |
|           PegasusForCausalLM            | 32  |  31.30607  |
|            MBartForCausalLM             |  4  |  29.89019  |
|      GPT2ForSequenceClassification      |  4  | 28.438719  |
|      DebertaV2ForQuestionAnswering      |  1  | 26.373602  |
|       DebertaForQuestionAnswering       | 16  | 26.330292  |
|           DebertaForMaskedLM            |  8  | 26.100535  |
|            TrOCRForCausalLM             | 32  | 25.829579  |
|               DistillGPT2               | 16  | 23.470104  |
|           RobertaForCausalLM            | 16  | 22.933443  |
|            AlbertForMaskedLM            |  4  | 22.569673  |
|       AlbertForQuestionAnswering        |  4  | 22.564427  |
|                CamemBert                | 16  | 22.263647  |
|       RobertaForQuestionAnswering       | 16  | 22.203316  |
|    LayoutLMForSequenceClassification    | 16  | 21.958646  |
|           ElectraForCausalLM            | 32  | 21.767663  |
|           LayoutLMForMaskedLM           | 16  | 21.437627  |
|             BertForMaskedLM             | 16  | 21.059638  |
|        BertForQuestionAnswering         | 16  | 21.046726  |
|       ElectraForQuestionAnswering       | 64  | 20.815483  |
|       BlenderbotSmallForCausalLM        | 64  | 20.805901  |
|          DistilBertForMaskedLM          | 128 | 20.178826  |
|     DistilBertForQuestionAnswering      | 256 | 20.078244  |
|             BartForCausalLM             |  4  | 19.993944  |
|         Speech2Text2ForCausalLM         | 256 | 19.114918  |
|            PLBartForCausalLM            |  8  | 19.058326  |
|               GoogleFnet                | 16  | 17.308659  |
+-----------------------------------------+-----+------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|       AlbertForQuestionAnswering        |  4  | 0.991988 |
|            AlbertForMaskedLM            |  4  | 0.991937 |
|            PLBartForCausalLM            |  8  | 0.991835 |
|               GoogleFnet                | 16  | 0.991217 |
|          DistilBertForMaskedLM          | 128 | 0.990127 |
|               DistillGPT2               | 16  | 0.990045 |
|             BertForMaskedLM             | 16  | 0.990009 |
|           ElectraForCausalLM            | 32  | 0.989551 |
|            YituTechConvBert             | 16  | 0.988025 |
|       ElectraForQuestionAnswering       | 64  | 0.987641 |
|             OPTForCausalLM              |  2  | 0.987241 |
|                CamemBert                | 16  | 0.987161 |
|       BlenderbotSmallForCausalLM        | 64  | 0.987081 |
| BlenderbotSmallForConditionalGeneration | 64  | 0.986969 |
|         Speech2Text2ForCausalLM         | 256 | 0.986433 |
|           RobertaForCausalLM            | 16  | 0.985748 |
|        BertForQuestionAnswering         | 16  | 0.985743 |
|           LayoutLMForMaskedLM           | 16  | 0.984831 |
|     DistilBertForQuestionAnswering      | 256 | 0.983198 |
|       MT5ForConditionalGeneration       | 16  | 0.983117 |
|            TrOCRForCausalLM             | 32  | 0.981034 |
|       DebertaForQuestionAnswering       | 16  | 0.980885 |
|            MBartForCausalLM             |  4  | 0.980206 |
|       RobertaForQuestionAnswering       | 16  | 0.980105 |
|    LayoutLMForSequenceClassification    | 16  | 0.979897 |
|                 T5Small                 |  4  | 0.979358 |
|          MobileBertForMaskedLM          | 128 | 0.978952 |
|      GPT2ForSequenceClassification      |  4  | 0.974081 |
|       T5ForConditionalGeneration        |  4  | 0.972565 |
|          AllenaiLongformerBase          |  4  | 0.970436 |
|            XLNetLMHeadModel             |  8  | 0.969643 |
|     MobileBertForQuestionAnswering      | 128 | 0.968157 |
|           PegasusForCausalLM            | 32  | 0.967771 |
|           DebertaForMaskedLM            |  8  | 0.966944 |
|             BartForCausalLM             |  4  | 0.965997 |
|     PLBartForConditionalGeneration      |  4  | 0.963493 |
|     M2M100ForConditionalGeneration      | 16  | 0.954264 |
|         MegatronBertForCausalLM         |  4  | 0.945517 |
|    MegatronBertForQuestionAnswering     |  8  | 0.939911 |
|          BlenderbotForCausalLM          |  4  | 0.936812 |
|             XGLMForCausalLM             |  8  | 0.936025 |
|      BartForConditionalGeneration       |  2  | 0.935936 |
|      DebertaV2ForQuestionAnswering      |  1  | 0.932784 |
|      MBartForConditionalGeneration      |  2  | 0.926821 |
|     PegasusForConditionalGeneration     | 32  | 0.918211 |
|          DebertaV2ForMaskedLM           |  2  | 0.888033 |
+-----------------------------------------+-----+----------+

Absolute latency (ms)

+-----------------------------------------+-----+------------+
|                  name                   | bs  |  inductor  |
+-----------------------------------------+-----+------------+
|          AllenaiLongformerBase          |  4  | 946.994337 |
|            AlbertForMaskedLM            |  4  | 533.57865  |
|       AlbertForQuestionAnswering        |  4  | 528.375337 |
|            XLNetLMHeadModel             |  8  | 393.868118 |
|               GoogleFnet                | 16  | 274.48467  |
|             OPTForCausalLM              |  2  | 206.819001 |
|      MBartForConditionalGeneration      |  2  | 173.029557 |
|            TrOCRForCausalLM             | 32  | 173.01729  |
|            MBartForCausalLM             |  4  | 172.633076 |
|     PegasusForConditionalGeneration     | 32  | 165.91941  |
|          DebertaV2ForMaskedLM           |  2  | 160.282768 |
|     DistilBertForQuestionAnswering      | 256 | 146.433978 |
|            PLBartForCausalLM            |  8  | 134.579375 |
|      BartForConditionalGeneration       |  2  | 132.685042 |
|    MegatronBertForQuestionAnswering     |  8  | 124.568539 |
|     PLBartForConditionalGeneration      |  4  | 123.026751 |
|            YituTechConvBert             | 16  | 122.536034 |
|          BlenderbotForCausalLM          |  4  | 122.196516 |
|     M2M100ForConditionalGeneration      | 16  | 117.446529 |
|       DebertaForQuestionAnswering       | 16  | 117.215735 |
|       T5ForConditionalGeneration        |  4  | 105.959235 |
|                 T5Small                 |  4  | 105.949696 |
| BlenderbotSmallForConditionalGeneration | 64  | 104.277514 |
|          DistilBertForMaskedLM          | 128 | 103.870104 |
|           RobertaForCausalLM            | 16  | 103.240203 |
|             BartForCausalLM             |  4  | 96.705825  |
|                CamemBert                | 16  | 93.880642  |
|           LayoutLMForMaskedLM           | 16  | 92.170964  |
|             BertForMaskedLM             | 16  | 92.018965  |
|          MobileBertForMaskedLM          | 128 | 87.073542  |
|         MegatronBertForCausalLM         |  4  | 84.923758  |
|               DistillGPT2               | 16  | 81.854138  |
|           PegasusForCausalLM            | 32  | 80.091427  |
|       ElectraForQuestionAnswering       | 64  | 78.000661  |
|             XGLMForCausalLM             |  8  | 76.520683  |
|        BertForQuestionAnswering         | 16  | 75.749578  |
|       RobertaForQuestionAnswering       | 16  | 74.941752  |
|    LayoutLMForSequenceClassification    | 16  | 74.367779  |
|           DebertaForMaskedLM            |  8  | 74.016319  |
|      GPT2ForSequenceClassification      |  4  | 70.526487  |
|       BlenderbotSmallForCausalLM        | 64  | 59.603754  |
|         Speech2Text2ForCausalLM         | 256 | 58.690633  |
|       MT5ForConditionalGeneration       | 16  |  58.0352   |
|           ElectraForCausalLM            | 32  | 56.820576  |
|      DebertaV2ForQuestionAnswering      |  1  | 56.671734  |
|     MobileBertForQuestionAnswering      | 128 | 51.160365  |
+-----------------------------------------+-----+------------+

timm_models suite with amp precision

Performance speedup

+---------------------------------+------+----------+
|              name               |  bs  | inductor |
+---------------------------------+------+----------+
|           resnest101e           |  64  | 4.579912 |
|           mnasnet_100           | 512  | 4.571489 |
|           fbnetc_100            | 512  | 4.567034 |
|           regnety_002           | 1024 | 4.179937 |
|      mobilenetv3_large_100      | 512  | 4.16405  |
|            lcnet_050            | 256  | 4.159318 |
|        ese_vovnet19b_dw         | 256  | 4.096491 |
|         mobilenetv2_100         | 128  | 4.095157 |
|          cspdarknet53           |  64  | 4.002903 |
|          botnet26t_256          | 128  | 3.873532 |
|           res2next50            | 128  | 3.821163 |
|          spnasnet_100           | 128  | 3.742647 |
|       gluon_inception_v3        | 256  | 3.679933 |
|          inception_v3           | 128  | 3.635456 |
|             dla102              | 128  | 3.617796 |
|        res2net101_26w_4s        | 128  | 3.617172 |
|        res2net50_14w_8s         | 128  | 3.60661  |
|            hrnet_w18            | 128  | 3.566797 |
|        adv_inception_v3         | 128  | 3.536017 |
|            fbnetv3_b            | 256  | 3.516941 |
|           rexnet_100            | 256  | 3.435201 |
|          pnasnet5large          |  16  | 3.423837 |
|            gernet_l             | 128  | 3.410485 |
|     swsl_resnext101_32x16d      |  32  | 3.252249 |
|            nfnet_l0             | 128  | 3.085885 |
|       eca_botnext26ts_256       | 128  | 2.805159 |
|        eca_halonext26ts         | 128  | 2.71917  |
|           volo_d1_224           |  64  | 2.63186  |
|           dm_nfnet_f0           | 128  | 2.558532 |
|           selecsls42b           | 128  | 2.49112  |
|            repvgg_a2            | 128  | 2.46882  |
|          ghostnet_100           | 512  | 2.386946 |
|       tf_efficientnet_b0        | 128  | 2.310806 |
|         visformer_small         | 128  | 2.296453 |
|            tinynet_a            | 128  | 2.114114 |
|         poolformer_m36          |  64  | 2.02571  |
|        convmixer_768_32         |  32  | 1.994635 |
|             dpn107              |  64  | 1.890537 |
|          convnext_base          |  64  | 1.871451 |
|            levit_128            | 1024 | 1.866965 |
|      xcit_large_24_p8_224       |  16  | 1.753998 |
|           tf_mixnet_l           | 128  | 1.743881 |
|            mixnet_l             | 128  | 1.708714 |
|          gmlp_s16_224           | 128  | 1.654203 |
|        twins_pcpvt_base         | 128  | 1.622962 |
|  swin_base_patch4_window7_224   |  64  | 1.609423 |
|           mobilevit_s           |  64  | 1.573344 |
| deit_base_distilled_patch16_224 |  64  | 1.468578 |
|      beit_base_patch16_224      |  64  | 1.456535 |
|        sebotnet33ts_256         |  64  | 1.45332  |
|          gmixer_24_224          | 128  | 1.358473 |
|      vit_base_patch16_224       |  64  | 1.346624 |
|           convit_base           |  64  | 1.346249 |
|          mixer_b16_224          | 128  | 1.338827 |
|            pit_b_224            |  64  | 1.334922 |
|         crossvit_9_240          | 256  | 1.330985 |
|        tnt_s_patch16_224        | 128  | 1.206937 |
|          jx_nest_base           |  32  | 1.101236 |
|          resmlp_12_224          | 128  | 1.075761 |
|          cait_m36_384           |  4   | 0.824789 |
+---------------------------------+------+----------+
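
The grid tables in this dashboard are regular enough to parse with a few lines of code, which makes it easy to flag rows where Inductor is slower than native PyTorch (speedup below 1.0). The following is a minimal sketch; the helper name `parse_grid_table` is hypothetical, and the two sample rows are copied from the table above.

```python
def parse_grid_table(text):
    """Parse a '+---+' bordered table into a list of dicts keyed by the header row."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip border lines and blanks
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


TABLE = """
+--------------+----+----------+
|     name     | bs | inductor |
+--------------+----+----------+
| resnest101e  | 64 | 4.579912 |
| cait_m36_384 | 4  | 0.824789 |
+--------------+----+----------+
"""

records = parse_grid_table(TABLE)
# Models whose speedup is below 1.0, i.e. slower under Inductor than the baseline.
slower = [r["name"] for r in records if float(r["inductor"]) < 1.0]
print(slower)
```

All cell values come back as strings, so numeric columns need an explicit `float` or `int` conversion, as in the filter above.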

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 8  |     pass      |
|           mnasnet_100           | 8  |     pass      |
|          mixer_b16_224          | 8  |     pass      |
|          botnet26t_256          | 8  |     pass      |
|          cait_m36_384           | 8  |     pass      |
|           convit_base           | 8  |     pass      |
|        convmixer_768_32         | 8  |     pass      |
|          convnext_base          | 8  |     pass      |
|         crossvit_9_240          | 8  |     pass      |
|          cspdarknet53           | 8  |     pass      |
| deit_base_distilled_patch16_224 | 8  |     pass      |
|             dla102              | 8  |     pass      |
|           dm_nfnet_f0           | 8  |     pass      |
|             dpn107              | 8  |     pass      |
|       eca_botnext26ts_256       | 8  |     pass      |
|        eca_halonext26ts         | 8  |     pass      |
|        ese_vovnet19b_dw         | 8  |     pass      |
|           fbnetc_100            | 8  |     pass      |
|            fbnetv3_b            | 8  |     pass      |
|            gernet_l             | 8  |     pass      |
|          ghostnet_100           | 8  |     pass      |
|       gluon_inception_v3        | 8  |     pass      |
|          gmixer_24_224          | 8  |     pass      |
|          gmlp_s16_224           | 8  |     pass      |
|            hrnet_w18            | 8  |     pass      |
|          inception_v3           | 8  |     pass      |
|          jx_nest_base           | 8  |     pass      |
|            levit_128            | 8  |     pass      |
|      xcit_large_24_p8_224       | 8  |     pass      |
|      beit_base_patch16_224      | 8  |     pass      |
|            mixnet_l             | 8  |     pass      |
|         mobilenetv2_100         | 8  |     pass      |
|           volo_d1_224           | 8  |     pass      |
|      mobilenetv3_large_100      | 8  |     pass      |
|           mobilevit_s           | 8  |     pass      |
|            nfnet_l0             | 8  |     pass      |
|            pit_b_224            | 8  |     pass      |
|          pnasnet5large          | 8  |     pass      |
|         poolformer_m36          | 8  |     pass      |
|           regnety_002           | 8  |     pass      |
|            repvgg_a2            | 8  |     pass      |
|        res2net101_26w_4s        | 8  |     pass      |
|        res2net50_14w_8s         | 8  |     pass      |
|           res2next50            | 8  |     pass      |
|          resmlp_12_224          | 8  |     pass      |
|           resnest101e           | 8  |     pass      |
|           rexnet_100            | 8  |     pass      |
|        sebotnet33ts_256         | 8  |     pass      |
|           selecsls42b           | 8  |     pass      |
|          spnasnet_100           | 8  |     pass      |
|  swin_base_patch4_window7_224   | 8  |     pass      |
|     swsl_resnext101_32x16d      | 8  |     pass      |
|       tf_efficientnet_b0        | 8  |     pass      |
|           tf_mixnet_l           | 8  |     pass      |
|            tinynet_a            | 8  |     pass      |
|        tnt_s_patch16_224        | 8  |     pass      |
|        twins_pcpvt_base         | 8  |     pass      |
|         visformer_small         | 8  |     pass      |
|      vit_base_patch16_224       | 8  |     pass      |
|         coat_lite_mini          | 8  |  fail_to_run  |
|            lcnet_050            | 8  | fail_accuracy |
+---------------------------------+----+---------------+

Compilation latency (sec)

+---------------------------------+------+-----------+
|              name               |  bs  | inductor  |
+---------------------------------+------+-----------+
|  swin_base_patch4_window7_224   |  64  | 91.177324 |
|          pnasnet5large          |  16  | 89.124834 |
|           tf_mixnet_l           | 128  | 78.548549 |
|          cait_m36_384           |  4   | 76.453285 |
|      xcit_large_24_p8_224       |  16  | 74.258612 |
|             dpn107              |  64  | 70.835279 |
|          jx_nest_base           |  32  | 67.040647 |
|        tnt_s_patch16_224        | 128  | 64.819757 |
|            levit_128            | 1024 | 61.371258 |
|        twins_pcpvt_base         | 128  | 60.56163  |
|         crossvit_9_240          | 256  | 59.605599 |
|           mobilevit_s           |  64  | 59.518765 |
|        res2net50_14w_8s         | 128  | 59.022935 |
|           rexnet_100            | 256  | 58.426263 |
|            mixnet_l             | 128  | 58.131469 |
|        sebotnet33ts_256         |  64  | 56.462241 |
|        eca_halonext26ts         | 128  | 56.395291 |
|         poolformer_m36          |  64  | 55.713106 |
|           volo_d1_224           |  64  | 53.182824 |
|          ghostnet_100           | 512  | 53.166815 |
|            hrnet_w18            | 128  | 52.727886 |
|           dm_nfnet_f0           | 128  | 48.221749 |
|         visformer_small         | 128  | 47.523331 |
|           resnest101e           |  64  | 47.236213 |
|       eca_botnext26ts_256       | 128  | 45.616527 |
|           convit_base           |  64  | 45.587652 |
|        res2net101_26w_4s        | 128  | 43.985893 |
|          convnext_base          |  64  | 42.82817  |
|       tf_efficientnet_b0        | 128  | 39.956951 |
|            nfnet_l0             | 128  | 37.609896 |
|       gluon_inception_v3        | 256  | 35.411351 |
|            fbnetv3_b            | 256  | 34.87794  |
|            tinynet_a            | 128  | 34.517739 |
|          botnet26t_256          | 128  | 34.365234 |
|          inception_v3           | 128  | 34.221671 |
|        adv_inception_v3         | 128  | 34.20372  |
|           res2next50            | 128  | 34.075301 |
|          gmixer_24_224          | 128  | 34.056094 |
|            pit_b_224            |  64  | 31.676375 |
|          gmlp_s16_224           | 128  | 31.461042 |
|             dla102              | 128  | 27.942007 |
|          cspdarknet53           |  64  | 26.601049 |
| deit_base_distilled_patch16_224 |  64  | 26.213909 |
|      vit_base_patch16_224       |  64  | 24.393738 |
|      mobilenetv3_large_100      | 512  | 24.301115 |
|        ese_vovnet19b_dw         | 256  | 22.663396 |
|      beit_base_patch16_224      |  64  | 20.365129 |
|          mixer_b16_224          | 128  | 19.954663 |
|        convmixer_768_32         |  32  | 19.25078  |
|           regnety_002           | 1024 |  19.0024  |
|          resmlp_12_224          | 128  | 18.530484 |
|            repvgg_a2            | 128  | 15.446963 |
|     swsl_resnext101_32x16d      |  32  | 14.579849 |
|           selecsls42b           | 128  | 14.389215 |
|            lcnet_050            | 256  | 14.207951 |
|         mobilenetv2_100         | 128  | 11.393558 |
|            gernet_l             | 128  | 10.236513 |
|           fbnetc_100            | 512  | 10.174229 |
|          spnasnet_100           | 128  | 10.099265 |
|           mnasnet_100           | 512  |  9.22173  |
+---------------------------------+------+-----------+

Peak Memory Compression Ratio

+---------------------------------+------+----------+
|              name               |  bs  | inductor |
+---------------------------------+------+----------+
|        ese_vovnet19b_dw         | 256  | 0.995992 |
|           fbnetc_100            | 512  | 0.993749 |
|           rexnet_100            | 256  | 0.993563 |
|       gluon_inception_v3        | 256  | 0.993292 |
|           regnety_002           | 1024 | 0.993204 |
|           mnasnet_100           | 512  | 0.992791 |
|            fbnetv3_b            | 256  | 0.992525 |
|           dm_nfnet_f0           | 128  | 0.992409 |
|      mobilenetv3_large_100      | 512  | 0.992292 |
|          ghostnet_100           | 512  | 0.991807 |
|            levit_128            | 1024 | 0.991712 |
|            nfnet_l0             | 128  | 0.991172 |
|             dla102              | 128  | 0.990321 |
|        res2net101_26w_4s        | 128  | 0.98997  |
|           convit_base           |  64  | 0.989624 |
|       eca_botnext26ts_256       | 128  | 0.989493 |
|           res2next50            | 128  | 0.989357 |
|      xcit_large_24_p8_224       |  16  | 0.989217 |
|          mixer_b16_224          | 128  | 0.989129 |
|        eca_halonext26ts         | 128  | 0.989049 |
|           resnest101e           |  64  | 0.988751 |
|             dpn107              |  64  | 0.988206 |
|          gmlp_s16_224           | 128  | 0.988106 |
|        twins_pcpvt_base         | 128  | 0.987961 |
|          gmixer_24_224          | 128  | 0.987768 |
|           tf_mixnet_l           | 128  | 0.987376 |
|        adv_inception_v3         | 128  | 0.987349 |
|          inception_v3           | 128  | 0.987346 |
|        convmixer_768_32         |  32  | 0.987298 |
|          botnet26t_256          | 128  | 0.98722  |
|          cspdarknet53           |  64  | 0.986676 |
|        tnt_s_patch16_224        | 128  | 0.986657 |
|            mixnet_l             | 128  | 0.986409 |
|       tf_efficientnet_b0        | 128  | 0.986384 |
|         visformer_small         | 128  | 0.986347 |
|        res2net50_14w_8s         | 128  | 0.986045 |
|            gernet_l             | 128  | 0.985705 |
|          convnext_base          |  64  | 0.985619 |
|          pnasnet5large          |  16  | 0.985423 |
|            tinynet_a            | 128  | 0.984833 |
|      vit_base_patch16_224       |  64  | 0.984648 |
|      beit_base_patch16_224      |  64  | 0.984644 |
|         mobilenetv2_100         | 128  | 0.984614 |
|         crossvit_9_240          | 256  | 0.984463 |
|            hrnet_w18            | 128  | 0.984105 |
|            pit_b_224            |  64  | 0.983691 |
|         poolformer_m36          |  64  | 0.982888 |
| deit_base_distilled_patch16_224 |  64  | 0.982661 |
|  swin_base_patch4_window7_224   |  64  | 0.982282 |
|           selecsls42b           | 128  | 0.982225 |
|          resmlp_12_224          | 128  | 0.981345 |
|           mobilevit_s           |  64  | 0.981265 |
|          spnasnet_100           | 128  | 0.980886 |
|     swsl_resnext101_32x16d      |  32  | 0.978437 |
|           volo_d1_224           |  64  | 0.977897 |
|            lcnet_050            | 256  | 0.975482 |
|          cait_m36_384           |  4   | 0.97531  |
|          jx_nest_base           |  32  | 0.973684 |
|            repvgg_a2            | 128  | 0.971329 |
|        sebotnet33ts_256         |  64  | 0.836007 |
+---------------------------------+------+----------+

Absolute latency (ms)

+---------------------------------+------+------------+
|              name               |  bs  |  inductor  |
+---------------------------------+------+------------+
|          cait_m36_384           |  4   | 469.368812 |
|      xcit_large_24_p8_224       |  16  | 278.363601 |
|           convit_base           |  64  | 250.154492 |
|        tnt_s_patch16_224        | 128  | 243.136388 |
|             dpn107              |  64  | 225.167462 |
|            levit_128            | 1024 | 213.533276 |
|           dm_nfnet_f0           | 128  | 211.248632 |
|       gluon_inception_v3        | 256  | 182.083804 |
|        ese_vovnet19b_dw         | 256  | 174.044765 |
|          mixer_b16_224          | 128  | 172.822027 |
|          convnext_base          |  64  | 170.718952 |
|            nfnet_l0             | 128  | 160.648258 |
|         poolformer_m36          |  64  | 155.275732 |
|        twins_pcpvt_base         | 128  | 145.823466 |
|           tf_mixnet_l           | 128  | 141.224441 |
|  swin_base_patch4_window7_224   |  64  | 139.851458 |
|         crossvit_9_240          | 256  | 135.066424 |
|            mixnet_l             | 128  | 133.966309 |
|          ghostnet_100           | 512  | 125.367487 |
|           volo_d1_224           |  64  | 124.570513 |
|          jx_nest_base           |  32  | 123.015127 |
|        sebotnet33ts_256         |  64  | 121.530888 |
|           mobilevit_s           |  64  | 118.228965 |
|          gmixer_24_224          | 128  | 111.051479 |
|      vit_base_patch16_224       |  64  | 110.709242 |
|          pnasnet5large          |  16  | 109.375372 |
|        eca_halonext26ts         | 128  | 107.586033 |
|        res2net101_26w_4s        | 128  | 107.237361 |
|      beit_base_patch16_224      |  64  | 107.033596 |
|          resmlp_12_224          | 128  | 106.031841 |
|            pit_b_224            |  64  | 103.721146 |
|        convmixer_768_32         |  32  | 102.906349 |
|       eca_botnext26ts_256       | 128  | 102.293137 |
| deit_base_distilled_patch16_224 |  64  | 102.05789  |
|          gmlp_s16_224           | 128  | 101.529682 |
|           fbnetc_100            | 512  | 100.088598 |
|     swsl_resnext101_32x16d      |  32  | 94.745214  |
|          inception_v3           | 128  | 89.432991  |
|        adv_inception_v3         | 128  | 89.031165  |
|           rexnet_100            | 256  | 88.762296  |
|           regnety_002           | 1024 | 88.182448  |
|             dla102              | 128  | 87.875465  |
|            hrnet_w18            | 128  |  86.85838  |
|         visformer_small         | 128  | 86.301081  |
|            fbnetv3_b            | 256  | 84.888065  |
|        res2net50_14w_8s         | 128  | 83.991794  |
|           mnasnet_100           | 512  | 83.137584  |
|           res2next50            | 128  | 82.149956  |
|      mobilenetv3_large_100      | 512  | 82.110473  |
|           resnest101e           |  64  |  78.80414  |
|          botnet26t_256          | 128  | 72.818284  |
|       tf_efficientnet_b0        | 128  | 58.013408  |
|          cspdarknet53           |  64  | 50.660602  |
|            repvgg_a2            | 128  | 48.838389  |
|            tinynet_a            | 128  | 42.832323  |
|            gernet_l             | 128  | 38.383975  |
|           selecsls42b           | 128  | 34.853469  |
|         mobilenetv2_100         | 128  | 22.808204  |
|          spnasnet_100           | 128  | 20.209499  |
|            lcnet_050            | 256  |  8.308169  |
+---------------------------------+------+------------+
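
If speedup is taken to be the native PyTorch latency divided by the Inductor latency, an approximate baseline latency can be recovered by multiplying the speedup and absolute latency tables row by row. The sketch below does this for two rows copied from the timm_models tables above. The dictionary and function names are illustrative, and the result is only an estimate, since it assumes both tables come from the same measurements.

```python
# Values copied from the timm_models "Performance speedup" and
# "Absolute latency (ms)" tables above.
speedup = {"cait_m36_384": 0.824789, "mnasnet_100": 4.571489}
inductor_ms = {"cait_m36_384": 469.368812, "mnasnet_100": 83.137584}


def estimated_baseline_ms(name):
    """Estimate native PyTorch latency, assuming speedup = baseline / inductor."""
    return speedup[name] * inductor_ms[name]


for name in speedup:
    print(f"{name}: about {estimated_baseline_ms(name):.1f} ms in native PyTorch")
```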

@WeizhuoZhang-intel
Contributor

[amp] Performance Dashboard for amp precision -- Single-core Single-thread (2024-05-05 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. The experiments run on an Intel(R) Xeon(R) Platinum 8488C. Each experiment runs one iteration of the forward and backward passes for training, and of the forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients against native PyTorch. Speedup is normalized against the performance of native PyTorch. We also report the mean compilation latency and the peak memory footprint compression ratio.
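
As a rough illustration of the speedup metric described above, the sketch below divides a baseline latency by a compiled latency. The function name `speedup`, the use of the median as the summary statistic, and the sample timings are all assumptions made for illustration; they are not taken from the benchmark harness.

```python
from statistics import median


def speedup(baseline_ms, compiled_ms):
    """Baseline latency divided by compiled latency; above 1 means the compiled run is faster."""
    return median(baseline_ms) / median(compiled_ms)


# Hypothetical per-iteration latencies in milliseconds.
eager = [10.2, 10.0, 10.4]
inductor = [5.1, 5.0, 4.9]
print(f"{speedup(eager, inductor):.2f}x")
```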

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

SW information:

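+-------------------+--------+------------------------------------------+
| SW                | Branch | Commit                                   |
+-------------------+--------+------------------------------------------+
| Pytorch           | main   | 6d30803d64953955df63da56833bf4eb52249aae |
| Torchbench        | main   | d6015d42                                 |
| torchaudio        | main   | 2.2.0a0+ea437b3                          |
| torchtext         | main   | 0.16.0a0+b0ebddc                         |
| torchvision       | main   | 0.19.0a0+06ad737                         |
| torchdata         | main   | 0.7.1a0+0790338                          |
| dynamo_benchmarks | main   | nightly                                  |
+-------------------+--------+------------------------------------------+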

HW information

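+------------------+----------------------------------------------------------------+
| Item             | Value                                                          |
+------------------+----------------------------------------------------------------+
| Manufacturer     | Amazon EC2                                                     |
| Product Name     | c7i.metal-24xl                                                 |
| CPU Model        | Intel(R) Xeon(R) Platinum 8488C CPU @ 2.40GHz                  |
| Installed Memory | 192GB (8x24GB DDR5 4800 MT/s [4800 MT/s])                      |
| OS               | Ubuntu 22.04.3 LTS                                             |
| Kernel           | 6.2.0-1017-aws                                                 |
| Microcode        | 0x2b0004d0                                                     |
| GCC              | gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0                      |
| GLIBC            | ldd (Ubuntu GLIBC 2.35-0ubuntu3.4) 2.35                        |
| Binutils         | GNU ld (GNU Binutils for Ubuntu) 2.38                          |
| Python           | Python 3.8.18                                                  |
| OpenSSL          | OpenSSL 3.2.0 23 Nov 2023 (Library: OpenSSL 3.2.0 23 Nov 2023) |
+------------------+----------------------------------------------------------------+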

Test command

export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
export OMP_NUM_THREADS=1

python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--core_list 0 --ncores_per_instance 1" --devices=cpu --dtypes=float32 --inference --compilers=inductor --batch_size=1 --threads 1 --extra-args="--timeout 9000"
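
When the run is launched from a Python script rather than a shell, the same environment can be assembled as a dictionary. The sketch below only mirrors the four `export` lines above; the function name `single_thread_env` and the `/opt/conda` fallback prefix are hypothetical, and nothing is executed.

```python
import os


def single_thread_env(conda_prefix):
    """Return the environment variables set by the export lines above."""
    return {
        # Preload Intel OpenMP and jemalloc from the conda environment.
        "LD_PRELOAD": f"{conda_prefix}/lib/libiomp5.so:{conda_prefix}/lib/libjemalloc.so",
        "MALLOC_CONF": (
            "oversize_threshold:1,background_thread:true,metadata_thp:auto,"
            "dirty_decay_ms:-1,muzzy_decay_ms:-1"
        ),
        "TORCHINDUCTOR_FREEZING": "1",
        "OMP_NUM_THREADS": "1",
    }


# "/opt/conda" is a placeholder used only when CONDA_PREFIX is unset or empty.
prefix = os.environ.get("CONDA_PREFIX") or "/opt/conda"
env = {**os.environ, **single_thread_env(prefix)}
print(env["OMP_NUM_THREADS"], env["TORCHINDUCTOR_FREEZING"])
```

The resulting `env` can be passed to `subprocess.run(..., env=env)` together with the `runner.py` command line shown above.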

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 90%, 71/79 | 98%, 45/46  | 98%, 59/60  |
+----------+------------+-------------+-------------+
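
The pass rate entries above combine a rounded percentage with the raw counts. A small sketch that reproduces them from the counts, assuming rounding to the nearest integer (the helper name `passrate` is illustrative):

```python
def passrate(passed, total):
    """Format a pass rate as a rounded percentage followed by passed/total."""
    return f"{round(100 * passed / total)}%, {passed}/{total}"


# Counts taken from the Passrate table above.
counts = {"torchbench": (71, 79), "huggingface": (45, 46), "timm_models": (59, 60)}
for suite, (passed, total) in counts.items():
    print(f"{suite}: {passrate(passed, total)}")
```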

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   2.61x    |    1.77x    |    2.71x    |
+----------+------------+-------------+-------------+
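
Speedups are ratios, so they are summarized with a geometric mean rather than an arithmetic mean, which a single large outlier would dominate. A minimal sketch with hypothetical values (the function name `geomean` is illustrative):

```python
import math


def geomean(values):
    """Geometric mean computed as the exponential of the mean logarithm."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-model speedups.
sample = [2.0, 8.0]
print(f"geometric mean:  {geomean(sample):.2f}x")
print(f"arithmetic mean: {sum(sample) / len(sample):.2f}x")
```

For these two values the geometric mean is 4.00x while the arithmetic mean is 5.00x.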

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   18.24    |    21.37    |    28.03    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.89x    |    0.90x    |    0.85x    |
+----------+------------+-------------+-------------+
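
The sketch below shows one way to read these numbers, assuming the ratio is the native PyTorch peak memory divided by the Inductor peak memory, which is consistent with "higher is better". The function name and the byte counts are hypothetical.

```python
def compression_ratio(baseline_peak_bytes, compiled_peak_bytes):
    """Baseline peak memory divided by compiled peak memory; above 1 means less memory used."""
    return baseline_peak_bytes / compiled_peak_bytes


# Hypothetical peaks: the compiled run needs 1000 bytes where the baseline needs 900.
ratio = compression_ratio(900, 1000)
print(f"{ratio:.2f}x")
```

Under this assumption, a ratio of 0.90x means the compiled model's peak memory is about 11% higher than the baseline's (1 / 0.90 ≈ 1.11).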

torchbench suite with amp precision

Performance speedup

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_isoneutral_mixing     |    1    | 52.21147  |
|     pyhpc_equation_of_state     |    1    | 21.430476 |
|              dcgan              |    1    | 9.004245  |
|          squeezenet1_1          |    1    | 8.491741  |
|          lennard_jones          |    1    |  6.33413  |
|          timm_resnest           |    1    | 6.132899  |
|          maml_omniglot          |    5    | 5.992812  |
|         opacus_cifar10          |    1    | 5.796949  |
|      functorch_dp_cifar10       |    1    | 5.768105  |
|            resnet18             |    1    | 5.614826  |
|            resnet50             |    1    | 5.172545  |
|         LearningToPaint         |    1    | 5.023303  |
|         resnext50_32x4d         |    1    | 4.913265  |
|           timm_nfnet            |    1    | 4.863226  |
|              vgg16              |    1    | 4.826957  |
|          mobilenet_v2           |    1    | 4.506866  |
|             alexnet             |    1    | 4.394694  |
|            resnet152            |    1    | 4.284473  |
|           mnasnet1_0            |    1    | 4.196644  |
|           timm_vovnet           |    1    | 4.155871  |
|     nvidia_deeprecommender      |    1    | 4.055256  |
|             yolov3              |    1    | 3.956901  |
|      doctr_reco_predictor       |    1    | 3.955667  |
|              llama              |    1    | 3.639185  |
|       mobilenet_v3_large        |    1    | 3.610434  |
|          basic_gnn_gcn          |    1    |  3.42864  |
|     functorch_maml_omniglot     |    1    | 3.323282  |
|       shufflenet_v2_x1_0        |    1    | 3.304017  |
|           densenet121           |    1    | 3.157848  |
|           timm_regnet           |    1    | 3.049185  |
|         phlippe_resnet          |    1    | 2.867074  |
|              dlrm               |    1    | 2.796146  |
|    detectron2_fcos_r_50_fpn     |    1    | 2.617656  |
|        phlippe_densenet         |    1    | 2.589908  |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  2.43182  |
|               drq               |    1    | 2.385012  |
|        timm_efficientnet        |    1    | 2.351837  |
|          BERT_pytorch           |    1    |  2.20884  |
|          pytorch_unet           |    1    |  2.13076  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 2.088243  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 2.007774  |
|          basic_gnn_gin          |    1    | 1.959307  |
|        soft_actor_critic        |   256   | 1.943638  |
|       Background_Matting        |    1    | 1.830815  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1.830225  |
|     timm_vision_transformer     |    1    | 1.814635  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1.808033  |
|         basic_gnn_sage          |    1    | 1.798004  |
|             hf_Bert             |    1    | 1.721886  |
|        basic_gnn_edgecnn        |    1    | 1.697353  |
|             hf_GPT2             |    1    | 1.683714  |
|          hf_Bert_large          |    1    | 1.646269  |
|          hf_DistilBert          |    1    | 1.637485  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 1.634093  |
|         pytorch_stargan         |   16    |  1.59992  |
|  timm_vision_transformer_large  |    1    | 1.549426  |
|             hf_Bart             |    1    | 1.510761  |
|          hf_GPT2_large          |    1    | 1.476766  |
|           hf_T5_base            |    1    | 1.434884  |
|        hf_distil_whisper        |    1    | 1.322582  |
|           hf_BigBird            |    1    | 1.321779  |
|            moondream            |    1    | 1.280893  |
|       speech_transformer        |    1    | 1.268162  |
|           hf_T5_large           |    1    | 1.210618  |
|            hf_Albert            |    1    | 1.187718  |
|          fastNLP_Bert           |    1    | 1.156668  |
|              hf_T5              |    1    | 1.061785  |
|           hf_Reformer           |    1    | 1.033236  |
|             demucs              |    1    | 1.027782  |
|      torch_multimodal_clip      |    1    | 1.023415  |
|           tts_angular           |    1    | 0.997998  |
|     resnet50_quantized_qat      |    1    |  0.98492  |
|   mobilenet_v2_quantized_qat    |    1    | 0.980183  |
|              maml               |    1    | 0.954222  |
|          hf_Longformer          |    1    | 0.908672  |
|       doctr_det_predictor       |    0    |    0.0    |
|        timm_efficientdet        |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
|              moco               |    0    |    0.0    |
+---------------------------------+---------+-----------+

Accuracy

+---------------------------------+---------+--------------------+
|              name               |   bs    |      inductor      |
+---------------------------------+---------+--------------------+
|       Background_Matting        |    1    |  pass_due_to_skip  |
|              maml               |    1    |  pass_due_to_skip  |
|  timm_vision_transformer_large  |    1    |  pass_due_to_skip  |
|          hf_GPT2_large          |    1    |  pass_due_to_skip  |
|           hf_T5_large           |    1    |  pass_due_to_skip  |
|          basic_gnn_gcn          |    1    |        pass        |
|           hf_T5_base            |    1    |        pass        |
|           densenet121           |    1    |        pass        |
|             demucs              |    1    |        pass        |
|              dcgan              |    1    |        pass        |
|         basic_gnn_sage          |    1    |        pass        |
|          basic_gnn_gin          |    1    |        pass        |
|             alexnet             |    1    |        pass        |
|        basic_gnn_edgecnn        |    1    |        pass        |
|      doctr_reco_predictor       |    1    |        pass        |
|         LearningToPaint         |    1    |        pass        |
|          fastNLP_Bert           |    1    |        pass        |
|        hf_distil_whisper        |    1    |        pass        |
|              llama              |    1    |        pass        |
|              dlrm               |    1    |        pass        |
|    detectron2_fcos_r_50_fpn     |    1    |        pass        |
|               drq               |    1    |        pass        |
|     resnet50_quantized_qat      |    1    |        pass        |
|      functorch_dp_cifar10       |    1    |        pass        |
|             hf_Bert             |    1    |        pass        |
|              hf_T5              |    1    |        pass        |
|           hf_Reformer           |    1    |        pass        |
|          hf_Longformer          |    1    |        pass        |
|             hf_GPT2             |    1    |        pass        |
|          hf_DistilBert          |    1    |        pass        |
|           hf_BigBird            |    1    |        pass        |
|          hf_Bert_large          |    1    |        pass        |
|     functorch_maml_omniglot     |    1    |        pass        |
|             yolov3              |    1    |        pass        |
|             hf_Bart             |    1    |        pass        |
|            hf_Albert            |    1    |        pass        |
|            resnet50             |    1    |        pass        |
|          lennard_jones          |    1    |        pass        |
|         resnext50_32x4d         |    1    |        pass        |
|          BERT_pytorch           |    1    |        pass        |
|           mnasnet1_0            |    1    |        pass        |
|          mobilenet_v2           |    1    |        pass        |
|   mobilenet_v2_quantized_qat    |    1    |        pass        |
|       shufflenet_v2_x1_0        |    1    |        pass        |
|            moondream            |    1    |        pass        |
|     nvidia_deeprecommender      |    1    |        pass        |
|        phlippe_densenet         |    1    |        pass        |
|       mobilenet_v3_large        |    1    |        pass        |
|         phlippe_resnet          |    1    |        pass        |
|     pyhpc_equation_of_state     |    1    |        pass        |
|     pyhpc_isoneutral_mixing     |    1    |        pass        |
| pyhpc_turbulent_kinetic_energy  | 1048576 |        pass        |
|  pytorch_CycleGAN_and_pix2pix   |    1    |        pass        |
|         opacus_cifar10          |    1    |        pass        |
|          maml_omniglot          |    5    |        pass        |
|         pytorch_stargan         |   16    |        pass        |
|            resnet152            |    1    |        pass        |
|          timm_resnest           |    1    |        pass        |
|        soft_actor_critic        |   256   |        pass        |
|          pytorch_unet           |    1    |        pass        |
|          squeezenet1_1          |    1    |        pass        |
|            resnet18             |    1    |        pass        |
|        timm_efficientnet        |    1    |        pass        |
|           timm_regnet           |    1    |        pass        |
|       speech_transformer        |    1    |        pass        |
|     timm_vision_transformer     |    1    |        pass        |
|      torch_multimodal_clip      |    1    |        pass        |
|           tts_angular           |    1    |        pass        |
|              vgg16              |    1    |        pass        |
|           timm_nfnet            |    1    |        pass        |
|           timm_vovnet           |    1    |        pass        |
|              moco               |    0    | model_fail_to_load |
|         DALLE2_pytorch          |    0    | model_fail_to_load |
|        timm_efficientdet        |    0    | model_fail_to_load |
|           Super_SloMo           |    1    |    fail_to_run     |
|         vision_maskrcnn         |    1    |    fail_to_run     |
| detectron2_fasterrcnn_r_101_fpn |    1    |    fail_to_run     |
| detectron2_fasterrcnn_r_50_fpn  |    1    |    fail_to_run     |
| detectron2_fasterrcnn_r_101_c4  |    1    |   fail_accuracy    |
| detectron2_fasterrcnn_r_101_dc5 |    1    |   fail_accuracy    |
|  detectron2_fasterrcnn_r_50_c4  |    1    |   fail_accuracy    |
| detectron2_fasterrcnn_r_50_dc5  |    1    |   fail_accuracy    |
|       doctr_det_predictor       |    0    | eager_fail_to_run  |
+---------------------------------+---------+--------------------+

Compilation latency (sec)

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|           densenet121           |    1    | 70.235776 |
|           hf_BigBird            |    1    | 65.785817 |
|    detectron2_fcos_r_50_fpn     |    1    | 51.092154 |
|              maml               |    1    | 47.733789 |
|           hf_T5_large           |    1    | 46.955904 |
|          hf_Longformer          |    1    | 40.178389 |
|           timm_nfnet            |    1    | 37.099436 |
|            moondream            |    1    | 36.485676 |
|           hf_Reformer           |    1    | 36.148428 |
|      torch_multimodal_clip      |    1    | 33.170065 |
|       speech_transformer        |    1    | 33.153806 |
|        phlippe_densenet         |    1    | 32.684963 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 30.407717 |
|  timm_vision_transformer_large  |    1    | 29.266716 |
|             demucs              |    1    | 28.991771 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 28.526144 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 27.464441 |
|          hf_GPT2_large          |    1    | 27.26044  |
|        timm_efficientnet        |    1    | 25.770806 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 25.561427 |
|        hf_distil_whisper        |    1    | 24.057574 |
|              hf_T5              |    1    | 23.790334 |
|             yolov3              |    1    | 22.745152 |
|         opacus_cifar10          |    1    | 21.625773 |
|              llama              |    1    | 21.043615 |
|      functorch_dp_cifar10       |    1    | 20.481783 |
|          hf_Bert_large          |    1    | 19.915094 |
|          timm_resnest           |    1    | 19.289963 |
|             hf_GPT2             |    1    | 18.958156 |
|       shufflenet_v2_x1_0        |    1    | 18.414967 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 18.198572 |
|       mobilenet_v3_large        |    1    | 17.379051 |
|     timm_vision_transformer     |    1    | 17.378674 |
|          BERT_pytorch           |    1    | 16.756259 |
|          fastNLP_Bert           |    1    | 16.689059 |
|           timm_regnet           |    1    | 16.225257 |
|             hf_Bart             |    1    | 16.075916 |
|           timm_vovnet           |    1    | 16.03089  |
|           hf_T5_base            |    1    | 15.192929 |
|            hf_Albert            |    1    | 14.521671 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 14.307266 |
|             hf_Bert             |    1    | 13.829529 |
|       Background_Matting        |    1    | 12.633867 |
|          squeezenet1_1          |    1    | 12.585252 |
|         pytorch_stargan         |   16    | 12.388548 |
|          hf_DistilBert          |    1    | 11.893906 |
|            resnet152            |    1    | 11.417646 |
|      doctr_reco_predictor       |    1    | 10.838363 |
|              vgg16              |    1    | 9.478564  |
|     pyhpc_isoneutral_mixing     |    1    | 9.101195  |
|               drq               |    1    |  8.68615  |
|          pytorch_unet           |    1    | 8.452536  |
|          basic_gnn_gcn          |    1    | 8.451511  |
|            resnet50             |    1    | 8.179347  |
|         resnext50_32x4d         |    1    | 8.125034  |
|             alexnet             |    1    | 8.040175  |
|     functorch_maml_omniglot     |    1    | 7.713434  |
|              dlrm               |    1    | 7.581949  |
|     nvidia_deeprecommender      |    1    | 7.547717  |
|          maml_omniglot          |    5    | 7.403296  |
|          mobilenet_v2           |    1    | 7.259732  |
|            resnet18             |    1    | 7.216886  |
|         basic_gnn_sage          |    1    | 7.204096  |
|           mnasnet1_0            |    1    | 6.962729  |
|     pyhpc_equation_of_state     |    1    | 6.583431  |
|         LearningToPaint         |    1    | 6.261635  |
|        soft_actor_critic        |   256   | 6.243375  |
|        basic_gnn_edgecnn        |    1    | 6.218377  |
|         phlippe_resnet          |    1    | 5.962756  |
|          basic_gnn_gin          |    1    | 5.482142  |
|          lennard_jones          |    1    | 4.629364  |
|              dcgan              |    1    |  4.56464  |
|           tts_angular           |    1    | 4.441332  |
|   mobilenet_v2_quantized_qat    |    1    |  0.19128  |
|     resnet50_quantized_qat      |    1    | 0.174203  |
|        timm_efficientdet        |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
|       doctr_det_predictor       |    0    |    0.0    |
+---------------------------------+---------+-----------+

Peak memory compression ratio (higher is better)

+---------------------------------+---------+----------+
|              name               |   bs    | inductor |
+---------------------------------+---------+----------+
|              dlrm               |    1    | 0.986924 |
|             demucs              |    1    | 0.98453  |
|           hf_T5_large           |    1    | 0.980712 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 0.975925 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 0.975552 |
|           hf_T5_base            |    1    | 0.971678 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 0.970713 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 0.970165 |
|          pytorch_unet           |    1    | 0.967984 |
|              llama              |    1    | 0.964542 |
|    detectron2_fcos_r_50_fpn     |    1    | 0.963102 |
|        basic_gnn_edgecnn        |    1    | 0.96125  |
|           hf_BigBird            |    1    | 0.95992  |
|         LearningToPaint         |    1    | 0.954421 |
|       Background_Matting        |    1    | 0.95395  |
|     resnet50_quantized_qat      |    1    | 0.945546 |
|          basic_gnn_gcn          |    1    | 0.943883 |
|      doctr_reco_predictor       |    1    | 0.943679 |
|      torch_multimodal_clip      |    1    | 0.941029 |
|         basic_gnn_sage          |    1    | 0.935273 |
|          basic_gnn_gin          |    1    | 0.935199 |
|        hf_distil_whisper        |    1    | 0.929922 |
|       speech_transformer        |    1    | 0.927133 |
|         pytorch_stargan         |   16    | 0.926802 |
|             hf_Bert             |    1    | 0.922364 |
|          fastNLP_Bert           |    1    | 0.92197  |
|            hf_Albert            |    1    | 0.917906 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 0.913007 |
|   mobilenet_v2_quantized_qat    |    1    | 0.912393 |
|          BERT_pytorch           |    1    | 0.911572 |
|          hf_GPT2_large          |    1    | 0.909866 |
|             hf_GPT2             |    1    | 0.907903 |
|              hf_T5              |    1    | 0.905451 |
|          hf_DistilBert          |    1    | 0.902399 |
|          hf_Longformer          |    1    | 0.897193 |
|         opacus_cifar10          |    1    | 0.893549 |
|               drq               |    1    | 0.892308 |
|             hf_Bart             |    1    | 0.89184  |
|        soft_actor_critic        |   256   | 0.888734 |
|           tts_angular           |    1    | 0.885989 |
|          mobilenet_v2           |    1    | 0.872309 |
|  timm_vision_transformer_large  |    1    | 0.871357 |
|        timm_efficientnet        |    1    | 0.871344 |
|          squeezenet1_1          |    1    | 0.868813 |
|           timm_nfnet            |    1    | 0.868277 |
|          maml_omniglot          |    5    | 0.864734 |
|       mobilenet_v3_large        |    1    | 0.864714 |
|            moondream            |    1    | 0.864332 |
|          lennard_jones          |    1    | 0.861775 |
|              dcgan              |    1    | 0.860182 |
|     functorch_maml_omniglot     |    1    | 0.860129 |
|              vgg16              |    1    | 0.858815 |
|          hf_Bert_large          |    1    | 0.858636 |
|           mnasnet1_0            |    1    | 0.858365 |
|             alexnet             |    1    | 0.856134 |
|     timm_vision_transformer     |    1    | 0.855961 |
|          timm_resnest           |    1    | 0.855207 |
|         phlippe_resnet          |    1    | 0.852459 |
|      functorch_dp_cifar10       |    1    | 0.848704 |
|       shufflenet_v2_x1_0        |    1    | 0.843176 |
|     nvidia_deeprecommender      |    1    | 0.837605 |
|     pyhpc_equation_of_state     |    1    | 0.833966 |
|             yolov3              |    1    | 0.83226  |
|           hf_Reformer           |    1    | 0.830001 |
|         resnext50_32x4d         |    1    | 0.827768 |
|            resnet18             |    1    | 0.817341 |
|     pyhpc_isoneutral_mixing     |    1    | 0.809875 |
|           timm_vovnet           |    1    | 0.809299 |
|        phlippe_densenet         |    1    | 0.808305 |
|            resnet50             |    1    | 0.806084 |
|           densenet121           |    1    | 0.804506 |
|              maml               |    1    | 0.795708 |
|           timm_regnet           |    1    | 0.795129 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.762097 |
|            resnet152            |    1    | 0.761934 |
|        timm_efficientdet        |    0    |   0.0    |
|       doctr_det_predictor       |    0    |   0.0    |
|              moco               |    0    |   0.0    |
|         DALLE2_pytorch          |    0    |   0.0    |
+---------------------------------+---------+----------+

Absolute latency (ms)

+---------------------------------+---------+-------------+
|              name               |   bs    |  inductor   |
+---------------------------------+---------+-------------+
|           hf_T5_base            |    1    | 9750.331088 |
|          hf_GPT2_large          |    1    | 3039.458203 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 2930.193461 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 2891.820017 |
|            moondream            |    1    | 2439.476126 |
|           hf_T5_large           |    1    | 2177.256537 |
|        hf_distil_whisper        |    1    | 2002.227984 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1752.412144 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1740.77404  |
|          pytorch_unet           |    1    | 1501.672176 |
|       Background_Matting        |    1    | 1464.672079 |
|             demucs              |    1    | 1195.44165  |
|  timm_vision_transformer_large  |    1    | 957.593714  |
|    detectron2_fcos_r_50_fpn     |    1    | 707.997849  |
|           hf_BigBird            |    1    | 515.151731  |
|          hf_Longformer          |    1    | 489.292811  |
|      torch_multimodal_clip      |    1    | 456.056737  |
|          hf_Bert_large          |    1    | 439.110268  |
|             hf_Bart             |    1    | 241.361873  |
|              hf_T5              |    1    | 235.546363  |
|         pytorch_stargan         |   16    | 234.791809  |
|          fastNLP_Bert           |    1    | 221.837537  |
|            hf_Albert            |    1    | 194.282705  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 179.091418  |
|       speech_transformer        |    1    | 176.525437  |
|             hf_Bert             |    1    | 171.434731  |
|           hf_Reformer           |    1    | 169.383076  |
|             hf_GPT2             |    1    | 130.256294  |
|          hf_DistilBert          |    1    | 102.768019  |
|        basic_gnn_edgecnn        |    1    |  87.930523  |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  87.72501   |
|             yolov3              |    1    |  69.667276  |
|          BERT_pytorch           |    1    |  50.044383  |
|              maml               |    1    |  48.066106  |
|              vgg16              |    1    |  40.762766  |
|     nvidia_deeprecommender      |    1    |  39.63868   |
|           timm_nfnet            |    1    |  35.810936  |
|           timm_regnet           |    1    |  35.497468  |
|           tts_angular           |    1    |  30.250399  |
|            resnet152            |    1    |  28.74294   |
|          basic_gnn_gcn          |    1    |  23.775951  |
|     timm_vision_transformer     |    1    |  18.493781  |
|         basic_gnn_sage          |    1    |   15.989    |
|          basic_gnn_gin          |    1    |  15.239309  |
|           densenet121           |    1    |  12.985518  |
|         resnext50_32x4d         |    1    |  12.887431  |
|           timm_vovnet           |    1    |  12.325185  |
|             alexnet             |    1    |  11.239972  |
|              llama              |    1    |  11.177388  |
|            resnet50             |    1    |  10.658693  |
|        timm_efficientnet        |    1    |  9.314481   |
|     resnet50_quantized_qat      |    1    |  7.595879   |
|          timm_resnest           |    1    |  6.173713   |
|      doctr_reco_predictor       |    1    |  5.292075   |
|   mobilenet_v2_quantized_qat    |    1    |  4.832742   |
|       mobilenet_v3_large        |    1    |  3.818884   |
|            resnet18             |    1    |  3.804434   |
|           mnasnet1_0            |    1    |  3.258779   |
|          mobilenet_v2           |    1    |  3.196078   |
|       shufflenet_v2_x1_0        |    1    |  3.010169   |
|         LearningToPaint         |    1    |  2.386194   |
|          squeezenet1_1          |    1    |  1.913589   |
|        phlippe_densenet         |    1    |  1.782253   |
|      functorch_dp_cifar10       |    1    |  1.758747   |
|         opacus_cifar10          |    1    |  1.718384   |
|               drq               |    1    |  0.950324   |
|         phlippe_resnet          |    1    |  0.645251   |
|        soft_actor_critic        |   256   |  0.578426   |
|              dcgan              |    1    |  0.550734   |
|     functorch_maml_omniglot     |    1    |  0.510122   |
|              dlrm               |    1    |  0.483933   |
|          maml_omniglot          |    5    |  0.258571   |
|     pyhpc_isoneutral_mixing     |    1    |  0.027739   |
|     pyhpc_equation_of_state     |    1    |  0.021813   |
|          lennard_jones          |    1    |  0.017783   |
|        timm_efficientdet        |    0    |     0.0     |
|              moco               |    0    |     0.0     |
|       doctr_det_predictor       |    0    |     0.0     |
|         DALLE2_pytorch          |    0    |     0.0     |
+---------------------------------+---------+-------------+
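
The speedup and absolute latency tables can be combined to estimate the eager baseline for a model. The sketch below assumes that the absolute latency column reports the inductor-compiled latency and that speedup is eager latency divided by inductor latency; neither is stated explicitly next to the tables, so treat the result as an estimate. The function name is hypothetical.

```python
def implied_eager_latency_ms(speedup, inductor_latency_ms):
    """Back out eager latency, assuming speedup = eager latency / inductor latency."""
    return speedup * inductor_latency_ms


# resnet50 row from the torchbench tables in this post.
eager_ms = implied_eager_latency_ms(5.172545, 10.658693)
print(f"{eager_ms:.1f} ms")
```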

huggingface suite with amp precision

Performance speedup (relative to native PyTorch)

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|          MobileBertForMaskedLM          | 1  | 3.436515 |
|     PegasusForConditionalGeneration     | 1  | 2.840305 |
|             XGLMForCausalLM             | 1  | 2.681927 |
|          DistilBertForMaskedLM          | 1  | 2.666023 |
|     MobileBertForQuestionAnswering      | 1  | 2.657266 |
|     M2M100ForConditionalGeneration      | 1  | 2.600208 |
|           PegasusForCausalLM            | 1  | 2.556462 |
|     DistilBertForQuestionAnswering      | 1  | 2.555328 |
|          BlenderbotForCausalLM          | 1  | 2.525111 |
|            YituTechConvBert             | 1  | 2.446823 |
|       MT5ForConditionalGeneration       | 1  | 2.42405  |
|       BlenderbotSmallForCausalLM        | 1  |  2.4068  |
|         Speech2Text2ForCausalLM         | 1  | 2.327706 |
| BlenderbotSmallForConditionalGeneration | 1  | 2.272842 |
|           DebertaForMaskedLM            | 1  | 1.88768  |
|       DebertaForQuestionAnswering       | 1  | 1.882307 |
|            XLNetLMHeadModel             | 1  | 1.862015 |
|                CamemBert                | 1  | 1.805854 |
|           RobertaForCausalLM            | 1  | 1.799586 |
|           LayoutLMForMaskedLM           | 1  | 1.787587 |
|             BertForMaskedLM             | 1  | 1.777414 |
|           ElectraForCausalLM            | 1  | 1.737744 |
|    MegatronBertForQuestionAnswering     | 1  | 1.726351 |
|    LayoutLMForSequenceClassification    | 1  | 1.72136  |
|        BertForQuestionAnswering         | 1  | 1.720346 |
|            TrOCRForCausalLM             | 1  | 1.70121  |
|       RobertaForQuestionAnswering       | 1  | 1.695977 |
|         MegatronBertForCausalLM         | 1  | 1.695892 |
|       ElectraForQuestionAnswering       | 1  | 1.607902 |
|               DistillGPT2               | 1  | 1.548959 |
|          DebertaV2ForMaskedLM           | 1  | 1.484392 |
|      DebertaV2ForQuestionAnswering      | 1  | 1.460787 |
|             OPTForCausalLM              | 1  | 1.458382 |
|      GPT2ForSequenceClassification      | 1  | 1.445689 |
|             BartForCausalLM             | 1  | 1.381504 |
|      BartForConditionalGeneration       | 1  | 1.335668 |
|            PLBartForCausalLM            | 1  | 1.330378 |
|     PLBartForConditionalGeneration      | 1  | 1.325697 |
|      MBartForConditionalGeneration      | 1  |  1.3099  |
|            MBartForCausalLM             | 1  | 1.307618 |
|               GoogleFnet                | 1  | 1.222838 |
|                 T5Small                 | 1  | 1.133812 |
|       T5ForConditionalGeneration        | 1  | 1.13194  |
|       AlbertForQuestionAnswering        | 1  | 1.117005 |
|            AlbertForMaskedLM            | 1  | 1.112303 |
|          AllenaiLongformerBase          | 1  | 0.811957 |
+-----------------------------------------+----+----------+

Accuracy

+-----------------------------------------+----+------------------+
|                  name                   | bs |     inductor     |
+-----------------------------------------+----+------------------+
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip |
|                CamemBert                | 1  |       pass       |
|       AlbertForQuestionAnswering        | 1  |       pass       |
|           DebertaForMaskedLM            | 1  |       pass       |
|          AllenaiLongformerBase          | 1  |       pass       |
|             BartForCausalLM             | 1  |       pass       |
|      BartForConditionalGeneration       | 1  |       pass       |
|             BertForMaskedLM             | 1  |       pass       |
|        BertForQuestionAnswering         | 1  |       pass       |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |
|       DebertaForQuestionAnswering       | 1  |       pass       |
|           LayoutLMForMaskedLM           | 1  |       pass       |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |
|          DistilBertForMaskedLM          | 1  |       pass       |
|     DistilBertForQuestionAnswering      | 1  |       pass       |
|               DistillGPT2               | 1  |       pass       |
|           ElectraForCausalLM            | 1  |       pass       |
|       ElectraForQuestionAnswering       | 1  |       pass       |
|               GoogleFnet                | 1  |       pass       |
|            YituTechConvBert             | 1  |       pass       |
|    LayoutLMForSequenceClassification    | 1  |       pass       |
|            MBartForCausalLM             | 1  |       pass       |
|            XLNetLMHeadModel             | 1  |       pass       |
|             XGLMForCausalLM             | 1  |       pass       |
|            AlbertForMaskedLM            | 1  |       pass       |
|      MBartForConditionalGeneration      | 1  |       pass       |
|       MT5ForConditionalGeneration       | 1  |       pass       |
|         MegatronBertForCausalLM         | 1  |       pass       |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |
|          MobileBertForMaskedLM          | 1  |       pass       |
|     MobileBertForQuestionAnswering      | 1  |       pass       |
|             OPTForCausalLM              | 1  |       pass       |
|            PLBartForCausalLM            | 1  |       pass       |
|     PLBartForConditionalGeneration      | 1  |       pass       |
|           PegasusForCausalLM            | 1  |       pass       |
|     M2M100ForConditionalGeneration      | 1  |       pass       |
|     PegasusForConditionalGeneration     | 1  |       pass       |
|           RobertaForCausalLM            | 1  |       pass       |
|       RobertaForQuestionAnswering       | 1  |       pass       |
|         Speech2Text2ForCausalLM         | 1  |       pass       |
|       T5ForConditionalGeneration        | 1  |       pass       |
|                 T5Small                 | 1  |       pass       |
|            TrOCRForCausalLM             | 1  |       pass       |
|      GPT2ForSequenceClassification      | 1  |  fail_accuracy   |
+-----------------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------------+----+-----------+
|                  name                   | bs | inductor  |
+-----------------------------------------+----+-----------+
|          AllenaiLongformerBase          | 1  | 47.973504 |
|          MobileBertForMaskedLM          | 1  | 36.281786 |
|     MobileBertForQuestionAnswering      | 1  | 36.238978 |
|     PegasusForConditionalGeneration     | 1  | 33.832998 |
|     M2M100ForConditionalGeneration      | 1  | 32.712143 |
|      MBartForConditionalGeneration      | 1  | 31.81215  |
|          BlenderbotForCausalLM          | 1  | 31.194935 |
|             XGLMForCausalLM             | 1  | 28.492111 |
|       MT5ForConditionalGeneration       | 1  | 27.961401 |
|            XLNetLMHeadModel             | 1  | 27.162695 |
|         MegatronBertForCausalLM         | 1  | 25.534855 |
|       T5ForConditionalGeneration        | 1  | 24.968191 |
|                 T5Small                 | 1  | 24.940004 |
|    MegatronBertForQuestionAnswering     | 1  | 24.586127 |
|          DebertaV2ForMaskedLM           | 1  | 23.947924 |
|      DebertaV2ForQuestionAnswering      | 1  | 23.870234 |
| BlenderbotSmallForConditionalGeneration | 1  | 22.46209  |
|           PegasusForCausalLM            | 1  | 21.855522 |
|      BartForConditionalGeneration       | 1  | 21.493054 |
|            YituTechConvBert             | 1  | 20.987733 |
|     PLBartForConditionalGeneration      | 1  | 20.346349 |
|            MBartForCausalLM             | 1  | 20.040824 |
|      GPT2ForSequenceClassification      | 1  | 19.641705 |
|             OPTForCausalLM              | 1  | 19.314844 |
|               DistillGPT2               | 1  | 17.292284 |
|            AlbertForMaskedLM            | 1  | 16.639939 |
|       DebertaForQuestionAnswering       | 1  | 16.546747 |
|       AlbertForQuestionAnswering        | 1  | 16.502185 |
|            TrOCRForCausalLM             | 1  | 16.430379 |
|           DebertaForMaskedLM            | 1  | 16.364093 |
|       RobertaForQuestionAnswering       | 1  | 16.028968 |
|           RobertaForCausalLM            | 1  | 15.982883 |
|                CamemBert                | 1  | 15.956364 |
|    LayoutLMForSequenceClassification    | 1  | 15.911589 |
|           LayoutLMForMaskedLM           | 1  | 15.199003 |
|       ElectraForQuestionAnswering       | 1  | 15.19551  |
|        BertForQuestionAnswering         | 1  | 15.175627 |
|           ElectraForCausalLM            | 1  | 15.144625 |
|             BertForMaskedLM             | 1  | 15.09386  |
|       BlenderbotSmallForCausalLM        | 1  | 14.097025 |
|         Speech2Text2ForCausalLM         | 1  | 13.983223 |
|               GoogleFnet                | 1  | 13.539321 |
|     DistilBertForQuestionAnswering      | 1  | 13.302558 |
|          DistilBertForMaskedLM          | 1  | 13.233906 |
|            PLBartForCausalLM            | 1  | 13.201669 |
|             BartForCausalLM             | 1  | 12.683316 |
+-----------------------------------------+----+-----------+

Peak Memory Compression Ratio

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|             OPTForCausalLM              | 1  | 0.976243 |
|            XLNetLMHeadModel             | 1  | 0.972265 |
|            MBartForCausalLM             | 1  | 0.961037 |
|               DistillGPT2               | 1  | 0.954621 |
|      GPT2ForSequenceClassification      | 1  | 0.945307 |
|            PLBartForCausalLM            | 1  | 0.944243 |
|       MT5ForConditionalGeneration       | 1  | 0.938419 |
|           LayoutLMForMaskedLM           | 1  | 0.937822 |
|       RobertaForQuestionAnswering       | 1  | 0.937729 |
|     PLBartForConditionalGeneration      | 1  | 0.936558 |
|      DebertaV2ForQuestionAnswering      | 1  | 0.935084 |
|                CamemBert                | 1  | 0.93494  |
|          BlenderbotForCausalLM          | 1  | 0.933294 |
|         MegatronBertForCausalLM         | 1  | 0.932078 |
|            YituTechConvBert             | 1  | 0.931276 |
|             BertForMaskedLM             | 1  | 0.931248 |
|    MegatronBertForQuestionAnswering     | 1  | 0.93068  |
|           DebertaForMaskedLM            | 1  | 0.930294 |
|       T5ForConditionalGeneration        | 1  | 0.928257 |
|            TrOCRForCausalLM             | 1  | 0.927685 |
|                 T5Small                 | 1  | 0.92572  |
|             BartForCausalLM             | 1  | 0.925575 |
|    LayoutLMForSequenceClassification    | 1  | 0.923455 |
|     M2M100ForConditionalGeneration      | 1  | 0.923216 |
|           RobertaForCausalLM            | 1  | 0.923195 |
|      BartForConditionalGeneration       | 1  | 0.921292 |
|       BlenderbotSmallForCausalLM        | 1  | 0.91951  |
|        BertForQuestionAnswering         | 1  | 0.918576 |
|             XGLMForCausalLM             | 1  | 0.918301 |
|          DistilBertForMaskedLM          | 1  | 0.910291 |
|           PegasusForCausalLM            | 1  | 0.909245 |
|          AllenaiLongformerBase          | 1  | 0.903952 |
|          DebertaV2ForMaskedLM           | 1  | 0.901869 |
|      MBartForConditionalGeneration      | 1  | 0.899607 |
|           ElectraForCausalLM            | 1  | 0.894904 |
|       DebertaForQuestionAnswering       | 1  | 0.886463 |
|     PegasusForConditionalGeneration     | 1  | 0.885505 |
|       ElectraForQuestionAnswering       | 1  | 0.876477 |
|     DistilBertForQuestionAnswering      | 1  | 0.875235 |
|         Speech2Text2ForCausalLM         | 1  | 0.864074 |
|               GoogleFnet                | 1  | 0.854385 |
| BlenderbotSmallForConditionalGeneration | 1  | 0.850826 |
|          MobileBertForMaskedLM          | 1  | 0.805429 |
|     MobileBertForQuestionAnswering      | 1  | 0.765237 |
|            AlbertForMaskedLM            | 1  | 0.688849 |
|       AlbertForQuestionAnswering        | 1  | 0.66783  |
+-----------------------------------------+----+----------+

Absolute latency (ms)

+-----------------------------------------+----+-------------+
|                  name                   | bs |  inductor   |
+-----------------------------------------+----+-------------+
|            AlbertForMaskedLM            | 1  | 4193.929411 |
|       AlbertForQuestionAnswering        | 1  | 4152.876601 |
|             OPTForCausalLM              | 1  | 2124.33428  |
|      MBartForConditionalGeneration      | 1  | 1777.626585 |
|      BartForConditionalGeneration       | 1  | 1407.088856 |
|          DebertaV2ForMaskedLM           | 1  | 1345.027874 |
|          AllenaiLongformerBase          | 1  | 1098.062534 |
|            XLNetLMHeadModel             | 1  | 1045.055817 |
|      DebertaV2ForQuestionAnswering      | 1  | 1032.313729 |
|            MBartForCausalLM             | 1  | 961.427521  |
|       T5ForConditionalGeneration        | 1  | 807.744501  |
|                 T5Small                 | 1  | 807.269585  |
|          BlenderbotForCausalLM          | 1  | 753.483181  |
|     PLBartForConditionalGeneration      | 1  | 648.174555  |
|             BartForCausalLM             | 1  | 612.032841  |
|         MegatronBertForCausalLM         | 1  | 506.403705  |
|      GPT2ForSequenceClassification      | 1  | 467.478661  |
|    MegatronBertForQuestionAnswering     | 1  | 462.678551  |
|               GoogleFnet                | 1  | 449.456228  |
|            PLBartForCausalLM            | 1  | 371.941005  |
|             XGLMForCausalLM             | 1  | 241.136729  |
|           DebertaForMaskedLM            | 1  | 213.480609  |
|     M2M100ForConditionalGeneration      | 1  | 208.854754  |
|           RobertaForCausalLM            | 1  | 205.208656  |
|            YituTechConvBert             | 1  | 195.712442  |
|           LayoutLMForMaskedLM           | 1  | 180.544672  |
|             BertForMaskedLM             | 1  | 180.090917  |
|                CamemBert                | 1  | 179.099088  |
|     PegasusForConditionalGeneration     | 1  |  178.67339  |
|            TrOCRForCausalLM             | 1  | 173.221835  |
|               DistillGPT2               | 1  | 165.393271  |
|       DebertaForQuestionAnswering       | 1  | 146.445316  |
|        BertForQuestionAnswering         | 1  | 140.799063  |
|    LayoutLMForSequenceClassification    | 1  | 140.563132  |
|       RobertaForQuestionAnswering       | 1  |  140.11975  |
|       MT5ForConditionalGeneration       | 1  | 110.072987  |
|           PegasusForCausalLM            | 1  |  87.319676  |
| BlenderbotSmallForConditionalGeneration | 1  |  52.680216  |
|           ElectraForCausalLM            | 1  |  51.263437  |
|          DistilBertForMaskedLM          | 1  |  29.984935  |
|          MobileBertForMaskedLM          | 1  |  29.30759   |
|       ElectraForQuestionAnswering       | 1  |  29.151044  |
|       BlenderbotSmallForCausalLM        | 1  |  28.519855  |
|     DistilBertForQuestionAnswering      | 1  |  18.064591  |
|     MobileBertForQuestionAnswering      | 1  |  16.809535  |
|         Speech2Text2ForCausalLM         | 1  |  5.483146   |
+-----------------------------------------+----+-------------+

timm_models suite with amp precision

Performance speedup

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|           resnest101e           | 1  | 5.709241 |
|          pnasnet5large          | 1  | 5.337391 |
|        ese_vovnet19b_dw         | 1  | 5.15619  |
|          cspdarknet53           | 1  | 4.733841 |
|         mobilenetv2_100         | 1  | 4.613892 |
|     swsl_resnext101_32x16d      | 1  | 4.526103 |
|          botnet26t_256          | 1  | 4.481049 |
|           fbnetc_100            | 1  | 4.429151 |
|           res2next50            | 1  | 4.417011 |
|             dla102              | 1  | 4.41398  |
|          inception_v3           | 1  | 4.410946 |
|       gluon_inception_v3        | 1  | 4.326604 |
|           mnasnet_100           | 1  | 4.278183 |
|        adv_inception_v3         | 1  | 4.242909 |
|          spnasnet_100           | 1  | 4.228944 |
|           selecsls42b           | 1  | 4.132823 |
|            gernet_l             | 1  | 4.05601  |
|           dm_nfnet_f0           | 1  | 3.955777 |
|        res2net50_14w_8s         | 1  | 3.874806 |
|            nfnet_l0             | 1  | 3.82268  |
|      mobilenetv3_large_100      | 1  | 3.791289 |
|        res2net101_26w_4s        | 1  | 3.77059  |
|            fbnetv3_b            | 1  | 3.757623 |
|            repvgg_a2            | 1  | 3.58549  |
|       eca_botnext26ts_256       | 1  | 3.418444 |
|           regnety_002           | 1  | 3.394093 |
|            hrnet_w18            | 1  | 3.343315 |
|            lcnet_050            | 1  | 3.268827 |
|          ghostnet_100           | 1  | 3.263654 |
|         visformer_small         | 1  | 3.059473 |
|         poolformer_m36          | 1  | 2.889601 |
|             dpn107              | 1  | 2.85251  |
|        eca_halonext26ts         | 1  | 2.847091 |
|           mobilevit_s           | 1  | 2.719679 |
|           rexnet_100            | 1  | 2.629021 |
|            tinynet_a            | 1  | 2.479511 |
|       tf_efficientnet_b0        | 1  | 2.284934 |
|            levit_128            | 1  | 2.253566 |
|           tf_mixnet_l           | 1  | 2.205349 |
|            mixnet_l             | 1  | 2.104177 |
|        twins_pcpvt_base         | 1  | 2.028606 |
|          gmixer_24_224          | 1  | 1.858376 |
|           volo_d1_224           | 1  | 1.838095 |
|        convmixer_768_32         | 1  | 1.778558 |
|          gmlp_s16_224           | 1  | 1.738847 |
|            pit_b_224            | 1  | 1.728365 |
|      beit_base_patch16_224      | 1  | 1.707935 |
|  swin_base_patch4_window7_224   | 1  | 1.617193 |
|      vit_base_patch16_224       | 1  | 1.60575  |
|      xcit_large_24_p8_224       | 1  | 1.551275 |
|          convnext_base          | 1  | 1.532978 |
|           convit_base           | 1  | 1.488587 |
|          cait_m36_384           | 1  | 1.454962 |
|         crossvit_9_240          | 1  | 1.440222 |
|        tnt_s_patch16_224        | 1  | 1.370228 |
| deit_base_distilled_patch16_224 | 1  | 1.369526 |
|        sebotnet33ts_256         | 1  | 1.362305 |
|          jx_nest_base           | 1  | 1.323015 |
|          resmlp_12_224          | 1  | 1.173804 |
|          mixer_b16_224          | 1  | 1.154118 |
+---------------------------------+----+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 1  |     pass      |
|           mnasnet_100           | 1  |     pass      |
|          mixer_b16_224          | 1  |     pass      |
|          botnet26t_256          | 1  |     pass      |
|          cait_m36_384           | 1  |     pass      |
|           convit_base           | 1  |     pass      |
|        convmixer_768_32         | 1  |     pass      |
|          convnext_base          | 1  |     pass      |
|         crossvit_9_240          | 1  |     pass      |
|          cspdarknet53           | 1  |     pass      |
| deit_base_distilled_patch16_224 | 1  |     pass      |
|             dla102              | 1  |     pass      |
|           dm_nfnet_f0           | 1  |     pass      |
|             dpn107              | 1  |     pass      |
|       eca_botnext26ts_256       | 1  |     pass      |
|        eca_halonext26ts         | 1  |     pass      |
|        ese_vovnet19b_dw         | 1  |     pass      |
|           fbnetc_100            | 1  |     pass      |
|            fbnetv3_b            | 1  |     pass      |
|            gernet_l             | 1  |     pass      |
|          ghostnet_100           | 1  |     pass      |
|       gluon_inception_v3        | 1  |     pass      |
|          gmixer_24_224          | 1  |     pass      |
|          gmlp_s16_224           | 1  |     pass      |
|            hrnet_w18            | 1  |     pass      |
|          inception_v3           | 1  |     pass      |
|          jx_nest_base           | 1  |     pass      |
|            levit_128            | 1  |     pass      |
|      xcit_large_24_p8_224       | 1  |     pass      |
|      beit_base_patch16_224      | 1  |     pass      |
|            mixnet_l             | 1  |     pass      |
|         mobilenetv2_100         | 1  |     pass      |
|           volo_d1_224           | 1  |     pass      |
|      mobilenetv3_large_100      | 1  |     pass      |
|           mobilevit_s           | 1  |     pass      |
|            nfnet_l0             | 1  |     pass      |
|            pit_b_224            | 1  |     pass      |
|          pnasnet5large          | 1  |     pass      |
|         poolformer_m36          | 1  |     pass      |
|           regnety_002           | 1  |     pass      |
|            repvgg_a2            | 1  |     pass      |
|        res2net101_26w_4s        | 1  |     pass      |
|        res2net50_14w_8s         | 1  |     pass      |
|           res2next50            | 1  |     pass      |
|          resmlp_12_224          | 1  |     pass      |
|           resnest101e           | 1  |     pass      |
|           rexnet_100            | 1  |     pass      |
|        sebotnet33ts_256         | 1  |     pass      |
|           selecsls42b           | 1  |     pass      |
|          spnasnet_100           | 1  |     pass      |
|  swin_base_patch4_window7_224   | 1  |     pass      |
|     swsl_resnext101_32x16d      | 1  |     pass      |
|       tf_efficientnet_b0        | 1  |     pass      |
|           tf_mixnet_l           | 1  |     pass      |
|            tinynet_a            | 1  |     pass      |
|        tnt_s_patch16_224        | 1  |     pass      |
|        twins_pcpvt_base         | 1  |     pass      |
|         visformer_small         | 1  |     pass      |
|      vit_base_patch16_224       | 1  |     pass      |
|         coat_lite_mini          | 1  |  fail_to_run  |
|            lcnet_050            | 1  | fail_accuracy |
+---------------------------------+----+---------------+

Compilation latency (sec)

+---------------------------------+----+-----------+
|              name               | bs | inductor  |
+---------------------------------+----+-----------+
|          pnasnet5large          | 1  | 62.569584 |
|  swin_base_patch4_window7_224   | 1  | 61.95833  |
|           tf_mixnet_l           | 1  | 53.453168 |
|             dpn107              | 1  | 49.607847 |
|          jx_nest_base           | 1  | 44.675452 |
|           rexnet_100            | 1  | 43.60511  |
|        res2net50_14w_8s         | 1  | 43.391855 |
|         crossvit_9_240          | 1  | 41.955316 |
|          ghostnet_100           | 1  | 40.519681 |
|            mixnet_l             | 1  | 39.314701 |
|        sebotnet33ts_256         | 1  | 38.991523 |
|            levit_128            | 1  | 38.813432 |
|      xcit_large_24_p8_224       | 1  | 38.664216 |
|         poolformer_m36          | 1  | 38.658842 |
|        tnt_s_patch16_224        | 1  | 38.131312 |
|        twins_pcpvt_base         | 1  | 37.890868 |
|        eca_halonext26ts         | 1  | 37.235764 |
|           dm_nfnet_f0           | 1  | 37.233926 |
|           mobilevit_s           | 1  | 36.475158 |
|          cait_m36_384           | 1  | 35.227728 |
|         visformer_small         | 1  | 34.667152 |
|           volo_d1_224           | 1  | 32.377321 |
|           resnest101e           | 1  | 31.951529 |
|       eca_botnext26ts_256       | 1  | 31.507238 |
|       tf_efficientnet_b0        | 1  | 31.437735 |
|            hrnet_w18            | 1  | 31.161566 |
|        res2net101_26w_4s        | 1  | 29.86272  |
|            nfnet_l0             | 1  | 28.898844 |
|          convnext_base          | 1  | 28.342284 |
|          inception_v3           | 1  | 27.227652 |
|       gluon_inception_v3        | 1  | 27.197142 |
|        adv_inception_v3         | 1  | 27.196206 |
|            tinynet_a            | 1  | 26.497222 |
|           res2next50            | 1  | 25.852321 |
|           convit_base           | 1  | 25.822417 |
|            pit_b_224            | 1  | 23.392621 |
|          botnet26t_256          | 1  | 22.717367 |
|             dla102              | 1  | 20.863533 |
|          cspdarknet53           | 1  | 20.861185 |
|            fbnetv3_b            | 1  | 20.61761  |
| deit_base_distilled_patch16_224 | 1  | 19.481875 |
|          gmixer_24_224          | 1  | 18.308356 |
|      vit_base_patch16_224       | 1  | 17.609031 |
|      mobilenetv3_large_100      | 1  | 17.583993 |
|        ese_vovnet19b_dw         | 1  | 17.514963 |
|          gmlp_s16_224           | 1  | 16.872382 |
|           regnety_002           | 1  | 14.196752 |
|          resmlp_12_224          | 1  | 13.867503 |
|      beit_base_patch16_224      | 1  | 13.768181 |
|            repvgg_a2            | 1  | 12.860154 |
|          mixer_b16_224          | 1  | 12.453118 |
|        convmixer_768_32         | 1  | 12.415123 |
|           selecsls42b           | 1  | 12.074957 |
|            lcnet_050            | 1  | 10.377578 |
|     swsl_resnext101_32x16d      | 1  | 9.479505  |
|           fbnetc_100            | 1  | 7.803889  |
|          spnasnet_100           | 1  | 7.799702  |
|            gernet_l             | 1  | 7.725422  |
|         mobilenetv2_100         | 1  | 7.661911  |
|           mnasnet_100           | 1  | 7.410464  |
+---------------------------------+----+-----------+

Peak Memory Compression Ratio

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|            nfnet_l0             | 1  | 0.911884 |
|        convmixer_768_32         | 1  | 0.90884  |
|          convnext_base          | 1  | 0.906196 |
|          jx_nest_base           | 1  | 0.899864 |
|          resmlp_12_224          | 1  | 0.89669  |
|      beit_base_patch16_224      | 1  | 0.895802 |
|          cait_m36_384           | 1  | 0.895407 |
|           dm_nfnet_f0           | 1  | 0.894748 |
|          pnasnet5large          | 1  | 0.892135 |
|      xcit_large_24_p8_224       | 1  | 0.891259 |
|        ese_vovnet19b_dw         | 1  | 0.887857 |
|         mobilenetv2_100         | 1  | 0.88318  |
|           convit_base           | 1  | 0.883169 |
|           volo_d1_224           | 1  | 0.882438 |
|  swin_base_patch4_window7_224   | 1  | 0.882172 |
|         poolformer_m36          | 1  | 0.879996 |
|         visformer_small         | 1  | 0.878023 |
|      vit_base_patch16_224       | 1  | 0.875473 |
|          mixer_b16_224          | 1  | 0.874602 |
| deit_base_distilled_patch16_224 | 1  | 0.874546 |
|      mobilenetv3_large_100      | 1  | 0.873448 |
|           mnasnet_100           | 1  | 0.872357 |
|            pit_b_224            | 1  | 0.871918 |
|          spnasnet_100           | 1  | 0.870367 |
|          gmlp_s16_224           | 1  | 0.87003  |
|           fbnetc_100            | 1  | 0.868933 |
|          gmixer_24_224          | 1  | 0.868503 |
|            lcnet_050            | 1  | 0.86834  |
|        twins_pcpvt_base         | 1  | 0.868172 |
|            tinynet_a            | 1  | 0.866224 |
|       eca_botnext26ts_256       | 1  |  0.8657  |
|            fbnetv3_b            | 1  | 0.865232 |
|       tf_efficientnet_b0        | 1  | 0.865147 |
|        eca_halonext26ts         | 1  | 0.864087 |
|           mobilevit_s           | 1  | 0.863516 |
|          botnet26t_256          | 1  | 0.858262 |
|           rexnet_100            | 1  | 0.856607 |
|           regnety_002           | 1  | 0.847958 |
|          ghostnet_100           | 1  | 0.845712 |
|            mixnet_l             | 1  | 0.840814 |
|        tnt_s_patch16_224        | 1  | 0.839949 |
|        sebotnet33ts_256         | 1  | 0.834522 |
|           tf_mixnet_l           | 1  | 0.833444 |
|            levit_128            | 1  | 0.831283 |
|         crossvit_9_240          | 1  | 0.82902  |
|             dpn107              | 1  | 0.82603  |
|          cspdarknet53           | 1  | 0.825068 |
|           res2next50            | 1  | 0.823421 |
|             dla102              | 1  | 0.811283 |
|       gluon_inception_v3        | 1  | 0.810024 |
|        res2net50_14w_8s         | 1  | 0.809905 |
|            hrnet_w18            | 1  | 0.809646 |
|          inception_v3           | 1  | 0.809171 |
|        adv_inception_v3         | 1  | 0.808561 |
|           resnest101e           | 1  | 0.802587 |
|           selecsls42b           | 1  | 0.796756 |
|            gernet_l             | 1  | 0.786032 |
|            repvgg_a2            | 1  | 0.775278 |
|        res2net101_26w_4s        | 1  | 0.775174 |
|     swsl_resnext101_32x16d      | 1  | 0.736322 |
+---------------------------------+----+----------+

Absolute latency (ms)

+---------------------------------+----+-------------+
|              name               | bs |  inductor   |
+---------------------------------+----+-------------+
|          cait_m36_384           | 1  | 1575.603874 |
|      xcit_large_24_p8_224       | 1  | 451.328449  |
|          pnasnet5large          | 1  | 124.609614  |
|          convnext_base          | 1  |  105.3706   |
|        convmixer_768_32         | 1  | 100.016842  |
|          jx_nest_base           | 1  |  100.01054  |
|           convit_base           | 1  |  97.067653  |
|     swsl_resnext101_32x16d      | 1  |  96.089212  |
| deit_base_distilled_patch16_224 | 1  |  85.236736  |
|  swin_base_patch4_window7_224   | 1  |  82.919484  |
|      beit_base_patch16_224      | 1  |  76.562672  |
|          mixer_b16_224          | 1  |  73.44318   |
|      vit_base_patch16_224       | 1  |  72.170246  |
|             dpn107              | 1  |  70.96993   |
|            pit_b_224            | 1  |  65.540953  |
|         poolformer_m36          | 1  |  52.264923  |
|           dm_nfnet_f0           | 1  |  50.394491  |
|        tnt_s_patch16_224        | 1  |  50.09593   |
|        twins_pcpvt_base         | 1  |  43.344436  |
|        sebotnet33ts_256         | 1  |  41.512969  |
|           volo_d1_224           | 1  |  40.03856   |
|            nfnet_l0             | 1  |  33.381466  |
|           resnest101e           | 1  |  31.376769  |
|        res2net101_26w_4s        | 1  |  27.966217  |
|          gmlp_s16_224           | 1  |  27.561983  |
|          gmixer_24_224          | 1  |  24.94233   |
|          resmlp_12_224          | 1  |  22.583616  |
|            hrnet_w18            | 1  |  22.060671  |
|           mobilevit_s           | 1  |  21.933227  |
|         visformer_small         | 1  |  21.390631  |
|       gluon_inception_v3        | 1  |  20.774831  |
|        adv_inception_v3         | 1  |  20.748854  |
|          inception_v3           | 1  |  20.747249  |
|             dla102              | 1  |  18.764068  |
|        eca_halonext26ts         | 1  |  18.295516  |
|           tf_mixnet_l           | 1  |  17.282968  |
|          cspdarknet53           | 1  |  17.132876  |
|        res2net50_14w_8s         | 1  |  16.933499  |
|            mixnet_l             | 1  |  16.864413  |
|       eca_botnext26ts_256       | 1  |  14.829464  |
|         crossvit_9_240          | 1  |  14.782979  |
|           res2next50            | 1  |  14.763567  |
|            repvgg_a2            | 1  |  12.730216  |
|            gernet_l             | 1  |  11.778047  |
|          botnet26t_256          | 1  |  10.877219  |
|           selecsls42b           | 1  |  10.706887  |
|       tf_efficientnet_b0        | 1  |  9.813231   |
|        ese_vovnet19b_dw         | 1  |  9.226273   |
|           rexnet_100            | 1  |   8.52196   |
|            fbnetv3_b            | 1  |  8.373898   |
|            tinynet_a            | 1  |  8.225387   |
|            levit_128            | 1  |  5.926393   |
|          ghostnet_100           | 1  |  5.260271   |
|           fbnetc_100            | 1  |  3.914979   |
|      mobilenetv3_large_100      | 1  |  3.882107   |
|          spnasnet_100           | 1  |  3.625048   |
|           mnasnet_100           | 1  |  3.252848   |
|         mobilenetv2_100         | 1  |  3.205327   |
|           regnety_002           | 1  |  3.146904   |
|            lcnet_050            | 1  |  1.492793   |
+---------------------------------+----+-------------+
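The speedup and absolute-latency tables can be combined to estimate the eager-mode latency that each speedup was normalized against. The sketch below assumes that speedup is defined as eager latency divided by inductor latency and that both tables come from the same run; the helper name `implied_eager_latency_ms` is hypothetical and not part of the benchmark scripts.

```python
def implied_eager_latency_ms(inductor_ms: float, speedup: float) -> float:
    """Estimate eager latency, assuming speedup = eager_ms / inductor_ms."""
    return inductor_ms * speedup


# resnest101e values copied from the timm_models tables above
inductor_ms = 31.376769   # Absolute latency (ms)
speedup = 5.709241        # Performance speedup

eager_ms = implied_eager_latency_ms(inductor_ms, speedup)
print(f"implied eager latency: {eager_ms:.1f} ms")
```

Under that assumption, resnest101e would take roughly 179 ms per iteration in eager mode.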

@WeizhuoZhang-intel
Contributor

[amp] Performance Dashboard for amp precision -- Single-core Single-thread (2024-05-05 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8488C. Each experiment runs one iteration of the forward and backward passes for training, and of the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

SW information:

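+-------------------+--------+------------------------------------------+
|        SW         | Branch |                  Commit                  |
+-------------------+--------+------------------------------------------+
|      Pytorch      |  main  | 6d30803d64953955df63da56833bf4eb52249aae |
|    Torchbench     |  main  |                 d6015d42                 |
|    torchaudio     |  main  |             2.2.0a0+ea437b3              |
|     torchtext     |  main  |             0.16.0a0+b0ebddc             |
|    torchvision    |  main  |             0.19.0a0+06ad737             |
|     torchdata     |  main  |             0.7.1a0+0790338              |
| dynamo_benchmarks |  main  |                 nightly                  |
+-------------------+--------+------------------------------------------+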

HW information

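+------------------+----------------------------------------------------------------+
|       Item       |                             Value                              |
+------------------+----------------------------------------------------------------+
|   Manufacturer   |                           Amazon EC2                           |
|   Product Name   |                         c7i.metal-24xl                         |
|    CPU Model     |         Intel(R) Xeon(R) Platinum 8488C CPU @ 2.40GHz          |
| Installed Memory |           192GB (8x24GB DDR5 4800 MT/s [4800 MT/s])            |
|        OS        |                       Ubuntu 22.04.3 LTS                       |
|      Kernel      |                         6.2.0-1017-aws                         |
|    Microcode     |                           0x2b0004d0                           |
|       GCC        |           gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0            |
|      GLIBC       |            ldd (Ubuntu GLIBC 2.35-0ubuntu3.4) 2.35             |
|     Binutils     |             GNU ld (GNU Binutils for Ubuntu) 2.38              |
|      Python      |                         Python 3.8.18                          |
|     OpenSSL      | OpenSSL 3.2.0 23 Nov 2023 (Library: OpenSSL 3.2.0 23 Nov 2023) |
+------------------+----------------------------------------------------------------+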

Test command

export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
export OMP_NUM_THREADS=1

python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--core_list 0 --ncores_per_instance 1" --devices=cpu --dtypes=float32 --inference --compilers=inductor --batch_size=1 --threads 1 --extra-args="--timeout 9000"

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 90%, 71/79 | 98%, 45/46  | 98%, 59/60  |
+----------+------------+-------------+-------------+
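Each Passrate cell pairs a percentage with the raw pass count. A minimal sketch of that formatting, assuming the percentage is rounded to the nearest integer; the helper name `format_passrate` is hypothetical and is not taken from the benchmark runner.

```python
def format_passrate(passed: int, total: int) -> str:
    """Render a pass rate as '<percent>%, <passed>/<total>'."""
    percent = round(100 * passed / total)  # assumption: nearest-integer rounding
    return f"{percent}%, {passed}/{total}"


# Pass counts copied from the Passrate table above
for suite, (passed, total) in {
    "torchbench": (71, 79),
    "huggingface": (45, 46),
    "timm_models": (59, 60),
}.items():
    print(suite, format_passrate(passed, total))
```

With this rounding rule, the three suites reproduce the cells shown in the table.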

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   2.57x    |    1.77x    |    2.70x    |
+----------+------------+-------------+-------------+
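The headline figure is a geometric mean of per-model speedups. The sketch below shows that aggregation on a small, illustrative subset of the torchbench rows in this comment, so it will not reproduce the 2.57x summary value. It assumes that entries reported as 0.0 (models that did not run) are excluded before averaging; the function name `geomean_speedup` is hypothetical.

```python
from statistics import geometric_mean

# Illustrative subset of per-model speedups from the torchbench table in this comment
speedups = {
    "resnet50": 5.348359,
    "hf_Bert": 1.716659,
    "hf_T5": 1.060454,
    "hf_Longformer": 0.909829,
    "moco": 0.0,  # reported as 0.0 because the model did not produce a result
}


def geomean_speedup(values):
    """Geometric mean of the positive speedup ratios."""
    # Assumption: 0.0 placeholders are dropped rather than averaged in.
    positive = [v for v in values if v > 0]
    return geometric_mean(positive)


subset_geomean = geomean_speedup(speedups.values())
print(f"{subset_geomean:.2f}x")
```

A geometric mean is the usual choice for ratios because a 2x speedup and a 0.5x slowdown cancel out to 1x, which an arithmetic mean would not capture.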

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   18.28    |    21.32    |    27.99    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.89x    |    0.90x    |    0.85x    |
+----------+------------+-------------+-------------+
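As a rough guide to reading these ratios, the sketch below assumes the compression ratio is the eager-mode peak memory divided by the compiled peak memory, which is consistent with "higher is better". The peak values in the first example are hypothetical, and both helper names are made up for illustration.

```python
def compression_ratio(eager_peak_mb: float, compiled_peak_mb: float) -> float:
    """Assumed definition: eager peak memory divided by compiled peak memory."""
    return eager_peak_mb / compiled_peak_mb


def extra_memory_percent(ratio: float) -> float:
    """Extra peak memory used by the compiled model relative to eager, in percent."""
    return (1.0 / ratio - 1.0) * 100.0


# Hypothetical peaks: eager uses 900 MB, the compiled model uses 1000 MB
print(compression_ratio(900.0, 1000.0))

# Summary ratio for timm_models from the table above
print(f"{extra_memory_percent(0.85):.1f}% more peak memory than eager")
```

Under this definition a ratio below 1.0 means the compiled model peaks higher than eager, so 0.85x corresponds to roughly 18% more peak memory.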

torchbench suite with amp precision

Performance speedup

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_isoneutral_mixing     |    1    | 51.340457 |
|     pyhpc_equation_of_state     |    1    | 21.220869 |
|              dcgan              |    1    | 9.107029  |
|          squeezenet1_1          |    1    | 8.167229  |
|          lennard_jones          |    1    | 6.196331  |
|          timm_resnest           |    1    |  6.01688  |
|         opacus_cifar10          |    1    | 5.837667  |
|            resnet18             |    1    | 5.804076  |
|      functorch_dp_cifar10       |    1    | 5.713329  |
|            resnet50             |    1    | 5.348359  |
|         LearningToPaint         |    1    | 4.863662  |
|           timm_nfnet            |    1    | 4.814404  |
|         resnext50_32x4d         |    1    | 4.773161  |
|              vgg16              |    1    | 4.657874  |
|          mobilenet_v2           |    1    | 4.497343  |
|            resnet152            |    1    |  4.27896  |
|     nvidia_deeprecommender      |    1    | 4.182471  |
|           mnasnet1_0            |    1    | 4.180489  |
|           timm_vovnet           |    1    | 4.164366  |
|             alexnet             |    1    | 4.119477  |
|             yolov3              |    1    | 3.954158  |
|      doctr_reco_predictor       |    1    | 3.889012  |
|              llama              |    1    |  3.60511  |
|       mobilenet_v3_large        |    1    | 3.586752  |
|          basic_gnn_gcn          |    1    | 3.418483  |
|       shufflenet_v2_x1_0        |    1    | 3.311789  |
|     functorch_maml_omniglot     |    1    | 3.290684  |
|           densenet121           |    1    | 3.181477  |
|           timm_regnet           |    1    | 3.005822  |
|         phlippe_resnet          |    1    | 2.866785  |
|              dlrm               |    1    | 2.798416  |
|    detectron2_fcos_r_50_fpn     |    1    | 2.607099  |
|          maml_omniglot          |    5    | 2.581552  |
|        phlippe_densenet         |    1    | 2.580481  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 2.468019  |
|               drq               |    1    | 2.434021  |
|        timm_efficientnet        |    1    | 2.323283  |
|          BERT_pytorch           |    1    | 2.184389  |
|          pytorch_unet           |    1    | 2.120786  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 2.088516  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1.999744  |
|          basic_gnn_gin          |    1    | 1.922481  |
|        soft_actor_critic        |   256   | 1.872357  |
|       Background_Matting        |    1    | 1.829031  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1.823154  |
|         basic_gnn_sage          |    1    |  1.80526  |
| detectron2_fasterrcnn_r_50_dc5  |    1    |  1.7966   |
|     timm_vision_transformer     |    1    | 1.757374  |
|             hf_Bert             |    1    | 1.716659  |
|             hf_GPT2             |    1    | 1.703394  |
|        basic_gnn_edgecnn        |    1    | 1.691593  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 1.635234  |
|          hf_DistilBert          |    1    | 1.627087  |
|          hf_Bert_large          |    1    | 1.626225  |
|         pytorch_stargan         |   16    | 1.573722  |
|  timm_vision_transformer_large  |    1    | 1.541685  |
|             hf_Bart             |    1    | 1.512577  |
|          hf_GPT2_large          |    1    | 1.470419  |
|           hf_T5_base            |    1    | 1.434473  |
|           hf_BigBird            |    1    |  1.33297  |
|        hf_distil_whisper        |    1    | 1.320776  |
|            moondream            |    1    | 1.284879  |
|       speech_transformer        |    1    | 1.241771  |
|           hf_T5_large           |    1    | 1.217503  |
|            hf_Albert            |    1    | 1.188578  |
|          fastNLP_Bert           |    1    | 1.151654  |
|              hf_T5              |    1    | 1.060454  |
|           hf_Reformer           |    1    | 1.034254  |
|             demucs              |    1    | 1.026111  |
|      torch_multimodal_clip      |    1    | 1.019896  |
|           tts_angular           |    1    |  0.99828  |
|     resnet50_quantized_qat      |    1    | 0.986674  |
|   mobilenet_v2_quantized_qat    |    1    | 0.982674  |
|              maml               |    1    | 0.940552  |
|          hf_Longformer          |    1    | 0.909829  |
|        timm_efficientdet        |    0    |    0.0    |
|       doctr_det_predictor       |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+

Accuracy

+---------------------------------+---------+--------------------+
|              name               |   bs    |      inductor      |
+---------------------------------+---------+--------------------+
|       Background_Matting        |    1    |  pass_due_to_skip  |
|              maml               |    1    |  pass_due_to_skip  |
|  timm_vision_transformer_large  |    1    |  pass_due_to_skip  |
|          hf_GPT2_large          |    1    |  pass_due_to_skip  |
|           hf_T5_large           |    1    |  pass_due_to_skip  |
|          basic_gnn_gcn          |    1    |        pass        |
|           hf_T5_base            |    1    |        pass        |
|           densenet121           |    1    |        pass        |
|             demucs              |    1    |        pass        |
|              dcgan              |    1    |        pass        |
|         basic_gnn_sage          |    1    |        pass        |
|          basic_gnn_gin          |    1    |        pass        |
|             alexnet             |    1    |        pass        |
|        basic_gnn_edgecnn        |    1    |        pass        |
|      doctr_reco_predictor       |    1    |        pass        |
|         LearningToPaint         |    1    |        pass        |
|          fastNLP_Bert           |    1    |        pass        |
|        hf_distil_whisper        |    1    |        pass        |
|              llama              |    1    |        pass        |
|              dlrm               |    1    |        pass        |
|    detectron2_fcos_r_50_fpn     |    1    |        pass        |
|               drq               |    1    |        pass        |
|     resnet50_quantized_qat      |    1    |        pass        |
|      functorch_dp_cifar10       |    1    |        pass        |
|             hf_Bert             |    1    |        pass        |
|              hf_T5              |    1    |        pass        |
|           hf_Reformer           |    1    |        pass        |
|          hf_Longformer          |    1    |        pass        |
|             hf_GPT2             |    1    |        pass        |
|          hf_DistilBert          |    1    |        pass        |
|           hf_BigBird            |    1    |        pass        |
|          hf_Bert_large          |    1    |        pass        |
|     functorch_maml_omniglot     |    1    |        pass        |
|             yolov3              |    1    |        pass        |
|             hf_Bart             |    1    |        pass        |
|            hf_Albert            |    1    |        pass        |
|            resnet50             |    1    |        pass        |
|          lennard_jones          |    1    |        pass        |
|         resnext50_32x4d         |    1    |        pass        |
|          BERT_pytorch           |    1    |        pass        |
|           mnasnet1_0            |    1    |        pass        |
|          mobilenet_v2           |    1    |        pass        |
|   mobilenet_v2_quantized_qat    |    1    |        pass        |
|       shufflenet_v2_x1_0        |    1    |        pass        |
|            moondream            |    1    |        pass        |
|     nvidia_deeprecommender      |    1    |        pass        |
|        phlippe_densenet         |    1    |        pass        |
|       mobilenet_v3_large        |    1    |        pass        |
|         phlippe_resnet          |    1    |        pass        |
|     pyhpc_equation_of_state     |    1    |        pass        |
|     pyhpc_isoneutral_mixing     |    1    |        pass        |
| pyhpc_turbulent_kinetic_energy  | 1048576 |        pass        |
|  pytorch_CycleGAN_and_pix2pix   |    1    |        pass        |
|         opacus_cifar10          |    1    |        pass        |
|          maml_omniglot          |    5    |        pass        |
|         pytorch_stargan         |   16    |        pass        |
|            resnet152            |    1    |        pass        |
|          timm_resnest           |    1    |        pass        |
|        soft_actor_critic        |   256   |        pass        |
|          pytorch_unet           |    1    |        pass        |
|          squeezenet1_1          |    1    |        pass        |
|            resnet18             |    1    |        pass        |
|        timm_efficientnet        |    1    |        pass        |
|           timm_regnet           |    1    |        pass        |
|       speech_transformer        |    1    |        pass        |
|     timm_vision_transformer     |    1    |        pass        |
|      torch_multimodal_clip      |    1    |        pass        |
|           tts_angular           |    1    |        pass        |
|              vgg16              |    1    |        pass        |
|           timm_nfnet            |    1    |        pass        |
|           timm_vovnet           |    1    |        pass        |
|              moco               |    0    | model_fail_to_load |
|         DALLE2_pytorch          |    0    | model_fail_to_load |
|        timm_efficientdet        |    0    | model_fail_to_load |
|           Super_SloMo           |    1    |    fail_to_run     |
|         vision_maskrcnn         |    1    |    fail_to_run     |
| detectron2_fasterrcnn_r_101_fpn |    1    |    fail_to_run     |
| detectron2_fasterrcnn_r_50_fpn  |    1    |    fail_to_run     |
| detectron2_fasterrcnn_r_101_c4  |    1    |   fail_accuracy    |
| detectron2_fasterrcnn_r_101_dc5 |    1    |   fail_accuracy    |
|  detectron2_fasterrcnn_r_50_c4  |    1    |   fail_accuracy    |
| detectron2_fasterrcnn_r_50_dc5  |    1    |   fail_accuracy    |
|       doctr_det_predictor       |    0    | eager_fail_to_run  |
+---------------------------------+---------+--------------------+
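
The status column above can be tallied mechanically. Below is a minimal sketch, assuming the table has been copied out as plain text; `parse_ascii_table` and `SAMPLE` are hypothetical names used for illustration and are not part of the benchmark tooling. The sample rows are a small subset of the table above.

```python
from collections import Counter


def parse_ascii_table(text):
    """Parse a '+---+' bordered table into a list of dicts keyed by header."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and blank lines
        # Drop the outer pipes, split on the inner ones, trim the padding.
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


# Hypothetical excerpt: a few rows copied from the accuracy table above.
SAMPLE = """
+---------------------------------+---------+--------------------+
|              name               |   bs    |      inductor      |
+---------------------------------+---------+--------------------+
|       Background_Matting        |    1    |  pass_due_to_skip  |
|          basic_gnn_gcn          |    1    |        pass        |
|             alexnet             |    1    |        pass        |
|              moco               |    0    | model_fail_to_load |
|           Super_SloMo           |    1    |    fail_to_run     |
| detectron2_fasterrcnn_r_101_c4  |    1    |   fail_accuracy    |
|       doctr_det_predictor       |    0    | eager_fail_to_run  |
+---------------------------------+---------+--------------------+
"""

rows = parse_ascii_table(SAMPLE)
status_counts = Counter(row["inductor"] for row in rows)

for status, count in status_counts.most_common():
    print(f"{status}: {count}")
```

Running the same function over the full table gives a per-status count that can be compared between dashboard updates.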

Compilation latency (sec)

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|           densenet121           |    1    | 70.043815 |
|           hf_BigBird            |    1    | 65.709609 |
|    detectron2_fcos_r_50_fpn     |    1    | 51.011981 |
|              maml               |    1    | 47.938336 |
|           hf_T5_large           |    1    | 46.994533 |
|          hf_Longformer          |    1    | 40.584863 |
|           timm_nfnet            |    1    | 37.072042 |
|            moondream            |    1    | 36.447821 |
|           hf_Reformer           |    1    | 36.084682 |
|       speech_transformer        |    1    | 33.172262 |
|      torch_multimodal_clip      |    1    | 33.078108 |
|        phlippe_densenet         |    1    | 32.638701 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 30.358321 |
|  timm_vision_transformer_large  |    1    | 29.333089 |
|             demucs              |    1    | 28.946303 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 28.585334 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 27.44104  |
|          hf_GPT2_large          |    1    | 27.24459  |
|        timm_efficientnet        |    1    | 25.726958 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 25.62141  |
|        hf_distil_whisper        |    1    | 24.029531 |
|              hf_T5              |    1    | 23.700915 |
|             yolov3              |    1    | 22.710429 |
|         opacus_cifar10          |    1    | 21.633687 |
|              llama              |    1    | 21.004798 |
|      functorch_dp_cifar10       |    1    | 20.523594 |
|          hf_Bert_large          |    1    | 19.786312 |
|          timm_resnest           |    1    | 19.269012 |
|             hf_GPT2             |    1    | 19.009364 |
|       shufflenet_v2_x1_0        |    1    | 18.433644 |
| pyhpc_turbulent_kinetic_energy  | 1048576 |  18.2322  |
|     timm_vision_transformer     |    1    | 17.359902 |
|       mobilenet_v3_large        |    1    | 17.344576 |
|          BERT_pytorch           |    1    | 16.729592 |
|          fastNLP_Bert           |    1    | 16.662855 |
|           timm_regnet           |    1    | 16.18964  |
|             hf_Bart             |    1    | 16.083135 |
|           timm_vovnet           |    1    | 16.034229 |
|         pytorch_stargan         |   16    | 15.867105 |
|           hf_T5_base            |    1    | 15.335447 |
|            hf_Albert            |    1    | 14.464997 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 14.297158 |
|             hf_Bert             |    1    | 13.823107 |
|       Background_Matting        |    1    | 12.666355 |
|          squeezenet1_1          |    1    | 12.553906 |
|          hf_DistilBert          |    1    | 11.847593 |
|            resnet152            |    1    | 11.396498 |
|      doctr_reco_predictor       |    1    | 10.830129 |
|              vgg16              |    1    | 9.481787  |
|     pyhpc_isoneutral_mixing     |    1    | 9.036423  |
|               drq               |    1    | 8.672142  |
|          basic_gnn_gcn          |    1    | 8.424585  |
|          pytorch_unet           |    1    | 8.413397  |
|            resnet50             |    1    | 8.176779  |
|         resnext50_32x4d         |    1    |  8.11861  |
|             alexnet             |    1    | 8.031709  |
|     functorch_maml_omniglot     |    1    | 7.710447  |
|          maml_omniglot          |    5    | 7.694689  |
|     nvidia_deeprecommender      |    1    | 7.543139  |
|              dlrm               |    1    | 7.507588  |
|          mobilenet_v2           |    1    | 7.253589  |
|            resnet18             |    1    | 7.210924  |
|         basic_gnn_sage          |    1    | 7.189862  |
|           mnasnet1_0            |    1    | 6.976034  |
|     pyhpc_equation_of_state     |    1    | 6.595176  |
|        soft_actor_critic        |   256   |  6.42987  |
|         LearningToPaint         |    1    | 6.262863  |
|        basic_gnn_edgecnn        |    1    | 6.214562  |
|         phlippe_resnet          |    1    |  5.95058  |
|          basic_gnn_gin          |    1    | 5.470687  |
|          lennard_jones          |    1    | 4.624917  |
|              dcgan              |    1    | 4.554907  |
|           tts_angular           |    1    | 4.448192  |
|   mobilenet_v2_quantized_qat    |    1    | 0.191534  |
|     resnet50_quantized_qat      |    1    | 0.174616  |
|        timm_efficientdet        |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
|       doctr_det_predictor       |    0    |    0.0    |
+---------------------------------+---------+-----------+

Peak Memory Compression Ratio

+---------------------------------+---------+----------+
|              name               |   bs    | inductor |
+---------------------------------+---------+----------+
|              dlrm               |    1    | 0.986917 |
|             demucs              |    1    | 0.984536 |
|           hf_T5_large           |    1    | 0.981249 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 0.977052 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 0.97645  |
|           hf_T5_base            |    1    | 0.971299 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 0.970948 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 0.970326 |
|          pytorch_unet           |    1    | 0.967975 |
|              llama              |    1    | 0.964977 |
|    detectron2_fcos_r_50_fpn     |    1    | 0.962536 |
|        basic_gnn_edgecnn        |    1    | 0.961188 |
|           hf_BigBird            |    1    | 0.959082 |
|         LearningToPaint         |    1    | 0.953897 |
|       Background_Matting        |    1    | 0.953367 |
|     resnet50_quantized_qat      |    1    | 0.945546 |
|          basic_gnn_gcn          |    1    | 0.943883 |
|      doctr_reco_predictor       |    1    | 0.943114 |
|      torch_multimodal_clip      |    1    | 0.940469 |
|          basic_gnn_gin          |    1    | 0.935276 |
|         basic_gnn_sage          |    1    | 0.935275 |
|        hf_distil_whisper        |    1    | 0.932001 |
|       speech_transformer        |    1    | 0.927031 |
|             hf_Bert             |    1    | 0.924337 |
|          fastNLP_Bert           |    1    | 0.922617 |
|         pytorch_stargan         |   16    | 0.921173 |
|            hf_Albert            |    1    | 0.918055 |
|   mobilenet_v2_quantized_qat    |    1    | 0.915332 |
|          BERT_pytorch           |    1    | 0.912184 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 0.909857 |
|          hf_GPT2_large          |    1    | 0.909561 |
|             hf_GPT2             |    1    | 0.907822 |
|              hf_T5              |    1    | 0.905492 |
|          hf_DistilBert          |    1    | 0.901837 |
|          hf_Longformer          |    1    | 0.896443 |
|         opacus_cifar10          |    1    | 0.893504 |
|               drq               |    1    | 0.892597 |
|             hf_Bart             |    1    | 0.891573 |
|           tts_angular           |    1    | 0.89011  |
|        soft_actor_critic        |   256   |  0.8875  |
|          mobilenet_v2           |    1    | 0.873617 |
|  timm_vision_transformer_large  |    1    | 0.872448 |
|           timm_nfnet            |    1    | 0.868841 |
|          squeezenet1_1          |    1    | 0.868708 |
|        timm_efficientnet        |    1    | 0.867916 |
|       mobilenet_v3_large        |    1    | 0.864901 |
|            moondream            |    1    | 0.864393 |
|          maml_omniglot          |    5    | 0.863124 |
|          lennard_jones          |    1    | 0.861301 |
|              dcgan              |    1    | 0.860395 |
|     functorch_maml_omniglot     |    1    | 0.860353 |
|           mnasnet1_0            |    1    | 0.859747 |
|     timm_vision_transformer     |    1    | 0.859238 |
|              vgg16              |    1    | 0.858865 |
|          hf_Bert_large          |    1    | 0.858643 |
|             alexnet             |    1    | 0.856226 |
|          timm_resnest           |    1    | 0.85349  |
|         phlippe_resnet          |    1    | 0.852217 |
|      functorch_dp_cifar10       |    1    | 0.84398  |
|       shufflenet_v2_x1_0        |    1    | 0.843656 |
|     nvidia_deeprecommender      |    1    | 0.837379 |
|           hf_Reformer           |    1    | 0.834858 |
|     pyhpc_equation_of_state     |    1    | 0.832226 |
|             yolov3              |    1    | 0.831959 |
|         resnext50_32x4d         |    1    | 0.826326 |
|            resnet18             |    1    | 0.818287 |
|           timm_vovnet           |    1    | 0.810043 |
|     pyhpc_isoneutral_mixing     |    1    | 0.809602 |
|        phlippe_densenet         |    1    | 0.809236 |
|            resnet50             |    1    | 0.806096 |
|           densenet121           |    1    | 0.803876 |
|           timm_regnet           |    1    | 0.796066 |
|              maml               |    1    | 0.795918 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.762141 |
|            resnet152            |    1    | 0.761069 |
|        timm_efficientdet        |    0    |   0.0    |
|       doctr_det_predictor       |    0    |   0.0    |
|         DALLE2_pytorch          |    0    |   0.0    |
|              moco               |    0    |   0.0    |
+---------------------------------+---------+----------+

Absolute latency (ms)

+---------------------------------+---------+-------------+
|              name               |   bs    |  inductor   |
+---------------------------------+---------+-------------+
|           hf_T5_base            |    1    | 9737.586504 |
|          hf_GPT2_large          |    1    | 3036.105252 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 2923.702252 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 2888.135598 |
|            moondream            |    1    | 2434.197099 |
|           hf_T5_large           |    1    | 2177.204301 |
|        hf_distil_whisper        |    1    | 2000.05722  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1757.08876  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1734.535484 |
|          pytorch_unet           |    1    | 1497.96529  |
|       Background_Matting        |    1    | 1465.77828  |
|             demucs              |    1    | 1192.440879 |
|  timm_vision_transformer_large  |    1    |  955.21358  |
|    detectron2_fcos_r_50_fpn     |    1    | 711.381206  |
|           hf_BigBird            |    1    | 517.058065  |
|          hf_Longformer          |    1    | 487.928877  |
|      torch_multimodal_clip      |    1    | 455.986349  |
|          hf_Bert_large          |    1    | 442.238551  |
|             hf_Bart             |    1    | 241.951154  |
|         pytorch_stargan         |   16    | 235.863382  |
|              hf_T5              |    1    |  235.27545  |
|          fastNLP_Bert           |    1    | 222.140964  |
|            hf_Albert            |    1    | 193.836525  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 179.282267  |
|       speech_transformer        |    1    | 178.057441  |
|             hf_Bert             |    1    | 171.696029  |
|           hf_Reformer           |    1    | 168.985439  |
|             hf_GPT2             |    1    |  129.0091   |
|          hf_DistilBert          |    1    | 103.273937  |
|        basic_gnn_edgecnn        |    1    |  87.879884  |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  87.748711  |
|             yolov3              |    1    |  69.621428  |
|          BERT_pytorch           |    1    |  49.890485  |
|              maml               |    1    |  48.421027  |
|              vgg16              |    1    |  42.122218  |
|     nvidia_deeprecommender      |    1    |  38.595102  |
|           timm_regnet           |    1    |  35.657185  |
|           timm_nfnet            |    1    |  35.573168  |
|           tts_angular           |    1    |  30.042782  |
|            resnet152            |    1    |  28.879211  |
|          basic_gnn_gcn          |    1    |  23.877932  |
|     timm_vision_transformer     |    1    |  18.613892  |
|         basic_gnn_sage          |    1    |  15.942323  |
|          basic_gnn_gin          |    1    |  15.56955   |
|         resnext50_32x4d         |    1    |  13.000013  |
|           densenet121           |    1    |  12.828301  |
|           timm_vovnet           |    1    |  12.201815  |
|             alexnet             |    1    |  11.694311  |
|              llama              |    1    |  11.283256  |
|            resnet50             |    1    |  10.345841  |
|        timm_efficientnet        |    1    |  9.275398   |
|     resnet50_quantized_qat      |    1    |   7.62851   |
|          timm_resnest           |    1    |  6.284922   |
|      doctr_reco_predictor       |    1    |  5.297163   |
|   mobilenet_v2_quantized_qat    |    1    |  4.818129   |
|       mobilenet_v3_large        |    1    |  3.871787   |
|            resnet18             |    1    |  3.721915   |
|           mnasnet1_0            |    1    |  3.234617   |
|          mobilenet_v2           |    1    |  3.189925   |
|       shufflenet_v2_x1_0        |    1    |  3.070261   |
|         LearningToPaint         |    1    |  2.474217   |
|          squeezenet1_1          |    1    |  1.919178   |
|        phlippe_densenet         |    1    |  1.793091   |
|      functorch_dp_cifar10       |    1    |  1.749761   |
|         opacus_cifar10          |    1    |  1.711844   |
|               drq               |    1    |   0.92819   |
|         phlippe_resnet          |    1    |  0.646541   |
|        soft_actor_critic        |   256   |  0.600311   |
|          maml_omniglot          |    5    |  0.597142   |
|              dcgan              |    1    |   0.54189   |
|     functorch_maml_omniglot     |    1    |  0.514701   |
|              dlrm               |    1    |  0.481489   |
|     pyhpc_isoneutral_mixing     |    1    |  0.028271   |
|     pyhpc_equation_of_state     |    1    |  0.021705   |
|          lennard_jones          |    1    |  0.018074   |
|        timm_efficientdet        |    0    |     0.0     |
|              moco               |    0    |     0.0     |
|       doctr_det_predictor       |    0    |     0.0     |
|         DALLE2_pytorch          |    0    |     0.0     |
+---------------------------------+---------+-------------+

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|          MobileBertForMaskedLM          | 1  | 3.401134 |
|     PegasusForConditionalGeneration     | 1  | 2.78914  |
|             XGLMForCausalLM             | 1  | 2.671533 |
|     MobileBertForQuestionAnswering      | 1  | 2.646892 |
|          DistilBertForMaskedLM          | 1  | 2.635608 |
|           PegasusForCausalLM            | 1  | 2.572837 |
|     DistilBertForQuestionAnswering      | 1  | 2.548376 |
|     M2M100ForConditionalGeneration      | 1  | 2.538796 |
|          BlenderbotForCausalLM          | 1  | 2.460367 |
|            YituTechConvBert             | 1  | 2.434505 |
|       MT5ForConditionalGeneration       | 1  | 2.401239 |
|       BlenderbotSmallForCausalLM        | 1  | 2.359843 |
|         Speech2Text2ForCausalLM         | 1  | 2.340044 |
| BlenderbotSmallForConditionalGeneration | 1  | 2.244143 |
|       DebertaForQuestionAnswering       | 1  | 1.927332 |
|           DebertaForMaskedLM            | 1  | 1.881492 |
|            XLNetLMHeadModel             | 1  | 1.873083 |
|                CamemBert                | 1  | 1.798686 |
|           LayoutLMForMaskedLM           | 1  | 1.794379 |
|           RobertaForCausalLM            | 1  | 1.793979 |
|           ElectraForCausalLM            | 1  | 1.788646 |
|             BertForMaskedLM             | 1  | 1.775227 |
|    LayoutLMForSequenceClassification    | 1  | 1.736332 |
|    MegatronBertForQuestionAnswering     | 1  | 1.710399 |
|       RobertaForQuestionAnswering       | 1  | 1.707517 |
|            TrOCRForCausalLM             | 1  | 1.704943 |
|        BertForQuestionAnswering         | 1  | 1.694976 |
|         MegatronBertForCausalLM         | 1  | 1.684467 |
|       ElectraForQuestionAnswering       | 1  | 1.605274 |
|          DebertaV2ForMaskedLM           | 1  | 1.548059 |
|               DistillGPT2               | 1  | 1.540589 |
|             OPTForCausalLM              | 1  | 1.47406  |
|      DebertaV2ForQuestionAnswering      | 1  | 1.473694 |
|      GPT2ForSequenceClassification      | 1  | 1.450503 |
|             BartForCausalLM             | 1  | 1.382714 |
|      BartForConditionalGeneration       | 1  | 1.337201 |
|            PLBartForCausalLM            | 1  | 1.32747  |
|     PLBartForConditionalGeneration      | 1  | 1.323348 |
|            MBartForCausalLM             | 1  | 1.313149 |
|      MBartForConditionalGeneration      | 1  | 1.30442  |
|               GoogleFnet                | 1  |  1.2234  |
|       T5ForConditionalGeneration        | 1  | 1.130653 |
|                 T5Small                 | 1  | 1.129907 |
|            AlbertForMaskedLM            | 1  | 1.124276 |
|       AlbertForQuestionAnswering        | 1  | 1.112148 |
|          AllenaiLongformerBase          | 1  | 0.816179 |
+-----------------------------------------+----+----------+

Accuracy

+-----------------------------------------+----+------------------+
|                  name                   | bs |     inductor     |
+-----------------------------------------+----+------------------+
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip |
|                CamemBert                | 1  |       pass       |
|       AlbertForQuestionAnswering        | 1  |       pass       |
|           DebertaForMaskedLM            | 1  |       pass       |
|          AllenaiLongformerBase          | 1  |       pass       |
|             BartForCausalLM             | 1  |       pass       |
|      BartForConditionalGeneration       | 1  |       pass       |
|             BertForMaskedLM             | 1  |       pass       |
|        BertForQuestionAnswering         | 1  |       pass       |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |
|       DebertaForQuestionAnswering       | 1  |       pass       |
|           LayoutLMForMaskedLM           | 1  |       pass       |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |
|          DistilBertForMaskedLM          | 1  |       pass       |
|     DistilBertForQuestionAnswering      | 1  |       pass       |
|               DistillGPT2               | 1  |       pass       |
|           ElectraForCausalLM            | 1  |       pass       |
|       ElectraForQuestionAnswering       | 1  |       pass       |
|               GoogleFnet                | 1  |       pass       |
|            YituTechConvBert             | 1  |       pass       |
|    LayoutLMForSequenceClassification    | 1  |       pass       |
|            MBartForCausalLM             | 1  |       pass       |
|            XLNetLMHeadModel             | 1  |       pass       |
|             XGLMForCausalLM             | 1  |       pass       |
|            AlbertForMaskedLM            | 1  |       pass       |
|      MBartForConditionalGeneration      | 1  |       pass       |
|       MT5ForConditionalGeneration       | 1  |       pass       |
|         MegatronBertForCausalLM         | 1  |       pass       |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |
|          MobileBertForMaskedLM          | 1  |       pass       |
|     MobileBertForQuestionAnswering      | 1  |       pass       |
|             OPTForCausalLM              | 1  |       pass       |
|            PLBartForCausalLM            | 1  |       pass       |
|     PLBartForConditionalGeneration      | 1  |       pass       |
|           PegasusForCausalLM            | 1  |       pass       |
|     M2M100ForConditionalGeneration      | 1  |       pass       |
|     PegasusForConditionalGeneration     | 1  |       pass       |
|           RobertaForCausalLM            | 1  |       pass       |
|       RobertaForQuestionAnswering       | 1  |       pass       |
|         Speech2Text2ForCausalLM         | 1  |       pass       |
|       T5ForConditionalGeneration        | 1  |       pass       |
|                 T5Small                 | 1  |       pass       |
|            TrOCRForCausalLM             | 1  |       pass       |
|      GPT2ForSequenceClassification      | 1  |  fail_accuracy   |
+-----------------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------------+----+-----------+
|                  name                   | bs | inductor  |
+-----------------------------------------+----+-----------+
|          AllenaiLongformerBase          | 1  | 47.801013 |
|          MobileBertForMaskedLM          | 1  | 36.24321  |
|     MobileBertForQuestionAnswering      | 1  | 36.203489 |
|     PegasusForConditionalGeneration     | 1  | 33.777903 |
|     M2M100ForConditionalGeneration      | 1  | 32.596825 |
|      MBartForConditionalGeneration      | 1  | 31.739367 |
|          BlenderbotForCausalLM          | 1  | 31.290881 |
|             XGLMForCausalLM             | 1  | 28.381148 |
|       MT5ForConditionalGeneration       | 1  | 27.873205 |
|            XLNetLMHeadModel             | 1  | 27.139643 |
|         MegatronBertForCausalLM         | 1  | 25.50371  |
|       T5ForConditionalGeneration        | 1  | 24.917014 |
|                 T5Small                 | 1  | 24.876195 |
|    MegatronBertForQuestionAnswering     | 1  | 24.534783 |
|      DebertaV2ForQuestionAnswering      | 1  | 23.782485 |
|          DebertaV2ForMaskedLM           | 1  | 23.312221 |
| BlenderbotSmallForConditionalGeneration | 1  | 22.519245 |
|           PegasusForCausalLM            | 1  | 21.810411 |
|      BartForConditionalGeneration       | 1  | 21.534503 |
|            YituTechConvBert             | 1  | 21.035683 |
|     PLBartForConditionalGeneration      | 1  | 20.284503 |
|            MBartForCausalLM             | 1  | 19.977566 |
|      GPT2ForSequenceClassification      | 1  | 19.635648 |
|             OPTForCausalLM              | 1  | 19.128892 |
|               DistillGPT2               | 1  | 17.279885 |
|       AlbertForQuestionAnswering        | 1  | 16.55265  |
|       DebertaForQuestionAnswering       | 1  | 16.477694 |
|            AlbertForMaskedLM            | 1  | 16.470728 |
|           DebertaForMaskedLM            | 1  | 16.403125 |
|            TrOCRForCausalLM             | 1  | 16.384462 |
|       RobertaForQuestionAnswering       | 1  | 15.983623 |
|           RobertaForCausalLM            | 1  | 15.956345 |
|                CamemBert                | 1  | 15.933539 |
|    LayoutLMForSequenceClassification    | 1  | 15.871751 |
|           LayoutLMForMaskedLM           | 1  | 15.226453 |
|           ElectraForCausalLM            | 1  | 15.17632  |
|       ElectraForQuestionAnswering       | 1  | 15.162971 |
|        BertForQuestionAnswering         | 1  | 15.152594 |
|             BertForMaskedLM             | 1  | 15.04748  |
|       BlenderbotSmallForCausalLM        | 1  | 14.087672 |
|         Speech2Text2ForCausalLM         | 1  | 13.962857 |
|               GoogleFnet                | 1  | 13.514652 |
|     DistilBertForQuestionAnswering      | 1  | 13.274253 |
|          DistilBertForMaskedLM          | 1  | 13.248685 |
|            PLBartForCausalLM            | 1  | 13.193687 |
|             BartForCausalLM             | 1  | 12.765113 |
+-----------------------------------------+----+-----------+

Peak Memory Compression Ratio

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|             OPTForCausalLM              | 1  | 0.97694  |
|            XLNetLMHeadModel             | 1  | 0.973129 |
|            MBartForCausalLM             | 1  | 0.961002 |
|               DistillGPT2               | 1  | 0.954329 |
|            PLBartForCausalLM            | 1  | 0.944662 |
|      GPT2ForSequenceClassification      | 1  | 0.94422  |
|       RobertaForQuestionAnswering       | 1  | 0.939197 |
|       MT5ForConditionalGeneration       | 1  | 0.938223 |
|           LayoutLMForMaskedLM           | 1  | 0.937203 |
|     PLBartForConditionalGeneration      | 1  | 0.936769 |
|                CamemBert                | 1  | 0.935249 |
|      DebertaV2ForQuestionAnswering      | 1  | 0.935146 |
|          BlenderbotForCausalLM          | 1  | 0.932732 |
|         MegatronBertForCausalLM         | 1  | 0.932039 |
|             BertForMaskedLM             | 1  | 0.931638 |
|            YituTechConvBert             | 1  | 0.931082 |
|    MegatronBertForQuestionAnswering     | 1  | 0.930814 |
|           DebertaForMaskedLM            | 1  | 0.930273 |
|       T5ForConditionalGeneration        | 1  | 0.92966  |
|            TrOCRForCausalLM             | 1  | 0.928092 |
|             BartForCausalLM             | 1  | 0.925763 |
|                 T5Small                 | 1  | 0.925716 |
|    LayoutLMForSequenceClassification    | 1  | 0.923557 |
|           RobertaForCausalLM            | 1  | 0.92271  |
|      BartForConditionalGeneration       | 1  | 0.921274 |
|       BlenderbotSmallForCausalLM        | 1  | 0.919453 |
|        BertForQuestionAnswering         | 1  | 0.918576 |
|             XGLMForCausalLM             | 1  | 0.91837  |
|          DistilBertForMaskedLM          | 1  | 0.90974  |
|     M2M100ForConditionalGeneration      | 1  | 0.909587 |
|          DebertaV2ForMaskedLM           | 1  | 0.907016 |
|           PegasusForCausalLM            | 1  | 0.906941 |
|          AllenaiLongformerBase          | 1  | 0.904698 |
|      MBartForConditionalGeneration      | 1  | 0.899318 |
|           ElectraForCausalLM            | 1  | 0.895394 |
|       DebertaForQuestionAnswering       | 1  | 0.886769 |
|     PegasusForConditionalGeneration     | 1  | 0.885541 |
|     DistilBertForQuestionAnswering      | 1  | 0.875801 |
|       ElectraForQuestionAnswering       | 1  | 0.875074 |
|         Speech2Text2ForCausalLM         | 1  | 0.866976 |
|               GoogleFnet                | 1  | 0.854519 |
| BlenderbotSmallForConditionalGeneration | 1  | 0.849722 |
|          MobileBertForMaskedLM          | 1  | 0.804956 |
|     MobileBertForQuestionAnswering      | 1  | 0.770319 |
|            AlbertForMaskedLM            | 1  | 0.688378 |
|       AlbertForQuestionAnswering        | 1  | 0.667758 |
+-----------------------------------------+----+----------+

Absolute latency (ms)

+-----------------------------------------+----+-------------+
|                  name                   | bs |  inductor   |
+-----------------------------------------+----+-------------+
|            AlbertForMaskedLM            | 1  | 4179.08557  |
|       AlbertForQuestionAnswering        | 1  | 4168.474775 |
|             OPTForCausalLM              | 1  | 2139.16817  |
|      MBartForConditionalGeneration      | 1  | 1779.419989 |
|      BartForConditionalGeneration       | 1  | 1406.739853 |
|          DebertaV2ForMaskedLM           | 1  | 1354.545111 |
|          AllenaiLongformerBase          | 1  | 1094.933013 |
|            XLNetLMHeadModel             | 1  | 1044.196963 |
|      DebertaV2ForQuestionAnswering      | 1  | 1037.119262 |
|            MBartForCausalLM             | 1  | 961.222805  |
|       T5ForConditionalGeneration        | 1  | 807.002844  |
|                 T5Small                 | 1  | 806.933216  |
|          BlenderbotForCausalLM          | 1  | 751.996666  |
|     PLBartForConditionalGeneration      | 1  | 650.611017  |
|             BartForCausalLM             | 1  | 610.993586  |
|         MegatronBertForCausalLM         | 1  |  508.04413  |
|      GPT2ForSequenceClassification      | 1  | 468.590863  |
|    MegatronBertForQuestionAnswering     | 1  | 464.019245  |
|               GoogleFnet                | 1  |  449.80051  |
|            PLBartForCausalLM            | 1  | 376.394897  |
|             XGLMForCausalLM             | 1  | 241.163504  |
|           DebertaForMaskedLM            | 1  | 213.105534  |
|     M2M100ForConditionalGeneration      | 1  | 209.548992  |
|           RobertaForCausalLM            | 1  | 205.965462  |
|            YituTechConvBert             | 1  | 195.365946  |
|           LayoutLMForMaskedLM           | 1  | 179.837434  |
|             BertForMaskedLM             | 1  | 179.780642  |
|                CamemBert                | 1  | 179.525516  |
|     PegasusForConditionalGeneration     | 1  | 177.516104  |
|            TrOCRForCausalLM             | 1  | 172.420132  |
|               DistillGPT2               | 1  |  165.91547  |
|       DebertaForQuestionAnswering       | 1  | 145.943567  |
|    LayoutLMForSequenceClassification    | 1  | 140.942419  |
|        BertForQuestionAnswering         | 1  | 140.518393  |
|       RobertaForQuestionAnswering       | 1  | 140.190284  |
|       MT5ForConditionalGeneration       | 1  |  109.62334  |
|           PegasusForCausalLM            | 1  |  88.209668  |
| BlenderbotSmallForConditionalGeneration | 1  |  52.563956  |
|           ElectraForCausalLM            | 1  |  49.822402  |
|          DistilBertForMaskedLM          | 1  |  30.192318  |
|          MobileBertForMaskedLM          | 1  |  29.567696  |
|       ElectraForQuestionAnswering       | 1  |  29.256155  |
|       BlenderbotSmallForCausalLM        | 1  |  28.26878   |
|     DistilBertForQuestionAnswering      | 1  |  18.018566  |
|     MobileBertForQuestionAnswering      | 1  |  16.714212  |
|         Speech2Text2ForCausalLM         | 1  |  5.406542   |
+-----------------------------------------+----+-------------+

timm_models suite with amp precision


Performance speedup

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|           resnest101e           | 1  | 5.359961 |
|          pnasnet5large          | 1  | 5.33098  |
|        ese_vovnet19b_dw         | 1  | 5.267322 |
|          cspdarknet53           | 1  | 4.650105 |
|         mobilenetv2_100         | 1  | 4.596714 |
|     swsl_resnext101_32x16d      | 1  | 4.54053  |
|             dla102              | 1  | 4.421938 |
|          botnet26t_256          | 1  | 4.419584 |
|           fbnetc_100            | 1  | 4.416711 |
|          inception_v3           | 1  | 4.406799 |
|           res2next50            | 1  | 4.343754 |
|       gluon_inception_v3        | 1  | 4.306506 |
|           mnasnet_100           | 1  | 4.298575 |
|        adv_inception_v3         | 1  | 4.285055 |
|          spnasnet_100           | 1  | 4.167029 |
|           selecsls42b           | 1  | 4.098824 |
|            gernet_l             | 1  | 4.025898 |
|           dm_nfnet_f0           | 1  | 3.946476 |
|            nfnet_l0             | 1  | 3.798018 |
|        res2net50_14w_8s         | 1  | 3.794369 |
|      mobilenetv3_large_100      | 1  | 3.786246 |
|        res2net101_26w_4s        | 1  | 3.748474 |
|            fbnetv3_b            | 1  | 3.733446 |
|            repvgg_a2            | 1  | 3.617298 |
|       eca_botnext26ts_256       | 1  | 3.551933 |
|           regnety_002           | 1  | 3.370277 |
|            hrnet_w18            | 1  | 3.345086 |
|            lcnet_050            | 1  | 3.222906 |
|          ghostnet_100           | 1  | 3.169166 |
|         visformer_small         | 1  | 3.068618 |
|        eca_halonext26ts         | 1  | 2.921004 |
|         poolformer_m36          | 1  | 2.886148 |
|             dpn107              | 1  | 2.855218 |
|           mobilevit_s           | 1  | 2.620883 |
|           rexnet_100            | 1  | 2.587463 |
|            tinynet_a            | 1  | 2.490341 |
|       tf_efficientnet_b0        | 1  | 2.283731 |
|            levit_128            | 1  | 2.262918 |
|           tf_mixnet_l           | 1  | 2.207211 |
|            mixnet_l             | 1  | 2.118589 |
|        twins_pcpvt_base         | 1  | 2.018131 |
|          gmixer_24_224          | 1  | 1.863598 |
|           volo_d1_224           | 1  | 1.824303 |
|        convmixer_768_32         | 1  | 1.774412 |
|          gmlp_s16_224           | 1  | 1.741088 |
|            pit_b_224            | 1  | 1.730164 |
|      beit_base_patch16_224      | 1  | 1.665689 |
|  swin_base_patch4_window7_224   | 1  | 1.62512  |
|      vit_base_patch16_224       | 1  | 1.609743 |
|      xcit_large_24_p8_224       | 1  | 1.554026 |
|          convnext_base          | 1  | 1.542004 |
|           convit_base           | 1  | 1.491719 |
|          cait_m36_384           | 1  | 1.449225 |
|         crossvit_9_240          | 1  | 1.42635  |
|        sebotnet33ts_256         | 1  | 1.391339 |
|        tnt_s_patch16_224        | 1  | 1.365293 |
| deit_base_distilled_patch16_224 | 1  | 1.354875 |
|          jx_nest_base           | 1  | 1.327547 |
|          resmlp_12_224          | 1  | 1.171794 |
|          mixer_b16_224          | 1  | 1.16962  |
+---------------------------------+----+----------+

Accuracy

+---------------------------------+----+---------------+
|              name               | bs |   inductor    |
+---------------------------------+----+---------------+
|        adv_inception_v3         | 1  |     pass      |
|           mnasnet_100           | 1  |     pass      |
|          mixer_b16_224          | 1  |     pass      |
|          botnet26t_256          | 1  |     pass      |
|          cait_m36_384           | 1  |     pass      |
|           convit_base           | 1  |     pass      |
|        convmixer_768_32         | 1  |     pass      |
|          convnext_base          | 1  |     pass      |
|         crossvit_9_240          | 1  |     pass      |
|          cspdarknet53           | 1  |     pass      |
| deit_base_distilled_patch16_224 | 1  |     pass      |
|             dla102              | 1  |     pass      |
|           dm_nfnet_f0           | 1  |     pass      |
|             dpn107              | 1  |     pass      |
|       eca_botnext26ts_256       | 1  |     pass      |
|        eca_halonext26ts         | 1  |     pass      |
|        ese_vovnet19b_dw         | 1  |     pass      |
|           fbnetc_100            | 1  |     pass      |
|            fbnetv3_b            | 1  |     pass      |
|            gernet_l             | 1  |     pass      |
|          ghostnet_100           | 1  |     pass      |
|       gluon_inception_v3        | 1  |     pass      |
|          gmixer_24_224          | 1  |     pass      |
|          gmlp_s16_224           | 1  |     pass      |
|            hrnet_w18            | 1  |     pass      |
|          inception_v3           | 1  |     pass      |
|          jx_nest_base           | 1  |     pass      |
|            levit_128            | 1  |     pass      |
|      xcit_large_24_p8_224       | 1  |     pass      |
|      beit_base_patch16_224      | 1  |     pass      |
|            mixnet_l             | 1  |     pass      |
|         mobilenetv2_100         | 1  |     pass      |
|           volo_d1_224           | 1  |     pass      |
|      mobilenetv3_large_100      | 1  |     pass      |
|           mobilevit_s           | 1  |     pass      |
|            nfnet_l0             | 1  |     pass      |
|            pit_b_224            | 1  |     pass      |
|          pnasnet5large          | 1  |     pass      |
|         poolformer_m36          | 1  |     pass      |
|           regnety_002           | 1  |     pass      |
|            repvgg_a2            | 1  |     pass      |
|        res2net101_26w_4s        | 1  |     pass      |
|        res2net50_14w_8s         | 1  |     pass      |
|           res2next50            | 1  |     pass      |
|          resmlp_12_224          | 1  |     pass      |
|           resnest101e           | 1  |     pass      |
|           rexnet_100            | 1  |     pass      |
|        sebotnet33ts_256         | 1  |     pass      |
|           selecsls42b           | 1  |     pass      |
|          spnasnet_100           | 1  |     pass      |
|  swin_base_patch4_window7_224   | 1  |     pass      |
|     swsl_resnext101_32x16d      | 1  |     pass      |
|       tf_efficientnet_b0        | 1  |     pass      |
|           tf_mixnet_l           | 1  |     pass      |
|            tinynet_a            | 1  |     pass      |
|        tnt_s_patch16_224        | 1  |     pass      |
|        twins_pcpvt_base         | 1  |     pass      |
|         visformer_small         | 1  |     pass      |
|      vit_base_patch16_224       | 1  |     pass      |
|         coat_lite_mini          | 1  |  fail_to_run  |
|            lcnet_050            | 1  | fail_accuracy |
+---------------------------------+----+---------------+

Compilation latency (sec)

+---------------------------------+----+-----------+
|              name               | bs | inductor  |
+---------------------------------+----+-----------+
|          pnasnet5large          | 1  |  62.4629  |
|  swin_base_patch4_window7_224   | 1  | 62.024627 |
|           tf_mixnet_l           | 1  | 53.405446 |
|             dpn107              | 1  | 49.46573  |
|          jx_nest_base           | 1  | 44.609373 |
|           rexnet_100            | 1  | 43.477843 |
|        res2net50_14w_8s         | 1  | 43.303275 |
|         crossvit_9_240          | 1  | 41.808247 |
|          ghostnet_100           | 1  | 40.476079 |
|            mixnet_l             | 1  | 39.363988 |
|        sebotnet33ts_256         | 1  | 38.946364 |
|      xcit_large_24_p8_224       | 1  | 38.718177 |
|            levit_128            | 1  | 38.637806 |
|         poolformer_m36          | 1  | 38.613973 |
|        tnt_s_patch16_224        | 1  | 38.137102 |
|        twins_pcpvt_base         | 1  | 37.84007  |
|        eca_halonext26ts         | 1  | 37.310104 |
|           dm_nfnet_f0           | 1  | 37.214694 |
|           mobilevit_s           | 1  | 36.38603  |
|          cait_m36_384           | 1  | 35.287686 |
|         visformer_small         | 1  | 34.641667 |
|           volo_d1_224           | 1  | 32.350415 |
|           resnest101e           | 1  | 31.892246 |
|       eca_botnext26ts_256       | 1  | 31.48073  |
|       tf_efficientnet_b0        | 1  | 31.36412  |
|            hrnet_w18            | 1  | 30.921626 |
|        res2net101_26w_4s        | 1  | 29.865931 |
|            nfnet_l0             | 1  | 28.828383 |
|          convnext_base          | 1  | 28.270064 |
|          inception_v3           | 1  | 27.180344 |
|       gluon_inception_v3        | 1  | 27.141572 |
|        adv_inception_v3         | 1  | 27.107787 |
|            tinynet_a            | 1  | 26.453568 |
|           convit_base           | 1  | 25.803646 |
|           res2next50            | 1  | 25.76454  |
|            pit_b_224            | 1  | 23.362273 |
|          botnet26t_256          | 1  | 22.703933 |
|             dla102              | 1  | 20.816743 |
|          cspdarknet53           | 1  | 20.793342 |
|            fbnetv3_b            | 1  | 20.579908 |
| deit_base_distilled_patch16_224 | 1  | 19.45338  |
|          gmixer_24_224          | 1  | 18.305001 |
|      vit_base_patch16_224       | 1  | 17.642473 |
|      mobilenetv3_large_100      | 1  | 17.587533 |
|        ese_vovnet19b_dw         | 1  | 17.488258 |
|          gmlp_s16_224           | 1  | 16.818247 |
|           regnety_002           | 1  | 14.203071 |
|          resmlp_12_224          | 1  | 13.843577 |
|      beit_base_patch16_224      | 1  | 13.795193 |
|            repvgg_a2            | 1  | 12.79628  |
|          mixer_b16_224          | 1  | 12.472169 |
|        convmixer_768_32         | 1  | 12.443871 |
|           selecsls42b           | 1  | 12.076811 |
|            lcnet_050            | 1  | 10.399254 |
|     swsl_resnext101_32x16d      | 1  | 9.445943  |
|           fbnetc_100            | 1  | 7.825978  |
|          spnasnet_100           | 1  | 7.783814  |
|            gernet_l             | 1  | 7.685368  |
|         mobilenetv2_100         | 1  | 7.651583  |
|           mnasnet_100           | 1  | 7.397758  |
+---------------------------------+----+-----------+

Peak Memory Compression Ratio

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|            nfnet_l0             | 1  | 0.913321 |
|        convmixer_768_32         | 1  | 0.908241 |
|          convnext_base          | 1  | 0.906022 |
|          pnasnet5large          | 1  | 0.900807 |
|          jx_nest_base           | 1  | 0.897237 |
|          resmlp_12_224          | 1  | 0.896455 |
|      beit_base_patch16_224      | 1  | 0.896131 |
|          cait_m36_384           | 1  | 0.894729 |
|           dm_nfnet_f0           | 1  | 0.892274 |
|      xcit_large_24_p8_224       | 1  | 0.891233 |
|        ese_vovnet19b_dw         | 1  | 0.88591  |
|           volo_d1_224           | 1  | 0.884538 |
|         mobilenetv2_100         | 1  | 0.884067 |
|           convit_base           | 1  | 0.883506 |
|  swin_base_patch4_window7_224   | 1  | 0.881881 |
|         visformer_small         | 1  | 0.879398 |
|         poolformer_m36          | 1  |  0.8786  |
|      vit_base_patch16_224       | 1  | 0.876355 |
|           mnasnet_100           | 1  | 0.875138 |
| deit_base_distilled_patch16_224 | 1  | 0.874623 |
|          mixer_b16_224          | 1  | 0.874537 |
|      mobilenetv3_large_100      | 1  | 0.873659 |
|          spnasnet_100           | 1  | 0.87214  |
|            pit_b_224            | 1  | 0.87138  |
|          gmixer_24_224          | 1  | 0.869179 |
|          gmlp_s16_224           | 1  | 0.869023 |
|        twins_pcpvt_base         | 1  | 0.86821  |
|           fbnetc_100            | 1  | 0.868179 |
|            lcnet_050            | 1  | 0.867397 |
|            fbnetv3_b            | 1  | 0.866302 |
|            tinynet_a            | 1  | 0.865279 |
|       tf_efficientnet_b0        | 1  | 0.864938 |
|       eca_botnext26ts_256       | 1  | 0.864643 |
|           mobilevit_s           | 1  | 0.863809 |
|        eca_halonext26ts         | 1  | 0.863452 |
|          botnet26t_256          | 1  | 0.858519 |
|           rexnet_100            | 1  | 0.857564 |
|           regnety_002           | 1  | 0.846561 |
|          ghostnet_100           | 1  | 0.84466  |
|            mixnet_l             | 1  | 0.840496 |
|        tnt_s_patch16_224        | 1  | 0.839699 |
|           tf_mixnet_l           | 1  | 0.836046 |
|        sebotnet33ts_256         | 1  | 0.833854 |
|         crossvit_9_240          | 1  | 0.833337 |
|            levit_128            | 1  | 0.831235 |
|             dpn107              | 1  | 0.826296 |
|          cspdarknet53           | 1  | 0.824505 |
|           res2next50            | 1  | 0.823062 |
|             dla102              | 1  | 0.810705 |
|        res2net50_14w_8s         | 1  | 0.810607 |
|          inception_v3           | 1  | 0.809075 |
|        adv_inception_v3         | 1  | 0.808783 |
|       gluon_inception_v3        | 1  | 0.808521 |
|            hrnet_w18            | 1  | 0.808187 |
|           resnest101e           | 1  | 0.802234 |
|           selecsls42b           | 1  | 0.79682  |
|            gernet_l             | 1  | 0.786631 |
|            repvgg_a2            | 1  | 0.775111 |
|        res2net101_26w_4s        | 1  | 0.774397 |
|     swsl_resnext101_32x16d      | 1  | 0.736202 |
+---------------------------------+----+----------+

Absolute latency (ms)

+---------------------------------+----+-------------+
|              name               | bs |  inductor   |
+---------------------------------+----+-------------+
|          cait_m36_384           | 1  | 1582.581035 |
|      xcit_large_24_p8_224       | 1  | 451.400227  |
|          pnasnet5large          | 1  |  124.42753  |
|          convnext_base          | 1  | 105.016659  |
|          jx_nest_base           | 1  | 100.130783  |
|        convmixer_768_32         | 1  |  99.978013  |
|           convit_base           | 1  |  96.745034  |
|     swsl_resnext101_32x16d      | 1  |  96.104345  |
| deit_base_distilled_patch16_224 | 1  |  84.519539  |
|  swin_base_patch4_window7_224   | 1  |  83.006148  |
|      beit_base_patch16_224      | 1  |  77.334049  |
|          mixer_b16_224          | 1  |  73.246213  |
|      vit_base_patch16_224       | 1  |  72.284632  |
|             dpn107              | 1  |  69.827386  |
|            pit_b_224            | 1  |  65.106567  |
|         poolformer_m36          | 1  |  52.10808   |
|           dm_nfnet_f0           | 1  |  50.221821  |
|        tnt_s_patch16_224        | 1  |  50.156871  |
|        twins_pcpvt_base         | 1  |  43.222116  |
|        sebotnet33ts_256         | 1  |  41.407541  |
|           volo_d1_224           | 1  |  39.94823   |
|            nfnet_l0             | 1  |  33.523866  |
|           resnest101e           | 1  |  31.03231   |
|        res2net101_26w_4s        | 1  |  27.86646   |
|          gmlp_s16_224           | 1  |  27.545436  |
|          gmixer_24_224          | 1  |  24.941699  |
|          resmlp_12_224          | 1  |  22.379232  |
|            hrnet_w18            | 1  |  22.050248  |
|           mobilevit_s           | 1  |  21.89713   |
|         visformer_small         | 1  |  21.399011  |
|       gluon_inception_v3        | 1  |  20.733642  |
|          inception_v3           | 1  |  20.718375  |
|        adv_inception_v3         | 1  |  20.642795  |
|             dla102              | 1  |  18.524101  |
|        eca_halonext26ts         | 1  |  18.322757  |
|           tf_mixnet_l           | 1  |  17.320142  |
|        res2net50_14w_8s         | 1  |  17.050629  |
|          cspdarknet53           | 1  |  16.918202  |
|            mixnet_l             | 1  |  16.792389  |
|           res2next50            | 1  |  14.939298  |
|       eca_botnext26ts_256       | 1  |  14.880216  |
|         crossvit_9_240          | 1  |  14.732967  |
|            repvgg_a2            | 1  |  12.471839  |
|            gernet_l             | 1  |  11.866284  |
|           selecsls42b           | 1  |  10.827379  |
|          botnet26t_256          | 1  |  10.807102  |
|       tf_efficientnet_b0        | 1  |  9.769603   |
|        ese_vovnet19b_dw         | 1  |  9.211577   |
|           rexnet_100            | 1  |   8.64734   |
|            fbnetv3_b            | 1  |  8.358876   |
|            tinynet_a            | 1  |  8.275024   |
|            levit_128            | 1  |  5.832481   |
|          ghostnet_100           | 1  |  5.083121   |
|           fbnetc_100            | 1  |  3.892688   |
|      mobilenetv3_large_100      | 1  |  3.861487   |
|          spnasnet_100           | 1  |  3.603876   |
|           mnasnet_100           | 1  |  3.242534   |
|         mobilenetv2_100         | 1  |  3.204066   |
|           regnety_002           | 1  |  3.086825   |
|            lcnet_050            | 1  |  1.511401   |
+---------------------------------+----+-------------+

@zxd1997066
Contributor

[dynamic] Performance Dashboard for float32 precision -- Single-Socket Multi-threads (2024-05-05 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward and backward pass for training, and of the forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

SW information:

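+-------------------+--------+------------------------------------------+
| SW                | Branch | Commit                                   |
+-------------------+--------+------------------------------------------+
| Pytorch           | main   | 6d30803d64953955df63da56833bf4eb52249aae |
| Torchbench        | main   | d6015d42                                 |
| torchaudio        | main   | 2.2.0a0+ea437b3                          |
| torchtext         | main   | 0.16.0a0+b0ebddc                         |
| torchvision       | main   | 0.19.0a0+06ad737                         |
| torchdata         | main   | 0.7.1a0+0790338                          |
| dynamo_benchmarks | main   | nightly                                  |
+-------------------+--------+------------------------------------------+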

HW information

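+------------------+----------------------------------------------------------------+
| Item             | Value                                                          |
+------------------+----------------------------------------------------------------+
| Manufacturer     | Amazon EC2                                                     |
| Product Name     | c6i.16xlarge                                                   |
| CPU Model        | Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz                  |
| Installed Memory | 128GB (1x128GB DDR4 3200 MT/s [Unknown])                       |
| OS               | Ubuntu 22.04.2 LTS                                             |
| Kernel           | 5.19.0-1022-aws                                                |
| Microcode        | 0xd000389                                                      |
| GCC              | gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0                      |
| GLIBC            | ldd (Ubuntu GLIBC 2.35-0ubuntu3.1) 2.35                        |
| Binutils         | GNU ld (GNU Binutils for Ubuntu) 2.38                          |
| Python           | Python 3.10.6                                                  |
| OpenSSL          | OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022) |
+------------------+----------------------------------------------------------------+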

Test command

export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
CORES=$(lscpu | grep Core | awk '{print $4}')
export OMP_NUM_THREADS=$CORES

python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--node_id 0" --devices=cpu --dtypes=float32 --inference --compilers=inductor --extra-args="--timeout 9000" 

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Pass rate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 91%, 71/78 | 100%, 46/46 | 100%, 60/60 |
+----------+------------+-------------+-------------+
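Each cell in the table above pairs a percentage with a passed/total count. A minimal sketch of how such a cell could be formatted is shown here; the function name `pass_rate` and the use of rounding to the nearest integer are assumptions for illustration, not the benchmark runner's actual code.

```python
def pass_rate(passed: int, total: int) -> str:
    """Format a 'NN%, passed/total' cell like the ones in the table above.

    Assumption: the percentage is rounded to the nearest integer.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    pct = round(100 * passed / total)
    return f"{pct}%, {passed}/{total}"


# Counts taken from the table above.
for suite, (passed, total) in {
    "torchbench": (71, 78),
    "huggingface": (46, 46),
    "timm_models": (60, 60),
}.items():
    print(suite, pass_rate(passed, total))
```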

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.43x    |    1.27x    |    1.84x    |
+----------+------------+-------------+-------------+
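The per-model speedups are normalized against native PyTorch, and the table above summarizes them with a geometric mean. The sketch below illustrates that aggregation under stated assumptions: speedup is taken as eager latency divided by compiled latency, and non-positive entries (such as the 0.0 rows for models that did not run) are dropped before averaging. Both function names are hypothetical, and the filtering rule is a guess rather than the runner's confirmed behavior.

```python
import math
from typing import Iterable


def speedup(eager_ms: float, compiled_ms: float) -> float:
    """Speedup of the compiled model relative to eager (assumed definition)."""
    return eager_ms / compiled_ms


def geomean_speedup(speedups: Iterable[float]) -> float:
    """Geometric mean of per-model speedups.

    Assumption: non-positive entries mark models that did not run
    and are excluded from the mean.
    """
    valid = [s for s in speedups if s > 0]
    if not valid:
        raise ValueError("no valid speedups to average")
    # Average in log space, then exponentiate.
    return math.exp(sum(math.log(s) for s in valid) / len(valid))


# A few per-model speedups in the style of the suite tables (illustrative subset).
sample = [2.863261, 1.801867, 0.999512, 0.0]
print(f"{geomean_speedup(sample):.2f}x")
```

The geometric mean is the usual choice for ratios because a 2x speedup and a 0.5x slowdown cancel to 1.0x, which an arithmetic mean would not reflect.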

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   26.46    |    41.79    |    51.55    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.91x    |    0.98x    |    0.99x    |
+----------+------------+-------------+-------------+
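Since the heading states that higher is better, one reading of the compression ratio is eager peak memory divided by compiled peak memory, so a value below 1.0 means the compiled model used more memory than eager. The sketch below encodes that reading; the definition, the function name, and the byte values are assumptions for illustration only.

```python
def compression_ratio(eager_peak_bytes: float, compiled_peak_bytes: float) -> float:
    """Peak memory compression ratio (assumed: eager peak / compiled peak)."""
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled peak memory must be positive")
    return eager_peak_bytes / compiled_peak_bytes


# Hypothetical peaks: the compiled run uses slightly more memory than eager,
# which gives a ratio below 1.0.
ratio = compression_ratio(eager_peak_bytes=9.1e9, compiled_peak_bytes=1.0e10)
print(f"{ratio:.2f}x")
```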

torchbench suite with float32 precision


Performance speedup

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_equation_of_state     | 1048576 | 11.259789 |
|           mnasnet1_0            |   32    | 2.863261  |
|          squeezenet1_1          |   16    | 2.786321  |
|       mobilenet_v3_large        |   32    |  2.77195  |
|        timm_efficientnet        |   64    | 2.741552  |
|          mobilenet_v2           |   16    |  2.65292  |
|       shufflenet_v2_x1_0        |   64    | 2.362526  |
|          timm_resnest           |   32    | 2.244287  |
|            resnet50             |   32    | 2.180039  |
|        phlippe_densenet         |   128   | 1.975375  |
|           densenet121           |   64    |  1.9622   |
|            resnet152            |   32    | 1.957765  |
|       doctr_det_predictor       |    1    | 1.897011  |
|           timm_nfnet            |   128   | 1.851854  |
|           timm_regnet           |   32    | 1.847643  |
|             hf_GPT2             |    1    | 1.801867  |
|         resnext50_32x4d         |    8    |  1.77818  |
|         phlippe_resnet          |   128   | 1.739694  |
|            resnet18             |    8    | 1.693146  |
|           timm_vovnet           |   32    | 1.658504  |
|             alexnet             |   128   | 1.576479  |
|      doctr_reco_predictor       |    1    | 1.535409  |
|          hf_Bert_large          |    1    | 1.530859  |
|            moondream            |    1    | 1.516325  |
|             yolov3              |    8    |  1.4984   |
|          hf_GPT2_large          |    1    | 1.498221  |
|        basic_gnn_edgecnn        |    1    | 1.486401  |
|            hf_Albert            |    1    | 1.483349  |
|          fastNLP_Bert           |    1    | 1.450559  |
|             hf_Bert             |    1    | 1.423694  |
|         LearningToPaint         |   96    | 1.405335  |
|          hf_Longformer          |    1    | 1.382357  |
|     functorch_maml_omniglot     |    1    | 1.325042  |
|              dcgan              |   256   |  1.32008  |
|          basic_gnn_gcn          |    1    | 1.314013  |
|          hf_DistilBert          |    1    | 1.297588  |
|              vgg16              |    4    | 1.285865  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 1.282445  |
|             hf_Bart             |    1    | 1.252564  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1.225122  |
|          pytorch_unet           |    1    |  1.19175  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1.189794  |
|           hf_T5_large           |    1    |  1.18934  |
|         pytorch_stargan         |   16    | 1.184644  |
|        hf_distil_whisper        |    1    |  1.18046  |
|         basic_gnn_sage          |    1    | 1.173509  |
|    detectron2_fcos_r_50_fpn     |    1    |  1.15534  |
|           hf_BigBird            |    1    | 1.152948  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 1.146208  |
|      torch_multimodal_clip      |   32    | 1.127587  |
|              hf_T5              |    1    | 1.126781  |
|              dlrm               |  2048   | 1.121335  |
|        soft_actor_critic        |   256   | 1.117451  |
|          basic_gnn_gin          |    1    | 1.111121  |
|          lennard_jones          |  1000   | 1.101497  |
|          maml_omniglot          |    5    | 1.093379  |
|          BERT_pytorch           |    2    | 1.085483  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1.069763  |
|     nvidia_deeprecommender      |   256   | 1.064113  |
| detectron2_fasterrcnn_r_50_dc5  |    1    |  1.04367  |
|     timm_vision_transformer     |   32    | 1.019912  |
|           hf_Reformer           |    1    | 1.014317  |
|             demucs              |    1    | 1.006776  |
|  timm_vision_transformer_large  |   32    | 1.004808  |
|   mobilenet_v2_quantized_qat    |   96    | 1.002467  |
|           tts_angular           |   64    | 0.999512  |
|       speech_transformer        |    1    | 0.999241  |
|     resnet50_quantized_qat      |   32    | 0.993544  |
|               drq               |    1    | 0.965703  |
|           hf_T5_base            |    1    | 0.808566  |
|       Background_Matting        |    1    | 0.798096  |
|              maml               |    1    | 0.745008  |
|     pyhpc_isoneutral_mixing     | 1048576 |  0.64341  |
|         opacus_cifar10          |   64    | 0.638045  |
|      functorch_dp_cifar10       |   64    | 0.591604  |
|        timm_efficientdet        |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+
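
The rows with a batch size of 0 and a speedup of 0.0 (timm_efficientdet, moco, DALLE2_pytorch) are the models listed as model_fail_to_load in the accuracy table, so they should be dropped before any aggregate is computed. A minimal sketch, assuming the table is available as plain text in the layout shown above and that a geometric mean is the aggregate of interest. The helper names `parse_table` and `geomean` are hypothetical and not part of the benchmark tooling:

```python
import math


def parse_table(text):
    """Parse rows of an ASCII table of the form '| name | bs | value |'."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders such as +-----+ and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 3 or cells[0] == "name":
            continue  # skip the header row and malformed lines
        name, bs, value = cells
        rows.append((name, int(bs), float(value)))
    return rows


def geomean(values):
    """Geometric mean of a non-empty sequence of positive values."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Three rows copied from the speedup table above.
sample = """
|          pytorch_unet           |    1    |  1.19175  |
|           hf_T5_base            |    1    | 0.808566  |
|              moco               |    0    |    0.0    |
"""

rows = parse_table(sample)
# Keep only models that actually ran (non-zero batch size and speedup).
valid = [value for _, bs, value in rows if bs > 0 and value > 0.0]
print(f"{len(valid)} of {len(rows)} rows used, geomean = {geomean(valid):.4f}")
```

Filtering on both columns is deliberate: a 0.0 speedup would make the logarithm undefined, and a batch size of 0 marks a model that never ran.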

Accuracy

+--------------------------------+---------+--------------------+
|              name              |   bs    |      inductor      |
+--------------------------------+---------+--------------------+
|       Background_Matting       |    1    |  pass_due_to_skip  |
|              maml              |    1    |  pass_due_to_skip  |
| timm_vision_transformer_large  |    4    |  pass_due_to_skip  |
|          hf_T5_large           |    4    |  pass_due_to_skip  |
|         hf_GPT2_large          |    4    |  pass_due_to_skip  |
|         basic_gnn_sage         |    1    |        pass        |
|      doctr_det_predictor       |    4    |        pass        |
|              dlrm              |    4    |        pass        |
|    detectron2_fcos_r_50_fpn    |    4    |        pass        |
|          densenet121           |    4    |        pass        |
|             demucs             |    1    |        pass        |
|             dcgan              |    4    |        pass        |
|         basic_gnn_gcn          |    1    |        pass        |
|         basic_gnn_gin          |    1    |        pass        |
|         lennard_jones          |    4    |        pass        |
|       basic_gnn_edgecnn        |    1    |        pass        |
|            alexnet             |    4    |        pass        |
|        LearningToPaint         |    4    |        pass        |
|          fastNLP_Bert          |    4    |        pass        |
|             llama              |    4    |        pass        |
|      doctr_reco_predictor      |    4    |        pass        |
|      functorch_dp_cifar10      |    4    |        pass        |
|              drq               |    1    |        pass        |
|         hf_DistilBert          |    4    |        pass        |
|       hf_distil_whisper        |    4    |        pass        |
|           hf_T5_base           |    4    |        pass        |
|             hf_T5              |    4    |        pass        |
|          hf_Reformer           |    4    |        pass        |
|    functorch_maml_omniglot     |    1    |        pass        |
|            hf_GPT2             |    2    |        pass        |
|         hf_Longformer          |    4    |        pass        |
|           hf_BigBird           |    4    |        pass        |
|         hf_Bert_large          |    4    |        pass        |
|            hf_Bert             |    4    |        pass        |
|            hf_Bart             |    4    |        pass        |
|           hf_Albert            |    4    |        pass        |
|             yolov3             |    4    |        pass        |
|         maml_omniglot          |    5    |        pass        |
|            resnet18            |    4    |        pass        |
|     resnet50_quantized_qat     |    4    |        pass        |
|        phlippe_densenet        |    4    |        pass        |
|        resnext50_32x4d         |    4    |        pass        |
|   mobilenet_v2_quantized_qat   |    4    |        pass        |
|       mobilenet_v3_large       |    4    |        pass        |
|           moondream            |    4    |        pass        |
|     nvidia_deeprecommender     |    4    |        pass        |
|           resnet152            |    4    |        pass        |
|         opacus_cifar10         |    4    |        pass        |
|          BERT_pytorch          |    4    |        pass        |
|         phlippe_resnet         |    4    |        pass        |
|    pyhpc_equation_of_state     |    4    |        pass        |
|    pyhpc_isoneutral_mixing     |    4    |        pass        |
| pyhpc_turbulent_kinetic_energy | 1048576 |        pass        |
|  pytorch_CycleGAN_and_pix2pix  |    1    |        pass        |
|        pytorch_stargan         |   16    |        pass        |
|          pytorch_unet          |    2    |        pass        |
|          mobilenet_v2          |    4    |        pass        |
|           mnasnet1_0           |    4    |        pass        |
|          timm_regnet           |    4    |        pass        |
|       shufflenet_v2_x1_0       |    4    |        pass        |
|       soft_actor_critic        |   256   |        pass        |
|       speech_transformer       |    1    |        pass        |
|         squeezenet1_1          |    4    |        pass        |
|            resnet50            |    4    |        pass        |
|           timm_nfnet           |    4    |        pass        |
|       timm_efficientnet        |    4    |        pass        |
|          timm_resnest          |    4    |        pass        |
|    timm_vision_transformer     |    4    |        pass        |
|          timm_vovnet           |    4    |        pass        |
|     torch_multimodal_clip      |    4    |        pass        |
|          tts_angular           |    4    |        pass        |
|             vgg16              |    4    |        pass        |
|       timm_efficientdet        |    0    | model_fail_to_load |
|              moco              |    0    | model_fail_to_load |
|         DALLE2_pytorch         |    0    | model_fail_to_load |
|          Super_SloMo           |    4    |    fail_to_run     |
|        vision_maskrcnn         |    1    |    fail_to_run     |
+--------------------------------+---------+--------------------+

Compilation latency (sec)

+---------------------------------+---------+------------+
|              name               |   bs    |  inductor  |
+---------------------------------+---------+------------+
|           densenet121           |   64    | 105.119911 |
|           hf_BigBird            |    1    |  76.78078  |
|    detectron2_fcos_r_50_fpn     |    1    | 70.218861  |
|  timm_vision_transformer_large  |   32    | 63.906405  |
|           hf_T5_large           |    1    | 62.565655  |
|           timm_nfnet            |   128   | 57.460152  |
|              maml               |    1    | 53.976927  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 53.390874  |
|          hf_Longformer          |    1    | 51.384736  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 51.073541  |
|           hf_Reformer           |    1    | 47.283824  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 46.548704  |
|        phlippe_densenet         |   128   | 45.703788  |
|           hf_T5_base            |    1    | 45.053875  |
| detectron2_fasterrcnn_r_50_dc5  |    1    |  44.28668  |
|      torch_multimodal_clip      |   32    | 40.623791  |
|     pyhpc_isoneutral_mixing     | 1048576 | 40.224404  |
|       speech_transformer        |    1    |  39.45063  |
|        timm_efficientnet        |   64    | 39.324307  |
|          hf_GPT2_large          |    1    |  36.41344  |
|             yolov3              |    8    | 35.947801  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 34.964477  |
|          BERT_pytorch           |    2    | 34.943084  |
|             demucs              |    1    | 34.711717  |
|            moondream            |    1    |  33.74711  |
|         opacus_cifar10          |   64    |  31.79436  |
|              hf_T5              |    1    | 31.118618  |
|        hf_distil_whisper        |    1    | 30.273707  |
|      functorch_dp_cifar10       |   64    | 30.071864  |
|     timm_vision_transformer     |   32    | 28.957859  |
|       mobilenet_v3_large        |   32    | 27.985055  |
|          timm_resnest           |   32    | 27.732891  |
|       shufflenet_v2_x1_0        |   64    | 26.803861  |
|       doctr_det_predictor       |    1    | 26.714076  |
|          hf_Bert_large          |    1    | 26.587203  |
|           timm_regnet           |   32    | 25.347458  |
|       Background_Matting        |    1    | 24.065178  |
|           timm_vovnet           |   32    | 23.075015  |
|          fastNLP_Bert           |    1    | 22.734635  |
|             hf_Bart             |    1    | 22.504599  |
|          pytorch_unet           |    1    |  21.50157  |
|         pytorch_stargan         |   16    | 20.994625  |
|            resnet152            |   32    | 20.492127  |
|            hf_Albert            |    1    | 20.470149  |
|             hf_GPT2             |    1    |  19.73514  |
|             hf_Bert             |    1    | 19.094729  |
|          hf_DistilBert          |    1    | 19.042033  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 18.415017  |
|          squeezenet1_1          |   16    | 17.666561  |
|              vgg16              |    4    |  14.32869  |
|      doctr_reco_predictor       |    1    | 13.633276  |
|          mobilenet_v2           |   16    | 12.719877  |
|            resnet50             |   32    | 12.609386  |
|         resnext50_32x4d         |    8    | 12.504521  |
|             alexnet             |   128   | 12.138367  |
|          basic_gnn_gcn          |    1    | 11.994028  |
|          basic_gnn_gin          |    1    | 11.342458  |
|         basic_gnn_sage          |    1    | 11.286637  |
|               drq               |    1    | 10.926106  |
|              dlrm               |  2048   | 10.821792  |
|           mnasnet1_0            |   32    | 10.335421  |
|            resnet18             |    8    | 10.091183  |
|         LearningToPaint         |   96    |  9.971251  |
|     functorch_maml_omniglot     |    1    |  9.764888  |
|        basic_gnn_edgecnn        |    1    |  9.282731  |
|     pyhpc_equation_of_state     | 1048576 |  9.236942  |
|     nvidia_deeprecommender      |   256   |  8.821833  |
|          maml_omniglot          |    5    |  8.687429  |
|         phlippe_resnet          |   128   |  8.413416  |
|        soft_actor_critic        |   256   |  8.040964  |
|          lennard_jones          |  1000   |  6.801684  |
|              dcgan              |   256   |  5.974187  |
|           tts_angular           |   64    |  5.597085  |
|   mobilenet_v2_quantized_qat    |   96    |  0.134938  |
|     resnet50_quantized_qat      |   32    |  0.096164  |
|        timm_efficientdet        |    0    |    0.0     |
|              moco               |    0    |    0.0     |
|         DALLE2_pytorch          |    0    |    0.0     |
+---------------------------------+---------+------------+

Peak Memory Compression Ratio

+---------------------------------+---------+----------+
|              name               |   bs    | inductor |
+---------------------------------+---------+----------+
|  timm_vision_transformer_large  |   32    | 0.996264 |
|           timm_nfnet            |   128   | 0.991579 |
|              dlrm               |  2048   | 0.988079 |
|           hf_T5_base            |    1    | 0.987627 |
|        timm_efficientnet        |   64    | 0.984834 |
|           timm_regnet           |   32    | 0.98285  |
|       Background_Matting        |    1    | 0.982228 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 0.981141 |
|             yolov3              |    8    | 0.980472 |
|             demucs              |    1    | 0.979896 |
|           densenet121           |   64    | 0.979329 |
|          pytorch_unet           |    1    | 0.978556 |
|            resnet152            |   32    | 0.978545 |
|     nvidia_deeprecommender      |   256   | 0.978513 |
|          hf_GPT2_large          |    1    | 0.977691 |
|      torch_multimodal_clip      |   32    | 0.97655  |
|           timm_vovnet           |   32    | 0.974033 |
|            resnet50             |   32    | 0.972851 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 0.972851 |
|          timm_resnest           |   32    | 0.971719 |
|         LearningToPaint         |   96    | 0.970699 |
|        basic_gnn_edgecnn        |    1    | 0.97067  |
|    detectron2_fcos_r_50_fpn     |    1    | 0.965887 |
|       doctr_det_predictor       |    1    | 0.964612 |
|           mnasnet1_0            |   32    | 0.963671 |
|     timm_vision_transformer     |   32    | 0.962957 |
|   mobilenet_v2_quantized_qat    |   96    | 0.962144 |
|       mobilenet_v3_large        |   32    | 0.960978 |
|     resnet50_quantized_qat      |   32    | 0.960893 |
|          mobilenet_v2           |   16    | 0.958658 |
|       shufflenet_v2_x1_0        |   64    | 0.956314 |
|             alexnet             |   128   | 0.954442 |
|           hf_BigBird            |    1    | 0.952208 |
|         resnext50_32x4d         |    8    | 0.950799 |
|        phlippe_densenet         |   128   | 0.946408 |
|         pytorch_stargan         |   16    | 0.944007 |
|              vgg16              |    4    | 0.936946 |
|          basic_gnn_gcn          |    1    | 0.935802 |
|      doctr_reco_predictor       |    1    | 0.933288 |
|          BERT_pytorch           |    2    | 0.931798 |
|           tts_angular           |   64    | 0.930894 |
|          squeezenet1_1          |   16    | 0.915829 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 0.912187 |
|        hf_distil_whisper        |    1    | 0.911927 |
|              dcgan              |   256   | 0.91156  |
|     pyhpc_equation_of_state     | 1048576 | 0.90912  |
|            resnet18             |    8    | 0.899893 |
|         phlippe_resnet          |   128   | 0.899238 |
|         opacus_cifar10          |   64    | 0.892653 |
|        soft_actor_critic        |   256   | 0.891667 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 0.885664 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 0.881651 |
|          lennard_jones          |  1000   | 0.870455 |
|          maml_omniglot          |    5    | 0.859275 |
|         basic_gnn_sage          |    1    | 0.858808 |
|     functorch_maml_omniglot     |    1    | 0.857598 |
|          basic_gnn_gin          |    1    | 0.856629 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.847716 |
|          fastNLP_Bert           |    1    | 0.843374 |
|      functorch_dp_cifar10       |   64    | 0.827464 |
|            moondream            |    1    | 0.824159 |
|       speech_transformer        |    1    | 0.818362 |
|          hf_Bert_large          |    1    | 0.810399 |
|           hf_T5_large           |    1    | 0.803549 |
|          hf_Longformer          |    1    | 0.799787 |
|              maml               |    1    | 0.797349 |
|             hf_Bert             |    1    | 0.794695 |
|            hf_Albert            |    1    | 0.792212 |
|               drq               |    1    | 0.769954 |
|              hf_T5              |    1    | 0.769263 |
|             hf_Bart             |    1    | 0.768443 |
|          hf_DistilBert          |    1    | 0.762155 |
|             hf_GPT2             |    1    | 0.760855 |
|           hf_Reformer           |    1    | 0.735063 |
|     pyhpc_isoneutral_mixing     | 1048576 | 0.691834 |
|        timm_efficientdet        |    0    |   0.0    |
|              moco               |    0    |   0.0    |
|         DALLE2_pytorch          |    0    |   0.0    |
+---------------------------------+---------+----------+
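
Every ratio in this table is below 1. Assuming the ratio is defined as eager peak memory divided by inductor peak memory (an assumption; the definition is not stated in this comment), a value below 1 means inductor used more peak memory than eager. A small sketch of that reading, with the hypothetical helper `inductor_memory_overhead`:

```python
def inductor_memory_overhead(compression_ratio):
    """Extra peak memory used by inductor, as a fraction of eager peak memory.

    Assumes compression_ratio = eager_peak_memory / inductor_peak_memory.
    """
    if compression_ratio <= 0.0:
        # 0.0 marks models that failed to load; no ratio is available.
        raise ValueError("compression ratio must be positive")
    return 1.0 / compression_ratio - 1.0


# Values copied from the table above.
for name, ratio in [
    ("timm_vision_transformer_large", 0.996264),
    ("pyhpc_isoneutral_mixing", 0.691834),
]:
    overhead = inductor_memory_overhead(ratio)
    print(f"{name}: {overhead * 100:.1f}% more peak memory than eager")
```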

Absolute latency (ms)

+---------------------------------+---------+-------------+
|              name               |   bs    |  inductor   |
+---------------------------------+---------+-------------+
|  timm_vision_transformer_large  |   32    | 4545.655149 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1439.062483 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1307.015621 |
|           hf_T5_base            |    1    | 1261.486334 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1176.424864 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1090.797069 |
|          hf_GPT2_large          |    1    | 559.106597  |
|           timm_nfnet            |   128   | 535.480325  |
|           hf_T5_large           |    1    | 403.168218  |
|            moondream            |    1    | 392.507465  |
|        hf_distil_whisper        |    1    | 355.542531  |
|       Background_Matting        |    1    | 347.687809  |
|          pytorch_unet           |    1    | 227.991672  |
|           timm_regnet           |   32    |  221.68743  |
|           densenet121           |   64    | 193.842501  |
|            resnet152            |   32    | 191.886386  |
|    detectron2_fcos_r_50_fpn     |    1    | 180.728347  |
|      torch_multimodal_clip      |   32    | 170.303282  |
|             yolov3              |    8    | 166.214755  |
|             demucs              |    1    | 145.339224  |
|           hf_BigBird            |    1    | 130.581829  |
|           timm_vovnet           |   32    | 117.023397  |
|     timm_vision_transformer     |   32    | 107.790727  |
|          hf_Bert_large          |    1    | 107.383024  |
|         pytorch_stargan         |   16    |  98.174538  |
|       doctr_det_predictor       |    1    |  87.51525   |
|            resnet50             |   32    |  75.647032  |
|          hf_Longformer          |    1    |  71.780177  |
|          timm_resnest           |   32    |  54.43942   |
|       speech_transformer        |    1    |  53.676188  |
|             hf_Bart             |    1    |  53.338148  |
|        timm_efficientnet        |   64    |  51.736104  |
|              maml               |    1    |  48.377572  |
|              hf_T5              |    1    |  44.104261  |
|             alexnet             |   128   |  43.747718  |
|   mobilenet_v2_quantized_qat    |   96    |  43.350053  |
|             hf_Bert             |    1    |  41.527544  |
|           hf_Reformer           |    1    |  37.234545  |
|         LearningToPaint         |   96    |  37.203817  |
|            hf_Albert            |    1    |  36.604295  |
|              vgg16              |    4    |  36.470228  |
|     nvidia_deeprecommender      |   256   |  34.286595  |
|          fastNLP_Bert           |    1    |  34.24526   |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  33.385612  |
|          BERT_pytorch           |    2    |  30.117327  |
|     pyhpc_isoneutral_mixing     | 1048576 |  29.606454  |
|          hf_DistilBert          |    1    |  27.663415  |
|     resnet50_quantized_qat      |   32    |  25.482375  |
| pyhpc_turbulent_kinetic_energy  | 1048576 |  25.109594  |
|         resnext50_32x4d         |    8    |  24.767427  |
|             hf_GPT2             |    1    |  24.409242  |
|        phlippe_densenet         |   128   |  21.474155  |
|           tts_angular           |   64    |  20.332387  |
|        basic_gnn_edgecnn        |    1    |  19.34921   |
|              dcgan              |   256   |  18.950832  |
|       shufflenet_v2_x1_0        |   64    |  17.041364  |
|           mnasnet1_0            |   32    |  15.233267  |
|       mobilenet_v3_large        |   32    |  14.862339  |
|      functorch_dp_cifar10       |   64    |  10.82307   |
|         opacus_cifar10          |   64    |  10.757343  |
|          basic_gnn_gcn          |    1    |  9.870074   |
|          mobilenet_v2           |   16    |  9.819882   |
|            resnet18             |    8    |  9.569042   |
|              dlrm               |  2048   |  6.713707   |
|          squeezenet1_1          |   16    |  6.212817   |
|         basic_gnn_sage          |    1    |  5.122462   |
|          basic_gnn_gin          |    1    |  4.705408   |
|         phlippe_resnet          |   128   |  4.485158   |
|      doctr_reco_predictor       |    1    |  3.523632   |
|     pyhpc_equation_of_state     | 1048576 |  1.110726   |
|               drq               |    1    |  0.950013   |
|        soft_actor_critic        |   256   |  0.585438   |
|          maml_omniglot          |    5    |  0.541928   |
|     functorch_maml_omniglot     |    1    |  0.515903   |
|          lennard_jones          |  1000   |  0.202489   |
|        timm_efficientdet        |    0    |     0.0     |
|         DALLE2_pytorch          |    0    |     0.0     |
|              moco               |    0    |     0.0     |
+---------------------------------+---------+-------------+
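
The speedup and absolute latency tables can be combined to estimate the eager-mode latency that is not listed here. This is a sketch under two assumptions: that the latency column is the inductor latency, and that speedup is eager latency divided by inductor latency. The function name `estimate_eager_latency_ms` is hypothetical:

```python
def estimate_eager_latency_ms(speedup, inductor_latency_ms):
    """Estimate eager latency, assuming speedup = eager / inductor."""
    return speedup * inductor_latency_ms


# (speedup, inductor latency in ms) copied from the torchbench tables above.
measurements = {
    "pytorch_unet": (1.19175, 227.991672),
    "hf_T5_base": (0.808566, 1261.486334),
}

for name, (speedup, latency_ms) in measurements.items():
    eager_ms = estimate_eager_latency_ms(speedup, latency_ms)
    print(f"{name}: inductor {latency_ms:.1f} ms, estimated eager {eager_ms:.1f} ms")
```

Under these assumptions a speedup below 1, as for hf_T5_base, gives an estimated eager latency lower than the inductor latency, which marks the model as a regression.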

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            XLNetLMHeadModel             |  8  | 5.488408 |
|     MobileBertForQuestionAnswering      | 128 | 1.748237 |
|      GPT2ForSequenceClassification      |  4  | 1.74354  |
|           ElectraForCausalLM            | 32  | 1.684313 |
|       ElectraForQuestionAnswering       | 64  | 1.648586 |
|          MobileBertForMaskedLM          | 128 | 1.585642 |
|               DistillGPT2               | 16  | 1.487109 |
|            YituTechConvBert             | 16  | 1.417392 |
|      DebertaV2ForQuestionAnswering      |  1  | 1.413709 |
|       RobertaForQuestionAnswering       | 16  | 1.403541 |
|    LayoutLMForSequenceClassification    | 16  | 1.396844 |
|        BertForQuestionAnswering         | 16  | 1.380101 |
|           RobertaForCausalLM            | 16  | 1.37427  |
|               GoogleFnet                | 16  | 1.34341  |
|           LayoutLMForMaskedLM           | 16  | 1.332131 |
|             BertForMaskedLM             | 16  | 1.325126 |
|                CamemBert                | 16  | 1.32276  |
|          AllenaiLongformerBase          |  4  | 1.295624 |
|    MegatronBertForQuestionAnswering     |  8  | 1.282592 |
|         MegatronBertForCausalLM         |  4  | 1.265992 |
|       DebertaForQuestionAnswering       | 16  | 1.21512  |
|     PLBartForConditionalGeneration      |  4  | 1.211398 |
|      MBartForConditionalGeneration      |  2  | 1.175296 |
|             OPTForCausalLM              |  2  | 1.166626 |
|                 T5Small                 |  4  | 1.166263 |
|       T5ForConditionalGeneration        |  4  | 1.165215 |
|           DebertaForMaskedLM            |  8  | 1.162561 |
|       MT5ForConditionalGeneration       | 16  | 1.155344 |
| BlenderbotSmallForConditionalGeneration | 64  | 1.137212 |
|            AlbertForMaskedLM            |  4  | 1.129024 |
|       AlbertForQuestionAnswering        |  4  | 1.128793 |
|          DistilBertForMaskedLM          | 128 | 1.093849 |
|     DistilBertForQuestionAnswering      | 256 | 1.079065 |
|         Speech2Text2ForCausalLM         | 256 | 1.076358 |
|             XGLMForCausalLM             |  8  | 1.071553 |
|       BlenderbotSmallForCausalLM        | 64  | 1.071454 |
|     M2M100ForConditionalGeneration      | 16  | 1.070537 |
|      BartForConditionalGeneration       |  2  | 1.063976 |
|          DebertaV2ForMaskedLM           |  2  | 1.059209 |
|            PLBartForCausalLM            |  8  | 1.046927 |
|            TrOCRForCausalLM             | 32  | 1.04052  |
|            MBartForCausalLM             |  4  | 1.037491 |
|     PegasusForConditionalGeneration     | 32  | 1.035855 |
|          BlenderbotForCausalLM          |  4  | 1.022194 |
|           PegasusForCausalLM            | 32  | 1.019686 |
|             BartForCausalLM             |  4  | 1.019186 |
+-----------------------------------------+-----+----------+

Accuracy

+-----------------------------------------+----+------------------+
|                  name                   | bs |     inductor     |
+-----------------------------------------+----+------------------+
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip |
|                CamemBert                | 1  |       pass       |
|       AlbertForQuestionAnswering        | 1  |       pass       |
|           DebertaForMaskedLM            | 1  |       pass       |
|          AllenaiLongformerBase          | 1  |       pass       |
|             BartForCausalLM             | 1  |       pass       |
|      BartForConditionalGeneration       | 1  |       pass       |
|             BertForMaskedLM             | 1  |       pass       |
|        BertForQuestionAnswering         | 1  |       pass       |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |
|       DebertaForQuestionAnswering       | 1  |       pass       |
|           LayoutLMForMaskedLM           | 1  |       pass       |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |
|          DistilBertForMaskedLM          | 1  |       pass       |
|     DistilBertForQuestionAnswering      | 1  |       pass       |
|               DistillGPT2               | 1  |       pass       |
|           ElectraForCausalLM            | 1  |       pass       |
|       ElectraForQuestionAnswering       | 1  |       pass       |
|      GPT2ForSequenceClassification      | 1  |       pass       |
|               GoogleFnet                | 1  |       pass       |
|    LayoutLMForSequenceClassification    | 1  |       pass       |
|            MBartForCausalLM             | 1  |       pass       |
|            XLNetLMHeadModel             | 1  |       pass       |
|             XGLMForCausalLM             | 1  |       pass       |
|            AlbertForMaskedLM            | 1  |       pass       |
|      MBartForConditionalGeneration      | 1  |       pass       |
|       MT5ForConditionalGeneration       | 1  |       pass       |
|         MegatronBertForCausalLM         | 1  |       pass       |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |
|          MobileBertForMaskedLM          | 1  |       pass       |
|     MobileBertForQuestionAnswering      | 1  |       pass       |
|             OPTForCausalLM              | 1  |       pass       |
|            PLBartForCausalLM            | 1  |       pass       |
|     PLBartForConditionalGeneration      | 1  |       pass       |
|           PegasusForCausalLM            | 1  |       pass       |
|     M2M100ForConditionalGeneration      | 1  |       pass       |
|     PegasusForConditionalGeneration     | 1  |       pass       |
|           RobertaForCausalLM            | 1  |       pass       |
|       RobertaForQuestionAnswering       | 1  |       pass       |
|         Speech2Text2ForCausalLM         | 1  |       pass       |
|       T5ForConditionalGeneration        | 1  |       pass       |
|                 T5Small                 | 1  |       pass       |
|            TrOCRForCausalLM             | 1  |       pass       |
|            YituTechConvBert             | 1  |       pass       |
+-----------------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------------+-----+------------+
|                  name                   | bs  |  inductor  |
+-----------------------------------------+-----+------------+
|          AllenaiLongformerBase          |  4  | 132.045769 |
|     PegasusForConditionalGeneration     | 32  | 72.403133  |
|          MobileBertForMaskedLM          | 128 | 70.368593  |
|      MBartForConditionalGeneration      |  2  | 69.682957  |
|     MobileBertForQuestionAnswering      | 128 | 69.049428  |
|     M2M100ForConditionalGeneration      | 16  | 68.193052  |
|       MT5ForConditionalGeneration       | 16  | 64.858085  |
|          BlenderbotForCausalLM          |  4  | 58.151578  |
|             XGLMForCausalLM             |  8  |  54.88697  |
|       T5ForConditionalGeneration        |  4  | 54.130868  |
|                 T5Small                 |  4  | 53.961608  |
| BlenderbotSmallForConditionalGeneration | 64  | 51.880585  |
|      BartForConditionalGeneration       |  2  | 49.969584  |
|          DebertaV2ForMaskedLM           |  2  | 49.156614  |
|    MegatronBertForQuestionAnswering     |  8  | 47.457412  |
|         MegatronBertForCausalLM         |  4  | 47.176241  |
|            YituTechConvBert             | 16  | 45.660254  |
|            XLNetLMHeadModel             |  8  | 43.841092  |
|     PLBartForConditionalGeneration      |  4  | 43.121576  |
|             OPTForCausalLM              |  2  | 39.381942  |
|           PegasusForCausalLM            | 32  | 37.714499  |
|            MBartForCausalLM             |  4  | 35.943219  |
|       DebertaForQuestionAnswering       | 16  |  33.3943   |
|           DebertaForMaskedLM            |  8  | 33.024666  |
|            TrOCRForCausalLM             | 32  | 32.556474  |
|      DebertaV2ForQuestionAnswering      |  1  | 32.159934  |
|      GPT2ForSequenceClassification      |  4  | 31.151968  |
|           RobertaForCausalLM            | 16  |  29.3894   |
|                CamemBert                | 16  | 28.636269  |
|       RobertaForQuestionAnswering       | 16  | 28.463069  |
|           ElectraForCausalLM            | 32  | 28.090161  |
|             BertForMaskedLM             | 16  | 27.388047  |
|       AlbertForQuestionAnswering        |  4  | 27.337193  |
|           LayoutLMForMaskedLM           | 16  | 27.316192  |
|       ElectraForQuestionAnswering       | 64  | 27.300694  |
|            AlbertForMaskedLM            |  4  | 27.045328  |
|        BertForQuestionAnswering         | 16  | 27.028762  |
|    LayoutLMForSequenceClassification    | 16  | 26.843926  |
|       BlenderbotSmallForCausalLM        | 64  | 25.764393  |
|     DistilBertForQuestionAnswering      | 256 | 25.728194  |
|               DistillGPT2               | 16  | 25.594175  |
|          DistilBertForMaskedLM          | 128 |  25.17588  |
|             BartForCausalLM             |  4  | 24.971253  |
|         Speech2Text2ForCausalLM         | 256 | 23.849432  |
|            PLBartForCausalLM            |  8  | 23.635621  |
|               GoogleFnet                | 16  | 21.449586  |
+-----------------------------------------+-----+------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            AlbertForMaskedLM            |  4  | 0.994045 |
|       AlbertForQuestionAnswering        |  4  | 0.994026 |
|     DistilBertForQuestionAnswering      | 256 | 0.993085 |
|            TrOCRForCausalLM             | 32  | 0.992353 |
|               DistillGPT2               | 16  | 0.992207 |
|           RobertaForCausalLM            | 16  | 0.992127 |
|          DistilBertForMaskedLM          | 128 | 0.991948 |
|             OPTForCausalLM              |  2  | 0.991702 |
|               GoogleFnet                | 16  | 0.991279 |
|       ElectraForQuestionAnswering       | 64  | 0.990922 |
|           ElectraForCausalLM            | 32  | 0.990922 |
|                CamemBert                | 16  | 0.990837 |
|             BertForMaskedLM             | 16  | 0.990823 |
|            PLBartForCausalLM            |  8  | 0.99064  |
|           LayoutLMForMaskedLM           | 16  | 0.990205 |
|    MegatronBertForQuestionAnswering     |  8  | 0.990105 |
|            YituTechConvBert             | 16  | 0.989613 |
|            MBartForCausalLM             |  4  | 0.989548 |
|     PegasusForConditionalGeneration     | 32  | 0.988955 |
|        BertForQuestionAnswering         | 16  | 0.988844 |
|       DebertaForQuestionAnswering       | 16  | 0.98872  |
|    LayoutLMForSequenceClassification    | 16  | 0.988694 |
|       RobertaForQuestionAnswering       | 16  | 0.988686 |
|         Speech2Text2ForCausalLM         | 256 | 0.988157 |
| BlenderbotSmallForConditionalGeneration | 64  | 0.987915 |
|      GPT2ForSequenceClassification      |  4  | 0.987814 |
|     PLBartForConditionalGeneration      |  4  | 0.987488 |
|           PegasusForCausalLM            | 32  | 0.987347 |
|      MBartForConditionalGeneration      |  2  | 0.98717  |
|       BlenderbotSmallForCausalLM        | 64  | 0.986766 |
|             BartForCausalLM             |  4  | 0.985744 |
|          BlenderbotForCausalLM          |  4  | 0.985279 |
|           DebertaForMaskedLM            |  8  | 0.985114 |
|         MegatronBertForCausalLM         |  4  | 0.984346 |
|      BartForConditionalGeneration       |  2  | 0.983468 |
|          MobileBertForMaskedLM          | 128 | 0.982608 |
|       MT5ForConditionalGeneration       | 16  | 0.982569 |
|            XLNetLMHeadModel             |  8  | 0.982202 |
|                 T5Small                 |  4  | 0.980933 |
|       T5ForConditionalGeneration        |  4  | 0.980867 |
|     M2M100ForConditionalGeneration      | 16  | 0.97781  |
|     MobileBertForQuestionAnswering      | 128 | 0.975064 |
|          DebertaV2ForMaskedLM           |  2  | 0.974351 |
|          AllenaiLongformerBase          |  4  | 0.972315 |
|             XGLMForCausalLM             |  8  | 0.971558 |
|      DebertaV2ForQuestionAnswering      |  1  | 0.869538 |
+-----------------------------------------+-----+----------+

Absolute latency (ms)

+-----------------------------------------+-----+-------------+
|                  name                   | bs  |  inductor   |
+-----------------------------------------+-----+-------------+
|            AlbertForMaskedLM            |  4  | 2706.049141 |
|       AlbertForQuestionAnswering        |  4  | 2692.116313 |
|            XLNetLMHeadModel             |  8  | 1324.900845 |
|     PegasusForConditionalGeneration     | 32  | 1012.263144 |
|            TrOCRForCausalLM             | 32  | 986.780975  |
|     DistilBertForQuestionAnswering      | 256 | 896.977797  |
|    MegatronBertForQuestionAnswering     |  8  |   800.147   |
|            MBartForCausalLM             |  4  | 688.028529  |
|      MBartForConditionalGeneration      |  2  | 687.116678  |
|          BlenderbotForCausalLM          |  4  | 684.804834  |
|          DistilBertForMaskedLM          | 128 | 678.828344  |
|          DebertaV2ForMaskedLM           |  2  | 626.457659  |
|           RobertaForCausalLM            | 16  | 614.866698  |
|     M2M100ForConditionalGeneration      | 16  | 610.452307  |
|      BartForConditionalGeneration       |  2  | 609.317822  |
|             OPTForCausalLM              |  2  | 604.127836  |
|            YituTechConvBert             | 16  | 581.181631  |
|                CamemBert                | 16  | 573.319803  |
|             BertForMaskedLM             | 16  | 567.041286  |
|           LayoutLMForMaskedLM           | 16  | 566.574055  |
|          AllenaiLongformerBase          |  4  | 544.589196  |
|             BartForCausalLM             |  4  | 537.215971  |
|       DebertaForQuestionAnswering       | 16  | 537.036909  |
|            PLBartForCausalLM            |  8  | 515.910824  |
|           PegasusForCausalLM            | 32  | 498.887705  |
| BlenderbotSmallForConditionalGeneration | 64  | 496.033675  |
|     PLBartForConditionalGeneration      |  4  | 484.222161  |
|         MegatronBertForCausalLM         |  4  | 458.166767  |
|        BertForQuestionAnswering         | 16  |  454.25486  |
|    LayoutLMForSequenceClassification    | 16  | 452.289759  |
|       RobertaForQuestionAnswering       | 16  | 439.674945  |
|               GoogleFnet                | 16  | 415.082058  |
|          MobileBertForMaskedLM          | 128 | 403.327641  |
|               DistillGPT2               | 16  | 390.518986  |
|             XGLMForCausalLM             |  8  | 388.352512  |
|           DebertaForMaskedLM            |  8  | 378.465147  |
|       ElectraForQuestionAnswering       | 64  | 339.150558  |
|       T5ForConditionalGeneration        |  4  |  336.10845  |
|                 T5Small                 |  4  | 335.550979  |
|         Speech2Text2ForCausalLM         | 256 | 283.411883  |
|      GPT2ForSequenceClassification      |  4  | 282.272886  |
|       BlenderbotSmallForCausalLM        | 64  | 281.570356  |
|           ElectraForCausalLM            | 32  | 260.621014  |
|     MobileBertForQuestionAnswering      | 128 | 250.341074  |
|       MT5ForConditionalGeneration       | 16  | 230.384096  |
|      DebertaV2ForQuestionAnswering      |  1  | 228.207349  |
+-----------------------------------------+-----+-------------+

timm_models suite with float32 precision


Performance speedup

+---------------------------------+------+----------+
|              name               |  bs  | inductor |
+---------------------------------+------+----------+
|           fbnetc_100            | 512  | 3.943099 |
|           mnasnet_100           | 512  | 3.851247 |
|            lcnet_050            | 256  | 3.749292 |
|         mobilenetv2_100         | 128  | 3.700808 |
|      mobilenetv3_large_100      | 512  |  3.6087  |
|          spnasnet_100           | 128  | 3.44228  |
|            fbnetv3_b            | 256  | 3.409014 |
|           regnety_002           | 1024 | 3.265449 |
|           rexnet_100            | 256  | 3.048194 |
|       tf_efficientnet_b0        | 128  | 2.864606 |
|            tinynet_a            | 128  | 2.725792 |
|        ese_vovnet19b_dw         | 256  | 2.616952 |
|          botnet26t_256          | 128  | 2.548142 |
|          pnasnet5large          |  16  | 2.509901 |
|            hrnet_w18            | 128  | 2.506645 |
|           res2next50            | 128  | 2.356142 |
|          ghostnet_100           | 512  | 2.331404 |
|       eca_botnext26ts_256       | 128  | 2.303153 |
|       gluon_inception_v3        | 256  | 2.272313 |
|          inception_v3           | 128  | 2.225342 |
|           resnest101e           |  64  | 2.209786 |
|        eca_halonext26ts         | 128  | 2.187441 |
|        adv_inception_v3         | 128  | 2.187339 |
|             dla102              | 128  | 2.186889 |
|        res2net50_14w_8s         | 128  | 2.097236 |
|        res2net101_26w_4s        | 128  | 2.090369 |
|            repvgg_a2            | 128  | 2.050954 |
|          cspdarknet53           |  64  | 2.036805 |
|            nfnet_l0             | 128  | 1.989183 |
|        convmixer_768_32         |  32  | 1.953507 |
|            gernet_l             | 128  | 1.883343 |
|           dm_nfnet_f0           | 128  | 1.852381 |
|           tf_mixnet_l           | 128  | 1.831815 |
|           selecsls42b           | 128  | 1.773016 |
|        sebotnet33ts_256         |  64  | 1.754239 |
|            mixnet_l             | 128  | 1.657897 |
|         visformer_small         | 128  | 1.648058 |
|         poolformer_m36          |  64  | 1.64343  |
|           volo_d1_224           |  64  | 1.605101 |
|     swsl_resnext101_32x16d      |  32  | 1.570148 |
|             dpn107              |  64  | 1.480236 |
|            levit_128            | 1024 | 1.453024 |
|           mobilevit_s           |  64  | 1.430767 |
|          gmlp_s16_224           | 128  | 1.195283 |
|          resmlp_12_224          | 128  | 1.186141 |
|      xcit_large_24_p8_224       |  16  | 1.176684 |
|           convit_base           |  64  | 1.165255 |
|          gmixer_24_224          | 128  | 1.142726 |
|          cait_m36_384           |  4   | 1.120068 |
|  swin_base_patch4_window7_224   |  64  | 1.099129 |
|        tnt_s_patch16_224        | 128  | 1.087625 |
|        twins_pcpvt_base         | 128  | 1.069288 |
|      beit_base_patch16_224      |  64  | 1.053041 |
|          mixer_b16_224          | 128  | 1.051456 |
|          convnext_base          |  64  | 1.047041 |
|      vit_base_patch16_224       |  64  | 1.025853 |
|            pit_b_224            |  64  | 1.024879 |
| deit_base_distilled_patch16_224 |  64  | 1.022862 |
|          jx_nest_base           |  32  | 1.013499 |
|         crossvit_9_240          | 256  | 1.001284 |
+---------------------------------+------+----------+

Accuracy

+---------------------------------+----+-------------+
|              name               | bs |  inductor   |
+---------------------------------+----+-------------+
|        adv_inception_v3         | 8  |    pass     |
|      beit_base_patch16_224      | 8  |    pass     |
|          botnet26t_256          | 8  |    pass     |
|          cait_m36_384           | 8  |    pass     |
|           convit_base           | 8  |    pass     |
|        convmixer_768_32         | 8  |    pass     |
|          convnext_base          | 8  |    pass     |
|         crossvit_9_240          | 8  |    pass     |
|          cspdarknet53           | 8  |    pass     |
| deit_base_distilled_patch16_224 | 8  |    pass     |
|             dla102              | 8  |    pass     |
|           dm_nfnet_f0           | 8  |    pass     |
|             dpn107              | 8  |    pass     |
|       eca_botnext26ts_256       | 8  |    pass     |
|        eca_halonext26ts         | 8  |    pass     |
|        ese_vovnet19b_dw         | 8  |    pass     |
|           fbnetc_100            | 8  |    pass     |
|            fbnetv3_b            | 8  |    pass     |
|            gernet_l             | 8  |    pass     |
|          ghostnet_100           | 8  |    pass     |
|       gluon_inception_v3        | 8  |    pass     |
|          gmixer_24_224          | 8  |    pass     |
|          gmlp_s16_224           | 8  |    pass     |
|            hrnet_w18            | 8  |    pass     |
|          inception_v3           | 8  |    pass     |
|          jx_nest_base           | 8  |    pass     |
|            lcnet_050            | 8  |    pass     |
|            levit_128            | 8  |    pass     |
|      xcit_large_24_p8_224       | 8  |    pass     |
|          mixer_b16_224          | 8  |    pass     |
|            mixnet_l             | 8  |    pass     |
|           mnasnet_100           | 8  |    pass     |
|         mobilenetv2_100         | 8  |    pass     |
|      mobilenetv3_large_100      | 8  |    pass     |
|           mobilevit_s           | 8  |    pass     |
|            nfnet_l0             | 8  |    pass     |
|            pit_b_224            | 8  |    pass     |
|          pnasnet5large          | 8  |    pass     |
|         poolformer_m36          | 8  |    pass     |
|           regnety_002           | 8  |    pass     |
|            repvgg_a2            | 8  |    pass     |
|        res2net101_26w_4s        | 8  |    pass     |
|        res2net50_14w_8s         | 8  |    pass     |
|           res2next50            | 8  |    pass     |
|          resmlp_12_224          | 8  |    pass     |
|           resnest101e           | 8  |    pass     |
|           rexnet_100            | 8  |    pass     |
|        sebotnet33ts_256         | 8  |    pass     |
|           selecsls42b           | 8  |    pass     |
|          spnasnet_100           | 8  |    pass     |
|  swin_base_patch4_window7_224   | 8  |    pass     |
|     swsl_resnext101_32x16d      | 8  |    pass     |
|       tf_efficientnet_b0        | 8  |    pass     |
|           tf_mixnet_l           | 8  |    pass     |
|            tinynet_a            | 8  |    pass     |
|        tnt_s_patch16_224        | 8  |    pass     |
|        twins_pcpvt_base         | 8  |    pass     |
|         visformer_small         | 8  |    pass     |
|      vit_base_patch16_224       | 8  |    pass     |
|           volo_d1_224           | 8  |    pass     |
|         coat_lite_mini          | 8  | fail_to_run |
+---------------------------------+----+-------------+

Compilation latency (sec)

+---------------------------------+------+------------+
|              name               |  bs  |  inductor  |
+---------------------------------+------+------------+
|  swin_base_patch4_window7_224   |  64  | 130.908683 |
|          pnasnet5large          |  16  | 108.393319 |
|           mobilevit_s           |  64  | 98.395692  |
|           tf_mixnet_l           | 128  | 98.227384  |
|      xcit_large_24_p8_224       |  16  |  95.39715  |
|          cait_m36_384           |  4   |  95.37439  |
|        twins_pcpvt_base         | 128  | 94.327167  |
|          jx_nest_base           |  32  | 88.374169  |
|             dpn107              |  64  | 87.647816  |
|        tnt_s_patch16_224        | 128  | 81.805122  |
|           volo_d1_224           |  64  | 78.161697  |
|            levit_128            | 1024 | 76.221297  |
|         crossvit_9_240          | 256  | 74.737096  |
|           rexnet_100            | 256  | 73.886613  |
|        res2net50_14w_8s         | 128  | 73.624459  |
|            mixnet_l             | 128  | 72.341749  |
|        eca_halonext26ts         | 128  | 72.143713  |
|         poolformer_m36          |  64  |  70.20761  |
|        sebotnet33ts_256         |  64  | 70.177806  |
|          ghostnet_100           | 512  |  67.93474  |
|           dm_nfnet_f0           | 128  |  60.34578  |
|           convit_base           |  64  |  59.5111   |
|            hrnet_w18            | 128  | 58.779232  |
|       eca_botnext26ts_256       | 128  | 57.391923  |
|          convnext_base          |  64  | 55.150342  |
|        res2net101_26w_4s        | 128  | 53.379047  |
|       tf_efficientnet_b0        | 128  | 49.351562  |
|            nfnet_l0             | 128  | 47.200433  |
|            pit_b_224            |  64  | 45.710197  |
|          gmixer_24_224          | 128  |  44.28057  |
|       gluon_inception_v3        | 256  | 43.953614  |
|          gmlp_s16_224           | 128  | 43.605377  |
|           resnest101e           |  64  | 43.184925  |
|            fbnetv3_b            | 256  | 42.887978  |
|          botnet26t_256          | 128  | 42.767785  |
|           res2next50            | 128  | 42.350229  |
|          inception_v3           | 128  | 42.080731  |
|        adv_inception_v3         | 128  | 42.028478  |
|            tinynet_a            | 128  | 41.846292  |
|         visformer_small         | 128  | 35.663283  |
|             dla102              | 128  | 33.291449  |
|          cspdarknet53           |  64  | 32.231495  |
| deit_base_distilled_patch16_224 |  64  | 30.463404  |
|      vit_base_patch16_224       |  64  | 30.344233  |
|          mixer_b16_224          | 128  | 30.331557  |
|      mobilenetv3_large_100      | 512  | 30.198185  |
|        ese_vovnet19b_dw         | 256  | 29.569399  |
|      beit_base_patch16_224      |  64  | 25.750071  |
|           regnety_002           | 1024 | 22.320963  |
|        convmixer_768_32         |  32  | 21.763003  |
|          resmlp_12_224          | 128  | 21.114816  |
|            repvgg_a2            | 128  | 17.630771  |
|           selecsls42b           | 128  | 17.535685  |
|            lcnet_050            | 256  | 17.353313  |
|     swsl_resnext101_32x16d      |  32  | 16.525342  |
|         mobilenetv2_100         | 128  | 13.310079  |
|          spnasnet_100           | 128  | 11.670055  |
|            gernet_l             | 128  | 11.343919  |
|           fbnetc_100            | 512  | 10.718542  |
|           mnasnet_100           | 512  | 10.046169  |
+---------------------------------+------+------------+

Peak Memory Compression Ratio

+---------------------------------+------+----------+
|              name               |  bs  | inductor |
+---------------------------------+------+----------+
|        ese_vovnet19b_dw         | 256  | 0.997678 |
|           fbnetc_100            | 512  | 0.996914 |
|      mobilenetv3_large_100      | 512  | 0.996266 |
|           dm_nfnet_f0           | 128  | 0.996011 |
|           regnety_002           | 1024 | 0.995824 |
|           mnasnet_100           | 512  | 0.995804 |
|            fbnetv3_b            | 256  | 0.995787 |
|          convnext_base          |  64  | 0.995777 |
|          ghostnet_100           | 512  | 0.99574  |
|            levit_128            | 1024 | 0.994806 |
|        res2net101_26w_4s        | 128  | 0.994455 |
|       eca_botnext26ts_256       | 128  | 0.994264 |
|        eca_halonext26ts         | 128  | 0.993999 |
|       gluon_inception_v3        | 256  | 0.993879 |
|             dpn107              |  64  | 0.993856 |
|           rexnet_100            | 256  | 0.993817 |
|             dla102              | 128  | 0.993677 |
|        twins_pcpvt_base         | 128  | 0.993431 |
|           res2next50            | 128  | 0.993389 |
|           tf_mixnet_l           | 128  | 0.993303 |
|          mixer_b16_224          | 128  | 0.993289 |
|           convit_base           |  64  | 0.99306  |
|        convmixer_768_32         |  32  | 0.992995 |
|      xcit_large_24_p8_224       |  16  | 0.99296  |
|          gmixer_24_224          | 128  | 0.992897 |
|          gmlp_s16_224           | 128  | 0.992839 |
|       tf_efficientnet_b0        | 128  | 0.992736 |
|        res2net50_14w_8s         | 128  | 0.992704 |
|          botnet26t_256          | 128  | 0.992509 |
|          cspdarknet53           |  64  | 0.992404 |
|         visformer_small         | 128  | 0.992112 |
|      beit_base_patch16_224      |  64  | 0.991982 |
|            mixnet_l             | 128  | 0.991776 |
|            gernet_l             | 128  | 0.991736 |
|            nfnet_l0             | 128  | 0.991588 |
|        adv_inception_v3         | 128  | 0.991358 |
|          inception_v3           | 128  | 0.991228 |
|           resnest101e           |  64  | 0.991081 |
|        sebotnet33ts_256         |  64  | 0.990891 |
|           mobilevit_s           |  64  | 0.990745 |
|          pnasnet5large          |  16  | 0.989736 |
|           selecsls42b           | 128  | 0.98965  |
|         mobilenetv2_100         | 128  | 0.989459 |
|            pit_b_224            |  64  | 0.989299 |
|          resmlp_12_224          | 128  | 0.989099 |
|          cait_m36_384           |  4   | 0.988893 |
|        tnt_s_patch16_224        | 128  | 0.98884  |
|      vit_base_patch16_224       |  64  | 0.988773 |
|         poolformer_m36          |  64  | 0.988565 |
|            hrnet_w18            | 128  | 0.988412 |
|            tinynet_a            | 128  | 0.988336 |
| deit_base_distilled_patch16_224 |  64  | 0.987815 |
|  swin_base_patch4_window7_224   |  64  | 0.987625 |
|          spnasnet_100           | 128  | 0.987566 |
|     swsl_resnext101_32x16d      |  32  | 0.987196 |
|            lcnet_050            | 256  | 0.985657 |
|            repvgg_a2            | 128  | 0.984332 |
|           volo_d1_224           |  64  | 0.984026 |
|          jx_nest_base           |  32  | 0.983562 |
|         crossvit_9_240          | 256  | 0.974483 |
+---------------------------------+------+----------+

Absolute latency (ms)

+---------------------------------+------+-------------+
|              name               |  bs  |  inductor   |
+---------------------------------+------+-------------+
|      xcit_large_24_p8_224       |  16  | 1481.06995  |
|          convnext_base          |  64  | 1181.872861 |
|          cait_m36_384           |  4   | 1142.749737 |
|          mixer_b16_224          | 128  | 1052.602809 |
|           dm_nfnet_f0           | 128  | 949.292494  |
|           convit_base           |  64  | 938.134276  |
|             dpn107              |  64  | 931.861017  |
|  swin_base_patch4_window7_224   |  64  | 848.795209  |
|        twins_pcpvt_base         | 128  | 841.560911  |
|        tnt_s_patch16_224        | 128  | 833.794081  |
|       gluon_inception_v3        | 256  | 814.212788  |
| deit_base_distilled_patch16_224 |  64  | 696.095962  |
|      vit_base_patch16_224       |  64  | 693.447478  |
|      beit_base_patch16_224      |  64  | 685.142196  |
|        res2net101_26w_4s        | 128  | 654.102381  |
|     swsl_resnext101_32x16d      |  32  | 642.175023  |
|            nfnet_l0             | 128  | 607.998671  |
|          gmlp_s16_224           | 128  | 594.528252  |
|            levit_128            | 1024 | 568.010934  |
|          gmixer_24_224          | 128  | 567.820961  |
|            pit_b_224            |  64  | 565.348903  |
|        ese_vovnet19b_dw         | 256  |  552.75331  |
|          jx_nest_base           |  32  | 535.962953  |
|             dla102              | 128  | 532.805378  |
|         crossvit_9_240          | 256  | 515.561175  |
|           resnest101e           |  64  | 486.772068  |
|         poolformer_m36          |  64  |  476.36724  |
|        convmixer_768_32         |  32  | 444.300745  |
|            hrnet_w18            | 128  | 441.889714  |
|           volo_d1_224           |  64  | 440.869911  |
|        adv_inception_v3         | 128  | 409.654337  |
|          inception_v3           | 128  | 409.193773  |
|        res2net50_14w_8s         | 128  | 404.632867  |
|         visformer_small         | 128  | 394.772762  |
|            mixnet_l             | 128  | 365.844095  |
|           res2next50            | 128  | 362.901758  |
|          ghostnet_100           | 512  | 359.626773  |
|           tf_mixnet_l           | 128  | 354.606441  |
|          pnasnet5large          |  16  | 354.481994  |
|            repvgg_a2            | 128  | 336.933225  |
|        eca_halonext26ts         | 128  |  316.88768  |
|           fbnetc_100            | 512  | 305.306735  |
|       eca_botnext26ts_256       | 128  | 294.939586  |
|            gernet_l             | 128  | 294.762426  |
|        sebotnet33ts_256         |  64  | 282.334166  |
|           regnety_002           | 1024 | 282.254583  |
|          botnet26t_256          | 128  | 279.103636  |
|          resmlp_12_224          | 128  | 266.297118  |
|           mobilevit_s           |  64  | 262.664895  |
|          cspdarknet53           |  64  | 262.325319  |
|           mnasnet_100           | 512  | 258.155183  |
|            fbnetv3_b            | 256  | 244.466648  |
|           selecsls42b           | 128  | 231.808619  |
|      mobilenetv3_large_100      | 512  | 229.756583  |
|           rexnet_100            | 256  | 228.118868  |
|       tf_efficientnet_b0        | 128  | 120.832276  |
|            tinynet_a            | 128  |  86.546358  |
|         mobilenetv2_100         | 128  |  73.600689  |
|          spnasnet_100           | 128  |  67.772561  |
|            lcnet_050            | 256  |  27.514478  |
+---------------------------------+------+-------------+
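Because speedup is normalized against native PyTorch, the speedup and absolute latency tables can be combined to recover the implied eager-mode latency. The sketch below assumes both tables come from the same run (the batch sizes match for the rows used) and that speedup is defined as eager latency divided by inductor latency; the variable names are illustrative only.

```python
# (speedup, inductor latency in ms) for two timm models, copied from the tables above.
rows = {
    "lcnet_050": (3.749292, 27.514478),
    "xcit_large_24_p8_224": (1.176684, 1481.06995),
}

# speedup = eager_ms / inductor_ms, so eager_ms = speedup * inductor_ms
eager = {name: s * inductor_ms for name, (s, inductor_ms) in rows.items()}

for name, eager_ms in eager.items():
    inductor_ms = rows[name][1]
    print(f"{name}: eager ~{eager_ms:.1f} ms, inductor {inductor_ms:.1f} ms")
```

Read this way, the largest relative gains sit on models whose absolute latency is already small, while the slowest models in the table see gains closer to 1x.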

@zxd1997066
Contributor

[dynamic] Performance Dashboard for float32 precision -- Single-core Single-thread (2024-05-05 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward and backward passes for training, and of the forward pass only for inference. For accuracy, we check the numerical correctness of the forward pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint reduction ratio.
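The accuracy check and the speedup normalization described above can be sketched in plain Python. This is a minimal illustration, not the benchmark harness itself: `outputs_match`, `speedup`, and the tolerance values are hypothetical names and defaults chosen for the example, and the real harness compares tensors rather than lists of floats.

```python
import math
from typing import Sequence


def outputs_match(reference: Sequence[float], candidate: Sequence[float],
                  rtol: float = 1e-4, atol: float = 1e-4) -> bool:
    """Element-wise closeness check between eager and compiled outputs.

    The tolerances are illustrative assumptions, not the harness defaults.
    """
    if len(reference) != len(candidate):
        return False
    return all(math.isclose(r, c, rel_tol=rtol, abs_tol=atol)
               for r, c in zip(reference, candidate))


def speedup(eager_ms: float, compiled_ms: float) -> float:
    """Speedup normalized against native PyTorch; above 1.0 means faster."""
    return eager_ms / compiled_ms


# Toy values standing in for eager and compiled forward-pass outputs.
eager_out = [0.1250, -1.5000, 3.0000]
compiled_out = [0.12501, -1.50002, 3.00001]
print(outputs_match(eager_out, compiled_out))
print(speedup(200.0, 100.0))
```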

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. Experimental setup does not have optimizer.

SW information:

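+-------------------+--------+------------------------------------------+
| SW                | Branch | Commit                                   |
+-------------------+--------+------------------------------------------+
| Pytorch           | main   | 6d30803d64953955df63da56833bf4eb52249aae |
| Torchbench        | main   | d6015d42                                 |
| torchaudio        | main   | 2.2.0a0+ea437b3                          |
| torchtext         | main   | 0.16.0a0+b0ebddc                         |
| torchvision       | main   | 0.19.0a0+06ad737                         |
| torchdata         | main   | 0.7.1a0+0790338                          |
| dynamo_benchmarks | main   | nightly                                  |
+-------------------+--------+------------------------------------------+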

HW information

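+------------------+----------------------------------------------------------------+
| Item             | Value                                                          |
+------------------+----------------------------------------------------------------+
| Manufacturer     | Amazon EC2                                                     |
| Product Name     | c6i.16xlarge                                                   |
| CPU Model        | Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz                  |
| Installed Memory | 128GB (1x128GB DDR4 3200 MT/s [Unknown])                       |
| OS               | Ubuntu 22.04.2 LTS                                             |
| Kernel           | 5.19.0-1022-aws                                                |
| Microcode        | 0xd000389                                                      |
| GCC              | gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0                      |
| GLIBC            | ldd (Ubuntu GLIBC 2.35-0ubuntu3.1) 2.35                        |
| Binutils         | GNU ld (GNU Binutils for Ubuntu) 2.38                          |
| Python           | Python 3.10.6                                                  |
| OpenSSL          | OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022) |
+------------------+----------------------------------------------------------------+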

Test command

export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
export OMP_NUM_THREADS=1

python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--core_list 0 --ncores_per_instance 1" --devices=cpu --dtypes=float32 --inference --compilers=inductor --batch_size=1 --threads 1 --extra-args="--timeout 9000"
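
For scripted runs, the same environment and command line can be assembled in Python. This is only a sketch: every flag and variable is copied from the test command above, while the fallback prefix `/opt/conda` is a hypothetical placeholder for the conda install directory that the shell snippet derives with `which conda`.

```python
import os
import shlex

# Assumption: CONDA_PREFIX points at the environment that holds libiomp5.so
# and libjemalloc.so. The fallback path below is hypothetical.
prefix = os.environ.get("CONDA_PREFIX") or "/opt/conda"

env = {
    "LD_PRELOAD": f"{prefix}/lib/libiomp5.so:{prefix}/lib/libjemalloc.so",
    "MALLOC_CONF": ("oversize_threshold:1,background_thread:true,"
                    "metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"),
    "TORCHINDUCTOR_FREEZING": "1",
    "OMP_NUM_THREADS": "1",
}

cmd = [
    "python", "benchmarks/dynamo/runner.py",
    "--enable_cpu_launcher",
    "--cpu_launcher_args", "--core_list 0 --ncores_per_instance 1",
    "--devices=cpu", "--dtypes=float32", "--inference",
    "--compilers=inductor", "--batch_size=1", "--threads", "1",
    "--extra-args=--timeout 9000",
]

# Print the command for inspection instead of launching it.
print(shlex.join(cmd))
# To launch: subprocess.run(cmd, env={**os.environ, **env}, check=True)
```

Passing the arguments as a list keeps the quoted launcher arguments as a single token, which is what the double quotes achieve in the shell version.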

Models that fail accuracy checks are excluded when measuring performance, compilation latency, and memory footprint reduction.

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 96%, 76/79 | 100%, 46/46 | 100%, 60/60 |
+----------+------------+-------------+-------------+
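
A pass rate in the `percent, passed/total` form used above can be derived from the per-model accuracy statuses. The sketch below is illustrative: `pass_rate` is a hypothetical helper, and treating `pass_due_to_skip` as a pass is an assumption, not something the report states.

```python
from collections import Counter

# A few statuses as they appear in the per-model accuracy tables (subset only).
statuses = ["pass", "pass", "pass_due_to_skip", "fail_to_run", "pass"]


def pass_rate(results):
    """Return (percent, passed, total).

    Assumption: any status starting with 'pass' counts as passing.
    """
    counts = Counter(s.startswith("pass") for s in results)
    passed, total = counts[True], len(results)
    return round(100 * passed / total), passed, total


pct, passed, total = pass_rate(statuses)
print(f"{pct}%, {passed}/{total}")
```

Applied to the torchbench counts in the table, 76 passing models out of 79 rounds to 96%.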

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.55x    |    1.19x    |    1.50x    |
+----------+------------+-------------+-------------+
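
A geometric mean over per-model speedups can be computed as below. This is a sketch under stated assumptions: it uses only a small subset of the torchbench rows from this report, so its result will not match the suite-level figure, and dropping models without a valid result (speedup 0.0) before aggregating is an assumption about the procedure.

```python
import math

# Per-model speedups copied from the torchbench table in this report (subset only).
# 0.0 marks a model that did not produce a result.
speedups = {
    "resnet18": 2.177481,
    "resnet50": 1.759249,
    "hf_T5_base": 0.587937,
    "moco": 0.0,
}


def geomean(values):
    """Geometric mean via the mean of logs, which avoids overflow in long products."""
    logs = [math.log(v) for v in values]
    return math.exp(sum(logs) / len(logs))


# Assumption: models without a valid result are excluded before aggregating.
valid = [v for v in speedups.values() if v > 0.0]
print(f"{geomean(valid):.2f}x over {len(valid)} models")
```

The geometric mean is the usual choice for ratios because a 2x speedup and a 0.5x slowdown cancel to 1x, which an arithmetic mean would not reflect.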

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   25.69    |    26.78    |    36.29    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.85x    |    0.81x    |    0.82x    |
+----------+------------+-------------+-------------+
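
One way to read these ratios is sketched below. The definition is an assumption that is consistent with the "higher is better" note: peak memory of the eager run divided by peak memory of the compiled run. The function name and the sample peaks are hypothetical.

```python
def compression_ratio(eager_peak_mb: float, compiled_peak_mb: float) -> float:
    """Peak-memory ratio; above 1.0 means the compiled run used less memory.

    Assumption: eager peak divided by compiled peak.
    """
    return eager_peak_mb / compiled_peak_mb


# Hypothetical peaks, chosen only to illustrate a ratio below 1.0.
ratio = compression_ratio(eager_peak_mb=850.0, compiled_peak_mb=1000.0)
print(f"{ratio:.2f}x")
```

Under this definition, a ratio below 1.0 indicates that the compiled run peaked higher than the eager run.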

torchbench suite with float32 precision


Performance speedup

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_isoneutral_mixing     |    1    | 64.706471 |
|     pyhpc_equation_of_state     |    1    | 23.225694 |
|         basic_gnn_sage          |    1    | 3.548749  |
|          basic_gnn_gcn          |    1    | 3.485282  |
|          basic_gnn_gin          |    1    | 3.446196  |
|     functorch_maml_omniglot     |    1    |  3.31748  |
|          squeezenet1_1          |    1    | 3.289923  |
|           timm_nfnet            |    1    | 2.752744  |
|          maml_omniglot          |    5    | 2.738171  |
|         opacus_cifar10          |    1    | 2.247841  |
|            resnet18             |    1    | 2.177481  |
|              dcgan              |    1    | 2.144122  |
|       shufflenet_v2_x1_0        |    1    | 2.012722  |
|          timm_resnest           |    1    | 1.959362  |
|      functorch_dp_cifar10       |    1    | 1.882748  |
|          mobilenet_v2           |    1    | 1.851102  |
|          lennard_jones          |    1    | 1.817732  |
|            resnet50             |    1    | 1.759249  |
|           mnasnet1_0            |    1    |  1.73922  |
|       mobilenet_v3_large        |    1    |  1.66744  |
|         phlippe_resnet          |    1    | 1.666692  |
|            resnet152            |    1    | 1.655067  |
|           densenet121           |    1    | 1.624018  |
|           timm_vovnet           |    1    | 1.588066  |
|        timm_efficientnet        |    1    |  1.57767  |
|         LearningToPaint         |    1    | 1.560081  |
|      doctr_reco_predictor       |    1    | 1.480065  |
|           timm_regnet           |    1    | 1.472881  |
|         resnext50_32x4d         |    1    | 1.470932  |
|              vgg16              |    1    |  1.44603  |
|        phlippe_densenet         |    1    | 1.418833  |
|        basic_gnn_edgecnn        |    1    | 1.397063  |
|              llama              |    1    | 1.390349  |
|             yolov3              |    1    | 1.376633  |
|             alexnet             |    1    | 1.347931  |
|          BERT_pytorch           |    1    | 1.296773  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1.283388  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1.278872  |
|            hf_Albert            |    1    | 1.275422  |
|              maml               |    1    | 1.263621  |
|             hf_GPT2             |    1    | 1.246072  |
|       doctr_det_predictor       |    1    | 1.232767  |
|          fastNLP_Bert           |    1    | 1.232652  |
|          hf_GPT2_large          |    1    | 1.226757  |
|            moondream            |    1    | 1.218712  |
|     timm_vision_transformer     |    1    | 1.197172  |
|               drq               |    1    |  1.19338  |
|         pytorch_stargan         |   16    | 1.190327  |
|  timm_vision_transformer_large  |    1    | 1.165264  |
|          hf_Bert_large          |    1    | 1.164599  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 1.163518  |
|           hf_BigBird            |    1    |  1.15614  |
|             hf_Bert             |    1    | 1.152323  |
|      torch_multimodal_clip      |    1    | 1.140328  |
|              dlrm               |    1    | 1.139811  |
|          hf_DistilBert          |    1    | 1.128827  |
|             hf_Bart             |    1    |  1.09818  |
|          pytorch_unet           |    1    |  1.07553  |
|       speech_transformer        |    1    | 1.066267  |
|        hf_distil_whisper        |    1    | 1.066128  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1.050732  |
|        soft_actor_critic        |   256   | 1.042271  |
|    detectron2_fcos_r_50_fpn     |    1    | 1.041585  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1.031939  |
|          hf_Longformer          |    1    |  1.01142  |
|             demucs              |    1    | 1.007274  |
|           tts_angular           |    1    | 1.000322  |
|     resnet50_quantized_qat      |    1    | 0.996289  |
|   mobilenet_v2_quantized_qat    |    1    | 0.983499  |
|     nvidia_deeprecommender      |    1    | 0.931755  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.914424  |
|           hf_Reformer           |    1    | 0.854107  |
|       Background_Matting        |    1    | 0.831271  |
|           hf_T5_large           |    1    | 0.790129  |
|              hf_T5              |    1    | 0.709232  |
|           hf_T5_base            |    1    | 0.587937  |
|        timm_efficientdet        |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+

Accuracy

+---------------------------------+---------+--------------------+
|              name               |   bs    |      inductor      |
+---------------------------------+---------+--------------------+
|       Background_Matting        |    1    |  pass_due_to_skip  |
|  timm_vision_transformer_large  |    1    |  pass_due_to_skip  |
|          hf_GPT2_large          |    1    |  pass_due_to_skip  |
|           hf_T5_large           |    1    |  pass_due_to_skip  |
|              maml               |    1    |  pass_due_to_skip  |
|             yolov3              |    1    |        pass        |
|             alexnet             |    1    |        pass        |
|        basic_gnn_edgecnn        |    1    |        pass        |
|          basic_gnn_gcn          |    1    |        pass        |
|          basic_gnn_gin          |    1    |        pass        |
|         basic_gnn_sage          |    1    |        pass        |
|       doctr_det_predictor       |    1    |        pass        |
|              dcgan              |    1    |        pass        |
|           densenet121           |    1    |        pass        |
| detectron2_fasterrcnn_r_101_c4  |    1    |        pass        |
| detectron2_fasterrcnn_r_101_dc5 |    1    |        pass        |
|  detectron2_fasterrcnn_r_50_c4  |    1    |        pass        |
| detectron2_fasterrcnn_r_50_dc5  |    1    |        pass        |
|           hf_T5_base            |    1    |        pass        |
|             demucs              |    1    |        pass        |
|        hf_distil_whisper        |    1    |        pass        |
|         LearningToPaint         |    1    |        pass        |
|              dlrm               |    1    |        pass        |
|    detectron2_fcos_r_50_fpn     |    1    |        pass        |
|               drq               |    1    |        pass        |
|          fastNLP_Bert           |    1    |        pass        |
|      functorch_dp_cifar10       |    1    |        pass        |
|     functorch_maml_omniglot     |    1    |        pass        |
|            hf_Albert            |    1    |        pass        |
|             hf_Bart             |    1    |        pass        |
|          hf_Bert_large          |    1    |        pass        |
|           hf_BigBird            |    1    |        pass        |
|              llama              |    1    |        pass        |
|          hf_DistilBert          |    1    |        pass        |
|             hf_GPT2             |    1    |        pass        |
|          hf_Longformer          |    1    |        pass        |
|           hf_Reformer           |    1    |        pass        |
|              hf_T5              |    1    |        pass        |
|             hf_Bert             |    1    |        pass        |
|      doctr_reco_predictor       |    1    |        pass        |
|          lennard_jones          |    1    |        pass        |
|            resnet50             |    1    |        pass        |
|            resnet152            |    1    |        pass        |
|          maml_omniglot          |    5    |        pass        |
|     resnet50_quantized_qat      |    1    |        pass        |
|          mobilenet_v2           |    1    |        pass        |
|   mobilenet_v2_quantized_qat    |    1    |        pass        |
|         pytorch_stargan         |   16    |        pass        |
|            moondream            |    1    |        pass        |
|     nvidia_deeprecommender      |    1    |        pass        |
|          BERT_pytorch           |    1    |        pass        |
|        phlippe_densenet         |    1    |        pass        |
|         phlippe_resnet          |    1    |        pass        |
|     pyhpc_equation_of_state     |    1    |        pass        |
|     pyhpc_isoneutral_mixing     |    1    |        pass        |
| pyhpc_turbulent_kinetic_energy  | 1048576 |        pass        |
|  pytorch_CycleGAN_and_pix2pix   |    1    |        pass        |
|         opacus_cifar10          |    1    |        pass        |
|       mobilenet_v3_large        |    1    |        pass        |
|           mnasnet1_0            |    1    |        pass        |
|          pytorch_unet           |    1    |        pass        |
|        timm_efficientnet        |    1    |        pass        |
|         resnext50_32x4d         |    1    |        pass        |
|       shufflenet_v2_x1_0        |    1    |        pass        |
|        soft_actor_critic        |   256   |        pass        |
|       speech_transformer        |    1    |        pass        |
|           timm_nfnet            |    1    |        pass        |
|            resnet18             |    1    |        pass        |
|          squeezenet1_1          |    1    |        pass        |
|           timm_regnet           |    1    |        pass        |
|          timm_resnest           |    1    |        pass        |
|     timm_vision_transformer     |    1    |        pass        |
|           timm_vovnet           |    1    |        pass        |
|      torch_multimodal_clip      |    1    |        pass        |
|           tts_angular           |    1    |        pass        |
|              vgg16              |    1    |        pass        |
|        timm_efficientdet        |    0    | model_fail_to_load |
|              moco               |    0    | model_fail_to_load |
|         DALLE2_pytorch          |    0    | model_fail_to_load |
| detectron2_fasterrcnn_r_101_fpn |    1    |    fail_to_run     |
| detectron2_fasterrcnn_r_50_fpn  |    1    |    fail_to_run     |
|           Super_SloMo           |    1    |    fail_to_run     |
|         vision_maskrcnn         |    1    |    fail_to_run     |
+---------------------------------+---------+--------------------+
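
As a rough aid for reading the accuracy table above, the status column can be tallied with a few lines of Python. This is only a sketch: `parse_rows` and `tally_status` are hypothetical helper names, not part of the benchmark scripts, and the sample holds just a handful of rows copied from the table.

```python
from collections import Counter


def parse_rows(table_text):
    """Return (name, bs, value) string tuples from a '|'-delimited ASCII table."""
    rows = []
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip '+---+' border lines and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 3 or cells[0] == "name":
            continue  # skip the header row and anything malformed
        rows.append(tuple(cells))
    return rows


def tally_status(table_text):
    """Count how many models fall into each accuracy status."""
    return Counter(status for _, _, status in parse_rows(table_text))


# A few rows copied from the table above (not the full table).
sample = """
+---------------------------------+---------+--------------------+
|              name               |   bs    |      inductor      |
+---------------------------------+---------+--------------------+
|       Background_Matting        |    1    |  pass_due_to_skip  |
|             yolov3              |    1    |        pass        |
|             alexnet             |    1    |        pass        |
|              moco               |    0    | model_fail_to_load |
|         vision_maskrcnn         |    1    |    fail_to_run     |
+---------------------------------+---------+--------------------+
"""

counts = tally_status(sample)
for status, n in counts.most_common():
    print(f"{status}: {n}")
```

Pasting the full table into `sample` would give the per-status totals for the whole suite.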

Compilation latency (sec)

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|           hf_T5_base            |    1    | 99.230978 |
|           densenet121           |    1    | 90.707937 |
|           hf_BigBird            |    1    | 75.816389 |
|           hf_T5_large           |    1    | 71.607923 |
|    detectron2_fcos_r_50_fpn     |    1    | 69.59907  |
|              maml               |    1    | 53.899783 |
|          hf_Longformer          |    1    | 52.587123 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 51.147532 |
|           timm_nfnet            |    1    | 48.449317 |
|           hf_Reformer           |    1    | 47.644653 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 43.793973 |
|        phlippe_densenet         |    1    | 42.859414 |
|       speech_transformer        |    1    | 39.411546 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 38.823046 |
|  timm_vision_transformer_large  |    1    | 36.546607 |
|      torch_multimodal_clip      |    1    | 35.838315 |
|             demucs              |    1    | 35.378986 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 33.949525 |
|        timm_efficientnet        |    1    | 33.643778 |
|              hf_T5              |    1    | 32.684557 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 32.042103 |
|             yolov3              |    1    | 28.950602 |
|       Background_Matting        |    1    | 28.908242 |
|         opacus_cifar10          |    1    | 28.72294  |
|        hf_distil_whisper        |    1    | 28.211874 |
|            moondream            |    1    | 27.872568 |
|      functorch_dp_cifar10       |    1    | 27.319336 |
|          hf_GPT2_large          |    1    | 26.777984 |
|          hf_Bert_large          |    1    | 25.686709 |
|          timm_resnest           |    1    | 25.603818 |
|       doctr_det_predictor       |    1    | 25.084121 |
|       shufflenet_v2_x1_0        |    1    | 24.153606 |
|       mobilenet_v3_large        |    1    | 24.12644  |
|              llama              |    1    | 23.890641 |
|          BERT_pytorch           |    1    | 23.067912 |
|          fastNLP_Bert           |    1    | 22.415745 |
|             hf_Bart             |    1    | 22.349034 |
|           timm_vovnet           |    1    | 21.628012 |
|           timm_regnet           |    1    | 21.36721  |
|     timm_vision_transformer     |    1    | 21.326421 |
|          pytorch_unet           |    1    | 20.038389 |
|            hf_Albert            |    1    | 20.001903 |
|             hf_GPT2             |    1    | 19.46397  |
|         pytorch_stargan         |   16    | 19.276685 |
|          hf_DistilBert          |    1    | 18.976415 |
|             hf_Bert             |    1    | 18.798043 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 17.96656  |
|          squeezenet1_1          |    1    | 16.037949 |
|            resnet152            |    1    | 15.757366 |
|              vgg16              |    1    | 15.045166 |
|      doctr_reco_predictor       |    1    | 13.729214 |
|             alexnet             |    1    | 12.730316 |
|     pyhpc_isoneutral_mixing     |    1    | 12.204166 |
|         resnext50_32x4d         |    1    | 11.547972 |
|            resnet50             |    1    | 11.524165 |
|               drq               |    1    | 10.946628 |
|              dlrm               |    1    | 10.606158 |
|            resnet18             |    1    | 10.183728 |
|          mobilenet_v2           |    1    | 10.106115 |
|           mnasnet1_0            |    1    | 9.873988  |
|     functorch_maml_omniglot     |    1    | 9.732541  |
|          basic_gnn_gcn          |    1    | 9.516348  |
|     nvidia_deeprecommender      |    1    | 9.234759  |
|         LearningToPaint         |    1    | 8.999753  |
|          basic_gnn_gin          |    1    |  8.98922  |
|        basic_gnn_edgecnn        |    1    | 8.927257  |
|     pyhpc_equation_of_state     |    1    | 8.763813  |
|          maml_omniglot          |    5    | 8.675051  |
|         phlippe_resnet          |    1    | 8.620924  |
|        soft_actor_critic        |   256   | 8.027205  |
|         basic_gnn_sage          |    1    | 7.870296  |
|          lennard_jones          |    1    |  5.76899  |
|              dcgan              |    1    | 5.696608  |
|           tts_angular           |    1    |  5.5379   |
|   mobilenet_v2_quantized_qat    |    1    | 0.097452  |
|     resnet50_quantized_qat      |    1    | 0.070606  |
|        timm_efficientdet        |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+

Peak Memory Compression Ratio

+---------------------------------+---------+----------+
|              name               |   bs    | inductor |
+---------------------------------+---------+----------+
|              dlrm               |    1    | 0.98856  |
|           hf_T5_base            |    1    | 0.987337 |
|             demucs              |    1    | 0.984633 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 0.982591 |
|       Background_Matting        |    1    | 0.982049 |
|          pytorch_unet           |    1    | 0.978444 |
|          hf_GPT2_large          |    1    | 0.978369 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 0.972352 |
|        basic_gnn_edgecnn        |    1    | 0.971675 |
|       doctr_det_predictor       |    1    | 0.97004  |
|    detectron2_fcos_r_50_fpn     |    1    | 0.962537 |
|     resnet50_quantized_qat      |    1    | 0.956354 |
|           hf_BigBird            |    1    | 0.949094 |
|         LearningToPaint         |    1    | 0.947613 |
|      doctr_reco_predictor       |    1    | 0.943922 |
|         pytorch_stargan         |   16    | 0.943173 |
|         basic_gnn_sage          |    1    | 0.93895  |
|          basic_gnn_gin          |    1    | 0.938682 |
|          basic_gnn_gcn          |    1    | 0.938524 |
|      torch_multimodal_clip      |    1    | 0.927522 |
|   mobilenet_v2_quantized_qat    |    1    | 0.926944 |
|              llama              |    1    | 0.919574 |
|        hf_distil_whisper        |    1    | 0.915485 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 0.914192 |
|           tts_angular           |    1    | 0.889599 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 0.889318 |
|        soft_actor_critic        |   256   | 0.888472 |
|         opacus_cifar10          |    1    | 0.88393  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 0.883488 |
|        timm_efficientnet        |    1    |  0.8777  |
|          mobilenet_v2           |    1    | 0.873207 |
|          lennard_jones          |    1    | 0.862812 |
|          maml_omniglot          |    5    | 0.860638 |
|           mnasnet1_0            |    1    | 0.860385 |
|          fastNLP_Bert           |    1    | 0.859749 |
|     functorch_maml_omniglot     |    1    | 0.858661 |
|          squeezenet1_1          |    1    | 0.857028 |
|          timm_resnest           |    1    | 0.856744 |
|              dcgan              |    1    | 0.852823 |
|       mobilenet_v3_large        |    1    | 0.847854 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.847604 |
|         phlippe_resnet          |    1    | 0.839451 |
|       shufflenet_v2_x1_0        |    1    | 0.839163 |
|     pyhpc_equation_of_state     |    1    | 0.834342 |
|       speech_transformer        |    1    | 0.828269 |
|            moondream            |    1    | 0.825462 |
|        phlippe_densenet         |    1    | 0.819332 |
|           timm_nfnet            |    1    | 0.81634  |
|          hf_Bert_large          |    1    | 0.815538 |
|         resnext50_32x4d         |    1    | 0.811488 |
|     pyhpc_isoneutral_mixing     |    1    | 0.811436 |
|     timm_vision_transformer     |    1    | 0.804058 |
|             hf_Bert             |    1    | 0.803597 |
|            hf_Albert            |    1    | 0.802302 |
|           hf_T5_large           |    1    | 0.799614 |
|          hf_Longformer          |    1    | 0.799395 |
|              maml               |    1    | 0.794876 |
|             hf_Bart             |    1    | 0.781839 |
|             yolov3              |    1    | 0.78095  |
|          BERT_pytorch           |    1    | 0.779048 |
|          hf_DistilBert          |    1    | 0.777344 |
|            resnet50             |    1    | 0.769329 |
|             hf_GPT2             |    1    | 0.768281 |
|            resnet18             |    1    | 0.764112 |
|               drq               |    1    | 0.762415 |
|           timm_regnet           |    1    | 0.761354 |
|           timm_vovnet           |    1    | 0.758996 |
|           densenet121           |    1    | 0.758076 |
|              hf_T5              |    1    | 0.750968 |
|      functorch_dp_cifar10       |    1    | 0.743732 |
|             alexnet             |    1    | 0.735705 |
|  timm_vision_transformer_large  |    1    | 0.73301  |
|           hf_Reformer           |    1    | 0.723766 |
|              vgg16              |    1    | 0.723098 |
|            resnet152            |    1    | 0.694075 |
|     nvidia_deeprecommender      |    1    | 0.672984 |
|              moco               |    0    |   0.0    |
|        timm_efficientdet        |    0    |   0.0    |
|         DALLE2_pytorch          |    0    |   0.0    |
+---------------------------------+---------+----------+

Absolute latency (ms)

+---------------------------------+---------+--------------+
|              name               |   bs    |   inductor   |
+---------------------------------+---------+--------------+
|           hf_T5_base            |    1    | 26186.188628 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 11772.379405 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 11199.507821 |
|          hf_GPT2_large          |    1    | 10136.885941 |
|           hf_T5_large           |    1    | 7452.772051  |
|            moondream            |    1    | 7356.989746  |
|        hf_distil_whisper        |    1    | 6958.146773  |
|       Background_Matting        |    1    | 6702.643974  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 5826.330995  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 5041.922115  |
|          pytorch_unet           |    1    | 4695.050197  |
|  timm_vision_transformer_large  |    1    | 2787.029725  |
|    detectron2_fcos_r_50_fpn     |    1    | 2540.901144  |
|             demucs              |    1    | 2238.818701  |
|         pytorch_stargan         |   16    | 2017.708017  |
|       doctr_det_predictor       |    1    | 1798.917762  |
|          hf_Bert_large          |    1    | 1762.955808  |
|           hf_BigBird            |    1    | 1470.409877  |
|      torch_multimodal_clip      |    1    |  1238.90883  |
|          hf_Longformer          |    1    | 1106.365377  |
|             hf_Bart             |    1    |  882.172596  |
|              hf_T5              |    1    |  766.439827  |
|             hf_Bert             |    1    |  680.323344  |
|       speech_transformer        |    1    |  674.209081  |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  623.479006  |
|            hf_Albert            |    1    |  573.095982  |
|          fastNLP_Bert           |    1    |  522.127346  |
|             yolov3              |    1    |  430.163249  |
|           hf_Reformer           |    1    |  415.93393   |
|          hf_DistilBert          |    1    |  415.368953  |
|             hf_GPT2             |    1    |  356.632343  |
|        basic_gnn_edgecnn        |    1    |  232.220772  |
| pyhpc_turbulent_kinetic_energy  | 1048576 |  210.457932  |
|              vgg16              |    1    |  189.497444  |
|           timm_regnet           |    1    |  150.169471  |
|          BERT_pytorch           |    1    |  141.391847  |
|            resnet152            |    1    |  137.816845  |
|           timm_nfnet            |    1    |  97.123494   |
|           timm_vovnet           |    1    |  80.109073   |
|              maml               |    1    |  74.123784   |
|     timm_vision_transformer     |    1    |  59.053722   |
|     nvidia_deeprecommender      |    1    |  58.193819   |
|         resnext50_32x4d         |    1    |  57.411162   |
|           tts_angular           |    1    |  54.184316   |
|            resnet50             |    1    |  51.430051   |
|           densenet121           |    1    |  46.232799   |
|          timm_resnest           |    1    |  33.624208   |
|          basic_gnn_gcn          |    1    |  29.280304   |
|      doctr_reco_predictor       |    1    |  23.373619   |
|              llama              |    1    |  22.849224   |
|            resnet18             |    1    |  22.251442   |
|             alexnet             |    1    |  22.228278   |
|     resnet50_quantized_qat      |    1    |   18.16646   |
|          basic_gnn_gin          |    1    |  16.746368   |
|         basic_gnn_sage          |    1    |  16.533805   |
|        timm_efficientnet        |    1    |  13.818029   |
|         LearningToPaint         |    1    |   9.88914    |
|           mnasnet1_0            |    1    |    7.9714    |
|          mobilenet_v2           |    1    |   7.783691   |
|       mobilenet_v3_large        |    1    |   7.624351   |
|   mobilenet_v2_quantized_qat    |    1    |   6.911742   |
|          squeezenet1_1          |    1    |   6.016333   |
|       shufflenet_v2_x1_0        |    1    |   5.708962   |
|        phlippe_densenet         |    1    |   3.449618   |
|        soft_actor_critic        |   256   |   3.388228   |
|      functorch_dp_cifar10       |    1    |   2.65388    |
|         opacus_cifar10          |    1    |   2.380773   |
|               drq               |    1    |   1.948878   |
|              dcgan              |    1    |   1.73872    |
|         phlippe_resnet          |    1    |   1.386716   |
|     functorch_maml_omniglot     |    1    |   0.871505   |
|          maml_omniglot          |    5    |   0.804689   |
|              dlrm               |    1    |   0.696392   |
|     pyhpc_equation_of_state     |    1    |   0.044226   |
|     pyhpc_isoneutral_mixing     |    1    |   0.041712   |
|          lennard_jones          |    1    |   0.038005   |
|              moco               |    0    |     0.0      |
|         DALLE2_pytorch          |    0    |     0.0      |
|        timm_efficientdet        |    0    |     0.0      |
+---------------------------------+---------+--------------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|          MobileBertForMaskedLM          | 1  | 1.980003 |
|     MobileBertForQuestionAnswering      | 1  | 1.569622 |
|            XLNetLMHeadModel             | 1  | 1.379623 |
|      GPT2ForSequenceClassification      | 1  | 1.320686 |
|         Speech2Text2ForCausalLM         | 1  | 1.310454 |
|            YituTechConvBert             | 1  | 1.302241 |
|          DistilBertForMaskedLM          | 1  | 1.292837 |
|       BlenderbotSmallForCausalLM        | 1  | 1.278334 |
| BlenderbotSmallForConditionalGeneration | 1  | 1.274647 |
|     DistilBertForQuestionAnswering      | 1  | 1.272065 |
|       DebertaForQuestionAnswering       | 1  | 1.254865 |
|          BlenderbotForCausalLM          | 1  | 1.246034 |
|       MT5ForConditionalGeneration       | 1  | 1.237426 |
|     M2M100ForConditionalGeneration      | 1  | 1.236964 |
|           DebertaForMaskedLM            | 1  | 1.233318 |
|             XGLMForCausalLM             | 1  | 1.231814 |
|     PegasusForConditionalGeneration     | 1  | 1.228547 |
|           PegasusForCausalLM            | 1  | 1.226805 |
|               GoogleFnet                | 1  | 1.217561 |
|            AlbertForMaskedLM            | 1  | 1.204472 |
|       AlbertForQuestionAnswering        | 1  | 1.202731 |
|               DistillGPT2               | 1  | 1.190392 |
|           ElectraForCausalLM            | 1  | 1.185131 |
|    LayoutLMForSequenceClassification    | 1  | 1.172132 |
|                CamemBert                | 1  | 1.169571 |
|    MegatronBertForQuestionAnswering     | 1  | 1.167869 |
|        BertForQuestionAnswering         | 1  | 1.164472 |
|           LayoutLMForMaskedLM           | 1  | 1.162251 |
|         MegatronBertForCausalLM         | 1  | 1.157497 |
|      DebertaV2ForQuestionAnswering      | 1  | 1.155409 |
|             BertForMaskedLM             | 1  | 1.153731 |
|       RobertaForQuestionAnswering       | 1  | 1.149661 |
|            TrOCRForCausalLM             | 1  | 1.149432 |
|          DebertaV2ForMaskedLM           | 1  | 1.149422 |
|           RobertaForCausalLM            | 1  | 1.139507 |
|       ElectraForQuestionAnswering       | 1  | 1.138572 |
|     PLBartForConditionalGeneration      | 1  | 1.076135 |
|      MBartForConditionalGeneration      | 1  | 1.061386 |
|      BartForConditionalGeneration       | 1  | 1.058672 |
|             BartForCausalLM             | 1  | 1.055014 |
|             OPTForCausalLM              | 1  | 1.029582 |
|            PLBartForCausalLM            | 1  | 1.028871 |
|            MBartForCausalLM             | 1  | 1.012807 |
|          AllenaiLongformerBase          | 1  | 0.959847 |
|                 T5Small                 | 1  | 0.617464 |
|       T5ForConditionalGeneration        | 1  | 0.613298 |
+-----------------------------------------+----+----------+
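
One way to summarize a speedup table like the one above is a geometric mean over the per-model ratios, plus a list of models below 1.0. The sketch below assumes that aggregation; the table does not say how the dashboard itself aggregates. `read_speedups` and `summarize` are hypothetical helper names, and the sample contains only three rows copied from the table.

```python
from statistics import geometric_mean


def read_speedups(table_text):
    """Map model name -> speedup (float) from a '|'-delimited ASCII table."""
    speedups = {}
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 3 or cells[0] == "name":
            continue  # skip the header row
        speedups[cells[0]] = float(cells[2])
    return speedups


def summarize(speedups):
    """Return (geomean, regressions) over models with a positive speedup.

    Entries of 0.0 (models that did not run) are excluded, since a
    geometric mean is not meaningful with zeros.
    """
    valid = {name: s for name, s in speedups.items() if s > 0}
    regressions = sorted(name for name, s in valid.items() if s < 1.0)
    return geometric_mean(valid.values()), regressions


# Three rows copied from the table above (not the full table).
sample = """
|          MobileBertForMaskedLM          | 1  | 1.980003 |
|          AllenaiLongformerBase          | 1  | 0.959847 |
|                 T5Small                 | 1  | 0.617464 |
"""

geomean, regressions = summarize(read_speedups(sample))
print(f"geomean speedup: {geomean:.4f}")
print("below 1.0x:", ", ".join(regressions))
```

A geometric mean is used here because the values are ratios against native PyTorch; an arithmetic mean would overweight the largest speedups.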

Accuracy

+-----------------------------------------+----+------------------+
|                  name                   | bs |     inductor     |
+-----------------------------------------+----+------------------+
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip |
|                CamemBert                | 1  |       pass       |
|       AlbertForQuestionAnswering        | 1  |       pass       |
|           DebertaForMaskedLM            | 1  |       pass       |
|          AllenaiLongformerBase          | 1  |       pass       |
|             BartForCausalLM             | 1  |       pass       |
|      BartForConditionalGeneration       | 1  |       pass       |
|             BertForMaskedLM             | 1  |       pass       |
|        BertForQuestionAnswering         | 1  |       pass       |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |
|       DebertaForQuestionAnswering       | 1  |       pass       |
|           LayoutLMForMaskedLM           | 1  |       pass       |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |
|          DistilBertForMaskedLM          | 1  |       pass       |
|     DistilBertForQuestionAnswering      | 1  |       pass       |
|               DistillGPT2               | 1  |       pass       |
|           ElectraForCausalLM            | 1  |       pass       |
|       ElectraForQuestionAnswering       | 1  |       pass       |
|      GPT2ForSequenceClassification      | 1  |       pass       |
|               GoogleFnet                | 1  |       pass       |
|    LayoutLMForSequenceClassification    | 1  |       pass       |
|            MBartForCausalLM             | 1  |       pass       |
|            XLNetLMHeadModel             | 1  |       pass       |
|             XGLMForCausalLM             | 1  |       pass       |
|            AlbertForMaskedLM            | 1  |       pass       |
|      MBartForConditionalGeneration      | 1  |       pass       |
|       MT5ForConditionalGeneration       | 1  |       pass       |
|         MegatronBertForCausalLM         | 1  |       pass       |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |
|          MobileBertForMaskedLM          | 1  |       pass       |
|     MobileBertForQuestionAnswering      | 1  |       pass       |
|             OPTForCausalLM              | 1  |       pass       |
|            PLBartForCausalLM            | 1  |       pass       |
|     PLBartForConditionalGeneration      | 1  |       pass       |
|           PegasusForCausalLM            | 1  |       pass       |
|     M2M100ForConditionalGeneration      | 1  |       pass       |
|     PegasusForConditionalGeneration     | 1  |       pass       |
|           RobertaForCausalLM            | 1  |       pass       |
|       RobertaForQuestionAnswering       | 1  |       pass       |
|         Speech2Text2ForCausalLM         | 1  |       pass       |
|       T5ForConditionalGeneration        | 1  |       pass       |
|                 T5Small                 | 1  |       pass       |
|            TrOCRForCausalLM             | 1  |       pass       |
|            YituTechConvBert             | 1  |       pass       |
+-----------------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------------+----+-----------+
|                  name                   | bs | inductor  |
+-----------------------------------------+----+-----------+
|          AllenaiLongformerBase          | 1  | 59.515544 |
|          MobileBertForMaskedLM          | 1  | 43.793349 |
|     MobileBertForQuestionAnswering      | 1  | 42.847482 |
|     PegasusForConditionalGeneration     | 1  | 41.085883 |
|     M2M100ForConditionalGeneration      | 1  | 40.372299 |
|      MBartForConditionalGeneration      | 1  | 39.624096 |
|                 T5Small                 | 1  | 38.29452  |
|       T5ForConditionalGeneration        | 1  | 38.256529 |
|          BlenderbotForCausalLM          | 1  | 37.990523 |
|       MT5ForConditionalGeneration       | 1  | 35.985397 |
|            XLNetLMHeadModel             | 1  | 34.702657 |
|             XGLMForCausalLM             | 1  | 34.06444  |
|          DebertaV2ForMaskedLM           | 1  | 31.868666 |
| BlenderbotSmallForConditionalGeneration | 1  | 30.996347 |
|      DebertaV2ForQuestionAnswering      | 1  | 30.48892  |
|      BartForConditionalGeneration       | 1  | 29.365127 |
|         MegatronBertForCausalLM         | 1  | 28.834366 |
|            YituTechConvBert             | 1  | 28.582344 |
|     PLBartForConditionalGeneration      | 1  | 28.117925 |
|    MegatronBertForQuestionAnswering     | 1  | 27.522989 |
|             OPTForCausalLM              | 1  | 25.385904 |
|           PegasusForCausalLM            | 1  | 24.868347 |
|            MBartForCausalLM             | 1  | 24.582432 |
|            TrOCRForCausalLM             | 1  | 22.596875 |
|           DebertaForMaskedLM            | 1  | 22.499927 |
|       DebertaForQuestionAnswering       | 1  | 21.205306 |
|           RobertaForCausalLM            | 1  | 20.316018 |
|           ElectraForCausalLM            | 1  | 20.272246 |
|                CamemBert                | 1  | 20.250064 |
|          DistilBertForMaskedLM          | 1  | 19.587803 |
|       BlenderbotSmallForCausalLM        | 1  | 19.580566 |
|      GPT2ForSequenceClassification      | 1  | 19.322452 |
|       RobertaForQuestionAnswering       | 1  | 19.067418 |
|           LayoutLMForMaskedLM           | 1  | 19.031522 |
|             BertForMaskedLM             | 1  | 19.006129 |
|       ElectraForQuestionAnswering       | 1  | 19.002215 |
|         Speech2Text2ForCausalLM         | 1  | 18.94483  |
|    LayoutLMForSequenceClassification    | 1  | 18.745878 |
|            PLBartForCausalLM            | 1  | 18.742856 |
|             BartForCausalLM             | 1  | 18.534721 |
|     DistilBertForQuestionAnswering      | 1  | 18.308968 |
|        BertForQuestionAnswering         | 1  | 17.845363 |
|               GoogleFnet                | 1  | 17.44054  |
|               DistillGPT2               | 1  | 17.234003 |
|            AlbertForMaskedLM            | 1  | 14.234259 |
|       AlbertForQuestionAnswering        | 1  | 12.80683  |
+-----------------------------------------+----+-----------+

Peak Memory Compression Ratio

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|             OPTForCausalLM              | 1  | 0.986728 |
|      MBartForConditionalGeneration      | 1  | 0.975042 |
|       T5ForConditionalGeneration        | 1  | 0.954232 |
|      GPT2ForSequenceClassification      | 1  | 0.95263  |
|          AllenaiLongformerBase          | 1  | 0.947829 |
|            MBartForCausalLM             | 1  | 0.923562 |
|     PLBartForConditionalGeneration      | 1  | 0.909782 |
|            XLNetLMHeadModel             | 1  | 0.907287 |
|                 T5Small                 | 1  | 0.905797 |
|            PLBartForCausalLM            | 1  | 0.90313  |
|       DebertaForQuestionAnswering       | 1  | 0.871928 |
|      BartForConditionalGeneration       | 1  | 0.86542  |
|               GoogleFnet                | 1  | 0.853593 |
|       RobertaForQuestionAnswering       | 1  | 0.851794 |
|    LayoutLMForSequenceClassification    | 1  | 0.850136 |
|        BertForQuestionAnswering         | 1  | 0.848981 |
|       ElectraForQuestionAnswering       | 1  | 0.841532 |
|    MegatronBertForQuestionAnswering     | 1  | 0.840037 |
|      DebertaV2ForQuestionAnswering      | 1  | 0.835376 |
|               DistillGPT2               | 1  | 0.827281 |
|           DebertaForMaskedLM            | 1  | 0.825377 |
|         MegatronBertForCausalLM         | 1  | 0.818793 |
|           LayoutLMForMaskedLM           | 1  | 0.818073 |
|             BertForMaskedLM             | 1  | 0.813558 |
|                CamemBert                | 1  | 0.813113 |
|           RobertaForCausalLM            | 1  | 0.812081 |
|         Speech2Text2ForCausalLM         | 1  | 0.809223 |
|            YituTechConvBert             | 1  | 0.80873  |
|           ElectraForCausalLM            | 1  | 0.806208 |
|             BartForCausalLM             | 1  | 0.801286 |
|     DistilBertForQuestionAnswering      | 1  | 0.799562 |
|          BlenderbotForCausalLM          | 1  | 0.797928 |
|          DebertaV2ForMaskedLM           | 1  | 0.797202 |
|            TrOCRForCausalLM             | 1  | 0.787546 |
|       MT5ForConditionalGeneration       | 1  | 0.778739 |
|       BlenderbotSmallForCausalLM        | 1  | 0.76541  |
|           PegasusForCausalLM            | 1  | 0.750863 |
|          DistilBertForMaskedLM          | 1  | 0.745834 |
| BlenderbotSmallForConditionalGeneration | 1  | 0.739684 |
|     MobileBertForQuestionAnswering      | 1  | 0.73331  |
|     M2M100ForConditionalGeneration      | 1  | 0.719123 |
|     PegasusForConditionalGeneration     | 1  | 0.716515 |
|          MobileBertForMaskedLM          | 1  | 0.704161 |
|             XGLMForCausalLM             | 1  | 0.699344 |
|            AlbertForMaskedLM            | 1  | 0.448122 |
|       AlbertForQuestionAnswering        | 1  | 0.443745 |
+-----------------------------------------+----+----------+

Absolute latency (ms)

+-----------------------------------------+----+--------------+
|                  name                   | bs |   inductor   |
+-----------------------------------------+----+--------------+
|            AlbertForMaskedLM            | 1  | 12745.174787 |
|       AlbertForQuestionAnswering        | 1  | 12700.501817 |
|      MBartForConditionalGeneration      | 1  | 6172.071229  |
|      BartForConditionalGeneration       | 1  |  5656.74353  |
|             OPTForCausalLM              | 1  | 5217.220665  |
|          DebertaV2ForMaskedLM           | 1  | 5054.606178  |
|      DebertaV2ForQuestionAnswering      | 1  | 3959.226759  |
|            XLNetLMHeadModel             | 1  | 3119.374159  |
|            MBartForCausalLM             | 1  | 3039.998485  |
|          BlenderbotForCausalLM          | 1  | 2633.272304  |
|             BartForCausalLM             | 1  | 2550.781874  |
|                 T5Small                 | 1  | 2480.789365  |
|       T5ForConditionalGeneration        | 1  | 2471.632651  |
|          AllenaiLongformerBase          | 1  | 2417.028482  |
|     PLBartForConditionalGeneration      | 1  | 2201.336265  |
|         MegatronBertForCausalLM         | 1  |  2034.2536   |
|    MegatronBertForQuestionAnswering     | 1  | 1861.154101  |
|      GPT2ForSequenceClassification      | 1  | 1313.447782  |
|            PLBartForCausalLM            | 1  | 1212.748944  |
|             XGLMForCausalLM             | 1  |  833.405191  |
|           DebertaForMaskedLM            | 1  |  786.455602  |
|           RobertaForCausalLM            | 1  |  779.144285  |
|     M2M100ForConditionalGeneration      | 1  |  715.856566  |
|                CamemBert                | 1  |  690.26142   |
|           LayoutLMForMaskedLM           | 1  |  685.215384  |
|             BertForMaskedLM             | 1  |  684.971082  |
|            YituTechConvBert             | 1  |  683.423702  |
|     PegasusForConditionalGeneration     | 1  |  609.269303  |
|            TrOCRForCausalLM             | 1  |  586.69132   |
|       DebertaForQuestionAnswering       | 1  |  560.722446  |
|        BertForQuestionAnswering         | 1  |  545.040704  |
|    LayoutLMForSequenceClassification    | 1  |  544.682358  |
|       RobertaForQuestionAnswering       | 1  |  544.539533  |
|               DistillGPT2               | 1  |  504.268859  |
|               GoogleFnet                | 1  |  473.443752  |
|       MT5ForConditionalGeneration       | 1  |  302.340716  |
|           PegasusForCausalLM            | 1  |  302.102663  |
| BlenderbotSmallForConditionalGeneration | 1  |  146.220576  |
|           ElectraForCausalLM            | 1  |  134.503275  |
|          DistilBertForMaskedLM          | 1  |  100.674699  |
|       ElectraForQuestionAnswering       | 1  |  95.722758   |
|       BlenderbotSmallForCausalLM        | 1  |  85.350156   |
|          MobileBertForMaskedLM          | 1  |  67.102008   |
|     DistilBertForQuestionAnswering      | 1  |  64.774828   |
|     MobileBertForQuestionAnswering      | 1  |  40.965806   |
|         Speech2Text2ForCausalLM         | 1  |   18.97299   |
+-----------------------------------------+----+--------------+
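The speedup and absolute-latency tables can be cross-referenced. Below is a minimal sketch, assuming the speedup column is the native PyTorch (eager) latency divided by the inductor latency; the summary only states that speedup is normalized against native PyTorch. `implied_eager_latency_ms` is a hypothetical helper, not part of the benchmark scripts.

```python
def implied_eager_latency_ms(inductor_latency_ms: float, speedup: float) -> float:
    """Recover the eager-mode latency implied by a reported speedup.

    Assumption: speedup = eager_latency / inductor_latency.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return inductor_latency_ms * speedup


# Values copied from the AlbertForMaskedLM rows of the two tables above.
albert_eager_ms = implied_eager_latency_ms(12745.174787, 0.448122)
print(f"implied eager latency: {albert_eager_ms:.1f} ms")
```

Under that assumption, a speedup below 1.0 means the compiled model is slower than eager mode, which is why the Albert rows appear at the bottom of the speedup table and at the top of the latency table.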

timm_models suite with float32 precision

Performance speedup

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          pnasnet5large          | 1  | 2.349335 |
|          inception_v3           | 1  | 2.22235  |
|       gluon_inception_v3        | 1  | 2.198417 |
|           dm_nfnet_f0           | 1  | 2.192239 |
|        adv_inception_v3         | 1  | 2.175486 |
|            nfnet_l0             | 1  | 2.152283 |
|            repvgg_a2            | 1  | 1.970498 |
|         mobilenetv2_100         | 1  | 1.925797 |
|            hrnet_w18            | 1  | 1.912378 |
|          spnasnet_100           | 1  | 1.85078  |
|           mnasnet_100           | 1  | 1.815229 |
|           fbnetc_100            | 1  | 1.802531 |
|            levit_128            | 1  | 1.799658 |
|          ghostnet_100           | 1  | 1.768189 |
|           selecsls42b           | 1  | 1.745913 |
|            lcnet_050            | 1  | 1.726305 |
|      mobilenetv3_large_100      | 1  | 1.711211 |
|           regnety_002           | 1  | 1.685046 |
|             dla102              | 1  | 1.678551 |
|        ese_vovnet19b_dw         | 1  | 1.667603 |
|          botnet26t_256          | 1  | 1.644625 |
|       eca_botnext26ts_256       | 1  | 1.631031 |
|       tf_efficientnet_b0        | 1  | 1.626089 |
|           rexnet_100            | 1  | 1.622559 |
|            fbnetv3_b            | 1  | 1.614003 |
|           resnest101e           | 1  | 1.601668 |
|          cspdarknet53           | 1  | 1.585665 |
|        eca_halonext26ts         | 1  | 1.517291 |
|           res2next50            | 1  | 1.51232  |
|         poolformer_m36          | 1  | 1.512207 |
|            tinynet_a            | 1  | 1.499877 |
|           volo_d1_224           | 1  | 1.469749 |
|        res2net50_14w_8s         | 1  | 1.463867 |
|        res2net101_26w_4s        | 1  | 1.451522 |
|           mobilevit_s           | 1  | 1.446754 |
|         visformer_small         | 1  | 1.382008 |
|           convit_base           | 1  | 1.381174 |
|          gmixer_24_224          | 1  | 1.345319 |
|     swsl_resnext101_32x16d      | 1  | 1.332967 |
|           tf_mixnet_l           | 1  | 1.300605 |
|            gernet_l             | 1  | 1.292991 |
|        twins_pcpvt_base         | 1  | 1.286324 |
|      beit_base_patch16_224      | 1  | 1.273656 |
|          resmlp_12_224          | 1  | 1.265133 |
|        convmixer_768_32         | 1  | 1.255662 |
|  swin_base_patch4_window7_224   | 1  | 1.25143  |
|          mixer_b16_224          | 1  | 1.199799 |
|      vit_base_patch16_224       | 1  | 1.195733 |
| deit_base_distilled_patch16_224 | 1  | 1.19351  |
|             dpn107              | 1  | 1.191051 |
|      xcit_large_24_p8_224       | 1  | 1.187407 |
|            mixnet_l             | 1  | 1.173057 |
|          jx_nest_base           | 1  | 1.159344 |
|            pit_b_224            | 1  | 1.149739 |
|         crossvit_9_240          | 1  | 1.141265 |
|        tnt_s_patch16_224        | 1  | 1.139439 |
|          convnext_base          | 1  | 1.135757 |
|          gmlp_s16_224           | 1  | 1.111327 |
|        sebotnet33ts_256         | 1  | 1.091987 |
|          cait_m36_384           | 1  | 0.974159 |
+---------------------------------+----+----------+
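For anyone who wants to post-process these results, the bordered tables are straightforward to parse. The sketch below is one possible approach, not the dashboard's own tooling; `parse_ascii_table` is a hypothetical helper and the sample is a two-row excerpt of the table above with narrower columns.

```python
def parse_ascii_table(text: str) -> list[dict[str, str]]:
    """Parse a '+---+' bordered table into a list of row dictionaries."""
    rows = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the '+-----+' border lines
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


sample = """
+---------------+----+----------+
|     name      | bs | inductor |
+---------------+----+----------+
| pnasnet5large | 1  | 2.349335 |
| cait_m36_384  | 1  | 0.974159 |
+---------------+----+----------+
"""
records = parse_ascii_table(sample)
# Models whose speedup is below 1.0, i.e. slower than the baseline.
regressions = [r["name"] for r in records if float(r["inductor"]) < 1.0]
print(regressions)
```

All cell values are kept as strings so that status columns such as `pass` or `fail_to_run` parse the same way as numeric columns.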

Accuracy

+---------------------------------+----+-------------+
|              name               | bs |  inductor   |
+---------------------------------+----+-------------+
|        adv_inception_v3         | 1  |    pass     |
|      beit_base_patch16_224      | 1  |    pass     |
|          botnet26t_256          | 1  |    pass     |
|          cait_m36_384           | 1  |    pass     |
|           convit_base           | 1  |    pass     |
|        convmixer_768_32         | 1  |    pass     |
|          convnext_base          | 1  |    pass     |
|         crossvit_9_240          | 1  |    pass     |
|          cspdarknet53           | 1  |    pass     |
| deit_base_distilled_patch16_224 | 1  |    pass     |
|             dla102              | 1  |    pass     |
|           dm_nfnet_f0           | 1  |    pass     |
|             dpn107              | 1  |    pass     |
|       eca_botnext26ts_256       | 1  |    pass     |
|        eca_halonext26ts         | 1  |    pass     |
|        ese_vovnet19b_dw         | 1  |    pass     |
|           fbnetc_100            | 1  |    pass     |
|            fbnetv3_b            | 1  |    pass     |
|            gernet_l             | 1  |    pass     |
|          ghostnet_100           | 1  |    pass     |
|       gluon_inception_v3        | 1  |    pass     |
|          gmixer_24_224          | 1  |    pass     |
|          gmlp_s16_224           | 1  |    pass     |
|            hrnet_w18            | 1  |    pass     |
|          inception_v3           | 1  |    pass     |
|          jx_nest_base           | 1  |    pass     |
|            lcnet_050            | 1  |    pass     |
|            levit_128            | 1  |    pass     |
|      xcit_large_24_p8_224       | 1  |    pass     |
|          mixer_b16_224          | 1  |    pass     |
|            mixnet_l             | 1  |    pass     |
|           mnasnet_100           | 1  |    pass     |
|         mobilenetv2_100         | 1  |    pass     |
|      mobilenetv3_large_100      | 1  |    pass     |
|           mobilevit_s           | 1  |    pass     |
|            nfnet_l0             | 1  |    pass     |
|            pit_b_224            | 1  |    pass     |
|          pnasnet5large          | 1  |    pass     |
|         poolformer_m36          | 1  |    pass     |
|           regnety_002           | 1  |    pass     |
|            repvgg_a2            | 1  |    pass     |
|        res2net101_26w_4s        | 1  |    pass     |
|        res2net50_14w_8s         | 1  |    pass     |
|           res2next50            | 1  |    pass     |
|          resmlp_12_224          | 1  |    pass     |
|           resnest101e           | 1  |    pass     |
|           rexnet_100            | 1  |    pass     |
|        sebotnet33ts_256         | 1  |    pass     |
|           selecsls42b           | 1  |    pass     |
|          spnasnet_100           | 1  |    pass     |
|  swin_base_patch4_window7_224   | 1  |    pass     |
|     swsl_resnext101_32x16d      | 1  |    pass     |
|       tf_efficientnet_b0        | 1  |    pass     |
|           tf_mixnet_l           | 1  |    pass     |
|            tinynet_a            | 1  |    pass     |
|        tnt_s_patch16_224        | 1  |    pass     |
|        twins_pcpvt_base         | 1  |    pass     |
|         visformer_small         | 1  |    pass     |
|      vit_base_patch16_224       | 1  |    pass     |
|           volo_d1_224           | 1  |    pass     |
|         coat_lite_mini          | 1  | fail_to_run |
+---------------------------------+----+-------------+

Compilation latency (sec)

+---------------------------------+----+-----------+
|              name               | bs | inductor  |
+---------------------------------+----+-----------+
|          pnasnet5large          | 1  | 80.834214 |
|  swin_base_patch4_window7_224   | 1  | 80.51164  |
|           tf_mixnet_l           | 1  | 68.421982 |
|             dpn107              | 1  | 62.741278 |
|        twins_pcpvt_base         | 1  | 61.005521 |
|           mobilevit_s           | 1  | 58.366045 |
|          jx_nest_base           | 1  | 58.295701 |
|        res2net50_14w_8s         | 1  | 56.844511 |
|           rexnet_100            | 1  | 55.394119 |
|      xcit_large_24_p8_224       | 1  | 54.486365 |
|          cait_m36_384           | 1  | 53.958934 |
|          ghostnet_100           | 1  | 52.815621 |
|            mixnet_l             | 1  | 50.859333 |
|        sebotnet33ts_256         | 1  | 50.45584  |
|         poolformer_m36          | 1  | 50.334922 |
|            levit_128            | 1  | 49.257885 |
|           dm_nfnet_f0           | 1  | 48.527984 |
|        eca_halonext26ts         | 1  | 48.45768  |
|         crossvit_9_240          | 1  | 48.11499  |
|        tnt_s_patch16_224        | 1  | 46.960463 |
|           volo_d1_224           | 1  | 42.758041 |
|       eca_botnext26ts_256       | 1  | 41.129728 |
|            hrnet_w18            | 1  | 40.611857 |
|        res2net101_26w_4s        | 1  | 39.755405 |
|       tf_efficientnet_b0        | 1  | 39.268369 |
|            nfnet_l0             | 1  | 37.793927 |
|          convnext_base          | 1  | 37.638356 |
|           resnest101e           | 1  | 37.486348 |
|       gluon_inception_v3        | 1  | 35.773466 |
|        adv_inception_v3         | 1  | 35.723872 |
|          inception_v3           | 1  | 35.680561 |
|            tinynet_a            | 1  | 34.402881 |
|           res2next50            | 1  | 34.200518 |
|            pit_b_224            | 1  | 33.946026 |
|           convit_base           | 1  | 31.725608 |
|          botnet26t_256          | 1  | 30.225496 |
|          cspdarknet53           | 1  | 27.43687  |
|            fbnetv3_b            | 1  | 26.884812 |
|             dla102              | 1  | 26.676257 |
|          gmlp_s16_224           | 1  | 26.330641 |
|          gmixer_24_224          | 1  | 25.22183  |
|         visformer_small         | 1  | 24.483247 |
|        ese_vovnet19b_dw         | 1  | 23.434961 |
|      mobilenetv3_large_100      | 1  | 23.142726 |
|      vit_base_patch16_224       | 1  | 21.470103 |
| deit_base_distilled_patch16_224 | 1  | 20.388739 |
|      beit_base_patch16_224      | 1  | 20.280812 |
|          mixer_b16_224          | 1  | 19.342063 |
|           regnety_002           | 1  | 18.919224 |
|            repvgg_a2            | 1  | 17.349649 |
|        convmixer_768_32         | 1  | 17.217071 |
|          resmlp_12_224          | 1  | 17.172457 |
|           selecsls42b           | 1  | 16.529236 |
|            lcnet_050            | 1  | 14.129388 |
|     swsl_resnext101_32x16d      | 1  | 12.730096 |
|          spnasnet_100           | 1  | 10.899002 |
|            gernet_l             | 1  | 10.813786 |
|           fbnetc_100            | 1  | 10.810884 |
|         mobilenetv2_100         | 1  | 10.55918  |
|           mnasnet_100           | 1  | 10.364006 |
+---------------------------------+----+-----------+

Peak Memory Compression Ratio

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          cait_m36_384           | 1  | 0.946276 |
|          pnasnet5large          | 1  | 0.92894  |
|        convmixer_768_32         | 1  | 0.91757  |
|            nfnet_l0             | 1  | 0.896202 |
|      xcit_large_24_p8_224       | 1  | 0.890833 |
|        ese_vovnet19b_dw         | 1  | 0.885603 |
|         mobilenetv2_100         | 1  | 0.878463 |
|            fbnetv3_b            | 1  | 0.878342 |
|           mnasnet_100           | 1  | 0.877618 |
|          spnasnet_100           | 1  | 0.875792 |
|       tf_efficientnet_b0        | 1  | 0.874001 |
|           fbnetc_100            | 1  | 0.871174 |
|           rexnet_100            | 1  | 0.869579 |
|      mobilenetv3_large_100      | 1  | 0.867844 |
|            tinynet_a            | 1  | 0.867184 |
|            lcnet_050            | 1  | 0.863743 |
|       eca_botnext26ts_256       | 1  | 0.860919 |
|         poolformer_m36          | 1  | 0.854727 |
|           dm_nfnet_f0           | 1  | 0.854573 |
|           tf_mixnet_l           | 1  | 0.850551 |
|           mobilevit_s           | 1  | 0.847367 |
|        eca_halonext26ts         | 1  | 0.846867 |
|           regnety_002           | 1  | 0.843197 |
|          ghostnet_100           | 1  | 0.842041 |
|          botnet26t_256          | 1  | 0.840518 |
|            mixnet_l             | 1  | 0.828756 |
|          resmlp_12_224          | 1  | 0.828235 |
|         visformer_small         | 1  | 0.820784 |
|           res2next50            | 1  | 0.814655 |
|            levit_128            | 1  | 0.809558 |
|          convnext_base          | 1  | 0.802187 |
|        sebotnet33ts_256         | 1  | 0.798923 |
|             dpn107              | 1  | 0.797967 |
|        res2net50_14w_8s         | 1  | 0.797416 |
|            hrnet_w18            | 1  | 0.796675 |
|          gmlp_s16_224           | 1  | 0.794832 |
|          gmixer_24_224          | 1  | 0.794209 |
|          cspdarknet53           | 1  | 0.793416 |
|           volo_d1_224           | 1  | 0.78646  |
|        tnt_s_patch16_224        | 1  | 0.782994 |
|           convit_base           | 1  | 0.781659 |
|         crossvit_9_240          | 1  | 0.778607 |
|          mixer_b16_224          | 1  | 0.777888 |
|           resnest101e           | 1  | 0.77597  |
|             dla102              | 1  | 0.775111 |
|      beit_base_patch16_224      | 1  | 0.772723 |
|        twins_pcpvt_base         | 1  | 0.771896 |
|          jx_nest_base           | 1  | 0.771123 |
|       gluon_inception_v3        | 1  | 0.763655 |
|      vit_base_patch16_224       | 1  | 0.763359 |
|          inception_v3           | 1  | 0.763296 |
|        adv_inception_v3         | 1  | 0.762353 |
| deit_base_distilled_patch16_224 | 1  | 0.761027 |
|            pit_b_224            | 1  | 0.75289  |
|        res2net101_26w_4s        | 1  | 0.741458 |
|  swin_base_patch4_window7_224   | 1  | 0.740933 |
|           selecsls42b           | 1  | 0.739352 |
|            gernet_l             | 1  | 0.735443 |
|            repvgg_a2            | 1  | 0.690773 |
|     swsl_resnext101_32x16d      | 1  | 0.638161 |
+---------------------------------+----+----------+

Absolute latency (ms)

+---------------------------------+----+-------------+
|              name               | bs |  inductor   |
+---------------------------------+----+-------------+
|          cait_m36_384           | 1  | 3654.342241 |
|      xcit_large_24_p8_224       | 1  | 1535.493966 |
|     swsl_resnext101_32x16d      | 1  |  444.97976  |
|          pnasnet5large          | 1  | 370.269016  |
|          convnext_base          | 1  | 308.696829  |
|             dpn107              | 1  | 262.743936  |
|        convmixer_768_32         | 1  | 244.275282  |
|          jx_nest_base           | 1  | 235.832379  |
|  swin_base_patch4_window7_224   | 1  | 200.017817  |
|      beit_base_patch16_224      | 1  | 198.430086  |
| deit_base_distilled_patch16_224 | 1  | 197.582093  |
|      vit_base_patch16_224       | 1  | 197.089722  |
|           convit_base           | 1  | 195.602583  |
|            pit_b_224            | 1  | 170.232521  |
|           resnest101e           | 1  | 166.881568  |
|           dm_nfnet_f0           | 1  |  159.33083  |
|          mixer_b16_224          | 1  | 141.090875  |
|         poolformer_m36          | 1  | 138.597962  |
|        res2net101_26w_4s        | 1  |  113.83002  |
|        twins_pcpvt_base         | 1  | 109.102481  |
|           volo_d1_224           | 1  |  95.806521  |
|            nfnet_l0             | 1  |  95.667445  |
|        tnt_s_patch16_224        | 1  |  95.525687  |
|             dla102              | 1  |  91.304549  |
|            hrnet_w18            | 1  |  86.775465  |
|        sebotnet33ts_256         | 1  |  85.469331  |
|          cspdarknet53           | 1  |  83.418134  |
|       gluon_inception_v3        | 1  |  73.895629  |
|        adv_inception_v3         | 1  |  73.810611  |
|          inception_v3           | 1  |  73.721735  |
|          gmlp_s16_224           | 1  |  72.230752  |
|         visformer_small         | 1  |  67.646285  |
|        res2net50_14w_8s         | 1  |  65.69712   |
|          gmixer_24_224          | 1  |  63.653622  |
|            repvgg_a2            | 1  |  63.276521  |
|           res2next50            | 1  |  60.017781  |
|            gernet_l             | 1  |  57.000275  |
|           selecsls42b           | 1  |  45.097331  |
|          botnet26t_256          | 1  |  44.813939  |
|        eca_halonext26ts         | 1  |  44.721817  |
|           mobilevit_s           | 1  |  42.51194   |
|       eca_botnext26ts_256       | 1  |  40.699779  |
|          resmlp_12_224          | 1  |  35.87799   |
|         crossvit_9_240          | 1  |  34.344772  |
|            mixnet_l             | 1  |  31.668991  |
|        ese_vovnet19b_dw         | 1  |  31.621647  |
|           tf_mixnet_l           | 1  |  31.220768  |
|            fbnetv3_b            | 1  |  16.341471  |
|       tf_efficientnet_b0        | 1  |  13.979396  |
|           rexnet_100            | 1  |  13.860263  |
|            tinynet_a            | 1  |  12.800897  |
|           fbnetc_100            | 1  |  9.647488   |
|          ghostnet_100           | 1  |  9.399844   |
|            levit_128            | 1  |  9.354443   |
|          spnasnet_100           | 1  |  8.671277   |
|           mnasnet_100           | 1  |   7.94629   |
|         mobilenetv2_100         | 1  |  7.749635   |
|      mobilenetv3_large_100      | 1  |  7.603145   |
|           regnety_002           | 1  |  6.603027   |
|            lcnet_050            | 1  |  2.633659   |
+---------------------------------+----+-------------+
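Compilation latency, speedup and absolute latency can be combined to estimate how many inference iterations are needed before compilation pays for itself. This is a rough sketch under two assumptions that the dashboard does not state: speedup is eager latency divided by inductor latency, and per-iteration latency stays constant. `break_even_iterations` is a hypothetical helper.

```python
import math
from typing import Optional


def break_even_iterations(compile_s: float, inductor_ms: float,
                          speedup: float) -> Optional[int]:
    """Iterations needed for the per-iteration saving to cover compile time.

    Returns None when the compiled model is not faster than eager mode.
    """
    eager_ms = inductor_ms * speedup          # assumed definition of speedup
    saved_ms = eager_ms - inductor_ms         # time saved per iteration
    if saved_ms <= 0:
        return None
    return math.ceil(compile_s * 1000.0 / saved_ms)


# pnasnet5large rows from the timm_models tables above.
pnasnet = break_even_iterations(80.834214, 370.269016, 2.349335)
# cait_m36_384 has a speedup below 1.0, so compilation never pays off.
cait = break_even_iterations(53.958934, 3654.342241, 0.974159)
print(pnasnet, cait)
```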

@zxd1997066
Contributor

[cppwrapper_dynamic_shape] Performance Dashboard for float32 precision -- Single-Socket Multi-threads (2024-05-05 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration: a forward and backward pass for training, or a forward pass only for inference. For accuracy, we check the numerical correctness of forward pass outputs and gradients by comparing with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency numbers and the peak memory footprint reduction ratio.
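The accuracy check described above amounts to an element-wise comparison within a tolerance. A minimal sketch, using plain Python lists instead of tensors so it runs without PyTorch; `outputs_match` and the tolerance values are illustrative assumptions, not the thresholds used by the benchmark runner.

```python
import math
from typing import Sequence


def outputs_match(reference: Sequence[float], candidate: Sequence[float],
                  rel_tol: float = 1e-4, abs_tol: float = 1e-4) -> bool:
    """Return True if every candidate value is close to its reference value."""
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )


eager_out = [1.0, 2.0, 3.0]           # stand-in for native PyTorch outputs
compiled_out = [1.00001, 2.0, 3.0]    # stand-in for compiled outputs
print(outputs_match(eager_out, compiled_out))
```

Exact equality is not expected because a compiler may fuse or reorder floating-point operations, so a tolerance-based comparison is the usual choice.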

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

SW information:

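+-------------------+--------+------------------------------------------+
|        SW         | Branch |                  Commit                  |
+-------------------+--------+------------------------------------------+
|      Pytorch      |  main  | 6d30803d64953955df63da56833bf4eb52249aae |
|    Torchbench     |  main  |                 d6015d42                 |
|    torchaudio     |  main  |             2.2.0a0+ea437b3              |
|     torchtext     |  main  |             0.16.0a0+b0ebddc             |
|    torchvision    |  main  |             0.19.0a0+06ad737             |
|     torchdata     |  main  |             0.7.1a0+0790338              |
| dynamo_benchmarks |  main  |                 nightly                  |
+-------------------+--------+------------------------------------------+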

HW information

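+------------------+----------------------------------------------------------------+
|       Item       |                             Value                              |
+------------------+----------------------------------------------------------------+
|   Manufacturer   |                           Amazon EC2                           |
|   Product Name   |                          c6i.16xlarge                          |
|    CPU Model     |         Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz          |
| Installed Memory |            128GB (1x128GB DDR4 3200 MT/s [Unknown])            |
|        OS        |                       Ubuntu 22.04.2 LTS                       |
|      Kernel      |                        5.19.0-1022-aws                         |
|    Microcode     |                           0xd000389                            |
|       GCC        |           gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0            |
|      GLIBC       |            ldd (Ubuntu GLIBC 2.35-0ubuntu3.1) 2.35             |
|     Binutils     |             GNU ld (GNU Binutils for Ubuntu) 2.38              |
|      Python      |                         Python 3.10.6                          |
|     OpenSSL      | OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022) |
+------------------+----------------------------------------------------------------+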

Test command

export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
CORES=$(lscpu | grep Core | awk '{print $4}')
export OMP_NUM_THREADS=$CORES

python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--node_id 0" --devices=cpu --dtypes=float32 --inference --compilers=inductor --extra-args="--timeout 9000" 
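
The `CORES=$(lscpu | grep Core | awk '{print $4}')` line takes the fourth whitespace-separated field of the `lscpu` line containing `Core`, which is the per-socket core count. The same extraction can be sketched in Python; `cores_per_socket` is a hypothetical helper and the sample text is an illustrative excerpt, not output captured from the benchmark machine.

```python
def cores_per_socket(lscpu_output: str) -> int:
    """Mimic `grep Core | awk '{print $4}'` on lscpu output.

    The match is case-sensitive, so 'Thread(s) per core' is skipped.
    Only the first matching line is used.
    """
    for line in lscpu_output.splitlines():
        if "Core" in line:
            # Fields: 'Core(s)', 'per', 'socket:', '<count>'
            return int(line.split()[3])
    raise ValueError("no 'Core' line found in lscpu output")


sample_lscpu = (
    "Thread(s) per core:   2\n"
    "Core(s) per socket:   32\n"
    "Socket(s):            1\n"
)
print(cores_per_socket(sample_lscpu))
```

Setting `OMP_NUM_THREADS` to this value gives one OpenMP thread per physical core on a single socket, which matches the "Single-Socket Multi-threads" configuration named in the title.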

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 91%, 71/78 | 100%, 46/46 | 100%, 60/60 |
+----------+------------+-------------+-------------+
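
Each pass-rate cell pairs a percentage with the underlying counts. A small sketch of how such a cell could be produced; `format_pass_rate` is a hypothetical helper, and rounding to the nearest integer is an assumption (the benchmark scripts may round differently).

```python
def format_pass_rate(passed: int, total: int) -> str:
    """Format a pass rate as '<percent>%, <passed>/<total>'."""
    if total <= 0:
        raise ValueError("total must be positive")
    percent = round(100 * passed / total)
    return f"{percent}%, {passed}/{total}"


# Counts taken from the Passrate table above.
for suite, passed, total in [("torchbench", 71, 78),
                             ("huggingface", 46, 46),
                             ("timm_models", 60, 60)]:
    print(suite, format_pass_rate(passed, total))
```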

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.46x    |    1.28x    |    1.87x    |
+----------+------------+-------------+-------------+
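
A geometric mean is used for speedups because they are ratios: a 2x speedup and a 0.5x slowdown should average to 1x, which the arithmetic mean does not give. A minimal sketch using Python's `statistics.geometric_mean`; the three sample values are the first rows of the torchbench speedup table below, so the result is not the suite-wide 1.46x. Dropping non-positive entries (the 0.0 rows for models that failed to load) is an assumption about how the summary is computed.

```python
from statistics import geometric_mean, mean


def geomean_speedup(speedups: list[float]) -> float:
    """Geometric mean of the positive speedups; failed runs (0.0) are dropped."""
    valid = [s for s in speedups if s > 0]
    return geometric_mean(valid)


# Why not the arithmetic mean: 2x and 0.5x cancel out geometrically.
balanced = geomean_speedup([2.0, 0.5])
arithmetic = mean([2.0, 0.5])

# Illustrative subset only: pyhpc_equation_of_state, squeezenet1_1, mnasnet1_0,
# plus one failed model reported as 0.0.
subset = geomean_speedup([10.590452, 3.04626, 2.921755, 0.0])
print(f"{balanced:.2f} {arithmetic:.2f} {subset:.2f}")
```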

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   45.38    |    45.17    |    57.67    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.90x    |    0.98x    |    0.99x    |
+----------+------------+-------------+-------------+
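
Since higher is better, the compression ratio is presumably the eager-mode peak memory divided by the compiled-mode peak memory, so values below 1.0 mean the compiled run used more memory. A sketch under that assumption; `compression_ratio` is a hypothetical helper and the megabyte figures are made-up inputs chosen only to illustrate the arithmetic.

```python
def compression_ratio(eager_peak_mb: float, compiled_peak_mb: float) -> float:
    """Peak-memory compression ratio, assumed to be eager / compiled."""
    if compiled_peak_mb <= 0:
        raise ValueError("compiled_peak_mb must be positive")
    return eager_peak_mb / compiled_peak_mb


# Hypothetical measurements: the compiled run peaks about 11% higher than eager.
ratio = compression_ratio(900.0, 1000.0)
print(f"{ratio:.2f}x")
```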

torchbench suite with float32 precision

Performance speedup

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_equation_of_state     | 1048576 | 10.590452 |
|          squeezenet1_1          |   16    |  3.04626  |
|           mnasnet1_0            |   32    | 2.921755  |
|       mobilenet_v3_large        |   32    | 2.877071  |
|        timm_efficientnet        |   64    |  2.86334  |
|          mobilenet_v2           |   16    | 2.765737  |
|       shufflenet_v2_x1_0        |   64    |  2.54731  |
|          timm_resnest           |   32    | 2.307617  |
|            resnet50             |   32    | 2.227182  |
|        phlippe_densenet         |   128   | 2.051457  |
|           densenet121           |   64    | 1.994597  |
|            resnet152            |   32    | 1.993492  |
|       doctr_det_predictor       |    1    | 1.937329  |
|             hf_GPT2             |    1    | 1.888355  |
|           timm_regnet           |   32    | 1.879603  |
|           timm_nfnet            |   128   | 1.870156  |
|         resnext50_32x4d         |    8    | 1.832865  |
|         phlippe_resnet          |   128   | 1.723495  |
|            resnet18             |    8    |  1.68859  |
|           timm_vovnet           |   32    | 1.678052  |
|      doctr_reco_predictor       |    1    |  1.63427  |
|             alexnet             |   128   | 1.595849  |
|            hf_Albert            |    1    | 1.558633  |
|          hf_Bert_large          |    1    | 1.532795  |
|             yolov3              |    8    | 1.524146  |
|            moondream            |    1    | 1.521194  |
|          fastNLP_Bert           |    1    | 1.508036  |
|          hf_GPT2_large          |    1    | 1.504558  |
|     functorch_maml_omniglot     |    1    | 1.473847  |
|             hf_Bert             |    1    | 1.471061  |
|        basic_gnn_edgecnn        |    1    | 1.466961  |
|          hf_Longformer          |    1    | 1.450997  |
|         LearningToPaint         |   96    | 1.429515  |
|              dcgan              |   256   | 1.396845  |
|          hf_DistilBert          |    1    | 1.343111  |
|              vgg16              |    4    | 1.315299  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 1.288785  |
|             hf_Bart             |    1    | 1.279466  |
|           hf_BigBird            |    1    | 1.251609  |
|          basic_gnn_gcn          |    1    | 1.245106  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1.233634  |
|         basic_gnn_sage          |    1    | 1.220759  |
|        hf_distil_whisper        |    1    | 1.215936  |
|           hf_T5_large           |    1    | 1.210897  |
|          pytorch_unet           |    1    | 1.204967  |
|          BERT_pytorch           |    2    | 1.203244  |
|  detectron2_fasterrcnn_r_50_c4  |    1    |  1.19217  |
|         pytorch_stargan         |   16    |  1.18797  |
|              dlrm               |  2048   | 1.175865  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 1.158595  |
|    detectron2_fcos_r_50_fpn     |    1    | 1.157643  |
|          maml_omniglot          |    5    | 1.154047  |
|        soft_actor_critic        |   256   | 1.153661  |
|      torch_multimodal_clip      |   32    | 1.150406  |
|          basic_gnn_gin          |    1    | 1.140723  |
|              hf_T5              |    1    | 1.132189  |
|               drq               |    1    | 1.099695  |
| detectron2_fasterrcnn_r_101_dc5 |    1    |  1.07778  |
|     timm_vision_transformer     |   32    | 1.065691  |
|     nvidia_deeprecommender      |   256   | 1.058447  |
|          lennard_jones          |  1000   | 1.056969  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1.048132  |
|       speech_transformer        |    1    | 1.044851  |
|           hf_Reformer           |    1    |  1.03748  |
|             demucs              |    1    | 1.009364  |
|     resnet50_quantized_qat      |   32    | 1.009303  |
|  timm_vision_transformer_large  |   32    |  1.00786  |
|   mobilenet_v2_quantized_qat    |   96    |  1.00124  |
|           tts_angular           |   64    | 0.998605  |
|           hf_T5_base            |    1    | 0.830015  |
|       Background_Matting        |    1    | 0.813684  |
|              maml               |    1    |  0.73277  |
|         opacus_cifar10          |   64    | 0.662082  |
|     pyhpc_isoneutral_mixing     | 1048576 | 0.654593  |
|      functorch_dp_cifar10       |   64    | 0.602865  |
|        timm_efficientdet        |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
|              moco               |    0    |    0.0    |
+---------------------------------+---------+-----------+

Accuracy

+--------------------------------+---------+--------------------+
|              name              |   bs    |      inductor      |
+--------------------------------+---------+--------------------+
|       Background_Matting       |    1    |  pass_due_to_skip  |
|              maml              |    1    |  pass_due_to_skip  |
| timm_vision_transformer_large  |    4    |  pass_due_to_skip  |
|          hf_T5_large           |    4    |  pass_due_to_skip  |
|         hf_GPT2_large          |    4    |  pass_due_to_skip  |
|         basic_gnn_sage         |    1    |        pass        |
|      doctr_det_predictor       |    4    |        pass        |
|              dlrm              |    4    |        pass        |
|    detectron2_fcos_r_50_fpn    |    4    |        pass        |
|          densenet121           |    4    |        pass        |
|             demucs             |    1    |        pass        |
|             dcgan              |    4    |        pass        |
|         basic_gnn_gcn          |    1    |        pass        |
|         basic_gnn_gin          |    1    |        pass        |
|         lennard_jones          |    4    |        pass        |
|       basic_gnn_edgecnn        |    1    |        pass        |
|            alexnet             |    4    |        pass        |
|        LearningToPaint         |    4    |        pass        |
|          fastNLP_Bert          |    4    |        pass        |
|             llama              |    4    |        pass        |
|      doctr_reco_predictor      |    4    |        pass        |
|      functorch_dp_cifar10      |    4    |        pass        |
|              drq               |    1    |        pass        |
|         hf_DistilBert          |    4    |        pass        |
|       hf_distil_whisper        |    4    |        pass        |
|           hf_T5_base           |    4    |        pass        |
|             hf_T5              |    4    |        pass        |
|          hf_Reformer           |    4    |        pass        |
|    functorch_maml_omniglot     |    1    |        pass        |
|            hf_GPT2             |    2    |        pass        |
|         hf_Longformer          |    4    |        pass        |
|           hf_BigBird           |    4    |        pass        |
|         hf_Bert_large          |    4    |        pass        |
|            hf_Bert             |    4    |        pass        |
|            hf_Bart             |    4    |        pass        |
|           hf_Albert            |    4    |        pass        |
|             yolov3             |    4    |        pass        |
|         maml_omniglot          |    5    |        pass        |
|            resnet18            |    4    |        pass        |
|     resnet50_quantized_qat     |    4    |        pass        |
|        phlippe_densenet        |    4    |        pass        |
|        resnext50_32x4d         |    4    |        pass        |
|   mobilenet_v2_quantized_qat   |    4    |        pass        |
|       mobilenet_v3_large       |    4    |        pass        |
|           moondream            |    4    |        pass        |
|     nvidia_deeprecommender     |    4    |        pass        |
|           resnet152            |    4    |        pass        |
|         opacus_cifar10         |    4    |        pass        |
|          BERT_pytorch          |    4    |        pass        |
|         phlippe_resnet         |    4    |        pass        |
|    pyhpc_equation_of_state     |    4    |        pass        |
|    pyhpc_isoneutral_mixing     |    4    |        pass        |
| pyhpc_turbulent_kinetic_energy | 1048576 |        pass        |
|  pytorch_CycleGAN_and_pix2pix  |    1    |        pass        |
|        pytorch_stargan         |   16    |        pass        |
|          pytorch_unet          |    2    |        pass        |
|          mobilenet_v2          |    4    |        pass        |
|           mnasnet1_0           |    4    |        pass        |
|          timm_regnet           |    4    |        pass        |
|       shufflenet_v2_x1_0       |    4    |        pass        |
|       soft_actor_critic        |   256   |        pass        |
|       speech_transformer       |    1    |        pass        |
|         squeezenet1_1          |    4    |        pass        |
|            resnet50            |    4    |        pass        |
|           timm_nfnet           |    4    |        pass        |
|       timm_efficientnet        |    4    |        pass        |
|          timm_resnest          |    4    |        pass        |
|    timm_vision_transformer     |    4    |        pass        |
|          timm_vovnet           |    4    |        pass        |
|     torch_multimodal_clip      |    4    |        pass        |
|          tts_angular           |    4    |        pass        |
|             vgg16              |    4    |        pass        |
|       timm_efficientdet        |    0    | model_fail_to_load |
|              moco              |    0    | model_fail_to_load |
|         DALLE2_pytorch         |    0    | model_fail_to_load |
|          Super_SloMo           |    4    |    fail_to_run     |
|        vision_maskrcnn         |    1    |    fail_to_run     |
+--------------------------------+---------+--------------------+

Compilation latency (sec)

+---------------------------------+---------+------------+
|              name               |   bs    |  inductor  |
+---------------------------------+---------+------------+
|           hf_BigBird            |    1    | 470.638799 |
|    detectron2_fcos_r_50_fpn     |    1    | 406.51645  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 230.135675 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 222.56655  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 215.819887 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 206.455921 |
|              maml               |    1    | 142.04465  |
|           hf_T5_large           |    1    | 107.368725 |
|       speech_transformer        |    1    | 90.103025  |
|          hf_Longformer          |    1    | 85.623479  |
|           hf_Reformer           |    1    | 80.328663  |
|  timm_vision_transformer_large  |   32    | 68.195269  |
|      torch_multimodal_clip      |   32    | 65.012903  |
|          basic_gnn_gcn          |    1    | 60.846705  |
|           densenet121           |   64    | 59.259255  |
|            resnet152            |   32    | 58.672453  |
|          fastNLP_Bert           |    1    | 55.130117  |
|          hf_GPT2_large          |    1    | 53.568293  |
|           hf_T5_base            |    1    | 50.488524  |
|            moondream            |    1    | 49.328886  |
|     pyhpc_isoneutral_mixing     | 1048576 | 48.493488  |
|       doctr_det_predictor       |    1    | 46.825388  |
|          hf_Bert_large          |    1    | 44.343273  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 43.684575  |
|        hf_distil_whisper        |    1    | 43.458563  |
|           timm_nfnet            |   128   | 37.791113  |
|           timm_regnet           |   32    | 37.430895  |
|          BERT_pytorch           |    2    | 36.501725  |
|             yolov3              |    8    | 35.790833  |
|             demucs              |    1    | 34.241448  |
|        timm_efficientnet        |   64    |  33.0074   |
|             hf_Bart             |    1    | 31.329482  |
|              hf_T5              |    1    | 30.871458  |
|        phlippe_densenet         |   128   | 29.174123  |
|       shufflenet_v2_x1_0        |   64    |  28.84209  |
|     timm_vision_transformer     |   32    | 28.600069  |
|       mobilenet_v3_large        |   32    | 27.765468  |
|             hf_Bert             |    1    | 27.049694  |
|         opacus_cifar10          |   64    | 25.797919  |
|         pytorch_stargan         |   16    | 25.635168  |
|            hf_Albert            |    1    | 25.566805  |
|      doctr_reco_predictor       |    1    | 25.407072  |
|             hf_GPT2             |    1    | 24.744079  |
|       Background_Matting        |    1    | 24.315246  |
|          mobilenet_v2           |   16    |  24.14548  |
|      functorch_dp_cifar10       |   64    | 23.984984  |
|          timm_resnest           |   32    | 23.620863  |
|           timm_vovnet           |   32    | 23.217863  |
|         resnext50_32x4d         |    8    | 23.205415  |
|            resnet50             |   32    | 23.106897  |
|           mnasnet1_0            |   32    | 21.606356  |
|          hf_DistilBert          |    1    |  21.03482  |
|          pytorch_unet           |    1    | 19.830927  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 19.244653  |
|          squeezenet1_1          |   16    | 19.210917  |
|            resnet18             |    8    | 17.791005  |
|         LearningToPaint         |   96    | 17.650685  |
|     pyhpc_equation_of_state     | 1048576 |  17.28047  |
|         phlippe_resnet          |   128   | 17.275865  |
|              vgg16              |    4    | 17.113396  |
|             alexnet             |   128   | 16.781244  |
|               drq               |    1    | 15.569458  |
|     functorch_maml_omniglot     |    1    | 15.454746  |
|          maml_omniglot          |    5    | 15.350654  |
|              dlrm               |  2048   | 15.060935  |
|              dcgan              |   256   | 14.802765  |
|        basic_gnn_edgecnn        |    1    | 14.618291  |
|     nvidia_deeprecommender      |   256   | 14.601879  |
|          basic_gnn_gin          |    1    | 14.529773  |
|         basic_gnn_sage          |    1    | 14.485002  |
|        soft_actor_critic        |   256   | 14.216745  |
|          lennard_jones          |  1000   | 13.718822  |
|           tts_angular           |   64    | 13.659452  |
|   mobilenet_v2_quantized_qat    |   96    |  0.135815  |
|     resnet50_quantized_qat      |   32    |  0.106746  |
|        timm_efficientdet        |    0    |    0.0     |
|              moco               |    0    |    0.0     |
|         DALLE2_pytorch          |    0    |    0.0     |
+---------------------------------+---------+------------+

Peak Memory Compression Ratio

+---------------------------------+---------+----------+
|              name               |   bs    | inductor |
+---------------------------------+---------+----------+
|           timm_nfnet            |   128   | 0.992868 |
|  timm_vision_transformer_large  |   32    | 0.991474 |
|              dlrm               |  2048   | 0.988059 |
|           hf_T5_base            |    1    | 0.987809 |
|        timm_efficientnet        |   64    | 0.98465  |
|           timm_regnet           |   32    | 0.983061 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 0.981946 |
|       Background_Matting        |    1    | 0.981765 |
|             yolov3              |    8    | 0.981004 |
|             demucs              |    1    | 0.980213 |
|           densenet121           |   64    | 0.979905 |
|     nvidia_deeprecommender      |   256   | 0.978509 |
|            resnet152            |   32    | 0.978364 |
|          pytorch_unet           |    1    | 0.978001 |
|      torch_multimodal_clip      |   32    | 0.975871 |
|           timm_vovnet           |   32    | 0.973708 |
|          hf_GPT2_large          |    1    | 0.973218 |
|            resnet50             |   32    | 0.972984 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 0.972719 |
|         LearningToPaint         |   96    | 0.971143 |
|          timm_resnest           |   32    | 0.970838 |
|        basic_gnn_edgecnn        |    1    | 0.970671 |
|       doctr_det_predictor       |    1    | 0.965739 |
|           mnasnet1_0            |   32    | 0.963681 |
|    detectron2_fcos_r_50_fpn     |    1    | 0.962886 |
|   mobilenet_v2_quantized_qat    |   96    | 0.962142 |
|       mobilenet_v3_large        |   32    | 0.961187 |
|     timm_vision_transformer     |   32    | 0.961116 |
|     resnet50_quantized_qat      |   32    | 0.960871 |
|          mobilenet_v2           |   16    | 0.958866 |
|       shufflenet_v2_x1_0        |   64    | 0.956321 |
|             alexnet             |   128   | 0.954116 |
|         resnext50_32x4d         |    8    | 0.951011 |
|           hf_BigBird            |    1    | 0.950171 |
|        phlippe_densenet         |   128   | 0.945804 |
|         pytorch_stargan         |   16    | 0.943726 |
|          basic_gnn_gcn          |    1    | 0.937294 |
|      doctr_reco_predictor       |    1    | 0.936956 |
|              vgg16              |    4    | 0.936946 |
|          BERT_pytorch           |    2    | 0.933741 |
|           tts_angular           |   64    | 0.929774 |
|          squeezenet1_1          |   16    | 0.918081 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 0.913066 |
|              dcgan              |   256   | 0.91156  |
|     pyhpc_equation_of_state     | 1048576 | 0.907278 |
|            resnet18             |    8    | 0.905172 |
|         phlippe_resnet          |   128   | 0.899769 |
|        hf_distil_whisper        |    1    | 0.891566 |
|        soft_actor_critic        |   256   | 0.891466 |
|         opacus_cifar10          |   64    | 0.891445 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 0.883154 |
|          lennard_jones          |  1000   | 0.871445 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 0.860486 |
|          maml_omniglot          |    5    | 0.859275 |
|          basic_gnn_gin          |    1    | 0.857957 |
|     functorch_maml_omniglot     |    1    | 0.857598 |
|         basic_gnn_sage          |    1    | 0.857087 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.84729  |
|          fastNLP_Bert           |    1    | 0.836309 |
|      functorch_dp_cifar10       |   64    | 0.829748 |
|       speech_transformer        |    1    | 0.819238 |
|          hf_Bert_large          |    1    | 0.805038 |
|          hf_Longformer          |    1    | 0.800449 |
|              maml               |    1    | 0.798514 |
|            moondream            |    1    | 0.797717 |
|             hf_Bert             |    1    | 0.79131  |
|            hf_Albert            |    1    | 0.790955 |
|           hf_T5_large           |    1    | 0.787523 |
|              hf_T5              |    1    | 0.772703 |
|               drq               |    1    | 0.770001 |
|          hf_DistilBert          |    1    | 0.763541 |
|             hf_GPT2             |    1    | 0.758396 |
|           hf_Reformer           |    1    | 0.736972 |
|             hf_Bart             |    1    | 0.733445 |
|     pyhpc_isoneutral_mixing     | 1048576 | 0.688792 |
|        timm_efficientdet        |    0    |   0.0    |
|              moco               |    0    |   0.0    |
|         DALLE2_pytorch          |    0    |   0.0    |
+---------------------------------+---------+----------+

Absolute latency (ms)

+---------------------------------+---------+-------------+
|              name               |   bs    |  inductor   |
+---------------------------------+---------+-------------+
|  timm_vision_transformer_large  |   32    | 4529.766436 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1435.407886 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1296.280134 |
|           hf_T5_base            |    1    | 1246.317695 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1171.174094 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1081.891198 |
|          hf_GPT2_large          |    1    | 554.166661  |
|           timm_nfnet            |   128   | 529.666171  |
|           hf_T5_large           |    1    | 394.403163  |
|            moondream            |    1    | 391.565954  |
|        hf_distil_whisper        |    1    |  346.66227  |
|       Background_Matting        |    1    | 346.297734  |
|          pytorch_unet           |    1    | 224.281092  |
|           timm_regnet           |   32    | 217.340476  |
|            resnet152            |   32    |  188.02991  |
|           densenet121           |   64    | 184.922389  |
|    detectron2_fcos_r_50_fpn     |    1    | 179.141955  |
|      torch_multimodal_clip      |   32    |  167.31898  |
|             yolov3              |    8    | 162.500457  |
|             demucs              |    1    | 138.228605  |
|           hf_BigBird            |    1    | 120.807685  |
|           timm_vovnet           |   32    | 115.628121  |
|          hf_Bert_large          |    1    | 106.018944  |
|     timm_vision_transformer     |   32    | 102.646905  |
|         pytorch_stargan         |   16    |  95.721441  |
|       doctr_det_predictor       |    1    |  85.64324   |
|            resnet50             |   32    |  74.339038  |
|          hf_Longformer          |    1    |  69.564918  |
|          timm_resnest           |   32    |  52.879205  |
|             hf_Bart             |    1    |  52.335129  |
|       speech_transformer        |    1    |  51.37711   |
|              maml               |    1    |  49.603373  |
|        timm_efficientnet        |   64    |  49.576471  |
|             alexnet             |   128   |  43.452202  |
|   mobilenet_v2_quantized_qat    |   96    |  43.389773  |
|              hf_T5              |    1    |  43.347642  |
|             hf_Bert             |    1    |  40.194155  |
|         LearningToPaint         |   96    |  36.912778  |
|              vgg16              |    4    |  36.039134  |
|           hf_Reformer           |    1    |  36.026833  |
|            hf_Albert            |    1    |  35.002664  |
|     nvidia_deeprecommender      |   256   |  34.456676  |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  33.061812  |
|          fastNLP_Bert           |    1    |  32.876934  |
|          hf_DistilBert          |    1    |  31.908615  |
|     pyhpc_isoneutral_mixing     | 1048576 |  29.569647  |
|          BERT_pytorch           |    2    |  27.244863  |
|     resnet50_quantized_qat      |   32    |  25.468858  |
| pyhpc_turbulent_kinetic_energy  | 1048576 |  25.061531  |
|         resnext50_32x4d         |    8    |  23.941817  |
|             hf_GPT2             |    1    |  23.165845  |
|           tts_angular           |   64    |  20.383469  |
|        phlippe_densenet         |   128   |  19.613631  |
|        basic_gnn_edgecnn        |    1    |  19.356499  |
|              dcgan              |   256   |  18.162202  |
|       shufflenet_v2_x1_0        |   64    |  15.700223  |
|           mnasnet1_0            |   32    |  14.350652  |
|       mobilenet_v3_large        |   32    |  13.345556  |
|      functorch_dp_cifar10       |   64    |   10.5336   |
|         opacus_cifar10          |   64    |  10.374891  |
|          basic_gnn_gcn          |    1    |  10.240715  |
|            resnet18             |    8    |  9.546213   |
|          mobilenet_v2           |   16    |  9.133911   |
|              dlrm               |  2048   |  6.424794   |
|          squeezenet1_1          |   16    |  5.670664   |
|         basic_gnn_sage          |    1    |  5.181414   |
|          basic_gnn_gin          |    1    |  4.863584   |
|         phlippe_resnet          |   128   |  4.496941   |
|      doctr_reco_predictor       |    1    |   3.32093   |
|     pyhpc_equation_of_state     | 1048576 |  1.162955   |
|               drq               |    1    |  0.832433   |
|        soft_actor_critic        |   256   |  0.518502   |
|          maml_omniglot          |    5    |  0.512617   |
|     functorch_maml_omniglot     |    1    |  0.465114   |
|          lennard_jones          |  1000   |  0.215477   |
|        timm_efficientdet        |    0    |     0.0     |
|              moco               |    0    |     0.0     |
|         DALLE2_pytorch          |    0    |     0.0     |
+---------------------------------+---------+-------------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            XLNetLMHeadModel             |  8  | 5.592982 |
|     MobileBertForQuestionAnswering      | 128 | 1.828406 |
|      GPT2ForSequenceClassification      |  4  | 1.813613 |
|           ElectraForCausalLM            | 32  | 1.713113 |
|       ElectraForQuestionAnswering       | 64  | 1.68237  |
|          MobileBertForMaskedLM          | 128 | 1.639313 |
|               DistillGPT2               | 16  | 1.486713 |
|      DebertaV2ForQuestionAnswering      |  1  | 1.424903 |
|            YituTechConvBert             | 16  | 1.414463 |
|    LayoutLMForSequenceClassification    | 16  | 1.401141 |
|       RobertaForQuestionAnswering       | 16  | 1.39087  |
|           RobertaForCausalLM            | 16  | 1.386358 |
|        BertForQuestionAnswering         | 16  | 1.381536 |
|               GoogleFnet                | 16  | 1.354263 |
|           LayoutLMForMaskedLM           | 16  | 1.330155 |
|                CamemBert                | 16  | 1.324619 |
|             BertForMaskedLM             | 16  | 1.324461 |
|          AllenaiLongformerBase          |  4  | 1.310154 |
|    MegatronBertForQuestionAnswering     |  8  | 1.289099 |
|         MegatronBertForCausalLM         |  4  | 1.265873 |
|       DebertaForQuestionAnswering       | 16  | 1.232513 |
|     PLBartForConditionalGeneration      |  4  | 1.227509 |
|      MBartForConditionalGeneration      |  2  | 1.191525 |
|           DebertaForMaskedLM            |  8  | 1.188211 |
|       MT5ForConditionalGeneration       | 16  | 1.179448 |
|             OPTForCausalLM              |  2  | 1.177746 |
|                 T5Small                 |  4  | 1.176984 |
|       T5ForConditionalGeneration        |  4  | 1.176077 |
| BlenderbotSmallForConditionalGeneration | 64  | 1.15407  |
|            AlbertForMaskedLM            |  4  | 1.131487 |
|       AlbertForQuestionAnswering        |  4  | 1.120831 |
|          DistilBertForMaskedLM          | 128 | 1.100445 |
|         Speech2Text2ForCausalLM         | 256 | 1.089271 |
|       BlenderbotSmallForCausalLM        | 64  | 1.085125 |
|     M2M100ForConditionalGeneration      | 16  | 1.084224 |
|             XGLMForCausalLM             |  8  | 1.083807 |
|     DistilBertForQuestionAnswering      | 256 | 1.082586 |
|      BartForConditionalGeneration       |  2  | 1.079318 |
|          DebertaV2ForMaskedLM           |  2  | 1.069292 |
|            PLBartForCausalLM            |  8  | 1.048646 |
|            TrOCRForCausalLM             | 32  |  1.0462  |
|            MBartForCausalLM             |  4  | 1.045397 |
|     PegasusForConditionalGeneration     | 32  | 1.044223 |
|             BartForCausalLM             |  4  | 1.034653 |
|           PegasusForCausalLM            | 32  | 1.032652 |
|          BlenderbotForCausalLM          |  4  | 1.026928 |
+-----------------------------------------+-----+----------+
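
To summarize a speedup table like the one above in a single number, one option is a geometric mean over the `inductor` column. The sketch below is only an illustration, not the dashboard's own tooling: `parse_ascii_table` and `geomean` are hypothetical helper names, the choice of geometric mean is an assumption, and `SAMPLE` holds just three rows copied from the table to keep the example short.

```python
import math


def parse_ascii_table(text):
    """Parse a '+---+' bordered table into a list of dicts keyed by header cell."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
        if line.strip().startswith("|")  # skip the '+----+' border lines
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def geomean(values):
    """Geometric mean of a sequence of positive numbers."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Three rows copied from the huggingface speedup table (not the full table).
SAMPLE = """
+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            XLNetLMHeadModel             |  8  | 5.592982 |
|     MobileBertForQuestionAnswering      | 128 | 1.828406 |
|          BlenderbotForCausalLM          |  4  | 1.026928 |
+-----------------------------------------+-----+----------+
"""

rows = parse_ascii_table(SAMPLE)
speedups = [float(r["inductor"]) for r in rows]
print(f"{len(rows)} models, geomean speedup = {geomean(speedups):.3f}")
```

A geometric mean keeps a single outlier such as XLNetLMHeadModel from dominating the summary the way an arithmetic mean would. Pasting the full table into `SAMPLE` gives the suite-level figure.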

Accuracy

+-----------------------------------------+----+------------------+
|                  name                   | bs |     inductor     |
+-----------------------------------------+----+------------------+
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip |
|                CamemBert                | 1  |       pass       |
|       AlbertForQuestionAnswering        | 1  |       pass       |
|           DebertaForMaskedLM            | 1  |       pass       |
|          AllenaiLongformerBase          | 1  |       pass       |
|             BartForCausalLM             | 1  |       pass       |
|      BartForConditionalGeneration       | 1  |       pass       |
|             BertForMaskedLM             | 1  |       pass       |
|        BertForQuestionAnswering         | 1  |       pass       |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |
|       DebertaForQuestionAnswering       | 1  |       pass       |
|           LayoutLMForMaskedLM           | 1  |       pass       |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |
|          DistilBertForMaskedLM          | 1  |       pass       |
|     DistilBertForQuestionAnswering      | 1  |       pass       |
|               DistillGPT2               | 1  |       pass       |
|           ElectraForCausalLM            | 1  |       pass       |
|       ElectraForQuestionAnswering       | 1  |       pass       |
|      GPT2ForSequenceClassification      | 1  |       pass       |
|               GoogleFnet                | 1  |       pass       |
|    LayoutLMForSequenceClassification    | 1  |       pass       |
|            MBartForCausalLM             | 1  |       pass       |
|            XLNetLMHeadModel             | 1  |       pass       |
|             XGLMForCausalLM             | 1  |       pass       |
|            AlbertForMaskedLM            | 1  |       pass       |
|      MBartForConditionalGeneration      | 1  |       pass       |
|       MT5ForConditionalGeneration       | 1  |       pass       |
|         MegatronBertForCausalLM         | 1  |       pass       |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |
|          MobileBertForMaskedLM          | 1  |       pass       |
|     MobileBertForQuestionAnswering      | 1  |       pass       |
|             OPTForCausalLM              | 1  |       pass       |
|            PLBartForCausalLM            | 1  |       pass       |
|     PLBartForConditionalGeneration      | 1  |       pass       |
|           PegasusForCausalLM            | 1  |       pass       |
|     M2M100ForConditionalGeneration      | 1  |       pass       |
|     PegasusForConditionalGeneration     | 1  |       pass       |
|           RobertaForCausalLM            | 1  |       pass       |
|       RobertaForQuestionAnswering       | 1  |       pass       |
|         Speech2Text2ForCausalLM         | 1  |       pass       |
|       T5ForConditionalGeneration        | 1  |       pass       |
|                 T5Small                 | 1  |       pass       |
|            TrOCRForCausalLM             | 1  |       pass       |
|            YituTechConvBert             | 1  |       pass       |
+-----------------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------------+-----+------------+
|                  name                   | bs  |  inductor  |
+-----------------------------------------+-----+------------+
|          AllenaiLongformerBase          |  4  | 159.819823 |
|     MobileBertForQuestionAnswering      | 128 | 80.641678  |
|          MobileBertForMaskedLM          | 128 | 80.230184  |
|      MBartForConditionalGeneration      |  2  | 73.367994  |
|     PegasusForConditionalGeneration     | 32  | 73.299017  |
|     M2M100ForConditionalGeneration      | 16  | 71.556383  |
|          BlenderbotForCausalLM          |  4  | 61.572967  |
|             XGLMForCausalLM             |  8  | 60.240223  |
|      BartForConditionalGeneration       |  2  | 59.192759  |
|          DebertaV2ForMaskedLM           |  2  | 58.424493  |
|            XLNetLMHeadModel             |  8  | 56.186229  |
|       MT5ForConditionalGeneration       | 16  | 55.386821  |
| BlenderbotSmallForConditionalGeneration | 64  | 51.888169  |
|         MegatronBertForCausalLM         |  4  | 50.963732  |
|    MegatronBertForQuestionAnswering     |  8  | 50.255146  |
|            YituTechConvBert             | 16  | 49.570155  |
|      DebertaV2ForQuestionAnswering      |  1  | 48.058501  |
|     PLBartForConditionalGeneration      |  4  | 43.298898  |
|       T5ForConditionalGeneration        |  4  |  42.12423  |
|                 T5Small                 |  4  | 41.999083  |
|             OPTForCausalLM              |  2  |  37.06672  |
|            TrOCRForCausalLM             | 32  |  36.79951  |
|       DebertaForQuestionAnswering       | 16  | 36.707396  |
|           PegasusForCausalLM            | 32  | 36.545241  |
|            MBartForCausalLM             |  4  | 36.500834  |
|           DebertaForMaskedLM            |  8  | 36.446406  |
|           ElectraForCausalLM            | 32  | 32.324235  |
|           RobertaForCausalLM            | 16  | 32.138064  |
|       ElectraForQuestionAnswering       | 64  | 31.767182  |
|                CamemBert                | 16  | 31.683564  |
|       RobertaForQuestionAnswering       | 16  | 31.604654  |
|           LayoutLMForMaskedLM           | 16  | 31.446662  |
|        BertForQuestionAnswering         | 16  | 31.420721  |
|             BertForMaskedLM             | 16  | 31.368854  |
|    LayoutLMForSequenceClassification    | 16  | 31.015969  |
|             BartForCausalLM             |  4  | 30.776492  |
|      GPT2ForSequenceClassification      |  4  | 30.764723  |
|       AlbertForQuestionAnswering        |  4  | 30.748973  |
|            AlbertForMaskedLM            |  4  | 30.552965  |
|       BlenderbotSmallForCausalLM        | 64  | 29.030022  |
|     DistilBertForQuestionAnswering      | 256 | 26.453289  |
|          DistilBertForMaskedLM          | 128 | 26.364459  |
|            PLBartForCausalLM            |  8  | 25.962467  |
|         Speech2Text2ForCausalLM         | 256 | 25.552829  |
|               GoogleFnet                | 16  | 25.464923  |
|               DistillGPT2               | 16  | 23.349273  |
+-----------------------------------------+-----+------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            AlbertForMaskedLM            |  4  | 0.994042 |
|       AlbertForQuestionAnswering        |  4  | 0.993999 |
|     DistilBertForQuestionAnswering      | 256 | 0.993131 |
|               DistillGPT2               | 16  | 0.992159 |
|            TrOCRForCausalLM             | 32  | 0.992145 |
|           RobertaForCausalLM            | 16  | 0.992118 |
|          DistilBertForMaskedLM          | 128 | 0.991883 |
|             OPTForCausalLM              |  2  | 0.99173  |
|               GoogleFnet                | 16  | 0.991219 |
|             BertForMaskedLM             | 16  | 0.990926 |
|       ElectraForQuestionAnswering       | 64  | 0.990877 |
|           LayoutLMForMaskedLM           | 16  | 0.990832 |
|                CamemBert                | 16  | 0.990829 |
|           ElectraForCausalLM            | 32  | 0.990826 |
|            PLBartForCausalLM            |  8  | 0.990283 |
|    MegatronBertForQuestionAnswering     |  8  | 0.98998  |
|            MBartForCausalLM             |  4  | 0.989677 |
|            YituTechConvBert             | 16  | 0.988994 |
|       DebertaForQuestionAnswering       | 16  | 0.988953 |
|     PegasusForConditionalGeneration     | 32  | 0.988841 |
|        BertForQuestionAnswering         | 16  | 0.988835 |
|    LayoutLMForSequenceClassification    | 16  | 0.988731 |
|       RobertaForQuestionAnswering       | 16  | 0.988638 |
|         Speech2Text2ForCausalLM         | 256 | 0.988206 |
|      GPT2ForSequenceClassification      |  4  | 0.987993 |
| BlenderbotSmallForConditionalGeneration | 64  | 0.987786 |
|     PLBartForConditionalGeneration      |  4  | 0.987552 |
|      MBartForConditionalGeneration      |  2  |  0.9873  |
|           PegasusForCausalLM            | 32  | 0.987238 |
|       BlenderbotSmallForCausalLM        | 64  | 0.986759 |
|             BartForCausalLM             |  4  | 0.985848 |
|          BlenderbotForCausalLM          |  4  | 0.98535  |
|         MegatronBertForCausalLM         |  4  | 0.985076 |
|           DebertaForMaskedLM            |  8  | 0.984782 |
|      BartForConditionalGeneration       |  2  | 0.983305 |
|       MT5ForConditionalGeneration       | 16  | 0.982422 |
|            XLNetLMHeadModel             |  8  | 0.982082 |
|          MobileBertForMaskedLM          | 128 | 0.981986 |
|       T5ForConditionalGeneration        |  4  | 0.980633 |
|                 T5Small                 |  4  | 0.980216 |
|     M2M100ForConditionalGeneration      | 16  | 0.977147 |
|     MobileBertForQuestionAnswering      | 128 | 0.974971 |
|          DebertaV2ForMaskedLM           |  2  | 0.974301 |
|          AllenaiLongformerBase          |  4  | 0.972313 |
|             XGLMForCausalLM             |  8  | 0.971803 |
|      DebertaV2ForQuestionAnswering      |  1  | 0.869381 |
+-----------------------------------------+-----+----------+

Absolute latency (ms)

+-----------------------------------------+-----+-------------+
|                  name                   | bs  |  inductor   |
+-----------------------------------------+-----+-------------+
|       AlbertForQuestionAnswering        |  4  | 2722.075121 |
|            AlbertForMaskedLM            |  4  | 2707.244302 |
|            XLNetLMHeadModel             |  8  | 1301.89097  |
|     PegasusForConditionalGeneration     | 32  | 1002.67746  |
|            TrOCRForCausalLM             | 32  | 980.994018  |
|     DistilBertForQuestionAnswering      | 256 | 894.969197  |
|    MegatronBertForQuestionAnswering     |  8  | 789.699687  |
|            MBartForCausalLM             |  4  | 681.006935  |
|          BlenderbotForCausalLM          |  4  | 680.084922  |
|      MBartForConditionalGeneration      |  2  | 677.725951  |
|          DistilBertForMaskedLM          | 128 | 677.027699  |
|          DebertaV2ForMaskedLM           |  2  | 621.039974  |
|           RobertaForCausalLM            | 16  | 608.681599  |
|     M2M100ForConditionalGeneration      | 16  | 605.413521  |
|      BartForConditionalGeneration       |  2  | 600.600739  |
|             OPTForCausalLM              |  2  | 598.625081  |
|            YituTechConvBert             | 16  | 572.521179  |
|                CamemBert                | 16  | 571.820902  |
|           LayoutLMForMaskedLM           | 16  | 567.417691  |
|             BertForMaskedLM             | 16  | 566.163853  |
|          AllenaiLongformerBase          |  4  | 536.865238  |
|             BartForCausalLM             |  4  | 528.590054  |
|       DebertaForQuestionAnswering       | 16  | 528.298256  |
|            PLBartForCausalLM            |  8  | 515.941053  |
|           PegasusForCausalLM            | 32  | 493.052958  |
| BlenderbotSmallForConditionalGeneration | 64  |  488.28551  |
|     PLBartForConditionalGeneration      |  4  | 476.042889  |
|         MegatronBertForCausalLM         |  4  | 454.919404  |
|        BertForQuestionAnswering         | 16  | 452.783765  |
|    LayoutLMForSequenceClassification    | 16  | 450.253332  |
|       RobertaForQuestionAnswering       | 16  | 442.499167  |
|               GoogleFnet                | 16  |  411.64054  |
|          MobileBertForMaskedLM          | 128 | 395.614915  |
|               DistillGPT2               | 16  | 389.955641  |
|             XGLMForCausalLM             |  8  | 383.450605  |
|           DebertaForMaskedLM            |  8  | 369.943917  |
|       ElectraForQuestionAnswering       | 64  | 333.246036  |
|       T5ForConditionalGeneration        |  4  | 331.196413  |
|                 T5Small                 |  4  | 330.705559  |
|         Speech2Text2ForCausalLM         | 256 | 280.128094  |
|       BlenderbotSmallForCausalLM        | 64  | 279.216816  |
|      GPT2ForSequenceClassification      |  4  | 271.738208  |
|           ElectraForCausalLM            | 32  | 256.116993  |
|     MobileBertForQuestionAnswering      | 128 | 244.002557  |
|       MT5ForConditionalGeneration       | 16  | 226.069362  |
|      DebertaV2ForQuestionAnswering      |  1  | 225.623078  |
+-----------------------------------------+-----+-------------+

timm_models suite with float32 precision


Performance speedup

+---------------------------------+------+----------+
|              name               |  bs  | inductor |
+---------------------------------+------+----------+
|           fbnetc_100            | 512  | 3.917808 |
|           mnasnet_100           | 512  | 3.882476 |
|            lcnet_050            | 256  | 3.881905 |
|         mobilenetv2_100         | 128  | 3.815165 |
|      mobilenetv3_large_100      | 512  | 3.675843 |
|          spnasnet_100           | 128  | 3.583263 |
|            fbnetv3_b            | 256  | 3.491787 |
|           regnety_002           | 1024 | 3.304499 |
|           rexnet_100            | 256  | 3.100085 |
|       tf_efficientnet_b0        | 128  | 2.951297 |
|            tinynet_a            | 128  | 2.865641 |
|        ese_vovnet19b_dw         | 256  | 2.619016 |
|          pnasnet5large          |  16  | 2.58977  |
|          botnet26t_256          | 128  | 2.583006 |
|            hrnet_w18            | 128  | 2.551546 |
|           res2next50            | 128  | 2.397158 |
|          ghostnet_100           | 512  | 2.364984 |
|       eca_botnext26ts_256       | 128  | 2.343112 |
|       gluon_inception_v3        | 256  | 2.284014 |
|        eca_halonext26ts         | 128  | 2.237016 |
|           resnest101e           |  64  | 2.234324 |
|          inception_v3           | 128  | 2.226208 |
|             dla102              | 128  | 2.207341 |
|        adv_inception_v3         | 128  | 2.193634 |
|        res2net50_14w_8s         | 128  | 2.153422 |
|        res2net101_26w_4s        | 128  | 2.136224 |
|          cspdarknet53           |  64  | 2.065645 |
|            repvgg_a2            | 128  | 2.055382 |
|            nfnet_l0             | 128  | 2.015753 |
|        convmixer_768_32         |  32  | 1.969262 |
|            gernet_l             | 128  | 1.913785 |
|           tf_mixnet_l           | 128  | 1.86735  |
|           dm_nfnet_f0           | 128  | 1.855359 |
|        sebotnet33ts_256         |  64  | 1.786991 |
|           selecsls42b           | 128  | 1.774086 |
|            mixnet_l             | 128  | 1.696346 |
|         visformer_small         | 128  | 1.674445 |
|         poolformer_m36          |  64  | 1.670556 |
|           volo_d1_224           |  64  | 1.638688 |
|     swsl_resnext101_32x16d      |  32  | 1.577134 |
|             dpn107              |  64  | 1.492774 |
|            levit_128            | 1024 | 1.462525 |
|           mobilevit_s           |  64  | 1.437015 |
|          gmlp_s16_224           | 128  | 1.277475 |
|          resmlp_12_224          | 128  | 1.196603 |
|      xcit_large_24_p8_224       |  16  | 1.196542 |
|           convit_base           |  64  | 1.172032 |
|          cait_m36_384           |  4   | 1.14973  |
|          gmixer_24_224          | 128  | 1.149605 |
|  swin_base_patch4_window7_224   |  64  | 1.115678 |
|        tnt_s_patch16_224        | 128  | 1.104758 |
|        twins_pcpvt_base         | 128  | 1.089816 |
|          convnext_base          |  64  | 1.062191 |
|          mixer_b16_224          | 128  | 1.057057 |
|      beit_base_patch16_224      |  64  | 1.052153 |
|            pit_b_224            |  64  | 1.031011 |
|      vit_base_patch16_224       |  64  | 1.030222 |
| deit_base_distilled_patch16_224 |  64  | 1.026721 |
|          jx_nest_base           |  32  | 1.026499 |
|         crossvit_9_240          | 256  | 1.008798 |
+---------------------------------+------+----------+

Accuracy

+---------------------------------+----+-------------+
|              name               | bs |  inductor   |
+---------------------------------+----+-------------+
|        adv_inception_v3         | 8  |    pass     |
|      beit_base_patch16_224      | 8  |    pass     |
|          botnet26t_256          | 8  |    pass     |
|          cait_m36_384           | 8  |    pass     |
|           convit_base           | 8  |    pass     |
|        convmixer_768_32         | 8  |    pass     |
|          convnext_base          | 8  |    pass     |
|         crossvit_9_240          | 8  |    pass     |
|          cspdarknet53           | 8  |    pass     |
| deit_base_distilled_patch16_224 | 8  |    pass     |
|             dla102              | 8  |    pass     |
|           dm_nfnet_f0           | 8  |    pass     |
|             dpn107              | 8  |    pass     |
|       eca_botnext26ts_256       | 8  |    pass     |
|        eca_halonext26ts         | 8  |    pass     |
|        ese_vovnet19b_dw         | 8  |    pass     |
|           fbnetc_100            | 8  |    pass     |
|            fbnetv3_b            | 8  |    pass     |
|            gernet_l             | 8  |    pass     |
|          ghostnet_100           | 8  |    pass     |
|       gluon_inception_v3        | 8  |    pass     |
|          gmixer_24_224          | 8  |    pass     |
|          gmlp_s16_224           | 8  |    pass     |
|            hrnet_w18            | 8  |    pass     |
|          inception_v3           | 8  |    pass     |
|          jx_nest_base           | 8  |    pass     |
|            lcnet_050            | 8  |    pass     |
|            levit_128            | 8  |    pass     |
|      xcit_large_24_p8_224       | 8  |    pass     |
|          mixer_b16_224          | 8  |    pass     |
|            mixnet_l             | 8  |    pass     |
|           mnasnet_100           | 8  |    pass     |
|         mobilenetv2_100         | 8  |    pass     |
|      mobilenetv3_large_100      | 8  |    pass     |
|           mobilevit_s           | 8  |    pass     |
|            nfnet_l0             | 8  |    pass     |
|            pit_b_224            | 8  |    pass     |
|          pnasnet5large          | 8  |    pass     |
|         poolformer_m36          | 8  |    pass     |
|           regnety_002           | 8  |    pass     |
|            repvgg_a2            | 8  |    pass     |
|        res2net101_26w_4s        | 8  |    pass     |
|        res2net50_14w_8s         | 8  |    pass     |
|           res2next50            | 8  |    pass     |
|          resmlp_12_224          | 8  |    pass     |
|           resnest101e           | 8  |    pass     |
|           rexnet_100            | 8  |    pass     |
|        sebotnet33ts_256         | 8  |    pass     |
|           selecsls42b           | 8  |    pass     |
|          spnasnet_100           | 8  |    pass     |
|  swin_base_patch4_window7_224   | 8  |    pass     |
|     swsl_resnext101_32x16d      | 8  |    pass     |
|       tf_efficientnet_b0        | 8  |    pass     |
|           tf_mixnet_l           | 8  |    pass     |
|            tinynet_a            | 8  |    pass     |
|        tnt_s_patch16_224        | 8  |    pass     |
|        twins_pcpvt_base         | 8  |    pass     |
|         visformer_small         | 8  |    pass     |
|      vit_base_patch16_224       | 8  |    pass     |
|           volo_d1_224           | 8  |    pass     |
|         coat_lite_mini          | 8  | fail_to_run |
+---------------------------------+----+-------------+

Compilation latency (sec)

+---------------------------------+------+------------+
|              name               |  bs  |  inductor  |
+---------------------------------+------+------------+
|          pnasnet5large          |  16  | 494.080352 |
|            hrnet_w18            | 128  | 274.513215 |
|          cait_m36_384           |  4   | 105.548083 |
|      xcit_large_24_p8_224       |  16  | 99.840807  |
|  swin_base_patch4_window7_224   |  64  | 98.437472  |
|           tf_mixnet_l           | 128  | 89.071411  |
|        res2net101_26w_4s        | 128  | 88.995535  |
|            mixnet_l             | 128  | 80.091538  |
|        res2net50_14w_8s         | 128  | 77.102238  |
|           mobilevit_s           |  64  | 76.507807  |
|           resnest101e           |  64  | 75.863043  |
|         poolformer_m36          |  64  | 74.819306  |
|        tnt_s_patch16_224        | 128  |  69.63615  |
|          jx_nest_base           |  32  | 68.500192  |
|        twins_pcpvt_base         | 128  | 68.010527  |
|             dpn107              |  64  | 65.961804  |
|           volo_d1_224           |  64  |  55.89985  |
|            fbnetv3_b            | 256  | 54.806307  |
|        eca_halonext26ts         | 128  | 50.925377  |
|          gmixer_24_224          | 128  | 48.932355  |
|            levit_128            | 1024 | 48.716086  |
|          gmlp_s16_224           | 128  | 47.583951  |
|         crossvit_9_240          | 256  |  47.35707  |
|          convnext_base          |  64  | 46.643861  |
|           convit_base           |  64  | 44.722819  |
|        sebotnet33ts_256         |  64  | 43.296658  |
|        adv_inception_v3         | 128  | 40.729032  |
|          inception_v3           | 128  | 40.642082  |
|             dla102              | 128  | 39.570416  |
|          ghostnet_100           | 512  | 39.543192  |
|       gluon_inception_v3        | 256  | 39.399585  |
|           res2next50            | 128  | 39.088981  |
|            tinynet_a            | 128  | 37.810588  |
|           rexnet_100            | 256  | 37.206591  |
|           dm_nfnet_f0           | 128  | 37.145522  |
|       tf_efficientnet_b0        | 128  | 36.471054  |
|     swsl_resnext101_32x16d      |  32  | 36.220112  |
|         visformer_small         | 128  |  36.19511  |
|       eca_botnext26ts_256       | 128  | 35.804732  |
|        convmixer_768_32         |  32  | 35.291467  |
|            nfnet_l0             | 128  | 34.335647  |
|          botnet26t_256          | 128  | 32.627765  |
|            pit_b_224            |  64  | 31.586444  |
|          mixer_b16_224          | 128  | 31.359666  |
|      beit_base_patch16_224      |  64  | 30.679212  |
| deit_base_distilled_patch16_224 |  64  | 29.333027  |
|      vit_base_patch16_224       |  64  | 29.184319  |
|          cspdarknet53           |  64  | 28.955838  |
|           regnety_002           | 1024 | 27.616613  |
|      mobilenetv3_large_100      | 512  |  27.54755  |
|          resmlp_12_224          | 128  | 24.622862  |
|          spnasnet_100           | 128  | 24.515334  |
|         mobilenetv2_100         | 128  | 24.479887  |
|           fbnetc_100            | 512  | 23.431253  |
|            gernet_l             | 128  |  23.38568  |
|            repvgg_a2            | 128  | 23.207018  |
|        ese_vovnet19b_dw         | 256  | 22.698074  |
|            lcnet_050            | 256  | 22.066851  |
|           mnasnet_100           | 512  | 21.039453  |
|           selecsls42b           | 128  | 20.801042  |
+---------------------------------+------+------------+

Peak Memory Compression Ratio

+---------------------------------+------+----------+
|              name               |  bs  | inductor |
+---------------------------------+------+----------+
|        ese_vovnet19b_dw         | 256  | 0.997666 |
|           fbnetc_100            | 512  | 0.996907 |
|      mobilenetv3_large_100      | 512  | 0.996294 |
|           dm_nfnet_f0           | 128  | 0.995968 |
|          convnext_base          |  64  |  0.9959  |
|           mnasnet_100           | 512  | 0.995874 |
|            fbnetv3_b            | 256  | 0.995804 |
|           regnety_002           | 1024 | 0.995786 |
|          ghostnet_100           | 512  | 0.995377 |
|            levit_128            | 1024 | 0.994943 |
|       eca_botnext26ts_256       | 128  | 0.994514 |
|        res2net101_26w_4s        | 128  | 0.99428  |
|       gluon_inception_v3        | 256  | 0.994067 |
|             dpn107              |  64  | 0.993971 |
|           rexnet_100            | 256  | 0.993967 |
|        eca_halonext26ts         | 128  | 0.99381  |
|             dla102              | 128  | 0.99351  |
|           res2next50            | 128  | 0.993487 |
|           tf_mixnet_l           | 128  | 0.993355 |
|          mixer_b16_224          | 128  | 0.99334  |
|      xcit_large_24_p8_224       |  16  | 0.993137 |
|        convmixer_768_32         |  32  | 0.993079 |
|           convit_base           |  64  | 0.993001 |
|          gmlp_s16_224           | 128  | 0.992864 |
|        res2net50_14w_8s         | 128  | 0.992755 |
|          gmixer_24_224          | 128  | 0.992741 |
|       tf_efficientnet_b0        | 128  | 0.992613 |
|          botnet26t_256          | 128  | 0.992575 |
|          cspdarknet53           |  64  | 0.992364 |
|         visformer_small         | 128  | 0.991915 |
|            mixnet_l             | 128  | 0.991787 |
|            nfnet_l0             | 128  | 0.991672 |
|            gernet_l             | 128  | 0.991667 |
|          inception_v3           | 128  | 0.991433 |
|        adv_inception_v3         | 128  | 0.991315 |
|           resnest101e           |  64  | 0.991138 |
|        sebotnet33ts_256         |  64  | 0.990846 |
|           mobilevit_s           |  64  | 0.990589 |
|      beit_base_patch16_224      |  64  | 0.990443 |
|        twins_pcpvt_base         | 128  | 0.990067 |
|           selecsls42b           | 128  | 0.989765 |
|          pnasnet5large          |  16  | 0.989526 |
|         mobilenetv2_100         | 128  | 0.989258 |
|            pit_b_224            |  64  | 0.98923  |
|        tnt_s_patch16_224        | 128  | 0.988899 |
|          resmlp_12_224          | 128  | 0.988685 |
|          cait_m36_384           |  4   | 0.988502 |
|         poolformer_m36          |  64  | 0.988452 |
|            tinynet_a            | 128  | 0.988316 |
|            hrnet_w18            | 128  | 0.988047 |
|          spnasnet_100           | 128  | 0.987827 |
|  swin_base_patch4_window7_224   |  64  | 0.987659 |
|      vit_base_patch16_224       |  64  | 0.98756  |
|     swsl_resnext101_32x16d      |  32  | 0.987103 |
| deit_base_distilled_patch16_224 |  64  | 0.986218 |
|            lcnet_050            | 256  |  0.9856  |
|           volo_d1_224           |  64  | 0.983988 |
|            repvgg_a2            | 128  | 0.983515 |
|          jx_nest_base           |  32  | 0.983429 |
|         crossvit_9_240          | 256  | 0.973081 |
+---------------------------------+------+----------+

Absolute latency (ms)

+---------------------------------+------+-------------+
|              name               |  bs  |  inductor   |
+---------------------------------+------+-------------+
|      xcit_large_24_p8_224       |  16  | 1471.699763 |
|          convnext_base          |  64  | 1165.432161 |
|          cait_m36_384           |  4   | 1119.237846 |
|          mixer_b16_224          | 128  | 1055.747936 |
|           dm_nfnet_f0           | 128  | 951.122557  |
|           convit_base           |  64  | 929.729879  |
|             dpn107              |  64  | 924.942186  |
|  swin_base_patch4_window7_224   |  64  | 841.179832  |
|        twins_pcpvt_base         | 128  | 827.623749  |
|        tnt_s_patch16_224        | 128  | 821.517977  |
|       gluon_inception_v3        | 256  | 813.632957  |
| deit_base_distilled_patch16_224 |  64  | 696.225448  |
|      vit_base_patch16_224       |  64  | 693.573117  |
|      beit_base_patch16_224      |  64  | 686.183102  |
|     swsl_resnext101_32x16d      |  32  | 640.538382  |
|        res2net101_26w_4s        | 128  | 639.255246  |
|            nfnet_l0             | 128  | 603.132013  |
|            levit_128            | 1024 | 565.764108  |
|          gmixer_24_224          | 128  | 562.680825  |
|            pit_b_224            |  64  | 559.328682  |
|          gmlp_s16_224           | 128  | 556.507158  |
|        ese_vovnet19b_dw         | 256  | 549.074516  |
|          jx_nest_base           |  32  | 531.232903  |
|             dla102              | 128  | 526.439403  |
|         crossvit_9_240          | 256  | 509.161241  |
|           resnest101e           |  64  | 481.257783  |
|         poolformer_m36          |  64  | 470.164657  |
|        convmixer_768_32         |  32  | 438.907538  |
|           volo_d1_224           |  64  | 434.214388  |
|            hrnet_w18            | 128  | 433.935496  |
|          inception_v3           | 128  | 408.034417  |
|        adv_inception_v3         | 128  | 406.805311  |
|        res2net50_14w_8s         | 128  | 395.725444  |
|         visformer_small         | 128  | 390.149267  |
|            mixnet_l             | 128  | 361.546091  |
|           res2next50            | 128  | 359.406037  |
|          ghostnet_100           | 512  | 356.354944  |
|           tf_mixnet_l           | 128  | 349.810535  |
|          pnasnet5large          |  16  | 342.750203  |
|            repvgg_a2            | 128  | 335.231846  |
|        eca_halonext26ts         | 128  | 311.695694  |
|           fbnetc_100            | 512  | 306.487383  |
|       eca_botnext26ts_256       | 128  |  291.25097  |
|            gernet_l             | 128  | 291.192111  |
|           regnety_002           | 1024 |  279.98655  |
|        sebotnet33ts_256         |  64  | 279.156598  |
|          botnet26t_256          | 128  | 275.755115  |
|          resmlp_12_224          | 128  |  264.88106  |
|           mobilevit_s           |  64  | 263.685633  |
|          cspdarknet53           |  64  | 259.593723  |
|           mnasnet_100           | 512  | 255.860581  |
|            fbnetv3_b            | 256  | 240.407924  |
|           selecsls42b           | 128  | 231.410897  |
|      mobilenetv3_large_100      | 512  | 226.486158  |
|           rexnet_100            | 256  | 224.648535  |
|       tf_efficientnet_b0        | 128  | 118.222962  |
|            tinynet_a            | 128  |  82.826322  |
|         mobilenetv2_100         | 128  |  71.544812  |
|          spnasnet_100           | 128  |  65.07594   |
|            lcnet_050            | 256  |  26.835034  |
+---------------------------------+------+-------------+

@zxd1997066
Contributor

[cppwrapper_static_shape] Performance Dashboard for float32 precision -- Single-Socket Multi-threads (2024-05-05 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. For training, each experiment runs one iteration of the forward and backward passes; for inference, it runs the forward pass only. For accuracy, we check the numerical correctness of the forward pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint reduction ratio.
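As a rough illustration of the normalization described above, the sketch below derives a speedup and a peak memory compression ratio from a pair of measurements. The function names and sample numbers are hypothetical. The ratio direction (native PyTorch over inductor, so that higher is better) is an assumption based on how the tables are labeled; this is not the benchmark harness code.

```python
def speedup(eager_latency_ms: float, inductor_latency_ms: float) -> float:
    """Native PyTorch latency divided by compiled latency (above 1.0 means faster)."""
    if inductor_latency_ms <= 0:
        raise ValueError("compiled latency must be positive")
    return eager_latency_ms / inductor_latency_ms


def memory_compression_ratio(eager_peak_mb: float, inductor_peak_mb: float) -> float:
    """Native PyTorch peak memory divided by compiled peak memory (assumed direction)."""
    if inductor_peak_mb <= 0:
        raise ValueError("compiled peak memory must be positive")
    return eager_peak_mb / inductor_peak_mb


# Hypothetical measurements, not taken from the dashboard.
print(speedup(200.0, 100.0))
print(memory_compression_ratio(90.0, 100.0))
```

Under this reading, a compression ratio slightly below 1.0 means the compiled model used slightly more peak memory than native PyTorch.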

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. Experimental setup does not include an optimizer.

SW information:

| SW | Branch | Commit |
| --- | --- | --- |
| Pytorch | main | 6d30803d64953955df63da56833bf4eb52249aae |
| Torchbench | main | d6015d42 |
| torchaudio | main | 2.2.0a0+ea437b3 |
| torchtext | main | 0.16.0a0+b0ebddc |
| torchvision | main | 0.19.0a0+06ad737 |
| torchdata | main | 0.7.1a0+0790338 |
| dynamo_benchmarks | main | nightly |

HW information

| Item | Value |
| --- | --- |
| Manufacturer | Amazon EC2 |
| Product Name | c6i.16xlarge |
| CPU Model | Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz |
| Installed Memory | 128GB (1x128GB DDR4 3200 MT/s [Unknown]) |
| OS | Ubuntu 22.04.2 LTS |
| Kernel | 5.19.0-1022-aws |
| Microcode | 0xd000389 |
| GCC | gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0 |
| GLIBC | ldd (Ubuntu GLIBC 2.35-0ubuntu3.1) 2.35 |
| Binutils | GNU ld (GNU Binutils for Ubuntu) 2.38 |
| Python | Python 3.10.6 |
| OpenSSL | OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022) |

Test command

export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
CORES=$(lscpu | grep Core | awk '{print $4}')
export OMP_NUM_THREADS=$CORES

python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--node_id 0" --devices=cpu --dtypes=float32 --inference --compilers=inductor --extra-args="--timeout 9000" 
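
The `CORES=...` line in the test command takes the fourth whitespace-separated field of the `lscpu` line that contains `Core`, which is normally the "Core(s) per socket" count. A minimal Python sketch of that parsing step follows. The helper name and the `lscpu` excerpt are hypothetical, and real output varies by machine.

```python
def cores_from_lscpu(lscpu_output: str) -> int:
    """Mimic `lscpu | grep Core | awk '{print $4}'` for the first matching line."""
    for line in lscpu_output.splitlines():
        if "Core" in line:  # case-sensitive, like grep without -i
            fields = line.split()
            return int(fields[3])  # awk's $4 is the fourth field
    raise ValueError("no line containing 'Core' found")


# Hypothetical lscpu excerpt, for illustration only.
sample = """\
Thread(s) per core:    2
Core(s) per socket:    32
Socket(s):             1
"""
print(cores_from_lscpu(sample))
```

Because the value is a per-socket count, `OMP_NUM_THREADS` ends up matching the physical cores of one socket, which fits the single-socket run pinned with `--node_id 0`. Unlike the shell pipeline, this sketch stops at the first matching line.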

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 96%, 76/79 | 100%, 46/46 | 100%, 60/60 |
+----------+------------+-------------+-------------+
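
Each pass-rate cell pairs a percentage with the raw counts. A small sketch of how such a cell could be formatted is shown below. The helper name is hypothetical, and rounding to the nearest whole percent is an assumption. Which accuracy statuses count as a pass is not stated here, so the function takes the counts as given.

```python
def pass_rate(passed: int, total: int) -> str:
    """Format a pass-rate cell as '<percent>%, <passed>/<total>'."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    return f"{100 * passed / total:.0f}%, {passed}/{total}"


print(pass_rate(76, 79))
print(pass_rate(46, 46))
```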

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.51x    |    1.38x    |    1.95x    |
+----------+------------+-------------+-------------+
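
The geometric mean aggregates per-model speedups so that a single outlier does not dominate the suite-level number. A minimal sketch follows. The function name is hypothetical, and skipping non-positive entries (such as the 0.0 placeholders reported for models that failed to load) is an assumption about how excluded models are handled.

```python
import math
from typing import Iterable


def geometric_mean_speedup(speedups: Iterable[float]) -> float:
    """Geometric mean of the positive speedups; non-positive entries are skipped."""
    valid = [s for s in speedups if s > 0]
    if not valid:
        raise ValueError("no positive speedups to aggregate")
    # exp(mean(log(x))) is numerically safer than multiplying many ratios.
    return math.exp(sum(math.log(s) for s in valid) / len(valid))


# A three-entry subset for illustration; the suite-level figure uses every model.
subset = [3.275402, 0.800707, 0.0]
print(f"{geometric_mean_speedup(subset):.2f}x")
```

An arithmetic mean would overweight large ratios; the geometric mean treats a 2x speedup and a 0.5x slowdown as cancelling out.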

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   52.62    |    39.22    |    49.63    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.90x    |    0.97x    |    0.99x    |
+----------+------------+-------------+-------------+

torchbench suite with float32 precision


Performance speedup

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_equation_of_state     | 1048576 | 10.960235 |
|          squeezenet1_1          |   16    | 3.275402  |
|       mobilenet_v3_large        |   32    | 3.228168  |
|          mobilenet_v2           |   16    | 3.156361  |
|           mnasnet1_0            |   32    | 3.041477  |
|        timm_efficientnet        |   64    | 3.017699  |
|       shufflenet_v2_x1_0        |   64    | 2.743781  |
|          timm_resnest           |   32    | 2.353967  |
|            resnet50             |   32    |  2.20623  |
|        soft_actor_critic        |   256   | 2.167187  |
|        phlippe_densenet         |   128   | 2.155112  |
|         resnext50_32x4d         |    8    | 2.063653  |
|            resnet152            |   32    | 2.025009  |
|         phlippe_resnet          |   128   | 2.010139  |
|           densenet121           |   64    | 1.988884  |
|            resnet18             |    8    | 1.950763  |
|       doctr_det_predictor       |    1    |  1.94139  |
|           timm_regnet           |   32    |  1.90724  |
|           timm_nfnet            |   128   | 1.798801  |
|             hf_GPT2             |    1    | 1.768934  |
|          maml_omniglot          |    5    | 1.756669  |
|           timm_vovnet           |   32    | 1.697058  |
|          BERT_pytorch           |    2    | 1.678837  |
|             alexnet             |   128   | 1.645774  |
|      doctr_reco_predictor       |    1    | 1.606675  |
|             yolov3              |    8    | 1.592261  |
|            hf_Albert            |    1    | 1.565823  |
|          hf_Bert_large          |    1    | 1.539326  |
|            moondream            |    1    | 1.531136  |
|          fastNLP_Bert           |    1    | 1.523175  |
|        basic_gnn_edgecnn        |    1    | 1.521642  |
|              llama              |   32    | 1.520146  |
|          hf_GPT2_large          |    1    | 1.500348  |
|     functorch_maml_omniglot     |    1    |  1.4992   |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 1.478555  |
|             hf_Bert             |    1    |  1.47447  |
|          hf_Longformer          |    1    | 1.457287  |
|              vgg16              |    4    | 1.445794  |
|         LearningToPaint         |   96    | 1.419399  |
|          lennard_jones          |  1000   | 1.363481  |
|          hf_DistilBert          |    1    | 1.358849  |
|      torch_multimodal_clip      |   32    | 1.329493  |
|              dcgan              |   256   | 1.321469  |
|          basic_gnn_gcn          |    1    | 1.297233  |
|             hf_Bart             |    1    | 1.285362  |
|     timm_vision_transformer     |   32    | 1.273756  |
|           hf_BigBird            |    1    |  1.25806  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1.233686  |
|        hf_distil_whisper        |    1    | 1.228722  |
|         basic_gnn_sage          |    1    | 1.218849  |
|         pytorch_stargan         |   16    | 1.212728  |
|           hf_T5_large           |    1    | 1.207027  |
|     nvidia_deeprecommender      |   256   | 1.202599  |
|          pytorch_unet           |    1    | 1.192847  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1.192478  |
|    detectron2_fcos_r_50_fpn     |    1    | 1.160454  |
|              hf_T5              |    1    | 1.157229  |
|              dlrm               |  2048   | 1.140871  |
|          basic_gnn_gin          |    1    | 1.137548  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 1.123389  |
|               drq               |    1    | 1.083501  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1.078255  |
|  timm_vision_transformer_large  |   32    | 1.057157  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1.053351  |
|           hf_Reformer           |    1    | 1.052281  |
|       speech_transformer        |    1    | 1.041612  |
|     resnet50_quantized_qat      |   32    | 1.012338  |
|             demucs              |    1    | 1.011658  |
|           tts_angular           |   64    | 1.000437  |
|   mobilenet_v2_quantized_qat    |   96    | 0.997816  |
|           hf_T5_base            |    1    | 0.800707  |
|       Background_Matting        |    1    | 0.799148  |
|         opacus_cifar10          |   64    | 0.749622  |
|              maml               |    1    | 0.731469  |
|      functorch_dp_cifar10       |   64    | 0.681081  |
|     pyhpc_isoneutral_mixing     | 1048576 | 0.640307  |
|              moco               |    0    |    0.0    |
|        timm_efficientdet        |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+

Accuracy

+---------------------------------+---------+--------------------+
|              name               |   bs    |      inductor      |
+---------------------------------+---------+--------------------+
|       Background_Matting        |    1    |  pass_due_to_skip  |
|  timm_vision_transformer_large  |    4    |  pass_due_to_skip  |
|          hf_GPT2_large          |    4    |  pass_due_to_skip  |
|           hf_T5_large           |    4    |  pass_due_to_skip  |
|              maml               |    1    |  pass_due_to_skip  |
|             yolov3              |    4    |        pass        |
|             alexnet             |    4    |        pass        |
|        basic_gnn_edgecnn        |    1    |        pass        |
|          basic_gnn_gcn          |    1    |        pass        |
|          basic_gnn_gin          |    1    |        pass        |
|         basic_gnn_sage          |    1    |        pass        |
|       doctr_det_predictor       |    4    |        pass        |
|              dcgan              |    4    |        pass        |
|           densenet121           |    4    |        pass        |
| detectron2_fasterrcnn_r_101_c4  |    4    |        pass        |
| detectron2_fasterrcnn_r_101_dc5 |    4    |        pass        |
|  detectron2_fasterrcnn_r_50_c4  |    4    |        pass        |
| detectron2_fasterrcnn_r_50_dc5  |    4    |        pass        |
|           hf_T5_base            |    4    |        pass        |
|             demucs              |    1    |        pass        |
|        hf_distil_whisper        |    4    |        pass        |
|         LearningToPaint         |    4    |        pass        |
|              dlrm               |    4    |        pass        |
|    detectron2_fcos_r_50_fpn     |    4    |        pass        |
|               drq               |    1    |        pass        |
|          fastNLP_Bert           |    4    |        pass        |
|      functorch_dp_cifar10       |    4    |        pass        |
|     functorch_maml_omniglot     |    1    |        pass        |
|            hf_Albert            |    4    |        pass        |
|             hf_Bart             |    4    |        pass        |
|          hf_Bert_large          |    4    |        pass        |
|           hf_BigBird            |    4    |        pass        |
|              llama              |    4    |        pass        |
|          hf_DistilBert          |    4    |        pass        |
|             hf_GPT2             |    2    |        pass        |
|          hf_Longformer          |    4    |        pass        |
|           hf_Reformer           |    4    |        pass        |
|              hf_T5              |    4    |        pass        |
|             hf_Bert             |    4    |        pass        |
|      doctr_reco_predictor       |    4    |        pass        |
|          lennard_jones          |    4    |        pass        |
|            resnet50             |    4    |        pass        |
|            resnet152            |    4    |        pass        |
|          maml_omniglot          |    5    |        pass        |
|     resnet50_quantized_qat      |    4    |        pass        |
|          mobilenet_v2           |    4    |        pass        |
|   mobilenet_v2_quantized_qat    |    4    |        pass        |
|         pytorch_stargan         |   16    |        pass        |
|            moondream            |    4    |        pass        |
|     nvidia_deeprecommender      |    4    |        pass        |
|          BERT_pytorch           |    4    |        pass        |
|        phlippe_densenet         |    4    |        pass        |
|         phlippe_resnet          |    4    |        pass        |
|     pyhpc_equation_of_state     |    4    |        pass        |
|     pyhpc_isoneutral_mixing     |    4    |        pass        |
| pyhpc_turbulent_kinetic_energy  | 1048576 |        pass        |
|  pytorch_CycleGAN_and_pix2pix   |    1    |        pass        |
|         opacus_cifar10          |    4    |        pass        |
|       mobilenet_v3_large        |    4    |        pass        |
|           mnasnet1_0            |    4    |        pass        |
|          pytorch_unet           |    2    |        pass        |
|        timm_efficientnet        |    4    |        pass        |
|         resnext50_32x4d         |    4    |        pass        |
|       shufflenet_v2_x1_0        |    4    |        pass        |
|        soft_actor_critic        |   256   |        pass        |
|       speech_transformer        |    1    |        pass        |
|           timm_nfnet            |    4    |        pass        |
|            resnet18             |    4    |        pass        |
|          squeezenet1_1          |    4    |        pass        |
|           timm_regnet           |    4    |        pass        |
|          timm_resnest           |    4    |        pass        |
|     timm_vision_transformer     |    4    |        pass        |
|           timm_vovnet           |    4    |        pass        |
|      torch_multimodal_clip      |    4    |        pass        |
|           tts_angular           |    4    |        pass        |
|              vgg16              |    4    |        pass        |
|        timm_efficientdet        |    0    | model_fail_to_load |
|              moco               |    0    | model_fail_to_load |
|         DALLE2_pytorch          |    0    | model_fail_to_load |
| detectron2_fasterrcnn_r_101_fpn |    4    |    fail_to_run     |
| detectron2_fasterrcnn_r_50_fpn  |    4    |    fail_to_run     |
|           Super_SloMo           |    4    |    fail_to_run     |
|         vision_maskrcnn         |    1    |    fail_to_run     |
+---------------------------------+---------+--------------------+

Compilation latency (sec)

+---------------------------------+---------+------------+
|              name               |   bs    |  inductor  |
+---------------------------------+---------+------------+
|           hf_BigBird            |    1    | 470.965612 |
|    detectron2_fcos_r_50_fpn     |    1    | 406.761158 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 230.010965 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 222.070726 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 215.018467 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 205.872252 |
|              maml               |    1    | 141.897989 |
|           hf_T5_large           |    1    | 106.916129 |
|       speech_transformer        |    1    | 89.691996  |
|          hf_Longformer          |    1    | 85.079201  |
|           hf_Reformer           |    1    | 79.823741  |
|          basic_gnn_gcn          |    1    | 61.047732  |
|  timm_vision_transformer_large  |   32    | 55.073443  |
|          fastNLP_Bert           |    1    | 54.882387  |
|          hf_GPT2_large          |    1    | 53.290378  |
|            resnet152            |   32    | 51.774663  |
|           densenet121           |   64    | 51.305982  |
|           hf_T5_base            |    1    | 50.607895  |
|            moondream            |    1    | 49.391435  |
|     pyhpc_isoneutral_mixing     | 1048576 | 48.539529  |
|       doctr_det_predictor       |    1    | 46.765727  |
|          hf_Bert_large          |    1    | 44.383968  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 43.548639  |
|      torch_multimodal_clip      |   32    | 43.280259  |
|        hf_distil_whisper        |    1    | 43.210571  |
|             demucs              |    1    | 34.170553  |
|           timm_regnet           |   32    | 33.311641  |
|           timm_nfnet            |   128   |  32.61097  |
|             hf_Bart             |    1    | 31.269865  |
|              hf_T5              |    1    |  30.77906  |
|             yolov3              |    8    | 30.377296  |
|          BERT_pytorch           |    2    | 29.780487  |
|        timm_efficientnet        |   64    | 28.758438  |
|             hf_Bert             |    1    | 26.952879  |
|       shufflenet_v2_x1_0        |   64    | 26.267449  |
|        phlippe_densenet         |   128   | 25.879564  |
|            hf_Albert            |    1    | 25.488468  |
|      doctr_reco_predictor       |    1    | 25.366208  |
|             hf_GPT2             |    1    | 24.886122  |
|       Background_Matting        |    1    | 24.321929  |
|     timm_vision_transformer     |   32    | 24.025681  |
|       mobilenet_v3_large        |   32    | 23.586757  |
|              llama              |   32    |  22.73679  |
|         opacus_cifar10          |   64    | 22.231262  |
|          timm_resnest           |   32    | 22.120354  |
|           timm_vovnet           |   32    | 21.841782  |
|         pytorch_stargan         |   16    | 21.406817  |
|         resnext50_32x4d         |    8    | 21.384019  |
|            resnet50             |   32    | 21.310449  |
|      functorch_dp_cifar10       |   64    | 20.891849  |
|          hf_DistilBert          |    1    | 20.851586  |
|          mobilenet_v2           |   16    | 20.609003  |
|           mnasnet1_0            |   32    | 20.307878  |
|          pytorch_unet           |    1    | 19.776141  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 19.208009  |
|          squeezenet1_1          |   16    | 18.114851  |
|     pyhpc_equation_of_state     | 1048576 | 17.236783  |
|            resnet18             |    8    | 17.198931  |
|         LearningToPaint         |   96    | 16.953052  |
|              vgg16              |    4    | 16.661116  |
|             alexnet             |   128   |  16.55595  |
|         phlippe_resnet          |   128   | 16.555487  |
|               drq               |    1    |  15.8002   |
|     functorch_maml_omniglot     |    1    | 15.464135  |
|          maml_omniglot          |    5    | 15.042973  |
|              dlrm               |  2048   | 15.025273  |
|     nvidia_deeprecommender      |   256   | 14.664349  |
|        basic_gnn_edgecnn        |    1    |  14.56482  |
|              dcgan              |   256   | 14.513199  |
|          basic_gnn_gin          |    1    | 14.474408  |
|         basic_gnn_sage          |    1    | 14.444041  |
|          lennard_jones          |  1000   | 14.028132  |
|        soft_actor_critic        |   256   | 14.027505  |
|           tts_angular           |   64    | 13.575812  |
|   mobilenet_v2_quantized_qat    |   96    |  0.109849  |
|     resnet50_quantized_qat      |   32    |  0.065431  |
|         DALLE2_pytorch          |    0    |    0.0     |
|              moco               |    0    |    0.0     |
|        timm_efficientdet        |    0    |    0.0     |
+---------------------------------+---------+------------+

Peak Memory Compression Ratio

+---------------------------------+---------+----------+
|              name               |   bs    | inductor |
+---------------------------------+---------+----------+
|           timm_nfnet            |   128   | 0.992798 |
|              dlrm               |  2048   | 0.988058 |
|           hf_T5_base            |    1    | 0.987741 |
|        timm_efficientnet        |   64    | 0.986083 |
|       Background_Matting        |    1    | 0.981871 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 0.981223 |
|           densenet121           |   64    | 0.97991  |
|             demucs              |    1    | 0.979104 |
|          pytorch_unet           |    1    | 0.978303 |
|             yolov3              |    8    | 0.974681 |
|  timm_vision_transformer_large  |   32    | 0.974216 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 0.973584 |
|          hf_GPT2_large          |    1    | 0.973298 |
|         LearningToPaint         |   96    | 0.972041 |
|            resnet50             |   32    | 0.971159 |
|           timm_vovnet           |   32    | 0.971082 |
|        basic_gnn_edgecnn        |    1    | 0.970259 |
|          timm_resnest           |   32    | 0.969518 |
|            resnet152            |   32    | 0.968516 |
|       doctr_det_predictor       |    1    | 0.964607 |
|      torch_multimodal_clip      |   32    | 0.963186 |
|    detectron2_fcos_r_50_fpn     |    1    | 0.963184 |
|   mobilenet_v2_quantized_qat    |   96    | 0.962721 |
|     resnet50_quantized_qat      |   32    | 0.962611 |
|           timm_regnet           |   32    | 0.962137 |
|           mnasnet1_0            |   32    | 0.960772 |
|     timm_vision_transformer     |   32    |  0.9583  |
|          mobilenet_v2           |   16    | 0.956978 |
|       shufflenet_v2_x1_0        |   64    | 0.953207 |
|       mobilenet_v3_large        |   32    | 0.952756 |
|           hf_BigBird            |    1    | 0.950286 |
|         pytorch_stargan         |   16    | 0.947028 |
|         resnext50_32x4d         |    8    | 0.946825 |
|        phlippe_densenet         |   128   | 0.943469 |
|          basic_gnn_gcn          |    1    | 0.936573 |
|      doctr_reco_predictor       |    1    | 0.933553 |
|           tts_angular           |   64    | 0.929774 |
|          squeezenet1_1          |   16    | 0.917476 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 0.914006 |
|              llama              |   32    | 0.913905 |
|              dcgan              |   256   | 0.912831 |
|     pyhpc_equation_of_state     | 1048576 | 0.908683 |
|            resnet18             |    8    | 0.900264 |
|         phlippe_resnet          |   128   | 0.899691 |
|             alexnet             |   128   | 0.895171 |
|         opacus_cifar10          |   64    | 0.891811 |
|        hf_distil_whisper        |    1    | 0.891527 |
|        soft_actor_critic        |   256   | 0.885005 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 0.88383  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 0.881885 |
|          lennard_jones          |  1000   | 0.862613 |
|     functorch_maml_omniglot     |    1    | 0.861259 |
|          maml_omniglot          |    5    | 0.858811 |
|         basic_gnn_sage          |    1    | 0.858613 |
|          basic_gnn_gin          |    1    | 0.858075 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.846462 |
|          fastNLP_Bert           |    1    | 0.835997 |
|       speech_transformer        |    1    | 0.81768  |
|          hf_Bert_large          |    1    | 0.805327 |
|      functorch_dp_cifar10       |   64    | 0.803758 |
|           hf_T5_large           |    1    | 0.803502 |
|          hf_Longformer          |    1    | 0.800534 |
|          BERT_pytorch           |    2    | 0.800439 |
|            moondream            |    1    | 0.797779 |
|              maml               |    1    | 0.795244 |
|            hf_Albert            |    1    | 0.791634 |
|             hf_Bert             |    1    | 0.790678 |
|     nvidia_deeprecommender      |   256   | 0.774593 |
|              vgg16              |    4    | 0.772337 |
|              hf_T5              |    1    | 0.765321 |
|               drq               |    1    | 0.764769 |
|          hf_DistilBert          |    1    | 0.761757 |
|             hf_GPT2             |    1    | 0.759435 |
|           hf_Reformer           |    1    | 0.745723 |
|             hf_Bart             |    1    | 0.734382 |
|     pyhpc_isoneutral_mixing     | 1048576 | 0.689308 |
|        timm_efficientdet        |    0    |   0.0    |
|              moco               |    0    |   0.0    |
|         DALLE2_pytorch          |    0    |   0.0    |
+---------------------------------+---------+----------+

Absolute latency (ms)

+---------------------------------+---------+-------------+
|              name               |   bs    |  inductor   |
+---------------------------------+---------+-------------+
|  timm_vision_transformer_large  |   32    | 4313.517283 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1432.417506 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1297.161722 |
|           hf_T5_base            |    1    | 1245.073069 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1168.555105 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1081.513524 |
|          hf_GPT2_large          |    1    | 552.853025  |
|           timm_nfnet            |   128   | 546.102364  |
|           hf_T5_large           |    1    | 394.362285  |
|            moondream            |    1    | 385.891919  |
|        hf_distil_whisper        |    1    | 343.239679  |
|       Background_Matting        |    1    | 342.801265  |
|          pytorch_unet           |    1    | 225.197492  |
|           timm_regnet           |   32    | 213.600205  |
|            resnet152            |   32    |  186.63492  |
|           densenet121           |   64    | 183.632911  |
|    detectron2_fcos_r_50_fpn     |    1    | 176.387773  |
|             yolov3              |    8    | 154.701784  |
|      torch_multimodal_clip      |   32    | 143.607608  |
|             demucs              |    1    | 141.249551  |
|           hf_BigBird            |    1    | 119.442305  |
|           timm_vovnet           |   32    | 113.942779  |
|          hf_Bert_large          |    1    |  105.99013  |
|         pytorch_stargan         |   16    |  94.081297  |
|     timm_vision_transformer     |   32    |  85.792549  |
|       doctr_det_predictor       |    1    |  85.096006  |
|            resnet50             |   32    |  74.721939  |
|          hf_Longformer          |    1    |  69.456403  |
|          timm_resnest           |   32    |  51.82214   |
|             hf_Bart             |    1    |  51.774508  |
|       speech_transformer        |    1    |  51.186165  |
|              maml               |    1    |  49.537362  |
|        timm_efficientnet        |   64    |  46.71127   |
|   mobilenet_v2_quantized_qat    |   96    |  43.115284  |
|              hf_T5              |    1    |  42.516989  |
|             alexnet             |   128   |  41.86034   |
|             hf_Bert             |    1    |  39.873044  |
|         LearningToPaint         |   96    |  36.742144  |
|           hf_Reformer           |    1    |  35.714024  |
|            hf_Albert            |    1    |  34.707613  |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  32.806678  |
|          fastNLP_Bert           |    1    |  32.689833  |
|              vgg16              |    4    |  31.87903   |
|     pyhpc_isoneutral_mixing     | 1048576 |  30.439734  |
|     nvidia_deeprecommender      |   256   |  29.648085  |
|          hf_DistilBert          |    1    |  26.301517  |
|             hf_GPT2             |    1    |  26.152295  |
|     resnet50_quantized_qat      |   32    |  25.509876  |
| pyhpc_turbulent_kinetic_energy  | 1048576 |  25.434807  |
|         resnext50_32x4d         |    8    |  21.470019  |
|              llama              |   32    |  20.357708  |
|           tts_angular           |   64    |  20.303812  |
|          BERT_pytorch           |    2    |  19.491475  |
|        phlippe_densenet         |   128   |  19.02046   |
|              dcgan              |   256   |  18.910154  |
|        basic_gnn_edgecnn        |    1    |  18.129693  |
|       shufflenet_v2_x1_0        |   64    |  14.448408  |
|           mnasnet1_0            |   32    |  13.975741  |
|       mobilenet_v3_large        |   32    |  12.244837  |
|          basic_gnn_gcn          |    1    |  9.471293   |
|      functorch_dp_cifar10       |   64    |   9.42936   |
|         opacus_cifar10          |   64    |  9.201987   |
|          mobilenet_v2           |   16    |  8.777869   |
|            resnet18             |    8    |   8.60973   |
|              dlrm               |  2048   |  6.530936   |
|          squeezenet1_1          |   16    |  5.256414   |
|         basic_gnn_sage          |    1    |  5.006433   |
|          basic_gnn_gin          |    1    |  4.702549   |
|         phlippe_resnet          |   128   |  3.865406   |
|      doctr_reco_predictor       |    1    |   3.32388   |
|     pyhpc_equation_of_state     | 1048576 |  1.140721   |
|               drq               |    1    |  0.841587   |
|     functorch_maml_omniglot     |    1    |  0.456908   |
|          maml_omniglot          |    5    |  0.332192   |
|        soft_actor_critic        |   256   |  0.285925   |
|          lennard_jones          |  1000   |  0.171542   |
|        timm_efficientdet        |    0    |     0.0     |
|              moco               |    0    |     0.0     |
|         DALLE2_pytorch          |    0    |     0.0     |
+---------------------------------+---------+-------------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            XLNetLMHeadModel             |  8  | 5.862766 |
|     MobileBertForQuestionAnswering      | 128 | 2.117896 |
|      GPT2ForSequenceClassification      |  4  | 2.027434 |
|           ElectraForCausalLM            | 32  | 1.96417  |
|       ElectraForQuestionAnswering       | 64  | 1.873603 |
|          MobileBertForMaskedLM          | 128 | 1.758105 |
|               DistillGPT2               | 16  | 1.548513 |
|       RobertaForQuestionAnswering       | 16  | 1.519572 |
|            YituTechConvBert             | 16  | 1.495981 |
|    LayoutLMForSequenceClassification    | 16  | 1.481433 |
|        BertForQuestionAnswering         | 16  | 1.466259 |
|               GoogleFnet                | 16  | 1.456268 |
|           RobertaForCausalLM            | 16  | 1.434314 |
|    MegatronBertForQuestionAnswering     |  8  | 1.430276 |
|           LayoutLMForMaskedLM           | 16  | 1.419159 |
|      DebertaV2ForQuestionAnswering      |  1  | 1.418326 |
|             BertForMaskedLM             | 16  | 1.398813 |
|                CamemBert                | 16  | 1.395825 |
|          AllenaiLongformerBase          |  4  | 1.388933 |
|         MegatronBertForCausalLM         |  4  | 1.377722 |
|     PLBartForConditionalGeneration      |  4  | 1.311997 |
|             XGLMForCausalLM             |  8  | 1.303845 |
|           DebertaForMaskedLM            |  8  | 1.295742 |
|       DebertaForQuestionAnswering       | 16  | 1.291283 |
|      MBartForConditionalGeneration      |  2  | 1.268555 |
|            AlbertForMaskedLM            |  4  | 1.254093 |
|       AlbertForQuestionAnswering        |  4  | 1.250943 |
| BlenderbotSmallForConditionalGeneration | 64  | 1.248036 |
|             OPTForCausalLM              |  2  | 1.242506 |
|          BlenderbotForCausalLM          |  4  | 1.232158 |
|         Speech2Text2ForCausalLM         | 256 | 1.220077 |
|          DistilBertForMaskedLM          | 128 | 1.214499 |
|          DebertaV2ForMaskedLM           |  2  | 1.201811 |
|       MT5ForConditionalGeneration       | 16  | 1.193153 |
|     DistilBertForQuestionAnswering      | 256 | 1.185379 |
|     M2M100ForConditionalGeneration      | 16  | 1.183868 |
|     PegasusForConditionalGeneration     | 32  | 1.162446 |
|       BlenderbotSmallForCausalLM        | 64  | 1.160222 |
|      BartForConditionalGeneration       |  2  | 1.158122 |
|           PegasusForCausalLM            | 32  | 1.141721 |
|            MBartForCausalLM             |  4  | 1.129371 |
|             BartForCausalLM             |  4  | 1.122791 |
|            TrOCRForCausalLM             | 32  | 1.083405 |
|            PLBartForCausalLM            |  8  | 1.076319 |
|       T5ForConditionalGeneration        |  4  | 1.018122 |
|                 T5Small                 |  4  | 1.017035 |
+-----------------------------------------+-----+----------+
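The tables in this comment are plain-text grids, so they can be loaded for further analysis with a few lines of Python. The following is a minimal sketch and not part of the dashboard tooling: `parse_grid` and `geomean` are hypothetical helper names, the sample is three rows copied from the speedup table above, and the geometric mean is used only as one common way to summarize speedups.

```python
import math


def parse_grid(text):
    """Parse a '+---+' bordered text table into a list of dicts keyed by header."""
    rows = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines such as +-----+-----+
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def geomean(values):
    """Geometric mean of positive numbers."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Three rows copied from the huggingface speedup table (illustrative sample only).
SAMPLE = """
+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            XLNetLMHeadModel             |  8  | 5.862766 |
|     MobileBertForQuestionAnswering      | 128 | 2.117896 |
|                 T5Small                 |  4  | 1.017035 |
+-----------------------------------------+-----+----------+
"""

records = parse_grid(SAMPLE)
speedups = [float(r["inductor"]) for r in records]
print(len(records), "rows; geomean speedup of sample:", round(geomean(speedups), 3))
```

All cell values are returned as strings, so numeric columns need an explicit `float(...)` or `int(...)` conversion as shown.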

Accuracy

+-----------------------------------------+----+------------------+
|                  name                   | bs |     inductor     |
+-----------------------------------------+----+------------------+
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip |
|                CamemBert                | 1  |       pass       |
|       AlbertForQuestionAnswering        | 1  |       pass       |
|           DebertaForMaskedLM            | 1  |       pass       |
|          AllenaiLongformerBase          | 1  |       pass       |
|             BartForCausalLM             | 1  |       pass       |
|      BartForConditionalGeneration       | 1  |       pass       |
|             BertForMaskedLM             | 1  |       pass       |
|        BertForQuestionAnswering         | 1  |       pass       |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |
|       DebertaForQuestionAnswering       | 1  |       pass       |
|           LayoutLMForMaskedLM           | 1  |       pass       |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |
|          DistilBertForMaskedLM          | 1  |       pass       |
|     DistilBertForQuestionAnswering      | 1  |       pass       |
|               DistillGPT2               | 1  |       pass       |
|           ElectraForCausalLM            | 1  |       pass       |
|       ElectraForQuestionAnswering       | 1  |       pass       |
|      GPT2ForSequenceClassification      | 1  |       pass       |
|               GoogleFnet                | 1  |       pass       |
|    LayoutLMForSequenceClassification    | 1  |       pass       |
|            MBartForCausalLM             | 1  |       pass       |
|            XLNetLMHeadModel             | 1  |       pass       |
|             XGLMForCausalLM             | 1  |       pass       |
|            AlbertForMaskedLM            | 1  |       pass       |
|      MBartForConditionalGeneration      | 1  |       pass       |
|       MT5ForConditionalGeneration       | 1  |       pass       |
|         MegatronBertForCausalLM         | 1  |       pass       |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |
|          MobileBertForMaskedLM          | 1  |       pass       |
|     MobileBertForQuestionAnswering      | 1  |       pass       |
|             OPTForCausalLM              | 1  |       pass       |
|            PLBartForCausalLM            | 1  |       pass       |
|     PLBartForConditionalGeneration      | 1  |       pass       |
|           PegasusForCausalLM            | 1  |       pass       |
|     M2M100ForConditionalGeneration      | 1  |       pass       |
|     PegasusForConditionalGeneration     | 1  |       pass       |
|           RobertaForCausalLM            | 1  |       pass       |
|       RobertaForQuestionAnswering       | 1  |       pass       |
|         Speech2Text2ForCausalLM         | 1  |       pass       |
|       T5ForConditionalGeneration        | 1  |       pass       |
|                 T5Small                 | 1  |       pass       |
|            TrOCRForCausalLM             | 1  |       pass       |
|            YituTechConvBert             | 1  |       pass       |
+-----------------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------------+-----+------------+
|                  name                   | bs  |  inductor  |
+-----------------------------------------+-----+------------+
|          AllenaiLongformerBase          |  4  | 125.817611 |
|          MobileBertForMaskedLM          | 128 | 119.219463 |
|     MobileBertForQuestionAnswering      | 128 | 118.637976 |
|      BartForConditionalGeneration       |  2  | 58.100185  |
|      MBartForConditionalGeneration      |  2  | 55.101787  |
|     PegasusForConditionalGeneration     | 32  | 54.746037  |
|     M2M100ForConditionalGeneration      | 16  | 54.292413  |
|          BlenderbotForCausalLM          |  4  | 50.491777  |
|             XGLMForCausalLM             |  8  | 48.845045  |
|          DebertaV2ForMaskedLM           |  2  | 48.331939  |
|      DebertaV2ForQuestionAnswering      |  1  | 47.949641  |
|         MegatronBertForCausalLM         |  4  |  44.59359  |
|    MegatronBertForQuestionAnswering     |  8  | 44.052528  |
|       MT5ForConditionalGeneration       | 16  | 43.844992  |
| BlenderbotSmallForConditionalGeneration | 64  |  41.91594  |
|            YituTechConvBert             | 16  | 37.039286  |
|     PLBartForConditionalGeneration      |  4  | 34.698311  |
|                 T5Small                 |  4  | 34.154757  |
|       T5ForConditionalGeneration        |  4  | 33.890077  |
|            TrOCRForCausalLM             | 32  | 30.566782  |
|             OPTForCausalLM              |  2  | 30.527884  |
|            MBartForCausalLM             |  4  | 30.086114  |
|           PegasusForCausalLM            | 32  | 29.966515  |
|           DebertaForMaskedLM            |  8  |  29.58716  |
|       DebertaForQuestionAnswering       | 16  | 29.094634  |
|            XLNetLMHeadModel             |  8  | 27.744766  |
|           ElectraForCausalLM            | 32  | 27.704235  |
|           RobertaForCausalLM            | 16  | 27.511519  |
|             BartForCausalLM             |  4  | 27.210616  |
|        BertForQuestionAnswering         | 16  | 27.182104  |
|       ElectraForQuestionAnswering       | 64  | 27.098907  |
|                CamemBert                | 16  |  27.03296  |
|           LayoutLMForMaskedLM           | 16  | 26.974241  |
|             BertForMaskedLM             | 16  | 26.894757  |
|       RobertaForQuestionAnswering       | 16  | 26.894348  |
|    LayoutLMForSequenceClassification    | 16  | 26.741169  |
|      GPT2ForSequenceClassification      |  4  | 25.208903  |
|            AlbertForMaskedLM            |  4  |  24.53593  |
|       AlbertForQuestionAnswering        |  4  | 24.459411  |
|       BlenderbotSmallForCausalLM        | 64  | 24.292329  |
|     DistilBertForQuestionAnswering      | 256 | 22.384304  |
|            PLBartForCausalLM            |  8  | 22.327962  |
|          DistilBertForMaskedLM          | 128 | 22.239153  |
|               GoogleFnet                | 16  | 22.196553  |
|         Speech2Text2ForCausalLM         | 256 | 21.805159  |
|               DistillGPT2               | 16  | 20.150223  |
+-----------------------------------------+-----+------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            AlbertForMaskedLM            |  4  | 0.99434  |
|       AlbertForQuestionAnswering        |  4  | 0.994337 |
|     DistilBertForQuestionAnswering      | 256 | 0.993225 |
|             OPTForCausalLM              |  2  | 0.993104 |
|           RobertaForCausalLM            | 16  | 0.992367 |
|            TrOCRForCausalLM             | 32  | 0.99226  |
|          DistilBertForMaskedLM          | 128 | 0.99184  |
|               DistillGPT2               | 16  | 0.991685 |
|               GoogleFnet                | 16  | 0.991236 |
|       ElectraForQuestionAnswering       | 64  | 0.990905 |
|           ElectraForCausalLM            | 32  | 0.990765 |
|                CamemBert                | 16  | 0.990567 |
|            PLBartForCausalLM            |  8  | 0.990556 |
|           LayoutLMForMaskedLM           | 16  | 0.990473 |
|             BertForMaskedLM             | 16  | 0.990277 |
|            MBartForCausalLM             |  4  | 0.990016 |
|            YituTechConvBert             | 16  | 0.988904 |
|    LayoutLMForSequenceClassification    | 16  | 0.988787 |
|       DebertaForQuestionAnswering       | 16  | 0.988677 |
|        BertForQuestionAnswering         | 16  | 0.988611 |
| BlenderbotSmallForConditionalGeneration | 64  | 0.988515 |
|     PLBartForConditionalGeneration      |  4  | 0.98819  |
|       RobertaForQuestionAnswering       | 16  | 0.988177 |
|         Speech2Text2ForCausalLM         | 256 | 0.987929 |
|      GPT2ForSequenceClassification      |  4  | 0.987587 |
|           PegasusForCausalLM            | 32  | 0.987243 |
|       BlenderbotSmallForCausalLM        | 64  | 0.98625  |
|             BartForCausalLM             |  4  | 0.986164 |
|            XLNetLMHeadModel             |  8  | 0.985312 |
|           DebertaForMaskedLM            |  8  | 0.985106 |
|                 T5Small                 |  4  | 0.98418  |
|       T5ForConditionalGeneration        |  4  | 0.984073 |
|          MobileBertForMaskedLM          | 128 | 0.983679 |
|     MobileBertForQuestionAnswering      | 128 | 0.982454 |
|          AllenaiLongformerBase          |  4  | 0.978125 |
|     PegasusForConditionalGeneration     | 32  | 0.977682 |
|         MegatronBertForCausalLM         |  4  | 0.97646  |
|      BartForConditionalGeneration       |  2  | 0.968444 |
|    MegatronBertForQuestionAnswering     |  8  | 0.967961 |
|       MT5ForConditionalGeneration       | 16  | 0.962743 |
|          DebertaV2ForMaskedLM           |  2  | 0.915391 |
|      MBartForConditionalGeneration      |  2  | 0.906726 |
|             XGLMForCausalLM             |  8  | 0.874533 |
|      DebertaV2ForQuestionAnswering      |  1  | 0.869173 |
|          BlenderbotForCausalLM          |  4  | 0.843754 |
|     M2M100ForConditionalGeneration      | 16  | 0.773898 |
+-----------------------------------------+-----+----------+

Absolute latency (ms)

+-----------------------------------------+-----+-------------+
|                  name                   | bs  |  inductor   |
+-----------------------------------------+-----+-------------+
|            AlbertForMaskedLM            |  4  | 2442.018643 |
|       AlbertForQuestionAnswering        |  4  | 2439.124653 |
|            XLNetLMHeadModel             |  8  | 1242.33343  |
|            TrOCRForCausalLM             | 32  | 942.965861  |
|     PegasusForConditionalGeneration     | 32  | 901.263597  |
|     DistilBertForQuestionAnswering      | 256 | 816.322728  |
|    MegatronBertForQuestionAnswering     |  8  | 711.246549  |
|      MBartForConditionalGeneration      |  2  | 634.143202  |
|            MBartForCausalLM             |  4  | 629.872276  |
|          DistilBertForMaskedLM          | 128 | 610.525277  |
|           RobertaForCausalLM            | 16  | 587.915826  |
|             OPTForCausalLM              |  2  | 571.234785  |
|          BlenderbotForCausalLM          |  4  | 563.998723  |
|      BartForConditionalGeneration       |  2  |  557.63728  |
|     M2M100ForConditionalGeneration      | 16  | 555.091273  |
|          DebertaV2ForMaskedLM           |  2  | 550.586677  |
|            YituTechConvBert             | 16  | 547.187245  |
|                CamemBert                | 16  | 539.434973  |
|             BertForMaskedLM             | 16  | 533.683173  |
|           LayoutLMForMaskedLM           | 16  | 530.670657  |
|          AllenaiLongformerBase          |  4  | 506.447334  |
|       DebertaForQuestionAnswering       | 16  |  502.52318  |
|            PLBartForCausalLM            |  8  | 501.374718  |
|             BartForCausalLM             |  4  | 486.643814  |
| BlenderbotSmallForConditionalGeneration | 64  |  449.19386  |
|     PLBartForConditionalGeneration      |  4  | 445.156587  |
|           PegasusForCausalLM            | 32  | 444.793995  |
|        BertForQuestionAnswering         | 16  | 426.238036  |
|    LayoutLMForSequenceClassification    | 16  | 423.627502  |
|         MegatronBertForCausalLM         |  4  | 419.129634  |
|       RobertaForQuestionAnswering       | 16  | 405.235946  |
|       T5ForConditionalGeneration        |  4  | 385.900951  |
|                 T5Small                 |  4  | 385.601933  |
|               GoogleFnet                | 16  |  383.07536  |
|               DistillGPT2               | 16  | 374.805376  |
|          MobileBertForMaskedLM          | 128 | 364.810341  |
|           DebertaForMaskedLM            |  8  | 338.771934  |
|             XGLMForCausalLM             |  8  | 316.804915  |
|       ElectraForQuestionAnswering       | 64  | 299.040797  |
|       BlenderbotSmallForCausalLM        | 64  | 261.164634  |
|         Speech2Text2ForCausalLM         | 256 | 249.446242  |
|      GPT2ForSequenceClassification      |  4  |  242.61508  |
|      DebertaV2ForQuestionAnswering      |  1  | 225.611777  |
|       MT5ForConditionalGeneration       | 16  |  224.91267  |
|           ElectraForCausalLM            | 32  | 222.910134  |
|     MobileBertForQuestionAnswering      | 128 |  209.69438  |
+-----------------------------------------+-----+-------------+
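The speedup and absolute latency tables can be combined to estimate the eager (native PyTorch) latency. This is a rough sketch under two assumptions that the dashboard does not state explicitly: that speedup is defined as eager latency divided by inductor latency, and that both tables come from the same run. `implied_eager_latency_ms` is a hypothetical helper, and the two rows are copied from the huggingface tables above.

```python
def implied_eager_latency_ms(speedup, inductor_latency_ms):
    """Estimate eager latency, ASSUMING speedup = eager_latency / inductor_latency."""
    return speedup * inductor_latency_ms


# (speedup, inductor absolute latency in ms), copied from the huggingface tables.
rows = {
    "XLNetLMHeadModel": (5.862766, 1242.33343),
    "T5Small": (1.017035, 385.601933),
}

for name, (speedup, latency_ms) in rows.items():
    estimate = implied_eager_latency_ms(speedup, latency_ms)
    print(f"{name}: ~{estimate:.1f} ms eager (estimated, not measured)")
```

The printed values are derived estimates, not measurements reported by the dashboard.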

timm_models suite with float32 precision

Performance speedup

+---------------------------------+------+----------+
|              name               |  bs  | inductor |
+---------------------------------+------+----------+
|            lcnet_050            | 256  | 3.992152 |
|           fbnetc_100            | 512  | 3.961521 |
|           mnasnet_100           | 512  | 3.906734 |
|         mobilenetv2_100         | 128  | 3.852341 |
|          spnasnet_100           | 128  | 3.656221 |
|      mobilenetv3_large_100      | 512  | 3.631715 |
|            fbnetv3_b            | 256  | 3.472914 |
|           regnety_002           | 1024 | 3.332656 |
|           rexnet_100            | 256  | 3.133117 |
|       tf_efficientnet_b0        | 128  | 2.995864 |
|            tinynet_a            | 128  | 2.966209 |
|          pnasnet5large          |  16  | 2.691171 |
|        ese_vovnet19b_dw         | 256  | 2.594175 |
|            hrnet_w18            | 128  | 2.584113 |
|          botnet26t_256          | 128  |  2.5615  |
|           res2next50            | 128  | 2.438386 |
|       eca_botnext26ts_256       | 128  | 2.397143 |
|          ghostnet_100           | 512  | 2.347295 |
|           resnest101e           |  64  | 2.286709 |
|       gluon_inception_v3        | 256  | 2.272004 |
|        eca_halonext26ts         | 128  | 2.242183 |
|          inception_v3           | 128  | 2.239709 |
|             dla102              | 128  | 2.228263 |
|        adv_inception_v3         | 128  | 2.203009 |
|        res2net50_14w_8s         | 128  | 2.158284 |
|        res2net101_26w_4s        | 128  | 2.143422 |
|          cspdarknet53           |  64  | 2.075814 |
|            repvgg_a2            | 128  | 2.073474 |
|            nfnet_l0             | 128  | 1.992921 |
|        convmixer_768_32         |  32  | 1.988395 |
|           tf_mixnet_l           | 128  | 1.930642 |
|            gernet_l             | 128  | 1.910491 |
|           dm_nfnet_f0           | 128  | 1.819081 |
|        sebotnet33ts_256         |  64  | 1.812501 |
|           selecsls42b           | 128  | 1.794309 |
|            mixnet_l             | 128  | 1.764658 |
|           volo_d1_224           |  64  | 1.745635 |
|           mobilevit_s           |  64  | 1.704213 |
|         poolformer_m36          |  64  | 1.672673 |
|         visformer_small         | 128  | 1.665604 |
|     swsl_resnext101_32x16d      |  32  | 1.608844 |
|           convit_base           |  64  | 1.604549 |
|             dpn107              |  64  | 1.511334 |
|            levit_128            | 1024 | 1.477317 |
|          gmlp_s16_224           | 128  | 1.445108 |
|      xcit_large_24_p8_224       |  16  | 1.362673 |
|          gmixer_24_224          | 128  | 1.360497 |
|  swin_base_patch4_window7_224   |  64  | 1.319634 |
|        twins_pcpvt_base         | 128  | 1.245331 |
|          mixer_b16_224          | 128  | 1.223202 |
|          convnext_base          |  64  | 1.212088 |
|        tnt_s_patch16_224        | 128  | 1.210526 |
|      beit_base_patch16_224      |  64  | 1.185957 |
|          cait_m36_384           |  4   | 1.169284 |
| deit_base_distilled_patch16_224 |  64  | 1.166842 |
|      vit_base_patch16_224       |  64  | 1.164687 |
|          jx_nest_base           |  32  | 1.122718 |
|            pit_b_224            |  64  | 1.121054 |
|         crossvit_9_240          | 256  | 1.080584 |
|          resmlp_12_224          | 128  | 0.765767 |
+---------------------------------+------+----------+

Accuracy

+---------------------------------+----+-------------+
|              name               | bs |  inductor   |
+---------------------------------+----+-------------+
|        adv_inception_v3         | 8  |    pass     |
|      beit_base_patch16_224      | 8  |    pass     |
|          botnet26t_256          | 8  |    pass     |
|          cait_m36_384           | 8  |    pass     |
|           convit_base           | 8  |    pass     |
|        convmixer_768_32         | 8  |    pass     |
|          convnext_base          | 8  |    pass     |
|         crossvit_9_240          | 8  |    pass     |
|          cspdarknet53           | 8  |    pass     |
| deit_base_distilled_patch16_224 | 8  |    pass     |
|             dla102              | 8  |    pass     |
|           dm_nfnet_f0           | 8  |    pass     |
|             dpn107              | 8  |    pass     |
|       eca_botnext26ts_256       | 8  |    pass     |
|        eca_halonext26ts         | 8  |    pass     |
|        ese_vovnet19b_dw         | 8  |    pass     |
|           fbnetc_100            | 8  |    pass     |
|            fbnetv3_b            | 8  |    pass     |
|            gernet_l             | 8  |    pass     |
|          ghostnet_100           | 8  |    pass     |
|       gluon_inception_v3        | 8  |    pass     |
|          gmixer_24_224          | 8  |    pass     |
|          gmlp_s16_224           | 8  |    pass     |
|            hrnet_w18            | 8  |    pass     |
|          inception_v3           | 8  |    pass     |
|          jx_nest_base           | 8  |    pass     |
|            lcnet_050            | 8  |    pass     |
|            levit_128            | 8  |    pass     |
|      xcit_large_24_p8_224       | 8  |    pass     |
|          mixer_b16_224          | 8  |    pass     |
|            mixnet_l             | 8  |    pass     |
|           mnasnet_100           | 8  |    pass     |
|         mobilenetv2_100         | 8  |    pass     |
|      mobilenetv3_large_100      | 8  |    pass     |
|           mobilevit_s           | 8  |    pass     |
|            nfnet_l0             | 8  |    pass     |
|            pit_b_224            | 8  |    pass     |
|          pnasnet5large          | 8  |    pass     |
|         poolformer_m36          | 8  |    pass     |
|           regnety_002           | 8  |    pass     |
|            repvgg_a2            | 8  |    pass     |
|        res2net101_26w_4s        | 8  |    pass     |
|        res2net50_14w_8s         | 8  |    pass     |
|           res2next50            | 8  |    pass     |
|          resmlp_12_224          | 8  |    pass     |
|           resnest101e           | 8  |    pass     |
|           rexnet_100            | 8  |    pass     |
|        sebotnet33ts_256         | 8  |    pass     |
|           selecsls42b           | 8  |    pass     |
|          spnasnet_100           | 8  |    pass     |
|  swin_base_patch4_window7_224   | 8  |    pass     |
|     swsl_resnext101_32x16d      | 8  |    pass     |
|       tf_efficientnet_b0        | 8  |    pass     |
|           tf_mixnet_l           | 8  |    pass     |
|            tinynet_a            | 8  |    pass     |
|        tnt_s_patch16_224        | 8  |    pass     |
|        twins_pcpvt_base         | 8  |    pass     |
|         visformer_small         | 8  |    pass     |
|      vit_base_patch16_224       | 8  |    pass     |
|           volo_d1_224           | 8  |    pass     |
|         coat_lite_mini          | 8  | fail_to_run |
+---------------------------------+----+-------------+

Compilation latency (sec)

+---------------------------------+------+------------+
|              name               |  bs  |  inductor  |
+---------------------------------+------+------------+
|          pnasnet5large          |  16  | 482.914263 |
|            hrnet_w18            | 128  | 259.428941 |
|        res2net101_26w_4s        | 128  | 83.563641  |
|          cait_m36_384           |  4   |  75.85017  |
|           tf_mixnet_l           | 128  | 72.896552  |
|      xcit_large_24_p8_224       |  16  | 70.610522  |
|            mixnet_l             | 128  | 69.205466  |
|        res2net50_14w_8s         | 128  | 65.898872  |
|           mobilevit_s           |  64  | 65.541451  |
|        twins_pcpvt_base         | 128  | 64.747606  |
|           resnest101e           |  64  | 62.606215  |
|  swin_base_patch4_window7_224   |  64  | 61.441706  |
|         poolformer_m36          |  64  | 60.294773  |
|             dpn107              |  64  | 56.419725  |
|        tnt_s_patch16_224        | 128  | 53.768503  |
|          jx_nest_base           |  32  | 49.769796  |
|        eca_halonext26ts         | 128  | 45.741552  |
|            fbnetv3_b            | 256  | 42.626231  |
|           volo_d1_224           |  64  |  40.83046  |
|          convnext_base          |  64  | 40.596976  |
|          gmixer_24_224          | 128  | 38.212987  |
|         crossvit_9_240          | 256  | 37.860233  |
|            levit_128            | 1024 | 37.823446  |
|          gmlp_s16_224           | 128  | 37.520994  |
|        sebotnet33ts_256         |  64  | 37.322953  |
|          inception_v3           | 128  | 37.210891  |
|        adv_inception_v3         | 128  |  37.15039  |
|       gluon_inception_v3        | 256  | 35.799628  |
|             dla102              | 128  | 35.124831  |
|           res2next50            | 128  |  34.99345  |
|          ghostnet_100           | 512  | 34.701717  |
|            tinynet_a            | 128  | 32.800127  |
|           rexnet_100            | 256  |  32.53898  |
|           dm_nfnet_f0           | 128  | 31.988566  |
|     swsl_resnext101_32x16d      |  32  | 31.834431  |
|           convit_base           |  64  | 31.195704  |
|       eca_botnext26ts_256       | 128  | 31.078399  |
|        convmixer_768_32         |  32  | 30.460214  |
|       tf_efficientnet_b0        | 128  |  30.32686  |
|         visformer_small         | 128  | 29.808142  |
|            nfnet_l0             | 128  | 29.722318  |
|          botnet26t_256          | 128  | 29.189036  |
|            pit_b_224            |  64  | 26.321363  |
|          mixer_b16_224          | 128  | 25.488315  |
|      beit_base_patch16_224      |  64  | 25.463312  |
|          cspdarknet53           |  64  | 25.090425  |
|           regnety_002           | 1024 |  24.74756  |
| deit_base_distilled_patch16_224 |  64  | 24.125079  |
|      vit_base_patch16_224       |  64  | 24.083399  |
|      mobilenetv3_large_100      | 512  | 23.317643  |
|          spnasnet_100           | 128  | 22.999711  |
|          resmlp_12_224          | 128  | 22.393006  |
|           fbnetc_100            | 512  | 21.655947  |
|            gernet_l             | 128  | 21.523422  |
|         mobilenetv2_100         | 128  | 21.472244  |
|            repvgg_a2            | 128  | 21.312824  |
|        ese_vovnet19b_dw         | 256  | 21.293048  |
|           mnasnet_100           | 512  | 19.746506  |
|           selecsls42b           | 128  | 19.197884  |
|            lcnet_050            | 256  | 18.401149  |
+---------------------------------+------+------------+

Peak Memory Compression Ratio

+---------------------------------+------+----------+
|              name               |  bs  | inductor |
+---------------------------------+------+----------+
|        ese_vovnet19b_dw         | 256  | 0.997409 |
|           fbnetc_100            | 512  | 0.996545 |
|      mobilenetv3_large_100      | 512  | 0.996397 |
|            fbnetv3_b            | 256  | 0.995985 |
|           mnasnet_100           | 512  | 0.99555  |
|          ghostnet_100           | 512  | 0.995233 |
|           dm_nfnet_f0           | 128  | 0.995148 |
|           regnety_002           | 1024 | 0.995071 |
|       eca_botnext26ts_256       | 128  | 0.994361 |
|          convnext_base          |  64  | 0.994018 |
|            levit_128            | 1024 | 0.993913 |
|        eca_halonext26ts         | 128  | 0.993481 |
|             dpn107              |  64  | 0.99333  |
|           rexnet_100            | 256  | 0.993305 |
|        res2net101_26w_4s        | 128  | 0.993034 |
|           res2next50            | 128  | 0.992999 |
|       tf_efficientnet_b0        | 128  | 0.992751 |
|          botnet26t_256          | 128  | 0.992699 |
|           tf_mixnet_l           | 128  | 0.992613 |
|        convmixer_768_32         |  32  | 0.992408 |
|          cspdarknet53           |  64  | 0.991876 |
|          gmlp_s16_224           | 128  | 0.991862 |
|         visformer_small         | 128  | 0.991805 |
|       gluon_inception_v3        | 256  | 0.991611 |
|            gernet_l             | 128  | 0.991601 |
|            nfnet_l0             | 128  | 0.991423 |
|            mixnet_l             | 128  | 0.991264 |
|        sebotnet33ts_256         |  64  | 0.991263 |
|          gmixer_24_224          | 128  | 0.99112  |
|          mixer_b16_224          | 128  | 0.990835 |
|      xcit_large_24_p8_224       |  16  | 0.990598 |
|           mobilevit_s           |  64  | 0.990547 |
|         mobilenetv2_100         | 128  | 0.990211 |
|        res2net50_14w_8s         | 128  | 0.990034 |
|           selecsls42b           | 128  | 0.989497 |
|             dla102              | 128  | 0.989417 |
|        twins_pcpvt_base         | 128  | 0.989394 |
|          inception_v3           | 128  | 0.988918 |
|        adv_inception_v3         | 128  | 0.988829 |
|           convit_base           |  64  | 0.988734 |
|  swin_base_patch4_window7_224   |  64  | 0.988525 |
|            hrnet_w18            | 128  | 0.987606 |
|         poolformer_m36          |  64  | 0.987254 |
|            tinynet_a            | 128  | 0.987184 |
|        tnt_s_patch16_224        | 128  | 0.987162 |
|          spnasnet_100           | 128  | 0.986551 |
|          resmlp_12_224          | 128  | 0.986453 |
|      beit_base_patch16_224      |  64  | 0.986129 |
|           resnest101e           |  64  | 0.984585 |
|            lcnet_050            | 256  | 0.984438 |
| deit_base_distilled_patch16_224 |  64  | 0.983869 |
|          pnasnet5large          |  16  | 0.983519 |
|      vit_base_patch16_224       |  64  | 0.983398 |
|            pit_b_224            |  64  | 0.982206 |
|            repvgg_a2            | 128  | 0.981253 |
|          jx_nest_base           |  32  | 0.980289 |
|     swsl_resnext101_32x16d      |  32  | 0.979898 |
|           volo_d1_224           |  64  | 0.979241 |
|          cait_m36_384           |  4   | 0.978533 |
|         crossvit_9_240          | 256  | 0.966785 |
+---------------------------------+------+----------+

Absolute latency (ms)

+---------------------------------+------+-------------+
|              name               |  bs  |  inductor   |
+---------------------------------+------+-------------+
|      xcit_large_24_p8_224       |  16  | 1280.500562 |
|          cait_m36_384           |  4   | 1090.21878  |
|          convnext_base          |  64  | 1017.336289 |
|           dm_nfnet_f0           | 128  | 967.704363  |
|             dpn107              |  64  | 913.696802  |
|          mixer_b16_224          | 128  | 911.577669  |
|       gluon_inception_v3        | 256  | 813.265557  |
|        tnt_s_patch16_224        | 128  | 746.776589  |
|        twins_pcpvt_base         | 128  | 718.950663  |
|  swin_base_patch4_window7_224   |  64  | 708.343836  |
|           convit_base           |  64  | 686.153074  |
|        res2net101_26w_4s        | 128  | 637.582428  |
|     swsl_resnext101_32x16d      |  32  |  626.99029  |
| deit_base_distilled_patch16_224 |  64  | 612.317197  |
|      vit_base_patch16_224       |  64  | 611.071675  |
|      beit_base_patch16_224      |  64  | 606.844052  |
|            nfnet_l0             | 128  | 606.107864  |
|            levit_128            | 1024 | 559.649993  |
|        ese_vovnet19b_dw         | 256  | 553.168784  |
|             dla102              | 128  | 521.502561  |
|            pit_b_224            |  64  | 511.166189  |
|          gmlp_s16_224           | 128  | 494.356148  |
|          jx_nest_base           |  32  | 484.191616  |
|          gmixer_24_224          | 128  | 477.160582  |
|         crossvit_9_240          | 256  |  473.72404  |
|           resnest101e           |  64  | 470.884321  |
|         poolformer_m36          |  64  | 468.678542  |
|        convmixer_768_32         |  32  |  432.7952   |
|            hrnet_w18            | 128  | 428.240687  |
|          resmlp_12_224          | 128  | 414.116862  |
|          inception_v3           | 128  | 406.523904  |
|        adv_inception_v3         | 128  | 405.813891  |
|           volo_d1_224           |  64  | 405.162839  |
|        res2net50_14w_8s         | 128  | 393.502176  |
|         visformer_small         | 128  | 388.688571  |
|          ghostnet_100           | 512  | 356.749351  |
|           res2next50            | 128  | 353.921829  |
|            mixnet_l             | 128  |  348.8749   |
|           tf_mixnet_l           | 128  | 338.314149  |
|            repvgg_a2            | 128  | 332.813378  |
|          pnasnet5large          |  16  | 329.444528  |
|        eca_halonext26ts         | 128  | 310.568198  |
|           fbnetc_100            | 512  | 303.501781  |
|       eca_botnext26ts_256       | 128  | 291.160752  |
|            gernet_l             | 128  |  290.7443   |
|        sebotnet33ts_256         |  64  | 281.193975  |
|           regnety_002           | 1024 | 277.412763  |
|          botnet26t_256          | 128  | 277.190833  |
|          cspdarknet53           |  64  | 257.993009  |
|           mnasnet_100           | 512  | 254.153716  |
|            fbnetv3_b            | 256  | 240.970908  |
|           selecsls42b           | 128  | 229.705885  |
|      mobilenetv3_large_100      | 512  | 229.254657  |
|           rexnet_100            | 256  | 223.013694  |
|           mobilevit_s           |  64  | 221.518055  |
|       tf_efficientnet_b0        | 128  | 117.389418  |
|            tinynet_a            | 128  |  79.763627  |
|         mobilenetv2_100         | 128  |  70.955879  |
|          spnasnet_100           | 128  |  64.072756  |
|            lcnet_050            | 256  |  25.99107   |
+---------------------------------+------+-------------+
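
The latency figures above can be combined with the speedup table to recover two derived quantities. The following is a minimal sketch, not part of the dashboard tooling: it assumes the reported latency is the time for one batch at the listed batch size, and that speedup is eager latency divided by inductor latency. The function names are hypothetical.

```python
def throughput(batch_size: int, latency_ms: float) -> float:
    """Samples per second, assuming latency_ms is the time for one batch."""
    return batch_size / (latency_ms / 1000.0)


def implied_eager_latency_ms(speedup: float, inductor_latency_ms: float) -> float:
    """Eager latency implied by speedup = eager latency / inductor latency."""
    return speedup * inductor_latency_ms


if __name__ == "__main__":
    # lcnet_050 rows from the tables above: bs=256, 25.99107 ms, speedup 3.992152
    print(f"throughput: {throughput(256, 25.99107):.1f} samples/s")
    print(f"implied eager latency: {implied_eager_latency_ms(3.992152, 25.99107):.2f} ms")
```

Comparing throughput rather than raw latency makes rows with different batch sizes easier to read side by side.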

@zxd1997066

[cppwrapper_dynamic_shape] Performance Dashboard for float32 precision -- Single-core Single-thread (2024-05-05 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration: a forward and backward pass for training, or a forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

SW information:

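+-------------------+--------+------------------------------------------+
|        SW         | Branch |                  Commit                  |
+-------------------+--------+------------------------------------------+
|      Pytorch      |  main  | 6d30803d64953955df63da56833bf4eb52249aae |
|    Torchbench     |  main  |                 d6015d42                 |
|    torchaudio     |  main  |             2.2.0a0+ea437b3              |
|     torchtext     |  main  |             0.16.0a0+b0ebddc             |
|    torchvision    |  main  |             0.19.0a0+06ad737             |
|     torchdata     |  main  |             0.7.1a0+0790338              |
| dynamo_benchmarks |  main  |                 nightly                  |
+-------------------+--------+------------------------------------------+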

HW information

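+------------------+----------------------------------------------------------------+
|       Item       |                             Value                              |
+------------------+----------------------------------------------------------------+
|   Manufacturer   |                           Amazon EC2                           |
|   Product Name   |                          c6i.16xlarge                          |
|    CPU Model     |         Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz          |
| Installed Memory |            128GB (1x128GB DDR4 3200 MT/s [Unknown])            |
|        OS        |                       Ubuntu 22.04.2 LTS                       |
|      Kernel      |                        5.19.0-1022-aws                         |
|    Microcode     |                           0xd000389                            |
|       GCC        |           gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0            |
|      GLIBC       |            ldd (Ubuntu GLIBC 2.35-0ubuntu3.1) 2.35             |
|     Binutils     |             GNU ld (GNU Binutils for Ubuntu) 2.38              |
|      Python      |                         Python 3.10.6                          |
|     OpenSSL      | OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022) |
+------------------+----------------------------------------------------------------+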

Test command

export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
export OMP_NUM_THREADS=1

python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--core_list 0 --ncores_per_instance 1" --devices=cpu --dtypes=float32 --inference --compilers=inductor --batch_size=1 --threads 1 --extra-args="--timeout 9000"

When measuring performance, compilation latency, and memory footprint reduction, we exclude the models that fail accuracy checks.
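
As an illustration of that exclusion step, here is a minimal sketch that keeps only the models whose accuracy status counts as passing before any metric is aggregated. It is not the dashboard's actual code: the helper name is hypothetical, and treating both "pass" and "pass_due_to_skip" as passing is an assumption.

```python
# Assumption: both of these statuses count as passing the accuracy check.
PASSING = {"pass", "pass_due_to_skip"}


def keep_passing(accuracy: dict[str, str], metrics: dict[str, float]) -> dict[str, float]:
    """Return only the metric entries whose model passed the accuracy check."""
    return {name: value for name, value in metrics.items()
            if accuracy.get(name) in PASSING}


if __name__ == "__main__":
    # A few rows taken from the torchbench accuracy and speedup tables.
    accuracy = {"alexnet": "pass", "maml": "pass_due_to_skip", "moco": "model_fail_to_load"}
    speedup = {"alexnet": 1.364212, "maml": 1.256088, "moco": 0.0}
    print(keep_passing(accuracy, speedup))
```

Models that are missing from the accuracy mapping are dropped as well, which is the conservative choice for a summary statistic.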

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 96%, 76/79 | 100%, 46/46 | 100%, 60/60 |
+----------+------------+-------------+-------------+
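
The pass-rate cells above pair a percentage with a passed/total count. A minimal sketch of that formatting follows. The helper name is hypothetical, and rounding to the nearest whole percent is an assumption, since the dashboard does not state whether it rounds or truncates.

```python
def pass_rate(passed: int, total: int) -> str:
    """Format a pass rate as '<percent>%, <passed>/<total>'."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Assumption: the percentage is rounded to the nearest integer.
    return f"{round(100 * passed / total)}%, {passed}/{total}"


if __name__ == "__main__":
    print(pass_rate(76, 79))  # 96%, 76/79
    print(pass_rate(46, 46))  # 100%, 46/46
```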

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.60x    |    1.20x    |    1.57x    |
+----------+------------+-------------+-------------+
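
For readers unfamiliar with the metric, the geometric mean of per-model speedups can be sketched as below. This is an illustrative computation, not the dashboard's own script. The function name is hypothetical, and dropping the 0.0 placeholder rows (models that failed to load) is an assumption based on the statement that failing models are removed.

```python
import math


def geometric_mean_speedup(speedups: list[float]) -> float:
    """Geometric mean of the positive speedups, computed in log space."""
    # Assumption: 0.0 entries are placeholders for failed models and are skipped.
    values = [s for s in speedups if s > 0]
    if not values:
        raise ValueError("no positive speedups to aggregate")
    return math.exp(sum(math.log(s) for s in values) / len(values))


if __name__ == "__main__":
    # A handful of torchbench rows; the 0.0 entry mimics a model that failed to load.
    sample = [3.568579, 1.364212, 0.995835, 0.582104, 0.0]
    print(f"{geometric_mean_speedup(sample):.2f}x")
```

The geometric mean is the usual choice for ratios because a 2x speedup and a 0.5x slowdown cancel to 1x, which an arithmetic mean would not reflect.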

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   52.80    |    37.31    |    47.48    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.85x    |    0.81x    |    0.82x    |
+----------+------------+-------------+-------------+
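
One way to read these ratios: assuming the compression ratio is the eager peak memory divided by the compiled peak memory, a value below 1.0 means the compiled run used more peak memory than eager. The sketch below shows that arithmetic. The function name and the memory figures are hypothetical and are not measurements from this dashboard.

```python
def compression_ratio(eager_peak_mb: float, compiled_peak_mb: float) -> float:
    """Peak-memory compression ratio; higher is better."""
    if compiled_peak_mb <= 0:
        raise ValueError("compiled_peak_mb must be positive")
    # Assumption: ratio = eager peak memory / compiled peak memory.
    return eager_peak_mb / compiled_peak_mb


if __name__ == "__main__":
    # Hypothetical figures: 850 MB in eager mode, 1000 MB when compiled.
    ratio = compression_ratio(850.0, 1000.0)
    print(f"{ratio:.2f}x")
```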

torchbench suite with float32 precision

Performance speedup

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_isoneutral_mixing     |    1    | 54.152881 |
|     pyhpc_equation_of_state     |    1    | 28.358323 |
|         basic_gnn_sage          |    1    | 3.568579  |
|          squeezenet1_1          |    1    | 3.550346  |
|     functorch_maml_omniglot     |    1    | 3.509268  |
|          basic_gnn_gin          |    1    | 3.482452  |
|          basic_gnn_gcn          |    1    | 3.469505  |
|          maml_omniglot          |    5    | 2.875488  |
|           timm_nfnet            |    1    | 2.797466  |
|         opacus_cifar10          |    1    |  2.48377  |
|       shufflenet_v2_x1_0        |    1    | 2.276188  |
|      functorch_dp_cifar10       |    1    | 2.269387  |
|              dcgan              |    1    |  2.25528  |
|            resnet18             |    1    | 2.240995  |
|          lennard_jones          |    1    | 2.199674  |
|          mobilenet_v2           |    1    | 2.030565  |
|          timm_resnest           |    1    | 2.001549  |
|       mobilenet_v3_large        |    1    | 1.900063  |
|           mnasnet1_0            |    1    | 1.892214  |
|         phlippe_resnet          |    1    | 1.838576  |
|        phlippe_densenet         |    1    | 1.784478  |
|           densenet121           |    1    | 1.783211  |
|            resnet50             |    1    | 1.768394  |
|        timm_efficientnet        |    1    | 1.726559  |
|            resnet152            |    1    | 1.685504  |
|         LearningToPaint         |    1    | 1.630682  |
|           timm_vovnet           |    1    | 1.609233  |
|              llama              |    1    | 1.563347  |
|      doctr_reco_predictor       |    1    | 1.501498  |
|           timm_regnet           |    1    | 1.494766  |
|         resnext50_32x4d         |    1    | 1.488313  |
|              dlrm               |    1    | 1.478306  |
|              vgg16              |    1    | 1.445101  |
|        basic_gnn_edgecnn        |    1    | 1.401634  |
|             yolov3              |    1    | 1.383227  |
|             alexnet             |    1    | 1.364212  |
|          BERT_pytorch           |    1    | 1.314897  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1.299029  |
|               drq               |    1    | 1.288407  |
|            hf_Albert            |    1    | 1.282544  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1.282222  |
|              maml               |    1    | 1.256088  |
|             hf_GPT2             |    1    |  1.24938  |
|          hf_GPT2_large          |    1    | 1.237826  |
|       doctr_det_predictor       |    1    | 1.230963  |
|            moondream            |    1    | 1.224744  |
|     timm_vision_transformer     |    1    | 1.221037  |
|          fastNLP_Bert           |    1    | 1.218643  |
|         pytorch_stargan         |   16    | 1.191456  |
|          hf_Bert_large          |    1    | 1.168702  |
|  timm_vision_transformer_large  |    1    | 1.166145  |
|           hf_BigBird            |    1    | 1.164811  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 1.164736  |
|             hf_Bert             |    1    | 1.159802  |
|          hf_DistilBert          |    1    | 1.132764  |
|      torch_multimodal_clip      |    1    | 1.118219  |
|             hf_Bart             |    1    |  1.1045   |
|       speech_transformer        |    1    | 1.072187  |
|        hf_distil_whisper        |    1    | 1.071803  |
|          pytorch_unet           |    1    | 1.067684  |
| detectron2_fasterrcnn_r_50_dc5  |    1    |  1.05232  |
|        soft_actor_critic        |   256   | 1.046541  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1.037954  |
|    detectron2_fcos_r_50_fpn     |    1    | 1.037595  |
|          hf_Longformer          |    1    | 1.009305  |
|           tts_angular           |    1    | 1.001389  |
|             demucs              |    1    | 1.000849  |
|     resnet50_quantized_qat      |    1    | 0.995835  |
|   mobilenet_v2_quantized_qat    |    1    | 0.987637  |
|     nvidia_deeprecommender      |    1    | 0.979771  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.911154  |
|           hf_Reformer           |    1    | 0.862899  |
|       Background_Matting        |    1    | 0.832265  |
|           hf_T5_large           |    1    | 0.792161  |
|              hf_T5              |    1    | 0.708262  |
|           hf_T5_base            |    1    | 0.582104  |
|              moco               |    0    |    0.0    |
|        timm_efficientdet        |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+

Accuracy

+---------------------------------+---------+--------------------+
|              name               |   bs    |      inductor      |
+---------------------------------+---------+--------------------+
|       Background_Matting        |    1    |  pass_due_to_skip  |
|  timm_vision_transformer_large  |    1    |  pass_due_to_skip  |
|          hf_GPT2_large          |    1    |  pass_due_to_skip  |
|           hf_T5_large           |    1    |  pass_due_to_skip  |
|              maml               |    1    |  pass_due_to_skip  |
|             yolov3              |    1    |        pass        |
|             alexnet             |    1    |        pass        |
|        basic_gnn_edgecnn        |    1    |        pass        |
|          basic_gnn_gcn          |    1    |        pass        |
|          basic_gnn_gin          |    1    |        pass        |
|         basic_gnn_sage          |    1    |        pass        |
|       doctr_det_predictor       |    1    |        pass        |
|              dcgan              |    1    |        pass        |
|           densenet121           |    1    |        pass        |
| detectron2_fasterrcnn_r_101_c4  |    1    |        pass        |
| detectron2_fasterrcnn_r_101_dc5 |    1    |        pass        |
|  detectron2_fasterrcnn_r_50_c4  |    1    |        pass        |
| detectron2_fasterrcnn_r_50_dc5  |    1    |        pass        |
|           hf_T5_base            |    1    |        pass        |
|             demucs              |    1    |        pass        |
|        hf_distil_whisper        |    1    |        pass        |
|         LearningToPaint         |    1    |        pass        |
|              dlrm               |    1    |        pass        |
|    detectron2_fcos_r_50_fpn     |    1    |        pass        |
|               drq               |    1    |        pass        |
|          fastNLP_Bert           |    1    |        pass        |
|      functorch_dp_cifar10       |    1    |        pass        |
|     functorch_maml_omniglot     |    1    |        pass        |
|            hf_Albert            |    1    |        pass        |
|             hf_Bart             |    1    |        pass        |
|          hf_Bert_large          |    1    |        pass        |
|           hf_BigBird            |    1    |        pass        |
|              llama              |    1    |        pass        |
|          hf_DistilBert          |    1    |        pass        |
|             hf_GPT2             |    1    |        pass        |
|          hf_Longformer          |    1    |        pass        |
|           hf_Reformer           |    1    |        pass        |
|              hf_T5              |    1    |        pass        |
|             hf_Bert             |    1    |        pass        |
|      doctr_reco_predictor       |    1    |        pass        |
|          lennard_jones          |    1    |        pass        |
|            resnet50             |    1    |        pass        |
|            resnet152            |    1    |        pass        |
|          maml_omniglot          |    5    |        pass        |
|     resnet50_quantized_qat      |    1    |        pass        |
|          mobilenet_v2           |    1    |        pass        |
|   mobilenet_v2_quantized_qat    |    1    |        pass        |
|         pytorch_stargan         |   16    |        pass        |
|            moondream            |    1    |        pass        |
|     nvidia_deeprecommender      |    1    |        pass        |
|          BERT_pytorch           |    1    |        pass        |
|        phlippe_densenet         |    1    |        pass        |
|         phlippe_resnet          |    1    |        pass        |
|     pyhpc_equation_of_state     |    1    |        pass        |
|     pyhpc_isoneutral_mixing     |    1    |        pass        |
| pyhpc_turbulent_kinetic_energy  | 1048576 |        pass        |
|  pytorch_CycleGAN_and_pix2pix   |    1    |        pass        |
|         opacus_cifar10          |    1    |        pass        |
|       mobilenet_v3_large        |    1    |        pass        |
|           mnasnet1_0            |    1    |        pass        |
|          pytorch_unet           |    1    |        pass        |
|        timm_efficientnet        |    1    |        pass        |
|         resnext50_32x4d         |    1    |        pass        |
|       shufflenet_v2_x1_0        |    1    |        pass        |
|        soft_actor_critic        |   256   |        pass        |
|       speech_transformer        |    1    |        pass        |
|           timm_nfnet            |    1    |        pass        |
|            resnet18             |    1    |        pass        |
|          squeezenet1_1          |    1    |        pass        |
|           timm_regnet           |    1    |        pass        |
|          timm_resnest           |    1    |        pass        |
|     timm_vision_transformer     |    1    |        pass        |
|           timm_vovnet           |    1    |        pass        |
|      torch_multimodal_clip      |    1    |        pass        |
|           tts_angular           |    1    |        pass        |
|              vgg16              |    1    |        pass        |
|        timm_efficientdet        |    0    | model_fail_to_load |
|              moco               |    0    | model_fail_to_load |
|         DALLE2_pytorch          |    0    | model_fail_to_load |
| detectron2_fasterrcnn_r_101_fpn |    1    |    fail_to_run     |
| detectron2_fasterrcnn_r_50_fpn  |    1    |    fail_to_run     |
|           Super_SloMo           |    1    |    fail_to_run     |
|         vision_maskrcnn         |    1    |    fail_to_run     |
+---------------------------------+---------+--------------------+

Compilation latency (sec)

+---------------------------------+---------+------------+
|              name               |   bs    |  inductor  |
+---------------------------------+---------+------------+
|           hf_BigBird            |    1    | 479.111762 |
|    detectron2_fcos_r_50_fpn     |    1    | 414.358176 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 226.982439 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 221.007777 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 209.947978 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 205.453302 |
|              maml               |    1    | 144.666165 |
|           hf_T5_large           |    1    | 111.507021 |
|           hf_T5_base            |    1    | 104.985594 |
|       speech_transformer        |    1    | 90.590451  |
|          hf_Longformer          |    1    | 85.720888  |
|           hf_Reformer           |    1    | 81.281423  |
|          basic_gnn_gcn          |    1    | 61.893735  |
|           densenet121           |    1    | 57.057945  |
|          fastNLP_Bert           |    1    | 54.505751  |
|            resnet152            |    1    | 54.012667  |
|  timm_vision_transformer_large  |    1    | 49.736225  |
|       doctr_det_predictor       |    1    | 45.708054  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 43.037711  |
|          hf_Bert_large          |    1    | 42.644111  |
|          hf_GPT2_large          |    1    | 42.590257  |
|            moondream            |    1    | 41.686255  |
|        hf_distil_whisper        |    1    | 41.005623  |
|      torch_multimodal_clip      |    1    | 36.544306  |
|             demucs              |    1    | 34.948562  |
|           timm_regnet           |    1    | 33.450713  |
|           timm_nfnet            |    1    | 32.423221  |
|              hf_T5              |    1    | 31.645823  |
|             hf_Bart             |    1    | 30.558177  |
|             yolov3              |    1    | 29.282564  |
|       Background_Matting        |    1    | 28.794768  |
|        timm_efficientnet        |    1    | 28.130044  |
|          BERT_pytorch           |    1    | 28.037532  |
|             hf_Bert             |    1    | 26.455597  |
|        phlippe_densenet         |    1    |  25.96971  |
|      doctr_reco_predictor       |    1    | 25.885728  |
|       shufflenet_v2_x1_0        |    1    | 25.326062  |
|            hf_Albert            |    1    | 24.651558  |
|             hf_GPT2             |    1    | 24.036325  |
|         pytorch_stargan         |   16    | 23.901829  |
|       mobilenet_v3_large        |    1    | 23.444721  |
|              llama              |    1    | 23.170273  |
|     timm_vision_transformer     |    1    | 23.150548  |
|         opacus_cifar10          |    1    | 21.672556  |
|         resnext50_32x4d         |    1    | 21.512331  |
|            resnet50             |    1    | 21.505514  |
|           timm_vovnet           |    1    | 21.309464  |
|          timm_resnest           |    1    | 20.975571  |
|          mobilenet_v2           |    1    | 20.875545  |
|           mnasnet1_0            |    1    | 20.577153  |
|          hf_DistilBert          |    1    | 20.510363  |
|     pyhpc_isoneutral_mixing     |    1    | 20.393653  |
|      functorch_dp_cifar10       |    1    | 20.284854  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 18.634412  |
|          pytorch_unet           |    1    | 18.618063  |
|          squeezenet1_1          |    1    | 17.760773  |
|            resnet18             |    1    | 17.159361  |
|     pyhpc_equation_of_state     |    1    | 17.021719  |
|         LearningToPaint         |    1    |  17.00515  |
|              vgg16              |    1    | 16.910327  |
|         phlippe_resnet          |    1    | 16.734465  |
|             alexnet             |    1    | 16.519655  |
|               drq               |    1    | 15.802866  |
|     functorch_maml_omniglot     |    1    | 15.635329  |
|          maml_omniglot          |    5    | 15.546051  |
|     nvidia_deeprecommender      |    1    | 15.386346  |
|              dlrm               |    1    | 15.003187  |
|              dcgan              |    1    | 14.751822  |
|        basic_gnn_edgecnn        |    1    | 14.435663  |
|          basic_gnn_gin          |    1    | 14.418653  |
|        soft_actor_critic        |   256   | 14.385653  |
|         basic_gnn_sage          |    1    | 14.378611  |
|          lennard_jones          |    1    | 13.932152  |
|           tts_angular           |    1    | 13.741843  |
|   mobilenet_v2_quantized_qat    |    1    |  0.097578  |
|     resnet50_quantized_qat      |    1    |  0.069212  |
|        timm_efficientdet        |    0    |    0.0     |
|              moco               |    0    |    0.0     |
|         DALLE2_pytorch          |    0    |    0.0     |
+---------------------------------+---------+------------+

Peak Memory Compression Ratio

+---------------------------------+---------+----------+
|              name               |   bs    | inductor |
+---------------------------------+---------+----------+
|              dlrm               |    1    | 0.988468 |
|           hf_T5_base            |    1    | 0.987361 |
|             demucs              |    1    | 0.985061 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 0.981998 |
|       Background_Matting        |    1    | 0.981845 |
|          pytorch_unet           |    1    | 0.978375 |
|          hf_GPT2_large          |    1    | 0.974146 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 0.972054 |
|        basic_gnn_edgecnn        |    1    | 0.971277 |
|       doctr_det_predictor       |    1    | 0.969501 |
|    detectron2_fcos_r_50_fpn     |    1    | 0.961872 |
|     resnet50_quantized_qat      |    1    | 0.956506 |
|         LearningToPaint         |    1    | 0.946448 |
|           hf_BigBird            |    1    | 0.945173 |
|         pytorch_stargan         |   16    | 0.943103 |
|      doctr_reco_predictor       |    1    | 0.942584 |
|         basic_gnn_sage          |    1    | 0.938976 |
|          basic_gnn_gin          |    1    | 0.938751 |
|          basic_gnn_gcn          |    1    | 0.937858 |
|   mobilenet_v2_quantized_qat    |    1    | 0.928134 |
|              llama              |    1    | 0.91914  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 0.915949 |
|      torch_multimodal_clip      |    1    | 0.901116 |
|        hf_distil_whisper        |    1    | 0.892756 |
|           tts_angular           |    1    | 0.888686 |
|        soft_actor_critic        |   256   | 0.888577 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 0.886704 |
|         opacus_cifar10          |    1    | 0.884907 |
|        timm_efficientnet        |    1    | 0.877675 |
|          mobilenet_v2           |    1    | 0.870707 |
|          lennard_jones          |    1    | 0.861678 |
|          maml_omniglot          |    5    | 0.86049  |
|          squeezenet1_1          |    1    | 0.859738 |
|           mnasnet1_0            |    1    | 0.85929  |
|     functorch_maml_omniglot     |    1    | 0.857447 |
|          timm_resnest           |    1    | 0.855396 |
|              dcgan              |    1    | 0.853116 |
|          fastNLP_Bert           |    1    | 0.849719 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.849442 |
|       mobilenet_v3_large        |    1    | 0.848533 |
|       shufflenet_v2_x1_0        |    1    | 0.841494 |
|         phlippe_resnet          |    1    | 0.838955 |
|     pyhpc_equation_of_state     |    1    | 0.833418 |
|       speech_transformer        |    1    | 0.827933 |
|        phlippe_densenet         |    1    | 0.820843 |
|           timm_nfnet            |    1    | 0.816711 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 0.813519 |
|         resnext50_32x4d         |    1    | 0.810743 |
|     pyhpc_isoneutral_mixing     |    1    | 0.810086 |
|          hf_Bert_large          |    1    | 0.808407 |
|          hf_Longformer          |    1    | 0.805304 |
|            hf_Albert            |    1    | 0.802212 |
|             hf_Bert             |    1    | 0.800614 |
|     timm_vision_transformer     |    1    | 0.800067 |
|              maml               |    1    | 0.79873  |
|            moondream            |    1    | 0.796752 |
|           hf_T5_large           |    1    | 0.795196 |
|             yolov3              |    1    | 0.780874 |
|          BERT_pytorch           |    1    | 0.779616 |
|          hf_DistilBert          |    1    | 0.777897 |
|            resnet50             |    1    | 0.76845  |
|             hf_GPT2             |    1    | 0.764224 |
|            resnet18             |    1    | 0.763006 |
|           timm_regnet           |    1    | 0.761251 |
|               drq               |    1    | 0.760197 |
|           timm_vovnet           |    1    | 0.759241 |
|           densenet121           |    1    | 0.755541 |
|             hf_Bart             |    1    | 0.751171 |
|              hf_T5              |    1    | 0.749989 |
|      functorch_dp_cifar10       |    1    | 0.744648 |
|             alexnet             |    1    | 0.740694 |
|  timm_vision_transformer_large  |    1    | 0.733022 |
|           hf_Reformer           |    1    | 0.728207 |
|              vgg16              |    1    | 0.722844 |
|            resnet152            |    1    | 0.694859 |
|     nvidia_deeprecommender      |    1    | 0.672404 |
|        timm_efficientdet        |    0    |   0.0    |
|              moco               |    0    |   0.0    |
|         DALLE2_pytorch          |    0    |   0.0    |
+---------------------------------+---------+----------+

Absolute latency (ms)

+---------------------------------+---------+--------------+
|              name               |   bs    |   inductor   |
+---------------------------------+---------+--------------+
|           hf_T5_base            |    1    | 26443.359877 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 11781.537546 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 11213.16504  |
|          hf_GPT2_large          |    1    | 10065.130163 |
|           hf_T5_large           |    1    | 7445.046096  |
|            moondream            |    1    | 7308.103612  |
|        hf_distil_whisper        |    1    | 6928.542848  |
|       Background_Matting        |    1    | 6697.017425  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 5795.104131  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 5034.411878  |
|          pytorch_unet           |    1    | 4735.303417  |
|  timm_vision_transformer_large  |    1    | 2792.873514  |
|    detectron2_fcos_r_50_fpn     |    1    | 2536.405503  |
|             demucs              |    1    | 2253.434085  |
|         pytorch_stargan         |   16    | 2012.873809  |
|       doctr_det_predictor       |    1    | 1806.160829  |
|          hf_Bert_large          |    1    | 1762.373364  |
|           hf_BigBird            |    1    | 1462.857948  |
|      torch_multimodal_clip      |    1    | 1261.660875  |
|          hf_Longformer          |    1    | 1109.531781  |
|             hf_Bart             |    1    |  881.320526  |
|              hf_T5              |    1    |  766.784748  |
|             hf_Bert             |    1    |  675.974079  |
|       speech_transformer        |    1    |  670.114197  |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  623.670579  |
|            hf_Albert            |    1    |  570.761252  |
|          fastNLP_Bert           |    1    |  521.416683  |
|             yolov3              |    1    |  427.919828  |
|          hf_DistilBert          |    1    |  413.801626  |
|           hf_Reformer           |    1    |  410.523355  |
|             hf_GPT2             |    1    |  354.840889  |
|        basic_gnn_edgecnn        |    1    |  230.567183  |
| pyhpc_turbulent_kinetic_energy  | 1048576 |  210.426832  |
|              vgg16              |    1    |  189.712442  |
|           timm_regnet           |    1    |  148.56274   |
|          BERT_pytorch           |    1    |  139.306089  |
|            resnet152            |    1    |  135.414456  |
|           timm_nfnet            |    1    |   95.46815   |
|           timm_vovnet           |    1    |  78.994735   |
|              maml               |    1    |  74.402654   |
|     nvidia_deeprecommender      |    1    |  58.137254   |
|     timm_vision_transformer     |    1    |  57.527679   |
|         resnext50_32x4d         |    1    |  56.796705   |
|           tts_angular           |    1    |  54.067649   |
|            resnet50             |    1    |  51.006183   |
|           densenet121           |    1    |  41.935307   |
|          timm_resnest           |    1    |  32.839363   |
|          basic_gnn_gcn          |    1    |  29.473507   |
|      doctr_reco_predictor       |    1    |  22.992539   |
|             alexnet             |    1    |  22.053455   |
|            resnet18             |    1    |  21.694691   |
|              llama              |    1    |  20.432108   |
|     resnet50_quantized_qat      |    1    |  18.230638   |
|          basic_gnn_gin          |    1    |  16.657954   |
|         basic_gnn_sage          |    1    |  16.429606   |
|        timm_efficientnet        |    1    |  12.444064   |
|         LearningToPaint         |    1    |   9.508717   |
|           mnasnet1_0            |    1    |   7.345681   |
|          mobilenet_v2           |    1    |   7.051054   |
|   mobilenet_v2_quantized_qat    |    1    |   6.912146   |
|       mobilenet_v3_large        |    1    |   6.642344   |
|          squeezenet1_1          |    1    |   5.579288   |
|       shufflenet_v2_x1_0        |    1    |   5.052232   |
|        soft_actor_critic        |   256   |   3.385762   |
|        phlippe_densenet         |    1    |   2.769345   |
|      functorch_dp_cifar10       |    1    |   2.23551    |
|         opacus_cifar10          |    1    |   2.149125   |
|               drq               |    1    |   1.81096    |
|              dcgan              |    1    |   1.640946   |
|         phlippe_resnet          |    1    |   1.234096   |
|     functorch_maml_omniglot     |    1    |   0.819191   |
|          maml_omniglot          |    5    |   0.762716   |
|              dlrm               |    1    |   0.542735   |
|     pyhpc_isoneutral_mixing     |    1    |   0.049823   |
|     pyhpc_equation_of_state     |    1    |   0.036662   |
|          lennard_jones          |    1    |   0.031592   |
|              moco               |    0    |     0.0      |
|         DALLE2_pytorch          |    0    |     0.0      |
|        timm_efficientdet        |    0    |     0.0      |
+---------------------------------+---------+--------------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|          MobileBertForMaskedLM          | 1  | 2.123172 |
|     MobileBertForQuestionAnswering      | 1  | 1.749041 |
|            XLNetLMHeadModel             | 1  | 1.380651 |
|         Speech2Text2ForCausalLM         | 1  | 1.359022 |
|      GPT2ForSequenceClassification      | 1  | 1.323436 |
|          DistilBertForMaskedLM          | 1  | 1.306815 |
|            YituTechConvBert             | 1  | 1.304043 |
|     DistilBertForQuestionAnswering      | 1  | 1.300681 |
| BlenderbotSmallForConditionalGeneration | 1  | 1.300341 |
|       BlenderbotSmallForCausalLM        | 1  | 1.28959  |
|       DebertaForQuestionAnswering       | 1  | 1.25833  |
|          BlenderbotForCausalLM          | 1  | 1.250029 |
|       MT5ForConditionalGeneration       | 1  | 1.248626 |
|     M2M100ForConditionalGeneration      | 1  | 1.248136 |
|     PegasusForConditionalGeneration     | 1  | 1.245851 |
|           PegasusForCausalLM            | 1  | 1.240451 |
|             XGLMForCausalLM             | 1  | 1.238578 |
|           DebertaForMaskedLM            | 1  | 1.233791 |
|               GoogleFnet                | 1  | 1.217149 |
|       AlbertForQuestionAnswering        | 1  | 1.206551 |
|            AlbertForMaskedLM            | 1  | 1.202896 |
|               DistillGPT2               | 1  | 1.190025 |
|           ElectraForCausalLM            | 1  | 1.186257 |
|    LayoutLMForSequenceClassification    | 1  | 1.174123 |
|    MegatronBertForQuestionAnswering     | 1  | 1.173141 |
|        BertForQuestionAnswering         | 1  | 1.172612 |
|                CamemBert                | 1  | 1.169615 |
|           LayoutLMForMaskedLM           | 1  | 1.162941 |
|         MegatronBertForCausalLM         | 1  | 1.160864 |
|       ElectraForQuestionAnswering       | 1  | 1.159487 |
|      DebertaV2ForQuestionAnswering      | 1  | 1.15446  |
|            TrOCRForCausalLM             | 1  | 1.152973 |
|       RobertaForQuestionAnswering       | 1  | 1.152784 |
|          DebertaV2ForMaskedLM           | 1  | 1.14984  |
|           RobertaForCausalLM            | 1  | 1.149032 |
|             BertForMaskedLM             | 1  | 1.14758  |
|     PLBartForConditionalGeneration      | 1  | 1.088605 |
|      MBartForConditionalGeneration      | 1  | 1.067824 |
|             BartForCausalLM             | 1  | 1.060919 |
|      BartForConditionalGeneration       | 1  | 1.052701 |
|             OPTForCausalLM              | 1  | 1.032088 |
|            PLBartForCausalLM            | 1  | 1.031437 |
|            MBartForCausalLM             | 1  | 1.013712 |
|          AllenaiLongformerBase          | 1  | 0.973072 |
|       T5ForConditionalGeneration        | 1  | 0.622614 |
|                 T5Small                 | 1  | 0.622614 |
+-----------------------------------------+----+----------+
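
To reduce a speedup table like the one above to a single number, one option is to parse the rows and take a geometric mean. The sketch below is illustrative and not part of the dashboard tooling: `parse_table` and `geomean` are hypothetical helper names, the sample holds only three rows copied from the table, and the choice of geometric mean as the summary statistic is an assumption.

```python
import math


def parse_table(text):
    """Parse a '+---+' bordered ASCII table into a list of row dicts (hypothetical helper)."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip the '+---+' border lines
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def geomean(values):
    """Geometric mean of positive numbers (assumed summary statistic)."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Three rows copied from the huggingface speedup table; not the full table.
SAMPLE = """
+----------------------------+----+----------+
|            name            | bs | inductor |
+----------------------------+----+----------+
|   MobileBertForMaskedLM    | 1  | 2.123172 |
|       OPTForCausalLM       | 1  | 1.032088 |
| T5ForConditionalGeneration | 1  | 0.622614 |
+----------------------------+----+----------+
"""

rows = parse_table(SAMPLE)
speedups = [float(r["inductor"]) for r in rows]
summary = geomean(speedups)
print(f"{len(rows)} models, geomean speedup {summary:.3f}x")
```

Rows with a placeholder value of 0.0 (as used for models that failed to load in the torchbench tables) would need to be filtered out before calling `geomean`, since the logarithm is undefined there.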

Accuracy

+-----------------------------------------+----+------------------+
|                  name                   | bs |     inductor     |
+-----------------------------------------+----+------------------+
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip |
|                CamemBert                | 1  |       pass       |
|       AlbertForQuestionAnswering        | 1  |       pass       |
|           DebertaForMaskedLM            | 1  |       pass       |
|          AllenaiLongformerBase          | 1  |       pass       |
|             BartForCausalLM             | 1  |       pass       |
|      BartForConditionalGeneration       | 1  |       pass       |
|             BertForMaskedLM             | 1  |       pass       |
|        BertForQuestionAnswering         | 1  |       pass       |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |
|       DebertaForQuestionAnswering       | 1  |       pass       |
|           LayoutLMForMaskedLM           | 1  |       pass       |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |
|          DistilBertForMaskedLM          | 1  |       pass       |
|     DistilBertForQuestionAnswering      | 1  |       pass       |
|               DistillGPT2               | 1  |       pass       |
|           ElectraForCausalLM            | 1  |       pass       |
|       ElectraForQuestionAnswering       | 1  |       pass       |
|      GPT2ForSequenceClassification      | 1  |       pass       |
|               GoogleFnet                | 1  |       pass       |
|    LayoutLMForSequenceClassification    | 1  |       pass       |
|            MBartForCausalLM             | 1  |       pass       |
|            XLNetLMHeadModel             | 1  |       pass       |
|             XGLMForCausalLM             | 1  |       pass       |
|            AlbertForMaskedLM            | 1  |       pass       |
|      MBartForConditionalGeneration      | 1  |       pass       |
|       MT5ForConditionalGeneration       | 1  |       pass       |
|         MegatronBertForCausalLM         | 1  |       pass       |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |
|          MobileBertForMaskedLM          | 1  |       pass       |
|     MobileBertForQuestionAnswering      | 1  |       pass       |
|             OPTForCausalLM              | 1  |       pass       |
|            PLBartForCausalLM            | 1  |       pass       |
|     PLBartForConditionalGeneration      | 1  |       pass       |
|           PegasusForCausalLM            | 1  |       pass       |
|     M2M100ForConditionalGeneration      | 1  |       pass       |
|     PegasusForConditionalGeneration     | 1  |       pass       |
|           RobertaForCausalLM            | 1  |       pass       |
|       RobertaForQuestionAnswering       | 1  |       pass       |
|         Speech2Text2ForCausalLM         | 1  |       pass       |
|       T5ForConditionalGeneration        | 1  |       pass       |
|                 T5Small                 | 1  |       pass       |
|            TrOCRForCausalLM             | 1  |       pass       |
|            YituTechConvBert             | 1  |       pass       |
+-----------------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------------+----+------------+
|                  name                   | bs |  inductor  |
+-----------------------------------------+----+------------+
|          MobileBertForMaskedLM          | 1  | 122.013058 |
|     MobileBertForQuestionAnswering      | 1  | 120.22063  |
|          AllenaiLongformerBase          | 1  | 93.045212  |
|      BartForConditionalGeneration       | 1  | 56.624499  |
|     PegasusForConditionalGeneration     | 1  | 49.836028  |
|     M2M100ForConditionalGeneration      | 1  | 49.697129  |
|      MBartForConditionalGeneration      | 1  | 49.566242  |
|          BlenderbotForCausalLM          | 1  | 47.935661  |
|             XGLMForCausalLM             | 1  | 46.526637  |
|            XLNetLMHeadModel             | 1  | 46.105514  |
|      DebertaV2ForQuestionAnswering      | 1  | 43.925166  |
|          DebertaV2ForMaskedLM           | 1  | 43.904783  |
|         MegatronBertForCausalLM         | 1  |  43.56606  |
|    MegatronBertForQuestionAnswering     | 1  |  43.38355  |
|       MT5ForConditionalGeneration       | 1  | 42.152152  |
| BlenderbotSmallForConditionalGeneration | 1  | 38.414145  |
|       T5ForConditionalGeneration        | 1  | 36.051411  |
|                 T5Small                 | 1  |  36.01712  |
|            YituTechConvBert             | 1  | 35.670866  |
|     PLBartForConditionalGeneration      | 1  | 31.928916  |
|            MBartForCausalLM             | 1  |  28.8723   |
|            TrOCRForCausalLM             | 1  | 28.604339  |
|           PegasusForCausalLM            | 1  | 28.172221  |
|             OPTForCausalLM              | 1  | 28.059431  |
|           ElectraForCausalLM            | 1  | 27.050736  |
|           RobertaForCausalLM            | 1  | 26.790777  |
|                CamemBert                | 1  | 26.763377  |
|       RobertaForQuestionAnswering       | 1  | 26.747982  |
|       ElectraForQuestionAnswering       | 1  | 26.742519  |
|           LayoutLMForMaskedLM           | 1  | 26.732527  |
|           DebertaForMaskedLM            | 1  |  26.6549   |
|             BertForMaskedLM             | 1  | 26.650439  |
|        BertForQuestionAnswering         | 1  | 26.564636  |
|       DebertaForQuestionAnswering       | 1  | 26.523213  |
|    LayoutLMForSequenceClassification    | 1  | 26.340017  |
|             BartForCausalLM             | 1  | 26.100948  |
|       BlenderbotSmallForCausalLM        | 1  | 22.996484  |
|      GPT2ForSequenceClassification      | 1  |  22.98932  |
|               GoogleFnet                | 1  | 21.387347  |
|            PLBartForCausalLM            | 1  | 21.288883  |
|          DistilBertForMaskedLM          | 1  | 21.083719  |
|     DistilBertForQuestionAnswering      | 1  | 20.972229  |
|         Speech2Text2ForCausalLM         | 1  | 20.783922  |
|               DistillGPT2               | 1  | 19.195604  |
|            AlbertForMaskedLM            | 1  | 17.955119  |
|       AlbertForQuestionAnswering        | 1  | 17.573702  |
+-----------------------------------------+----+------------+

Peak Memory Compression Ratio

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|             OPTForCausalLM              | 1  | 0.986771 |
|      MBartForConditionalGeneration      | 1  | 0.973011 |
|      GPT2ForSequenceClassification      | 1  | 0.951628 |
|          AllenaiLongformerBase          | 1  | 0.947238 |
|            MBartForCausalLM             | 1  | 0.923628 |
|            XLNetLMHeadModel             | 1  | 0.907074 |
|       T5ForConditionalGeneration        | 1  | 0.906626 |
|                 T5Small                 | 1  | 0.906128 |
|     PLBartForConditionalGeneration      | 1  | 0.905688 |
|            PLBartForCausalLM            | 1  | 0.90334  |
|       DebertaForQuestionAnswering       | 1  | 0.872461 |
|               GoogleFnet                | 1  | 0.854513 |
|    LayoutLMForSequenceClassification    | 1  | 0.847521 |
|       RobertaForQuestionAnswering       | 1  | 0.847244 |
|        BertForQuestionAnswering         | 1  | 0.84156  |
|       ElectraForQuestionAnswering       | 1  | 0.841382 |
|      DebertaV2ForQuestionAnswering      | 1  | 0.835254 |
|    MegatronBertForQuestionAnswering     | 1  | 0.832307 |
|               DistillGPT2               | 1  | 0.827833 |
|           DebertaForMaskedLM            | 1  | 0.825321 |
|           LayoutLMForMaskedLM           | 1  | 0.812375 |
|             BertForMaskedLM             | 1  | 0.811786 |
|                CamemBert                | 1  | 0.810777 |
|         MegatronBertForCausalLM         | 1  | 0.810685 |
|         Speech2Text2ForCausalLM         | 1  | 0.809745 |
|           RobertaForCausalLM            | 1  | 0.808711 |
|           ElectraForCausalLM            | 1  | 0.805219 |
|     DistilBertForQuestionAnswering      | 1  | 0.800699 |
|          BlenderbotForCausalLM          | 1  | 0.79812  |
|          DebertaV2ForMaskedLM           | 1  | 0.796707 |
|             BartForCausalLM             | 1  | 0.793702 |
|            TrOCRForCausalLM             | 1  | 0.787751 |
|            YituTechConvBert             | 1  | 0.786866 |
|       MT5ForConditionalGeneration       | 1  | 0.780341 |
|      BartForConditionalGeneration       | 1  | 0.778779 |
|       BlenderbotSmallForCausalLM        | 1  | 0.76495  |
|           PegasusForCausalLM            | 1  | 0.751066 |
|          DistilBertForMaskedLM          | 1  | 0.746539 |
| BlenderbotSmallForConditionalGeneration | 1  | 0.738964 |
|     MobileBertForQuestionAnswering      | 1  | 0.733777 |
|     M2M100ForConditionalGeneration      | 1  | 0.717185 |
|     PegasusForConditionalGeneration     | 1  | 0.715535 |
|          MobileBertForMaskedLM          | 1  | 0.703119 |
|             XGLMForCausalLM             | 1  | 0.697346 |
|            AlbertForMaskedLM            | 1  | 0.448704 |
|       AlbertForQuestionAnswering        | 1  | 0.443619 |
+-----------------------------------------+----+----------+

Absolute latency (ms)

+-----------------------------------------+----+--------------+
|                  name                   | bs |   inductor   |
+-----------------------------------------+----+--------------+
|            AlbertForMaskedLM            | 1  | 12746.324483 |
|       AlbertForQuestionAnswering        | 1  | 12693.887663 |
|      MBartForConditionalGeneration      | 1  | 6134.057564  |
|      BartForConditionalGeneration       | 1  | 5674.300826  |
|             OPTForCausalLM              | 1  | 5212.911704  |
|          DebertaV2ForMaskedLM           | 1  | 5057.723236  |
|      DebertaV2ForQuestionAnswering      | 1  | 3971.175951  |
|            XLNetLMHeadModel             | 1  | 3113.288452  |
|            MBartForCausalLM             | 1  | 3037.834081  |
|          BlenderbotForCausalLM          | 1  |  2628.82508  |
|             BartForCausalLM             | 1  | 2545.024191  |
|       T5ForConditionalGeneration        | 1  |  2474.22357  |
|                 T5Small                 | 1  | 2457.633395  |
|          AllenaiLongformerBase          | 1  | 2408.230722  |
|     PLBartForConditionalGeneration      | 1  | 2173.172236  |
|         MegatronBertForCausalLM         | 1  | 2034.552744  |
|    MegatronBertForQuestionAnswering     | 1  | 1858.700256  |
|      GPT2ForSequenceClassification      | 1  | 1313.859235  |
|            PLBartForCausalLM            | 1  | 1211.330659  |
|             XGLMForCausalLM             | 1  |  829.511295  |
|           DebertaForMaskedLM            | 1  |  788.55448   |
|           RobertaForCausalLM            | 1  |  777.255039  |
|     M2M100ForConditionalGeneration      | 1  |  711.972133  |
|                CamemBert                | 1  |  693.915314  |
|             BertForMaskedLM             | 1  |  686.860551  |
|           LayoutLMForMaskedLM           | 1  |  681.294458  |
|            YituTechConvBert             | 1  |  681.048008  |
|     PegasusForConditionalGeneration     | 1  |  604.502293  |
|            TrOCRForCausalLM             | 1  |  586.046822  |
|       DebertaForQuestionAnswering       | 1  |  558.755996  |
|    LayoutLMForSequenceClassification    | 1  |  545.915763  |
|        BertForQuestionAnswering         | 1  |  545.249506  |
|       RobertaForQuestionAnswering       | 1  |  544.442877  |
|               DistillGPT2               | 1  |  503.018771  |
|               GoogleFnet                | 1  |  471.970998  |
|       MT5ForConditionalGeneration       | 1  |  300.373102  |
|           PegasusForCausalLM            | 1  |  298.402933  |
| BlenderbotSmallForConditionalGeneration | 1  |  143.733492  |
|           ElectraForCausalLM            | 1  |  134.546777  |
|          DistilBertForMaskedLM          | 1  |  99.307303   |
|       ElectraForQuestionAnswering       | 1  |  94.119389   |
|       BlenderbotSmallForCausalLM        | 1  |  83.931961   |
|          MobileBertForMaskedLM          | 1  |  63.365816   |
|     DistilBertForQuestionAnswering      | 1  |  63.330327   |
|     MobileBertForQuestionAnswering      | 1  |  36.740805   |
|         Speech2Text2ForCausalLM         | 1  |  18.312624   |
+-----------------------------------------+----+--------------+
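
The speedup and absolute latency tables can be combined to estimate the native PyTorch latency that is not listed here. The sketch below assumes that speedup is defined as eager latency divided by inductor latency and that both tables come from the same run; neither assumption is confirmed in this comment. `implied_eager_ms` is a hypothetical helper, and the three entries are copied from the huggingface tables above.

```python
# Values copied from the huggingface tables above (speedup, and inductor latency in ms).
speedup = {
    "MobileBertForMaskedLM": 2.123172,
    "AllenaiLongformerBase": 0.973072,
    "T5Small": 0.622614,
}
inductor_ms = {
    "MobileBertForMaskedLM": 63.365816,
    "AllenaiLongformerBase": 2408.230722,
    "T5Small": 2457.633395,
}


def implied_eager_ms(speedup, inductor_ms):
    """Estimate eager latency, ASSUMING speedup = eager_ms / inductor_ms."""
    return {
        name: speedup[name] * inductor_ms[name]
        for name in speedup
        if name in inductor_ms  # only models present in both tables
    }


eager_ms = implied_eager_ms(speedup, inductor_ms)
for name, value in sorted(eager_ms.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24} eager ~{value:10.2f} ms, inductor {inductor_ms[name]:10.2f} ms")
```

Under that assumption, a speedup below 1.0 (as for T5Small) means the implied eager latency is lower than the inductor latency, which flags the model as a regression.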

timm_models suite with float32 precision

Performance speedup

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          pnasnet5large          | 1  | 2.398565 |
|          inception_v3           | 1  | 2.284134 |
|       gluon_inception_v3        | 1  | 2.267469 |
|        adv_inception_v3         | 1  | 2.241362 |
|           dm_nfnet_f0           | 1  | 2.222122 |
|            nfnet_l0             | 1  | 2.22191  |
|         mobilenetv2_100         | 1  | 2.116742 |
|          spnasnet_100           | 1  | 2.053341 |
|          ghostnet_100           | 1  | 2.049576 |
|            levit_128            | 1  | 2.034491 |
|            lcnet_050            | 1  | 2.013911 |
|           fbnetc_100            | 1  | 1.989645 |
|            repvgg_a2            | 1  | 1.989306 |
|            hrnet_w18            | 1  | 1.987065 |
|           mnasnet_100           | 1  | 1.986606 |
|      mobilenetv3_large_100      | 1  | 1.949966 |
|           regnety_002           | 1  | 1.903366 |
|           rexnet_100            | 1  | 1.815306 |
|            fbnetv3_b            | 1  | 1.798255 |
|           selecsls42b           | 1  | 1.790324 |
|       tf_efficientnet_b0        | 1  | 1.790007 |
|             dla102              | 1  | 1.724363 |
|        ese_vovnet19b_dw         | 1  | 1.709337 |
|          botnet26t_256          | 1  | 1.68435  |
|       eca_botnext26ts_256       | 1  | 1.67983  |
|            tinynet_a            | 1  | 1.659796 |
|          cspdarknet53           | 1  | 1.649985 |
|           resnest101e           | 1  | 1.632477 |
|        eca_halonext26ts         | 1  | 1.561751 |
|           res2next50            | 1  | 1.554238 |
|        res2net50_14w_8s         | 1  | 1.537543 |
|         poolformer_m36          | 1  | 1.537191 |
|           volo_d1_224           | 1  | 1.508407 |
|        res2net101_26w_4s        | 1  | 1.500444 |
|           mobilevit_s           | 1  | 1.497066 |
|           tf_mixnet_l           | 1  | 1.456494 |
|         visformer_small         | 1  | 1.425463 |
|           convit_base           | 1  | 1.390759 |
|          gmixer_24_224          | 1  | 1.389276 |
|     swsl_resnext101_32x16d      | 1  | 1.343119 |
|            mixnet_l             | 1  | 1.328205 |
|            gernet_l             | 1  | 1.319762 |
|        twins_pcpvt_base         | 1  | 1.317126 |
|  swin_base_patch4_window7_224   | 1  | 1.28886  |
|      beit_base_patch16_224      | 1  | 1.287587 |
|          resmlp_12_224          | 1  | 1.283139 |
|        convmixer_768_32         | 1  | 1.266399 |
|          mixer_b16_224          | 1  | 1.21176  |
|      vit_base_patch16_224       | 1  | 1.206559 |
|             dpn107              | 1  | 1.206362 |
| deit_base_distilled_patch16_224 | 1  | 1.202085 |
|      xcit_large_24_p8_224       | 1  | 1.190117 |
|         crossvit_9_240          | 1  | 1.189995 |
|        tnt_s_patch16_224        | 1  | 1.177125 |
|          gmlp_s16_224           | 1  | 1.177049 |
|          jx_nest_base           | 1  | 1.163074 |
|            pit_b_224            | 1  | 1.16161  |
|          convnext_base          | 1  | 1.146404 |
|        sebotnet33ts_256         | 1  | 1.115313 |
|          cait_m36_384           | 1  | 0.973558 |
+---------------------------------+----+----------+

Accuracy

+---------------------------------+----+-------------+
|              name               | bs |  inductor   |
+---------------------------------+----+-------------+
|        adv_inception_v3         | 1  |    pass     |
|      beit_base_patch16_224      | 1  |    pass     |
|          botnet26t_256          | 1  |    pass     |
|          cait_m36_384           | 1  |    pass     |
|           convit_base           | 1  |    pass     |
|        convmixer_768_32         | 1  |    pass     |
|          convnext_base          | 1  |    pass     |
|         crossvit_9_240          | 1  |    pass     |
|          cspdarknet53           | 1  |    pass     |
| deit_base_distilled_patch16_224 | 1  |    pass     |
|             dla102              | 1  |    pass     |
|           dm_nfnet_f0           | 1  |    pass     |
|             dpn107              | 1  |    pass     |
|       eca_botnext26ts_256       | 1  |    pass     |
|        eca_halonext26ts         | 1  |    pass     |
|        ese_vovnet19b_dw         | 1  |    pass     |
|           fbnetc_100            | 1  |    pass     |
|            fbnetv3_b            | 1  |    pass     |
|            gernet_l             | 1  |    pass     |
|          ghostnet_100           | 1  |    pass     |
|       gluon_inception_v3        | 1  |    pass     |
|          gmixer_24_224          | 1  |    pass     |
|          gmlp_s16_224           | 1  |    pass     |
|            hrnet_w18            | 1  |    pass     |
|          inception_v3           | 1  |    pass     |
|          jx_nest_base           | 1  |    pass     |
|            lcnet_050            | 1  |    pass     |
|            levit_128            | 1  |    pass     |
|      xcit_large_24_p8_224       | 1  |    pass     |
|          mixer_b16_224          | 1  |    pass     |
|            mixnet_l             | 1  |    pass     |
|           mnasnet_100           | 1  |    pass     |
|         mobilenetv2_100         | 1  |    pass     |
|      mobilenetv3_large_100      | 1  |    pass     |
|           mobilevit_s           | 1  |    pass     |
|            nfnet_l0             | 1  |    pass     |
|            pit_b_224            | 1  |    pass     |
|          pnasnet5large          | 1  |    pass     |
|         poolformer_m36          | 1  |    pass     |
|           regnety_002           | 1  |    pass     |
|            repvgg_a2            | 1  |    pass     |
|        res2net101_26w_4s        | 1  |    pass     |
|        res2net50_14w_8s         | 1  |    pass     |
|           res2next50            | 1  |    pass     |
|          resmlp_12_224          | 1  |    pass     |
|           resnest101e           | 1  |    pass     |
|           rexnet_100            | 1  |    pass     |
|        sebotnet33ts_256         | 1  |    pass     |
|           selecsls42b           | 1  |    pass     |
|          spnasnet_100           | 1  |    pass     |
|  swin_base_patch4_window7_224   | 1  |    pass     |
|     swsl_resnext101_32x16d      | 1  |    pass     |
|       tf_efficientnet_b0        | 1  |    pass     |
|           tf_mixnet_l           | 1  |    pass     |
|            tinynet_a            | 1  |    pass     |
|        tnt_s_patch16_224        | 1  |    pass     |
|        twins_pcpvt_base         | 1  |    pass     |
|         visformer_small         | 1  |    pass     |
|      vit_base_patch16_224       | 1  |    pass     |
|           volo_d1_224           | 1  |    pass     |
|         coat_lite_mini          | 1  | fail_to_run |
+---------------------------------+----+-------------+

Compilation latency (sec)

+---------------------------------+----+------------+
|              name               | bs |  inductor  |
+---------------------------------+----+------------+
|          pnasnet5large          | 1  | 485.024964 |
|            hrnet_w18            | 1  | 265.328951 |
|        res2net101_26w_4s        | 1  | 79.524783  |
|           tf_mixnet_l           | 1  | 72.593224  |
|           resnest101e           | 1  | 71.412538  |
|            mixnet_l             | 1  | 68.556969  |
|          cait_m36_384           | 1  | 67.380792  |
|        res2net50_14w_8s         | 1  | 65.202396  |
|        twins_pcpvt_base         | 1  | 61.102825  |
|      xcit_large_24_p8_224       | 1  | 58.528154  |
|         poolformer_m36          | 1  | 56.483565  |
|  swin_base_patch4_window7_224   | 1  | 56.089879  |
|             dpn107              | 1  | 51.114142  |
|        tnt_s_patch16_224        | 1  | 45.617234  |
|          jx_nest_base           | 1  | 44.944946  |
|           mobilevit_s           | 1  | 42.929589  |
|            fbnetv3_b            | 1  | 42.738536  |
|          convnext_base          | 1  | 38.006137  |
|             dla102              | 1  |  37.18219  |
|        adv_inception_v3         | 1  | 36.474411  |
|          inception_v3           | 1  | 36.422894  |
|       gluon_inception_v3        | 1  | 36.393613  |
|          gmlp_s16_224           | 1  |  35.13367  |
|           volo_d1_224           | 1  | 34.871356  |
|          ghostnet_100           | 1  | 34.656143  |
|          gmixer_24_224          | 1  | 34.258784  |
|           res2next50            | 1  | 34.155168  |
|         crossvit_9_240          | 1  | 32.925677  |
|           dm_nfnet_f0           | 1  | 32.427934  |
|            tinynet_a            | 1  | 32.311831  |
|     swsl_resnext101_32x16d      | 1  | 31.957093  |
|        sebotnet33ts_256         | 1  | 31.768555  |
|        eca_halonext26ts         | 1  | 31.759639  |
|            levit_128            | 1  | 31.032935  |
|           rexnet_100            | 1  | 30.429914  |
|            nfnet_l0             | 1  | 30.058562  |
|        convmixer_768_32         | 1  | 29.985786  |
|       tf_efficientnet_b0        | 1  | 29.662337  |
|           convit_base           | 1  | 27.941079  |
|         visformer_small         | 1  | 26.322643  |
|       eca_botnext26ts_256       | 1  | 26.286801  |
|            pit_b_224            | 1  |  25.39742  |
|          cspdarknet53           | 1  | 25.394661  |
|           regnety_002           | 1  | 25.316103  |
|          botnet26t_256          | 1  |  25.2756   |
|      beit_base_patch16_224      | 1  | 24.390974  |
|      mobilenetv3_large_100      | 1  | 23.624851  |
| deit_base_distilled_patch16_224 | 1  | 23.366073  |
|           fbnetc_100            | 1  | 23.257703  |
|      vit_base_patch16_224       | 1  | 23.227055  |
|          mixer_b16_224          | 1  | 23.144424  |
|          spnasnet_100           | 1  | 23.089544  |
|            repvgg_a2            | 1  | 22.335093  |
|            gernet_l             | 1  | 22.306999  |
|         mobilenetv2_100         | 1  | 21.313954  |
|        ese_vovnet19b_dw         | 1  | 21.227594  |
|           mnasnet_100           | 1  |  21.04228  |
|          resmlp_12_224          | 1  | 20.145819  |
|           selecsls42b           | 1  | 19.502578  |
|            lcnet_050            | 1  | 18.379529  |
+---------------------------------+----+------------+

Peak Memory Compression Ratio

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          cait_m36_384           | 1  | 0.947908 |
|          pnasnet5large          | 1  | 0.929874 |
|        convmixer_768_32         | 1  | 0.918826 |
|            nfnet_l0             | 1  | 0.896129 |
|      xcit_large_24_p8_224       | 1  | 0.890945 |
|        ese_vovnet19b_dw         | 1  | 0.886199 |
|           mnasnet_100           | 1  | 0.876751 |
|          spnasnet_100           | 1  | 0.876487 |
|         mobilenetv2_100         | 1  | 0.876075 |
|            fbnetv3_b            | 1  | 0.875808 |
|       tf_efficientnet_b0        | 1  | 0.874101 |
|           rexnet_100            | 1  | 0.870233 |
|      mobilenetv3_large_100      | 1  | 0.868891 |
|            tinynet_a            | 1  | 0.865734 |
|            lcnet_050            | 1  | 0.86345  |
|           fbnetc_100            | 1  | 0.863398 |
|         poolformer_m36          | 1  | 0.856048 |
|           dm_nfnet_f0           | 1  | 0.855064 |
|       eca_botnext26ts_256       | 1  | 0.853163 |
|           tf_mixnet_l           | 1  | 0.848506 |
|           mobilevit_s           | 1  | 0.848093 |
|        eca_halonext26ts         | 1  | 0.844307 |
|           regnety_002           | 1  | 0.841897 |
|          ghostnet_100           | 1  | 0.841336 |
|          botnet26t_256          | 1  | 0.84101  |
|            mixnet_l             | 1  | 0.833014 |
|          resmlp_12_224          | 1  | 0.828786 |
|         visformer_small         | 1  | 0.819848 |
|           res2next50            | 1  | 0.814816 |
|            levit_128            | 1  | 0.809551 |
|          convnext_base          | 1  | 0.802762 |
|             dpn107              | 1  | 0.80085  |
|        sebotnet33ts_256         | 1  | 0.798571 |
|        res2net50_14w_8s         | 1  | 0.797818 |
|            hrnet_w18            | 1  | 0.797106 |
|          gmixer_24_224          | 1  | 0.796598 |
|          gmlp_s16_224           | 1  | 0.796128 |
|          cspdarknet53           | 1  | 0.793542 |
|           volo_d1_224           | 1  | 0.787641 |
|        tnt_s_patch16_224        | 1  | 0.784649 |
|           convit_base           | 1  | 0.783931 |
|         crossvit_9_240          | 1  | 0.779663 |
|          mixer_b16_224          | 1  | 0.77731  |
|           resnest101e           | 1  | 0.776436 |
|             dla102              | 1  | 0.776154 |
|        twins_pcpvt_base         | 1  | 0.774179 |
|          jx_nest_base           | 1  | 0.772384 |
|      beit_base_patch16_224      | 1  | 0.771168 |
|      vit_base_patch16_224       | 1  | 0.763957 |
|       gluon_inception_v3        | 1  | 0.763911 |
|          inception_v3           | 1  | 0.763854 |
|        adv_inception_v3         | 1  | 0.763746 |
| deit_base_distilled_patch16_224 | 1  | 0.761058 |
|            pit_b_224            | 1  | 0.753855 |
|        res2net101_26w_4s        | 1  | 0.741098 |
|           selecsls42b           | 1  | 0.739212 |
|  swin_base_patch4_window7_224   | 1  | 0.73865  |
|            gernet_l             | 1  | 0.734021 |
|            repvgg_a2            | 1  | 0.690442 |
|     swsl_resnext101_32x16d      | 1  | 0.63819  |
+---------------------------------+----+----------+

Absolute latency (ms)

+---------------------------------+----+-------------+
|              name               | bs |  inductor   |
+---------------------------------+----+-------------+
|          cait_m36_384           | 1  | 3643.623508 |
|      xcit_large_24_p8_224       | 1  | 1531.555054 |
|     swsl_resnext101_32x16d      | 1  | 442.548485  |
|          pnasnet5large          | 1  | 362.933324  |
|          convnext_base          | 1  | 306.432478  |
|             dpn107              | 1  | 259.738241  |
|        convmixer_768_32         | 1  | 242.326454  |
|          jx_nest_base           | 1  |  234.53227  |
|      beit_base_patch16_224      | 1  | 197.264929  |
| deit_base_distilled_patch16_224 | 1  | 196.354392  |
|      vit_base_patch16_224       | 1  | 194.916573  |
|  swin_base_patch4_window7_224   | 1  | 194.181324  |
|           convit_base           | 1  | 193.704295  |
|            pit_b_224            | 1  | 169.017575  |
|           resnest101e           | 1  | 163.780127  |
|           dm_nfnet_f0           | 1  | 157.261108  |
|          mixer_b16_224          | 1  | 139.430836  |
|         poolformer_m36          | 1  | 136.451659  |
|        res2net101_26w_4s        | 1  | 110.286849  |
|        twins_pcpvt_base         | 1  | 106.426699  |
|           volo_d1_224           | 1  |  93.366766  |
|            nfnet_l0             | 1  |  93.285267  |
|        tnt_s_patch16_224        | 1  |  92.538648  |
|             dla102              | 1  |  88.985339  |
|        sebotnet33ts_256         | 1  |  84.135765  |
|            hrnet_w18            | 1  |  83.649423  |
|          cspdarknet53           | 1  |  81.802694  |
|          inception_v3           | 1  |  71.46566   |
|        adv_inception_v3         | 1  |  71.444678  |
|       gluon_inception_v3        | 1  |  71.420707  |
|          gmlp_s16_224           | 1  |  68.14362   |
|         visformer_small         | 1  |  65.347626  |
|            repvgg_a2            | 1  |  62.545391  |
|        res2net50_14w_8s         | 1  |  62.540195  |
|          gmixer_24_224          | 1  |  61.83283   |
|           res2next50            | 1  |  58.126845  |
|            gernet_l             | 1  |  55.999368  |
|          botnet26t_256          | 1  |  44.028447  |
|           selecsls42b           | 1  |  43.998182  |
|        eca_halonext26ts         | 1  |  43.513328  |
|           mobilevit_s           | 1  |  41.258905  |
|       eca_botnext26ts_256       | 1  |   39.8278   |
|          resmlp_12_224          | 1  |  35.383311  |
|         crossvit_9_240          | 1  |  32.838862  |
|        ese_vovnet19b_dw         | 1  |  30.800595  |
|            mixnet_l             | 1  |  28.334613  |
|           tf_mixnet_l           | 1  |  27.904948  |
|            fbnetv3_b            | 1  |  14.691512  |
|       tf_efficientnet_b0        | 1  |  12.699229  |
|           rexnet_100            | 1  |  12.567029  |
|            tinynet_a            | 1  |  11.371343  |
|           fbnetc_100            | 1  |  8.746018   |
|            levit_128            | 1  |  8.238647   |
|          ghostnet_100           | 1  |  8.013017   |
|          spnasnet_100           | 1  |  7.805432   |
|           mnasnet_100           | 1  |  7.233701   |
|         mobilenetv2_100         | 1  |  7.006522   |
|      mobilenetv3_large_100      | 1  |   6.67847   |
|           regnety_002           | 1  |  5.866928   |
|            lcnet_050            | 1  |  2.254533   |
+---------------------------------+----+-------------+
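
As a rough cross-check, the speedup and absolute latency tables can be combined to estimate the native PyTorch latency for a model. The sketch below assumes that speedup is eager latency divided by inductor latency and that the absolute latency column reports the inductor number; neither is stated explicitly in this comment, and `eager_latency_ms` is a hypothetical helper, not part of the benchmark scripts.

```python
def eager_latency_ms(speedup: float, inductor_ms: float) -> float:
    """Estimate native PyTorch latency from a speedup and an inductor latency.

    Assumes speedup = eager_ms / inductor_ms (an assumption, not confirmed above).
    """
    return speedup * inductor_ms


# lcnet_050 values copied from the timm_models tables above.
print(f"{eager_latency_ms(2.013911, 2.254533):.2f} ms")
```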

@zxd1997066
Contributor

[cppwrapper_static_shape] Performance Dashboard for float32 precision -- Single-core Single-thread (2024-05-05 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward and backward pass for training, and of the forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint compression ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

SW information:

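+-------------------+--------+------------------------------------------+
| SW                | Branch | Commit                                   |
+-------------------+--------+------------------------------------------+
| Pytorch           | main   | 6d30803d64953955df63da56833bf4eb52249aae |
| Torchbench        | main   | d6015d42                                 |
| torchaudio        | main   | 2.2.0a0+ea437b3                          |
| torchtext         | main   | 0.16.0a0+b0ebddc                         |
| torchvision       | main   | 0.19.0a0+06ad737                         |
| torchdata         | main   | 0.7.1a0+0790338                          |
| dynamo_benchmarks | main   | nightly                                  |
+-------------------+--------+------------------------------------------+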

HW information

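+------------------+----------------------------------------------------------------+
| Item             | Value                                                          |
+------------------+----------------------------------------------------------------+
| Manufacturer     | Amazon EC2                                                     |
| Product Name     | c6i.16xlarge                                                   |
| CPU Model        | Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz                  |
| Installed Memory | 128GB (1x128GB DDR4 3200 MT/s [Unknown])                       |
| OS               | Ubuntu 22.04.2 LTS                                             |
| Kernel           | 5.19.0-1022-aws                                                |
| Microcode        | 0xd000389                                                      |
| GCC              | gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0                      |
| GLIBC            | ldd (Ubuntu GLIBC 2.35-0ubuntu3.1) 2.35                        |
| Binutils         | GNU ld (GNU Binutils for Ubuntu) 2.38                          |
| Python           | Python 3.10.6                                                  |
| OpenSSL          | OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022) |
+------------------+----------------------------------------------------------------+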

Test command

export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
export OMP_NUM_THREADS=1

python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--core_list 0 --ncores_per_instance 1" --devices=cpu --dtypes=float32 --inference --compilers=inductor --batch_size=1 --threads 1 --extra-args="--timeout 9000"

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 96%, 76/79 | 100%, 46/46 | 100%, 60/60 |
+----------+------------+-------------+-------------+
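
The pass-rate cells pair a percentage with the raw passed/total counts. A minimal sketch of that formatting follows; the helper name `passrate` and the rounding to a whole percent are assumptions, and this comment does not state how the dashboard tooling derives the denominator, so the counts are simply copied from the table.

```python
def passrate(passed: int, total: int) -> str:
    """Format a pass rate as '<percent>%, <passed>/<total>', e.g. '96%, 76/79'."""
    # Rounding to a whole percent is an assumption about the table's convention.
    return f"{passed / total:.0%}, {passed}/{total}"


# Counts copied from the Passrate table above.
counts = {"torchbench": (76, 79), "huggingface": (46, 46), "timm_models": (60, 60)}
for suite, (passed, total) in counts.items():
    print(f"{suite}: {passrate(passed, total)}")
```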

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.61x    |    1.20x    |    1.57x    |
+----------+------------+-------------+-------------+
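
For readers who want to recompute a suite-level number from the per-model tables, here is a minimal sketch of a geometric mean over speedups. Skipping the 0.0 placeholder rows (models that failed to load) is an assumption about how the summary is produced, and `geomean_speedup` is a hypothetical helper rather than the dashboard's own code. The sample uses only four torchbench rows, so it will not reproduce the 1.61x figure.

```python
import math


def geomean_speedup(speedups):
    """Geometric mean of per-model speedups, ignoring non-positive placeholders."""
    valid = [s for s in speedups if s > 0]  # assumption: 0.0 rows are excluded
    if not valid:
        raise ValueError("no valid speedups to average")
    # Averaging in log space avoids overflow when multiplying many ratios.
    return math.exp(sum(math.log(s) for s in valid) / len(valid))


# A few torchbench rows from this comment: pyhpc_isoneutral_mixing, resnet50,
# hf_T5_base, and moco (0.0, failed to load).
sample = [55.909156, 1.806526, 0.593575, 0.0]
print(f"{geomean_speedup(sample):.2f}x")
```

Because the geometric mean works on ratios, a 2x speedup and a 0.5x slowdown cancel to 1.0x, which is why it is commonly preferred over the arithmetic mean for normalized results.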

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   52.72    |    37.28    |    47.49    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.85x    |    0.81x    |    0.82x    |
+----------+------------+-------------+-------------+
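
Since the table is labeled "higher is better", one reading is that the ratio divides the native PyTorch peak memory by the compiled peak memory, so a value below 1.0 means the compiled run used more memory. The sketch below encodes that reading; the definition, the helper name `compression_ratio` and the megabyte figures are all assumptions for illustration, not measurements from this run.

```python
def compression_ratio(eager_peak_mb: float, compiled_peak_mb: float) -> float:
    """Peak memory compression ratio, assumed to be eager peak / compiled peak."""
    if compiled_peak_mb <= 0:
        raise ValueError("compiled peak memory must be positive")
    return eager_peak_mb / compiled_peak_mb


# Hypothetical numbers: eager peaks at 850 MB, the compiled model at 1000 MB.
ratio = compression_ratio(850.0, 1000.0)
print(f"{ratio:.2f}x")
```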

torchbench suite with float32 precision

Performance speedup

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_isoneutral_mixing     |    1    | 55.909156 |
|     pyhpc_equation_of_state     |    1    | 29.515701 |
|          maml_omniglot          |    5    | 4.174174  |
|          squeezenet1_1          |    1    |  3.56444  |
|         basic_gnn_sage          |    1    | 3.552044  |
|          basic_gnn_gcn          |    1    | 3.492467  |
|          basic_gnn_gin          |    1    | 3.478323  |
|     functorch_maml_omniglot     |    1    | 3.399451  |
|           timm_nfnet            |    1    | 2.779046  |
|         opacus_cifar10          |    1    | 2.504579  |
|       shufflenet_v2_x1_0        |    1    | 2.354942  |
|      functorch_dp_cifar10       |    1    | 2.273608  |
|              dcgan              |    1    | 2.250037  |
|            resnet18             |    1    | 2.229638  |
|          lennard_jones          |    1    | 2.093192  |
|          mobilenet_v2           |    1    | 2.043144  |
|          timm_resnest           |    1    | 2.004058  |
|           mnasnet1_0            |    1    | 1.918625  |
|       mobilenet_v3_large        |    1    | 1.909048  |
|         phlippe_resnet          |    1    | 1.819481  |
|            resnet50             |    1    | 1.806526  |
|        phlippe_densenet         |    1    | 1.800914  |
|           densenet121           |    1    | 1.779485  |
|        timm_efficientnet        |    1    | 1.706023  |
|            resnet152            |    1    | 1.685713  |
|         LearningToPaint         |    1    | 1.616149  |
|           timm_vovnet           |    1    | 1.612419  |
|              llama              |    1    | 1.563241  |
|      doctr_reco_predictor       |    1    | 1.500021  |
|              dlrm               |    1    | 1.496211  |
|           timm_regnet           |    1    | 1.491549  |
|         resnext50_32x4d         |    1    | 1.484196  |
|              vgg16              |    1    |  1.44497  |
|        basic_gnn_edgecnn        |    1    | 1.403575  |
|             alexnet             |    1    | 1.370012  |
|             yolov3              |    1    | 1.343931  |
|          BERT_pytorch           |    1    | 1.312379  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1.292778  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1.291432  |
|               drq               |    1    | 1.284666  |
|            hf_Albert            |    1    | 1.279095  |
|             hf_GPT2             |    1    | 1.253246  |
|              maml               |    1    | 1.251135  |
|          hf_GPT2_large          |    1    |  1.23733  |
|       doctr_det_predictor       |    1    | 1.227246  |
|            moondream            |    1    | 1.225498  |
|     timm_vision_transformer     |    1    | 1.214995  |
|          fastNLP_Bert           |    1    | 1.207564  |
|        soft_actor_critic        |   256   | 1.196156  |
|         pytorch_stargan         |   16    | 1.193236  |
|          hf_Bert_large          |    1    |  1.17204  |
|  timm_vision_transformer_large  |    1    | 1.170428  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 1.165753  |
|           hf_BigBird            |    1    | 1.162157  |
|             hf_Bert             |    1    |  1.15576  |
|          hf_DistilBert          |    1    | 1.134977  |
|      torch_multimodal_clip      |    1    | 1.120629  |
|             hf_Bart             |    1    | 1.095382  |
|        hf_distil_whisper        |    1    | 1.074049  |
|          pytorch_unet           |    1    |   1.071   |
|       speech_transformer        |    1    | 1.070794  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1.049287  |
|    detectron2_fcos_r_50_fpn     |    1    | 1.042921  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1.032836  |
|          hf_Longformer          |    1    | 1.008586  |
|             demucs              |    1    | 1.004711  |
|           tts_angular           |    1    | 1.001821  |
|     resnet50_quantized_qat      |    1    | 0.995896  |
|   mobilenet_v2_quantized_qat    |    1    |  0.98966  |
|     nvidia_deeprecommender      |    1    | 0.946077  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.910226  |
|           hf_Reformer           |    1    | 0.859218  |
|       Background_Matting        |    1    | 0.833261  |
|           hf_T5_large           |    1    | 0.793739  |
|              hf_T5              |    1    | 0.707498  |
|           hf_T5_base            |    1    | 0.593575  |
|        timm_efficientdet        |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+

Accuracy

+---------------------------------+---------+--------------------+
|              name               |   bs    |      inductor      |
+---------------------------------+---------+--------------------+
|       Background_Matting        |    1    |  pass_due_to_skip  |
|  timm_vision_transformer_large  |    1    |  pass_due_to_skip  |
|          hf_GPT2_large          |    1    |  pass_due_to_skip  |
|           hf_T5_large           |    1    |  pass_due_to_skip  |
|              maml               |    1    |  pass_due_to_skip  |
|             yolov3              |    1    |        pass        |
|             alexnet             |    1    |        pass        |
|        basic_gnn_edgecnn        |    1    |        pass        |
|          basic_gnn_gcn          |    1    |        pass        |
|          basic_gnn_gin          |    1    |        pass        |
|         basic_gnn_sage          |    1    |        pass        |
|       doctr_det_predictor       |    1    |        pass        |
|              dcgan              |    1    |        pass        |
|           densenet121           |    1    |        pass        |
| detectron2_fasterrcnn_r_101_c4  |    1    |        pass        |
| detectron2_fasterrcnn_r_101_dc5 |    1    |        pass        |
|  detectron2_fasterrcnn_r_50_c4  |    1    |        pass        |
| detectron2_fasterrcnn_r_50_dc5  |    1    |        pass        |
|           hf_T5_base            |    1    |        pass        |
|             demucs              |    1    |        pass        |
|        hf_distil_whisper        |    1    |        pass        |
|         LearningToPaint         |    1    |        pass        |
|              dlrm               |    1    |        pass        |
|    detectron2_fcos_r_50_fpn     |    1    |        pass        |
|               drq               |    1    |        pass        |
|          fastNLP_Bert           |    1    |        pass        |
|      functorch_dp_cifar10       |    1    |        pass        |
|     functorch_maml_omniglot     |    1    |        pass        |
|            hf_Albert            |    1    |        pass        |
|             hf_Bart             |    1    |        pass        |
|          hf_Bert_large          |    1    |        pass        |
|           hf_BigBird            |    1    |        pass        |
|              llama              |    1    |        pass        |
|          hf_DistilBert          |    1    |        pass        |
|             hf_GPT2             |    1    |        pass        |
|          hf_Longformer          |    1    |        pass        |
|           hf_Reformer           |    1    |        pass        |
|              hf_T5              |    1    |        pass        |
|             hf_Bert             |    1    |        pass        |
|      doctr_reco_predictor       |    1    |        pass        |
|          lennard_jones          |    1    |        pass        |
|            resnet50             |    1    |        pass        |
|            resnet152            |    1    |        pass        |
|          maml_omniglot          |    5    |        pass        |
|     resnet50_quantized_qat      |    1    |        pass        |
|          mobilenet_v2           |    1    |        pass        |
|   mobilenet_v2_quantized_qat    |    1    |        pass        |
|         pytorch_stargan         |   16    |        pass        |
|            moondream            |    1    |        pass        |
|     nvidia_deeprecommender      |    1    |        pass        |
|          BERT_pytorch           |    1    |        pass        |
|        phlippe_densenet         |    1    |        pass        |
|         phlippe_resnet          |    1    |        pass        |
|     pyhpc_equation_of_state     |    1    |        pass        |
|     pyhpc_isoneutral_mixing     |    1    |        pass        |
| pyhpc_turbulent_kinetic_energy  | 1048576 |        pass        |
|  pytorch_CycleGAN_and_pix2pix   |    1    |        pass        |
|         opacus_cifar10          |    1    |        pass        |
|       mobilenet_v3_large        |    1    |        pass        |
|           mnasnet1_0            |    1    |        pass        |
|          pytorch_unet           |    1    |        pass        |
|        timm_efficientnet        |    1    |        pass        |
|         resnext50_32x4d         |    1    |        pass        |
|       shufflenet_v2_x1_0        |    1    |        pass        |
|        soft_actor_critic        |   256   |        pass        |
|       speech_transformer        |    1    |        pass        |
|           timm_nfnet            |    1    |        pass        |
|            resnet18             |    1    |        pass        |
|          squeezenet1_1          |    1    |        pass        |
|           timm_regnet           |    1    |        pass        |
|          timm_resnest           |    1    |        pass        |
|     timm_vision_transformer     |    1    |        pass        |
|           timm_vovnet           |    1    |        pass        |
|      torch_multimodal_clip      |    1    |        pass        |
|           tts_angular           |    1    |        pass        |
|              vgg16              |    1    |        pass        |
|        timm_efficientdet        |    0    | model_fail_to_load |
|              moco               |    0    | model_fail_to_load |
|         DALLE2_pytorch          |    0    | model_fail_to_load |
| detectron2_fasterrcnn_r_101_fpn |    1    |    fail_to_run     |
| detectron2_fasterrcnn_r_50_fpn  |    1    |    fail_to_run     |
|           Super_SloMo           |    1    |    fail_to_run     |
|         vision_maskrcnn         |    1    |    fail_to_run     |
+---------------------------------+---------+--------------------+

Compilation latency (sec)

+---------------------------------+---------+------------+
|              name               |   bs    |  inductor  |
+---------------------------------+---------+------------+
|           hf_BigBird            |    1    | 479.176094 |
|    detectron2_fcos_r_50_fpn     |    1    | 414.373662 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 227.212591 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 221.037695 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 209.992877 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 205.189203 |
|              maml               |    1    | 144.577846 |
|           hf_T5_large           |    1    | 111.532408 |
|           hf_T5_base            |    1    | 105.231386 |
|       speech_transformer        |    1    | 90.536562  |
|          hf_Longformer          |    1    | 85.455408  |
|           hf_Reformer           |    1    | 81.261211  |
|          basic_gnn_gcn          |    1    | 61.734338  |
|           densenet121           |    1    |  56.91413  |
|          fastNLP_Bert           |    1    | 54.554835  |
|            resnet152            |    1    | 53.094347  |
|  timm_vision_transformer_large  |    1    | 49.645257  |
|       doctr_det_predictor       |    1    | 45.716778  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 42.960087  |
|          hf_GPT2_large          |    1    | 42.906993  |
|          hf_Bert_large          |    1    | 42.657639  |
|            moondream            |    1    | 41.545074  |
|        hf_distil_whisper        |    1    | 40.667211  |
|      torch_multimodal_clip      |    1    |  36.52357  |
|             demucs              |    1    | 34.893266  |
|           timm_regnet           |    1    | 33.424077  |
|           timm_nfnet            |    1    | 32.324437  |
|              hf_T5              |    1    | 31.565325  |
|             hf_Bart             |    1    | 30.603733  |
|             yolov3              |    1    | 29.287943  |
|       Background_Matting        |    1    | 28.976122  |
|        timm_efficientnet        |    1    |  28.12955  |
|          BERT_pytorch           |    1    | 28.044407  |
|             hf_Bert             |    1    | 26.402368  |
|        phlippe_densenet         |    1    | 25.975064  |
|      doctr_reco_predictor       |    1    |  25.8713   |
|       shufflenet_v2_x1_0        |    1    | 25.283275  |
|            hf_Albert            |    1    | 24.665964  |
|             hf_GPT2             |    1    |  24.07221  |
|       mobilenet_v3_large        |    1    | 23.464269  |
|              llama              |    1    | 23.197169  |
|     timm_vision_transformer     |    1    | 23.152937  |
|         opacus_cifar10          |    1    | 21.653004  |
|         resnext50_32x4d         |    1    | 21.471597  |
|            resnet50             |    1    | 21.465655  |
|           timm_vovnet           |    1    |  21.31692  |
|          timm_resnest           |    1    | 20.967382  |
|          mobilenet_v2           |    1    | 20.835153  |
|           mnasnet1_0            |    1    | 20.588668  |
|          hf_DistilBert          |    1    | 20.513138  |
|     pyhpc_isoneutral_mixing     |    1    | 20.460509  |
|      functorch_dp_cifar10       |    1    | 20.285432  |
|         pytorch_stargan         |   16    | 19.507985  |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  18.63554  |
|          pytorch_unet           |    1    | 18.575845  |
|          squeezenet1_1          |    1    |  17.77758  |
|            resnet18             |    1    | 17.154469  |
|     pyhpc_equation_of_state     |    1    |  17.02187  |
|         LearningToPaint         |    1    | 17.017683  |
|              vgg16              |    1    | 16.932025  |
|         phlippe_resnet          |    1    | 16.753377  |
|             alexnet             |    1    | 16.522514  |
|               drq               |    1    | 15.794343  |
|     functorch_maml_omniglot     |    1    | 15.638874  |
|     nvidia_deeprecommender      |    1    | 15.406015  |
|          maml_omniglot          |    5    | 15.233524  |
|              dlrm               |    1    | 15.004998  |
|              dcgan              |    1    | 14.771939  |
|          basic_gnn_gin          |    1    |  14.42687  |
|        basic_gnn_edgecnn        |    1    |  14.3975   |
|         basic_gnn_sage          |    1    | 14.395809  |
|        soft_actor_critic        |   256   | 14.248263  |
|          lennard_jones          |    1    | 13.964945  |
|           tts_angular           |    1    | 13.728432  |
|   mobilenet_v2_quantized_qat    |    1    |  0.098478  |
|     resnet50_quantized_qat      |    1    |  0.070547  |
|        timm_efficientdet        |    0    |    0.0     |
|              moco               |    0    |    0.0     |
|         DALLE2_pytorch          |    0    |    0.0     |
+---------------------------------+---------+------------+
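
The summary reports a mean compilation latency, and the table above ends with rows whose batch size and value are both 0. A minimal sketch of averaging while skipping such rows follows. It assumes, without confirmation from the dashboard, that `bs = 0` with `0.0` marks a model that produced no measurement. The helper names `measured` and `mean_latency` are hypothetical, and only a few rows are copied in for illustration.

```python
from statistics import mean

# (name, batch size, compilation latency in seconds), copied from the table above.
ROWS = [
    ("resnet50", 1, 21.465655),
    ("alexnet", 1, 16.522514),
    ("moco", 0, 0.0),            # assumed placeholder: no measurement recorded
    ("DALLE2_pytorch", 0, 0.0),  # assumed placeholder: no measurement recorded
]


def measured(rows):
    """Keep rows that look like real measurements (assumption: bs > 0 and value > 0)."""
    return [r for r in rows if r[1] > 0 and r[2] > 0.0]


def mean_latency(rows):
    """Arithmetic mean of the latency column over measured rows only."""
    kept = measured(rows)
    return mean(lat for _, _, lat in kept) if kept else float("nan")


print(f"mean compile latency over measured rows: {mean_latency(ROWS):.3f} s")
```

Leaving the placeholder rows in would pull the mean toward zero, so filtering them first seems the safer default when aggregating these tables.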

Peak Memory Compression Ratio

+---------------------------------+---------+----------+
|              name               |   bs    | inductor |
+---------------------------------+---------+----------+
|              dlrm               |    1    | 0.988471 |
|           hf_T5_base            |    1    | 0.987007 |
|             demucs              |    1    | 0.985517 |
|       Background_Matting        |    1    | 0.981885 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 0.981801 |
|          pytorch_unet           |    1    | 0.978272 |
|          hf_GPT2_large          |    1    | 0.973931 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 0.97222  |
|        basic_gnn_edgecnn        |    1    | 0.971242 |
|       doctr_det_predictor       |    1    | 0.969269 |
|    detectron2_fcos_r_50_fpn     |    1    | 0.96085  |
|     resnet50_quantized_qat      |    1    | 0.955013 |
|         LearningToPaint         |    1    | 0.94686  |
|         pytorch_stargan         |   16    | 0.946696 |
|           hf_BigBird            |    1    | 0.944876 |
|      doctr_reco_predictor       |    1    | 0.943313 |
|          basic_gnn_gin          |    1    | 0.939601 |
|          basic_gnn_gcn          |    1    | 0.938725 |
|         basic_gnn_sage          |    1    | 0.938108 |
|   mobilenet_v2_quantized_qat    |    1    | 0.929005 |
|              llama              |    1    | 0.918789 |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  0.9154  |
|      torch_multimodal_clip      |    1    | 0.901686 |
|        hf_distil_whisper        |    1    | 0.892867 |
|           tts_angular           |    1    | 0.891225 |
|        soft_actor_critic        |   256   | 0.887746 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 0.884701 |
|         opacus_cifar10          |    1    | 0.883921 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 0.881902 |
|        timm_efficientnet        |    1    | 0.879656 |
|          mobilenet_v2           |    1    | 0.872401 |
|          lennard_jones          |    1    | 0.861835 |
|           mnasnet1_0            |    1    | 0.860565 |
|          maml_omniglot          |    5    | 0.859574 |
|          squeezenet1_1          |    1    | 0.858586 |
|     functorch_maml_omniglot     |    1    | 0.856536 |
|          timm_resnest           |    1    | 0.856012 |
|              dcgan              |    1    | 0.852971 |
|          fastNLP_Bert           |    1    | 0.849112 |
|       mobilenet_v3_large        |    1    | 0.848958 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.84811  |
|       shufflenet_v2_x1_0        |    1    | 0.840568 |
|         phlippe_resnet          |    1    | 0.839869 |
|     pyhpc_equation_of_state     |    1    | 0.835002 |
|       speech_transformer        |    1    | 0.828567 |
|        phlippe_densenet         |    1    | 0.819253 |
|           timm_nfnet            |    1    | 0.816011 |
|           hf_T5_large           |    1    | 0.811351 |
|         resnext50_32x4d         |    1    | 0.811208 |
|     pyhpc_isoneutral_mixing     |    1    | 0.809322 |
|          hf_Bert_large          |    1    | 0.808793 |
|          hf_Longformer          |    1    | 0.805431 |
|            hf_Albert            |    1    | 0.802429 |
|     timm_vision_transformer     |    1    | 0.800917 |
|             hf_Bert             |    1    | 0.800541 |
|            moondream            |    1    | 0.796435 |
|              maml               |    1    | 0.794882 |
|             yolov3              |    1    | 0.780619 |
|          BERT_pytorch           |    1    | 0.779464 |
|          hf_DistilBert          |    1    | 0.777505 |
|            resnet50             |    1    | 0.769587 |
|             hf_GPT2             |    1    | 0.764452 |
|            resnet18             |    1    | 0.763006 |
|               drq               |    1    | 0.762514 |
|           timm_regnet           |    1    | 0.760594 |
|           timm_vovnet           |    1    | 0.759843 |
|           densenet121           |    1    | 0.759618 |
|             hf_Bart             |    1    | 0.751378 |
|              hf_T5              |    1    | 0.750378 |
|      functorch_dp_cifar10       |    1    | 0.744843 |
|             alexnet             |    1    | 0.735282 |
|  timm_vision_transformer_large  |    1    | 0.732923 |
|           hf_Reformer           |    1    | 0.729486 |
|              vgg16              |    1    | 0.723206 |
|            resnet152            |    1    | 0.694346 |
|     nvidia_deeprecommender      |    1    | 0.672993 |
|        timm_efficientdet        |    0    |   0.0    |
|              moco               |    0    |   0.0    |
|         DALLE2_pytorch          |    0    |   0.0    |
+---------------------------------+---------+----------+
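
One way to read these ratios is sketched below. It assumes the ratio is the peak memory of native PyTorch divided by the peak memory of the inductor run, so a value below 1 would mean the compiled run used more memory. That definition is an assumption here and is not stated in the tables. The function names are hypothetical.

```python
def compression_ratio(eager_peak, compiled_peak):
    """Assumed definition: eager peak memory divided by compiled peak memory."""
    return eager_peak / compiled_peak


def relative_footprint(ratio):
    """Compiled peak memory expressed as a multiple of eager peak memory."""
    return 1.0 / ratio


# A few ratios copied from the table above.
for name, ratio in [
    ("dlrm", 0.988471),
    ("resnet152", 0.694346),
    ("nvidia_deeprecommender", 0.672993),
]:
    print(f"{name}: compiled peak is about {relative_footprint(ratio):.2f}x eager peak")
```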

Absolute latency (ms)

+---------------------------------+---------+--------------+
|              name               |   bs    |   inductor   |
+---------------------------------+---------+--------------+
|           hf_T5_base            |    1    | 26105.721146 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 11788.370271 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 11208.953785 |
|          hf_GPT2_large          |    1    | 10060.118915 |
|           hf_T5_large           |    1    | 7438.741842  |
|            moondream            |    1    | 7316.252677  |
|        hf_distil_whisper        |    1    | 6935.511847  |
|       Background_Matting        |    1    | 6687.271441  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 5824.907568  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 5050.049568  |
|          pytorch_unet           |    1    | 4719.293928  |
|  timm_vision_transformer_large  |    1    | 2783.857591  |
|    detectron2_fcos_r_50_fpn     |    1    | 2537.589756  |
|             demucs              |    1    | 2239.695631  |
|         pytorch_stargan         |   16    |  2010.58367  |
|       doctr_det_predictor       |    1    | 1806.116389  |
|          hf_Bert_large          |    1    | 1761.056528  |
|           hf_BigBird            |    1    | 1465.566605  |
|      torch_multimodal_clip      |    1    | 1261.976529  |
|          hf_Longformer          |    1    | 1109.131671  |
|             hf_Bart             |    1    |  884.17418   |
|              hf_T5              |    1    |  764.936197  |
|             hf_Bert             |    1    |  680.598353  |
|       speech_transformer        |    1    |  669.735191  |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  623.363714  |
|            hf_Albert            |    1    |  571.278273  |
|          fastNLP_Bert           |    1    |  520.503239  |
|             yolov3              |    1    |  440.773337  |
|          hf_DistilBert          |    1    |  412.719333  |
|           hf_Reformer           |    1    |  411.195334  |
|             hf_GPT2             |    1    |  353.909642  |
|        basic_gnn_edgecnn        |    1    |  230.64805   |
| pyhpc_turbulent_kinetic_energy  | 1048576 |  210.145234  |
|              vgg16              |    1    |  189.786071  |
|           timm_regnet           |    1    |  148.537262  |
|          BERT_pytorch           |    1    |  139.604826  |
|            resnet152            |    1    |  135.446212  |
|           timm_nfnet            |    1    |  96.046213   |
|           timm_vovnet           |    1    |  79.032446   |
|              maml               |    1    |  74.401242   |
|     nvidia_deeprecommender      |    1    |  58.136332   |
|     timm_vision_transformer     |    1    |  57.881916   |
|         resnext50_32x4d         |    1    |   56.97089   |
|           tts_angular           |    1    |  54.107319   |
|            resnet50             |    1    |  50.051563   |
|           densenet121           |    1    |  41.988385   |
|          timm_resnest           |    1    |   32.89547   |
|          basic_gnn_gcn          |    1    |  29.211143   |
|      doctr_reco_predictor       |    1    |  23.149998   |
|             alexnet             |    1    |  22.025259   |
|            resnet18             |    1    |  21.755765   |
|              llama              |    1    |  20.492865   |
|     resnet50_quantized_qat      |    1    |  18.162432   |
|          basic_gnn_gin          |    1    |  16.652939   |
|         basic_gnn_sage          |    1    |  16.453008   |
|        timm_efficientnet        |    1    |  12.418923   |
|         LearningToPaint         |    1    |   9.548682   |
|           mnasnet1_0            |    1    |   7.23318    |
|          mobilenet_v2           |    1    |   7.049909   |
|   mobilenet_v2_quantized_qat    |    1    |   6.937444   |
|       mobilenet_v3_large        |    1    |   6.629672   |
|          squeezenet1_1          |    1    |   5.555037   |
|       shufflenet_v2_x1_0        |    1    |   4.907299   |
|        soft_actor_critic        |   256   |   2.964466   |
|        phlippe_densenet         |    1    |   2.800625   |
|      functorch_dp_cifar10       |    1    |   2.219275   |
|         opacus_cifar10          |    1    |   2.130965   |
|               drq               |    1    |   1.807988   |
|              dcgan              |    1    |   1.652546   |
|         phlippe_resnet          |    1    |   1.23526    |
|     functorch_maml_omniglot     |    1    |   0.844864   |
|              dlrm               |    1    |    0.5448    |
|          maml_omniglot          |    5    |   0.526185   |
|     pyhpc_isoneutral_mixing     |    1    |   0.049216   |
|     pyhpc_equation_of_state     |    1    |   0.035919   |
|          lennard_jones          |    1    |   0.032567   |
|              moco               |    0    |     0.0      |
|         DALLE2_pytorch          |    0    |     0.0      |
|        timm_efficientdet        |    0    |     0.0      |
+---------------------------------+---------+--------------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|          MobileBertForMaskedLM          | 1  | 2.128078 |
|     MobileBertForQuestionAnswering      | 1  | 1.735099 |
|            XLNetLMHeadModel             | 1  | 1.383051 |
|         Speech2Text2ForCausalLM         | 1  | 1.368662 |
|      GPT2ForSequenceClassification      | 1  | 1.316912 |
| BlenderbotSmallForConditionalGeneration | 1  | 1.306456 |
|     DistilBertForQuestionAnswering      | 1  | 1.29673  |
|          DistilBertForMaskedLM          | 1  | 1.295489 |
|            YituTechConvBert             | 1  | 1.294237 |
|       BlenderbotSmallForCausalLM        | 1  | 1.292655 |
|       DebertaForQuestionAnswering       | 1  | 1.260868 |
|       MT5ForConditionalGeneration       | 1  | 1.253767 |
|          BlenderbotForCausalLM          | 1  | 1.250364 |
|     M2M100ForConditionalGeneration      | 1  | 1.250232 |
|     PegasusForConditionalGeneration     | 1  | 1.246876 |
|           DebertaForMaskedLM            | 1  | 1.246428 |
|             XGLMForCausalLM             | 1  | 1.244295 |
|           PegasusForCausalLM            | 1  | 1.237969 |
|               GoogleFnet                | 1  | 1.221823 |
|            AlbertForMaskedLM            | 1  | 1.204192 |
|       AlbertForQuestionAnswering        | 1  | 1.203767 |
|               DistillGPT2               | 1  | 1.188421 |
|    LayoutLMForSequenceClassification    | 1  | 1.181072 |
|           ElectraForCausalLM            | 1  | 1.179697 |
|    MegatronBertForQuestionAnswering     | 1  | 1.175642 |
|           LayoutLMForMaskedLM           | 1  | 1.173686 |
|        BertForQuestionAnswering         | 1  | 1.17214  |
|                CamemBert                | 1  | 1.170619 |
|         MegatronBertForCausalLM         | 1  | 1.162717 |
|             BertForMaskedLM             | 1  | 1.15777  |
|            TrOCRForCausalLM             | 1  | 1.156479 |
|       RobertaForQuestionAnswering       | 1  | 1.15535  |
|      DebertaV2ForQuestionAnswering      | 1  | 1.154431 |
|          DebertaV2ForMaskedLM           | 1  | 1.153266 |
|           RobertaForCausalLM            | 1  | 1.150789 |
|       ElectraForQuestionAnswering       | 1  | 1.149138 |
|     PLBartForConditionalGeneration      | 1  | 1.089354 |
|      MBartForConditionalGeneration      | 1  | 1.068491 |
|             BartForCausalLM             | 1  | 1.058182 |
|      BartForConditionalGeneration       | 1  | 1.055758 |
|            PLBartForCausalLM            | 1  | 1.032723 |
|             OPTForCausalLM              | 1  | 1.032386 |
|            MBartForCausalLM             | 1  | 1.014673 |
|          AllenaiLongformerBase          | 1  | 0.966654 |
|       T5ForConditionalGeneration        | 1  | 0.620773 |
|                 T5Small                 | 1  | 0.61841  |
+-----------------------------------------+----+----------+
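
To summarize a speedup table like the one above, the bordered text can be parsed and reduced to a geometric mean, which is a common choice for averaging ratios. The sketch below is illustrative only: `parse_ascii_table` and `geometric_mean` are hypothetical helpers, not part of the benchmark scripts, and the sample holds just three rows copied from the table.

```python
import math


def parse_ascii_table(text):
    """Parse a '+---+'-bordered table into a list of dicts keyed by header name."""
    header, rows = None, []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines such as +-----+
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def geometric_mean(values):
    """Geometric mean of positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


SAMPLE = """
+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|          MobileBertForMaskedLM          | 1  | 2.128078 |
|          AllenaiLongformerBase          | 1  | 0.966654 |
|                 T5Small                 | 1  | 0.61841  |
+-----------------------------------------+----+----------+
"""

rows = parse_ascii_table(SAMPLE)
speedups = [float(r["inductor"]) for r in rows]
gmean = geometric_mean(speedups)
slower = [r["name"] for r in rows if float(r["inductor"]) < 1.0]

print(f"geomean speedup: {gmean:.3f}")
print("slower than native PyTorch:", slower)
```

Listing the models with a speedup below 1.0 makes regressions easy to spot alongside the aggregate number.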

Accuracy

+-----------------------------------------+----+------------------+
|                  name                   | bs |     inductor     |
+-----------------------------------------+----+------------------+
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip |
|                CamemBert                | 1  |       pass       |
|       AlbertForQuestionAnswering        | 1  |       pass       |
|           DebertaForMaskedLM            | 1  |       pass       |
|          AllenaiLongformerBase          | 1  |       pass       |
|             BartForCausalLM             | 1  |       pass       |
|      BartForConditionalGeneration       | 1  |       pass       |
|             BertForMaskedLM             | 1  |       pass       |
|        BertForQuestionAnswering         | 1  |       pass       |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |
|       DebertaForQuestionAnswering       | 1  |       pass       |
|           LayoutLMForMaskedLM           | 1  |       pass       |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |
|          DistilBertForMaskedLM          | 1  |       pass       |
|     DistilBertForQuestionAnswering      | 1  |       pass       |
|               DistillGPT2               | 1  |       pass       |
|           ElectraForCausalLM            | 1  |       pass       |
|       ElectraForQuestionAnswering       | 1  |       pass       |
|      GPT2ForSequenceClassification      | 1  |       pass       |
|               GoogleFnet                | 1  |       pass       |
|    LayoutLMForSequenceClassification    | 1  |       pass       |
|            MBartForCausalLM             | 1  |       pass       |
|            XLNetLMHeadModel             | 1  |       pass       |
|             XGLMForCausalLM             | 1  |       pass       |
|            AlbertForMaskedLM            | 1  |       pass       |
|      MBartForConditionalGeneration      | 1  |       pass       |
|       MT5ForConditionalGeneration       | 1  |       pass       |
|         MegatronBertForCausalLM         | 1  |       pass       |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |
|          MobileBertForMaskedLM          | 1  |       pass       |
|     MobileBertForQuestionAnswering      | 1  |       pass       |
|             OPTForCausalLM              | 1  |       pass       |
|            PLBartForCausalLM            | 1  |       pass       |
|     PLBartForConditionalGeneration      | 1  |       pass       |
|           PegasusForCausalLM            | 1  |       pass       |
|     M2M100ForConditionalGeneration      | 1  |       pass       |
|     PegasusForConditionalGeneration     | 1  |       pass       |
|           RobertaForCausalLM            | 1  |       pass       |
|       RobertaForQuestionAnswering       | 1  |       pass       |
|         Speech2Text2ForCausalLM         | 1  |       pass       |
|       T5ForConditionalGeneration        | 1  |       pass       |
|                 T5Small                 | 1  |       pass       |
|            TrOCRForCausalLM             | 1  |       pass       |
|            YituTechConvBert             | 1  |       pass       |
+-----------------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------------+----+------------+
|                  name                   | bs |  inductor  |
+-----------------------------------------+----+------------+
|          MobileBertForMaskedLM          | 1  | 121.182308 |
|     MobileBertForQuestionAnswering      | 1  | 119.868786 |
|          AllenaiLongformerBase          | 1  | 93.068342  |
|      BartForConditionalGeneration       | 1  | 56.703029  |
|     PegasusForConditionalGeneration     | 1  | 49.737319  |
|     M2M100ForConditionalGeneration      | 1  | 49.705683  |
|      MBartForConditionalGeneration      | 1  | 49.532439  |
|          BlenderbotForCausalLM          | 1  |  47.94314  |
|             XGLMForCausalLM             | 1  | 46.502951  |
|            XLNetLMHeadModel             | 1  |  46.2435   |
|          DebertaV2ForMaskedLM           | 1  | 43.851387  |
|      DebertaV2ForQuestionAnswering      | 1  | 43.826192  |
|         MegatronBertForCausalLM         | 1  | 43.697763  |
|    MegatronBertForQuestionAnswering     | 1  | 43.442403  |
|       MT5ForConditionalGeneration       | 1  | 42.137585  |
| BlenderbotSmallForConditionalGeneration | 1  | 38.452606  |
|                 T5Small                 | 1  | 36.080077  |
|       T5ForConditionalGeneration        | 1  | 36.046721  |
|            YituTechConvBert             | 1  | 35.719808  |
|     PLBartForConditionalGeneration      | 1  |  31.82684  |
|            MBartForCausalLM             | 1  | 28.804871  |
|            TrOCRForCausalLM             | 1  | 28.622793  |
|           PegasusForCausalLM            | 1  | 28.198777  |
|             OPTForCausalLM              | 1  | 28.075307  |
|           ElectraForCausalLM            | 1  | 27.030101  |
|                CamemBert                | 1  | 26.776941  |
|           RobertaForCausalLM            | 1  | 26.758853  |
|       ElectraForQuestionAnswering       | 1  | 26.757776  |
|           LayoutLMForMaskedLM           | 1  | 26.702679  |
|       RobertaForQuestionAnswering       | 1  |  26.68925  |
|           DebertaForMaskedLM            | 1  | 26.622202  |
|             BertForMaskedLM             | 1  | 26.616642  |
|       DebertaForQuestionAnswering       | 1  | 26.563035  |
|        BertForQuestionAnswering         | 1  |  26.55943  |
|    LayoutLMForSequenceClassification    | 1  | 26.364675  |
|             BartForCausalLM             | 1  |  26.10624  |
|       BlenderbotSmallForCausalLM        | 1  | 23.065923  |
|      GPT2ForSequenceClassification      | 1  | 23.032601  |
|               GoogleFnet                | 1  | 21.433021  |
|            PLBartForCausalLM            | 1  | 21.248829  |
|          DistilBertForMaskedLM          | 1  | 21.077966  |
|     DistilBertForQuestionAnswering      | 1  | 20.992766  |
|         Speech2Text2ForCausalLM         | 1  | 20.767707  |
|               DistillGPT2               | 1  | 19.178904  |
|            AlbertForMaskedLM            | 1  | 17.838621  |
|       AlbertForQuestionAnswering        | 1  | 17.586888  |
+-----------------------------------------+----+------------+

Peak Memory Compression Ratio

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|             OPTForCausalLM              | 1  | 0.986788 |
|      MBartForConditionalGeneration      | 1  | 0.973008 |
|      GPT2ForSequenceClassification      | 1  | 0.951658 |
|          AllenaiLongformerBase          | 1  | 0.948934 |
|            MBartForCausalLM             | 1  | 0.923607 |
|     PLBartForConditionalGeneration      | 1  | 0.906272 |
|                 T5Small                 | 1  | 0.906106 |
|       T5ForConditionalGeneration        | 1  | 0.905933 |
|            XLNetLMHeadModel             | 1  | 0.905815 |
|            PLBartForCausalLM            | 1  | 0.903311 |
|       DebertaForQuestionAnswering       | 1  | 0.872564 |
|               GoogleFnet                | 1  | 0.854252 |
|       RobertaForQuestionAnswering       | 1  | 0.847853 |
|    LayoutLMForSequenceClassification    | 1  | 0.847812 |
|        BertForQuestionAnswering         | 1  | 0.841941 |
|       ElectraForQuestionAnswering       | 1  | 0.841447 |
|      DebertaV2ForQuestionAnswering      | 1  | 0.835364 |
|    MegatronBertForQuestionAnswering     | 1  | 0.832151 |
|               DistillGPT2               | 1  | 0.827314 |
|           DebertaForMaskedLM            | 1  | 0.825753 |
|           LayoutLMForMaskedLM           | 1  | 0.81315  |
|         Speech2Text2ForCausalLM         | 1  | 0.812027 |
|         MegatronBertForCausalLM         | 1  | 0.811841 |
|             BertForMaskedLM             | 1  | 0.811734 |
|                CamemBert                | 1  | 0.810941 |
|           RobertaForCausalLM            | 1  | 0.809504 |
|           ElectraForCausalLM            | 1  | 0.805926 |
|     DistilBertForQuestionAnswering      | 1  | 0.800346 |
|          BlenderbotForCausalLM          | 1  | 0.798046 |
|          DebertaV2ForMaskedLM           | 1  | 0.798017 |
|             BartForCausalLM             | 1  | 0.793568 |
|            TrOCRForCausalLM             | 1  | 0.787982 |
|       MT5ForConditionalGeneration       | 1  | 0.787802 |
|            YituTechConvBert             | 1  | 0.785738 |
|      BartForConditionalGeneration       | 1  | 0.778793 |
|       BlenderbotSmallForCausalLM        | 1  | 0.765492 |
|           PegasusForCausalLM            | 1  | 0.750499 |
|          DistilBertForMaskedLM          | 1  | 0.746549 |
| BlenderbotSmallForConditionalGeneration | 1  | 0.738737 |
|     MobileBertForQuestionAnswering      | 1  | 0.733417 |
|     M2M100ForConditionalGeneration      | 1  | 0.717241 |
|     PegasusForConditionalGeneration     | 1  | 0.715249 |
|          MobileBertForMaskedLM          | 1  | 0.702473 |
|             XGLMForCausalLM             | 1  | 0.696266 |
|            AlbertForMaskedLM            | 1  | 0.448704 |
|       AlbertForQuestionAnswering        | 1  | 0.443701 |
+-----------------------------------------+----+----------+

Absolute latency (ms)

+-----------------------------------------+----+--------------+
|                  name                   | bs |   inductor   |
+-----------------------------------------+----+--------------+
|            AlbertForMaskedLM            | 1  | 12737.338792 |
|       AlbertForQuestionAnswering        | 1  | 12694.257703 |
|      MBartForConditionalGeneration      | 1  | 6126.543679  |
|      BartForConditionalGeneration       | 1  |  5675.0269   |
|             OPTForCausalLM              | 1  | 5209.377251  |
|          DebertaV2ForMaskedLM           | 1  | 5044.185928  |
|      DebertaV2ForQuestionAnswering      | 1  | 3969.713959  |
|            XLNetLMHeadModel             | 1  |  3111.27759  |
|            MBartForCausalLM             | 1  | 3036.393285  |
|          BlenderbotForCausalLM          | 1  | 2628.273215  |
|             BartForCausalLM             | 1  | 2557.518747  |
|       T5ForConditionalGeneration        | 1  | 2472.197957  |
|                 T5Small                 | 1  | 2468.886193  |
|          AllenaiLongformerBase          | 1  | 2407.836329  |
|     PLBartForConditionalGeneration      | 1  | 2170.292681  |
|         MegatronBertForCausalLM         | 1  | 2028.575547  |
|    MegatronBertForQuestionAnswering     | 1  | 1854.207538  |
|      GPT2ForSequenceClassification      | 1  | 1318.698438  |
|            PLBartForCausalLM            | 1  | 1212.044933  |
|             XGLMForCausalLM             | 1  |  832.451881  |
|           DebertaForMaskedLM            | 1  |  782.501854  |
|           RobertaForCausalLM            | 1  |  774.380638  |
|     M2M100ForConditionalGeneration      | 1  |  712.314089  |
|                CamemBert                | 1  |  691.986991  |
|            YituTechConvBert             | 1  |  687.897186  |
|             BertForMaskedLM             | 1  |  681.876967  |
|           LayoutLMForMaskedLM           | 1  |  681.22025   |
|     PegasusForConditionalGeneration     | 1  |  602.534985  |
|            TrOCRForCausalLM             | 1  |  584.110424  |
|       DebertaForQuestionAnswering       | 1  |  557.984136  |
|        BertForQuestionAnswering         | 1  |  546.361312  |
|       RobertaForQuestionAnswering       | 1  |  542.596952  |
|    LayoutLMForSequenceClassification    | 1  |  541.75428   |
|               DistillGPT2               | 1  |  505.214611  |
|               GoogleFnet                | 1  |  471.925698  |
|       MT5ForConditionalGeneration       | 1  |  300.588068  |
|           PegasusForCausalLM            | 1  |  299.713355  |
| BlenderbotSmallForConditionalGeneration | 1  |  143.372669  |
|           ElectraForCausalLM            | 1  |   135.022    |
|          DistilBertForMaskedLM          | 1  |  99.891963   |
|       ElectraForQuestionAnswering       | 1  |  95.031016   |
|       BlenderbotSmallForCausalLM        | 1  |  84.203115   |
|     DistilBertForQuestionAnswering      | 1  |  63.299547   |
|          MobileBertForMaskedLM          | 1  |  63.213432   |
|     MobileBertForQuestionAnswering      | 1  |  37.039735   |
|         Speech2Text2ForCausalLM         | 1  |  18.280916   |
+-----------------------------------------+----+--------------+
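
The speedup and absolute latency tables can be combined to estimate the native PyTorch latency. The sketch below assumes that speedup is native latency divided by inductor latency and that the absolute latency column is the inductor measurement. Both points are assumptions, since the tables do not spell them out. The rows are copied from the two huggingface tables above, and the variable names are hypothetical.

```python
# name -> speedup, copied from the "Performance speedup" table.
SPEEDUP = {
    "MobileBertForMaskedLM": 2.128078,
    "Speech2Text2ForCausalLM": 1.368662,
    "T5Small": 0.61841,
}

# name -> inductor latency in ms, copied from the "Absolute latency (ms)" table.
INDUCTOR_MS = {
    "MobileBertForMaskedLM": 63.213432,
    "Speech2Text2ForCausalLM": 18.280916,
    "T5Small": 2468.886193,
}

# Assumed relation: speedup = eager_ms / inductor_ms.
eager_ms = {name: SPEEDUP[name] * INDUCTOR_MS[name] for name in SPEEDUP}
saved_ms = {name: eager_ms[name] - INDUCTOR_MS[name] for name in SPEEDUP}

for name in SPEEDUP:
    print(f"{name}: eager ~{eager_ms[name]:.1f} ms, saved {saved_ms[name]:+.1f} ms")
```

A negative saving, as for a model with speedup below 1.0, would indicate that the compiled run is slower in absolute terms under these assumptions.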

timm_models suite with float32 precision

Performance speedup

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          pnasnet5large          | 1  | 2.400716 |
|          inception_v3           | 1  | 2.283057 |
|       gluon_inception_v3        | 1  | 2.257769 |
|        adv_inception_v3         | 1  | 2.243596 |
|           dm_nfnet_f0           | 1  | 2.224928 |
|            nfnet_l0             | 1  | 2.211658 |
|         mobilenetv2_100         | 1  | 2.131584 |
|          ghostnet_100           | 1  | 2.104035 |
|          spnasnet_100           | 1  | 2.064966 |
|            levit_128            | 1  | 2.033101 |
|            lcnet_050            | 1  | 2.02313  |
|            hrnet_w18            | 1  | 2.005957 |
|           mnasnet_100           | 1  | 1.988794 |
|           fbnetc_100            | 1  | 1.98706  |
|            repvgg_a2            | 1  | 1.98142  |
|      mobilenetv3_large_100      | 1  | 1.954214 |
|           regnety_002           | 1  | 1.908427 |
|           rexnet_100            | 1  | 1.838014 |
|            fbnetv3_b            | 1  | 1.789851 |
|       tf_efficientnet_b0        | 1  | 1.787523 |
|           selecsls42b           | 1  | 1.78628  |
|             dla102              | 1  | 1.725337 |
|        ese_vovnet19b_dw         | 1  | 1.711595 |
|          botnet26t_256          | 1  | 1.688721 |
|            tinynet_a            | 1  | 1.662321 |
|       eca_botnext26ts_256       | 1  | 1.656526 |
|          cspdarknet53           | 1  | 1.65235  |
|           resnest101e           | 1  | 1.630568 |
|        eca_halonext26ts         | 1  | 1.564895 |
|           res2next50            | 1  | 1.559471 |
|        res2net50_14w_8s         | 1  | 1.545805 |
|         poolformer_m36          | 1  | 1.537001 |
|           volo_d1_224           | 1  | 1.500188 |
|        res2net101_26w_4s        | 1  | 1.498843 |
|           mobilevit_s           | 1  | 1.490722 |
|           tf_mixnet_l           | 1  | 1.471092 |
|         visformer_small         | 1  | 1.417986 |
|           convit_base           | 1  | 1.392714 |
|          gmixer_24_224          | 1  | 1.380959 |
|     swsl_resnext101_32x16d      | 1  | 1.339979 |
|        twins_pcpvt_base         | 1  | 1.339757 |
|            gernet_l             | 1  | 1.317997 |
|            mixnet_l             | 1  | 1.31326  |
|      beit_base_patch16_224      | 1  | 1.288763 |
|          resmlp_12_224          | 1  | 1.286114 |
|  swin_base_patch4_window7_224   | 1  | 1.284677 |
|        convmixer_768_32         | 1  | 1.268421 |
|          mixer_b16_224          | 1  | 1.206717 |
| deit_base_distilled_patch16_224 | 1  | 1.205171 |
|      vit_base_patch16_224       | 1  | 1.203817 |
|             dpn107              | 1  | 1.200627 |
|         crossvit_9_240          | 1  | 1.191288 |
|      xcit_large_24_p8_224       | 1  | 1.188152 |
|        tnt_s_patch16_224        | 1  | 1.179393 |
|          jx_nest_base           | 1  | 1.177726 |
|            pit_b_224            | 1  | 1.161965 |
|          gmlp_s16_224           | 1  | 1.149068 |
|          convnext_base          | 1  | 1.133161 |
|        sebotnet33ts_256         | 1  | 1.10292  |
|          cait_m36_384           | 1  | 0.978343 |
+---------------------------------+----+----------+

Accuracy

+---------------------------------+----+-------------+
|              name               | bs |  inductor   |
+---------------------------------+----+-------------+
|        adv_inception_v3         | 1  |    pass     |
|      beit_base_patch16_224      | 1  |    pass     |
|          botnet26t_256          | 1  |    pass     |
|          cait_m36_384           | 1  |    pass     |
|           convit_base           | 1  |    pass     |
|        convmixer_768_32         | 1  |    pass     |
|          convnext_base          | 1  |    pass     |
|         crossvit_9_240          | 1  |    pass     |
|          cspdarknet53           | 1  |    pass     |
| deit_base_distilled_patch16_224 | 1  |    pass     |
|             dla102              | 1  |    pass     |
|           dm_nfnet_f0           | 1  |    pass     |
|             dpn107              | 1  |    pass     |
|       eca_botnext26ts_256       | 1  |    pass     |
|        eca_halonext26ts         | 1  |    pass     |
|        ese_vovnet19b_dw         | 1  |    pass     |
|           fbnetc_100            | 1  |    pass     |
|            fbnetv3_b            | 1  |    pass     |
|            gernet_l             | 1  |    pass     |
|          ghostnet_100           | 1  |    pass     |
|       gluon_inception_v3        | 1  |    pass     |
|          gmixer_24_224          | 1  |    pass     |
|          gmlp_s16_224           | 1  |    pass     |
|            hrnet_w18            | 1  |    pass     |
|          inception_v3           | 1  |    pass     |
|          jx_nest_base           | 1  |    pass     |
|            lcnet_050            | 1  |    pass     |
|            levit_128            | 1  |    pass     |
|      xcit_large_24_p8_224       | 1  |    pass     |
|          mixer_b16_224          | 1  |    pass     |
|            mixnet_l             | 1  |    pass     |
|           mnasnet_100           | 1  |    pass     |
|         mobilenetv2_100         | 1  |    pass     |
|      mobilenetv3_large_100      | 1  |    pass     |
|           mobilevit_s           | 1  |    pass     |
|            nfnet_l0             | 1  |    pass     |
|            pit_b_224            | 1  |    pass     |
|          pnasnet5large          | 1  |    pass     |
|         poolformer_m36          | 1  |    pass     |
|           regnety_002           | 1  |    pass     |
|            repvgg_a2            | 1  |    pass     |
|        res2net101_26w_4s        | 1  |    pass     |
|        res2net50_14w_8s         | 1  |    pass     |
|           res2next50            | 1  |    pass     |
|          resmlp_12_224          | 1  |    pass     |
|           resnest101e           | 1  |    pass     |
|           rexnet_100            | 1  |    pass     |
|        sebotnet33ts_256         | 1  |    pass     |
|           selecsls42b           | 1  |    pass     |
|          spnasnet_100           | 1  |    pass     |
|  swin_base_patch4_window7_224   | 1  |    pass     |
|     swsl_resnext101_32x16d      | 1  |    pass     |
|       tf_efficientnet_b0        | 1  |    pass     |
|           tf_mixnet_l           | 1  |    pass     |
|            tinynet_a            | 1  |    pass     |
|        tnt_s_patch16_224        | 1  |    pass     |
|        twins_pcpvt_base         | 1  |    pass     |
|         visformer_small         | 1  |    pass     |
|      vit_base_patch16_224       | 1  |    pass     |
|           volo_d1_224           | 1  |    pass     |
|         coat_lite_mini          | 1  | fail_to_run |
+---------------------------------+----+-------------+

Compilation latency (sec)

+---------------------------------+----+------------+
|              name               | bs |  inductor  |
+---------------------------------+----+------------+
|          pnasnet5large          | 1  | 486.113205 |
|            hrnet_w18            | 1  | 265.797377 |
|        res2net101_26w_4s        | 1  | 79.569127  |
|           tf_mixnet_l           | 1  | 72.365546  |
|           resnest101e           | 1  | 71.590166  |
|            mixnet_l             | 1  | 68.750612  |
|          cait_m36_384           | 1  | 67.679714  |
|        res2net50_14w_8s         | 1  | 64.291745  |
|        twins_pcpvt_base         | 1  | 61.147212  |
|      xcit_large_24_p8_224       | 1  | 58.386302  |
|         poolformer_m36          | 1  | 56.712634  |
|  swin_base_patch4_window7_224   | 1  | 56.044482  |
|             dpn107              | 1  | 51.116236  |
|        tnt_s_patch16_224        | 1  | 45.578527  |
|          jx_nest_base           | 1  | 44.952361  |
|           mobilevit_s           | 1  | 42.999189  |
|            fbnetv3_b            | 1  | 42.641802  |
|          convnext_base          | 1  | 37.916604  |
|             dla102              | 1  | 37.126658  |
|          inception_v3           | 1  | 36.431132  |
|       gluon_inception_v3        | 1  | 36.423333  |
|        adv_inception_v3         | 1  | 36.410609  |
|          gmlp_s16_224           | 1  | 35.186877  |
|           volo_d1_224           | 1  | 34.943902  |
|          ghostnet_100           | 1  | 34.661838  |
|          gmixer_24_224          | 1  | 34.234001  |
|           res2next50            | 1  | 34.137654  |
|         crossvit_9_240          | 1  |  32.94051  |
|           dm_nfnet_f0           | 1  |  32.35379  |
|            tinynet_a            | 1  | 32.275971  |
|     swsl_resnext101_32x16d      | 1  | 31.984599  |
|        sebotnet33ts_256         | 1  | 31.797143  |
|        eca_halonext26ts         | 1  | 31.746708  |
|            levit_128            | 1  | 30.987938  |
|           rexnet_100            | 1  | 30.461417  |
|        convmixer_768_32         | 1  | 30.086908  |
|            nfnet_l0             | 1  | 30.029456  |
|       tf_efficientnet_b0        | 1  | 29.630194  |
|           convit_base           | 1  | 27.907858  |
|         visformer_small         | 1  | 26.310833  |
|       eca_botnext26ts_256       | 1  | 26.246039  |
|            pit_b_224            | 1  |  25.3838   |
|           regnety_002           | 1  | 25.345505  |
|          cspdarknet53           | 1  | 25.344501  |
|          botnet26t_256          | 1  | 25.263931  |
|      beit_base_patch16_224      | 1  | 24.379828  |
|      mobilenetv3_large_100      | 1  | 23.529532  |
| deit_base_distilled_patch16_224 | 1  | 23.314982  |
|      vit_base_patch16_224       | 1  | 23.257653  |
|           fbnetc_100            | 1  | 23.233616  |
|          spnasnet_100           | 1  | 23.111818  |
|          mixer_b16_224          | 1  | 23.071803  |
|            repvgg_a2            | 1  | 22.349701  |
|            gernet_l             | 1  | 22.305092  |
|         mobilenetv2_100         | 1  | 21.269959  |
|        ese_vovnet19b_dw         | 1  |  21.21583  |
|           mnasnet_100           | 1  | 21.052848  |
|          resmlp_12_224          | 1  | 20.200156  |
|           selecsls42b           | 1  |  19.50081  |
|            lcnet_050            | 1  | 18.392153  |
+---------------------------------+----+------------+

Peak Memory Compression Ratio

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          cait_m36_384           | 1  | 0.945626 |
|          pnasnet5large          | 1  | 0.928874 |
|        convmixer_768_32         | 1  | 0.918412 |
|            nfnet_l0             | 1  | 0.896098 |
|      xcit_large_24_p8_224       | 1  | 0.890677 |
|        ese_vovnet19b_dw         | 1  | 0.886592 |
|            fbnetv3_b            | 1  | 0.877831 |
|           mnasnet_100           | 1  | 0.877464 |
|          spnasnet_100           | 1  | 0.877314 |
|         mobilenetv2_100         | 1  | 0.877123 |
|       tf_efficientnet_b0        | 1  | 0.873669 |
|           rexnet_100            | 1  | 0.871894 |
|      mobilenetv3_large_100      | 1  | 0.868826 |
|           fbnetc_100            | 1  | 0.86383  |
|            lcnet_050            | 1  | 0.863427 |
|            tinynet_a            | 1  | 0.86277  |
|         poolformer_m36          | 1  | 0.85602  |
|           dm_nfnet_f0           | 1  | 0.854796 |
|       eca_botnext26ts_256       | 1  | 0.854744 |
|           mobilevit_s           | 1  | 0.850068 |
|        eca_halonext26ts         | 1  | 0.845041 |
|           regnety_002           | 1  | 0.842552 |
|          ghostnet_100           | 1  | 0.841944 |
|          botnet26t_256          | 1  | 0.841302 |
|           tf_mixnet_l           | 1  | 0.841225 |
|            mixnet_l             | 1  | 0.834084 |
|          resmlp_12_224          | 1  | 0.828471 |
|         visformer_small         | 1  | 0.821169 |
|           res2next50            | 1  | 0.815311 |
|            levit_128            | 1  | 0.808916 |
|          convnext_base          | 1  | 0.80305  |
|             dpn107              | 1  | 0.800721 |
|        sebotnet33ts_256         | 1  | 0.79937  |
|            hrnet_w18            | 1  | 0.799064 |
|          gmlp_s16_224           | 1  |  0.7961  |
|        res2net50_14w_8s         | 1  | 0.795088 |
|          gmixer_24_224          | 1  | 0.79455  |
|          cspdarknet53           | 1  | 0.793486 |
|           volo_d1_224           | 1  | 0.787251 |
|        tnt_s_patch16_224        | 1  | 0.784233 |
|           convit_base           | 1  | 0.781359 |
|         crossvit_9_240          | 1  | 0.78025  |
|        twins_pcpvt_base         | 1  | 0.780155 |
|          mixer_b16_224          | 1  | 0.777728 |
|             dla102              | 1  | 0.776584 |
|           resnest101e           | 1  | 0.776035 |
|      beit_base_patch16_224      | 1  | 0.771616 |
|          jx_nest_base           | 1  | 0.77157  |
|      vit_base_patch16_224       | 1  | 0.763828 |
|          inception_v3           | 1  | 0.76372  |
|        adv_inception_v3         | 1  | 0.763102 |
|       gluon_inception_v3        | 1  | 0.763059 |
| deit_base_distilled_patch16_224 | 1  | 0.760672 |
|            pit_b_224            | 1  | 0.755853 |
|        res2net101_26w_4s        | 1  | 0.739891 |
|  swin_base_patch4_window7_224   | 1  | 0.739249 |
|           selecsls42b           | 1  | 0.738982 |
|            gernet_l             | 1  | 0.735967 |
|            repvgg_a2            | 1  | 0.690647 |
|     swsl_resnext101_32x16d      | 1  | 0.637989 |
+---------------------------------+----+----------+

Absolute latency (ms)

+---------------------------------+----+-------------+
|              name               | bs |  inductor   |
+---------------------------------+----+-------------+
|          cait_m36_384           | 1  | 3637.305327 |
|      xcit_large_24_p8_224       | 1  | 1533.357049 |
|     swsl_resnext101_32x16d      | 1  | 443.579265  |
|          pnasnet5large          | 1  | 363.226249  |
|          convnext_base          | 1  | 309.831625  |
|             dpn107              | 1  | 261.679707  |
|        convmixer_768_32         | 1  |  242.31608  |
|          jx_nest_base           | 1  | 231.521541  |
|      beit_base_patch16_224      | 1  | 196.827141  |
| deit_base_distilled_patch16_224 | 1  | 195.801992  |
|      vit_base_patch16_224       | 1  | 195.520967  |
|  swin_base_patch4_window7_224   | 1  | 194.646821  |
|           convit_base           | 1  | 193.813395  |
|            pit_b_224            | 1  | 167.704407  |
|           resnest101e           | 1  |  163.81051  |
|           dm_nfnet_f0           | 1  | 157.176668  |
|          mixer_b16_224          | 1  | 139.738184  |
|         poolformer_m36          | 1  | 136.580768  |
|        res2net101_26w_4s        | 1  | 110.361597  |
|        twins_pcpvt_base         | 1  | 104.590044  |
|           volo_d1_224           | 1  |  93.935035  |
|            nfnet_l0             | 1  |  93.336841  |
|        tnt_s_patch16_224        | 1  |  92.194719  |
|             dla102              | 1  |  88.811578  |
|        sebotnet33ts_256         | 1  |  84.243557  |
|            hrnet_w18            | 1  |  83.670199  |
|          cspdarknet53           | 1  |  81.784398  |
|       gluon_inception_v3        | 1  |  71.871528  |
|          inception_v3           | 1  |  71.863698  |
|        adv_inception_v3         | 1  |  71.641267  |
|          gmlp_s16_224           | 1  |  69.822291  |
|         visformer_small         | 1  |  65.637736  |
|            repvgg_a2            | 1  |  62.588539  |
|        res2net50_14w_8s         | 1  |  62.359588  |
|          gmixer_24_224          | 1  |  62.021473  |
|           res2next50            | 1  |  57.987819  |
|            gernet_l             | 1  |  55.930359  |
|           selecsls42b           | 1  |  44.050795  |
|          botnet26t_256          | 1  |  43.925824  |
|        eca_halonext26ts         | 1  |  43.388579  |
|           mobilevit_s           | 1  |  40.803593  |
|       eca_botnext26ts_256       | 1  |  39.758477  |
|          resmlp_12_224          | 1  |  35.388823  |
|         crossvit_9_240          | 1  |  32.801781  |
|        ese_vovnet19b_dw         | 1  |  30.867582  |
|            mixnet_l             | 1  |  28.318314  |
|           tf_mixnet_l           | 1  |  28.095854  |
|            fbnetv3_b            | 1  |  14.755359  |
|       tf_efficientnet_b0        | 1  |  12.879833  |
|           rexnet_100            | 1  |  12.527138  |
|            tinynet_a            | 1  |  11.30304   |
|           fbnetc_100            | 1  |  8.806679   |
|            levit_128            | 1  |  8.233893   |
|          ghostnet_100           | 1  |  7.847556   |
|          spnasnet_100           | 1  |  7.714229   |
|           mnasnet_100           | 1  |  7.266184   |
|         mobilenetv2_100         | 1  |  7.011404   |
|      mobilenetv3_large_100      | 1  |  6.707049   |
|           regnety_002           | 1  |  5.849064   |
|            lcnet_050            | 1  |  2.246625   |
+---------------------------------+----+-------------+

@zxd1997066
Contributor

[default] Performance Dashboard for float32 precision -- Single-Socket Multi-threads (2024-05-06 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. For training, each experiment runs one iteration of the forward and backward passes; for inference, it runs the forward pass only. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint reduction ratio.
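As a rough illustration of what "normalizing against the performance of native PyTorch" means, the sketch below divides a native latency by a compiled latency. The helper names `median_latency_ms` and `speedup` are hypothetical and are not taken from the benchmark scripts, and the repeat count is an assumption made only to smooth out timer noise.

```python
import statistics
import time


def median_latency_ms(fn, repeats=5):
    """Median wall-clock latency of calling fn(), in milliseconds.

    Hypothetical helper: `repeats` is an assumption of this sketch,
    not a setting of the dashboard.
    """
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(native_ms, compiled_ms):
    """Native latency divided by compiled latency.

    Values above 1.0 mean the compiled run is faster than native.
    """
    if native_ms <= 0 or compiled_ms <= 0:
        raise ValueError("latencies must be positive")
    return native_ms / compiled_ms
```

Under this reading, a speedup below 1.0 (for example, cait_m36_384 at 0.978343 in the timm table above) would mean the compiled model ran slightly slower than native.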

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

SW information:

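+-------------------+--------+------------------------------------------+
|        SW         | Branch |                  Commit                  |
+-------------------+--------+------------------------------------------+
|      Pytorch      |  main  | fc183f0bdec30baa6686e13720adae077c332bdd |
|    Torchbench     |  main  |                 d6015d42                 |
|    torchaudio     |  main  |             2.2.0a0+ea437b3              |
|     torchtext     |  main  |             0.16.0a0+b0ebddc             |
|    torchvision    |  main  |             0.19.0a0+06ad737             |
|     torchdata     |  main  |             0.7.1a0+0790338              |
| dynamo_benchmarks |  main  |                 nightly                  |
+-------------------+--------+------------------------------------------+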

HW information

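+------------------+----------------------------------------------------------------+
|       Item       |                             Value                              |
+------------------+----------------------------------------------------------------+
|   Manufacturer   |                           Amazon EC2                           |
|   Product Name   |                          c6i.16xlarge                          |
|    CPU Model     |         Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz          |
| Installed Memory |            128GB (1x128GB DDR4 3200 MT/s [Unknown])            |
|        OS        |                       Ubuntu 22.04.2 LTS                       |
|      Kernel      |                        5.19.0-1022-aws                         |
|    Microcode     |                           0xd000389                            |
|       GCC        |           gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0            |
|      GLIBC       |            ldd (Ubuntu GLIBC 2.35-0ubuntu3.1) 2.35             |
|     Binutils     |             GNU ld (GNU Binutils for Ubuntu) 2.38              |
|      Python      |                         Python 3.10.6                          |
|     OpenSSL      | OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022) |
+------------------+----------------------------------------------------------------+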

Test command

export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
CORES=$(lscpu | grep Core | awk '{print $4}')
export OMP_NUM_THREADS=$CORES

python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--node_id 0" --devices=cpu --dtypes=float32 --inference --compilers=inductor --extra-args="--timeout 9000" 

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
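The filtering step described above could look roughly like the following sketch. The function name `drop_failed_models` is hypothetical, and treating `pass_due_to_skip` as a passing status is an assumption; the statuses and speedup values are copied from the torchbench tables in this comment.

```python
# Assumption: both statuses count as passing for aggregation purposes.
PASSING = {"pass", "pass_due_to_skip"}


def drop_failed_models(accuracy, metric):
    """Keep only the models whose accuracy status is in PASSING.

    accuracy: dict mapping model name -> accuracy status string
    metric:   dict mapping model name -> measured value (e.g. speedup)
    """
    return {
        name: value
        for name, value in metric.items()
        if accuracy.get(name) in PASSING
    }


accuracy = {
    "resnet50": "pass",
    "Background_Matting": "pass_due_to_skip",
    "moco": "model_fail_to_load",
}
speedups = {"resnet50": 2.167517, "Background_Matting": 0.814447, "moco": 0.0}

# moco is dropped because it failed to load; the other two are kept.
kept = drop_failed_models(accuracy, speedups)
```

Dropping such rows before aggregating keeps placeholder values like the 0.0 speedup reported for moco out of the summary statistics.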

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 96%, 76/79 | 100%, 46/46 | 100%, 60/60 |
+----------+------------+-------------+-------------+

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.46x    |    1.34x    |    1.91x    |
+----------+------------+-------------+-------------+
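For readers unfamiliar with the aggregate, a geometric mean of per-model speedups can be computed as in this minimal sketch. The function name is hypothetical, and rejecting non-positive entries is a choice of this sketch (the logarithm is undefined for them), not a confirmed detail of the dashboard scripts.

```python
import math


def geometric_mean(values):
    """Geometric mean computed as exp(mean(log(v))).

    Raises ValueError on an empty input or on non-positive entries,
    for which the logarithm is undefined.
    """
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return math.exp(math.fsum(math.log(v) for v in values) / len(values))
```

Unlike an arithmetic mean, this treats a 2x speedup and a 0.5x slowdown as cancelling out: `geometric_mean([2.0, 0.5])` is 1.0 up to floating-point rounding.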

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   26.43    |    28.94    |    42.55    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.90x    |    0.98x    |    0.99x    |
+----------+------------+-------------+-------------+
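One way to read these ratios, assuming the compression ratio is the native peak memory divided by the compiled peak memory (which is consistent with "higher is better" but is not spelled out in this comment), is sketched below. The function name is hypothetical.

```python
def peak_memory_compression_ratio(native_peak, compiled_peak):
    """Native peak memory divided by compiled peak memory.

    Assumed definition: a value below 1.0 means the compiled run
    peaked higher than the native run. Both peaks must be given in
    the same unit.
    """
    if native_peak <= 0 or compiled_peak <= 0:
        raise ValueError("peak memory values must be positive")
    return native_peak / compiled_peak
```

Under that assumption, the torchbench value of 0.90x would correspond to a compiled peak of roughly 1 / 0.90, or about 1.11 times the native peak.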

torchbench suite with float32 precision

Performance speedup

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_equation_of_state     | 1048576 | 11.186752 |
|          squeezenet1_1          |   16    | 3.090442  |
|        timm_efficientnet        |   64    | 2.994559  |
|       mobilenet_v3_large        |   32    | 2.943853  |
|          mobilenet_v2           |   16    |  2.87456  |
|           mnasnet1_0            |   32    | 2.802282  |
|       shufflenet_v2_x1_0        |   64    | 2.556791  |
|          timm_resnest           |   32    | 2.269499  |
|            resnet50             |   32    | 2.167517  |
|        phlippe_densenet         |   128   | 2.026508  |
|        soft_actor_critic        |   256   |  2.01687  |
|           densenet121           |   64    | 1.957225  |
|         resnext50_32x4d         |    8    | 1.949925  |
|            resnet152            |   32    | 1.934516  |
|       doctr_det_predictor       |    1    | 1.903768  |
|         phlippe_resnet          |   128   | 1.889571  |
|           timm_regnet           |   32    | 1.872759  |
|            resnet18             |    8    | 1.866761  |
|             hf_GPT2             |    1    | 1.801121  |
|           timm_nfnet            |   128   | 1.760093  |
|           timm_vovnet           |   32    |  1.64894  |
|             alexnet             |   128   | 1.584754  |
|             yolov3              |    8    | 1.564245  |
|          BERT_pytorch           |    2    | 1.563709  |
|      doctr_reco_predictor       |    1    | 1.550763  |
|          maml_omniglot          |    5    | 1.511447  |
|            moondream            |    1    | 1.506733  |
|             hf_Bert             |    1    | 1.490497  |
|            hf_Albert            |    1    | 1.486202  |
|          hf_Bert_large          |    1    | 1.477228  |
|        basic_gnn_edgecnn        |    1    | 1.474141  |
|          hf_GPT2_large          |    1    | 1.458206  |
|              vgg16              |    4    | 1.455284  |
|          fastNLP_Bert           |    1    | 1.451012  |
|          hf_Longformer          |    1    | 1.385801  |
|         LearningToPaint         |   96    | 1.383032  |
|              llama              |   32    |  1.36192  |
|          basic_gnn_gcn          |    1    | 1.350964  |
|              dcgan              |   256   | 1.338377  |
|     functorch_maml_omniglot     |    1    | 1.331945  |
|      torch_multimodal_clip      |   32    | 1.304172  |
| pyhpc_turbulent_kinetic_energy  | 1048576 |  1.28315  |
|          hf_DistilBert          |    1    | 1.274687  |
|             hf_Bart             |    1    | 1.236405  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1.222951  |
|          lennard_jones          |  1000   | 1.221169  |
|     timm_vision_transformer     |   32    | 1.219321  |
|        hf_distil_whisper        |    1    | 1.209588  |
|         pytorch_stargan         |   16    | 1.198334  |
|     nvidia_deeprecommender      |   256   | 1.189984  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1.184887  |
|           hf_T5_large           |    1    | 1.180296  |
|          pytorch_unet           |    1    | 1.180168  |
|         basic_gnn_sage          |    1    | 1.165443  |
|           hf_BigBird            |    1    | 1.156187  |
|    detectron2_fcos_r_50_fpn     |    1    | 1.138877  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 1.138046  |
|              dlrm               |  2048   | 1.137141  |
|          basic_gnn_gin          |    1    |  1.12956  |
|              hf_T5              |    1    | 1.114508  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1.077148  |
|  timm_vision_transformer_large  |   32    | 1.051174  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1.041368  |
|             demucs              |    1    | 1.015996  |
|           tts_angular           |   64    | 1.011067  |
|   mobilenet_v2_quantized_qat    |   96    | 0.996425  |
|           hf_Reformer           |    1    | 0.995394  |
|     resnet50_quantized_qat      |   32    | 0.991891  |
|       speech_transformer        |    1    | 0.983734  |
|               drq               |    1    | 0.954333  |
|           hf_T5_base            |    1    | 0.828721  |
|       Background_Matting        |    1    | 0.814447  |
|              maml               |    1    | 0.736059  |
|         opacus_cifar10          |   64    | 0.731194  |
|      functorch_dp_cifar10       |   64    |  0.66212  |
|     pyhpc_isoneutral_mixing     | 1048576 | 0.643185  |
|              moco               |    0    |    0.0    |
|        timm_efficientdet        |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+

Accuracy

+---------------------------------+---------+--------------------+
|              name               |   bs    |      inductor      |
+---------------------------------+---------+--------------------+
|       Background_Matting        |    1    |  pass_due_to_skip  |
|  timm_vision_transformer_large  |    4    |  pass_due_to_skip  |
|          hf_GPT2_large          |    4    |  pass_due_to_skip  |
|           hf_T5_large           |    4    |  pass_due_to_skip  |
|              maml               |    1    |  pass_due_to_skip  |
|             yolov3              |    4    |        pass        |
|             alexnet             |    4    |        pass        |
|        basic_gnn_edgecnn        |    1    |        pass        |
|          basic_gnn_gcn          |    1    |        pass        |
|          basic_gnn_gin          |    1    |        pass        |
|         basic_gnn_sage          |    1    |        pass        |
|       doctr_det_predictor       |    4    |        pass        |
|              dcgan              |    4    |        pass        |
|           densenet121           |    4    |        pass        |
| detectron2_fasterrcnn_r_101_c4  |    4    |        pass        |
| detectron2_fasterrcnn_r_101_dc5 |    4    |        pass        |
|  detectron2_fasterrcnn_r_50_c4  |    4    |        pass        |
| detectron2_fasterrcnn_r_50_dc5  |    4    |        pass        |
|           hf_T5_base            |    4    |        pass        |
|             demucs              |    1    |        pass        |
|        hf_distil_whisper        |    4    |        pass        |
|         LearningToPaint         |    4    |        pass        |
|              dlrm               |    4    |        pass        |
|    detectron2_fcos_r_50_fpn     |    4    |        pass        |
|               drq               |    1    |        pass        |
|          fastNLP_Bert           |    4    |        pass        |
|      functorch_dp_cifar10       |    4    |        pass        |
|     functorch_maml_omniglot     |    1    |        pass        |
|            hf_Albert            |    4    |        pass        |
|             hf_Bart             |    4    |        pass        |
|          hf_Bert_large          |    4    |        pass        |
|           hf_BigBird            |    4    |        pass        |
|              llama              |    4    |        pass        |
|          hf_DistilBert          |    4    |        pass        |
|             hf_GPT2             |    2    |        pass        |
|          hf_Longformer          |    4    |        pass        |
|           hf_Reformer           |    4    |        pass        |
|              hf_T5              |    4    |        pass        |
|             hf_Bert             |    4    |        pass        |
|      doctr_reco_predictor       |    4    |        pass        |
|          lennard_jones          |    4    |        pass        |
|            resnet50             |    4    |        pass        |
|            resnet152            |    4    |        pass        |
|          maml_omniglot          |    5    |        pass        |
|     resnet50_quantized_qat      |    4    |        pass        |
|          mobilenet_v2           |    4    |        pass        |
|   mobilenet_v2_quantized_qat    |    4    |        pass        |
|         pytorch_stargan         |   16    |        pass        |
|            moondream            |    4    |        pass        |
|     nvidia_deeprecommender      |    4    |        pass        |
|          BERT_pytorch           |    4    |        pass        |
|        phlippe_densenet         |    4    |        pass        |
|         phlippe_resnet          |    4    |        pass        |
|     pyhpc_equation_of_state     |    4    |        pass        |
|     pyhpc_isoneutral_mixing     |    4    |        pass        |
| pyhpc_turbulent_kinetic_energy  | 1048576 |        pass        |
|  pytorch_CycleGAN_and_pix2pix   |    1    |        pass        |
|         opacus_cifar10          |    4    |        pass        |
|       mobilenet_v3_large        |    4    |        pass        |
|           mnasnet1_0            |    4    |        pass        |
|          pytorch_unet           |    2    |        pass        |
|        timm_efficientnet        |    4    |        pass        |
|         resnext50_32x4d         |    4    |        pass        |
|       shufflenet_v2_x1_0        |    4    |        pass        |
|        soft_actor_critic        |   256   |        pass        |
|       speech_transformer        |    1    |        pass        |
|           timm_nfnet            |    4    |        pass        |
|            resnet18             |    4    |        pass        |
|          squeezenet1_1          |    4    |        pass        |
|           timm_regnet           |    4    |        pass        |
|          timm_resnest           |    4    |        pass        |
|     timm_vision_transformer     |    4    |        pass        |
|           timm_vovnet           |    4    |        pass        |
|      torch_multimodal_clip      |    4    |        pass        |
|           tts_angular           |    4    |        pass        |
|              vgg16              |    4    |        pass        |
|        timm_efficientdet        |    0    | model_fail_to_load |
|              moco               |    0    | model_fail_to_load |
|         DALLE2_pytorch          |    0    | model_fail_to_load |
| detectron2_fasterrcnn_r_101_fpn |    4    |    fail_to_run     |
| detectron2_fasterrcnn_r_50_fpn  |    4    |    fail_to_run     |
|           Super_SloMo           |    4    |    fail_to_run     |
|         vision_maskrcnn         |    1    |    fail_to_run     |
+---------------------------------+---------+--------------------+

Compilation latency (sec)

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|           densenet121           |   64    | 98.574574 |
|           hf_BigBird            |    1    | 77.14348  |
|    detectron2_fcos_r_50_fpn     |    1    | 70.505592 |
|           hf_T5_large           |    1    | 62.48316  |
|              maml               |    1    | 53.832903 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 53.519518 |
|           timm_nfnet            |   128   | 53.406092 |
|          hf_Longformer          |    1    | 51.306458 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 51.283923 |
|           hf_Reformer           |    1    | 47.29312  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 46.727468 |
|           hf_T5_base            |    1    | 44.957275 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 44.277137 |
|        phlippe_densenet         |   128   | 43.458133 |
|  timm_vision_transformer_large  |   32    | 42.74628  |
|     pyhpc_isoneutral_mixing     | 1048576 | 40.146583 |
|      torch_multimodal_clip      |   32    | 39.506776 |
|       speech_transformer        |    1    | 39.402551 |
|          hf_GPT2_large          |    1    | 36.335438 |
|        timm_efficientnet        |   64    | 36.135567 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 34.767634 |
|             demucs              |    1    | 34.754724 |
|            moondream            |    1    | 33.691561 |
|              hf_T5              |    1    | 31.210348 |
|             yolov3              |    8    | 30.955807 |
|        hf_distil_whisper        |    1    | 30.508257 |
|         opacus_cifar10          |   64    | 29.271959 |
|      functorch_dp_cifar10       |   64    | 27.895254 |
|          timm_resnest           |   32    | 27.298549 |
|       doctr_det_predictor       |    1    | 26.831634 |
|          hf_Bert_large          |    1    | 26.540045 |
|       shufflenet_v2_x1_0        |   64    | 25.286162 |
|       mobilenet_v3_large        |   32    | 24.796181 |
|       Background_Matting        |    1    | 24.065467 |
|          BERT_pytorch           |    2    | 23.786488 |
|           timm_vovnet           |   32    | 22.724505 |
|          fastNLP_Bert           |    1    | 22.719299 |
|             hf_Bart             |    1    | 22.600603 |
|           timm_regnet           |   32    | 22.418012 |
|              llama              |   32    | 22.385777 |
|     timm_vision_transformer     |   32    | 21.894385 |
|          pytorch_unet           |    1    | 21.477232 |
|            hf_Albert            |    1    | 20.416849 |
|             hf_GPT2             |    1    | 19.712311 |
|          hf_DistilBert          |    1    | 19.047166 |
|             hf_Bert             |    1    | 19.043557 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 18.394555 |
|          squeezenet1_1          |   16    | 16.600038 |
|         pytorch_stargan         |   16    | 16.518048 |
|            resnet152            |   32    |  16.169   |
|              vgg16              |    4    | 14.941686 |
|      doctr_reco_predictor       |    1    | 13.665001 |
|             alexnet             |   128   | 12.937592 |
|          basic_gnn_gcn          |    1    | 11.895643 |
|         resnext50_32x4d         |    8    | 11.843515 |
|            resnet50             |   32    | 11.806836 |
|          basic_gnn_gin          |    1    | 11.341939 |
|         basic_gnn_sage          |    1    | 11.312561 |
|               drq               |    1    | 10.89749  |
|              dlrm               |  2048   | 10.825023 |
|            resnet18             |    8    | 10.433104 |
|          mobilenet_v2           |   16    | 10.230486 |
|           mnasnet1_0            |   32    | 10.038256 |
|     functorch_maml_omniglot     |    1    | 9.711522  |
|          maml_omniglot          |    5    | 9.336631  |
|        basic_gnn_edgecnn        |    1    | 9.317818  |
|     pyhpc_equation_of_state     | 1048576 | 9.257615  |
|         LearningToPaint         |   96    | 9.207478  |
|     nvidia_deeprecommender      |   256   |  8.68413  |
|         phlippe_resnet          |   128   | 8.682107  |
|        soft_actor_critic        |   256   | 6.849747  |
|          lennard_jones          |  1000   | 6.739343  |
|              dcgan              |   256   |  5.72535  |
|           tts_angular           |   64    | 5.608905  |
|   mobilenet_v2_quantized_qat    |   96    | 0.110289  |
|     resnet50_quantized_qat      |   32    |  0.07832  |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
|        timm_efficientdet        |    0    |    0.0    |
+---------------------------------+---------+-----------+

Peak Memory Compression Ratio

+---------------------------------+---------+----------+
|              name               |   bs    | inductor |
+---------------------------------+---------+----------+
|           timm_nfnet            |   128   | 0.99251  |
|              dlrm               |  2048   | 0.988063 |
|           hf_T5_base            |    1    | 0.987575 |
|        timm_efficientnet        |   64    | 0.986406 |
|       Background_Matting        |    1    | 0.982529 |
|             demucs              |    1    | 0.98007  |
|  timm_vision_transformer_large  |   32    | 0.979724 |
|          pytorch_unet           |    1    | 0.97836  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 0.977412 |
|          hf_GPT2_large          |    1    | 0.97715  |
|           densenet121           |   64    | 0.976916 |
|             yolov3              |    8    | 0.974388 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 0.973743 |
|        basic_gnn_edgecnn        |    1    | 0.971081 |
|            resnet50             |   32    | 0.970979 |
|         LearningToPaint         |   96    | 0.969159 |
|            resnet152            |   32    | 0.968719 |
|          timm_resnest           |   32    | 0.968331 |
|      torch_multimodal_clip      |   32    | 0.965914 |
|           timm_regnet           |   32    | 0.965389 |
|       doctr_det_predictor       |    1    | 0.964974 |
|    detectron2_fcos_r_50_fpn     |    1    | 0.964693 |
|           timm_vovnet           |   32    | 0.96459  |
|   mobilenet_v2_quantized_qat    |   96    | 0.962683 |
|     resnet50_quantized_qat      |   32    | 0.962152 |
|           mnasnet1_0            |   32    | 0.961049 |
|     timm_vision_transformer     |   32    |  0.9595  |
|          mobilenet_v2           |   16    | 0.956654 |
|       shufflenet_v2_x1_0        |   64    | 0.952959 |
|       mobilenet_v3_large        |   32    | 0.952738 |
|           hf_BigBird            |    1    | 0.95056  |
|         pytorch_stargan         |   16    | 0.947357 |
|        phlippe_densenet         |   128   | 0.944697 |
|         resnext50_32x4d         |    8    | 0.944341 |
|      doctr_reco_predictor       |    1    | 0.937785 |
|          basic_gnn_gcn          |    1    | 0.937336 |
|              llama              |   32    | 0.933065 |
|           tts_angular           |   64    | 0.920115 |
|          squeezenet1_1          |   16    | 0.915241 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 0.913502 |
|        hf_distil_whisper        |    1    | 0.912892 |
|              dcgan              |   256   | 0.912892 |
|     pyhpc_equation_of_state     | 1048576 | 0.907923 |
|            resnet18             |    8    | 0.900356 |
|         phlippe_resnet          |   128   | 0.899984 |
|             alexnet             |   128   | 0.896416 |
|         opacus_cifar10          |   64    | 0.89061  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 0.885219 |
|        soft_actor_critic        |   256   | 0.880146 |
|          lennard_jones          |  1000   | 0.863585 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 0.863197 |
|          maml_omniglot          |    5    | 0.860963 |
|     functorch_maml_omniglot     |    1    | 0.860341 |
|         basic_gnn_sage          |    1    | 0.858664 |
|          basic_gnn_gin          |    1    | 0.857282 |
|          fastNLP_Bert           |    1    | 0.849776 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.849523 |
|            moondream            |    1    | 0.824115 |
|       speech_transformer        |    1    | 0.818354 |
|          hf_Bert_large          |    1    | 0.809769 |
|      functorch_dp_cifar10       |   64    | 0.803834 |
|          hf_Longformer          |    1    | 0.803405 |
|           hf_T5_large           |    1    | 0.803246 |
|          BERT_pytorch           |    2    | 0.800736 |
|              maml               |    1    | 0.795605 |
|             hf_Bert             |    1    | 0.795175 |
|            hf_Albert            |    1    | 0.79132  |
|     nvidia_deeprecommender      |   256   | 0.775668 |
|              vgg16              |    4    | 0.773261 |
|             hf_Bart             |    1    | 0.769168 |
|               drq               |    1    | 0.767758 |
|          hf_DistilBert          |    1    | 0.763902 |
|             hf_GPT2             |    1    | 0.760942 |
|              hf_T5              |    1    | 0.758049 |
|           hf_Reformer           |    1    | 0.739914 |
|     pyhpc_isoneutral_mixing     | 1048576 | 0.692509 |
|        timm_efficientdet        |    0    |   0.0    |
|              moco               |    0    |   0.0    |
|         DALLE2_pytorch          |    0    |   0.0    |
+---------------------------------+---------+----------+

Absolute latency (ms)

+---------------------------------+---------+-------------+
|              name               |   bs    |  inductor   |
+---------------------------------+---------+-------------+
|  timm_vision_transformer_large  |   32    | 4458.159389 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1450.88823  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1322.219665 |
|           hf_T5_base            |    1    | 1268.594497 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1179.759818 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1097.983105 |
|          hf_GPT2_large          |    1    | 581.348281  |
|           timm_nfnet            |   128   |  566.55886  |
|           hf_T5_large           |    1    | 415.480578  |
|            moondream            |    1    | 407.141677  |
|        hf_distil_whisper        |    1    |   359.403   |
|       Background_Matting        |    1    |  352.68947  |
|          pytorch_unet           |    1    | 232.593156  |
|           timm_regnet           |   32    | 221.265401  |
|            resnet152            |   32    | 195.378496  |
|           densenet121           |   64    | 192.152077  |
|    detectron2_fcos_r_50_fpn     |    1    | 185.246375  |
|             yolov3              |    8    | 161.717767  |
|      torch_multimodal_clip      |   32    | 151.721677  |
|             demucs              |    1    | 144.024417  |
|           hf_BigBird            |    1    | 132.059911  |
|           timm_vovnet           |   32    | 118.882688  |
|          hf_Bert_large          |    1    | 112.489254  |
|         pytorch_stargan         |   16    |  96.64045   |
|     timm_vision_transformer     |   32    |  91.651267  |
|       doctr_det_predictor       |    1    |  88.014653  |
|            resnet50             |   32    |  77.20591   |
|          hf_Longformer          |    1    |  73.075555  |
|             hf_Bart             |    1    |  55.189477  |
|       speech_transformer        |    1    |  54.983743  |
|          timm_resnest           |   32    |  54.174771  |
|        timm_efficientnet        |   64    |  49.63824   |
|              maml               |    1    |  49.260561  |
|              hf_T5              |    1    |  44.494698  |
|             alexnet             |   128   |  43.998583  |
|   mobilenet_v2_quantized_qat    |   96    |  43.351636  |
|             hf_Bert             |    1    |  41.782018  |
|         LearningToPaint         |   96    |  37.996427  |
|           hf_Reformer           |    1    |  37.367509  |
|            hf_Albert            |    1    |  37.02237   |
|          fastNLP_Bert           |    1    |  34.94684   |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  34.042969  |
|              vgg16              |    4    |  32.847844  |
|     nvidia_deeprecommender      |   256   |  30.776548  |
|     pyhpc_isoneutral_mixing     | 1048576 |  29.937813  |
|          hf_DistilBert          |    1    |  27.961926  |
|     resnet50_quantized_qat      |   32    |  25.544027  |
| pyhpc_turbulent_kinetic_energy  | 1048576 |  25.364119  |
|             hf_GPT2             |    1    |  24.423957  |
|              llama              |   32    |  23.017083  |
|         resnext50_32x4d         |    8    |  22.96505   |
|          BERT_pytorch           |    2    |  21.083076  |
|           tts_angular           |   64    |  20.711099  |
|        phlippe_densenet         |   128   |  19.811917  |
|        basic_gnn_edgecnn        |    1    |  19.358296  |
|              dcgan              |   256   |  19.157939  |
|       shufflenet_v2_x1_0        |   64    |  15.583028  |
|           mnasnet1_0            |   32    |  14.469346  |
|       mobilenet_v3_large        |   32    |  13.194919  |
|          basic_gnn_gcn          |    1    |  9.861843   |
|      functorch_dp_cifar10       |   64    |  9.654813   |
|         opacus_cifar10          |   64    |  9.362123   |
|          mobilenet_v2           |   16    |  9.209157   |
|            resnet18             |    8    |  9.060898   |
|              dlrm               |  2048   |  6.766702   |
|          squeezenet1_1          |   16    |  5.645363   |
|         basic_gnn_sage          |    1    |  5.598593   |
|          basic_gnn_gin          |    1    |  5.311994   |
|         phlippe_resnet          |   128   |  4.102917   |
|      doctr_reco_predictor       |    1    |  3.513819   |
|     pyhpc_equation_of_state     | 1048576 |  1.109207   |
|               drq               |    1    |  0.966758   |
|     functorch_maml_omniglot     |    1    |  0.510645   |
|          maml_omniglot          |    5    |  0.393774   |
|        soft_actor_critic        |   256   |  0.309211   |
|          lennard_jones          |  1000   |   0.18832   |
|        timm_efficientdet        |    0    |     0.0     |
|         DALLE2_pytorch          |    0    |     0.0     |
|              moco               |    0    |     0.0     |
+---------------------------------+---------+-------------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            XLNetLMHeadModel             |  8  | 5.642331 |
|     MobileBertForQuestionAnswering      | 128 | 1.949027 |
|           ElectraForCausalLM            | 32  | 1.923362 |
|      GPT2ForSequenceClassification      |  4  | 1.910664 |
|       ElectraForQuestionAnswering       | 64  | 1.744785 |
|          MobileBertForMaskedLM          | 128 | 1.682657 |
|               DistillGPT2               | 16  | 1.516808 |
|       RobertaForQuestionAnswering       | 16  | 1.462503 |
|            YituTechConvBert             | 16  | 1.434621 |
|    LayoutLMForSequenceClassification    | 16  | 1.422052 |
|        BertForQuestionAnswering         | 16  | 1.417921 |
|               GoogleFnet                | 16  | 1.412852 |
|           RobertaForCausalLM            | 16  | 1.402202 |
|    MegatronBertForQuestionAnswering     |  8  | 1.377179 |
|           LayoutLMForMaskedLM           | 16  | 1.373278 |
|             BertForMaskedLM             | 16  | 1.356303 |
|          AllenaiLongformerBase          |  4  | 1.354067 |
|      DebertaV2ForQuestionAnswering      |  1  | 1.351167 |
|                CamemBert                | 16  | 1.347574 |
|         MegatronBertForCausalLM         |  4  | 1.33725  |
|     PLBartForConditionalGeneration      |  4  | 1.281191 |
|             XGLMForCausalLM             |  8  | 1.278798 |
|           DebertaForMaskedLM            |  8  | 1.275094 |
|       DebertaForQuestionAnswering       | 16  | 1.255698 |
|       AlbertForQuestionAnswering        |  4  | 1.254315 |
|            AlbertForMaskedLM            |  4  | 1.244007 |
|          BlenderbotForCausalLM          |  4  | 1.227528 |
|      MBartForConditionalGeneration      |  2  | 1.221295 |
| BlenderbotSmallForConditionalGeneration | 64  | 1.214994 |
|             OPTForCausalLM              |  2  | 1.213521 |
|         Speech2Text2ForCausalLM         | 256 | 1.19538  |
|          DistilBertForMaskedLM          | 128 | 1.193283 |
|          DebertaV2ForMaskedLM           |  2  | 1.185658 |
|     DistilBertForQuestionAnswering      | 256 | 1.165667 |
|       MT5ForConditionalGeneration       | 16  | 1.161368 |
|     M2M100ForConditionalGeneration      | 16  | 1.160347 |
|     PegasusForConditionalGeneration     | 32  | 1.144384 |
|       BlenderbotSmallForCausalLM        | 64  | 1.141173 |
|             BartForCausalLM             |  4  | 1.127503 |
|      BartForConditionalGeneration       |  2  | 1.124206 |
|           PegasusForCausalLM            | 32  | 1.102547 |
|            MBartForCausalLM             |  4  | 1.077679 |
|            TrOCRForCausalLM             | 32  | 1.072241 |
|            PLBartForCausalLM            |  8  | 1.059217 |
|       T5ForConditionalGeneration        |  4  | 1.00131  |
|                 T5Small                 |  4  | 0.97611  |
+-----------------------------------------+-----+----------+
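
To summarize a speedup table like the one above in a single number, one option is the geometric mean of the `inductor` column. The sketch below is not part of the dashboard tooling: `parse_table` and `geomean_speedup` are hypothetical helper names, and the choice of geometric mean is an assumption. It also assumes that rows reported as `0.0` (as seen for some torchbench models) are placeholders for models without a measurement, so it skips them. It only handles three-column numeric tables, not the Accuracy tables.

```python
import math

def parse_table(text):
    """Parse a '|'-delimited ASCII table into (name, bs, value) tuples."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +---+ border lines and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 3 or cells[0] == "name":
            continue  # skip the header row
        name, bs, value = cells
        rows.append((name, int(bs), float(value)))
    return rows

def geomean_speedup(rows):
    """Geometric mean of the value column, ignoring non-positive entries."""
    vals = [v for _, _, v in rows if v > 0]
    return math.exp(sum(math.log(v) for v in vals) / len(vals))

# Two rows copied from the huggingface speedup table above.
sample = """
+------------------+----+----------+
|       name       | bs | inductor |
+------------------+----+----------+
| XLNetLMHeadModel | 8  | 5.642331 |
|     T5Small      | 4  | 0.97611  |
+------------------+----+----------+
"""
rows = parse_table(sample)
print(geomean_speedup(rows))
```

Pasting a full table into `sample` gives the summary for that suite.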

Accuracy

+-----------------------------------------+----+------------------+
|                  name                   | bs |     inductor     |
+-----------------------------------------+----+------------------+
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip |
|                CamemBert                | 1  |       pass       |
|       AlbertForQuestionAnswering        | 1  |       pass       |
|           DebertaForMaskedLM            | 1  |       pass       |
|          AllenaiLongformerBase          | 1  |       pass       |
|             BartForCausalLM             | 1  |       pass       |
|      BartForConditionalGeneration       | 1  |       pass       |
|             BertForMaskedLM             | 1  |       pass       |
|        BertForQuestionAnswering         | 1  |       pass       |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |
|       DebertaForQuestionAnswering       | 1  |       pass       |
|           LayoutLMForMaskedLM           | 1  |       pass       |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |
|          DistilBertForMaskedLM          | 1  |       pass       |
|     DistilBertForQuestionAnswering      | 1  |       pass       |
|               DistillGPT2               | 1  |       pass       |
|           ElectraForCausalLM            | 1  |       pass       |
|       ElectraForQuestionAnswering       | 1  |       pass       |
|      GPT2ForSequenceClassification      | 1  |       pass       |
|               GoogleFnet                | 1  |       pass       |
|    LayoutLMForSequenceClassification    | 1  |       pass       |
|            MBartForCausalLM             | 1  |       pass       |
|            XLNetLMHeadModel             | 1  |       pass       |
|             XGLMForCausalLM             | 1  |       pass       |
|            AlbertForMaskedLM            | 1  |       pass       |
|      MBartForConditionalGeneration      | 1  |       pass       |
|       MT5ForConditionalGeneration       | 1  |       pass       |
|         MegatronBertForCausalLM         | 1  |       pass       |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |
|          MobileBertForMaskedLM          | 1  |       pass       |
|     MobileBertForQuestionAnswering      | 1  |       pass       |
|             OPTForCausalLM              | 1  |       pass       |
|            PLBartForCausalLM            | 1  |       pass       |
|     PLBartForConditionalGeneration      | 1  |       pass       |
|           PegasusForCausalLM            | 1  |       pass       |
|     M2M100ForConditionalGeneration      | 1  |       pass       |
|     PegasusForConditionalGeneration     | 1  |       pass       |
|           RobertaForCausalLM            | 1  |       pass       |
|       RobertaForQuestionAnswering       | 1  |       pass       |
|         Speech2Text2ForCausalLM         | 1  |       pass       |
|       T5ForConditionalGeneration        | 1  |       pass       |
|                 T5Small                 | 1  |       pass       |
|            TrOCRForCausalLM             | 1  |       pass       |
|            YituTechConvBert             | 1  |       pass       |
+-----------------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------------+-----+-----------+
|                  name                   | bs  | inductor  |
+-----------------------------------------+-----+-----------+
|          AllenaiLongformerBase          |  4  | 94.515067 |
|          MobileBertForMaskedLM          | 128 | 46.840679 |
|     MobileBertForQuestionAnswering      | 128 | 44.499921 |
|     PegasusForConditionalGeneration     | 32  | 43.740218 |
|     M2M100ForConditionalGeneration      | 16  | 43.368356 |
|       MT5ForConditionalGeneration       | 16  | 42.696219 |
|      MBartForConditionalGeneration      |  2  | 42.250231 |
|                 T5Small                 |  4  | 38.917252 |
|          BlenderbotForCausalLM          |  4  | 38.789907 |
|       T5ForConditionalGeneration        |  4  | 38.724563 |
|             XGLMForCausalLM             |  8  | 34.939155 |
|          DebertaV2ForMaskedLM           |  2  | 34.424391 |
| BlenderbotSmallForConditionalGeneration | 64  | 33.622612 |
|      DebertaV2ForQuestionAnswering      |  1  | 32.105603 |
|     PLBartForConditionalGeneration      |  4  | 31.347862 |
|      BartForConditionalGeneration       |  2  | 30.519797 |
|            YituTechConvBert             | 16  | 30.347498 |
|         MegatronBertForCausalLM         |  4  | 29.950565 |
|    MegatronBertForQuestionAnswering     |  8  | 28.509412 |
|             OPTForCausalLM              |  2  | 27.693795 |
|           PegasusForCausalLM            | 32  | 26.256617 |
|            MBartForCausalLM             |  4  | 25.492264 |
|            TrOCRForCausalLM             | 32  | 24.903591 |
|           DebertaForMaskedLM            |  8  | 24.261271 |
|       DebertaForQuestionAnswering       | 16  | 23.414588 |
|           RobertaForCausalLM            | 16  | 21.744295 |
|           ElectraForCausalLM            | 32  | 21.603614 |
|      GPT2ForSequenceClassification      |  4  | 21.407819 |
|          DistilBertForMaskedLM          | 128 | 21.355355 |
|            AlbertForMaskedLM            |  4  | 21.184796 |
|                CamemBert                | 16  | 21.062728 |
|       BlenderbotSmallForCausalLM        | 64  | 20.936202 |
|         Speech2Text2ForCausalLM         | 256 | 20.406964 |
|     DistilBertForQuestionAnswering      | 256 | 20.351164 |
|            PLBartForCausalLM            |  8  | 20.091997 |
|             BertForMaskedLM             | 16  | 19.809078 |
|           LayoutLMForMaskedLM           | 16  | 19.801941 |
|       AlbertForQuestionAnswering        |  4  | 19.733376 |
|       ElectraForQuestionAnswering       | 64  | 19.728123 |
|       RobertaForQuestionAnswering       | 16  | 19.547031 |
|    LayoutLMForSequenceClassification    | 16  | 19.432253 |
|               DistillGPT2               | 16  | 19.380242 |
|             BartForCausalLM             |  4  | 19.217346 |
|               GoogleFnet                | 16  | 18.647136 |
|        BertForQuestionAnswering         | 16  | 18.400382 |
|            XLNetLMHeadModel             |  8  | 15.350487 |
+-----------------------------------------+-----+-----------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+----------+
|                  name                   | bs  | inductor |
+-----------------------------------------+-----+----------+
|            AlbertForMaskedLM            |  4  | 0.994437 |
|       AlbertForQuestionAnswering        |  4  | 0.994353 |
|     DistilBertForQuestionAnswering      | 256 | 0.993274 |
|             OPTForCausalLM              |  2  | 0.993069 |
|           RobertaForCausalLM            | 16  | 0.992442 |
|            TrOCRForCausalLM             | 32  | 0.992294 |
|               DistillGPT2               | 16  | 0.991842 |
|          DistilBertForMaskedLM          | 128 | 0.991805 |
|               GoogleFnet                | 16  | 0.991121 |
|           ElectraForCausalLM            | 32  | 0.990989 |
|       ElectraForQuestionAnswering       | 64  | 0.990783 |
|            PLBartForCausalLM            |  8  | 0.990671 |
|                CamemBert                | 16  | 0.990547 |
|             BertForMaskedLM             | 16  | 0.990383 |
|            MBartForCausalLM             |  4  | 0.989828 |
|           LayoutLMForMaskedLM           | 16  | 0.989826 |
|       RobertaForQuestionAnswering       | 16  | 0.98881  |
|            YituTechConvBert             | 16  | 0.988803 |
|    LayoutLMForSequenceClassification    | 16  | 0.988682 |
|         Speech2Text2ForCausalLM         | 256 | 0.988666 |
|    MegatronBertForQuestionAnswering     |  8  | 0.988624 |
|        BertForQuestionAnswering         | 16  | 0.988614 |
|       DebertaForQuestionAnswering       | 16  | 0.988572 |
|     PLBartForConditionalGeneration      |  4  | 0.988095 |
| BlenderbotSmallForConditionalGeneration | 64  | 0.988036 |
|      GPT2ForSequenceClassification      |  4  | 0.987649 |
|           PegasusForCausalLM            | 32  | 0.987332 |
|          MobileBertForMaskedLM          | 128 | 0.987155 |
|       BlenderbotSmallForCausalLM        | 64  | 0.98655  |
|             BartForCausalLM             |  4  | 0.986055 |
|            XLNetLMHeadModel             |  8  | 0.985244 |
|           DebertaForMaskedLM            |  8  | 0.985172 |
|         MegatronBertForCausalLM         |  4  | 0.984141 |
|       T5ForConditionalGeneration        |  4  | 0.984136 |
|                 T5Small                 |  4  | 0.983833 |
|     MobileBertForQuestionAnswering      | 128 | 0.982698 |
|          AllenaiLongformerBase          |  4  | 0.978499 |
|     PegasusForConditionalGeneration     | 32  | 0.977279 |
|      MBartForConditionalGeneration      |  2  | 0.976111 |
|      BartForConditionalGeneration       |  2  | 0.968408 |
|       MT5ForConditionalGeneration       | 16  | 0.963659 |
|     M2M100ForConditionalGeneration      | 16  | 0.927444 |
|          DebertaV2ForMaskedLM           |  2  | 0.907388 |
|             XGLMForCausalLM             |  8  | 0.874849 |
|      DebertaV2ForQuestionAnswering      |  1  | 0.869467 |
|          BlenderbotForCausalLM          |  4  | 0.843614 |
+-----------------------------------------+-----+----------+

Absolute latency (ms)

+-----------------------------------------+-----+-------------+
|                  name                   | bs  |  inductor   |
+-----------------------------------------+-----+-------------+
|            AlbertForMaskedLM            |  4  | 2523.98021  |
|       AlbertForQuestionAnswering        |  4  | 2509.956032 |
|            XLNetLMHeadModel             |  8  | 1293.395053 |
|            TrOCRForCausalLM             | 32  | 974.332085  |
|     PegasusForConditionalGeneration     | 32  |  939.71002  |
|     DistilBertForQuestionAnswering      | 256 | 849.227001  |
|    MegatronBertForQuestionAnswering     |  8  | 751.695785  |
|      MBartForConditionalGeneration      |  2  | 670.435445  |
|            MBartForCausalLM             |  4  | 666.937103  |
|          DistilBertForMaskedLM          | 128 | 635.861051  |
|           RobertaForCausalLM            | 16  | 609.028388  |
|      BartForConditionalGeneration       |  2  | 593.414902  |
|             OPTForCausalLM              |  2  | 585.147334  |
|          BlenderbotForCausalLM          |  4  |  585.10315  |
|     M2M100ForConditionalGeneration      | 16  | 580.907847  |
|          DebertaV2ForMaskedLM           |  2  | 578.157411  |
|            YituTechConvBert             | 16  | 577.248382  |
|                CamemBert                | 16  | 565.701006  |
|             BertForMaskedLM             | 16  | 558.416378  |
|           LayoutLMForMaskedLM           | 16  | 555.111118  |
|          AllenaiLongformerBase          |  4  |  523.9516   |
|       DebertaForQuestionAnswering       | 16  | 523.095068  |
|            PLBartForCausalLM            |  8  | 518.487967  |
|             BartForCausalLM             |  4  |  497.47575  |
| BlenderbotSmallForConditionalGeneration | 64  | 471.046534  |
|           PegasusForCausalLM            | 32  |  470.82609  |
|     PLBartForConditionalGeneration      |  4  |  463.85925  |
|        BertForQuestionAnswering         | 16  | 447.433811  |
|    LayoutLMForSequenceClassification    | 16  | 447.292297  |
|         MegatronBertForCausalLM         |  4  | 439.295504  |
|       RobertaForQuestionAnswering       | 16  | 425.868319  |
|                 T5Small                 |  4  | 407.190654  |
|               GoogleFnet                | 16  | 400.331415  |
|       T5ForConditionalGeneration        |  4  | 396.601812  |
|               DistillGPT2               | 16  | 387.544764  |
|          MobileBertForMaskedLM          | 128 | 379.263853  |
|           DebertaForMaskedLM            |  8  | 349.299331  |
|             XGLMForCausalLM             |  8  |  332.02241  |
|       ElectraForQuestionAnswering       | 64  |  323.91666  |
|       BlenderbotSmallForCausalLM        | 64  | 270.889248  |
|      GPT2ForSequenceClassification      |  4  | 258.313927  |
|         Speech2Text2ForCausalLM         | 256 | 257.769619  |
|      DebertaV2ForQuestionAnswering      |  1  | 241.506581  |
|       MT5ForConditionalGeneration       | 16  |  235.89164  |
|           ElectraForCausalLM            | 32  | 229.446018  |
|     MobileBertForQuestionAnswering      | 128 |  225.75036  |
+-----------------------------------------+-----+-------------+
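
The absolute latency above can be combined with the speedup table to estimate the native PyTorch latency. This is a minimal sketch, assuming that speedup is defined as eager latency divided by inductor latency and that both tables were measured at the same batch size. `estimated_eager_latency_ms` is a hypothetical helper, not something the benchmark scripts provide.

```python
def estimated_eager_latency_ms(inductor_latency_ms, speedup):
    """Estimate eager latency, assuming speedup = eager / inductor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return inductor_latency_ms * speedup

# Values for XLNetLMHeadModel (bs 8), taken from the tables in this comment.
xlnet_inductor_ms = 1293.395053
xlnet_speedup = 5.642331
xlnet_eager_ms = estimated_eager_latency_ms(xlnet_inductor_ms, xlnet_speedup)
print(f"estimated eager latency: {xlnet_eager_ms:.1f} ms")
```

Treat the result as a rough estimate, since the two tables may have been produced by separate runs.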

timm_models suite with float32 precision

Performance speedup

+---------------------------------+------+----------+
|              name               |  bs  | inductor |
+---------------------------------+------+----------+
|           fbnetc_100            | 512  | 3.931372 |
|            lcnet_050            | 256  | 3.871837 |
|           mnasnet_100           | 512  | 3.844241 |
|         mobilenetv2_100         | 128  | 3.761202 |
|      mobilenetv3_large_100      | 512  | 3.583824 |
|          spnasnet_100           | 128  | 3.54718  |
|            fbnetv3_b            | 256  | 3.401955 |
|           regnety_002           | 1024 | 3.269489 |
|           rexnet_100            | 256  | 3.081274 |
|       tf_efficientnet_b0        | 128  | 2.952073 |
|            tinynet_a            | 128  | 2.763909 |
|        ese_vovnet19b_dw         | 256  | 2.598478 |
|          pnasnet5large          |  16  | 2.594916 |
|            hrnet_w18            | 128  | 2.530393 |
|          botnet26t_256          | 128  | 2.50706  |
|           res2next50            | 128  | 2.386044 |
|          ghostnet_100           | 512  | 2.316504 |
|       eca_botnext26ts_256       | 128  | 2.308799 |
|       gluon_inception_v3        | 256  | 2.249178 |
|          inception_v3           | 128  | 2.200322 |
|           resnest101e           |  64  | 2.199182 |
|             dla102              | 128  | 2.175545 |
|        adv_inception_v3         | 128  | 2.174452 |
|        eca_halonext26ts         | 128  | 2.167506 |
|        res2net50_14w_8s         | 128  | 2.112108 |
|        res2net101_26w_4s        | 128  | 2.101049 |
|          cspdarknet53           |  64  | 2.031624 |
|            repvgg_a2            | 128  | 2.003698 |
|            nfnet_l0             | 128  | 1.954943 |
|        convmixer_768_32         |  32  | 1.94824  |
|            gernet_l             | 128  | 1.888558 |
|           tf_mixnet_l           | 128  | 1.874402 |
|           dm_nfnet_f0           | 128  | 1.781322 |
|           selecsls42b           | 128  | 1.765147 |
|        sebotnet33ts_256         |  64  | 1.752209 |
|            mixnet_l             | 128  | 1.721802 |
|           volo_d1_224           |  64  | 1.696393 |
|           mobilevit_s           |  64  | 1.679228 |
|         visformer_small         | 128  | 1.63443  |
|         poolformer_m36          |  64  | 1.626384 |
|     swsl_resnext101_32x16d      |  32  | 1.589809 |
|           convit_base           |  64  | 1.544051 |
|             dpn107              |  64  | 1.492609 |
|            levit_128            | 1024 | 1.459376 |
|          gmlp_s16_224           | 128  | 1.386802 |
|      xcit_large_24_p8_224       |  16  | 1.334085 |
|          gmixer_24_224          | 128  | 1.327803 |
|  swin_base_patch4_window7_224   |  64  | 1.275019 |
|        twins_pcpvt_base         | 128  | 1.217048 |
|          mixer_b16_224          | 128  | 1.211686 |
|        tnt_s_patch16_224        | 128  | 1.197513 |
|          convnext_base          |  64  | 1.184228 |
|      beit_base_patch16_224      |  64  | 1.171964 |
| deit_base_distilled_patch16_224 |  64  | 1.149566 |
|      vit_base_patch16_224       |  64  | 1.149049 |
|          cait_m36_384           |  4   | 1.140425 |
|            pit_b_224            |  64  | 1.109138 |
|          jx_nest_base           |  32  | 1.081273 |
|         crossvit_9_240          | 256  | 1.061381 |
|          resmlp_12_224          | 128  | 0.749781 |
+---------------------------------+------+----------+

Accuracy

+---------------------------------+----+-------------+
|              name               | bs |  inductor   |
+---------------------------------+----+-------------+
|        adv_inception_v3         | 8  |    pass     |
|      beit_base_patch16_224      | 8  |    pass     |
|          botnet26t_256          | 8  |    pass     |
|          cait_m36_384           | 8  |    pass     |
|           convit_base           | 8  |    pass     |
|        convmixer_768_32         | 8  |    pass     |
|          convnext_base          | 8  |    pass     |
|         crossvit_9_240          | 8  |    pass     |
|          cspdarknet53           | 8  |    pass     |
| deit_base_distilled_patch16_224 | 8  |    pass     |
|             dla102              | 8  |    pass     |
|           dm_nfnet_f0           | 8  |    pass     |
|             dpn107              | 8  |    pass     |
|       eca_botnext26ts_256       | 8  |    pass     |
|        eca_halonext26ts         | 8  |    pass     |
|        ese_vovnet19b_dw         | 8  |    pass     |
|           fbnetc_100            | 8  |    pass     |
|            fbnetv3_b            | 8  |    pass     |
|            gernet_l             | 8  |    pass     |
|          ghostnet_100           | 8  |    pass     |
|       gluon_inception_v3        | 8  |    pass     |
|          gmixer_24_224          | 8  |    pass     |
|          gmlp_s16_224           | 8  |    pass     |
|            hrnet_w18            | 8  |    pass     |
|          inception_v3           | 8  |    pass     |
|          jx_nest_base           | 8  |    pass     |
|            lcnet_050            | 8  |    pass     |
|            levit_128            | 8  |    pass     |
|      xcit_large_24_p8_224       | 8  |    pass     |
|          mixer_b16_224          | 8  |    pass     |
|            mixnet_l             | 8  |    pass     |
|           mnasnet_100           | 8  |    pass     |
|         mobilenetv2_100         | 8  |    pass     |
|      mobilenetv3_large_100      | 8  |    pass     |
|           mobilevit_s           | 8  |    pass     |
|            nfnet_l0             | 8  |    pass     |
|            pit_b_224            | 8  |    pass     |
|          pnasnet5large          | 8  |    pass     |
|         poolformer_m36          | 8  |    pass     |
|           regnety_002           | 8  |    pass     |
|            repvgg_a2            | 8  |    pass     |
|        res2net101_26w_4s        | 8  |    pass     |
|        res2net50_14w_8s         | 8  |    pass     |
|           res2next50            | 8  |    pass     |
|          resmlp_12_224          | 8  |    pass     |
|           resnest101e           | 8  |    pass     |
|           rexnet_100            | 8  |    pass     |
|        sebotnet33ts_256         | 8  |    pass     |
|           selecsls42b           | 8  |    pass     |
|          spnasnet_100           | 8  |    pass     |
|  swin_base_patch4_window7_224   | 8  |    pass     |
|     swsl_resnext101_32x16d      | 8  |    pass     |
|       tf_efficientnet_b0        | 8  |    pass     |
|           tf_mixnet_l           | 8  |    pass     |
|            tinynet_a            | 8  |    pass     |
|        tnt_s_patch16_224        | 8  |    pass     |
|        twins_pcpvt_base         | 8  |    pass     |
|         visformer_small         | 8  |    pass     |
|      vit_base_patch16_224       | 8  |    pass     |
|           volo_d1_224           | 8  |    pass     |
|         coat_lite_mini          | 8  | fail_to_run |
+---------------------------------+----+-------------+

Compilation latency (sec)

+---------------------------------+------+-----------+
|              name               |  bs  | inductor  |
+---------------------------------+------+-----------+
|          pnasnet5large          |  16  | 95.405403 |
|  swin_base_patch4_window7_224   |  64  | 90.937476 |
|           mobilevit_s           |  64  | 85.010102 |
|           tf_mixnet_l           | 128  | 79.671263 |
|             dpn107              |  64  | 78.542206 |
|        twins_pcpvt_base         | 128  | 71.953437 |
|           rexnet_100            | 256  | 70.621991 |
|        eca_halonext26ts         | 128  | 67.676238 |
|        res2net50_14w_8s         | 128  | 65.373495 |
|        sebotnet33ts_256         |  64  | 65.11767  |
|      xcit_large_24_p8_224       |  16  | 64.917991 |
|          jx_nest_base           |  32  | 64.835915 |
|          ghostnet_100           | 512  | 64.123309 |
|            levit_128            | 1024 | 63.146422 |
|          cait_m36_384           |  4   | 62.430109 |
|        tnt_s_patch16_224        | 128  | 59.270101 |
|            mixnet_l             | 128  | 57.977893 |
|         crossvit_9_240          | 256  | 57.540743 |
|           dm_nfnet_f0           | 128  | 56.596429 |
|         poolformer_m36          |  64  | 56.564284 |
|       eca_botnext26ts_256       | 128  | 53.857627 |
|           volo_d1_224           |  64  | 51.33121  |
|          convnext_base          |  64  | 45.131894 |
|        res2net101_26w_4s        | 128  | 45.098906 |
|       tf_efficientnet_b0        | 128  | 44.370858 |
|            nfnet_l0             | 128  | 43.743498 |
|            hrnet_w18            | 128  | 43.393409 |
|       gluon_inception_v3        | 256  | 42.06222  |
|           convit_base           |  64  | 41.357835 |
|          botnet26t_256          | 128  | 40.135466 |
|        adv_inception_v3         | 128  | 39.945737 |
|          inception_v3           | 128  | 39.932258 |
|           res2next50            | 128  | 38.967354 |
|            tinynet_a            | 128  | 38.061125 |
|            pit_b_224            |  64  | 36.600257 |
|           resnest101e           |  64  | 32.955511 |
|            fbnetv3_b            | 256  | 32.213171 |
|         visformer_small         | 128  | 30.350447 |
|          gmlp_s16_224           | 128  | 29.686651 |
|        ese_vovnet19b_dw         | 256  | 29.640883 |
|          cspdarknet53           |  64  | 29.557525 |
|             dla102              | 128  | 29.034938 |
|          gmixer_24_224          | 128  | 28.478483 |
|      mobilenetv3_large_100      | 512  | 27.248478 |
|      vit_base_patch16_224       |  64  | 23.047192 |
|      beit_base_patch16_224      |  64  | 21.992896 |
|          mixer_b16_224          | 128  | 21.878277 |
| deit_base_distilled_patch16_224 |  64  | 21.792315 |
|           regnety_002           | 1024 | 20.960516 |
|          resmlp_12_224          | 128  | 19.473978 |
|        convmixer_768_32         |  32  | 17.608071 |
|           selecsls42b           | 128  | 17.033365 |
|            repvgg_a2            | 128  | 16.917929 |
|            lcnet_050            | 256  | 14.716653 |
|     swsl_resnext101_32x16d      |  32  | 13.887829 |
|          spnasnet_100           | 128  | 11.084832 |
|         mobilenetv2_100         | 128  | 10.943169 |
|            gernet_l             | 128  | 10.635288 |
|           fbnetc_100            | 512  |   10.55   |
|           mnasnet_100           | 512  | 9.810286  |
+---------------------------------+------+-----------+

Peak Memory Compression Ratio

+---------------------------------+------+----------+
|              name               |  bs  | inductor |
+---------------------------------+------+----------+
|        ese_vovnet19b_dw         | 256  | 0.997487 |
|           fbnetc_100            | 512  | 0.99664  |
|            fbnetv3_b            | 256  | 0.99648  |
|      mobilenetv3_large_100      | 512  | 0.996351 |
|           mnasnet_100           | 512  | 0.995546 |
|           regnety_002           | 1024 | 0.995149 |
|           dm_nfnet_f0           | 128  | 0.995084 |
|          ghostnet_100           | 512  | 0.994913 |
|       eca_botnext26ts_256       | 128  | 0.994289 |
|            levit_128            | 1024 | 0.993948 |
|          convnext_base          |  64  | 0.993858 |
|           rexnet_100            | 256  | 0.993382 |
|        eca_halonext26ts         | 128  | 0.993378 |
|            nfnet_l0             | 128  | 0.993064 |
|        res2net101_26w_4s        | 128  | 0.992985 |
|          botnet26t_256          | 128  | 0.992754 |
|           res2next50            | 128  | 0.992744 |
|             dpn107              |  64  | 0.992657 |
|           tf_mixnet_l           | 128  | 0.992589 |
|        convmixer_768_32         |  32  | 0.992567 |
|        twins_pcpvt_base         | 128  | 0.992302 |
|       tf_efficientnet_b0        | 128  | 0.992244 |
|          cspdarknet53           |  64  | 0.992097 |
|          gmlp_s16_224           | 128  | 0.991828 |
|       gluon_inception_v3        | 256  | 0.991734 |
|            gernet_l             | 128  | 0.991537 |
|         visformer_small         | 128  | 0.991418 |
|            mixnet_l             | 128  |  0.9912  |
|          mixer_b16_224          | 128  | 0.990853 |
|          gmixer_24_224          | 128  | 0.990852 |
|           mobilevit_s           |  64  | 0.990834 |
|        sebotnet33ts_256         |  64  | 0.990805 |
|      xcit_large_24_p8_224       |  16  | 0.990472 |
|         mobilenetv2_100         | 128  | 0.990439 |
|        res2net50_14w_8s         | 128  | 0.990222 |
|             dla102              | 128  | 0.989479 |
|           selecsls42b           | 128  | 0.989276 |
|        adv_inception_v3         | 128  | 0.989065 |
|          inception_v3           | 128  | 0.988994 |
|           convit_base           |  64  | 0.988704 |
|  swin_base_patch4_window7_224   |  64  | 0.988422 |
|          spnasnet_100           | 128  | 0.988185 |
|      beit_base_patch16_224      |  64  | 0.988149 |
|         poolformer_m36          |  64  | 0.987364 |
|            tinynet_a            | 128  | 0.987341 |
|            hrnet_w18            | 128  | 0.987322 |
|        tnt_s_patch16_224        | 128  | 0.987201 |
|      vit_base_patch16_224       |  64  | 0.986243 |
|          resmlp_12_224          | 128  | 0.98623  |
| deit_base_distilled_patch16_224 |  64  | 0.986113 |
|           resnest101e           |  64  | 0.985189 |
|          pnasnet5large          |  16  | 0.984397 |
|            lcnet_050            | 256  | 0.984188 |
|            pit_b_224            |  64  | 0.984047 |
|            repvgg_a2            | 128  | 0.981154 |
|          jx_nest_base           |  32  | 0.980158 |
|     swsl_resnext101_32x16d      |  32  | 0.979793 |
|           volo_d1_224           |  64  | 0.978895 |
|          cait_m36_384           |  4   | 0.978458 |
|         crossvit_9_240          | 256  | 0.967708 |
+---------------------------------+------+----------+

Absolute latency (ms)

+---------------------------------+------+-------------+
|              name               |  bs  |  inductor   |
+---------------------------------+------+-------------+
|      xcit_large_24_p8_224       |  16  | 1331.043876 |
|          cait_m36_384           |  4   | 1139.641765 |
|          convnext_base          |  64  | 1053.728463 |
|           dm_nfnet_f0           | 128  |  998.72541  |
|          mixer_b16_224          | 128  |  939.60154  |
|             dpn107              |  64  | 937.544955  |
|       gluon_inception_v3        | 256  | 834.617091  |
|        tnt_s_patch16_224        | 128  | 763.794654  |
|        twins_pcpvt_base         | 128  | 741.706585  |
|  swin_base_patch4_window7_224   |  64  | 738.430406  |
|           convit_base           |  64  | 715.862157  |
|        res2net101_26w_4s        | 128  | 656.441371  |
|     swsl_resnext101_32x16d      |  32  | 647.774675  |
| deit_base_distilled_patch16_224 |  64  | 636.926343  |
|      vit_base_patch16_224       |  64  | 633.717513  |
|      beit_base_patch16_224      |  64  | 628.356451  |
|            nfnet_l0             | 128  | 623.384892  |
|            levit_128            | 1024 | 568.291227  |
|        ese_vovnet19b_dw         | 256  | 558.509578  |
|             dla102              | 128  | 537.852015  |
|            pit_b_224            |  64  | 526.514473  |
|          gmlp_s16_224           | 128  | 517.179569  |
|          jx_nest_base           |  32  |  506.09021  |
|           resnest101e           |  64  | 492.451653  |
|          gmixer_24_224          | 128  |  492.14304  |
|         crossvit_9_240          | 256  | 487.310474  |
|         poolformer_m36          |  64  | 483.418468  |
|        convmixer_768_32         |  32  | 445.373593  |
|            hrnet_w18            | 128  | 438.818824  |
|          resmlp_12_224          | 128  | 426.785736  |
|          inception_v3           | 128  | 419.937564  |
|           volo_d1_224           |  64  | 419.827417  |
|        adv_inception_v3         | 128  | 418.868547  |
|        res2net50_14w_8s         | 128  | 406.684172  |
|         visformer_small         | 128  | 401.162707  |
|           res2next50            | 128  | 363.991799  |
|          ghostnet_100           | 512  | 361.389727  |
|            mixnet_l             | 128  | 356.349867  |
|           tf_mixnet_l           | 128  | 346.218024  |
|            repvgg_a2            | 128  | 345.557403  |
|          pnasnet5large          |  16  | 343.917284  |
|        eca_halonext26ts         | 128  | 322.765476  |
|           fbnetc_100            | 512  | 306.228426  |
|            gernet_l             | 128  | 301.176915  |
|       eca_botnext26ts_256       | 128  | 297.455535  |
|        sebotnet33ts_256         |  64  | 285.094174  |
|          botnet26t_256          | 128  | 284.982224  |
|           regnety_002           | 1024 | 282.444465  |
|          cspdarknet53           |  64  | 265.651892  |
|           mnasnet_100           | 512  | 258.904042  |
|            fbnetv3_b            | 256  | 246.193927  |
|           selecsls42b           | 128  | 234.242779  |
|      mobilenetv3_large_100      | 512  | 232.490607  |
|           rexnet_100            | 256  | 226.273406  |
|           mobilevit_s           |  64  | 225.088929  |
|       tf_efficientnet_b0        | 128  | 118.279648  |
|            tinynet_a            | 128  |  85.224101  |
|         mobilenetv2_100         | 128  |  72.265875  |
|          spnasnet_100           | 128  |  65.728744  |
|            lcnet_050            | 256  |  26.748234  |
+---------------------------------+------+-------------+

@zxd1997066
Contributor

[default] Performance Dashboard for float32 precision -- Single-core Single-thread (2024-05-06 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward and backward passes for training, and one iteration of the forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.
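
The speedup normalization described above can be sketched as follows. This is a minimal illustration and not the benchmark harness itself: the function name `speedup` and the use of the median over repeated timings are assumptions made for the example, and the timings are made up.

```python
from statistics import median


def speedup(eager_ms, compiled_ms):
    """Return eager latency divided by compiled latency.

    Values above 1.0 mean the compiled model is faster than native PyTorch.
    Taking the median of repeated timings is an assumption of this sketch.
    """
    if not eager_ms or not compiled_ms:
        raise ValueError("both timing lists must be non-empty")
    return median(eager_ms) / median(compiled_ms)


# Hypothetical timings in milliseconds (not taken from the dashboard).
eager_timings = [10.0, 12.0, 11.0]
compiled_timings = [5.0, 5.5, 6.0]
print(f"{speedup(eager_timings, compiled_timings):.2f}x")
```

Under this convention a value below 1.0 in the "Performance speedup" tables indicates a model that runs slower with inductor than with native PyTorch.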

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. Experimental setup does not have optimizer.

SW information:

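+-------------------+--------+------------------------------------------+
| SW                | Branch | Commit                                   |
+-------------------+--------+------------------------------------------+
| PyTorch           | main   | fc183f0bdec30baa6686e13720adae077c332bdd |
| Torchbench        | main   | d6015d42                                 |
| torchaudio        | main   | 2.2.0a0+ea437b3                          |
| torchtext         | main   | 0.16.0a0+b0ebddc                         |
| torchvision       | main   | 0.19.0a0+06ad737                         |
| torchdata         | main   | 0.7.1a0+0790338                          |
| dynamo_benchmarks | main   | nightly                                  |
+-------------------+--------+------------------------------------------+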

HW information

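+------------------+----------------------------------------------------------------+
| Item             | Value                                                          |
+------------------+----------------------------------------------------------------+
| Manufacturer     | Amazon EC2                                                     |
| Product Name     | c6i.16xlarge                                                   |
| CPU Model        | Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz                  |
| Installed Memory | 128GB (1x128GB DDR4 3200 MT/s [Unknown])                       |
| OS               | Ubuntu 22.04.2 LTS                                             |
| Kernel           | 5.19.0-1022-aws                                                |
| Microcode        | 0xd000389                                                      |
| GCC              | gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0                      |
| GLIBC            | ldd (Ubuntu GLIBC 2.35-0ubuntu3.1) 2.35                        |
| Binutils         | GNU ld (GNU Binutils for Ubuntu) 2.38                          |
| Python           | Python 3.10.6                                                  |
| OpenSSL          | OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022) |
+------------------+----------------------------------------------------------------+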

Test command

export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
export OMP_NUM_THREADS=1

python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--core_list 0 --ncores_per_instance 1" --devices=cpu --dtypes=float32 --inference --compilers=inductor --batch_size=1 --threads 1 --extra-args="--timeout 9000"

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor | 96%, 76/79 | 100%, 46/46 | 100%, 60/60 |
+----------+------------+-------------+-------------+
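
As a small illustration of how a pass-rate cell such as `96%, 76/79` relates to its counts, the sketch below formats a passed/total pair. The helper name `format_passrate` and the rounding to the nearest integer are assumptions; how the dashboard chooses the denominator is not stated here and is not modeled.

```python
def format_passrate(passed, total):
    """Format a pass rate as '<percent>%, <passed>/<total>'.

    Rounding to the nearest integer percent is an assumption of this sketch.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return f"{round(100 * passed / total)}%, {passed}/{total}"


# Counts taken from the Passrate table above.
print(format_passrate(76, 79))
print(format_passrate(46, 46))
```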

Geometric mean speedup

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   1.56x    |    1.19x    |    1.50x    |
+----------+------------+-------------+-------------+
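
For readers who want to recompute a geometric mean speedup from per-model numbers, here is a minimal sketch. The function name `geomean_speedup` is hypothetical, and dropping non-positive entries (such as the `0.0` rows reported for models that fail to load) is an assumption about how failed models are excluded, based on the note that models failing accuracy checks are removed.

```python
import math


def geomean_speedup(speedups):
    """Geometric mean of per-model speedups, ignoring non-positive entries.

    Entries <= 0.0 are treated as failed models and skipped (an assumption).
    """
    valid = [s for s in speedups if s > 0.0]
    if not valid:
        raise ValueError("no valid speedups to average")
    # exp(mean(log(x))) avoids overflow from multiplying many values.
    return math.exp(sum(math.log(s) for s in valid) / len(valid))


# Illustrative values only (not taken from the dashboard).
sample = [4.0, 1.0, 0.0]
print(f"{geomean_speedup(sample):.2f}x")
```

The geometric mean is the usual choice for averaging ratios, because a 2x speedup and a 0.5x slowdown cancel out to 1x rather than averaging to 1.25x.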

Mean compilation time (seconds)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   25.64    |    26.78    |    36.31    |
+----------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------+------------+-------------+-------------+
| Compiler | torchbench | huggingface | timm_models |
+----------+------------+-------------+-------------+
| inductor |   0.86x    |    0.81x    |    0.82x    |
+----------+------------+-------------+-------------+
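
The "higher is better" reading can be illustrated with a short sketch. It assumes the ratio is the native PyTorch peak memory divided by the inductor peak memory, which is consistent with the heading but is not spelled out in this comment; the function name and the numbers are hypothetical.

```python
def compression_ratio(eager_peak_mb, compiled_peak_mb):
    """Peak memory of the eager run divided by that of the compiled run.

    Assumed definition: values below 1.0 mean the compiled run used more
    peak memory than native PyTorch.
    """
    if compiled_peak_mb <= 0:
        raise ValueError("compiled peak memory must be positive")
    return eager_peak_mb / compiled_peak_mb


# Hypothetical peaks in MB: the compiled run needs more memory here.
print(f"{compression_ratio(860.0, 1000.0):.2f}x")
```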

torchbench suite with float32 precision

Performance speedup

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|     pyhpc_isoneutral_mixing     |    1    | 64.445867 |
|     pyhpc_equation_of_state     |    1    | 22.882589 |
|          maml_omniglot          |    5    | 3.813073  |
|         basic_gnn_sage          |    1    | 3.528025  |
|          basic_gnn_gin          |    1    | 3.464426  |
|          basic_gnn_gcn          |    1    | 3.404701  |
|     functorch_maml_omniglot     |    1    |  3.26743  |
|          squeezenet1_1          |    1    | 3.248894  |
|           timm_nfnet            |    1    | 2.762415  |
|         opacus_cifar10          |    1    | 2.225202  |
|            resnet18             |    1    | 2.185349  |
|              dcgan              |    1    | 2.143119  |
|      functorch_dp_cifar10       |    1    | 2.042973  |
|       shufflenet_v2_x1_0        |    1    | 2.017098  |
|          timm_resnest           |    1    | 1.966796  |
|          mobilenet_v2           |    1    | 1.871454  |
|          lennard_jones          |    1    | 1.834938  |
|            resnet50             |    1    | 1.767576  |
|           mnasnet1_0            |    1    | 1.753123  |
|       mobilenet_v3_large        |    1    | 1.702377  |
|         phlippe_resnet          |    1    | 1.665991  |
|            resnet152            |    1    | 1.656782  |
|           densenet121           |    1    | 1.635308  |
|           timm_vovnet           |    1    | 1.585331  |
|        timm_efficientnet        |    1    | 1.574541  |
|         LearningToPaint         |    1    | 1.567979  |
|      doctr_reco_predictor       |    1    | 1.508264  |
|           timm_regnet           |    1    | 1.476008  |
|         resnext50_32x4d         |    1    | 1.470957  |
|              vgg16              |    1    | 1.441952  |
|        phlippe_densenet         |    1    | 1.429332  |
|        basic_gnn_edgecnn        |    1    | 1.404273  |
|              llama              |    1    | 1.393092  |
|             yolov3              |    1    | 1.380536  |
|             alexnet             |    1    | 1.349487  |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 1.296279  |
|          BERT_pytorch           |    1    | 1.287187  |
| detectron2_fasterrcnn_r_101_c4  |    1    | 1.277091  |
|            hf_Albert            |    1    | 1.275734  |
|              maml               |    1    | 1.258337  |
|             hf_GPT2             |    1    | 1.249006  |
|       doctr_det_predictor       |    1    | 1.230863  |
|          fastNLP_Bert           |    1    | 1.229768  |
|               drq               |    1    | 1.222185  |
|          hf_GPT2_large          |    1    | 1.219968  |
|            moondream            |    1    | 1.219794  |
|     timm_vision_transformer     |    1    | 1.185166  |
|         pytorch_stargan         |   16    | 1.179583  |
|        soft_actor_critic        |   256   | 1.177967  |
|  timm_vision_transformer_large  |    1    | 1.168518  |
|          hf_Bert_large          |    1    | 1.164643  |
|              dlrm               |    1    | 1.163382  |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 1.159774  |
|           hf_BigBird            |    1    | 1.149589  |
|             hf_Bert             |    1    | 1.141007  |
|      torch_multimodal_clip      |    1    | 1.138586  |
|          hf_DistilBert          |    1    | 1.131109  |
|             hf_Bart             |    1    | 1.097835  |
|        hf_distil_whisper        |    1    | 1.076378  |
|          pytorch_unet           |    1    | 1.069244  |
|       speech_transformer        |    1    | 1.066964  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 1.048635  |
|    detectron2_fcos_r_50_fpn     |    1    | 1.041664  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 1.032674  |
|          hf_Longformer          |    1    | 1.014032  |
|             demucs              |    1    | 1.006278  |
|           tts_angular           |    1    | 1.000806  |
|     resnet50_quantized_qat      |    1    | 0.993974  |
|   mobilenet_v2_quantized_qat    |    1    | 0.987685  |
|     nvidia_deeprecommender      |    1    | 0.933001  |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.914573  |
|           hf_Reformer           |    1    | 0.850977  |
|       Background_Matting        |    1    | 0.829369  |
|           hf_T5_large           |    1    | 0.788433  |
|              hf_T5              |    1    | 0.714386  |
|           hf_T5_base            |    1    | 0.588373  |
|        timm_efficientdet        |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+
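
To post-process a table like the one above, for example to list models that run slower than native PyTorch, the plain-text layout can be parsed with a few lines of Python. This is a sketch under the assumption that every data row is delimited by `|` characters and that rule lines start with `+`; `parse_box_table` is a hypothetical helper, and the excerpt reuses three rows from the table.

```python
def parse_box_table(text):
    """Parse a '+---+' style text table into a list of dicts keyed by header."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip rule lines ('+---+') and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


excerpt = """
+------------+----+----------+
|    name    | bs | inductor |
+------------+----+----------+
|  resnet18  | 1  | 2.185349 |
| hf_T5_base | 1  | 0.588373 |
|    moco    | 0  |   0.0    |
+------------+----+----------+
"""

rows = parse_box_table(excerpt)
# Models with a valid (non-zero) speedup below 1.0 are slower than eager.
regressions = [r["name"] for r in rows if 0.0 < float(r["inductor"]) < 1.0]
print(regressions)
```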

Accuracy

+---------------------------------+---------+--------------------+
|              name               |   bs    |      inductor      |
+---------------------------------+---------+--------------------+
|       Background_Matting        |    1    |  pass_due_to_skip  |
|  timm_vision_transformer_large  |    1    |  pass_due_to_skip  |
|          hf_GPT2_large          |    1    |  pass_due_to_skip  |
|           hf_T5_large           |    1    |  pass_due_to_skip  |
|              maml               |    1    |  pass_due_to_skip  |
|             yolov3              |    1    |        pass        |
|             alexnet             |    1    |        pass        |
|        basic_gnn_edgecnn        |    1    |        pass        |
|          basic_gnn_gcn          |    1    |        pass        |
|          basic_gnn_gin          |    1    |        pass        |
|         basic_gnn_sage          |    1    |        pass        |
|       doctr_det_predictor       |    1    |        pass        |
|              dcgan              |    1    |        pass        |
|           densenet121           |    1    |        pass        |
| detectron2_fasterrcnn_r_101_c4  |    1    |        pass        |
| detectron2_fasterrcnn_r_101_dc5 |    1    |        pass        |
|  detectron2_fasterrcnn_r_50_c4  |    1    |        pass        |
| detectron2_fasterrcnn_r_50_dc5  |    1    |        pass        |
|           hf_T5_base            |    1    |        pass        |
|             demucs              |    1    |        pass        |
|        hf_distil_whisper        |    1    |        pass        |
|         LearningToPaint         |    1    |        pass        |
|              dlrm               |    1    |        pass        |
|    detectron2_fcos_r_50_fpn     |    1    |        pass        |
|               drq               |    1    |        pass        |
|          fastNLP_Bert           |    1    |        pass        |
|      functorch_dp_cifar10       |    1    |        pass        |
|     functorch_maml_omniglot     |    1    |        pass        |
|            hf_Albert            |    1    |        pass        |
|             hf_Bart             |    1    |        pass        |
|          hf_Bert_large          |    1    |        pass        |
|           hf_BigBird            |    1    |        pass        |
|              llama              |    1    |        pass        |
|          hf_DistilBert          |    1    |        pass        |
|             hf_GPT2             |    1    |        pass        |
|          hf_Longformer          |    1    |        pass        |
|           hf_Reformer           |    1    |        pass        |
|              hf_T5              |    1    |        pass        |
|             hf_Bert             |    1    |        pass        |
|      doctr_reco_predictor       |    1    |        pass        |
|          lennard_jones          |    1    |        pass        |
|            resnet50             |    1    |        pass        |
|            resnet152            |    1    |        pass        |
|          maml_omniglot          |    5    |        pass        |
|     resnet50_quantized_qat      |    1    |        pass        |
|          mobilenet_v2           |    1    |        pass        |
|   mobilenet_v2_quantized_qat    |    1    |        pass        |
|         pytorch_stargan         |   16    |        pass        |
|            moondream            |    1    |        pass        |
|     nvidia_deeprecommender      |    1    |        pass        |
|          BERT_pytorch           |    1    |        pass        |
|        phlippe_densenet         |    1    |        pass        |
|         phlippe_resnet          |    1    |        pass        |
|     pyhpc_equation_of_state     |    1    |        pass        |
|     pyhpc_isoneutral_mixing     |    1    |        pass        |
| pyhpc_turbulent_kinetic_energy  | 1048576 |        pass        |
|  pytorch_CycleGAN_and_pix2pix   |    1    |        pass        |
|         opacus_cifar10          |    1    |        pass        |
|       mobilenet_v3_large        |    1    |        pass        |
|           mnasnet1_0            |    1    |        pass        |
|          pytorch_unet           |    1    |        pass        |
|        timm_efficientnet        |    1    |        pass        |
|         resnext50_32x4d         |    1    |        pass        |
|       shufflenet_v2_x1_0        |    1    |        pass        |
|        soft_actor_critic        |   256   |        pass        |
|       speech_transformer        |    1    |        pass        |
|           timm_nfnet            |    1    |        pass        |
|            resnet18             |    1    |        pass        |
|          squeezenet1_1          |    1    |        pass        |
|           timm_regnet           |    1    |        pass        |
|          timm_resnest           |    1    |        pass        |
|     timm_vision_transformer     |    1    |        pass        |
|           timm_vovnet           |    1    |        pass        |
|      torch_multimodal_clip      |    1    |        pass        |
|           tts_angular           |    1    |        pass        |
|              vgg16              |    1    |        pass        |
|        timm_efficientdet        |    0    | model_fail_to_load |
|              moco               |    0    | model_fail_to_load |
|         DALLE2_pytorch          |    0    | model_fail_to_load |
| detectron2_fasterrcnn_r_101_fpn |    1    |    fail_to_run     |
| detectron2_fasterrcnn_r_50_fpn  |    1    |    fail_to_run     |
|           Super_SloMo           |    1    |    fail_to_run     |
|         vision_maskrcnn         |    1    |    fail_to_run     |
+---------------------------------+---------+--------------------+

Compilation latency (sec)

+---------------------------------+---------+-----------+
|              name               |   bs    | inductor  |
+---------------------------------+---------+-----------+
|           hf_T5_base            |    1    | 98.637192 |
|           densenet121           |    1    | 90.665883 |
|           hf_BigBird            |    1    | 76.01799  |
|           hf_T5_large           |    1    | 71.846317 |
|    detectron2_fcos_r_50_fpn     |    1    | 69.727589 |
|              maml               |    1    | 53.906516 |
|          hf_Longformer          |    1    | 52.212565 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 51.222416 |
|           timm_nfnet            |    1    | 48.505315 |
|           hf_Reformer           |    1    | 47.714214 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 43.928014 |
|        phlippe_densenet         |    1    | 42.854589 |
|       speech_transformer        |    1    | 39.386287 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 39.349483 |
|  timm_vision_transformer_large  |    1    | 36.565153 |
|      torch_multimodal_clip      |    1    | 35.787178 |
|             demucs              |    1    | 35.407404 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 33.926052 |
|        timm_efficientnet        |    1    | 33.718175 |
|              hf_T5              |    1    | 32.576684 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 31.856489 |
|             yolov3              |    1    | 28.956766 |
|       Background_Matting        |    1    | 28.942798 |
|         opacus_cifar10          |    1    | 28.718462 |
|        hf_distil_whisper        |    1    | 28.536675 |
|            moondream            |    1    | 27.997211 |
|          hf_GPT2_large          |    1    | 27.525852 |
|      functorch_dp_cifar10       |    1    | 27.299151 |
|          hf_Bert_large          |    1    | 25.826456 |
|          timm_resnest           |    1    | 25.604184 |
|       doctr_det_predictor       |    1    | 25.202894 |
|       shufflenet_v2_x1_0        |    1    | 24.147071 |
|       mobilenet_v3_large        |    1    | 24.122388 |
|              llama              |    1    | 23.901318 |
|          BERT_pytorch           |    1    | 22.900849 |
|          fastNLP_Bert           |    1    | 22.420057 |
|             hf_Bart             |    1    | 22.388637 |
|           timm_vovnet           |    1    | 21.658463 |
|           timm_regnet           |    1    | 21.366581 |
|     timm_vision_transformer     |    1    | 21.311943 |
|          pytorch_unet           |    1    | 20.127344 |
|            hf_Albert            |    1    | 19.997674 |
|             hf_GPT2             |    1    | 19.456303 |
|          hf_DistilBert          |    1    | 19.003754 |
|             hf_Bert             |    1    | 18.803731 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 17.994607 |
|          squeezenet1_1          |    1    | 16.063705 |
|            resnet152            |    1    | 15.785529 |
|              vgg16              |    1    | 15.026169 |
|         pytorch_stargan         |   16    | 14.756071 |
|      doctr_reco_predictor       |    1    | 13.704349 |
|             alexnet             |    1    | 12.643404 |
|     pyhpc_isoneutral_mixing     |    1    | 12.185837 |
|         resnext50_32x4d         |    1    | 11.538633 |
|            resnet50             |    1    | 11.516691 |
|               drq               |    1    | 10.956853 |
|              dlrm               |    1    | 10.616167 |
|            resnet18             |    1    | 10.179536 |
|          mobilenet_v2           |    1    | 10.084892 |
|           mnasnet1_0            |    1    | 9.909876  |
|     functorch_maml_omniglot     |    1    |  9.71479  |
|          basic_gnn_gcn          |    1    | 9.396279  |
|          maml_omniglot          |    5    | 9.367228  |
|     nvidia_deeprecommender      |    1    |  9.22472  |
|         LearningToPaint         |    1    | 9.006523  |
|          basic_gnn_gin          |    1    | 8.991116  |
|        basic_gnn_edgecnn        |    1    |  8.93692  |
|     pyhpc_equation_of_state     |    1    |  8.77773  |
|         phlippe_resnet          |    1    | 8.609056  |
|         basic_gnn_sage          |    1    | 7.878841  |
|        soft_actor_critic        |   256   |  6.8601   |
|          lennard_jones          |    1    | 5.762626  |
|              dcgan              |    1    | 5.698204  |
|           tts_angular           |    1    | 5.519752  |
|   mobilenet_v2_quantized_qat    |    1    | 0.098428  |
|     resnet50_quantized_qat      |    1    | 0.070457  |
|        timm_efficientdet        |    0    |    0.0    |
|              moco               |    0    |    0.0    |
|         DALLE2_pytorch          |    0    |    0.0    |
+---------------------------------+---------+-----------+

Peak Memory Compression Ratio

+---------------------------------+---------+----------+
|              name               |   bs    | inductor |
+---------------------------------+---------+----------+
|              dlrm               |    1    | 0.988223 |
|           hf_T5_base            |    1    | 0.987501 |
|             demucs              |    1    | 0.985024 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 0.982531 |
|       Background_Matting        |    1    | 0.98207  |
|          pytorch_unet           |    1    | 0.977945 |
|          hf_GPT2_large          |    1    | 0.977077 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 0.971569 |
|        basic_gnn_edgecnn        |    1    | 0.970832 |
|       doctr_det_predictor       |    1    | 0.969779 |
|    detectron2_fcos_r_50_fpn     |    1    | 0.963376 |
|     resnet50_quantized_qat      |    1    | 0.955681 |
|           hf_BigBird            |    1    | 0.948372 |
|         LearningToPaint         |    1    | 0.947878 |
|         pytorch_stargan         |   16    | 0.945307 |
|      doctr_reco_predictor       |    1    | 0.942982 |
|          basic_gnn_gin          |    1    | 0.942297 |
|          basic_gnn_gcn          |    1    | 0.940448 |
|         basic_gnn_sage          |    1    | 0.937056 |
|   mobilenet_v2_quantized_qat    |    1    | 0.928211 |
|      torch_multimodal_clip      |    1    | 0.927193 |
|              llama              |    1    | 0.919541 |
|  pytorch_CycleGAN_and_pix2pix   |    1    | 0.916572 |
|        hf_distil_whisper        |    1    | 0.914929 |
|           tts_angular           |    1    | 0.891224 |
|        soft_actor_critic        |   256   | 0.888577 |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 0.888407 |
|         opacus_cifar10          |    1    | 0.884166 |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 0.883011 |
|        timm_efficientnet        |    1    | 0.879881 |
|          mobilenet_v2           |    1    |  0.8716  |
|           mnasnet1_0            |    1    | 0.864584 |
|          lennard_jones          |    1    | 0.863277 |
|          maml_omniglot          |    5    | 0.862473 |
|          squeezenet1_1          |    1    | 0.859013 |
|          fastNLP_Bert           |    1    | 0.858562 |
|     functorch_maml_omniglot     |    1    | 0.858209 |
|          timm_resnest           |    1    | 0.85478  |
|              dcgan              |    1    | 0.853119 |
| pyhpc_turbulent_kinetic_energy  | 1048576 | 0.848445 |
|       mobilenet_v3_large        |    1    | 0.847797 |
|         phlippe_resnet          |    1    | 0.839569 |
|       shufflenet_v2_x1_0        |    1    | 0.838372 |
|     pyhpc_equation_of_state     |    1    | 0.835444 |
|       speech_transformer        |    1    | 0.827812 |
|            moondream            |    1    | 0.825842 |
|        phlippe_densenet         |    1    | 0.818748 |
|          hf_Bert_large          |    1    | 0.815972 |
|           timm_nfnet            |    1    | 0.815435 |
|         resnext50_32x4d         |    1    | 0.810884 |
|     pyhpc_isoneutral_mixing     |    1    | 0.810856 |
|          hf_Longformer          |    1    | 0.807608 |
|           hf_T5_large           |    1    | 0.806188 |
|             hf_Bert             |    1    | 0.805374 |
|     timm_vision_transformer     |    1    | 0.804304 |
|            hf_Albert            |    1    | 0.80205  |
|              maml               |    1    | 0.794229 |
|             hf_Bart             |    1    |  0.7819  |
|             yolov3              |    1    | 0.781257 |
|          BERT_pytorch           |    1    | 0.777892 |
|          hf_DistilBert          |    1    | 0.777746 |
|            resnet50             |    1    | 0.768533 |
|             hf_GPT2             |    1    | 0.767327 |
|            resnet18             |    1    |  0.764   |
|               drq               |    1    | 0.763756 |
|           densenet121           |    1    | 0.762045 |
|           timm_regnet           |    1    | 0.761515 |
|           timm_vovnet           |    1    | 0.759526 |
|              hf_T5              |    1    | 0.757924 |
|      functorch_dp_cifar10       |    1    | 0.744955 |
|             alexnet             |    1    | 0.740068 |
|  timm_vision_transformer_large  |    1    | 0.733142 |
|           hf_Reformer           |    1    | 0.731995 |
|              vgg16              |    1    | 0.722482 |
|            resnet152            |    1    | 0.694767 |
|     nvidia_deeprecommender      |    1    | 0.673063 |
|              moco               |    0    |   0.0    |
|        timm_efficientdet        |    0    |   0.0    |
|         DALLE2_pytorch          |    0    |   0.0    |
+---------------------------------+---------+----------+

Absolute latency (ms)

+---------------------------------+---------+--------------+
|              name               |   bs    |   inductor   |
+---------------------------------+---------+--------------+
|           hf_T5_base            |    1    | 26126.994118 |
| detectron2_fasterrcnn_r_101_c4  |    1    | 11791.750217 |
|  detectron2_fasterrcnn_r_50_c4  |    1    | 11204.303674 |
|          hf_GPT2_large          |    1    | 10130.513758 |
|           hf_T5_large           |    1    | 7491.368604  |
|            moondream            |    1    | 7351.307059  |
|        hf_distil_whisper        |    1    | 6984.400307  |
|       Background_Matting        |    1    |  6722.18541  |
| detectron2_fasterrcnn_r_101_dc5 |    1    | 5819.435526  |
| detectron2_fasterrcnn_r_50_dc5  |    1    | 5050.857175  |
|          pytorch_unet           |    1    | 4719.210394  |
|  timm_vision_transformer_large  |    1    |  2786.56356  |
|    detectron2_fcos_r_50_fpn     |    1    | 2540.588244  |
|             demucs              |    1    | 2225.827308  |
|         pytorch_stargan         |   16    | 2034.150753  |
|       doctr_det_predictor       |    1    |  1795.78177  |
|          hf_Bert_large          |    1    |  1763.15407  |
|           hf_BigBird            |    1    | 1480.910867  |
|      torch_multimodal_clip      |    1    | 1242.636481  |
|          hf_Longformer          |    1    | 1109.611249  |
|             hf_Bart             |    1    |  883.731354  |
|              hf_T5              |    1    |  766.712965  |
|             hf_Bert             |    1    |   684.4697   |
|       speech_transformer        |    1    |  674.886135  |
|  pytorch_CycleGAN_and_pix2pix   |    1    |  623.529509  |
|            hf_Albert            |    1    |  574.255843  |
|          fastNLP_Bert           |    1    |  522.814572  |
|             yolov3              |    1    |  428.462035  |
|           hf_Reformer           |    1    |  415.450229  |
|          hf_DistilBert          |    1    |  414.611568  |
|             hf_GPT2             |    1    |  354.975896  |
|        basic_gnn_edgecnn        |    1    |  231.357825  |
| pyhpc_turbulent_kinetic_energy  | 1048576 |  210.720639  |
|              vgg16              |    1    |  189.988448  |
|           timm_regnet           |    1    |  149.945722  |
|          BERT_pytorch           |    1    |  142.369087  |
|            resnet152            |    1    |  137.661648  |
|           timm_nfnet            |    1    |  96.688986   |
|           timm_vovnet           |    1    |  80.148776   |
|              maml               |    1    |   74.2863    |
|     timm_vision_transformer     |    1    |  59.288886   |
|     nvidia_deeprecommender      |    1    |  58.158764   |
|         resnext50_32x4d         |    1    |  57.461683   |
|           tts_angular           |    1    |  54.350066   |
|            resnet50             |    1    |  51.132173   |
|           densenet121           |    1    |  45.893995   |
|          timm_resnest           |    1    |   33.43099   |
|          basic_gnn_gcn          |    1    |  29.751516   |
|      doctr_reco_predictor       |    1    |  23.391282   |
|              llama              |    1    |   22.81598   |
|             alexnet             |    1    |  22.225488   |
|            resnet18             |    1    |  22.216467   |
|     resnet50_quantized_qat      |    1    |  18.229348   |
|          basic_gnn_gin          |    1    |  16.745719   |
|         basic_gnn_sage          |    1    |  16.558835   |
|        timm_efficientnet        |    1    |  13.517251   |
|         LearningToPaint         |    1    |   9.855635   |
|           mnasnet1_0            |    1    |   7.91486    |
|          mobilenet_v2           |    1    |    7.6982    |
|       mobilenet_v3_large        |    1    |   7.425341   |
|   mobilenet_v2_quantized_qat    |    1    |   6.92517    |
|          squeezenet1_1          |    1    |   5.994861   |
|       shufflenet_v2_x1_0        |    1    |   5.688129   |
|        phlippe_densenet         |    1    |   3.419787   |
|        soft_actor_critic        |   256   |   3.006974   |
|      functorch_dp_cifar10       |    1    |   2.488184   |
|         opacus_cifar10          |    1    |   2.411128   |
|               drq               |    1    |   1.912705   |
|              dcgan              |    1    |   1.739844   |
|         phlippe_resnet          |    1    |   1.378399   |
|     functorch_maml_omniglot     |    1    |    0.8764    |
|              dlrm               |    1    |   0.707637   |
|          maml_omniglot          |    5    |   0.576463   |
|     pyhpc_equation_of_state     |    1    |   0.045005   |
|     pyhpc_isoneutral_mixing     |    1    |   0.042233   |
|          lennard_jones          |    1    |   0.037956   |
|              moco               |    0    |     0.0      |
|         DALLE2_pytorch          |    0    |     0.0      |
|        timm_efficientdet        |    0    |     0.0      |
+---------------------------------+---------+--------------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|          MobileBertForMaskedLM          | 1  | 1.967898 |
|     MobileBertForQuestionAnswering      | 1  | 1.560822 |
|            XLNetLMHeadModel             | 1  | 1.380197 |
|      GPT2ForSequenceClassification      | 1  | 1.309197 |
|         Speech2Text2ForCausalLM         | 1  | 1.308869 |
|          DistilBertForMaskedLM          | 1  | 1.299151 |
|            YituTechConvBert             | 1  | 1.297866 |
|     DistilBertForQuestionAnswering      | 1  | 1.284099 |
|       BlenderbotSmallForCausalLM        | 1  | 1.28294  |
| BlenderbotSmallForConditionalGeneration | 1  | 1.268637 |
|       DebertaForQuestionAnswering       | 1  | 1.250317 |
|          BlenderbotForCausalLM          | 1  | 1.24857  |
|     M2M100ForConditionalGeneration      | 1  | 1.243858 |
|       MT5ForConditionalGeneration       | 1  | 1.238832 |
|           PegasusForCausalLM            | 1  | 1.23542  |
|           DebertaForMaskedLM            | 1  | 1.23346  |
|     PegasusForConditionalGeneration     | 1  | 1.231064 |
|             XGLMForCausalLM             | 1  | 1.228349 |
|               GoogleFnet                | 1  | 1.221002 |
|       AlbertForQuestionAnswering        | 1  | 1.203965 |
|            AlbertForMaskedLM            | 1  | 1.202635 |
|               DistillGPT2               | 1  | 1.188632 |
|           ElectraForCausalLM            | 1  | 1.188038 |
|                CamemBert                | 1  | 1.179932 |
|    LayoutLMForSequenceClassification    | 1  | 1.173532 |
|        BertForQuestionAnswering         | 1  | 1.163038 |
|             BertForMaskedLM             | 1  | 1.160208 |
|    MegatronBertForQuestionAnswering     | 1  | 1.15688  |
|      DebertaV2ForQuestionAnswering      | 1  | 1.155824 |
|         MegatronBertForCausalLM         | 1  | 1.153647 |
|           LayoutLMForMaskedLM           | 1  | 1.153477 |
|            TrOCRForCausalLM             | 1  | 1.15339  |
|          DebertaV2ForMaskedLM           | 1  | 1.150769 |
|       RobertaForQuestionAnswering       | 1  | 1.143948 |
|       ElectraForQuestionAnswering       | 1  | 1.13906  |
|           RobertaForCausalLM            | 1  | 1.138163 |
|     PLBartForConditionalGeneration      | 1  | 1.089902 |
|      MBartForConditionalGeneration      | 1  | 1.064173 |
|             BartForCausalLM             | 1  | 1.051762 |
|      BartForConditionalGeneration       | 1  |  1.051   |
|             OPTForCausalLM              | 1  | 1.031614 |
|            PLBartForCausalLM            | 1  | 1.023256 |
|            MBartForCausalLM             | 1  | 1.013056 |
|          AllenaiLongformerBase          | 1  | 0.962022 |
|                 T5Small                 | 1  | 0.621263 |
|       T5ForConditionalGeneration        | 1  | 0.619995 |
+-----------------------------------------+----+----------+
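
To work with these tables programmatically, the sketch below parses the bordered plain-text layout used throughout this comment and lists the models whose speedup is below 1.0. The helper name `parse_ascii_table` and the `SAMPLE` excerpt are illustrative and not part of the benchmark tooling; the four rows are copied from the table above.

```python
from typing import Dict, List


def parse_ascii_table(text: str) -> List[Dict[str, str]]:
    """Parse a '+---+' bordered table into row dicts keyed by the header cells."""
    rows: List[Dict[str, str]] = []
    header = None
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


# Excerpt of the huggingface speedup table above (hypothetical variable name).
SAMPLE = """
+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|          MobileBertForMaskedLM          | 1  | 1.967898 |
|             OPTForCausalLM              | 1  | 1.031614 |
|          AllenaiLongformerBase          | 1  | 0.962022 |
|                 T5Small                 | 1  | 0.621263 |
+-----------------------------------------+----+----------+
"""

rows = parse_ascii_table(SAMPLE)
# A speedup below 1.0 means inductor was slower than native PyTorch.
slower = [row["name"] for row in rows if float(row["inductor"]) < 1.0]
print(slower)
```

Applied to the full table, the same filter would return AllenaiLongformerBase, T5Small and T5ForConditionalGeneration.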

Accuracy

+-----------------------------------------+----+------------------+
|                  name                   | bs |     inductor     |
+-----------------------------------------+----+------------------+
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip |
|                CamemBert                | 1  |       pass       |
|       AlbertForQuestionAnswering        | 1  |       pass       |
|           DebertaForMaskedLM            | 1  |       pass       |
|          AllenaiLongformerBase          | 1  |       pass       |
|             BartForCausalLM             | 1  |       pass       |
|      BartForConditionalGeneration       | 1  |       pass       |
|             BertForMaskedLM             | 1  |       pass       |
|        BertForQuestionAnswering         | 1  |       pass       |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |
|       DebertaForQuestionAnswering       | 1  |       pass       |
|           LayoutLMForMaskedLM           | 1  |       pass       |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |
|          DistilBertForMaskedLM          | 1  |       pass       |
|     DistilBertForQuestionAnswering      | 1  |       pass       |
|               DistillGPT2               | 1  |       pass       |
|           ElectraForCausalLM            | 1  |       pass       |
|       ElectraForQuestionAnswering       | 1  |       pass       |
|      GPT2ForSequenceClassification      | 1  |       pass       |
|               GoogleFnet                | 1  |       pass       |
|    LayoutLMForSequenceClassification    | 1  |       pass       |
|            MBartForCausalLM             | 1  |       pass       |
|            XLNetLMHeadModel             | 1  |       pass       |
|             XGLMForCausalLM             | 1  |       pass       |
|            AlbertForMaskedLM            | 1  |       pass       |
|      MBartForConditionalGeneration      | 1  |       pass       |
|       MT5ForConditionalGeneration       | 1  |       pass       |
|         MegatronBertForCausalLM         | 1  |       pass       |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |
|          MobileBertForMaskedLM          | 1  |       pass       |
|     MobileBertForQuestionAnswering      | 1  |       pass       |
|             OPTForCausalLM              | 1  |       pass       |
|            PLBartForCausalLM            | 1  |       pass       |
|     PLBartForConditionalGeneration      | 1  |       pass       |
|           PegasusForCausalLM            | 1  |       pass       |
|     M2M100ForConditionalGeneration      | 1  |       pass       |
|     PegasusForConditionalGeneration     | 1  |       pass       |
|           RobertaForCausalLM            | 1  |       pass       |
|       RobertaForQuestionAnswering       | 1  |       pass       |
|         Speech2Text2ForCausalLM         | 1  |       pass       |
|       T5ForConditionalGeneration        | 1  |       pass       |
|                 T5Small                 | 1  |       pass       |
|            TrOCRForCausalLM             | 1  |       pass       |
|            YituTechConvBert             | 1  |       pass       |
+-----------------------------------------+----+------------------+

Compilation latency (sec)

+-----------------------------------------+----+-----------+
|                  name                   | bs | inductor  |
+-----------------------------------------+----+-----------+
|          AllenaiLongformerBase          | 1  | 59.634846 |
|          MobileBertForMaskedLM          | 1  | 43.949174 |
|     MobileBertForQuestionAnswering      | 1  | 42.781136 |
|     PegasusForConditionalGeneration     | 1  | 40.991678 |
|     M2M100ForConditionalGeneration      | 1  | 40.460028 |
|      MBartForConditionalGeneration      | 1  | 39.739536 |
|       T5ForConditionalGeneration        | 1  | 38.301509 |
|                 T5Small                 | 1  | 38.231606 |
|          BlenderbotForCausalLM          | 1  | 38.044693 |
|       MT5ForConditionalGeneration       | 1  | 36.07857  |
|            XLNetLMHeadModel             | 1  | 34.805147 |
|             XGLMForCausalLM             | 1  | 34.134239 |
|          DebertaV2ForMaskedLM           | 1  | 31.820204 |
| BlenderbotSmallForConditionalGeneration | 1  | 31.040281 |
|      DebertaV2ForQuestionAnswering      | 1  | 30.471757 |
|      BartForConditionalGeneration       | 1  | 29.502419 |
|         MegatronBertForCausalLM         | 1  | 28.794261 |
|            YituTechConvBert             | 1  | 28.548549 |
|     PLBartForConditionalGeneration      | 1  | 28.014181 |
|    MegatronBertForQuestionAnswering     | 1  | 27.48558  |
|             OPTForCausalLM              | 1  | 25.265182 |
|           PegasusForCausalLM            | 1  | 24.913006 |
|            MBartForCausalLM             | 1  | 24.426802 |
|            TrOCRForCausalLM             | 1  | 22.543522 |
|           DebertaForMaskedLM            | 1  | 22.428724 |
|       DebertaForQuestionAnswering       | 1  | 21.243703 |
|           RobertaForCausalLM            | 1  | 20.343088 |
|                CamemBert                | 1  | 20.293016 |
|           ElectraForCausalLM            | 1  | 20.252557 |
|       BlenderbotSmallForCausalLM        | 1  | 19.568318 |
|          DistilBertForMaskedLM          | 1  | 19.561027 |
|      GPT2ForSequenceClassification      | 1  | 19.359112 |
|             BertForMaskedLM             | 1  | 19.069011 |
|           LayoutLMForMaskedLM           | 1  | 19.066651 |
|       RobertaForQuestionAnswering       | 1  | 19.059284 |
|       ElectraForQuestionAnswering       | 1  | 19.029235 |
|         Speech2Text2ForCausalLM         | 1  | 18.948653 |
|    LayoutLMForSequenceClassification    | 1  | 18.749458 |
|            PLBartForCausalLM            | 1  | 18.68114  |
|             BartForCausalLM             | 1  | 18.521107 |
|     DistilBertForQuestionAnswering      | 1  | 18.327094 |
|        BertForQuestionAnswering         | 1  | 17.80669  |
|               GoogleFnet                | 1  | 17.453951 |
|               DistillGPT2               | 1  | 17.391441 |
|            AlbertForMaskedLM            | 1  | 14.239068 |
|       AlbertForQuestionAnswering        | 1  | 12.625685 |
+-----------------------------------------+----+-----------+

Peak Memory Compression Ratio

+-----------------------------------------+----+----------+
|                  name                   | bs | inductor |
+-----------------------------------------+----+----------+
|             OPTForCausalLM              | 1  | 0.986073 |
|      MBartForConditionalGeneration      | 1  | 0.974533 |
|      GPT2ForSequenceClassification      | 1  | 0.95267  |
|          AllenaiLongformerBase          | 1  | 0.947388 |
|            MBartForCausalLM             | 1  | 0.923167 |
|     PLBartForConditionalGeneration      | 1  | 0.90915  |
|            XLNetLMHeadModel             | 1  | 0.907247 |
|       T5ForConditionalGeneration        | 1  | 0.905728 |
|                 T5Small                 | 1  | 0.905562 |
|            PLBartForCausalLM            | 1  | 0.903127 |
|       DebertaForQuestionAnswering       | 1  | 0.871902 |
|      BartForConditionalGeneration       | 1  | 0.865003 |
|               GoogleFnet                | 1  | 0.854519 |
|       RobertaForQuestionAnswering       | 1  | 0.851352 |
|    LayoutLMForSequenceClassification    | 1  | 0.849554 |
|        BertForQuestionAnswering         | 1  | 0.848925 |
|       ElectraForQuestionAnswering       | 1  | 0.841596 |
|    MegatronBertForQuestionAnswering     | 1  | 0.840022 |
|      DebertaV2ForQuestionAnswering      | 1  | 0.835149 |
|               DistillGPT2               | 1  | 0.82778  |
|           DebertaForMaskedLM            | 1  | 0.825522 |
|         MegatronBertForCausalLM         | 1  | 0.819097 |
|           LayoutLMForMaskedLM           | 1  | 0.81809  |
|                CamemBert                | 1  | 0.813525 |
|             BertForMaskedLM             | 1  | 0.813373 |
|           RobertaForCausalLM            | 1  | 0.812866 |
|         Speech2Text2ForCausalLM         | 1  | 0.811835 |
|            YituTechConvBert             | 1  | 0.806554 |
|           ElectraForCausalLM            | 1  | 0.804661 |
|             BartForCausalLM             | 1  | 0.801113 |
|     DistilBertForQuestionAnswering      | 1  | 0.800597 |
|          BlenderbotForCausalLM          | 1  | 0.797931 |
|          DebertaV2ForMaskedLM           | 1  | 0.796692 |
|            TrOCRForCausalLM             | 1  | 0.787733 |
|       MT5ForConditionalGeneration       | 1  | 0.780327 |
|       BlenderbotSmallForCausalLM        | 1  | 0.764505 |
|           PegasusForCausalLM            | 1  | 0.750888 |
|          DistilBertForMaskedLM          | 1  | 0.744874 |
| BlenderbotSmallForConditionalGeneration | 1  | 0.739488 |
|     MobileBertForQuestionAnswering      | 1  | 0.733425 |
|     M2M100ForConditionalGeneration      | 1  | 0.718892 |
|     PegasusForConditionalGeneration     | 1  | 0.716342 |
|          MobileBertForMaskedLM          | 1  | 0.703592 |
|             XGLMForCausalLM             | 1  | 0.699487 |
|            AlbertForMaskedLM            | 1  | 0.448201 |
|       AlbertForQuestionAnswering        | 1  | 0.442855 |
+-----------------------------------------+----+----------+
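
One way to read these ratios, assuming the ratio is defined as eager peak memory divided by inductor peak memory (an assumption, since this comment does not state the formula): a value below 1.0 means the compiled run peaked higher than native PyTorch. The sketch below turns a ratio into the implied relative footprint under that assumption; the function name is hypothetical and the sample values are copied from the table above.

```python
def inductor_peak_relative_to_eager(compression_ratio: float) -> float:
    """Return inductor peak memory as a multiple of eager peak memory.

    Assumes compression_ratio = eager_peak / inductor_peak (not confirmed here).
    """
    if compression_ratio <= 0:
        raise ValueError("ratio must be positive; 0.0 rows mark models that did not run")
    return 1.0 / compression_ratio


# Ratios taken from the huggingface table above.
for name, ratio in [("OPTForCausalLM", 0.986073), ("AlbertForMaskedLM", 0.448201)]:
    multiple = inductor_peak_relative_to_eager(ratio)
    print(f"{name}: inductor peak is about {multiple:.2f}x the eager peak")
```

Under this reading, the two Albert models stand out: their compiled peak footprint would be a little over twice the eager one.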

Absolute latency (ms)

+-----------------------------------------+----+--------------+
|                  name                   | bs |   inductor   |
+-----------------------------------------+----+--------------+
|            AlbertForMaskedLM            | 1  | 12752.784152 |
|       AlbertForQuestionAnswering        | 1  | 12723.982714 |
|      MBartForConditionalGeneration      | 1  | 6158.567621  |
|      BartForConditionalGeneration       | 1  | 5678.356818  |
|             OPTForCausalLM              | 1  | 5209.034391  |
|          DebertaV2ForMaskedLM           | 1  | 5058.921961  |
|      DebertaV2ForQuestionAnswering      | 1  | 3964.468957  |
|            XLNetLMHeadModel             | 1  | 3125.624499  |
|            MBartForCausalLM             | 1  | 3036.583371  |
|          BlenderbotForCausalLM          | 1  | 2632.424491  |
|             BartForCausalLM             | 1  | 2562.329829  |
|       T5ForConditionalGeneration        | 1  | 2465.642964  |
|                 T5Small                 | 1  | 2461.101153  |
|          AllenaiLongformerBase          | 1  | 2417.064398  |
|     PLBartForConditionalGeneration      | 1  | 2170.428099  |
|         MegatronBertForCausalLM         | 1  | 2042.528003  |
|    MegatronBertForQuestionAnswering     | 1  | 1876.784662  |
|      GPT2ForSequenceClassification      | 1  | 1314.487933  |
|            PLBartForCausalLM            | 1  | 1219.558971  |
|             XGLMForCausalLM             | 1  |  834.364393  |
|           DebertaForMaskedLM            | 1  |  787.098198  |
|           RobertaForCausalLM            | 1  |  782.279865  |
|     M2M100ForConditionalGeneration      | 1  |  716.697224  |
|                CamemBert                | 1  |  689.877553  |
|           LayoutLMForMaskedLM           | 1  |  688.804131  |
|            YituTechConvBert             | 1  |  686.227552  |
|             BertForMaskedLM             | 1  |  684.014178  |
|     PegasusForConditionalGeneration     | 1  |  608.144072  |
|            TrOCRForCausalLM             | 1  |  586.285665  |
|       DebertaForQuestionAnswering       | 1  |  561.462122  |
|       RobertaForQuestionAnswering       | 1  |  545.545278  |
|    LayoutLMForSequenceClassification    | 1  |  545.099922  |
|        BertForQuestionAnswering         | 1  |  544.868604  |
|               DistillGPT2               | 1  |  504.00031   |
|               GoogleFnet                | 1  |  474.234429  |
|       MT5ForConditionalGeneration       | 1  |  303.232259  |
|           PegasusForCausalLM            | 1  |  300.993263  |
| BlenderbotSmallForConditionalGeneration | 1  |  146.754724  |
|           ElectraForCausalLM            | 1  |  134.614338  |
|          DistilBertForMaskedLM          | 1  |  100.23074   |
|       ElectraForQuestionAnswering       | 1  |  95.891012   |
|       BlenderbotSmallForCausalLM        | 1  |  85.462912   |
|          MobileBertForMaskedLM          | 1  |  67.054733   |
|     DistilBertForQuestionAnswering      | 1  |  64.345112   |
|     MobileBertForQuestionAnswering      | 1  |  41.122894   |
|         Speech2Text2ForCausalLM         | 1  |  19.102124   |
+-----------------------------------------+----+--------------+
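
Because speedup is reported relative to native PyTorch, the absolute latency and speedup tables can be combined to estimate the eager latency. The sketch below assumes speedup = eager latency / inductor latency and that both tables come from the same run; the function name is hypothetical, and the values are copied from the huggingface tables above.

```python
def implied_eager_latency_ms(inductor_ms: float, speedup: float) -> float:
    """Estimate eager latency, assuming speedup = eager_ms / inductor_ms."""
    return inductor_ms * speedup


# (speedup, inductor absolute latency in ms) from the huggingface tables.
samples = {
    "MobileBertForMaskedLM": (1.967898, 67.054733),
    "AlbertForMaskedLM": (1.202635, 12752.784152),
    "T5Small": (0.621263, 2461.101153),
}

for name, (speedup, inductor_ms) in samples.items():
    eager_ms = implied_eager_latency_ms(inductor_ms, speedup)
    print(f"{name}: inductor {inductor_ms:.1f} ms, implied eager {eager_ms:.1f} ms")
```

For T5Small the implied eager latency is lower than the measured inductor latency, which is consistent with its speedup below 1.0.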

timm_models suite with float32 precision

Performance speedup

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          pnasnet5large          | 1  | 2.341154 |
|          inception_v3           | 1  | 2.210656 |
|       gluon_inception_v3        | 1  | 2.191664 |
|           dm_nfnet_f0           | 1  | 2.189164 |
|        adv_inception_v3         | 1  | 2.170318 |
|            nfnet_l0             | 1  | 2.163446 |
|            repvgg_a2            | 1  | 1.950211 |
|         mobilenetv2_100         | 1  | 1.943579 |
|            hrnet_w18            | 1  | 1.916578 |
|          spnasnet_100           | 1  | 1.873807 |
|           mnasnet_100           | 1  | 1.813199 |
|           fbnetc_100            | 1  | 1.807329 |
|            levit_128            | 1  | 1.781507 |
|          ghostnet_100           | 1  | 1.75878  |
|           selecsls42b           | 1  | 1.758383 |
|            lcnet_050            | 1  | 1.751672 |
|      mobilenetv3_large_100      | 1  | 1.702703 |
|           regnety_002           | 1  | 1.686115 |
|             dla102              | 1  | 1.682372 |
|        ese_vovnet19b_dw         | 1  | 1.667687 |
|           rexnet_100            | 1  | 1.640123 |
|       tf_efficientnet_b0        | 1  | 1.639949 |
|       eca_botnext26ts_256       | 1  | 1.636807 |
|          botnet26t_256          | 1  | 1.635287 |
|            fbnetv3_b            | 1  | 1.626372 |
|          cspdarknet53           | 1  | 1.614626 |
|           resnest101e           | 1  | 1.602241 |
|         poolformer_m36          | 1  | 1.508746 |
|           res2next50            | 1  | 1.507574 |
|            tinynet_a            | 1  | 1.505959 |
|        eca_halonext26ts         | 1  | 1.499008 |
|        res2net50_14w_8s         | 1  | 1.464934 |
|           volo_d1_224           | 1  | 1.454918 |
|        res2net101_26w_4s        | 1  | 1.450747 |
|           mobilevit_s           | 1  | 1.424571 |
|         visformer_small         | 1  | 1.382396 |
|           convit_base           | 1  | 1.368533 |
|          gmixer_24_224          | 1  | 1.348649 |
|     swsl_resnext101_32x16d      | 1  | 1.343358 |
|           tf_mixnet_l           | 1  | 1.30153  |
|        twins_pcpvt_base         | 1  | 1.295242 |
|            gernet_l             | 1  | 1.294629 |
|      beit_base_patch16_224      | 1  | 1.278506 |
|          resmlp_12_224          | 1  | 1.273201 |
|  swin_base_patch4_window7_224   | 1  | 1.26367  |
|        convmixer_768_32         | 1  | 1.25291  |
|          mixer_b16_224          | 1  | 1.208029 |
|      vit_base_patch16_224       | 1  | 1.198921 |
|             dpn107              | 1  | 1.198422 |
| deit_base_distilled_patch16_224 | 1  | 1.197573 |
|      xcit_large_24_p8_224       | 1  | 1.187356 |
|            pit_b_224            | 1  | 1.17731  |
|            mixnet_l             | 1  | 1.165216 |
|         crossvit_9_240          | 1  | 1.15796  |
|          gmlp_s16_224           | 1  | 1.153983 |
|        tnt_s_patch16_224        | 1  | 1.144888 |
|          convnext_base          | 1  | 1.143703 |
|          jx_nest_base           | 1  | 1.143015 |
|        sebotnet33ts_256         | 1  | 1.085394 |
|          cait_m36_384           | 1  | 0.964724 |
+---------------------------------+----+----------+

Accuracy

+---------------------------------+----+-------------+
|              name               | bs |  inductor   |
+---------------------------------+----+-------------+
|        adv_inception_v3         | 1  |    pass     |
|      beit_base_patch16_224      | 1  |    pass     |
|          botnet26t_256          | 1  |    pass     |
|          cait_m36_384           | 1  |    pass     |
|           convit_base           | 1  |    pass     |
|        convmixer_768_32         | 1  |    pass     |
|          convnext_base          | 1  |    pass     |
|         crossvit_9_240          | 1  |    pass     |
|          cspdarknet53           | 1  |    pass     |
| deit_base_distilled_patch16_224 | 1  |    pass     |
|             dla102              | 1  |    pass     |
|           dm_nfnet_f0           | 1  |    pass     |
|             dpn107              | 1  |    pass     |
|       eca_botnext26ts_256       | 1  |    pass     |
|        eca_halonext26ts         | 1  |    pass     |
|        ese_vovnet19b_dw         | 1  |    pass     |
|           fbnetc_100            | 1  |    pass     |
|            fbnetv3_b            | 1  |    pass     |
|            gernet_l             | 1  |    pass     |
|          ghostnet_100           | 1  |    pass     |
|       gluon_inception_v3        | 1  |    pass     |
|          gmixer_24_224          | 1  |    pass     |
|          gmlp_s16_224           | 1  |    pass     |
|            hrnet_w18            | 1  |    pass     |
|          inception_v3           | 1  |    pass     |
|          jx_nest_base           | 1  |    pass     |
|            lcnet_050            | 1  |    pass     |
|            levit_128            | 1  |    pass     |
|      xcit_large_24_p8_224       | 1  |    pass     |
|          mixer_b16_224          | 1  |    pass     |
|            mixnet_l             | 1  |    pass     |
|           mnasnet_100           | 1  |    pass     |
|         mobilenetv2_100         | 1  |    pass     |
|      mobilenetv3_large_100      | 1  |    pass     |
|           mobilevit_s           | 1  |    pass     |
|            nfnet_l0             | 1  |    pass     |
|            pit_b_224            | 1  |    pass     |
|          pnasnet5large          | 1  |    pass     |
|         poolformer_m36          | 1  |    pass     |
|           regnety_002           | 1  |    pass     |
|            repvgg_a2            | 1  |    pass     |
|        res2net101_26w_4s        | 1  |    pass     |
|        res2net50_14w_8s         | 1  |    pass     |
|           res2next50            | 1  |    pass     |
|          resmlp_12_224          | 1  |    pass     |
|           resnest101e           | 1  |    pass     |
|           rexnet_100            | 1  |    pass     |
|        sebotnet33ts_256         | 1  |    pass     |
|           selecsls42b           | 1  |    pass     |
|          spnasnet_100           | 1  |    pass     |
|  swin_base_patch4_window7_224   | 1  |    pass     |
|     swsl_resnext101_32x16d      | 1  |    pass     |
|       tf_efficientnet_b0        | 1  |    pass     |
|           tf_mixnet_l           | 1  |    pass     |
|            tinynet_a            | 1  |    pass     |
|        tnt_s_patch16_224        | 1  |    pass     |
|        twins_pcpvt_base         | 1  |    pass     |
|         visformer_small         | 1  |    pass     |
|      vit_base_patch16_224       | 1  |    pass     |
|           volo_d1_224           | 1  |    pass     |
|         coat_lite_mini          | 1  | fail_to_run |
+---------------------------------+----+-------------+

Compilation latency (sec)

+---------------------------------+----+-----------+
|              name               | bs | inductor  |
+---------------------------------+----+-----------+
|          pnasnet5large          | 1  | 80.97748  |
|  swin_base_patch4_window7_224   | 1  | 80.429749 |
|           tf_mixnet_l           | 1  | 68.546813 |
|             dpn107              | 1  | 62.73066  |
|        twins_pcpvt_base         | 1  | 61.227279 |
|           mobilevit_s           | 1  | 58.280053 |
|          jx_nest_base           | 1  | 58.201676 |
|        res2net50_14w_8s         | 1  | 57.072193 |
|           rexnet_100            | 1  | 55.301408 |
|      xcit_large_24_p8_224       | 1  | 54.395136 |
|          cait_m36_384           | 1  | 54.264333 |
|          ghostnet_100           | 1  | 52.790536 |
|            mixnet_l             | 1  | 50.878739 |
|        sebotnet33ts_256         | 1  | 50.602074 |
|         poolformer_m36          | 1  | 50.413777 |
|            levit_128            | 1  | 49.193149 |
|           dm_nfnet_f0           | 1  | 48.586136 |
|        eca_halonext26ts         | 1  | 48.33342  |
|         crossvit_9_240          | 1  | 48.244597 |
|        tnt_s_patch16_224        | 1  | 46.932912 |
|           volo_d1_224           | 1  | 42.678507 |
|       eca_botnext26ts_256       | 1  | 41.03886  |
|            hrnet_w18            | 1  | 40.618384 |
|        res2net101_26w_4s        | 1  | 39.782438 |
|       tf_efficientnet_b0        | 1  | 39.256665 |
|            nfnet_l0             | 1  | 37.837174 |
|          convnext_base          | 1  | 37.687132 |
|           resnest101e           | 1  | 37.544035 |
|       gluon_inception_v3        | 1  | 35.78282  |
|          inception_v3           | 1  | 35.752466 |
|        adv_inception_v3         | 1  | 35.746857 |
|            tinynet_a            | 1  | 34.324701 |
|           res2next50            | 1  | 34.320129 |
|            pit_b_224            | 1  | 33.976078 |
|           convit_base           | 1  | 31.728945 |
|          botnet26t_256          | 1  | 30.224131 |
|          cspdarknet53           | 1  | 27.512342 |
|            fbnetv3_b            | 1  | 26.886995 |
|             dla102              | 1  | 26.757869 |
|          gmlp_s16_224           | 1  | 26.39974  |
|          gmixer_24_224          | 1  | 25.241362 |
|         visformer_small         | 1  | 24.483271 |
|        ese_vovnet19b_dw         | 1  | 23.458677 |
|      mobilenetv3_large_100      | 1  | 23.151329 |
|      vit_base_patch16_224       | 1  | 21.567235 |
| deit_base_distilled_patch16_224 | 1  | 20.517075 |
|      beit_base_patch16_224      | 1  | 20.280849 |
|          mixer_b16_224          | 1  | 19.346046 |
|           regnety_002           | 1  | 19.032125 |
|            repvgg_a2            | 1  | 17.308534 |
|        convmixer_768_32         | 1  | 17.28152  |
|          resmlp_12_224          | 1  | 17.182277 |
|           selecsls42b           | 1  | 16.514863 |
|            lcnet_050            | 1  | 14.114847 |
|     swsl_resnext101_32x16d      | 1  | 12.749704 |
|           fbnetc_100            | 1  | 10.937719 |
|          spnasnet_100           | 1  | 10.893521 |
|            gernet_l             | 1  | 10.828321 |
|         mobilenetv2_100         | 1  | 10.405995 |
|           mnasnet_100           | 1  | 10.215849 |
+---------------------------------+----+-----------+
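
For readers who want to summarize these tables themselves (for example, to get the mean compilation latency mentioned in the executive summary), here is a minimal sketch. It assumes the tables are copied as plain text in the bordered layout shown above. `parse_ascii_table` is a hypothetical helper written for this illustration, not part of the benchmark scripts, and the sample includes only three rows taken from the table above.

```python
from statistics import mean


def parse_ascii_table(text):
    """Parse a '+---+' bordered text table into a list of dicts keyed by header.

    Hypothetical helper for illustration; assumes every data line starts
    with '|' and that cells contain no '|' characters.
    """
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
        if line.strip().startswith("|")  # skip the '+----+' border lines
    ]
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Three rows copied from the compilation latency table above.
sample = """
+---------------------------------+----+-----------+
|              name               | bs | inductor  |
+---------------------------------+----+-----------+
|          pnasnet5large          | 1  | 80.97748  |
|  swin_base_patch4_window7_224   | 1  | 80.429749 |
|           mnasnet_100           | 1  | 10.215849 |
+---------------------------------+----+-----------+
"""

records = parse_ascii_table(sample)
latencies = [float(r["inductor"]) for r in records]
print(f"{len(records)} models, mean compile latency {mean(latencies):.2f} s")
```

Pasting a full table into `sample` gives the mean over all listed models. The same parser works for the memory and absolute latency tables, since they share the `name`, `bs`, `inductor` header.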

Peak Memory Compression Ratio

+---------------------------------+----+----------+
|              name               | bs | inductor |
+---------------------------------+----+----------+
|          cait_m36_384           | 1  | 0.945835 |
|          pnasnet5large          | 1  | 0.929019 |
|        convmixer_768_32         | 1  | 0.918739 |
|            nfnet_l0             | 1  | 0.896431 |
|      xcit_large_24_p8_224       | 1  | 0.890839 |
|        ese_vovnet19b_dw         | 1  | 0.885076 |
|           mnasnet_100           | 1  | 0.878755 |
|            fbnetv3_b            | 1  | 0.878334 |
|         mobilenetv2_100         | 1  | 0.876289 |
|          spnasnet_100           | 1  | 0.875894 |
|       tf_efficientnet_b0        | 1  | 0.874548 |
|           rexnet_100            | 1  | 0.868043 |
|      mobilenetv3_large_100      | 1  | 0.866544 |
|           fbnetc_100            | 1  | 0.86615  |
|            lcnet_050            | 1  | 0.864061 |
|            tinynet_a            | 1  | 0.862503 |
|         poolformer_m36          | 1  | 0.85617  |
|           dm_nfnet_f0           | 1  | 0.855481 |
|        eca_halonext26ts         | 1  | 0.853707 |
|       eca_botnext26ts_256       | 1  | 0.853522 |
|           mobilevit_s           | 1  | 0.849628 |
|           tf_mixnet_l           | 1  | 0.849108 |
|           regnety_002           | 1  | 0.843182 |
|          ghostnet_100           | 1  | 0.842075 |
|          botnet26t_256          | 1  | 0.840399 |
|            mixnet_l             | 1  | 0.830168 |
|          resmlp_12_224          | 1  | 0.827128 |
|         visformer_small         | 1  | 0.820903 |
|           res2next50            | 1  | 0.817667 |
|            levit_128            | 1  | 0.809377 |
|             dpn107              | 1  | 0.801708 |
|          convnext_base          | 1  | 0.799297 |
|            hrnet_w18            | 1  | 0.798001 |
|        sebotnet33ts_256         | 1  | 0.796586 |
|        res2net50_14w_8s         | 1  | 0.796417 |
|          gmlp_s16_224           | 1  | 0.795554 |
|          cspdarknet53           | 1  | 0.795378 |
|          gmixer_24_224          | 1  | 0.789754 |
|           volo_d1_224           | 1  | 0.786885 |
|        tnt_s_patch16_224        | 1  | 0.784258 |
|           convit_base           | 1  | 0.78121  |
|         crossvit_9_240          | 1  | 0.778458 |
|          mixer_b16_224          | 1  | 0.778096 |
|        twins_pcpvt_base         | 1  | 0.776793 |
|           resnest101e           | 1  | 0.776039 |
|             dla102              | 1  | 0.775466 |
|      beit_base_patch16_224      | 1  | 0.772652 |
|          jx_nest_base           | 1  | 0.771103 |
|      vit_base_patch16_224       | 1  | 0.763821 |
|        adv_inception_v3         | 1  | 0.763075 |
|       gluon_inception_v3        | 1  | 0.761794 |
|          inception_v3           | 1  | 0.76176  |
| deit_base_distilled_patch16_224 | 1  | 0.761378 |
|            pit_b_224            | 1  | 0.755289 |
|        res2net101_26w_4s        | 1  | 0.742855 |
|           selecsls42b           | 1  | 0.739329 |
|  swin_base_patch4_window7_224   | 1  | 0.738747 |
|            gernet_l             | 1  | 0.735909 |
|            repvgg_a2            | 1  | 0.690765 |
|     swsl_resnext101_32x16d      | 1  | 0.638714 |
+---------------------------------+----+----------+

Absolute latency (ms)

+---------------------------------+----+-------------+
|              name               | bs |  inductor   |
+---------------------------------+----+-------------+
|          cait_m36_384           | 1  | 3698.327286 |
|      xcit_large_24_p8_224       | 1  | 1539.433744 |
|     swsl_resnext101_32x16d      | 1  | 442.199064  |
|          pnasnet5large          | 1  | 370.151397  |
|          convnext_base          | 1  | 308.020975  |
|             dpn107              | 1  | 261.095941  |
|        convmixer_768_32         | 1  | 245.196771  |
|          jx_nest_base           | 1  | 239.496174  |
|  swin_base_patch4_window7_224   | 1  | 198.400583  |
|      beit_base_patch16_224      | 1  | 198.133026  |
| deit_base_distilled_patch16_224 | 1  | 197.563277  |
|           convit_base           | 1  |  197.37907  |
|      vit_base_patch16_224       | 1  | 196.531926  |
|            pit_b_224            | 1  | 168.794178  |
|           resnest101e           | 1  | 166.715496  |
|           dm_nfnet_f0           | 1  | 159.883829  |
|          mixer_b16_224          | 1  | 140.335259  |
|         poolformer_m36          | 1  | 139.089323  |
|        res2net101_26w_4s        | 1  | 113.902223  |
|        twins_pcpvt_base         | 1  | 108.587743  |
|           volo_d1_224           | 1  |  96.904484  |
|        tnt_s_patch16_224        | 1  |  95.376291  |
|            nfnet_l0             | 1  |  95.260693  |
|             dla102              | 1  |  90.75363   |
|            hrnet_w18            | 1  |  86.805456  |
|        sebotnet33ts_256         | 1  |  86.113993  |
|          cspdarknet53           | 1  |  83.174237  |
|          inception_v3           | 1  |  74.022403  |
|       gluon_inception_v3        | 1  |  73.800027  |
|        adv_inception_v3         | 1  |  73.723479  |
|          gmlp_s16_224           | 1  |  69.52908   |
|         visformer_small         | 1  |  67.379552  |
|        res2net50_14w_8s         | 1  |  65.476396  |
|          gmixer_24_224          | 1  |  63.763876  |
|            repvgg_a2            | 1  |  63.586872  |
|           res2next50            | 1  |  60.035223  |
|            gernet_l             | 1  |  57.076216  |
|          botnet26t_256          | 1  |  44.863539  |
|           selecsls42b           | 1  |  44.842202  |
|        eca_halonext26ts         | 1  |  44.559181  |
|           mobilevit_s           | 1  |  42.637568  |
|       eca_botnext26ts_256       | 1  |  40.650556  |
|          resmlp_12_224          | 1  |  35.928992  |
|         crossvit_9_240          | 1  |  34.163358  |
|        ese_vovnet19b_dw         | 1  |  31.652963  |
|            mixnet_l             | 1  |  31.517298  |
|           tf_mixnet_l           | 1  |  31.230372  |
|            fbnetv3_b            | 1  |  16.281255  |
|       tf_efficientnet_b0        | 1  |  13.872125  |
|           rexnet_100            | 1  |  13.831532  |
|            tinynet_a            | 1  |  12.801584  |
|           fbnetc_100            | 1  |  9.631454   |
|            levit_128            | 1  |  9.407799   |
|          ghostnet_100           | 1  |  9.390235   |
|          spnasnet_100           | 1  |  8.544496   |
|           mnasnet_100           | 1  |  7.918752   |
|         mobilenetv2_100         | 1  |  7.705593   |
|      mobilenetv3_large_100      | 1  |   7.61698   |
|           regnety_002           | 1  |  6.626572   |
|            lcnet_050            | 1  |  2.591173   |
+---------------------------------+----+-------------+
