
Makes fasterrcnn backbone run #666

Merged
NoraHagmeyer merged 4 commits into main from fastcnn
Apr 10, 2026

Conversation

@NoraHagmeyer
Contributor

  • Adds tensor.insert_slice() support to the MLIR frontend.
  • Adds a fasterrcnn_resnet50 benchmark to the repository. Only the backbone can be lowered to torch-mlir, so the harness and benchmarking infrastructure are not used for this benchmark.
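For context on the first bullet: `tensor.insert_slice` is the upstream MLIR tensor-dialect op that inserts a source tensor into a (possibly strided) window of a destination tensor, yielding a new value rather than mutating the destination. A minimal pure-Python sketch of the 2-D semantics follows; the helper name and list-of-lists representation are illustrative only and not taken from this PR's frontend code.

```python
def insert_slice(dest, src, offsets, strides):
    """Emulate 2-D tensor.insert_slice semantics: write `src` into a
    strided window of `dest`. The MLIR op is value-based, so we copy
    the destination and leave the original untouched."""
    out = [row[:] for row in dest]          # insert_slice produces a new tensor value
    (off0, off1), (st0, st1) = offsets, strides
    for i, row in enumerate(src):
        for j, v in enumerate(row):
            out[off0 + i * st0][off1 + j * st1] = v
    return out

dest = [[0] * 4 for _ in range(4)]
src = [[1, 2], [3, 4]]
# Insert src starting at (1, 1), with stride 2 along the second axis.
result = insert_slice(dest, src, offsets=(1, 1), strides=(1, 2))
```

In NumPy terms this corresponds to strided slice assignment (`out[1:3:1, 1:5:2] = src`) on a copy of the destination; supporting the op in the frontend matters for models like Faster R-CNN whose graphs scatter sub-tensors into larger buffers.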

@NoraHagmeyer NoraHagmeyer requested review from Atrisan and lukastruemper and removed request for Atrisan and lukastruemper April 9, 2026 17:56
@daisytuner

daisytuner bot commented Apr 9, 2026

Daisytuner Report - mlir_torch_models (chamomile)

@@                                   Benchmarks                                   @@
=====================================================================================
  Benchmark              Time        ΔTime       Thr         Energy      ΔEnergy     
=====================================================================================
# bn_conv_bn_relu_maxpool_torch          18.62 s   +0.51%   N/A   3487.65 J   -3.87%
# bn_conv_bn_relu_maxpool_run_none       3.26 s    +2.77%   N/A   632.98 J    -0.97%
# bn_conv_bn_relu_maxpool_run_sequential 3.29 s    +3.37%   N/A   642.20 J    -0.02%
# bn_conv_bn_relu_maxpool_run_openmp     3.37 s    +3.81%   N/A   660.00 J    -0.89%
# bn_conv_bn_relu_maxpool_run_cuda       3.71 s    +1.05%   N/A   697.14 J    -4.00%

@daisytuner

daisytuner bot commented Apr 9, 2026

Daisytuner Report - mlir_torch_layers (chamomile)

@@                                   Benchmarks                                   @@
=====================================================================================
  Benchmark              Time        ΔTime       Thr         Energy      ΔEnergy     
=====================================================================================
# batchnorm_torch        19.01 s     -1.23%      N/A         3740.63 J   +0.32%      
# batchnorm_run_none     3.87 s      -0.17%      N/A         761.27 J    +0.58%      
# batchnorm_run_sequential 3.98 s     +0.35%      N/A         784.66 J    +1.49%      
# batchnorm_run_openmp   3.51 s      -1.55%      N/A         729.36 J    -0.13%      
# batchnorm_run_cuda     5.47 s      -1.94%      N/A         1078.82 J   -0.47%      
# conv2d_torch           18.59 s     +0.11%      N/A         3652.26 J   +1.16%      
# conv2d_run_openmp      4.34 s      +0.04%      N/A         1039.37 J   +1.88%      
# conv2d_run_cuda        7.54 s      +1.82%      N/A         1474.85 J   +3.11%      
# linear_torch           6.20 s      -0.85%      N/A         1489.29 J   +0.67%      
# linear_run_none        10.52 s     -0.09%      N/A         2890.40 J   +0.38%      
# linear_run_sequential  8.89 s      -0.75%      N/A         2569.07 J   +0.10%      
# linear_run_openmp      8.46 s      +1.81%      N/A         2482.34 J   +2.67%      
# linear_run_cuda        8.38 s      -0.25%      N/A         1635.44 J   +0.58%      
# matmul_torch           6.08 s      +0.21%      N/A         1463.57 J   +1.05%      
# matmul_run_none        10.65 s     +1.25%      N/A         2921.20 J   +1.33%      
# matmul_run_sequential  8.89 s      +0.69%      N/A         2578.28 J   +1.56%      
# matmul_run_openmp      8.28 s      -0.58%      N/A         2432.14 J   -0.02%      
# matmul_run_cuda        8.23 s      +0.32%      N/A         1615.04 J   +1.59%      
# pooling_torch          26.16 s     +1.48%      N/A         5225.36 J   +2.81%      
# pooling_run_none       15.27 s     -0.73%      N/A         2946.75 J   +0.53%      
# pooling_run_sequential 15.19 s     -1.28%      N/A         2935.33 J   +0.11%      
# pooling_run_openmp     8.99 s      -1.87%      N/A         1833.85 J   -0.80%      
# pooling_run_cuda       20.03 s     -0.83%      N/A         3908.24 J   +0.46%      
# relu_torch             19.26 s     +1.68%      N/A         3784.56 J   +2.88%      
# relu_run_none          3.80 s      +0.01%      N/A         750.72 J    +1.30%      
# relu_run_sequential    3.81 s      -0.28%      N/A         751.30 J    +0.77%      
# relu_run_openmp        3.53 s      +1.94%      N/A         725.23 J    +3.28%      
# relu_run_cuda          5.48 s      -0.79%      N/A         1081.98 J   +0.60%      

@daisytuner

daisytuner bot commented Apr 9, 2026

Daisytuner Report - python_npbench (zinnia)

@@                                   Benchmarks                                   @@
=====================================================================================
  Benchmark              Time        ΔTime       Thr         Energy      ΔEnergy     
=====================================================================================
# adi_numpy              1.31 s      -0.43%      N/A         130.82 J    -0.51%      
- adi_omp                14.61 s     +82.84%     N/A         1422.99 J   +79.08%     
- adi_cuda               4.69 s      +10.03%     N/A         454.05 J    +9.55%      
- adi_seq_tuning         15.13 s     +83.89%     N/A         1401.41 J   +83.29%     
# atax_numpy             2.15 s      -0.08%      N/A         222.84 J    -0.14%      
# atax_omp               2.96 s      -1.22%      N/A         370.55 J    -1.82%      
# atax_cuda              4.14 s      +0.74%      N/A         424.87 J    +0.63%      
# atax_seq_tuning        4.10 s      -0.44%      N/A         397.79 J    +0.08%      
# gemm_numpy             1.22 s      +0.95%      N/A         195.16 J    +0.80%      
# gemm_omp               1.12 s      +0.56%      N/A         162.94 J    +0.15%      
# gemm_cuda              10.63 s     +0.40%      N/A         1011.22 J   +0.48%      
# gemm_seq_tuning        1.11 s      -0.10%      N/A         161.69 J    -0.28%      
# gesummv_numpy          1.75 s      -0.49%      N/A         249.64 J    -0.51%      
# gesummv_omp            1.96 s      -1.39%      N/A         308.00 J    -1.87%      
# gesummv_cuda           8.34 s      +0.13%      N/A         1001.69 J   +0.08%      
# gesummv_seq_tuning     8.58 s      -0.51%      N/A         974.64 J    -0.03%      
# gemver_numpy           1.10 s      +0.98%      N/A         169.15 J    +1.11%      
# gemver_omp             843.10 ms   +0.02%      N/A         107.36 J    -0.37%      
# gemver_cuda            3.90 s      +0.95%      N/A         390.24 J    +0.91%      
# gemver_seq_tuning      5.51 s      -0.49%      N/A         496.62 J    +0.59%      
# k2mm_numpy             1.20 s      -0.15%      N/A         196.40 J    -0.11%      
# k2mm_omp               3.57 s      -0.43%      N/A         662.22 J    -0.62%      
# k2mm_cuda              13.59 s     -0.13%      N/A         1290.27 J   +0.17%      
# k2mm_seq_tuning        2.91 s      -3.09%      N/A         390.51 J    -1.50%      
# k3mm_numpy             1.03 s      -0.20%      N/A         181.82 J    -0.24%      
# k3mm_omp               5.55 s      +0.03%      N/A         947.43 J    -0.85%      
# k3mm_cuda              19.79 s     -0.07%      N/A         1864.91 J   -0.14%      
# k3mm_seq_tuning        4.90 s      -0.90%      N/A         686.17 J    -0.17%      
# mvt_numpy              2.43 s      -0.09%      N/A         247.61 J    -0.17%      
# mvt_omp                2.74 s      -0.23%      N/A         284.54 J    -0.19%      
# mvt_cuda               3.36 s      -0.32%      N/A         342.50 J    -0.23%      
# mvt_seq_tuning         2.74 s      +0.01%      N/A         284.26 J    +0.02%      
# symm_numpy             794.75 ms   -0.57%      N/A         81.74 J     -0.44%      
- symm_omp               6.06 s      +29.35%     N/A         595.37 J    +28.66%     
- symm_seq_tuning        8.31 s      +20.20%     N/A         749.91 J    +20.80%     
# syr2k_numpy            894.33 ms   -0.11%      N/A         90.93 J     -0.02%      
- syr2k_omp              9.85 s      +35.42%     N/A         935.82 J    +34.66%     
# syr2k_cuda             1.64 s      +7.62%      N/A         170.26 J    +6.89%      
- syr2k_seq_tuning       9.84 s      +36.06%     N/A         935.45 J    +35.17%     
# syrk_numpy             789.79 ms   +0.70%      N/A         80.95 J     +0.57%      
- syrk_omp               5.97 s      +31.05%     N/A         573.47 J    +29.89%     
- syrk_cuda              1.53 s      +12.48%     N/A         159.62 J    +10.93%     
- syrk_seq_tuning        5.98 s      +31.38%     N/A         574.51 J    +30.14%     
# trmm_numpy             880.51 ms   +0.39%      N/A         89.76 J     +0.50%      
# trmm_omp               711.70 ms   +0.01%      N/A         89.98 J     -0.63%      
# trmm_seq_tuning        3.38 s      +0.11%      N/A         276.75 J    -0.20%      

This benchmark does not use the harness, as it
only compiles the backbone; the RoI (region of
interest) stage is not supported by Torch-MLIR.
@NoraHagmeyer NoraHagmeyer merged commit 6e2e054 into main Apr 10, 2026
25 checks passed
@NoraHagmeyer NoraHagmeyer deleted the fastcnn branch April 10, 2026 12:39