adds LoopRotate transformation #640

Merged
lukastruemper merged 1 commit into main from loop-rotate on Apr 3, 2026

@lukastruemper
Contributor

No description provided.

lukastruemper merged commit 8dfccaa into main on Apr 3, 2026
15 of 18 checks passed
lukastruemper deleted the loop-rotate branch on April 3, 2026 at 15:58

daisytuner Bot commented Apr 3, 2026

Daisytuner Report - mlir_torch_models (chamomile)

@@                                         Benchmarks                                          @@
=================================================================================================
  Benchmark                               Time        ΔTime       Thr         Energy      ΔEnergy
=================================================================================================
# bn_conv_bn_relu_maxpool_torch           18.72 s     +0.30%      N/A         3614.85 J   -0.95%
# bn_conv_bn_relu_maxpool_run_none        3.27 s      +0.51%      N/A         657.20 J    -0.45%
# bn_conv_bn_relu_maxpool_run_sequential  3.29 s      -0.21%      N/A         664.88 J    -1.25%
# bn_conv_bn_relu_maxpool_run_openmp      3.38 s      +3.17%      N/A         690.93 J    +2.81%
# bn_conv_bn_relu_maxpool_run_cuda        3.70 s      -0.28%      N/A         724.29 J    -1.27%


daisytuner Bot commented Apr 3, 2026

Daisytuner Report - python_npbench (zinnia)

@@                                   Benchmarks                                   @@
=====================================================================================
  Benchmark              Time        ΔTime       Thr         Energy      ΔEnergy     
=====================================================================================
# adi_numpy              1.31 s      -2.26%      N/A         130.89 J    -1.94%      
# adi_omp                15.04 s     -1.42%      N/A         1490.30 J   -1.60%      
# adi_cuda               4.72 s      -0.24%      N/A         457.83 J    -0.27%      
# adi_seq_tuning         16.00 s     -0.20%      N/A         1512.42 J   -0.05%      
# atax_numpy             2.16 s      +0.20%      N/A         223.39 J    +0.31%      
# atax_omp               2.99 s      +0.56%      N/A         377.74 J    +0.67%      
# atax_cuda              4.11 s      +0.27%      N/A         423.28 J    +0.25%      
# atax_seq_tuning        3.72 s      -0.46%      N/A         376.86 J    -0.43%      
# gemm_numpy             1.21 s      -1.06%      N/A         192.57 J    -1.02%      
# gemm_omp               1.11 s      -0.14%      N/A         161.64 J    -0.01%      
# gemm_cuda              10.58 s     -0.43%      N/A         1006.31 J   -0.38%      
# gemm_seq_tuning        1.11 s      -0.00%      N/A         161.68 J    +0.15%      
# gesummv_numpy          1.75 s      +0.65%      N/A         249.81 J    +0.52%      
# gesummv_omp            1.95 s      -1.59%      N/A         305.92 J    -2.13%      
# gesummv_cuda           8.32 s      -0.38%      N/A         1000.51 J   -0.36%      
# gesummv_seq_tuning     6.67 s      +0.29%      N/A         814.08 J    +0.03%      
# gemver_numpy           1.08 s      +0.20%      N/A         166.61 J    +0.08%      
# gemver_omp             862.62 ms   +0.41%      N/A         112.68 J    +1.07%      
# gemver_cuda            3.86 s      +0.11%      N/A         386.34 J    -0.07%      
# gemver_seq_tuning      4.49 s      -0.13%      N/A         434.78 J    -0.09%      
# k2mm_numpy             1.20 s      +0.23%      N/A         195.93 J    +0.17%      
# k2mm_omp               3.56 s      +0.86%      N/A         663.85 J    +0.18%      
# k2mm_cuda              13.56 s     -0.44%      N/A         1286.59 J   -0.45%      
# k2mm_seq_tuning        3.64 s      +0.60%      N/A         469.79 J    +0.75%      
# k3mm_numpy             1.03 s      +0.34%      N/A         182.41 J    +0.03%      
# k3mm_omp               5.55 s      -0.33%      N/A         953.94 J    +0.22%      
# k3mm_cuda              19.80 s     +0.02%      N/A         1869.73 J   +0.16%      
# k3mm_seq_tuning        5.75 s      +0.79%      N/A         800.62 J    +1.15%      
# mvt_numpy              2.43 s      -0.04%      N/A         249.22 J    +0.02%      
# mvt_omp                2.74 s      -0.06%      N/A         284.40 J    +0.04%      
# mvt_cuda               3.34 s      -0.28%      N/A         341.16 J    -0.36%      
# mvt_seq_tuning         2.74 s      -0.13%      N/A         284.19 J    -0.13%      
# symm_numpy             784.41 ms   +0.59%      N/A         80.68 J     +0.49%      
# symm_omp               6.06 s      +0.09%      N/A         596.79 J    -0.15%      
# symm_seq_tuning        8.43 s      -0.26%      N/A         804.59 J    -0.20%      
# syr2k_numpy            887.89 ms   -0.21%      N/A         90.39 J     +0.06%      
# syr2k_omp              9.86 s      -0.40%      N/A         937.39 J    -0.35%      
# syr2k_cuda             1.62 s      -0.95%      N/A         168.76 J    -1.10%      
# syr2k_seq_tuning       9.82 s      -0.23%      N/A         934.56 J    -0.21%      
# syrk_numpy             770.40 ms   -0.45%      N/A         79.42 J     -0.34%      
# syrk_omp               5.96 s      +0.09%      N/A         574.08 J    +0.19%      
# syrk_cuda              1.51 s      +0.56%      N/A         158.11 J    +0.51%      
# syrk_seq_tuning        5.94 s      -0.36%      N/A         572.14 J    -0.23%      
# trmm_numpy             874.80 ms   -0.67%      N/A         89.18 J     -0.89%      
# trmm_omp               694.93 ms   -0.21%      N/A         89.66 J     +0.15%      
# trmm_seq_tuning        3.39 s      -0.78%      N/A         324.09 J    -0.68%      


daisytuner Bot commented Apr 3, 2026

Daisytuner Report - mlir_torch_layers (chamomile)

@@                                   Benchmarks                                   @@
=====================================================================================
  Benchmark              Time        ΔTime       Thr         Energy      ΔEnergy     
=====================================================================================
# batchnorm_torch        18.98 s     -0.14%      N/A         3720.84 J   +4.40%      
# batchnorm_run_none     6.19 s      -2.39%      N/A         1204.99 J   +2.41%      
# batchnorm_run_sequential 6.49 s      -0.53%      N/A         1264.07 J   +4.24%
# batchnorm_run_openmp   5.76 s      +0.70%      N/A         1357.99 J   +4.65%      
# batchnorm_run_cuda     8.07 s      -0.54%      N/A         1583.17 J   +4.89%      
# conv2d_torch           18.57 s     +0.15%      N/A         3648.18 J   +5.01%      
# conv2d_run_openmp      5.03 s      +0.57%      N/A         1228.26 J   +5.08%      
# conv2d_run_cuda        6.83 s      -0.36%      N/A         1339.46 J   +4.85%      
# linear_torch           6.09 s      -0.77%      N/A         1463.54 J   +1.47%      
# linear_run_none        11.69 s     +0.29%      N/A         3191.00 J   +2.78%      
# linear_run_sequential  9.92 s      -0.07%      N/A         2762.25 J   +3.43%      
# linear_run_openmp      9.71 s      -0.29%      N/A         2859.92 J   +2.90%      
# linear_run_cuda        9.21 s      -0.01%      N/A         1802.41 J   +5.88%      
# matmul_torch           6.08 s      -2.94%      N/A         1466.98 J   -0.01%      
# matmul_run_none        11.58 s     +0.15%      N/A         3149.69 J   +2.52%      
# matmul_run_sequential  9.96 s      -0.03%      N/A         2777.63 J   +3.99%      
# matmul_run_openmp      9.76 s      +0.05%      N/A         2877.49 J   +3.27%      
# matmul_run_cuda        9.03 s      -0.38%      N/A         1765.23 J   +4.85%      
# pooling_torch          25.62 s     -0.77%      N/A         5109.91 J   +3.71%      
# pooling_run_none       25.08 s     -1.39%      N/A         4812.43 J   +3.24%      
# pooling_run_sequential 25.73 s     -0.47%      N/A         4931.15 J   +4.12%      
# pooling_run_openmp     17.15 s     -1.41%      N/A         3660.41 J   +3.07%      
# pooling_run_cuda       31.18 s     -0.58%      N/A         6071.61 J   +4.46%      
# relu_torch             18.98 s     -0.19%      N/A         3717.68 J   +4.30%      
# relu_run_none          5.24 s      +0.08%      N/A         1021.21 J   +4.40%      
# relu_run_sequential    6.33 s      -0.63%      N/A         1232.34 J   +4.05%      
# relu_run_openmp        5.81 s      +1.58%      N/A         1360.20 J   +5.55%      
# relu_run_cuda          8.29 s      -0.91%      N/A         1624.15 J   +4.29%      
