adds memlet simplification for contiguous memory accesses #653

Merged
lukastruemper merged 1 commit into main from memlet-simplification
Apr 7, 2026

Conversation

@lukastruemper
Contributor

No description provided.

@lukastruemper lukastruemper force-pushed the memlet-simplification branch from 8fde0da to cfe8b8d April 7, 2026 19:58
@daisytuner

daisytuner bot commented Apr 7, 2026

Daisytuner Report - mlir_torch_models (chamomile)

@@                                   Benchmarks                                   @@
=================================================================================================
  Benchmark                               Time        ΔTime       Thr         Energy      ΔEnergy
=================================================================================================
# bn_conv_bn_relu_maxpool_torch           18.66 s     +0.12%      N/A         3648.74 J   +4.69%
# bn_conv_bn_relu_maxpool_run_none        3.26 s      -0.73%      N/A         661.98 J    +3.69%
# bn_conv_bn_relu_maxpool_run_sequential  3.29 s      -1.00%      N/A         670.81 J    +3.67%
# bn_conv_bn_relu_maxpool_run_openmp      3.40 s      +4.63%      N/A         696.97 J    +8.73%
# bn_conv_bn_relu_maxpool_run_cuda        3.70 s      +0.17%      N/A         729.45 J    +4.84%

@lukastruemper lukastruemper force-pushed the memlet-simplification branch from cfe8b8d to acaf1c0 April 7, 2026 21:32
@daisytuner

daisytuner bot commented Apr 7, 2026

Daisytuner Report - mlir_torch_layers (chamomile)

@@                                   Benchmarks                                   @@
=====================================================================================
  Benchmark              Time        ΔTime       Thr         Energy      ΔEnergy     
=====================================================================================
# batchnorm_torch        19.06 s     +0.58%      N/A         3685.99 J   -0.87%      
# batchnorm_run_none     6.23 s      -3.49%      N/A         1195.00 J   -4.89%      
# batchnorm_run_sequential 6.57 s    -0.33%      N/A         1256.77 J   -1.93%      
# batchnorm_run_openmp   5.78 s      -0.88%      N/A         1343.43 J   -1.98%      
# batchnorm_run_cuda     8.13 s      -0.33%      N/A         1568.42 J   -1.82%      
# conv2d_torch           18.61 s     -1.03%      N/A         3605.21 J   -2.37%      
# conv2d_run_openmp      5.04 s      +3.30%      N/A         1211.14 J   +1.59%      
# conv2d_run_cuda        7.83 s      -0.55%      N/A         1510.92 J   -2.09%      
# linear_torch           6.10 s      +0.23%      N/A         1451.16 J   -0.93%      
# linear_run_none        11.71 s     +0.36%      N/A         3176.41 J   +0.01%      
# linear_run_sequential  10.03 s     +0.55%      N/A         2761.98 J   -0.26%      
# linear_run_openmp      9.82 s      -0.50%      N/A         2867.51 J   -0.99%      
# linear_run_cuda        9.29 s      +0.21%      N/A         1799.27 J   -0.64%      
# matmul_torch           6.12 s      +1.10%      N/A         1456.12 J   -0.26%      
# matmul_run_none        11.65 s     +2.36%      N/A         3152.51 J   +1.35%      
# matmul_run_sequential  9.94 s      +0.47%      N/A         2752.06 J   -0.16%      
# matmul_run_openmp      9.89 s      +1.43%      N/A         2899.81 J   +1.02%      
# matmul_run_cuda        9.11 s      +0.44%      N/A         1760.64 J   -0.76%      
# pooling_torch          26.37 s     +2.82%      N/A         5189.56 J   +1.38%      
# pooling_run_none       25.68 s     +0.99%      N/A         4849.18 J   -0.64%      
# pooling_run_sequential 25.92 s     -0.05%      N/A         4902.93 J   -1.60%      
# pooling_run_openmp     17.45 s     +0.90%      N/A         3665.90 J   -0.66%      
# pooling_run_cuda       31.73 s     +1.18%      N/A         6101.62 J   -0.31%      
# relu_torch             18.95 s     +0.32%      N/A         3653.47 J   -1.54%      
# relu_run_none          5.26 s      -0.14%      N/A         1012.56 J   -1.62%      
# relu_run_sequential    6.38 s      +0.39%      N/A         1224.22 J   -1.01%      
# relu_run_openmp        5.70 s      -0.25%      N/A         1320.24 J   -2.17%      
# relu_run_cuda          8.40 s      +0.81%      N/A         1624.21 J   -0.43%      

@lukastruemper lukastruemper merged commit c79215a into main Apr 7, 2026
15 of 18 checks passed
@lukastruemper lukastruemper deleted the memlet-simplification branch April 7, 2026 21:47
@daisytuner

daisytuner bot commented Apr 7, 2026

Daisytuner Report - python_npbench (zinnia)

@@                                   Benchmarks                                   @@
=====================================================================================
  Benchmark              Time        ΔTime       Thr         Energy      ΔEnergy     
=====================================================================================
# adi_numpy              1.33 s      +1.32%      N/A         132.70 J    +1.32%      
# adi_omp                14.80 s     +0.41%      N/A         1451.45 J   +0.01%      
# adi_cuda               4.68 s      -0.30%      N/A         454.20 J    -0.18%      
# adi_seq_tuning         14.95 s     +0.06%      N/A         1388.08 J   +0.11%      
# atax_numpy             2.16 s      -0.58%      N/A         223.79 J    -0.71%      
# atax_omp               3.02 s      +0.29%      N/A         383.20 J    +0.89%      
# atax_cuda              4.12 s      +0.39%      N/A         424.58 J    +0.46%      
# atax_seq_tuning        4.15 s      +1.12%      N/A         401.92 J    +1.04%      
# gemm_numpy             1.21 s      +0.46%      N/A         193.55 J    +0.20%      
# gemm_omp               1.11 s      -0.08%      N/A         162.66 J    +0.18%      
# gemm_cuda              10.58 s     -0.42%      N/A         1005.09 J   -0.52%      
# gemm_seq_tuning        1.11 s      -0.30%      N/A         161.56 J    -0.08%      
# gesummv_numpy          1.73 s      -1.29%      N/A         245.71 J    -1.52%      
# gesummv_omp            1.94 s      -7.62%      N/A         305.60 J    -8.94%      
# gesummv_cuda           8.30 s      +0.64%      N/A         998.37 J    +0.56%      
# gesummv_seq_tuning     8.58 s      -3.03%      N/A         971.23 J    -1.03%      
# gemver_numpy           1.08 s      -0.08%      N/A         165.54 J    -0.07%      
# gemver_omp             867.30 ms   +0.14%      N/A         113.50 J    +0.28%      
# gemver_cuda            3.87 s      -0.02%      N/A         386.48 J    -0.12%      
# gemver_seq_tuning      5.51 s      +0.39%      N/A         496.93 J    +1.06%      
# k2mm_numpy             1.19 s      -0.32%      N/A         194.52 J    -0.53%      
# k2mm_omp               3.49 s      -0.49%      N/A         651.42 J    -0.88%      
# k2mm_cuda              13.56 s     -0.22%      N/A         1286.19 J   -0.16%      
# k2mm_seq_tuning        2.95 s      -2.62%      N/A         392.00 J    -1.18%      
# k3mm_numpy             1.02 s      -0.31%      N/A         181.42 J    +0.01%      
# k3mm_omp               5.59 s      +0.39%      N/A         962.18 J    +2.65%      
# k3mm_cuda              19.77 s     -0.30%      N/A         1866.09 J   -0.19%      
# k3mm_seq_tuning        5.28 s      -1.49%      N/A         746.64 J    -0.53%      
# mvt_numpy              2.42 s      -0.11%      N/A         246.92 J    -0.11%      
# mvt_omp                2.74 s      -0.17%      N/A         284.46 J    -0.12%      
# mvt_cuda               3.36 s      +0.15%      N/A         342.76 J    +0.12%      
# mvt_seq_tuning         2.74 s      -0.05%      N/A         284.43 J    -0.04%      
# symm_numpy             785.07 ms   -1.54%      N/A         80.78 J     -1.45%      
# symm_omp               6.09 s      +0.23%      N/A         603.02 J    +1.40%      
# symm_seq_tuning        8.22 s      -0.18%      N/A         742.50 J    -0.06%      
# syr2k_numpy            890.55 ms   +0.90%      N/A         90.42 J     +0.78%      
# syr2k_omp              9.85 s      +0.11%      N/A         937.02 J    +0.20%      
# syr2k_cuda             1.63 s      -1.12%      N/A         169.47 J    -1.05%      
# syr2k_seq_tuning       9.82 s      +0.02%      N/A         933.93 J    +0.16%      
# syrk_numpy             771.45 ms   +0.30%      N/A         79.36 J     +0.35%      
# syrk_omp               5.94 s      -1.43%      N/A         571.41 J    -1.30%      
# syrk_cuda              1.53 s      +0.17%      N/A         159.95 J    +0.31%      
# syrk_seq_tuning        5.98 s      +0.48%      N/A         575.31 J    +0.53%      
# trmm_numpy             877.73 ms   +0.33%      N/A         89.50 J     +0.24%      
# trmm_omp               707.48 ms   -1.60%      N/A         90.47 J     -0.42%      
# trmm_seq_tuning        3.39 s      -0.29%      N/A         276.74 J    -0.58%      
