Use im2row/im2col as expansion of convolution node #613

Merged
Moehre2 merged 14 commits into main from im2col
Mar 30, 2026

Moehre2 (Contributor) commented Mar 26, 2026

No description provided.

Moehre2 self-assigned this Mar 26, 2026

daisytuner bot commented Mar 26, 2026

Daisytuner Report - mlir_torch_models (chamomile)

@@                                         Benchmarks                                          @@
=================================================================================================
  Benchmark                               Time        ΔTime       Thr         Energy      ΔEnergy
=================================================================================================
# bn_conv_bn_relu_maxpool_torch           18.55 s     +0.43%      N/A         3466.93 J   +0.32%
+ bn_conv_bn_relu_maxpool_run_none        3.27 s      -28.80%     N/A         630.03 J    -22.62%
+ bn_conv_bn_relu_maxpool_run_sequential  3.31 s      -27.89%     N/A         641.00 J    -17.45%
# bn_conv_bn_relu_maxpool_run_openmp      3.41 s      +0.51%      N/A         675.79 J    +1.08%
# bn_conv_bn_relu_maxpool_run_cuda        3.83 s      +4.42%      N/A         718.79 J    +4.12%

daisytuner bot commented Mar 26, 2026

Daisytuner Report - python_npbench (zinnia)

@@                                   Benchmarks                                   @@
=====================================================================================
  Benchmark              Time        ΔTime       Thr         Energy      ΔEnergy     
=====================================================================================
# adi_numpy              1.34 s      +1.82%      N/A         134.11 J    +1.74%      
# adi_omp                15.26 s     +0.07%      N/A         1501.86 J   -1.29%      
# adi_cuda               4.71 s      -0.53%      N/A         457.11 J    -0.59%      
# adi_seq_tuning         16.10 s     +0.23%      N/A         1520.52 J   +0.26%      
# atax_numpy             2.15 s      -0.12%      N/A         223.07 J    -0.13%      
# atax_omp               2.98 s      +0.48%      N/A         375.51 J    +0.32%      
# atax_cuda              4.11 s      -0.13%      N/A         423.39 J    -0.73%      
# atax_seq_tuning        3.73 s      -0.43%      N/A         377.46 J    -0.35%      
# gemm_numpy             1.23 s      -0.59%      N/A         198.34 J    -0.54%      
# gemm_omp               1.12 s      +0.20%      N/A         162.68 J    +0.11%      
# gemm_cuda              10.63 s     +0.39%      N/A         1012.41 J   +0.56%      
# gemm_seq_tuning        1.12 s      +0.17%      N/A         162.34 J    +0.31%      
# gesummv_numpy          1.75 s      +0.56%      N/A         250.37 J    +0.70%      
# gesummv_omp            2.02 s      +3.68%      N/A         319.44 J    +4.14%      
# gesummv_cuda           8.34 s      -0.45%      N/A         1002.55 J   -0.28%      
# gesummv_seq_tuning     6.66 s      -0.22%      N/A         813.28 J    -0.39%      
# gemver_numpy           1.08 s      +1.81%      N/A         166.98 J    +1.77%      
# gemver_omp             853.87 ms   -0.08%      N/A         111.19 J    +0.31%      
# gemver_cuda            3.90 s      +0.38%      N/A         390.01 J    -0.07%      
# gemver_seq_tuning      4.49 s      -0.63%      N/A         434.62 J    -0.50%      
# k2mm_numpy             1.20 s      -0.46%      N/A         197.38 J    -0.76%      
# k2mm_omp               3.56 s      +1.08%      N/A         661.74 J    +0.53%      
# k2mm_cuda              13.59 s     -0.03%      N/A         1289.56 J   +0.12%      
# k2mm_seq_tuning        3.64 s      +0.48%      N/A         469.41 J    +0.57%      
# k3mm_numpy             1.03 s      +0.19%      N/A         184.37 J    +0.18%      
# k3mm_omp               5.67 s      +1.11%      N/A         949.38 J    +0.17%      
# k3mm_cuda              19.84 s     -0.06%      N/A         1871.39 J   -0.05%      
# k3mm_seq_tuning        5.73 s      +0.12%      N/A         791.99 J    -0.12%      
# mvt_numpy              2.43 s      +0.04%      N/A         249.30 J    -0.06%      
# mvt_omp                2.75 s      -0.07%      N/A         285.12 J    -0.15%      
# mvt_cuda               3.35 s      +0.02%      N/A         342.36 J    +0.06%      
# mvt_seq_tuning         2.74 s      -0.07%      N/A         284.97 J    -0.02%      
# symm_numpy             780.33 ms   -1.26%      N/A         80.49 J     -1.20%      
# symm_omp               6.20 s      +0.65%      N/A         607.45 J    +0.14%      
# symm_seq_tuning        8.49 s      -0.06%      N/A         809.86 J    +0.02%      
# syr2k_numpy            879.11 ms   -0.92%      N/A         89.59 J     -0.83%      
# syr2k_omp              9.92 s      +0.09%      N/A         943.76 J    +0.13%      
# syr2k_cuda             1.65 s      +0.40%      N/A         171.44 J    +0.56%      
# syr2k_seq_tuning       9.95 s      +0.86%      N/A         946.11 J    +0.79%      
# syrk_numpy             770.50 ms   -0.72%      N/A         79.44 J     -0.86%      
# syrk_omp               5.97 s      -0.35%      N/A         574.58 J    -0.32%      
# syrk_cuda              1.53 s      -0.78%      N/A         159.97 J    -0.78%      
# syrk_seq_tuning        5.96 s      +0.10%      N/A         573.76 J    +0.21%      
# trmm_numpy             874.55 ms   -1.10%      N/A         89.13 J     -0.97%      
# trmm_omp               708.47 ms   -2.14%      N/A         90.95 J     -0.38%      
# trmm_seq_tuning        3.41 s      +1.05%      N/A         325.21 J    +0.67%      

daisytuner bot commented Mar 26, 2026

Daisytuner Report - mlir_torch_layers (chamomile)

@@                                   Benchmarks                                   @@
=====================================================================================
  Benchmark              Time        ΔTime       Thr         Energy      ΔEnergy     
=====================================================================================
# batchnorm_torch        18.86 s     -0.47%      N/A         3650.38 J   -1.46%      
# batchnorm_run_none     7.71 s      +1.24%      N/A         1473.06 J   +0.10%      
# batchnorm_run_sequential 8.59 s    +0.63%      N/A         1634.17 J   -0.82%      
# batchnorm_run_openmp   4.72 s      +0.75%      N/A         1054.13 J   +0.25%      
# batchnorm_run_cuda     12.87 s     +1.29%      N/A         2463.58 J   +0.09%      
# conv2d_torch           18.50 s     -0.24%      N/A         3586.03 J   -1.18%      
# conv2d_run_openmp      4.31 s      -7.57%      N/A         995.61 J    -7.03%      
- conv2d_run_cuda        6.51 s      +32.76%     N/A         1257.76 J   +30.48%     
# linear_torch           6.19 s      +0.36%      N/A         1468.36 J   -0.92%      
# linear_run_none        10.06 s     +0.16%      N/A         2710.41 J   -0.57%      
# linear_run_sequential  8.61 s      +0.65%      N/A         2473.12 J   +0.12%      
# linear_run_openmp      8.12 s      +0.65%      N/A         2332.28 J   +0.06%      
# linear_run_cuda        7.40 s      +0.37%      N/A         1432.40 J   -0.03%      
# matmul_torch           6.24 s      +1.09%      N/A         1486.14 J   -0.13%      
# matmul_run_none        10.10 s     +0.69%      N/A         2727.01 J   +0.35%      
# matmul_run_sequential  8.53 s      -0.04%      N/A         2451.84 J   -0.53%      
# matmul_run_openmp      8.05 s      -1.26%      N/A         2316.29 J   -2.63%      
# matmul_run_cuda        7.31 s      +0.72%      N/A         1407.92 J   -0.68%      
# pooling_torch          25.69 s     +1.55%      N/A         5009.32 J   +0.52%      
# pooling_run_none       17.50 s     +0.86%      N/A         3312.98 J   -0.24%      
# pooling_run_sequential 17.56 s     +1.37%      N/A         3320.70 J   +0.18%      
# pooling_run_openmp     9.95 s      +2.27%      N/A         2017.41 J   +1.08%      
# pooling_run_cuda       23.66 s     +0.52%      N/A         4542.27 J   -0.59%      
# relu_torch             19.11 s     +1.61%      N/A         3700.58 J   +0.49%      
# relu_run_none          4.43 s      +1.21%      N/A         856.02 J    +0.14%      
# relu_run_sequential    4.48 s      +2.14%      N/A         866.97 J    +1.18%      
# relu_run_openmp        3.75 s      +1.22%      N/A         771.21 J    -0.27%      
# relu_run_cuda          6.04 s      +0.75%      N/A         1169.75 J   -0.77%      

Moehre2 requested review from Atrisan and lukastruemper and removed request for lukastruemper March 30, 2026 07:34
Moehre2 merged commit b3c11cd into main Mar 30, 2026
28 checks passed
Moehre2 deleted the im2col branch March 30, 2026 08:41