
[Python] Adds np.einsum support#617

Merged
lukastruemper merged 4 commits into main from einsum-numpy
Mar 30, 2026

Conversation

@lukastruemper
Contributor

No description provided.

@daisytuner

daisytuner bot commented Mar 27, 2026

Daisytuner Report - mlir_torch_models (chamomile)

@@                                         Benchmarks                                          @@
=================================================================================================
  Benchmark                               Time        ΔTime       Thr         Energy      ΔEnergy
=================================================================================================
# bn_conv_bn_relu_maxpool_torch           18.50 s     -1.30%      N/A         3579.25 J   -1.55%
# bn_conv_bn_relu_maxpool_run_none        3.26 s      -0.19%      N/A         646.28 J    -0.84%
# bn_conv_bn_relu_maxpool_run_sequential  3.29 s      -0.06%      N/A         657.63 J    -0.38%
# bn_conv_bn_relu_maxpool_run_openmp      3.38 s      -0.16%      N/A         686.80 J    -1.33%
# bn_conv_bn_relu_maxpool_run_cuda        3.82 s      +0.31%      N/A         743.12 J    -0.02%

@daisytuner

daisytuner bot commented Mar 27, 2026

Daisytuner Report - mlir_torch_layers (chamomile)

@@                                   Benchmarks                                   @@
=====================================================================================
  Benchmark              Time        ΔTime       Thr         Energy      ΔEnergy     
=====================================================================================
# batchnorm_torch        18.94 s     +0.21%      N/A         3688.52 J   -0.35%      
# batchnorm_run_none     7.86 s      +3.75%      N/A         1487.19 J   +1.25%      
# batchnorm_run_sequential 8.60 s      +1.39%      N/A         1650.77 J   +0.53%      
# batchnorm_run_openmp   4.91 s      +4.99%      N/A         1107.43 J   +5.08%      
# batchnorm_run_cuda     13.02 s     +2.25%      N/A         2486.36 J   +0.39%      
# conv2d_torch           18.93 s     +2.29%      N/A         3631.69 J   +0.13%      
# conv2d_run_openmp      4.48 s      +4.74%      N/A         1029.51 J   +1.85%      
# conv2d_run_cuda        6.43 s      +1.30%      N/A         1243.96 J   -0.19%      
# linear_torch           6.31 s      +0.07%      N/A         1511.57 J   -0.85%      
# linear_run_none        10.00 s     -0.68%      N/A         2655.21 J   -2.76%      
# linear_run_sequential  8.50 s      +0.41%      N/A         2393.68 J   -1.96%      
# linear_run_openmp      8.19 s      +1.23%      N/A         2333.59 J   -0.81%      
# linear_run_cuda        7.52 s      +1.70%      N/A         1439.79 J   -0.70%      
# matmul_torch           6.26 s      +1.49%      N/A         1489.83 J   -0.12%      
# matmul_run_none        10.07 s     +0.39%      N/A         2701.81 J   -0.69%      
# matmul_run_sequential  8.64 s      +1.55%      N/A         2455.30 J   -0.13%      
# matmul_run_openmp      8.29 s      +1.83%      N/A         2376.84 J   +0.48%      
# matmul_run_cuda        7.36 s      +1.97%      N/A         1416.67 J   +0.48%      
# pooling_torch          26.27 s     -1.54%      N/A         5002.28 J   -4.68%      
# pooling_run_none       17.45 s     +0.83%      N/A         3325.42 J   +0.24%      
# pooling_run_sequential 17.47 s     +1.60%      N/A         3257.37 J   -1.13%      
# pooling_run_openmp     10.16 s     +4.51%      N/A         2059.36 J   +3.23%      
# pooling_run_cuda       23.82 s     +1.74%      N/A         4540.77 J   -0.56%      
# relu_torch             19.26 s     +1.24%      N/A         3681.31 J   -1.06%      
# relu_run_none          4.54 s      +2.14%      N/A         865.65 J    -0.26%      
# relu_run_sequential    4.60 s      +4.78%      N/A         877.84 J    +2.58%      
# relu_run_openmp        4.04 s      +9.27%      N/A         858.70 J    +12.11%     
# relu_run_cuda          6.22 s      +3.63%      N/A         1196.56 J   +1.68%      

@daisytuner

daisytuner bot commented Mar 27, 2026

Daisytuner Report - python_npbench (zinnia)

@@                                   Benchmarks                                   @@
=====================================================================================
  Benchmark              Time        ΔTime       Thr         Energy      ΔEnergy     
=====================================================================================
# adi_numpy              1.33 s      +1.47%      N/A         132.20 J    +1.25%      
# adi_omp                15.02 s     -0.57%      N/A         1459.27 J   -2.43%      
# adi_cuda               4.72 s      +0.16%      N/A         457.62 J    +0.01%      
# adi_seq_tuning         15.98 s     -0.90%      N/A         1507.15 J   -0.95%      
# atax_numpy             2.16 s      -0.00%      N/A         223.93 J    -0.02%      
# atax_omp               2.95 s      -1.75%      N/A         371.03 J    -3.14%      
# atax_cuda              4.11 s      -0.82%      N/A         422.86 J    -0.78%      
# atax_seq_tuning        3.72 s      -0.97%      N/A         377.38 J    -1.32%      
# gemm_numpy             1.20 s      -1.31%      N/A         191.92 J    -1.25%      
# gemm_omp               1.12 s      -0.01%      N/A         162.34 J    +0.32%      
# gemm_cuda              10.66 s     +0.68%      N/A         1013.52 J   +0.69%      
# gemm_seq_tuning        1.11 s      -0.06%      N/A         161.79 J    -0.09%      
# gesummv_numpy          1.75 s      +0.35%      N/A         249.12 J    +0.28%      
# gesummv_omp            1.99 s      +1.14%      N/A         313.22 J    +1.64%      
# gesummv_cuda           8.29 s      -0.50%      N/A         996.46 J    -0.45%      
# gesummv_seq_tuning     6.66 s      -0.02%      N/A         813.35 J    -0.07%      
# gemver_numpy           1.08 s      +0.44%      N/A         166.11 J    +0.44%      
# gemver_omp             873.38 ms   +1.34%      N/A         114.92 J    +3.04%      
# gemver_cuda            3.85 s      -1.23%      N/A         385.92 J    -0.79%      
# gemver_seq_tuning      4.49 s      -0.01%      N/A         434.54 J    -0.03%      
# k2mm_numpy             1.19 s      +0.32%      N/A         195.19 J    +0.13%      
# k2mm_omp               3.58 s      -0.62%      N/A         662.99 J    -1.34%      
# k2mm_cuda              13.53 s     -0.04%      N/A         1282.59 J   -0.10%      
# k2mm_seq_tuning        3.63 s      +0.52%      N/A         466.86 J    -0.17%      
# k3mm_numpy             1.03 s      +0.05%      N/A         181.54 J    -0.21%      
# k3mm_omp               5.54 s      -1.20%      N/A         947.29 J    -0.96%      
# k3mm_cuda              19.85 s     +0.09%      N/A         1871.36 J   +0.08%      
# k3mm_seq_tuning        5.72 s      -0.33%      N/A         790.67 J    -0.34%      
# mvt_numpy              2.44 s      +0.06%      N/A         249.54 J    +0.12%      
# mvt_omp                2.74 s      -0.11%      N/A         284.99 J    -0.03%      
# mvt_cuda               3.35 s      -0.28%      N/A         342.16 J    -0.34%      
# mvt_seq_tuning         2.74 s      -0.15%      N/A         284.15 J    -0.32%      
# symm_numpy             782.17 ms   -0.77%      N/A         80.52 J     -1.01%      
# symm_omp               6.15 s      +1.05%      N/A         608.72 J    +1.95%      
# symm_seq_tuning        8.48 s      -0.15%      N/A         807.86 J    -0.18%      
# syr2k_numpy            887.77 ms   -1.30%      N/A         90.15 J     -1.14%      
# syr2k_omp              9.86 s      -0.66%      N/A         936.99 J    -0.63%      
# syr2k_cuda             1.63 s      -1.07%      N/A         169.01 J    -0.93%      
# syr2k_seq_tuning       9.95 s      +0.48%      N/A         944.90 J    +0.48%      
# syrk_numpy             771.66 ms   +0.19%      N/A         79.50 J     +0.16%      
# syrk_omp               5.96 s      +0.07%      N/A         573.27 J    +0.08%      
# syrk_cuda              1.52 s      -1.00%      N/A         158.33 J    -0.89%      
# syrk_seq_tuning        5.96 s      -0.02%      N/A         572.68 J    -0.11%      
# trmm_numpy             882.29 ms   +0.06%      N/A         89.82 J     +0.04%      
# trmm_omp               699.43 ms   -1.39%      N/A         89.41 J     -1.49%      
# trmm_seq_tuning        3.37 s      -0.32%      N/A         322.50 J    -0.33%      

lukastruemper merged commit ea3a278 into main on Mar 30, 2026
19 of 21 checks passed
lukastruemper deleted the einsum-numpy branch on March 30, 2026, 14:13


3 participants