
allows cmath nodes to be removed by dead data elimination #656

Merged: lukastruemper merged 1 commit into main from cmath-removal on Apr 8, 2026


@lukastruemper (Contributor)

No description provided.


daisytuner bot commented Apr 8, 2026

Daisytuner Report - mlir_torch_models (chamomile)

@@                                            Benchmarks                                            @@
======================================================================================================
  Benchmark                               Time        ΔTime       Thr         Energy      ΔEnergy
======================================================================================================
# bn_conv_bn_relu_maxpool_torch           18.54 s     -0.67%      N/A         3568.77 J   -0.70%
# bn_conv_bn_relu_maxpool_run_none        3.25 s      -0.23%      N/A         653.03 J    -0.23%
# bn_conv_bn_relu_maxpool_run_sequential  3.28 s      -0.62%      N/A         661.81 J    -0.35%
# bn_conv_bn_relu_maxpool_run_openmp      3.24 s      -0.94%      N/A         655.95 J    -1.20%
# bn_conv_bn_relu_maxpool_run_cuda        3.66 s      -0.54%      N/A         715.60 J    -0.45%

lukastruemper merged commit 75a2551 into main on Apr 8, 2026
30 of 33 checks passed
lukastruemper deleted the cmath-removal branch on April 8, 2026 10:39

daisytuner bot commented Apr 8, 2026

Daisytuner Report - mlir_torch_layers (chamomile)

@@                                   Benchmarks                                   @@
=====================================================================================
  Benchmark              Time        ΔTime       Thr         Energy      ΔEnergy     
=====================================================================================
# batchnorm_torch        19.27 s     -0.17%      N/A         3777.56 J   -0.21%      
# batchnorm_run_none     6.19 s      -2.58%      N/A         1205.34 J   -2.48%      
# batchnorm_run_sequential 6.57 s      +0.27%      N/A         1281.31 J   +0.53%
# batchnorm_run_openmp   5.70 s      -0.75%      N/A         1343.12 J   -0.75%      
# batchnorm_run_cuda     8.17 s      +1.10%      N/A         1597.30 J   +0.75%      
# conv2d_torch           18.57 s     -0.21%      N/A         3650.62 J   -0.18%      
# conv2d_run_openmp      5.02 s      +1.72%      N/A         1217.26 J   +1.37%      
# conv2d_run_cuda        7.68 s      -0.60%      N/A         1502.47 J   -0.65%      
# linear_torch           6.07 s      -0.72%      N/A         1462.06 J   -1.58%      
# linear_run_none        11.60 s     -0.08%      N/A         3168.86 J   +0.20%      
# linear_run_sequential  9.97 s      -0.32%      N/A         2773.51 J   -0.22%      
# linear_run_openmp      9.73 s      -0.82%      N/A         2857.05 J   -0.77%      
# linear_run_cuda        9.18 s      -0.52%      N/A         1801.42 J   +0.04%      
# matmul_torch           6.12 s      -1.24%      N/A         1491.80 J   -0.56%      
# matmul_run_none        11.63 s     -0.02%      N/A         3172.11 J   +0.10%      
# matmul_run_sequential  9.88 s      -0.59%      N/A         2762.86 J   -0.30%      
# matmul_run_openmp      9.79 s      -0.47%      N/A         2889.17 J   -0.03%      
# matmul_run_cuda        9.03 s      -0.19%      N/A         1769.75 J   +0.11%      
# pooling_torch          25.66 s     +0.06%      N/A         5121.09 J   +0.00%      
# pooling_run_none       25.06 s     +0.00%      N/A         4810.49 J   -0.17%      
# pooling_run_sequential 25.69 s     +0.20%      N/A         4933.67 J   +0.47%      
# pooling_run_openmp     17.06 s     -0.12%      N/A         3634.88 J   +0.04%      
# pooling_run_cuda       31.23 s     -0.13%      N/A         6086.43 J   -0.14%      
# relu_torch             18.90 s     -0.26%      N/A         3713.57 J   -0.22%      
# relu_run_none          5.21 s      -1.22%      N/A         1019.75 J   -1.20%      
+ relu_run_sequential    5.30 s      -17.10%     N/A         1034.55 J   -16.75%     
# relu_run_openmp        5.44 s      -4.89%      N/A         1266.30 J   -5.76%      
+ relu_run_cuda          5.93 s      -28.65%     N/A         1169.02 J   -28.28%     


daisytuner bot commented Apr 8, 2026

Daisytuner Report - python_npbench (zinnia)

@@                                   Benchmarks                                   @@
=====================================================================================
  Benchmark              Time        ΔTime       Thr         Energy      ΔEnergy     
=====================================================================================
# adi_numpy              1.35 s      +3.24%      N/A         134.23 J    +2.84%      
# adi_omp                14.92 s     +1.45%      N/A         1497.27 J   +3.27%      
# adi_cuda               4.70 s      +0.65%      N/A         455.59 J    +0.63%      
# adi_seq_tuning         15.16 s     -0.32%      N/A         1403.78 J   +0.27%      
# atax_numpy             2.17 s      +0.15%      N/A         224.37 J    +0.21%      
# atax_omp               2.98 s      +0.63%      N/A         377.04 J    +1.61%      
# atax_cuda              4.14 s      +0.22%      N/A         425.65 J    +0.20%      
# atax_seq_tuning        4.14 s      -0.44%      N/A         398.81 J    -0.65%      
# gemm_numpy             1.21 s      -0.46%      N/A         193.38 J    -0.64%      
# gemm_omp               1.11 s      +0.13%      N/A         161.65 J    -0.22%      
# gemm_cuda              10.62 s     +0.34%      N/A         1009.86 J   +0.42%      
# gemm_seq_tuning        1.11 s      -0.21%      N/A         161.42 J    -0.25%      
# gesummv_numpy          1.75 s      -0.07%      N/A         248.75 J    -0.20%      
# gesummv_omp            2.01 s      +0.64%      N/A         315.27 J    +0.08%      
# gesummv_cuda           8.39 s      +1.07%      N/A         1006.06 J   +0.83%      
# gesummv_seq_tuning     8.70 s      +1.43%      N/A         977.17 J    +0.81%      
# gemver_numpy           1.09 s      +1.00%      N/A         166.57 J    +0.96%      
# gemver_omp             833.43 ms   -4.13%      N/A         105.53 J    -7.08%      
# gemver_cuda            3.87 s      -0.16%      N/A         387.05 J    -0.16%      
# gemver_seq_tuning      5.54 s      +0.97%      N/A         497.49 J    +0.82%      
# k2mm_numpy             1.20 s      +0.14%      N/A         195.82 J    +0.06%      
# k2mm_omp               3.54 s      -1.49%      N/A         663.48 J    -0.85%      
# k2mm_cuda              13.58 s     +0.05%      N/A         1287.68 J   +0.00%      
# k2mm_seq_tuning        3.06 s      +3.90%      N/A         400.66 J    +2.47%      
# k3mm_numpy             1.03 s      +0.63%      N/A         182.03 J    +0.57%      
# k3mm_omp               5.59 s      +0.72%      N/A         955.12 J    +0.01%      
# k3mm_cuda              19.81 s     -0.21%      N/A         1867.42 J   -0.41%      
# k3mm_seq_tuning        5.26 s      +0.53%      N/A         751.60 J    +0.71%      
# mvt_numpy              2.42 s      -0.60%      N/A         246.60 J    -0.96%      
# mvt_omp                2.74 s      -0.04%      N/A         284.21 J    -0.14%      
# mvt_cuda               3.36 s      +0.08%      N/A         342.77 J    -0.04%      
# mvt_seq_tuning         2.74 s      -0.53%      N/A         283.75 J    -0.68%      
# symm_numpy             787.99 ms   +0.89%      N/A         81.18 J     +0.95%      
# symm_omp               6.09 s      -0.37%      N/A         593.87 J    -2.03%      
# symm_seq_tuning        8.26 s      -0.69%      N/A         746.29 J    -0.79%      
# syr2k_numpy            900.31 ms   +0.39%      N/A         91.46 J     +0.38%      
# syr2k_omp              9.83 s      -0.63%      N/A         933.05 J    -0.70%      
# syr2k_cuda             1.65 s      +0.20%      N/A         171.18 J    +0.01%      
# syr2k_seq_tuning       9.86 s      -0.22%      N/A         935.65 J    -0.36%      
# syrk_numpy             784.02 ms   +1.04%      N/A         80.53 J     +1.04%      
# syrk_omp               5.96 s      +0.12%      N/A         572.71 J    +0.08%      
# syrk_cuda              1.51 s      -0.19%      N/A         158.13 J    -0.42%      
# syrk_seq_tuning        5.92 s      -1.47%      N/A         569.70 J    -1.37%      
# trmm_numpy             882.85 ms   +0.43%      N/A         89.84 J     +0.51%      
# trmm_omp               705.02 ms   -0.74%      N/A         89.83 J     -0.51%      
# trmm_seq_tuning        3.45 s      +0.05%      N/A         277.93 J    -0.70%      
