Collaborator

@avik-pal avik-pal commented Nov 7, 2025

Needs a new JLL.

@avik-pal avik-pal marked this pull request as draft November 7, 2025 20:15
@avik-pal avik-pal marked this pull request as ready for review November 9, 2025 18:44
@avik-pal avik-pal changed the title from "feat: missing upstream passes" to "feat: missing upstream passes + rocm jll" Nov 9, 2025
Collaborator Author

avik-pal commented Nov 9, 2025

Failed to precompile ReactantNNlibExt [1cc43b4b-f40c-58cf-bf58-d175afebd789] to "/root/.cache/julia-buildkite-plugin/depots/0190ad31-dfb1-4a4c-ac5b-02d05277ba6c/compiled/v1.10/ReactantNNlibExt/jl_Q7hUgT".
ERROR: LoadError: InitError: could not load library "/root/.cache/julia-buildkite-plugin/depots/0190ad31-dfb1-4a4c-ac5b-02d05277ba6c/artifacts/072462b34f0d79dff26380bbc8d566e4f6048033/lib/libReactantExtra.so"
librocm_sysdeps_dw.so.1: cannot open shared object file: No such file or directory
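The loader error above says `libReactantExtra.so` could not be loaded because `librocm_sysdeps_dw.so.1`, which it appears to depend on, was not found. Below is a minimal Python sketch for reproducing such a load outside Julia. It assumes a host where `ctypes` can dlopen shared objects. `try_load` is a hypothetical helper, and the path is a placeholder rather than the artifact path from the log:

```python
import ctypes

def try_load(path):
    """Attempt to dlopen `path` via ctypes; return (ok, error_message).

    If the library itself or one of its dependencies cannot be resolved,
    ctypes raises OSError carrying the loader's message (on glibc Linux,
    "<name>: cannot open shared object file: No such file or directory").
    """
    try:
        ctypes.CDLL(path)
    except OSError as err:
        return False, str(err)
    return True, ""

# Placeholder name that should not exist on any system.
ok, msg = try_load("libdoes_not_exist_example.so.1")
print(ok, msg)
```

Pointing `try_load` at the artifact's `lib/libReactantExtra.so` on the affected machine would likely surface the same missing-dependency message. The dynamic loader reports unresolved transitive dependencies when it loads the top-level library.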

Contributor

@github-actions github-actions bot left a comment


Reactant.jl Benchmarks

Benchmark suite Current: 5c0e420 Previous: 86bea8f Ratio
DeepONet ([64, 1024], [1, 128])/forward/CPU/Default 0.0025954100000000003 s 0.0025712 s 1.01
DeepONet ([64, 1024], [1, 128])/forward/CPU/DisableScatterGatherPad 0.002437588 s 0.002324076 s 1.05
DeepONet ([64, 1024], [1, 128])/backward/CPU/DisablePadAfterEnzyme 0.0061261760000000005 s 0.005635531 s 1.09
DeepONet ([64, 1024], [1, 128])/backward/CPU/DefaultAfterEnzyme 0.0060800870000000005 s 0.005918332 s 1.03
DeepONet ([64, 1024], [1, 128])/backward/CPU/DisableScatterGatherPadBeforeEnzyme 0.006573224000000001 s 0.005868992 s 1.12
DeepONet ([64, 1024], [1, 128])/backward/CPU/DisableScatterGatherPadAll 0.0062186920000000005 s 0.005969917000000001 s 1.04
DeepONet ([64, 1024], [1, 128])/backward/CPU/DisablePadBeforeEnzyme 0.006500447 s 0.005719778 s 1.14
DeepONet ([64, 1024], [1, 128])/backward/CPU/DisablePadAll 0.006505485 s 0.005898044000000001 s 1.10
DeepONet ([64, 1024], [1, 128])/forward/CPU/DisableScatterGather 0.002540848 s 0.002515556 s 1.01
DeepONet ([64, 1024], [1, 128])/backward/CPU/DefaultAll 0.006662922000000001 s 0.005925306 s 1.12
DeepONet ([64, 1024], [1, 128])/backward/CPU/DisableTransposeReshapeAfterEnzyme 0.006241464 s 0.005697300000000001 s 1.10
DeepONet ([64, 1024], [1, 128])/forward/CPU/XLA 0.002386434 s 0.002469012 s 0.97
DeepONet ([64, 1024], [1, 128])/backward/CPU/XLA 0.006137005 s 0.005527595000000001 s 1.11
DeepONet ([64, 1024], [1, 128])/backward/CPU/DisableScatterGatherAfterEnzyme 0.006352184 s 0.0058676050000000006 s 1.08
DeepONet ([64, 1024], [1, 128])/backward/CPU/DefaultBeforeEnzyme 0.006712335000000001 s 0.006050863 s 1.11
DeepONet ([64, 1024], [1, 128])/backward/CPU/DisableTransposeReshapeBeforeEnzyme 0.006531435 s 0.0056281510000000005 s 1.16
DeepONet ([64, 1024], [1, 128])/backward/CPU/DisableScatterGatherPadAfterEnzyme 0.006324708 s 0.005697946000000001 s 1.11
DeepONet ([64, 1024], [1, 128])/forward/CPU/DisablePad 0.0023203720000000002 s 0.002477824 s 0.94
DeepONet ([64, 1024], [1, 128])/backward/CPU/DisableTransposeReshapeAll 0.0065091310000000005 s 0.0059030440000000005 s 1.10
DeepONet ([64, 1024], [1, 128])/backward/CPU/DisableScatterGatherAll 0.006718002000000001 s 0.005717776 s 1.17
DeepONet ([64, 1024], [1, 128])/forward/CPU/DisableTransposeReshape 0.002258277 s 0.002280626 s 0.99
DeepONet ([64, 1024], [1, 128])/backward/CPU/DisableScatterGatherBeforeEnzyme 0.006824253000000001 s 0.005962824 s 1.14
VGG11 bn=true [224, 224, 3, 4]/forward/CUDA/DisablePad 0.002061516 s 0.0020651890000000003 s 1.00
DeepONet ([64, 1024], [1, 128])/backward/CUDA/DisableScatterGatherAll 0.000681872 s 0.0006401400000000001 s 1.07
DeepONet ([64, 1024], [1, 128])/backward/CUDA/XLA 0.0008448790000000001 s 0.00077032 s 1.10
VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/DisableTransposeReshapeAll 0.0071811900000000005 s 0.007178500000000001 s 1.00
DeepONet ([64, 1024], [1, 128])/backward/CUDA/DefaultBeforeEnzyme 0.000723888 s 0.0006608050000000001 s 1.10
FNO [64, 64, 1, 4]/backward/CUDA/DefaultAll 0.002950161 s 0.002965329 s 0.99
VGG11 bn=true [224, 224, 3, 4]/forward/CUDA/DisableScatterGatherPad 0.0020558910000000002 s 0.002049345 s 1.00
VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/DisableTransposeReshapeBeforeEnzyme 0.007321437 s 0.007293340000000001 s 1.00
FNO [64, 64, 1, 4]/backward/CUDA/DefaultBeforeEnzyme 0.0030004180000000004 s 0.0029945640000000003 s 1.00
DeepONet ([64, 1024], [1, 128])/forward/CUDA/XLA 0.00031256400000000004 s 0.000336395 s 0.93
FNO [64, 64, 1, 4]/backward/CUDA/DisablePadBeforeEnzyme 0.0029962170000000002 s 0.0029657420000000004 s 1.01
FNO [64, 64, 1, 4]/backward/CUDA/DisableScatterGatherPadAll 0.0029803940000000004 s 0.002952564 s 1.01
FNO [64, 64, 1, 4]/forward/CUDA/DisablePad 0.0010878700000000001 s 0.001090559 s 1.00
VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/DisablePadBeforeEnzyme 0.0072776360000000005 s 0.007267997000000001 s 1.00
VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/DisableScatterGatherPadBeforeEnzyme 0.007264527000000001 s 0.007312644 s 0.99
VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/DisableScatterGatherAfterEnzyme 0.0072111020000000005 s 0.007179213 s 1.00
FNO [64, 64, 1, 4]/forward/CUDA/DisableScatterGatherPad 0.0011089840000000001 s 0.0011024140000000001 s 1.01
DeepONet ([64, 1024], [1, 128])/backward/CUDA/DisablePadAll 0.000673258 s 0.000638042 s 1.06
DeepONet ([64, 1024], [1, 128])/forward/CUDA/DisableTransposeReshape 0.00031367300000000004 s 0.000327728 s 0.96
ViT tiny [256, 256, 3, 4]/forward/CUDA/DisableScatterGatherPad 0.003182831 s 0.003228463 s 0.99
ViT tiny [256, 256, 3, 4]/forward/CUDA/DisableScatterGather 0.00318327 s 0.003220179 s 0.99
DeepONet ([64, 1024], [1, 128])/backward/CUDA/DefaultAfterEnzyme 0.0006913970000000001 s 0.000640295 s 1.08
ViT tiny [256, 256, 3, 4]/backward/CUDA/XLA 0.012773682000000001 s 0.012791198 s 1.00
FNO [64, 64, 1, 4]/backward/CUDA/DisableScatterGatherAfterEnzyme 0.0029783210000000004 s 0.00293947 s 1.01
VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/DefaultAfterEnzyme 0.007224273000000001 s 0.00718253 s 1.01
DeepONet ([64, 1024], [1, 128])/forward/CUDA/DisableScatterGather 0.000333484 s 0.00033621400000000004 s 0.99
FNO [64, 64, 1, 4]/forward/CUDA/DisableTransposeReshape 0.0011543970000000001 s 0.001141049 s 1.01
FNO [64, 64, 1, 4]/backward/CUDA/DisablePadAll 0.002943588 s 0.0029775910000000004 s 0.99
ViT tiny [256, 256, 3, 4]/forward/CUDA/XLA 0.0033760450000000003 s 0.0033792590000000003 s 1.00
VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/DisableScatterGatherBeforeEnzyme 0.007264314000000001 s 0.007278332 s 1.00
FNO [64, 64, 1, 4]/backward/CUDA/DisableTransposeReshapeAll 0.003091926 s 0.003099075 s 1.00
ViT tiny [256, 256, 3, 4]/forward/CUDA/DisableTransposeReshape 0.0031534320000000003 s 0.0031934570000000002 s 0.99
VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/DisablePadAll 0.007177297 s 0.007183577 s 1.00
VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/DisableScatterGatherPadAfterEnzyme 0.007203689 s 0.007147457 s 1.01
DeepONet ([64, 1024], [1, 128])/forward/CUDA/DisablePad 0.00033595 s 0.00032739 s 1.03
FNO [64, 64, 1, 4]/backward/CUDA/DisableTransposeReshapeBeforeEnzyme 0.003147958 s 0.003182325 s 0.99
DeepONet ([64, 1024], [1, 128])/backward/CUDA/DisableScatterGatherAfterEnzyme 0.000693495 s 0.000633988 s 1.09
DeepONet ([64, 1024], [1, 128])/backward/CUDA/DisableTransposeReshapeAfterEnzyme 0.000693858 s 0.000636892 s 1.09
VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/DisableScatterGatherPadAll 0.007206718000000001 s 0.007160163000000001 s 1.01
DeepONet ([64, 1024], [1, 128])/backward/CUDA/DisableTransposeReshapeBeforeEnzyme 0.000703526 s 0.000657869 s 1.07
VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/DefaultAll 0.007191877 s 0.007158685000000001 s 1.00
FNO [64, 64, 1, 4]/backward/CUDA/XLA 0.003143158 s 0.0031169690000000002 s 1.01
VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/XLA 0.007344024 s 0.007315152 s 1.00
VGG11 bn=true [224, 224, 3, 4]/forward/CUDA/DisableTransposeReshape 0.0020697560000000003 s 0.0020601950000000003 s 1.00
FNO [64, 64, 1, 4]/backward/CUDA/DisableTransposeReshapeAfterEnzyme 0.0031044510000000003 s 0.0030792920000000004 s 1.01
DeepONet ([64, 1024], [1, 128])/backward/CUDA/DefaultAll 0.0006853240000000001 s 0.000633231 s 1.08
FNO [64, 64, 1, 4]/backward/CUDA/DisablePadAfterEnzyme 0.00296546 s 0.002967267 s 1.00
VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/DisableScatterGatherAll 0.007198156000000001 s 0.007147589 s 1.01
DeepONet ([64, 1024], [1, 128])/backward/CUDA/DisableScatterGatherPadAfterEnzyme 0.0007014590000000001 s 0.000627795 s 1.12
FNO [64, 64, 1, 4]/forward/CUDA/XLA 0.0011913240000000001 s 0.0011958930000000002 s 1.00
ViT tiny [256, 256, 3, 4]/forward/CUDA/DisablePad 0.0024830290000000003 s 0.0025319170000000003 s 0.98
VGG11 bn=true [224, 224, 3, 4]/forward/CUDA/XLA 0.0021179050000000002 s 0.0020984610000000002 s 1.01
VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/DisablePadAfterEnzyme 0.007167492 s 0.007189999000000001 s 1.00
FNO [64, 64, 1, 4]/backward/CUDA/DisableScatterGatherBeforeEnzyme 0.0029995390000000003 s 0.0029903250000000003 s 1.00
ViT tiny [256, 256, 3, 4]/forward/CUDA/Default 0.0025233010000000004 s 0.002571082 s 0.98
DeepONet ([64, 1024], [1, 128])/backward/CUDA/DisableScatterGatherBeforeEnzyme 0.000728298 s 0.0006573600000000001 s 1.11
FNO [64, 64, 1, 4]/forward/CUDA/DisableScatterGather 0.001091299 s 0.001087157 s 1.00
VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/DisableTransposeReshapeAfterEnzyme 0.0072032070000000005 s 0.007163539 s 1.01
DeepONet ([64, 1024], [1, 128])/backward/CUDA/DisablePadAfterEnzyme 0.0006943720000000001 s 0.00063382 s 1.10
DeepONet ([64, 1024], [1, 128])/backward/CUDA/DisableScatterGatherPadAll 0.0006786380000000001 s 0.000629372 s 1.08
FNO [64, 64, 1, 4]/backward/CUDA/DisableScatterGatherAll 0.002980724 s 0.002952361 s 1.01
FNO [64, 64, 1, 4]/backward/CUDA/DefaultAfterEnzyme 0.00296261 s 0.002947505 s 1.01
DeepONet ([64, 1024], [1, 128])/backward/CUDA/DisableTransposeReshapeAll 0.000671019 s 0.0006293890000000001 s 1.07
VGG11 bn=true [224, 224, 3, 4]/forward/CUDA/Default 0.002057349 s 0.0020647 s 1.00
FNO [64, 64, 1, 4]/forward/CUDA/Default 0.001092981 s 0.001090569 s 1.00
DeepONet ([64, 1024], [1, 128])/forward/CUDA/DisableScatterGatherPad 0.00033048400000000003 s 0.00033845200000000005 s 0.98
VGG11 bn=true [224, 224, 3, 4]/forward/CUDA/DisableScatterGather 0.002059403 s 0.002057487 s 1.00
ViT tiny [256, 256, 3, 4]/backward/CUDA/DefaultAll 0.010719085000000001 s 0.010060634 s 1.07
DeepONet ([64, 1024], [1, 128])/backward/CUDA/DisablePadBeforeEnzyme 0.0007196020000000001 s 0.000660656 s 1.09
DeepONet ([64, 1024], [1, 128])/forward/CUDA/Default 0.00036012200000000005 s 0.000325973 s 1.10
FNO [64, 64, 1, 4]/backward/CUDA/DisableScatterGatherPadAfterEnzyme 0.002972118 s 0.002937775 s 1.01
FNO [64, 64, 1, 4]/backward/CUDA/DisableScatterGatherPadBeforeEnzyme 0.0030055090000000004 s 0.00300421 s 1.00
VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/DefaultBeforeEnzyme 0.007263103000000001 s 0.00727585 s 1.00
DeepONet ([64, 1024], [1, 128])/backward/CUDA/DisableScatterGatherPadBeforeEnzyme 0.0007218530000000001 s 0.0006709270000000001 s 1.08
VGG11 bn=true [224, 224, 3, 4]/backward/TPU/DisableScatterGatherPadAfterEnzyme 0.00473415 s 0.004669587 s 1.01
VGG11 bn=true [224, 224, 3, 4]/forward/TPU/DisablePad 0.0013366600000000001 s 0.0013345890000000002 s 1.00
FNO [64, 64, 1, 4]/backward/TPU/DisableScatterGatherPadAll 0.00309228 s 0.003075178 s 1.01
ViT tiny [256, 256, 3, 4]/backward/TPU/XLA 0.0029770900000000004 s 0.002936268 s 1.01
DeepONet ([64, 1024], [1, 128])/backward/TPU/DisablePadBeforeEnzyme 0.00042125 s 0.00037486 s 1.12
DeepONet ([64, 1024], [1, 128])/backward/TPU/DisableTransposeReshapeAfterEnzyme 0.00044268000000000004 s 0.00037656 s 1.18
FNO [64, 64, 1, 4]/backward/TPU/DisableScatterGatherAfterEnzyme 0.00298488 s 0.0029474090000000002 s 1.01
FNO [64, 64, 1, 4]/backward/TPU/DefaultAll 0.00312167 s 0.0030703880000000003 s 1.02
VGG11 bn=true [224, 224, 3, 4]/backward/TPU/DisableScatterGatherPadAll 0.00470935 s 0.004676248 s 1.01
DeepONet ([64, 1024], [1, 128])/backward/TPU/DisableScatterGatherPadBeforeEnzyme 0.00040558 s 0.0003824 s 1.06
DeepONet ([64, 1024], [1, 128])/forward/TPU/Default 0.00022179 s 0.00021546 s 1.03
VGG11 bn=true [224, 224, 3, 4]/backward/TPU/DisablePadBeforeEnzyme 0.004680011 s 0.004667517 s 1.00
VGG11 bn=true [224, 224, 3, 4]/forward/TPU/DisableScatterGatherPad 0.00133468 s 0.001338529 s 1.00
FNO [64, 64, 1, 4]/backward/TPU/DisableTransposeReshapeAfterEnzyme 0.0030680900000000003 s 0.0030494880000000004 s 1.01
DeepONet ([64, 1024], [1, 128])/backward/TPU/DisableScatterGatherBeforeEnzyme 0.00042187 s 0.000373901 s 1.13
DeepONet ([64, 1024], [1, 128])/backward/TPU/DisableScatterGatherAfterEnzyme 0.00042842000000000005 s 0.00037397 s 1.15
ViT tiny [256, 256, 3, 4]/forward/TPU/DisableScatterGatherPad 0.00064167 s 0.00061719 s 1.04
FNO [64, 64, 1, 4]/backward/TPU/DisableTransposeReshapeAll 0.0030922600000000003 s 0.003057929 s 1.01
FNO [64, 64, 1, 4]/backward/TPU/DisableTransposeReshapeBeforeEnzyme 0.0030817600000000002 s 0.0030527590000000004 s 1.01
VGG11 bn=true [224, 224, 3, 4]/backward/TPU/DefaultAfterEnzyme 0.00469695 s 0.004695808 s 1.00
FNO [64, 64, 1, 4]/forward/TPU/DisableScatterGather 0.0011458500000000001 s 0.00112548 s 1.02
FNO [64, 64, 1, 4]/backward/TPU/DisablePadBeforeEnzyme 0.00310324 s 0.0030808190000000003 s 1.01
ViT tiny [256, 256, 3, 4]/forward/TPU/XLA 0.0010661400000000001 s 0.0010192 s 1.05
ViT tiny [256, 256, 3, 4]/backward/TPU/DefaultAll 0.00270512 s 0.002634731 s 1.03
VGG11 bn=true [224, 224, 3, 4]/forward/TPU/Default 0.0013384 s 0.001330129 s 1.01
DeepONet ([64, 1024], [1, 128])/forward/TPU/DisableScatterGatherPad 0.00022026 s 0.00020765 s 1.06
DeepONet ([64, 1024], [1, 128])/backward/TPU/DefaultAll 0.00043421000000000003 s 0.00038853000000000005 s 1.12
VGG11 bn=true [224, 224, 3, 4]/backward/TPU/DefaultAll 0.00471267 s 0.004703418 s 1.00
DeepONet ([64, 1024], [1, 128])/forward/TPU/DisableScatterGather 0.00022008000000000002 s 0.00020499 s 1.07
VGG11 bn=true [224, 224, 3, 4]/forward/TPU/DisableTransposeReshape 0.0013272800000000001 s 0.00132477 s 1.00
FNO [64, 64, 1, 4]/backward/TPU/DisableScatterGatherPadBeforeEnzyme 0.00309784 s 0.003067039 s 1.01
ViT tiny [256, 256, 3, 4]/forward/TPU/DisablePad 0.00063864 s 0.00062201 s 1.03
DeepONet ([64, 1024], [1, 128])/backward/TPU/DefaultBeforeEnzyme 0.00043794000000000005 s 0.00038045 s 1.15
DeepONet ([64, 1024], [1, 128])/forward/TPU/DisablePad 0.00022827 s 0.00020732000000000001 s 1.10
DeepONet ([64, 1024], [1, 128])/backward/TPU/XLA 0.00044505 s 0.00040402000000000005 s 1.10
VGG11 bn=true [224, 224, 3, 4]/backward/TPU/DisableTransposeReshapeAll 0.004688240000000001 s 0.004686738 s 1.00
FNO [64, 64, 1, 4]/backward/TPU/DisableScatterGatherAll 0.0030929 s 0.0030645890000000004 s 1.01
DeepONet ([64, 1024], [1, 128])/backward/TPU/DisableScatterGatherPadAll 0.00042651 s 0.00037855 s 1.13
VGG11 bn=true [224, 224, 3, 4]/backward/TPU/DisableTransposeReshapeBeforeEnzyme 0.00468671 s 0.004717398 s 0.99
VGG11 bn=true [224, 224, 3, 4]/backward/TPU/DisableTransposeReshapeAfterEnzyme 0.004701980000000001 s 0.004687918 s 1.00
VGG11 bn=true [224, 224, 3, 4]/backward/TPU/DisableScatterGatherAll 0.00468292 s 0.004671738 s 1.00
FNO [64, 64, 1, 4]/backward/TPU/DisablePadAfterEnzyme 0.002972789 s 0.002954079 s 1.01
VGG11 bn=true [224, 224, 3, 4]/backward/TPU/DisableScatterGatherBeforeEnzyme 0.004689540000000001 s 0.004683747 s 1.00
VGG11 bn=true [224, 224, 3, 4]/backward/TPU/DisableScatterGatherAfterEnzyme 0.00468294 s 0.004671217 s 1.00
VGG11 bn=true [224, 224, 3, 4]/forward/TPU/XLA 0.0012603100000000002 s 0.001246609 s 1.01
ViT tiny [256, 256, 3, 4]/forward/TPU/DisableTransposeReshape 0.00065216 s 0.0006304500000000001 s 1.03
ViT tiny [256, 256, 3, 4]/forward/TPU/Default 0.0006506600000000001 s 0.000614759 s 1.06
FNO [64, 64, 1, 4]/backward/TPU/DefaultBeforeEnzyme 0.0031083400000000002 s 0.0030579690000000002 s 1.02
FNO [64, 64, 1, 4]/forward/TPU/Default 0.0011460600000000002 s 0.00112333 s 1.02
VGG11 bn=true [224, 224, 3, 4]/backward/TPU/XLA 0.004660790000000001 s 0.004592738 s 1.01
FNO [64, 64, 1, 4]/forward/TPU/XLA 0.00143676 s 0.00139945 s 1.03
DeepONet ([64, 1024], [1, 128])/forward/TPU/DisableTransposeReshape 0.00022103000000000002 s 0.00021055000000000002 s 1.05
FNO [64, 64, 1, 4]/backward/TPU/DisablePadAll 0.003114471 s 0.003073998 s 1.01
DeepONet ([64, 1024], [1, 128])/forward/TPU/XLA 0.00035083 s 0.00029197 s 1.20
FNO [64, 64, 1, 4]/forward/TPU/DisableScatterGatherPad 0.00114089 s 0.0011228800000000001 s 1.02
VGG11 bn=true [224, 224, 3, 4]/backward/TPU/DisablePadAfterEnzyme 0.00471536 s 0.004685428 s 1.01
DeepONet ([64, 1024], [1, 128])/backward/TPU/DefaultAfterEnzyme 0.00042574 s 0.00037276100000000004 s 1.14
VGG11 bn=true [224, 224, 3, 4]/backward/TPU/DisablePadAll 0.00469793 s 0.004692018 s 1.00
VGG11 bn=true [224, 224, 3, 4]/backward/TPU/DefaultBeforeEnzyme 0.00471606 s 0.004682218 s 1.01
VGG11 bn=true [224, 224, 3, 4]/forward/TPU/DisableScatterGather 0.0013440400000000001 s 0.00132944 s 1.01
ViT tiny [256, 256, 3, 4]/forward/TPU/DisableScatterGather 0.0006433200000000001 s 0.0006145500000000001 s 1.05
FNO [64, 64, 1, 4]/backward/TPU/DisableScatterGatherBeforeEnzyme 0.00310552 s 0.0030752590000000003 s 1.01
DeepONet ([64, 1024], [1, 128])/backward/TPU/DisableTransposeReshapeBeforeEnzyme 0.00041875 s 0.00037988000000000003 s 1.10
DeepONet ([64, 1024], [1, 128])/backward/TPU/DisableScatterGatherPadAfterEnzyme 0.00040824000000000004 s 0.00037752 s 1.08
DeepONet ([64, 1024], [1, 128])/backward/TPU/DisablePadAll 0.00042529000000000004 s 0.00038125 s 1.12
DeepONet ([64, 1024], [1, 128])/backward/TPU/DisableScatterGatherAll 0.00042008000000000003 s 0.00037955000000000004 s 1.11
FNO [64, 64, 1, 4]/forward/TPU/DisableTransposeReshape 0.00117537 s 0.00115818 s 1.01
FNO [64, 64, 1, 4]/backward/TPU/DefaultAfterEnzyme 0.00298409 s 0.002962269 s 1.01
FNO [64, 64, 1, 4]/backward/TPU/DisableScatterGatherPadAfterEnzyme 0.0029757810000000003 s 0.0029555590000000004 s 1.01
VGG11 bn=true [224, 224, 3, 4]/backward/TPU/DisableScatterGatherPadBeforeEnzyme 0.0047024400000000004 s 0.004668627000000001 s 1.01
FNO [64, 64, 1, 4]/forward/TPU/DisablePad 0.00113947 s 0.00112242 s 1.02
FNO [64, 64, 1, 4]/backward/TPU/XLA 0.00331291 s 0.0032443090000000003 s 1.02
DeepONet ([64, 1024], [1, 128])/backward/TPU/DisableTransposeReshapeAll 0.00041836 s 0.00036277 s 1.15
DeepONet ([64, 1024], [1, 128])/backward/TPU/DisablePadAfterEnzyme 0.00042050000000000003 s 0.0003813 s 1.10

This comment was automatically generated by workflow using github-action-benchmark.
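The Ratio column above matches Current / Previous rounded to two decimals. For example, 0.00259541 / 0.0025712 ≈ 1.01 for the first row, and 0.00035083 / 0.00029197 ≈ 1.20 for DeepONet forward/TPU/XLA. Below is a minimal sketch that recomputes that ratio from rows in the layout above and lists the ones over a cutoff. It assumes that definition and the whitespace-separated row format. The 1.10 threshold and the function names are hypothetical, not part of github-action-benchmark:

```python
import re

# "<suite name> <current> s <previous> s <ratio>", as the rows above are rendered.
# This layout is inferred from the comment, not a documented format.
ROW = re.compile(
    r"^(?P<name>.+?) (?P<cur>[0-9.e+-]+) s (?P<prev>[0-9.e+-]+) s (?P<ratio>[0-9.]+)$"
)

def parse_row(line):
    """Return (name, current_s, previous_s, reported_ratio), or None if no match."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    return m["name"], float(m["cur"]), float(m["prev"]), float(m["ratio"])

def regressions(lines, threshold=1.10):
    """List (name, Current/Previous rounded to 2 places) for rows above threshold."""
    out = []
    for line in lines:
        row = parse_row(line)
        if row is None:
            continue
        name, cur, prev, _reported = row
        ratio = cur / prev
        if ratio > threshold:
            out.append((name, round(ratio, 2)))
    return out

sample = [
    "DeepONet ([64, 1024], [1, 128])/forward/CPU/Default 0.0025954100000000003 s 0.0025712 s 1.01",
    "DeepONet ([64, 1024], [1, 128])/forward/TPU/XLA 0.00035083 s 0.00029197 s 1.20",
    "VGG11 bn=true [224, 224, 3, 4]/forward/TPU/Default 0.0013384 s 0.001330129 s 1.01",
]

print(regressions(sample))
# [('DeepONet ([64, 1024], [1, 128])/forward/TPU/XLA', 1.2)]
```

Feeding the full table through `regressions` would, under these assumptions, pick out mostly DeepONet backward rows on CPU and TPU, where the reported ratios sit between 1.10 and 1.20.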

Collaborator Author

avik-pal commented Nov 9, 2025

arith.constant is not being raised to StableHLO, specifically on TPU.

https://github.com/EnzymeAD/Reactant.jl/actions/runs/19214211473/job/54921046081?pr=1835#step:20:843

Collaborator Author

avik-pal commented Nov 9, 2025

#loc = loc(unknown)
module @reactant_gradien... attributes {mhlo.num_partitions = 1 : i64, mhlo.num_replicas = 1 : i64} {
  func.func @main(%arg0: tensor<12x16x4xf32> loc(unknown), %arg1: tensor<4x4xf32> loc(unknown), %arg2: tensor<4x4xf32> loc(unknown), %arg3: tensor<4xf32> loc(unknown), %arg4: tensor<4xf32> loc(unknown), %arg5: tensor<2xui64> loc(unknown)) -> (tensor<f32>, tensor<4x4xf32>, tensor<4x4xf32>, tensor<4xf32>, tensor<4xf32>) {
    %cst = stablehlo.constant dense<2.000000e+00> : tensor<4x12xf32> loc(#loc)
    %c = stablehlo.constant dense<14> : tensor<i64> loc(#loc)
    %c0 = arith.constant 0 : tensor<15xi64> loc(#loc)
    %cst_0 = stablehlo.constant dense<0.000000e+00> : tensor<15x4x12xf32> loc(#loc)
    %cst_1 = stablehlo.constant dense<1.000000e+00> : tensor<4x12xf32> loc(#loc)
    %c_2 = stablehlo.constant dense<0> : tensor<i64> loc(#loc)
    %c_3 = stablehlo.constant dense<1> : tensor<i64> loc(#loc)
    %c_4 = stablehlo.constant dense<15> : tensor<i64> loc(#loc)
    %cst_5 = stablehlo.constant dense<0.000000e+00> : tensor<f32> loc(#loc)
    %cst_6 = stablehlo.constant dense<0.000000e+00> : tensor<4x4xf32> loc(#loc)
    %cst_7 = stablehlo.constant dense<0.000000e+00> : tensor<4x12xf32> loc(#loc)
    %0 = stablehlo.transpose %arg0, dims = [2, 1, 0] : (tensor<12x16x4xf32>) -> tensor<4x16x12xf32> loc(#loc)
    %1 = stablehlo.slice %0 [0:4, 0:1, 0:12] : (tensor<4x16x12xf32>) -> tensor<4x1x12xf32> loc(#loc)
    %2 = stablehlo.reshape %1 : (tensor<4x1x12xf32>) -> tensor<4x12xf32> loc(#loc)
    %3 = stablehlo.broadcast_in_dim %arg4, dims = [0] : (tensor<4xf32>) -> tensor<4x12xf32> loc(#loc)
    %4 = stablehlo.broadcast_in_dim %arg3, dims = [0] : (tensor<4xf32>) -> tensor<4x12xf32> loc(#loc)
    %5 = stablehlo.dot_general %arg1, %2, contracting_dims = [0] x [0], precision = [DEFAULT, DEFAULT] : (tensor<4x4xf32>, tensor<4x12xf32>) -> tensor<4x12xf32> loc(#loc)
    %6 = stablehlo.add %5, %4 : tensor<4x12xf32> loc(#loc)
    %7 = stablehlo.add %3, %6 : tensor<4x12xf32> loc(#loc)
    %8 = stablehlo.tanh %7 : tensor<4x12xf32> loc(#loc)
    %9 = stablehlo.broadcast_in_dim %arg1, dims = [1, 2] : (tensor<4x4xf32>) -> tensor<15x4x4xf32> loc(#loc)
    %10 = stablehlo.slice %0 [0:4, 1:16, 0:12] : (tensor<4x16x12xf32>) -> tensor<4x15x12xf32> loc(#loc)
    %11 = stablehlo.dot_general %9, %10, batching_dims = [0] x [1], contracting_dims = [1] x [0], precision = [DEFAULT, DEFAULT] : (tensor<15x4x4xf32>, tensor<4x15x12xf32>) -> tensor<15x4x12xf32> loc(#loc)
    %12 = stablehlo.broadcast_in_dim %arg3, dims = [1] : (tensor<4xf32>) -> tensor<15x4x12xf32> loc(#loc)
    %13 = stablehlo.add %11, %12 : tensor<15x4x12xf32> loc(#loc)
    %14:5 = stablehlo.while(%iterArg = %c_2, %iterArg_8 = %8, %iterArg_9 = %cst_0, %iterArg_10 = %c0, %iterArg_11 = %cst_0) : tensor<i64>, tensor<4x12xf32>, tensor<15x4x12xf32>, tensor<15xi64>, tensor<15x4x12xf32>
    cond {
      %31 = stablehlo.compare  LT, %iterArg, %c_4 : (tensor<i64>, tensor<i64>) -> tensor<i1> loc(#loc)
      stablehlo.return %31 : tensor<i1> loc(#loc)
    } do {
      %31 = stablehlo.add %iterArg, %c_3 : tensor<i64> loc(#loc)
      %32 = stablehlo.reshape %iterArg_8 : (tensor<4x12xf32>) -> tensor<1x4x12xf32> loc(#loc)
      %33 = stablehlo.dynamic_update_slice %iterArg_9, %32, %iterArg, %c_2, %c_2 : (tensor<15x4x12xf32>, tensor<1x4x12xf32>, tensor<i64>, tensor<i64>, tensor<i64>) -> tensor<15x4x12xf32> loc(#loc)
      %34 = stablehlo.dot_general %arg2, %iterArg_8, contracting_dims = [0] x [0], precision = [DEFAULT, DEFAULT] : (tensor<4x4xf32>, tensor<4x12xf32>) -> tensor<4x12xf32> loc(#loc)
      %35 = stablehlo.add %34, %3 : tensor<4x12xf32> loc(#loc)
      %36 = stablehlo.reshape %iterArg : (tensor<i64>) -> tensor<1xi64> loc(#loc)
      %37 = stablehlo.dynamic_update_slice %iterArg_10, %36, %iterArg : (tensor<15xi64>, tensor<1xi64>, tensor<i64>) -> tensor<15xi64> loc(#loc)
      %38 = stablehlo.dynamic_slice %13, %iterArg, %c_2, %c_2, sizes = [1, 4, 12] : (tensor<15x4x12xf32>, tensor<i64>, tensor<i64>, tensor<i64>) -> tensor<1x4x12xf32> loc(#loc)
      %39 = stablehlo.reshape %38 : (tensor<1x4x12xf32>) -> tensor<4x12xf32> loc(#loc)
      %40 = stablehlo.add %35, %39 : tensor<4x12xf32> loc(#loc)
      %41 = stablehlo.reshape %40 : (tensor<4x12xf32>) -> tensor<1x4x12xf32> loc(#loc)
      %42 = stablehlo.dynamic_update_slice %iterArg_11, %41, %iterArg, %c_2, %c_2 : (tensor<15x4x12xf32>, tensor<1x4x12xf32>, tensor<i64>, tensor<i64>, tensor<i64>) -> tensor<15x4x12xf32> loc(#loc)
      %43 = stablehlo.tanh %40 : tensor<4x12xf32> loc(#loc)
      stablehlo.return %31, %43, %33, %37, %42 : tensor<i64>, tensor<4x12xf32>, tensor<15x4x12xf32>, tensor<15xi64>, tensor<15x4x12xf32> loc(#loc)
    } loc(#loc)
    %15 = stablehlo.multiply %14#1, %14#1 : tensor<4x12xf32> loc(#loc)
    %16 = stablehlo.reduce(%15 init: %cst_5) applies stablehlo.add across dimensions = [0, 1] : (tensor<4x12xf32>, tensor<f32>) -> tensor<f32> loc(#loc)
    %17 = stablehlo.multiply %14#1, %cst : tensor<4x12xf32> loc(#loc)
    %18:6 = stablehlo.while(%iterArg = %c_2, %iterArg_8 = %17, %iterArg_9 = %cst_0, %iterArg_10 = %cst_7, %iterArg_11 = %cst_6, %iterArg_12 = %c) : tensor<i64>, tensor<4x12xf32>, tensor<15x4x12xf32>, tensor<4x12xf32>, tensor<4x4xf32>, tensor<i64>
    cond {
      %31 = stablehlo.compare  LT, %iterArg, %c_4 : (tensor<i64>, tensor<i64>) -> tensor<i1> loc(#loc)
      stablehlo.return %31 : tensor<i1> loc(#loc)
    } do {
      %31 = stablehlo.add %iterArg, %c_3 : tensor<i64> loc(#loc)
      %32 = stablehlo.dynamic_slice %14#4, %iterArg_12, %c_2, %c_2, sizes = [1, 4, 12] : (tensor<15x4x12xf32>, tensor<i64>, tensor<i64>, tensor<i64>) -> tensor<1x4x12xf32> loc(#loc)
      %33 = stablehlo.tanh %32 : tensor<1x4x12xf32> loc(#loc)
      %34 = stablehlo.multiply %33, %33 : tensor<1x4x12xf32> loc(#loc)
      %35 = stablehlo.reshape %34 : (tensor<1x4x12xf32>) -> tensor<4x12xf32> loc(#loc)
      %36 = stablehlo.subtract %cst_1, %35 : tensor<4x12xf32> loc(#loc)
      %37 = stablehlo.multiply %iterArg_8, %36 : tensor<4x12xf32> loc(#loc)
      %38 = stablehlo.reshape %37 : (tensor<4x12xf32>) -> tensor<1x4x12xf32> loc(#loc)
      %39 = stablehlo.dynamic_slice %14#3, %iterArg_12, sizes = [1] : (tensor<15xi64>, tensor<i64>) -> tensor<1xi64> loc(#loc)
      %40 = stablehlo.reshape %39 : (tensor<1xi64>) -> tensor<i64> loc(#loc)
      %41 = stablehlo.dynamic_update_slice %cst_0, %38, %40, %c_2, %c_2 : (tensor<15x4x12xf32>, tensor<1x4x12xf32>, tensor<i64>, tensor<i64>, tensor<i64>) -> tensor<15x4x12xf32> loc(#loc)
      %42 = stablehlo.add %iterArg_9, %41 : tensor<15x4x12xf32> loc(#loc)
      %43 = stablehlo.add %iterArg_10, %37 : tensor<4x12xf32> loc(#loc)
      %44 = stablehlo.dynamic_slice %14#2, %iterArg_12, %c_2, %c_2, sizes = [1, 4, 12] : (tensor<15x4x12xf32>, tensor<i64>, tensor<i64>, tensor<i64>) -> tensor<1x4x12xf32> loc(#loc)
      %45 = stablehlo.reshape %44 : (tensor<1x4x12xf32>) -> tensor<4x12xf32> loc(#loc)
      %46 = stablehlo.dot_general %45, %37, contracting_dims = [1] x [1], precision = [DEFAULT, DEFAULT] : (tensor<4x12xf32>, tensor<4x12xf32>) -> tensor<4x4xf32> loc(#loc)
      %47 = stablehlo.add %iterArg_11, %46 : tensor<4x4xf32> loc(#loc)
      %48 = stablehlo.dot_general %arg2, %37, contracting_dims = [1] x [0], precision = [DEFAULT, DEFAULT] : (tensor<4x4xf32>, tensor<4x12xf32>) -> tensor<4x12xf32> loc(#loc)
      %49 = stablehlo.subtract %iterArg_12, %c_3 : tensor<i64> loc(#loc)
      stablehlo.return %31, %48, %42, %43, %47, %49 : tensor<i64>, tensor<4x12xf32>, tensor<15x4x12xf32>, tensor<4x12xf32>, tensor<4x4xf32>, tensor<i64> loc(#loc)
    } loc(#loc)
    %19 = stablehlo.reduce(%18#2 init: %cst_5) applies stablehlo.add across dimensions = [0, 2] : (tensor<15x4x12xf32>, tensor<f32>) -> tensor<4xf32> loc(#loc)
    %20 = stablehlo.dot_general %10, %18#2, batching_dims = [1] x [0], contracting_dims = [2] x [2], precision = [DEFAULT, DEFAULT] : (tensor<4x15x12xf32>, tensor<15x4x12xf32>) -> tensor<15x4x4xf32> loc(#loc)
    %21 = stablehlo.reduce(%20 init: %cst_5) applies stablehlo.add across dimensions = [0] : (tensor<15x4x4xf32>, tensor<f32>) -> tensor<4x4xf32> loc(#loc)
    %22 = stablehlo.multiply %8, %8 : tensor<4x12xf32> loc(#loc)
    %23 = stablehlo.subtract %cst_1, %22 : tensor<4x12xf32> loc(#loc)
    %24 = stablehlo.multiply %18#1, %23 : tensor<4x12xf32> loc(#loc)
    %25 = stablehlo.add %18#3, %24 : tensor<4x12xf32> loc(#loc)
    %26 = stablehlo.dot_general %2, %24, contracting_dims = [1] x [1], precision = [DEFAULT, DEFAULT] : (tensor<4x12xf32>, tensor<4x12xf32>) -> tensor<4x4xf32> loc(#loc)
    %27 = stablehlo.add %21, %26 : tensor<4x4xf32> loc(#loc)
    %28 = stablehlo.reduce(%24 init: %cst_5) applies stablehlo.add across dimensions = [1] : (tensor<4x12xf32>, tensor<f32>) -> tensor<4xf32> loc(#loc)
    %29 = stablehlo.add %19, %28 : tensor<4xf32> loc(#loc)
    %30 = stablehlo.reduce(%25 init: %cst_5) applies stablehlo.add across dimensions = [1] : (tensor<4x12xf32>, tensor<f32>) -> tensor<4xf32> loc(#loc)
    return %16, %27, %18#4, %29, %30 : tensor<f32>, tensor<4x4xf32>, tensor<4x4xf32>, tensor<4xf32>, tensor<4xf32> loc(#loc)
  } loc(#loc)
} loc(#loc)
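In the module above, the only op outside StableHLO (besides `func.func` and `return`) is `%c0 = arith.constant 0 : tensor<15xi64>`. It feeds the first `stablehlo.while` as `%iterArg_10`, which is consistent with the report that `arith.constant` is not being raised on TPU. Below is a minimal sketch of a text scan that lists such leftover ops in a printed module. `ops_in_dialect` is a hypothetical helper that works on the textual IR; it is not a Reactant or MLIR API:

```python
import re

# Matches "%name = dialect.op" or "%name:N = dialect.op" assignments in printed MLIR.
OP = re.compile(r"^\s*%[\w#]+(?::\d+)? = (?P<op>[a-z_]+\.[a-z_]+)")

def ops_in_dialect(ir_text, dialect="arith"):
    """Return (line_no, op_name) for each assigned op whose dialect prefix matches."""
    hits = []
    for i, line in enumerate(ir_text.splitlines(), start=1):
        m = OP.match(line)
        if m and m["op"].split(".", 1)[0] == dialect:
            hits.append((i, m["op"]))
    return hits

# A few lines copied from the dump above (loc annotations trimmed).
snippet = """\
    %c = stablehlo.constant dense<14> : tensor<i64>
    %c0 = arith.constant 0 : tensor<15xi64>
    %cst_0 = stablehlo.constant dense<0.000000e+00> : tensor<15x4x12xf32>
    %14:5 = stablehlo.while(%iterArg = %c_2) : tensor<i64>
"""

print(ops_in_dialect(snippet))
# [(2, 'arith.constant')]
```

A text scan like this only catches ops that produce a named result. It is a quick check on a dumped module, not a substitute for a dialect-conversion legality check.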

@avik-pal avik-pal merged commit d2f10ee into main Nov 10, 2025
60 of 67 checks passed
@avik-pal avik-pal deleted the ap/missing_passes branch November 10, 2025 01:02

3 participants