feat: missing upstream passes + rocm jll #1835

avik-pal · 2025-11-07T20:05:58Z

Needs a new Reactant_jll build.

avik-pal · 2025-11-09T19:52:58Z

Failed to precompile ReactantNNlibExt [1cc43b4b-f40c-58cf-bf58-d175afebd789] to "/root/.cache/julia-buildkite-plugin/depots/0190ad31-dfb1-4a4c-ac5b-02d05277ba6c/compiled/v1.10/ReactantNNlibExt/jl_Q7hUgT".
ERROR: LoadError: InitError: could not load library "/root/.cache/julia-buildkite-plugin/depots/0190ad31-dfb1-4a4c-ac5b-02d05277ba6c/artifacts/072462b34f0d79dff26380bbc8d566e4f6048033/lib/libReactantExtra.so"
librocm_sysdeps_dw.so.1: cannot open shared object file: No such file or directory
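
When a precompile log like the one above runs long, it can help to pull out just the library names the dynamic loader could not find. The following is a minimal sketch, not part of this PR: the helper name `missing_shared_libs` is hypothetical, and it assumes the loader message always has the form `<name>: cannot open shared object file`.

```python
import re

# Matches the dynamic loader's message "<name>: cannot open shared object file".
# Assumption: library names end in ".so" optionally followed by version digits.
_MISSING_LIB = re.compile(r"(\S+\.so(?:\.\d+)*): cannot open shared object file")


def missing_shared_libs(log_text):
    """Return the library names reported as missing, in order, without duplicates."""
    seen = []
    for name in _MISSING_LIB.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen
```

Applied to the log above, this would report only `librocm_sysdeps_dw.so.1`, since `libReactantExtra.so` itself was found and failed only because of its missing dependency.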

github-actions

Reactant.jl Benchmarks

Benchmark suite	Current: `5c0e420`	Previous: `86bea8f`	Ratio
`DeepONet ([64, 1024], [1, 128])/forward/CPU/Default`	`0.0025954100000000003` s	`0.0025712` s	`1.01`
`DeepONet ([64, 1024], [1, 128])/forward/CPU/DisableScatterGatherPad`	`0.002437588` s	`0.002324076` s	`1.05`
`DeepONet ([64, 1024], [1, 128])/backward/CPU/DisablePadAfterEnzyme`	`0.0061261760000000005` s	`0.005635531` s	`1.09`
`DeepONet ([64, 1024], [1, 128])/backward/CPU/DefaultAfterEnzyme`	`0.0060800870000000005` s	`0.005918332` s	`1.03`
`DeepONet ([64, 1024], [1, 128])/backward/CPU/DisableScatterGatherPadBeforeEnzyme`	`0.006573224000000001` s	`0.005868992` s	`1.12`
`DeepONet ([64, 1024], [1, 128])/backward/CPU/DisableScatterGatherPadAll`	`0.0062186920000000005` s	`0.005969917000000001` s	`1.04`
`DeepONet ([64, 1024], [1, 128])/backward/CPU/DisablePadBeforeEnzyme`	`0.006500447` s	`0.005719778` s	`1.14`
`DeepONet ([64, 1024], [1, 128])/backward/CPU/DisablePadAll`	`0.006505485` s	`0.005898044000000001` s	`1.10`
`DeepONet ([64, 1024], [1, 128])/forward/CPU/DisableScatterGather`	`0.002540848` s	`0.002515556` s	`1.01`
`DeepONet ([64, 1024], [1, 128])/backward/CPU/DefaultAll`	`0.006662922000000001` s	`0.005925306` s	`1.12`
`DeepONet ([64, 1024], [1, 128])/backward/CPU/DisableTransposeReshapeAfterEnzyme`	`0.006241464` s	`0.005697300000000001` s	`1.10`
`DeepONet ([64, 1024], [1, 128])/forward/CPU/XLA`	`0.002386434` s	`0.002469012` s	`0.97`
`DeepONet ([64, 1024], [1, 128])/backward/CPU/XLA`	`0.006137005` s	`0.005527595000000001` s	`1.11`
`DeepONet ([64, 1024], [1, 128])/backward/CPU/DisableScatterGatherAfterEnzyme`	`0.006352184` s	`0.0058676050000000006` s	`1.08`
`DeepONet ([64, 1024], [1, 128])/backward/CPU/DefaultBeforeEnzyme`	`0.006712335000000001` s	`0.006050863` s	`1.11`
`DeepONet ([64, 1024], [1, 128])/backward/CPU/DisableTransposeReshapeBeforeEnzyme`	`0.006531435` s	`0.0056281510000000005` s	`1.16`
`DeepONet ([64, 1024], [1, 128])/backward/CPU/DisableScatterGatherPadAfterEnzyme`	`0.006324708` s	`0.005697946000000001` s	`1.11`
`DeepONet ([64, 1024], [1, 128])/forward/CPU/DisablePad`	`0.0023203720000000002` s	`0.002477824` s	`0.94`
`DeepONet ([64, 1024], [1, 128])/backward/CPU/DisableTransposeReshapeAll`	`0.0065091310000000005` s	`0.0059030440000000005` s	`1.10`
`DeepONet ([64, 1024], [1, 128])/backward/CPU/DisableScatterGatherAll`	`0.006718002000000001` s	`0.005717776` s	`1.17`
`DeepONet ([64, 1024], [1, 128])/forward/CPU/DisableTransposeReshape`	`0.002258277` s	`0.002280626` s	`0.99`
`DeepONet ([64, 1024], [1, 128])/backward/CPU/DisableScatterGatherBeforeEnzyme`	`0.006824253000000001` s	`0.005962824` s	`1.14`
`VGG11 bn=true [224, 224, 3, 4]/forward/CUDA/DisablePad`	`0.002061516` s	`0.0020651890000000003` s	`1.00`
`DeepONet ([64, 1024], [1, 128])/backward/CUDA/DisableScatterGatherAll`	`0.000681872` s	`0.0006401400000000001` s	`1.07`
`DeepONet ([64, 1024], [1, 128])/backward/CUDA/XLA`	`0.0008448790000000001` s	`0.00077032` s	`1.10`
`VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/DisableTransposeReshapeAll`	`0.0071811900000000005` s	`0.007178500000000001` s	`1.00`
`DeepONet ([64, 1024], [1, 128])/backward/CUDA/DefaultBeforeEnzyme`	`0.000723888` s	`0.0006608050000000001` s	`1.10`
`FNO [64, 64, 1, 4]/backward/CUDA/DefaultAll`	`0.002950161` s	`0.002965329` s	`0.99`
`VGG11 bn=true [224, 224, 3, 4]/forward/CUDA/DisableScatterGatherPad`	`0.0020558910000000002` s	`0.002049345` s	`1.00`
`VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/DisableTransposeReshapeBeforeEnzyme`	`0.007321437` s	`0.007293340000000001` s	`1.00`
`FNO [64, 64, 1, 4]/backward/CUDA/DefaultBeforeEnzyme`	`0.0030004180000000004` s	`0.0029945640000000003` s	`1.00`
`DeepONet ([64, 1024], [1, 128])/forward/CUDA/XLA`	`0.00031256400000000004` s	`0.000336395` s	`0.93`
`FNO [64, 64, 1, 4]/backward/CUDA/DisablePadBeforeEnzyme`	`0.0029962170000000002` s	`0.0029657420000000004` s	`1.01`
`FNO [64, 64, 1, 4]/backward/CUDA/DisableScatterGatherPadAll`	`0.0029803940000000004` s	`0.002952564` s	`1.01`
`FNO [64, 64, 1, 4]/forward/CUDA/DisablePad`	`0.0010878700000000001` s	`0.001090559` s	`1.00`
`VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/DisablePadBeforeEnzyme`	`0.0072776360000000005` s	`0.007267997000000001` s	`1.00`
`VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/DisableScatterGatherPadBeforeEnzyme`	`0.007264527000000001` s	`0.007312644` s	`0.99`
`VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/DisableScatterGatherAfterEnzyme`	`0.0072111020000000005` s	`0.007179213` s	`1.00`
`FNO [64, 64, 1, 4]/forward/CUDA/DisableScatterGatherPad`	`0.0011089840000000001` s	`0.0011024140000000001` s	`1.01`
`DeepONet ([64, 1024], [1, 128])/backward/CUDA/DisablePadAll`	`0.000673258` s	`0.000638042` s	`1.06`
`DeepONet ([64, 1024], [1, 128])/forward/CUDA/DisableTransposeReshape`	`0.00031367300000000004` s	`0.000327728` s	`0.96`
`ViT tiny [256, 256, 3, 4]/forward/CUDA/DisableScatterGatherPad`	`0.003182831` s	`0.003228463` s	`0.99`
`ViT tiny [256, 256, 3, 4]/forward/CUDA/DisableScatterGather`	`0.00318327` s	`0.003220179` s	`0.99`
`DeepONet ([64, 1024], [1, 128])/backward/CUDA/DefaultAfterEnzyme`	`0.0006913970000000001` s	`0.000640295` s	`1.08`
`ViT tiny [256, 256, 3, 4]/backward/CUDA/XLA`	`0.012773682000000001` s	`0.012791198` s	`1.00`
`FNO [64, 64, 1, 4]/backward/CUDA/DisableScatterGatherAfterEnzyme`	`0.0029783210000000004` s	`0.00293947` s	`1.01`
`VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/DefaultAfterEnzyme`	`0.007224273000000001` s	`0.00718253` s	`1.01`
`DeepONet ([64, 1024], [1, 128])/forward/CUDA/DisableScatterGather`	`0.000333484` s	`0.00033621400000000004` s	`0.99`
`FNO [64, 64, 1, 4]/forward/CUDA/DisableTransposeReshape`	`0.0011543970000000001` s	`0.001141049` s	`1.01`
`FNO [64, 64, 1, 4]/backward/CUDA/DisablePadAll`	`0.002943588` s	`0.0029775910000000004` s	`0.99`
`ViT tiny [256, 256, 3, 4]/forward/CUDA/XLA`	`0.0033760450000000003` s	`0.0033792590000000003` s	`1.00`
`VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/DisableScatterGatherBeforeEnzyme`	`0.007264314000000001` s	`0.007278332` s	`1.00`
`FNO [64, 64, 1, 4]/backward/CUDA/DisableTransposeReshapeAll`	`0.003091926` s	`0.003099075` s	`1.00`
`ViT tiny [256, 256, 3, 4]/forward/CUDA/DisableTransposeReshape`	`0.0031534320000000003` s	`0.0031934570000000002` s	`0.99`
`VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/DisablePadAll`	`0.007177297` s	`0.007183577` s	`1.00`
`VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/DisableScatterGatherPadAfterEnzyme`	`0.007203689` s	`0.007147457` s	`1.01`
`DeepONet ([64, 1024], [1, 128])/forward/CUDA/DisablePad`	`0.00033595` s	`0.00032739` s	`1.03`
`FNO [64, 64, 1, 4]/backward/CUDA/DisableTransposeReshapeBeforeEnzyme`	`0.003147958` s	`0.003182325` s	`0.99`
`DeepONet ([64, 1024], [1, 128])/backward/CUDA/DisableScatterGatherAfterEnzyme`	`0.000693495` s	`0.000633988` s	`1.09`
`DeepONet ([64, 1024], [1, 128])/backward/CUDA/DisableTransposeReshapeAfterEnzyme`	`0.000693858` s	`0.000636892` s	`1.09`
`VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/DisableScatterGatherPadAll`	`0.007206718000000001` s	`0.007160163000000001` s	`1.01`
`DeepONet ([64, 1024], [1, 128])/backward/CUDA/DisableTransposeReshapeBeforeEnzyme`	`0.000703526` s	`0.000657869` s	`1.07`
`VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/DefaultAll`	`0.007191877` s	`0.007158685000000001` s	`1.00`
`FNO [64, 64, 1, 4]/backward/CUDA/XLA`	`0.003143158` s	`0.0031169690000000002` s	`1.01`
`VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/XLA`	`0.007344024` s	`0.007315152` s	`1.00`
`VGG11 bn=true [224, 224, 3, 4]/forward/CUDA/DisableTransposeReshape`	`0.0020697560000000003` s	`0.0020601950000000003` s	`1.00`
`FNO [64, 64, 1, 4]/backward/CUDA/DisableTransposeReshapeAfterEnzyme`	`0.0031044510000000003` s	`0.0030792920000000004` s	`1.01`
`DeepONet ([64, 1024], [1, 128])/backward/CUDA/DefaultAll`	`0.0006853240000000001` s	`0.000633231` s	`1.08`
`FNO [64, 64, 1, 4]/backward/CUDA/DisablePadAfterEnzyme`	`0.00296546` s	`0.002967267` s	`1.00`
`VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/DisableScatterGatherAll`	`0.007198156000000001` s	`0.007147589` s	`1.01`
`DeepONet ([64, 1024], [1, 128])/backward/CUDA/DisableScatterGatherPadAfterEnzyme`	`0.0007014590000000001` s	`0.000627795` s	`1.12`
`FNO [64, 64, 1, 4]/forward/CUDA/XLA`	`0.0011913240000000001` s	`0.0011958930000000002` s	`1.00`
`ViT tiny [256, 256, 3, 4]/forward/CUDA/DisablePad`	`0.0024830290000000003` s	`0.0025319170000000003` s	`0.98`
`VGG11 bn=true [224, 224, 3, 4]/forward/CUDA/XLA`	`0.0021179050000000002` s	`0.0020984610000000002` s	`1.01`
`VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/DisablePadAfterEnzyme`	`0.007167492` s	`0.007189999000000001` s	`1.00`
`FNO [64, 64, 1, 4]/backward/CUDA/DisableScatterGatherBeforeEnzyme`	`0.0029995390000000003` s	`0.0029903250000000003` s	`1.00`
`ViT tiny [256, 256, 3, 4]/forward/CUDA/Default`	`0.0025233010000000004` s	`0.002571082` s	`0.98`
`DeepONet ([64, 1024], [1, 128])/backward/CUDA/DisableScatterGatherBeforeEnzyme`	`0.000728298` s	`0.0006573600000000001` s	`1.11`
`FNO [64, 64, 1, 4]/forward/CUDA/DisableScatterGather`	`0.001091299` s	`0.001087157` s	`1.00`
`VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/DisableTransposeReshapeAfterEnzyme`	`0.0072032070000000005` s	`0.007163539` s	`1.01`
`DeepONet ([64, 1024], [1, 128])/backward/CUDA/DisablePadAfterEnzyme`	`0.0006943720000000001` s	`0.00063382` s	`1.10`
`DeepONet ([64, 1024], [1, 128])/backward/CUDA/DisableScatterGatherPadAll`	`0.0006786380000000001` s	`0.000629372` s	`1.08`
`FNO [64, 64, 1, 4]/backward/CUDA/DisableScatterGatherAll`	`0.002980724` s	`0.002952361` s	`1.01`
`FNO [64, 64, 1, 4]/backward/CUDA/DefaultAfterEnzyme`	`0.00296261` s	`0.002947505` s	`1.01`
`DeepONet ([64, 1024], [1, 128])/backward/CUDA/DisableTransposeReshapeAll`	`0.000671019` s	`0.0006293890000000001` s	`1.07`
`VGG11 bn=true [224, 224, 3, 4]/forward/CUDA/Default`	`0.002057349` s	`0.0020647` s	`1.00`
`FNO [64, 64, 1, 4]/forward/CUDA/Default`	`0.001092981` s	`0.001090569` s	`1.00`
`DeepONet ([64, 1024], [1, 128])/forward/CUDA/DisableScatterGatherPad`	`0.00033048400000000003` s	`0.00033845200000000005` s	`0.98`
`VGG11 bn=true [224, 224, 3, 4]/forward/CUDA/DisableScatterGather`	`0.002059403` s	`0.002057487` s	`1.00`
`ViT tiny [256, 256, 3, 4]/backward/CUDA/DefaultAll`	`0.010719085000000001` s	`0.010060634` s	`1.07`
`DeepONet ([64, 1024], [1, 128])/backward/CUDA/DisablePadBeforeEnzyme`	`0.0007196020000000001` s	`0.000660656` s	`1.09`
`DeepONet ([64, 1024], [1, 128])/forward/CUDA/Default`	`0.00036012200000000005` s	`0.000325973` s	`1.10`
`FNO [64, 64, 1, 4]/backward/CUDA/DisableScatterGatherPadAfterEnzyme`	`0.002972118` s	`0.002937775` s	`1.01`
`FNO [64, 64, 1, 4]/backward/CUDA/DisableScatterGatherPadBeforeEnzyme`	`0.0030055090000000004` s	`0.00300421` s	`1.00`
`VGG11 bn=true [224, 224, 3, 4]/backward/CUDA/DefaultBeforeEnzyme`	`0.007263103000000001` s	`0.00727585` s	`1.00`
`DeepONet ([64, 1024], [1, 128])/backward/CUDA/DisableScatterGatherPadBeforeEnzyme`	`0.0007218530000000001` s	`0.0006709270000000001` s	`1.08`
`VGG11 bn=true [224, 224, 3, 4]/backward/TPU/DisableScatterGatherPadAfterEnzyme`	`0.00473415` s	`0.004669587` s	`1.01`
`VGG11 bn=true [224, 224, 3, 4]/forward/TPU/DisablePad`	`0.0013366600000000001` s	`0.0013345890000000002` s	`1.00`
`FNO [64, 64, 1, 4]/backward/TPU/DisableScatterGatherPadAll`	`0.00309228` s	`0.003075178` s	`1.01`
`ViT tiny [256, 256, 3, 4]/backward/TPU/XLA`	`0.0029770900000000004` s	`0.002936268` s	`1.01`
`DeepONet ([64, 1024], [1, 128])/backward/TPU/DisablePadBeforeEnzyme`	`0.00042125` s	`0.00037486` s	`1.12`
`DeepONet ([64, 1024], [1, 128])/backward/TPU/DisableTransposeReshapeAfterEnzyme`	`0.00044268000000000004` s	`0.00037656` s	`1.18`
`FNO [64, 64, 1, 4]/backward/TPU/DisableScatterGatherAfterEnzyme`	`0.00298488` s	`0.0029474090000000002` s	`1.01`
`FNO [64, 64, 1, 4]/backward/TPU/DefaultAll`	`0.00312167` s	`0.0030703880000000003` s	`1.02`
`VGG11 bn=true [224, 224, 3, 4]/backward/TPU/DisableScatterGatherPadAll`	`0.00470935` s	`0.004676248` s	`1.01`
`DeepONet ([64, 1024], [1, 128])/backward/TPU/DisableScatterGatherPadBeforeEnzyme`	`0.00040558` s	`0.0003824` s	`1.06`
`DeepONet ([64, 1024], [1, 128])/forward/TPU/Default`	`0.00022179` s	`0.00021546` s	`1.03`
`VGG11 bn=true [224, 224, 3, 4]/backward/TPU/DisablePadBeforeEnzyme`	`0.004680011` s	`0.004667517` s	`1.00`
`VGG11 bn=true [224, 224, 3, 4]/forward/TPU/DisableScatterGatherPad`	`0.00133468` s	`0.001338529` s	`1.00`
`FNO [64, 64, 1, 4]/backward/TPU/DisableTransposeReshapeAfterEnzyme`	`0.0030680900000000003` s	`0.0030494880000000004` s	`1.01`
`DeepONet ([64, 1024], [1, 128])/backward/TPU/DisableScatterGatherBeforeEnzyme`	`0.00042187` s	`0.000373901` s	`1.13`
`DeepONet ([64, 1024], [1, 128])/backward/TPU/DisableScatterGatherAfterEnzyme`	`0.00042842000000000005` s	`0.00037397` s	`1.15`
`ViT tiny [256, 256, 3, 4]/forward/TPU/DisableScatterGatherPad`	`0.00064167` s	`0.00061719` s	`1.04`
`FNO [64, 64, 1, 4]/backward/TPU/DisableTransposeReshapeAll`	`0.0030922600000000003` s	`0.003057929` s	`1.01`
`FNO [64, 64, 1, 4]/backward/TPU/DisableTransposeReshapeBeforeEnzyme`	`0.0030817600000000002` s	`0.0030527590000000004` s	`1.01`
`VGG11 bn=true [224, 224, 3, 4]/backward/TPU/DefaultAfterEnzyme`	`0.00469695` s	`0.004695808` s	`1.00`
`FNO [64, 64, 1, 4]/forward/TPU/DisableScatterGather`	`0.0011458500000000001` s	`0.00112548` s	`1.02`
`FNO [64, 64, 1, 4]/backward/TPU/DisablePadBeforeEnzyme`	`0.00310324` s	`0.0030808190000000003` s	`1.01`
`ViT tiny [256, 256, 3, 4]/forward/TPU/XLA`	`0.0010661400000000001` s	`0.0010192` s	`1.05`
`ViT tiny [256, 256, 3, 4]/backward/TPU/DefaultAll`	`0.00270512` s	`0.002634731` s	`1.03`
`VGG11 bn=true [224, 224, 3, 4]/forward/TPU/Default`	`0.0013384` s	`0.001330129` s	`1.01`
`DeepONet ([64, 1024], [1, 128])/forward/TPU/DisableScatterGatherPad`	`0.00022026` s	`0.00020765` s	`1.06`
`DeepONet ([64, 1024], [1, 128])/backward/TPU/DefaultAll`	`0.00043421000000000003` s	`0.00038853000000000005` s	`1.12`
`VGG11 bn=true [224, 224, 3, 4]/backward/TPU/DefaultAll`	`0.00471267` s	`0.004703418` s	`1.00`
`DeepONet ([64, 1024], [1, 128])/forward/TPU/DisableScatterGather`	`0.00022008000000000002` s	`0.00020499` s	`1.07`
`VGG11 bn=true [224, 224, 3, 4]/forward/TPU/DisableTransposeReshape`	`0.0013272800000000001` s	`0.00132477` s	`1.00`
`FNO [64, 64, 1, 4]/backward/TPU/DisableScatterGatherPadBeforeEnzyme`	`0.00309784` s	`0.003067039` s	`1.01`
`ViT tiny [256, 256, 3, 4]/forward/TPU/DisablePad`	`0.00063864` s	`0.00062201` s	`1.03`
`DeepONet ([64, 1024], [1, 128])/backward/TPU/DefaultBeforeEnzyme`	`0.00043794000000000005` s	`0.00038045` s	`1.15`
`DeepONet ([64, 1024], [1, 128])/forward/TPU/DisablePad`	`0.00022827` s	`0.00020732000000000001` s	`1.10`
`DeepONet ([64, 1024], [1, 128])/backward/TPU/XLA`	`0.00044505` s	`0.00040402000000000005` s	`1.10`
`VGG11 bn=true [224, 224, 3, 4]/backward/TPU/DisableTransposeReshapeAll`	`0.004688240000000001` s	`0.004686738` s	`1.00`
`FNO [64, 64, 1, 4]/backward/TPU/DisableScatterGatherAll`	`0.0030929` s	`0.0030645890000000004` s	`1.01`
`DeepONet ([64, 1024], [1, 128])/backward/TPU/DisableScatterGatherPadAll`	`0.00042651` s	`0.00037855` s	`1.13`
`VGG11 bn=true [224, 224, 3, 4]/backward/TPU/DisableTransposeReshapeBeforeEnzyme`	`0.00468671` s	`0.004717398` s	`0.99`
`VGG11 bn=true [224, 224, 3, 4]/backward/TPU/DisableTransposeReshapeAfterEnzyme`	`0.004701980000000001` s	`0.004687918` s	`1.00`
`VGG11 bn=true [224, 224, 3, 4]/backward/TPU/DisableScatterGatherAll`	`0.00468292` s	`0.004671738` s	`1.00`
`FNO [64, 64, 1, 4]/backward/TPU/DisablePadAfterEnzyme`	`0.002972789` s	`0.002954079` s	`1.01`
`VGG11 bn=true [224, 224, 3, 4]/backward/TPU/DisableScatterGatherBeforeEnzyme`	`0.004689540000000001` s	`0.004683747` s	`1.00`
`VGG11 bn=true [224, 224, 3, 4]/backward/TPU/DisableScatterGatherAfterEnzyme`	`0.00468294` s	`0.004671217` s	`1.00`
`VGG11 bn=true [224, 224, 3, 4]/forward/TPU/XLA`	`0.0012603100000000002` s	`0.001246609` s	`1.01`
`ViT tiny [256, 256, 3, 4]/forward/TPU/DisableTransposeReshape`	`0.00065216` s	`0.0006304500000000001` s	`1.03`
`ViT tiny [256, 256, 3, 4]/forward/TPU/Default`	`0.0006506600000000001` s	`0.000614759` s	`1.06`
`FNO [64, 64, 1, 4]/backward/TPU/DefaultBeforeEnzyme`	`0.0031083400000000002` s	`0.0030579690000000002` s	`1.02`
`FNO [64, 64, 1, 4]/forward/TPU/Default`	`0.0011460600000000002` s	`0.00112333` s	`1.02`
`VGG11 bn=true [224, 224, 3, 4]/backward/TPU/XLA`	`0.004660790000000001` s	`0.004592738` s	`1.01`
`FNO [64, 64, 1, 4]/forward/TPU/XLA`	`0.00143676` s	`0.00139945` s	`1.03`
`DeepONet ([64, 1024], [1, 128])/forward/TPU/DisableTransposeReshape`	`0.00022103000000000002` s	`0.00021055000000000002` s	`1.05`
`FNO [64, 64, 1, 4]/backward/TPU/DisablePadAll`	`0.003114471` s	`0.003073998` s	`1.01`
`DeepONet ([64, 1024], [1, 128])/forward/TPU/XLA`	`0.00035083` s	`0.00029197` s	`1.20`
`FNO [64, 64, 1, 4]/forward/TPU/DisableScatterGatherPad`	`0.00114089` s	`0.0011228800000000001` s	`1.02`
`VGG11 bn=true [224, 224, 3, 4]/backward/TPU/DisablePadAfterEnzyme`	`0.00471536` s	`0.004685428` s	`1.01`
`DeepONet ([64, 1024], [1, 128])/backward/TPU/DefaultAfterEnzyme`	`0.00042574` s	`0.00037276100000000004` s	`1.14`
`VGG11 bn=true [224, 224, 3, 4]/backward/TPU/DisablePadAll`	`0.00469793` s	`0.004692018` s	`1.00`
`VGG11 bn=true [224, 224, 3, 4]/backward/TPU/DefaultBeforeEnzyme`	`0.00471606` s	`0.004682218` s	`1.01`
`VGG11 bn=true [224, 224, 3, 4]/forward/TPU/DisableScatterGather`	`0.0013440400000000001` s	`0.00132944` s	`1.01`
`ViT tiny [256, 256, 3, 4]/forward/TPU/DisableScatterGather`	`0.0006433200000000001` s	`0.0006145500000000001` s	`1.05`
`FNO [64, 64, 1, 4]/backward/TPU/DisableScatterGatherBeforeEnzyme`	`0.00310552` s	`0.0030752590000000003` s	`1.01`
`DeepONet ([64, 1024], [1, 128])/backward/TPU/DisableTransposeReshapeBeforeEnzyme`	`0.00041875` s	`0.00037988000000000003` s	`1.10`
`DeepONet ([64, 1024], [1, 128])/backward/TPU/DisableScatterGatherPadAfterEnzyme`	`0.00040824000000000004` s	`0.00037752` s	`1.08`
`DeepONet ([64, 1024], [1, 128])/backward/TPU/DisablePadAll`	`0.00042529000000000004` s	`0.00038125` s	`1.12`
`DeepONet ([64, 1024], [1, 128])/backward/TPU/DisableScatterGatherAll`	`0.00042008000000000003` s	`0.00037955000000000004` s	`1.11`
`FNO [64, 64, 1, 4]/forward/TPU/DisableTransposeReshape`	`0.00117537` s	`0.00115818` s	`1.01`
`FNO [64, 64, 1, 4]/backward/TPU/DefaultAfterEnzyme`	`0.00298409` s	`0.002962269` s	`1.01`
`FNO [64, 64, 1, 4]/backward/TPU/DisableScatterGatherPadAfterEnzyme`	`0.0029757810000000003` s	`0.0029555590000000004` s	`1.01`
`VGG11 bn=true [224, 224, 3, 4]/backward/TPU/DisableScatterGatherPadBeforeEnzyme`	`0.0047024400000000004` s	`0.004668627000000001` s	`1.01`
`FNO [64, 64, 1, 4]/forward/TPU/DisablePad`	`0.00113947` s	`0.00112242` s	`1.02`
`FNO [64, 64, 1, 4]/backward/TPU/XLA`	`0.00331291` s	`0.0032443090000000003` s	`1.02`
`DeepONet ([64, 1024], [1, 128])/backward/TPU/DisableTransposeReshapeAll`	`0.00041836` s	`0.00036277` s	`1.15`
`DeepONet ([64, 1024], [1, 128])/backward/TPU/DisablePadAfterEnzyme`	`0.00042050000000000003` s	`0.0003813` s	`1.10`

This comment was automatically generated by workflow using github-action-benchmark.
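
The Ratio column is the current time divided by the previous time, so values above 1 mean the current commit is slower. A rough sketch for filtering the rows above down to the larger slowdowns follows; the function names and the 1.10 threshold are illustrative assumptions, not something github-action-benchmark provides.

```python
def parse_row(line):
    """Split one tab-separated benchmark row into (name, current_s, previous_s)."""
    name, current, previous, _ratio = line.split("\t")

    def to_seconds(cell):
        # Cells look like "`0.00035083` s"; drop the backticks and the unit.
        return float(cell.replace("`", "").replace(" s", ""))

    return name.strip("`"), to_seconds(current), to_seconds(previous)


def regressions(lines, threshold=1.10):
    """Return (name, ratio) pairs with current/previous above threshold, worst first.

    The default threshold of 1.10 is an arbitrary choice for illustration.
    """
    flagged = []
    for line in lines:
        name, current, previous = parse_row(line)
        ratio = current / previous
        if ratio > threshold:
            flagged.append((name, round(ratio, 2)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

Most of the larger ratios in this run are on the DeepONet benchmarks, whose absolute times are well under a millisecond on CUDA and TPU, so some of the spread may be measurement noise rather than a real regression.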

.buildkite/pipeline.yml

avik-pal · 2025-11-09T22:02:45Z

`arith.constant` is not being raised, specifically on TPU:

https://github.com/EnzymeAD/Reactant.jl/actions/runs/19214211473/job/54921046081?pr=1835#step:20:843

avik-pal · 2025-11-09T22:48:11Z

#loc = loc(unknown)
module @reactant_gradien... attributes {mhlo.num_partitions = 1 : i64, mhlo.num_replicas = 1 : i64} {
  func.func @main(%arg0: tensor<12x16x4xf32> loc(unknown), %arg1: tensor<4x4xf32> loc(unknown), %arg2: tensor<4x4xf32> loc(unknown), %arg3: tensor<4xf32> loc(unknown), %arg4: tensor<4xf32> loc(unknown), %arg5: tensor<2xui64> loc(unknown)) -> (tensor<f32>, tensor<4x4xf32>, tensor<4x4xf32>, tensor<4xf32>, tensor<4xf32>) {
    %cst = stablehlo.constant dense<2.000000e+00> : tensor<4x12xf32> loc(#loc)
    %c = stablehlo.constant dense<14> : tensor<i64> loc(#loc)
    %c0 = arith.constant 0 : tensor<15xi64> loc(#loc)
    %cst_0 = stablehlo.constant dense<0.000000e+00> : tensor<15x4x12xf32> loc(#loc)
    %cst_1 = stablehlo.constant dense<1.000000e+00> : tensor<4x12xf32> loc(#loc)
    %c_2 = stablehlo.constant dense<0> : tensor<i64> loc(#loc)
    %c_3 = stablehlo.constant dense<1> : tensor<i64> loc(#loc)
    %c_4 = stablehlo.constant dense<15> : tensor<i64> loc(#loc)
    %cst_5 = stablehlo.constant dense<0.000000e+00> : tensor<f32> loc(#loc)
    %cst_6 = stablehlo.constant dense<0.000000e+00> : tensor<4x4xf32> loc(#loc)
    %cst_7 = stablehlo.constant dense<0.000000e+00> : tensor<4x12xf32> loc(#loc)
    %0 = stablehlo.transpose %arg0, dims = [2, 1, 0] : (tensor<12x16x4xf32>) -> tensor<4x16x12xf32> loc(#loc)
    %1 = stablehlo.slice %0 [0:4, 0:1, 0:12] : (tensor<4x16x12xf32>) -> tensor<4x1x12xf32> loc(#loc)
    %2 = stablehlo.reshape %1 : (tensor<4x1x12xf32>) -> tensor<4x12xf32> loc(#loc)
    %3 = stablehlo.broadcast_in_dim %arg4, dims = [0] : (tensor<4xf32>) -> tensor<4x12xf32> loc(#loc)
    %4 = stablehlo.broadcast_in_dim %arg3, dims = [0] : (tensor<4xf32>) -> tensor<4x12xf32> loc(#loc)
    %5 = stablehlo.dot_general %arg1, %2, contracting_dims = [0] x [0], precision = [DEFAULT, DEFAULT] : (tensor<4x4xf32>, tensor<4x12xf32>) -> tensor<4x12xf32> loc(#loc)
    %6 = stablehlo.add %5, %4 : tensor<4x12xf32> loc(#loc)
    %7 = stablehlo.add %3, %6 : tensor<4x12xf32> loc(#loc)
    %8 = stablehlo.tanh %7 : tensor<4x12xf32> loc(#loc)
    %9 = stablehlo.broadcast_in_dim %arg1, dims = [1, 2] : (tensor<4x4xf32>) -> tensor<15x4x4xf32> loc(#loc)
    %10 = stablehlo.slice %0 [0:4, 1:16, 0:12] : (tensor<4x16x12xf32>) -> tensor<4x15x12xf32> loc(#loc)
    %11 = stablehlo.dot_general %9, %10, batching_dims = [0] x [1], contracting_dims = [1] x [0], precision = [DEFAULT, DEFAULT] : (tensor<15x4x4xf32>, tensor<4x15x12xf32>) -> tensor<15x4x12xf32> loc(#loc)
    %12 = stablehlo.broadcast_in_dim %arg3, dims = [1] : (tensor<4xf32>) -> tensor<15x4x12xf32> loc(#loc)
    %13 = stablehlo.add %11, %12 : tensor<15x4x12xf32> loc(#loc)
    %14:5 = stablehlo.while(%iterArg = %c_2, %iterArg_8 = %8, %iterArg_9 = %cst_0, %iterArg_10 = %c0, %iterArg_11 = %cst_0) : tensor<i64>, tensor<4x12xf32>, tensor<15x4x12xf32>, tensor<15xi64>, tensor<15x4x12xf32>
    cond {
      %31 = stablehlo.compare  LT, %iterArg, %c_4 : (tensor<i64>, tensor<i64>) -> tensor<i1> loc(#loc)
      stablehlo.return %31 : tensor<i1> loc(#loc)
    } do {
      %31 = stablehlo.add %iterArg, %c_3 : tensor<i64> loc(#loc)
      %32 = stablehlo.reshape %iterArg_8 : (tensor<4x12xf32>) -> tensor<1x4x12xf32> loc(#loc)
      %33 = stablehlo.dynamic_update_slice %iterArg_9, %32, %iterArg, %c_2, %c_2 : (tensor<15x4x12xf32>, tensor<1x4x12xf32>, tensor<i64>, tensor<i64>, tensor<i64>) -> tensor<15x4x12xf32> loc(#loc)
      %34 = stablehlo.dot_general %arg2, %iterArg_8, contracting_dims = [0] x [0], precision = [DEFAULT, DEFAULT] : (tensor<4x4xf32>, tensor<4x12xf32>) -> tensor<4x12xf32> loc(#loc)
      %35 = stablehlo.add %34, %3 : tensor<4x12xf32> loc(#loc)
      %36 = stablehlo.reshape %iterArg : (tensor<i64>) -> tensor<1xi64> loc(#loc)
      %37 = stablehlo.dynamic_update_slice %iterArg_10, %36, %iterArg : (tensor<15xi64>, tensor<1xi64>, tensor<i64>) -> tensor<15xi64> loc(#loc)
      %38 = stablehlo.dynamic_slice %13, %iterArg, %c_2, %c_2, sizes = [1, 4, 12] : (tensor<15x4x12xf32>, tensor<i64>, tensor<i64>, tensor<i64>) -> tensor<1x4x12xf32> loc(#loc)
      %39 = stablehlo.reshape %38 : (tensor<1x4x12xf32>) -> tensor<4x12xf32> loc(#loc)
      %40 = stablehlo.add %35, %39 : tensor<4x12xf32> loc(#loc)
      %41 = stablehlo.reshape %40 : (tensor<4x12xf32>) -> tensor<1x4x12xf32> loc(#loc)
      %42 = stablehlo.dynamic_update_slice %iterArg_11, %41, %iterArg, %c_2, %c_2 : (tensor<15x4x12xf32>, tensor<1x4x12xf32>, tensor<i64>, tensor<i64>, tensor<i64>) -> tensor<15x4x12xf32> loc(#loc)
      %43 = stablehlo.tanh %40 : tensor<4x12xf32> loc(#loc)
      stablehlo.return %31, %43, %33, %37, %42 : tensor<i64>, tensor<4x12xf32>, tensor<15x4x12xf32>, tensor<15xi64>, tensor<15x4x12xf32> loc(#loc)
    } loc(#loc)
    %15 = stablehlo.multiply %14#1, %14#1 : tensor<4x12xf32> loc(#loc)
    %16 = stablehlo.reduce(%15 init: %cst_5) applies stablehlo.add across dimensions = [0, 1] : (tensor<4x12xf32>, tensor<f32>) -> tensor<f32> loc(#loc)
    %17 = stablehlo.multiply %14#1, %cst : tensor<4x12xf32> loc(#loc)
    %18:6 = stablehlo.while(%iterArg = %c_2, %iterArg_8 = %17, %iterArg_9 = %cst_0, %iterArg_10 = %cst_7, %iterArg_11 = %cst_6, %iterArg_12 = %c) : tensor<i64>, tensor<4x12xf32>, tensor<15x4x12xf32>, tensor<4x12xf32>, tensor<4x4xf32>, tensor<i64>
    cond {
      %31 = stablehlo.compare  LT, %iterArg, %c_4 : (tensor<i64>, tensor<i64>) -> tensor<i1> loc(#loc)
      stablehlo.return %31 : tensor<i1> loc(#loc)
    } do {
      %31 = stablehlo.add %iterArg, %c_3 : tensor<i64> loc(#loc)
      %32 = stablehlo.dynamic_slice %14#4, %iterArg_12, %c_2, %c_2, sizes = [1, 4, 12] : (tensor<15x4x12xf32>, tensor<i64>, tensor<i64>, tensor<i64>) -> tensor<1x4x12xf32> loc(#loc)
      %33 = stablehlo.tanh %32 : tensor<1x4x12xf32> loc(#loc)
      %34 = stablehlo.multiply %33, %33 : tensor<1x4x12xf32> loc(#loc)
      %35 = stablehlo.reshape %34 : (tensor<1x4x12xf32>) -> tensor<4x12xf32> loc(#loc)
      %36 = stablehlo.subtract %cst_1, %35 : tensor<4x12xf32> loc(#loc)
      %37 = stablehlo.multiply %iterArg_8, %36 : tensor<4x12xf32> loc(#loc)
      %38 = stablehlo.reshape %37 : (tensor<4x12xf32>) -> tensor<1x4x12xf32> loc(#loc)
      %39 = stablehlo.dynamic_slice %14#3, %iterArg_12, sizes = [1] : (tensor<15xi64>, tensor<i64>) -> tensor<1xi64> loc(#loc)
      %40 = stablehlo.reshape %39 : (tensor<1xi64>) -> tensor<i64> loc(#loc)
      %41 = stablehlo.dynamic_update_slice %cst_0, %38, %40, %c_2, %c_2 : (tensor<15x4x12xf32>, tensor<1x4x12xf32>, tensor<i64>, tensor<i64>, tensor<i64>) -> tensor<15x4x12xf32> loc(#loc)
      %42 = stablehlo.add %iterArg_9, %41 : tensor<15x4x12xf32> loc(#loc)
      %43 = stablehlo.add %iterArg_10, %37 : tensor<4x12xf32> loc(#loc)
      %44 = stablehlo.dynamic_slice %14#2, %iterArg_12, %c_2, %c_2, sizes = [1, 4, 12] : (tensor<15x4x12xf32>, tensor<i64>, tensor<i64>, tensor<i64>) -> tensor<1x4x12xf32> loc(#loc)
      %45 = stablehlo.reshape %44 : (tensor<1x4x12xf32>) -> tensor<4x12xf32> loc(#loc)
      %46 = stablehlo.dot_general %45, %37, contracting_dims = [1] x [1], precision = [DEFAULT, DEFAULT] : (tensor<4x12xf32>, tensor<4x12xf32>) -> tensor<4x4xf32> loc(#loc)
      %47 = stablehlo.add %iterArg_11, %46 : tensor<4x4xf32> loc(#loc)
      %48 = stablehlo.dot_general %arg2, %37, contracting_dims = [1] x [0], precision = [DEFAULT, DEFAULT] : (tensor<4x4xf32>, tensor<4x12xf32>) -> tensor<4x12xf32> loc(#loc)
      %49 = stablehlo.subtract %iterArg_12, %c_3 : tensor<i64> loc(#loc)
      stablehlo.return %31, %48, %42, %43, %47, %49 : tensor<i64>, tensor<4x12xf32>, tensor<15x4x12xf32>, tensor<4x12xf32>, tensor<4x4xf32>, tensor<i64> loc(#loc)
    } loc(#loc)
    %19 = stablehlo.reduce(%18#2 init: %cst_5) applies stablehlo.add across dimensions = [0, 2] : (tensor<15x4x12xf32>, tensor<f32>) -> tensor<4xf32> loc(#loc)
    %20 = stablehlo.dot_general %10, %18#2, batching_dims = [1] x [0], contracting_dims = [2] x [2], precision = [DEFAULT, DEFAULT] : (tensor<4x15x12xf32>, tensor<15x4x12xf32>) -> tensor<15x4x4xf32> loc(#loc)
    %21 = stablehlo.reduce(%20 init: %cst_5) applies stablehlo.add across dimensions = [0] : (tensor<15x4x4xf32>, tensor<f32>) -> tensor<4x4xf32> loc(#loc)
    %22 = stablehlo.multiply %8, %8 : tensor<4x12xf32> loc(#loc)
    %23 = stablehlo.subtract %cst_1, %22 : tensor<4x12xf32> loc(#loc)
    %24 = stablehlo.multiply %18#1, %23 : tensor<4x12xf32> loc(#loc)
    %25 = stablehlo.add %18#3, %24 : tensor<4x12xf32> loc(#loc)
    %26 = stablehlo.dot_general %2, %24, contracting_dims = [1] x [1], precision = [DEFAULT, DEFAULT] : (tensor<4x12xf32>, tensor<4x12xf32>) -> tensor<4x4xf32> loc(#loc)
    %27 = stablehlo.add %21, %26 : tensor<4x4xf32> loc(#loc)
    %28 = stablehlo.reduce(%24 init: %cst_5) applies stablehlo.add across dimensions = [1] : (tensor<4x12xf32>, tensor<f32>) -> tensor<4xf32> loc(#loc)
    %29 = stablehlo.add %19, %28 : tensor<4xf32> loc(#loc)
    %30 = stablehlo.reduce(%25 init: %cst_5) applies stablehlo.add across dimensions = [1] : (tensor<4x12xf32>, tensor<f32>) -> tensor<4xf32> loc(#loc)
    return %16, %27, %18#4, %29, %30 : tensor<f32>, tensor<4x4xf32>, tensor<4x4xf32>, tensor<4xf32>, tensor<4xf32> loc(#loc)
  } loc(#loc)
} loc(#loc)
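
The op in question in the dump above is `%c0 = arith.constant 0 : tensor<15xi64>`, the only op in the module that is not in the `stablehlo` or `func` dialects. A quick way to spot such leftovers in a large dump is a line-based scan like the sketch below. It is a plain text heuristic rather than a real MLIR parser, the name `ops_outside` is hypothetical, and the default set of allowed dialects is an assumption based on this particular dump.

```python
import re

# Strips a leading result binding such as "%c0 = " or "%14:5 = ".
_RESULT_PREFIX = re.compile(r"^%[\w.]+(?::\d+)?\s*=\s*")
# Captures "dialect.op" at the start of a statement.
_OP_NAME = re.compile(r"^([A-Za-z_]\w*)\.([\w.]+)")


def ops_outside(mlir_text, allowed=("stablehlo", "func")):
    """Return (line_number, op_name) for ops whose dialect is not in `allowed`."""
    found = []
    for line_no, line in enumerate(mlir_text.splitlines(), start=1):
        stmt = _RESULT_PREFIX.sub("", line.strip())
        match = _OP_NAME.match(stmt)
        if match and match.group(1) not in allowed:
            found.append((line_no, match.group(0)))
    return found
```

Lines without a dialect-qualified op name, such as `module @...`, the region keywords `cond {` and `} do {`, or the bare `return`, are skipped by the scan.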

Comment out the Julia CI job configuration in the pipeline.

avik-pal marked this pull request as draft November 7, 2025 20:15

avik-pal force-pushed the ap/missing_passes branch from 8534d09 to 54f5156 Compare November 9, 2025 15:40

avik-pal added the run benchmarks label Nov 9, 2025

avik-pal marked this pull request as ready for review November 9, 2025 18:44

avik-pal changed the title ~~feat: missing upstream passes~~ feat: missing upstream passes + rocm jll Nov 9, 2025

avik-pal force-pushed the ap/missing_passes branch from 89eebc4 to 3957f65 Compare November 9, 2025 18:47

github-actions bot reviewed Nov 9, 2025

avik-pal force-pushed the ap/missing_passes branch from 7fee732 to 50f6e79 Compare November 9, 2025 20:42

avik-pal commented Nov 9, 2025

.buildkite/pipeline.yml (outdated)

avik-pal force-pushed the ap/missing_passes branch from 6398289 to 58f746a Compare November 9, 2025 23:12

avik-pal and others added 8 commits November 9, 2025 17:13

feat: missing upstream passes

9c19482

chore: update Reactant_jll version to 0.0.258

76ef3bb

ci: run on AMDGPU ci

5e2c2b1

Update Reactant_jll version in Project.toml

77a870c

Update Reactant_jll version in Project.toml

a4f0f53

ci: disable amdgpu ci for now

10bdab9

Comment out the Julia CI job configuration in the pipeline.

Apply suggestion from @avik-pal

b0a8255

fix: force a raising pass after enzyme is run

5c0e420

avik-pal force-pushed the ap/missing_passes branch from 58f746a to 5c0e420 Compare November 9, 2025 23:13

avik-pal merged commit d2f10ee into main Nov 10, 2025
60 of 67 checks passed

avik-pal deleted the ap/missing_passes branch November 10, 2025 01:02
