Thunder 0.2.5 - Summer Harvest
What's Changed
- bump version to 0.2.5.dev0 by @t-vi in #2274
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci[bot] in #2271
- Add support for torch.argsort by @protonu in #2246
- Add test for Phi-3-vision-128k-instruct by @kshitij12345 in #1850
- DTensor: NVFuser Integration by @kshitij12345 in #2177
- fix lint from merge by @t-vi in #2277
- Remove F842 from ignore rules by @crcrpar in #2270
- E2E Coverage Test for Thunder by @tejapulagam in #2086
- convert pow gradient to new style by @t-vi in #2283
- add Windows xfail/skipif to tests not working on windows by @t-vi in #2284
- fix activation checkpointing in the joint trace by @beverlylytle in #2203
- Remove F811 from ignore rules by @crcrpar in #2268
- fix lint by @t-vi in #2286
- Return saved-for-backward objects as tuples by @beverlylytle in #2279
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci[bot] in #2287
- Update coverage requirement from ~=7.8.2 to ~=7.9.1 by @dependabot[bot] in #2292
- Add "transformer_engine_v2" to expected all executors set by @crcrpar in #2226
- Fix CI benchmarks by @KaelanDt in #2303
- Bump pytest-random-order from 1.1.1 to 1.2.0 by @dependabot[bot] in #2289
- prune redundant ifs in ci workflows by @Borda in #2296
- Bump pytest-cov from 6.1.1 to 6.2.1 by @dependabot[bot] in #2291
- Add hardsigmoid op by @beverlylytle in #2304
- Update hypothesis requirement from ~=6.133.0 to ~=6.135.20 by @dependabot[bot] in #2290
- skip complex dtype tensor from `aminmax` by @crcrpar in #2276
- Extend thunder.jit coverage on HF models by @lantiga in #2281
- Add PEFT benchmarking script in `thunder/benchmarks` by @riccardofelluga in #2254
- support tuples as `replaces` arg for operator registration by @KaelanDt in #2308
- Improve reporting from thunder.jit coverage CI job by @lantiga in #2309
- Decrease SKIPPED by adding dependencies by @lantiga in #2312
- Add scalar tensor input to full_sample_generator by @IvanYashchuk in #2318
- Update requirements/test.txt bitsandbytes to cover aarch64 platform_machine by @nWEIdia in #2321
- update TE test by @kshitij12345 in #2319
- [TE] catch different error for xfail test by @kshitij12345 in #2322
- Run TE tests in CI by @kshitij12345 in #2320
- Add ops for HF transformers by @kiya00 in #2217
- Move PEFT model materialization by @riccardofelluga in #2334
- Fix dataflow ordering for recomputed symbols by @riccardofelluga in #2317
- Update LoRA config for mamba models in PEFT by @riccardofelluga in #2333
- Remove external logger dependency by @riccardofelluga in #2327
- Relax inplace sanity check by @beverlylytle in #2314
- Make alias updating the default in-place operator approach by @beverlylytle in #2052
- Propagate backward tags more consistently by @beverlylytle in #2336
- Make inplace flags defaults consistent by @beverlylytle in #2349
- add/debug Lit CI by @Borda in #2339
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci[bot] in #2311
- ci: run notebooks with lit CI by @Borda in #2351
- fix docker sanity check by @Borda in #2352
- empty nvfuser.FusionCache after each test by @t-vi in #2354
- bump bitsandbytes from 0.42.0 to 0.46.1 by @lianakoleva in #2238
- remove spurious print by @t-vi in #2357
- limit gpu mem usage by @t-vi in #2358
- lit CI: switch to `L4_X_2` by @Borda in #2360
- [grad test] relax tolerance by @kshitij12345 in #2364
- [dtensor] don't rely on repr of DTensorSpec by @kshitij12345 in #2359
- [thunderfx] mark output non-differentiable based on FXGraph output node inspection by @kshitij12345 in #2348
- Specify torch.randint's default dtype by @shino16 in #2342
- Enable `take` and `take_along_axis` in nvfuser executor by @crcrpar in #2031
- Add `torch.scalar_tensor` to `default_torch_ops.py` by @crcrpar in #2310
- Make pre-commit hooks work on all python3s by @wujingyue in #2373
- Remove unnecessary underscores by @wujingyue in #2372
- Remove functionalization path by @beverlylytle in #2368
- Add the support of `uint8/Byte` to nvfuser executor by @crcrpar in #2299
- Revert "Add the support of `uint8/Byte` to nvfuser executor (#2299)" by @t-vi in #2378
- Update hypothesis requirement from ~=6.135.20 to ~=6.136.6 by @dependabot[bot] in #2384
- Remove early split trace path by @beverlylytle in #2375
- remove debugging leftover by @t-vi in #2390
- Bump graphviz from 0.20.3 to 0.21 by @dependabot[bot] in #2387
- Fixes mincut error in rematerialization when there's overlap between source and sink variables by @kiya00 in #2369
- Lower cumsum to nvfuser by @wujingyue in #2374
- Fix example in README.md by @zasdfgbnm in #2381
- Update TE v2 executor tests by @riccardofelluga in #2376
- Remove the bookend optimization by @wujingyue in #2379
- Bugfix/fix binary subscr class getitem by @tejapulagam in #2366
- Add class_getitem for list, tuple, and dict by @t-vi in #2394
- TransformerEngine executor checkpointing by @riccardofelluga in #2344
- Add _grouped_mm and lower it to nvFuser and torchex by @protonu in #2326
- [nvfuser] register prims.le by @kshitij12345 in #2377
- [dtensor] use nvfuser_direct for nvfuser dtensor execution by @kshitij12345 in #2370
- Support `torch.square` natively by @crcrpar in #2329
- Adds save_thunderfx_repros to save scripts for all the subgraphs and optionally save fusion region and traces by @kiya00 in #2232
- Add TEv2 Transform reset by @riccardofelluga in #2401
- deps: pin `cuda-python >=12.0, <13.0.0` by @Borda in #2410
- docker: build images for Torch 2.8 by @Borda in #2408
- chore: bump Torch 2.8 by @Borda in #2413
- Simplify the argsort support by @wujingyue in #2395
- ci: reinstall correct torch dependencies by @Borda in #2415
- nvfuserex: return cumsum result in `int64` when input is int/bool and result dtypes is not specified by @crcrpar in #2418
- fix typo: "NotImplementedErrror" -> "NotImplementedError" by @crcrpar in #2417
- [dtensor] add reshape prim and grad rule by @kshitij12345 in #2383
- ci: testing minimal requirements & required bumps by @Borda in #2295
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci[bot] in #2367
- skip - failing dtensor test by @kshitij12345 in #2425
- [minor] fix type annotation by @kshitij12345 in #2400
- Return the updated `inp` from `setitem_` by @crcrpar in #2397
- [docs] Add `thunderfx` to dynamo/index.rst by @crcrpar in #2421
- bitsandbytes: update _bitsandbytes_available by @kshitij12345 in #2426
- re-enable cuda-python nb by @kshitij12345 in #2414
- Add scatter support in nvfuserex by @jjsjann123 in #2431
- Revert "skip - failing dtensor test" by @wujingyue in #2428
- [thunderfx] Handle output node with no example_value by @kshitij12345 in #2429
- Change `mypy` `ignore_errors` value from string to a bool by @fnhirwa in #2325
- const-fold tfms: update and re-enable test by @kshitij12345 in #2244
- replace True with true in pyproject.toml by @kshitij12345 in #2433
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci[bot] in #2434
- Relax tolerance for apex xentropy for float16 by @beverlylytle in #2411
- Remove outdated comments in autodiff by @beverlylytle in #2442
- try to fix te ci by @t-vi in #2443
- Diffuser test by @kiya00 in #2141
- CI: fix slicing for PyTorch nightly by @kshitij12345 in #2447
- Add index_put support in nvfuser by @jjsjann123 in #2183
- Remove `op_name_to_fn: dict[str, Callable]` from test_cudnn_executor by @crcrpar in #2441
- [DTensor] Add a test with Opinfos by @kshitij12345 in #2412
- remove setitem_ output manipulation by @beverlylytle in #2427
- Fix nvfuserex scatter translation to use compile time scatter dim. by @jjsjann123 in #2435
- [reporting tool] Fixes import error in report script by @kiya00 in #2424
- Fix memory leak due to CompileData being in a cycle by @kshitij12345 in #2449
- Add `torch.float8_e8m0fnu` by @crcrpar in #2215
- Support mode 'bilinear' for torch.nn.interpolate by @kilpelainenj in #2241
- Support `torch.Tensor.view(dtype)` by @crcrpar in #2213
- require PyTorch 2.7 by @t-vi in #2451
- disable interpolate consistency test for NVFuser by @t-vi in #2453
- Don't replace unused variables with None by @beverlylytle in #2396
- TEv2 Add multi-gpu support and tests by @riccardofelluga in #2404
- update linting: replace `yesqa` by `RUF100` by @Borda in #2457
- fixing interpolation decomposition in torch by @jjsjann123 in #2454
- fix/update linting configuration by @Borda in #2455
- Break ref cycle in interpreter.py:fn_ by @kshitij12345 in #2459
- fix `F841` prune unused variables by @Borda in #2456
- Remove JAX from tests by @crcrpar in #2460
- `RowParallelEmbeddingPreProcess` -> `RowParallelEmbeddingPrePostProcess` by @crcrpar in #2448
- Fix memory leak related to prologue holding onto user passed nn.Module by @kshitij12345 in #2461
- fix typo of "extracts" by @crcrpar in #2465
- Define `CompiledObject` once at `thunder.dynamo` load time by @crcrpar in #2204
- Restore resnet18 test by @beverlylytle in #2473
- fix `F822` undefined-export by @Borda in #2466
- fix `F601` multi-value-repeated-key-literal by @Borda in #2467
- fix `F403` undefined-local-with-import-star by @Borda in #2468
- fix `E711` none-comparison by @Borda in #2469
- Break ref cycle related to CompileStatistics object by @kshitij12345 in #2463
- Add cudnn-frontend based backward of layer_norm by @crcrpar in #2402
- Remove `None` from `required_producer_vars` in `find_cut` by @crcrpar in #2393
- benchmark_litgpt - update `low_precision_mode` accepted values by @kshitij12345 in #1779
- [test_ops] Make repro command more specific by @crcrpar in #2190
- Return tuple of 3 `None`s when `try_and_log_benchmark` fails by @crcrpar in #2172
- Clean up `thunder/tests/README.md` by @crcrpar in #2464
- ThunderFX: Modify GraphModule in-place by @shino16 in #2399
- Implement `thunder.torch.custom_op._register_custom_op` by @crcrpar in #2403
- convert transformer-engine to lit CI by @Borda in #2483
- Replace linear checker for TEv2 by @riccardofelluga in #2485
- [DTensor] Add prims for dtype conversion and broadcasting by @kshitij12345 in #2382
- Skip TE test on SM120+ as Float8BlockScaling is currently unsupported in thunder by @kshitij12345 in #2475
- Bump the gha-updates group with 4 updates by @dependabot[bot] in #2493
- Bump bitsandbytes from 0.46.1 to 0.47.0 by @dependabot[bot] in #2488
- Bump pytest-xdist from 3.7.0 to 3.8.0 by @dependabot[bot] in #2489
- Ensure autograd is enabled before connecting Thunder-compiled fn to autograd by @shino16 in #2479
- Add Llama4 MoE implementation to test_networks by @kshitij12345 in #2450
- [DTensor] Use enum for PrimIDs similar to prims.PrimIDs by @kshitij12345 in #2495
- Add support for torch.float4_e2m1fn_x2 by @IvanYashchuk in #2315
- remove F401 from ignore rules -- unused imports by @crcrpar in #2272
- thunderfx: Avoid failure when example_value does not have attr of grad_fn by @crcrpar in #2486
- Reflect `cd.is_grad_enabled` to pytorch by @shino16 in #2499
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci[bot] in #2498
- restore type condition on training resnet test by @beverlylytle in #2494
- Skip sections with vmap for thunderfx by @t-vi in #2504
- [DTensor] Update creation of nvFuser.DeviceMesh by @kshitij12345 in #2423
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci[bot] in #2505
- add skip for litGPT by @Borda in #2506
- drop nll_loss shapes forbidden by PT 2.9 by @t-vi in #2508
- Append the output of `grad_transform_on_trace` to `computation_traces` by @crcrpar in #2392
- Remove input tensors' `check_tensor_shape_and_metadata` from prologue when inputs are free from `torch.SymInt` by @crcrpar in #2205
- pin transformers for quickstart by @t-vi in #2511
- release 0.2.5 by @t-vi in #2512
New Contributors
- @tejapulagam made their first contribution in #2086
- @nWEIdia made their first contribution in #2321
- @zasdfgbnm made their first contribution in #2381
- @fnhirwa made their first contribution in #2325
- @kilpelainenj made their first contribution in #2241
Full Changelog: 0.2.4...0.2.5