Release Thunder 0.2.5 - Summer Harvest · Lightning-AI/lightning-thunder

What's Changed

bump version to 0.2.5.dev0 by @t-vi in #2274
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci[bot] in #2271
Add support for torch.argsort by @protonu in #2246
Add test for Phi-3-vision-128k-instruct by @kshitij12345 in #1850
DTensor: NVFuser Integration by @kshitij12345 in #2177
fix lint from merge by @t-vi in #2277
Remove F842 from ignore rules by @crcrpar in #2270
E2E Coverage Test for Thunder by @tejapulagam in #2086
convert pow gradient to new style by @t-vi in #2283
add Windows xfail/skipif to tests not working on windows by @t-vi in #2284
fix activation checkpointing in the joint trace by @beverlylytle in #2203
Remove F811 from ignore rules by @crcrpar in #2268
fix lint by @t-vi in #2286
Return saved-for-backward objects as tuples by @beverlylytle in #2279
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci[bot] in #2287
Update coverage requirement from ~=7.8.2 to ~=7.9.1 by @dependabot[bot] in #2292
Add "transformer_engine_v2" to expected all executors set by @crcrpar in #2226
Fix CI benchmarks by @KaelanDt in #2303
Bump pytest-random-order from 1.1.1 to 1.2.0 by @dependabot[bot] in #2289
prune redundant ifs in ci workflows by @Borda in #2296
Bump pytest-cov from 6.1.1 to 6.2.1 by @dependabot[bot] in #2291
Add hardsigmoid op by @beverlylytle in #2304
Update hypothesis requirement from ~=6.133.0 to ~=6.135.20 by @dependabot[bot] in #2290
skip complex dtype tensor from aminmax by @crcrpar in #2276
Extend thunder.jit coverage on HF models by @lantiga in #2281
Add PEFT benchmarking script in thunder/benchmarks by @riccardofelluga in #2254
support tuples as replaces arg for operator registration by @KaelanDt in #2308
Improve reporting from thunder.jit coverage CI job by @lantiga in #2309
Decrease SKIPPED by adding dependencies by @lantiga in #2312
Add scalar tensor input to full_sample_generator by @IvanYashchuk in #2318
Update requirements/test.txt bitsandbytes to cover aarch64 platform_machine by @nWEIdia in #2321
update TE test by @kshitij12345 in #2319
[TE] catch different error for xfail test by @kshitij12345 in #2322
Run TE tests in CI by @kshitij12345 in #2320
Add ops for HF transformers by @kiya00 in #2217
Move PEFT model materialization by @riccardofelluga in #2334
Fix dataflow ordering for recomputed symbols by @riccardofelluga in #2317
Update LoRA config for mamba models in PEFT by @riccardofelluga in #2333
Remove external logger dependency by @riccardofelluga in #2327
Relax inplace sanity check by @beverlylytle in #2314
Make alias updating the default in-place operator approach by @beverlylytle in #2052
Propagate backward tags more consistently by @beverlylytle in #2336
Make inplace flags defaults consistent by @beverlylytle in #2349
add/debug Lit CI by @Borda in #2339
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci[bot] in #2311
ci: run notebooks with lit CI by @Borda in #2351
fix docker sanity check by @Borda in #2352
empty nvfuser.FusionCache after each test by @t-vi in #2354
bump bitsandbytes from 0.42.0 to 0.46.1 by @lianakoleva in #2238
remove spurious print by @t-vi in #2357
limit gpu mem usage by @t-vi in #2358
lit CI: switch to L4_X_2 by @Borda in #2360
[grad test] relax tolerance by @kshitij12345 in #2364
[dtensor] don't rely on repr of DTensorSpec by @kshitij12345 in #2359
[thunderfx] mark output non-differentiable based on FXGraph output node inspection by @kshitij12345 in #2348
Specify torch.randint's default dtype by @shino16 in #2342
Enable take and take_along_axis in nvfuser executor by @crcrpar in #2031
Add torch.scalar_tensor to default_torch_ops.py by @crcrpar in #2310
Make pre-commit hooks work on all python3s by @wujingyue in #2373
Remove unnecessary underscores by @wujingyue in #2372
Remove functionalization path by @beverlylytle in #2368
Add the support of uint8 / Byte to nvfuser executor by @crcrpar in #2299
Revert "Add the support of uint8 / Byte to nvfuser executor (#2299)" by @t-vi in #2378
Update hypothesis requirement from ~=6.135.20 to ~=6.136.6 by @dependabot[bot] in #2384
Remove early split trace path by @beverlylytle in #2375
remove debugging leftover by @t-vi in #2390
Bump graphviz from 0.20.3 to 0.21 by @dependabot[bot] in #2387
Fixes mincut error in rematerialization when there's overlap between source and sink variables by @kiya00 in #2369
Lower cumsum to nvfuser by @wujingyue in #2374
Fix example in README.md by @zasdfgbnm in #2381
Update TE v2 executor tests by @riccardofelluga in #2376
Remove the bookend optimization by @wujingyue in #2379
Bugfix/fix binary subscr class getitem by @tejapulagam in #2366
Add class_getitem for list, tuple, and dict by @t-vi in #2394
TransformerEngine executor checkpointing by @riccardofelluga in #2344
Add _grouped_mm and lower it to nvFuser and torchex by @protonu in #2326
[nvfuser] register prims.le by @kshitij12345 in #2377
[dtensor] use nvfuser_direct for nvfuser dtensor execution by @kshitij12345 in #2370
Support torch.square natively by @crcrpar in #2329
Adds save_thunderfx_repros to save scripts for all the subgraphs and optionally save fusion region and traces by @kiya00 in #2232
Add TEv2 Transform reset by @riccardofelluga in #2401
deps: pin cuda-python >=12.0, <13.0.0 by @Borda in #2410
docker: build images for Torch 2.8 by @Borda in #2408
chore: bump Torch 2.8 by @Borda in #2413
Simplify the argsort support by @wujingyue in #2395
ci: reinstall correct torch dependencies by @Borda in #2415
nvfuserex: return cumsum result in int64 when input is int/bool and result dtype is not specified by @crcrpar in #2418
fix typo: "NotImplementedErrror" -> "NotImplementedError" by @crcrpar in #2417
[dtensor] add reshape prim and grad rule by @kshitij12345 in #2383
ci: testing minimal requirements & required bumps by @Borda in #2295
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci[bot] in #2367
skip - failing dtensor test by @kshitij12345 in #2425
[minor] fix type annotation by @kshitij12345 in #2400
Return the updated inp from setitem_ by @crcrpar in #2397
[docs] Add thunderfx to dynamo/index.rst by @crcrpar in #2421
bitsandbytes: update _bitsandbytes_available by @kshitij12345 in #2426
re-enable cuda-python nb by @kshitij12345 in #2414
Add scatter support in nvfuserex by @jjsjann123 in #2431
Revert "skip - failing dtensor test" by @wujingyue in #2428
[thunderfx] Handle output node with no example_value by @kshitij12345 in #2429
Change mypy ignore_errors value from string to a bool by @fnhirwa in #2325
const-fold tfms: update and re-enable test by @kshitij12345 in #2244
replace True with true in pyproject.toml by @kshitij12345 in #2433
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci[bot] in #2434
Relax tolerance for apex xentropy for float16 by @beverlylytle in #2411
Remove outdated comments in autodiff by @beverlylytle in #2442
try to fix te ci by @t-vi in #2443
Diffuser test by @kiya00 in #2141
CI: fix slicing for PyTorch nightly by @kshitij12345 in #2447
Add index_put support in nvfuser by @jjsjann123 in #2183
Remove op_name_to_fn: dict[str, Callable] from test_cudnn_executor by @crcrpar in #2441
[DTensor] Add a test with Opinfos by @kshitij12345 in #2412
remove setitem_ output manipulation by @beverlylytle in #2427
Fix nvfuserex scatter translation to use compile time scatter dim. by @jjsjann123 in #2435
[reporting tool] Fixes import error in report script by @kiya00 in #2424
Fix memory leak due to CompileData being in a cycle by @kshitij12345 in #2449
Add torch.float8_e8m0fnu by @crcrpar in #2215
Support mode 'bilinear' for torch.nn.interpolate by @kilpelainenj in #2241
Support torch.Tensor.view(dtype) by @crcrpar in #2213
require PyTorch 2.7 by @t-vi in #2451
disable interpolate consistency test for NVFuser by @t-vi in #2453
Don't replace unused variables with None by @beverlylytle in #2396
TEv2 Add multi-gpu support and tests by @riccardofelluga in #2404
update linting: replace yesqa by RUF100 by @Borda in #2457
fixing interpolation decomposition in torch by @jjsjann123 in #2454
fix/update linting configuration by @Borda in #2455
Break ref cycle in interpreter.py:fn_ by @kshitij12345 in #2459
fix F841 prune unused variables by @Borda in #2456
Remove JAX from tests by @crcrpar in #2460
RowParallelEmbeddingPreProcess -> RowParallelEmbeddingPrePostProcess by @crcrpar in #2448
Fix memory leak related to prologue holding onto user passed nn.Module by @kshitij12345 in #2461
fix typo of "extracts" by @crcrpar in #2465
Define CompiledObject once at thunder.dynamo load time by @crcrpar in #2204
Restore resnet18 test by @beverlylytle in #2473
fix F822 undefined-export by @Borda in #2466
fix F601 multi-value-repeated-key-literal by @Borda in #2467
fix F403 undefined-local-with-import-star by @Borda in #2468
fix E711 none-comparison by @Borda in #2469
Break ref cycle related to CompileStatistics object by @kshitij12345 in #2463
Add cudnn-frontend based backward of layer_norm by @crcrpar in #2402
Remove None from required_producer_vars in find_cut by @crcrpar in #2393
benchmark_litgpt - update low_precision_mode accepted values by @kshitij12345 in #1779
[test_ops] Make repro command more specific by @crcrpar in #2190
Return tuple of 3 Nones when try_and_log_benchmark fails by @crcrpar in #2172
Clean up thunder/tests/README.md by @crcrpar in #2464
ThunderFX: Modify GraphModule in-place by @shino16 in #2399
Implement thunder.torch.custom_op._register_custom_op by @crcrpar in #2403
convert transformer-engine to lit CI by @Borda in #2483
Replace linear checker for TEv2 by @riccardofelluga in #2485
[DTensor] Add prims for dtype conversion and broadcasting by @kshitij12345 in #2382
Skip TE test on SM120+ as Float8BlockScaling is currently unsupported in thunder by @kshitij12345 in #2475
Bump the gha-updates group with 4 updates by @dependabot[bot] in #2493
Bump bitsandbytes from 0.46.1 to 0.47.0 by @dependabot[bot] in #2488
Bump pytest-xdist from 3.7.0 to 3.8.0 by @dependabot[bot] in #2489
Ensure autograd is enabled before connecting Thunder-compiled fn to autograd by @shino16 in #2479
Add Llama4 MoE implementation to test_networks by @kshitij12345 in #2450
[DTensor] Use enum for PrimIDs similar to prims.PrimIDs by @kshitij12345 in #2495
Add support for torch.float4_e2m1fn_x2 by @IvanYashchuk in #2315
remove F401 from ignore rules -- unused imports by @crcrpar in #2272
thunderfx: Avoid failure when example_value does not have attr of grad_fn by @crcrpar in #2486
Reflect cd.is_grad_enabled to pytorch by @shino16 in #2499
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci[bot] in #2498
restore type condition on training resnet test by @beverlylytle in #2494
Skip sections with vmap for thunderfx by @t-vi in #2504
[DTensor] Update creation of nvFuser.DeviceMesh by @kshitij12345 in #2423
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci[bot] in #2505
add skip for litGPT by @Borda in #2506
drop nll_loss shapes forbidden by PT 2.9 by @t-vi in #2508
Append the output of grad_transform_on_trace to computation_traces by @crcrpar in #2392
Remove input tensors' check_tensor_shape_and_metadata from prologue when inputs are free from torch.SymInt by @crcrpar in #2205
pin transformers for quickstart by @t-vi in #2511
release 0.2.5 by @t-vi in #2512

New Contributors

@tejapulagam made their first contribution in #2086
@nWEIdia made their first contribution in #2321
@zasdfgbnm made their first contribution in #2381
@fnhirwa made their first contribution in #2325
@kilpelainenj made their first contribution in #2241

Full Changelog: 0.2.4...0.2.5
