Merged
316 commits
c37802a
use multi-dtype bucketing (#166527)
eellison Oct 30, 2025
629293f
bucket all reduce (#166528)
eellison Oct 30, 2025
694d205
Revert "shrink_group implementation to expose ncclCommShrink API (#16…
pytorchmergebot Oct 30, 2025
ba71e9c
[DeviceMesh] Isolate pg creation logic in Device Mesh into a separate…
fduwjj Oct 30, 2025
a553ea9
Fix missing symbol when printing guards (#165723)
aorenste Oct 29, 2025
a5c3c08
[Pytorch] Use exp_u20 for aarch64's erf (#166594)
Nicoshev Oct 30, 2025
8f40a0c
Revert "address DDE in matmul decomp (#166541)"
pytorchmergebot Oct 30, 2025
4acc66f
Make PT2 compile backprop through custom op without autograd key a ha…
ezyang Oct 29, 2025
fcd5f8c
[CodeClean] Remove the Unused MACRO for AOT Inductor Runtime (#165139)
fffrog Oct 30, 2025
398775a
[CodeClean] Replace std::runtime_error with TORCH_CHECK (#165119)
fffrog Oct 30, 2025
639a0b1
Remove torch.distributed.tensor.OpSchema.has_symints (#163667)
swolchok Oct 30, 2025
694db5f
Use 'is' in callable comparisons (#166624)
cyyever Oct 30, 2025
b939de2
Avoid writing temporary modules to disk (#157713)
apmorton Oct 30, 2025
8221ee6
[xpu] Fix type annotation for ProcessGroupXCCL (#166418)
frost-intel Oct 30, 2025
0ec0549
Introduce a new API torch.xpu.get_per_process_memory_fraction (#165511)
guangyey Oct 15, 2025
181ee3b
fix: Add missing signals_to_handle to launcher logging (#166631)
leopold-tzafon Oct 30, 2025
a7fd0b4
[ROCm][CI] fix disk space message (#166645)
amdfaa Oct 30, 2025
ad3a56a
Add a compile-time flag to trigger verbose logging for device-side as…
drdarshan Oct 30, 2025
56838ba
[CP][BE][1/2] Refactor the code structure (#166456)
fegin Oct 30, 2025
52db601
Enable verify_dynamo on Python 3.13 (#166497)
cyyever Oct 30, 2025
f911d64
[CUDA] xFail `max-autotune` grouped gemm tests on devices with insuff…
eqy Oct 30, 2025
99b05d1
Better 1x128, 128x128 error handling on non-Hopper (#166639)
slayton58 Oct 30, 2025
0d50e5d
[3/N] Fix unused loop variables (#166509)
cyyever Oct 30, 2025
80ba6e4
Add warning when users have incomplete setup for type checking (#166603)
maggiemoss Oct 30, 2025
df71b70
[cuDNN][conv] Re-enable cuDNN for 3D convolutions (fixed in 9.15+) (#…
eqy Oct 30, 2025
7692fa0
[Code Clean] Clean asserts in torch/ao/quantization/fx/* (#165420)
zhudada0120 Oct 30, 2025
5fc2c7a
[ROCm][inductor] More configs for pointwise kernels. (#166470)
naromero77amd Oct 30, 2025
f5543e3
[wip] fix searchsorted non dense (#165064)
eellison Oct 30, 2025
45c3f02
[ROCm] moved gfx1100 back to experimental status for AOTriton (#166397)
k-artem Oct 30, 2025
7e3b9d1
[CP][BE][2/2] Refactor the code structure (#166501)
fegin Oct 30, 2025
b9bcb37
[DebugMode] store stringify args by default (#166347)
pianpwk Oct 29, 2025
984e64b
[inductor] Fix constant folder (#166655)
angelayi Oct 30, 2025
7a0cd8e
[ROCm] Disable `__builtin_amdgcn_rcpf` for gfx90a (#166454)
pragupta Oct 30, 2025
bfb47ec
[dynamo] support tracing new typing union syntax X | Y (#166599)
williamwen42 Oct 30, 2025
5d288bc
[BE] Move GreenContext implementation details to cpp (#166462)
malfet Oct 31, 2025
98d640b
Remove AT_USE_HIPSPARSE_GENERIC_API (#166393)
cyyever Oct 31, 2025
47f0024
[CI][BE] Factor out repeated test code (#166481)
malfet Oct 30, 2025
3206677
Fix torch.full with dynamic tensor fill_value in torch.compile (#166554)
amaldevh Oct 31, 2025
24b6eb7
[Inductor] Enable Custom op Autotune Decompositions and Parameter Tun…
tianrengao Oct 31, 2025
1257706
[MPS] Fix crash when max/min ops called for complex types (#166214)
malfet Oct 28, 2025
a6b1ef1
[GraphPartition] cache get_free_symbol_uses (#166338)
BoyuanFeng Oct 31, 2025
1129605
[ROCm][CI] create ROCm 7.1 images for binary builds (#166665)
jeffdaily Oct 31, 2025
d3be06c
[MTIAGraph][Pytorch][2/n] Add binding for Python to C++, and hook for…
andyanwang Oct 31, 2025
d3e511f
[Inductor] support masked vectorization for the tail_loop for fp8 dat…
jiayisunx Oct 30, 2025
f1e4c42
[BE][Typing][Dynamo] Type misc files in `torch/_dynamo/variables/` (#…
Lucaskabela Oct 31, 2025
e3ae059
Add CUDA MXFP4 scaled mm support via. FBGEMM (#166526)
slayton58 Oct 30, 2025
7d39401
Revert "[BE][Typing][Dynamo] Type misc files in `torch/_dynamo/variab…
pytorchmergebot Oct 31, 2025
797cd80
[dynamo, nested graph breaks] codegen dead nested cells correctly (#1…
williamwen42 Oct 31, 2025
1dec8a6
[dynamo, nested graph breaks] add disable_nested_graph_breaks decorat…
williamwen42 Oct 31, 2025
267d019
[dynamo] fix error_on_graph_break bug where non-empty checkpoint resu…
williamwen42 Oct 31, 2025
85b035c
[nativert] Downcast triton double arguments to floats (#166620)
minjang Oct 31, 2025
7d67a41
make FXConverter.generate use V.fake_mode instead of _detect_fake_mod…
jazlyn5 Oct 31, 2025
030de07
[2/N] Use 'is' in callable comparisons (#166685)
cyyever Oct 31, 2025
fc8ac12
[4/N] Remove unused loop variables in tests (#166690)
cyyever Oct 31, 2025
108bb22
[pytree] add `treespec_{leaf,tuple,dict}` functions for args_spec mod…
XuehaiPan Oct 31, 2025
0d3a4f7
[CD] Enable Inductor performance test for xpu (#166289)
chuanqi129 Oct 31, 2025
fd68d40
[xpu][feature] Integrate OneDNN SDPA training forward/backward into X…
LuFinch Oct 31, 2025
c01636e
Fixes the sparse tensor issue (#163535)
arkadip-maitra Oct 31, 2025
b083193
[inductor] Mark / restrict tests that only work if ATen is used for m…
kundaMwiza Oct 31, 2025
657f8c3
Revert "Fix torch.full with dynamic tensor fill_value in torch.compil…
pytorchmergebot Oct 31, 2025
26534e9
Revert "[GraphPartition] cache get_free_symbol_uses (#166338)"
pytorchmergebot Oct 31, 2025
4e8ba37
Revert "[BE] Move GreenContext implementation details to cpp (#166462)"
pytorchmergebot Oct 31, 2025
5bcfdae
Revert "Make PT2 compile backprop through custom op without autograd …
pytorchmergebot Oct 31, 2025
160ab53
Update weight tensor initialization in RMSNormalization (#166550)
justinchuby Oct 31, 2025
034e951
[CUDA][cuBLASLt] addmm -- extend bias fusions to cases with (1 by n) …
nikitaved Oct 31, 2025
69be99e
Remove manually synced arch versions in `tools/nightly.py` (#166616)
XuehaiPan Oct 30, 2025
24e94e0
[ROCm][CI] create ROCm 7.1 magma tarball (#166693)
jeffdaily Oct 31, 2025
fee7624
[PT2] set choice handler in config (#166607)
xuanzhang816 Oct 31, 2025
1e3600b
[MPS] Move `logaddexp/logaddexp2` to Metal and support complex (#166670)
kurtamohler Oct 30, 2025
c3b71d5
[ROCm][CI] remove relaxed tolerance for tf32 tests (#166478)
jeffdaily Oct 31, 2025
aa9c96a
[BE][Typing][Dynamo] Type misc files in `torch/_dynamo/variables/` (#…
Lucaskabela Oct 31, 2025
1212359
update Node.is_impure check if subgraph contains impure ops (#166609)
jazlyn5 Oct 31, 2025
fcc1063
Revert "[BE][Typing][Dynamo] Type misc files in `torch/_dynamo/variab…
pytorchmergebot Oct 31, 2025
365ed62
Document LibTorch ABI more, add README to headeronly (#166661)
janeyx99 Oct 30, 2025
ffaa657
Revise deprecation warning for ONNX exporter (#166692)
justinchuby Oct 31, 2025
239e7b5
[ROCm][CI] upgrade nightly wheels to ROCm 7.1 (#166730)
jeffdaily Oct 31, 2025
0947765
Cache even more work for return_and_correct_aliasing (#166365)
swolchok Oct 30, 2025
b71966f
[PyTorch] Improve aarch64 performance of bfloat16 ops - retry (#16602…
Nicoshev Oct 31, 2025
85b85f6
Revert "[pytree] add `treespec_{leaf,tuple,dict}` functions for args_…
pytorchmergebot Oct 31, 2025
b470e59
partitioner option to ignore partitioner_tag for abstract usage (#166…
IvanKobzarev Oct 31, 2025
30157d3
Add regional aot eager support to AOTAutogradCacheEntry (#166650)
jamesjwu Oct 30, 2025
08f4535
Refactor AOTAutogradCacheEntry into AOTAutogradResult (#166656)
jamesjwu Oct 31, 2025
d2be06f
[cpu][fix] Update ACL version to fix crashes with tensor sizes > 2^3…
fadara01 Oct 31, 2025
ef8d97e
fix broken nn_convolution test (#166666)
Camyll Oct 31, 2025
856a7a5
Add missing device to namedtensor tests (#166717)
cyyever Oct 31, 2025
cf9a834
[BE] Move GreenContext implementation details to cpp (#166462)
malfet Oct 31, 2025
70aeb49
[dynamo] clarify graph break handling/logging in symbolic_convert (#1…
williamwen42 Oct 31, 2025
8209a05
[Pytorch] Enable aarch64 convert autovec only on clang (#166739)
Nicoshev Oct 31, 2025
4a7bc1d
[BE][Typing][Dynamo] Type misc files in `torch/_dynamo/variables/` (#…
Lucaskabela Oct 31, 2025
e404388
[dynamo, 3.14] fix segfault due to improper create_call_function_ex (…
williamwen42 Oct 31, 2025
d97144d
[5/N] Remove unused loop variables in tests (#166716)
cyyever Oct 31, 2025
93a70c7
Revert "Add CUDA MXFP4 scaled mm support via. FBGEMM (#166526)"
pytorchmergebot Oct 31, 2025
4e7232c
[MPS] Fix `smooth_l1_loss` backward for fp16 (#166687)
malfet Oct 31, 2025
b09fb48
[CD] Upgrade GCC version to 13 for XPU build (#162474)
chuanqi129 Oct 31, 2025
dfebdca
[GraphPartition] cache get_free_symbol_uses (#166338)
BoyuanFeng Oct 31, 2025
9970fb9
Fix Tril Triu SymInt (#166627)
parsshar-RH Oct 31, 2025
2699f54
Revert "[xpu][feature] Integrate OneDNN SDPA training forward/backwar…
pytorchmergebot Oct 31, 2025
5166743
[FlexFlash] Wire up mask_mod + blockmask to flash impl (#166359)
drisspg Oct 31, 2025
d80ae73
compile_worker: Make a timer class (#166465)
c00w Oct 31, 2025
9261a1f
[MPS] Error out when BatchNorm is called for Complex (#166215)
malfet Oct 31, 2025
fd5da81
[AI Codemod][DevmateFBSourceTestFailureBot] Fix for T243177299 ("Your…
pdesupinski Oct 31, 2025
8d59904
add shape check for avg_pool2d (#161952)
jiayisunx Oct 30, 2025
83cc38d
[precompile] Preserve default arguments for dynamo capture (#166654)
zhxchen17 Nov 1, 2025
e2dc32f
Replace decltype(auto) with auto (#166537)
cyyever Nov 1, 2025
f91899c
[2/N] Add strict parameter to Python zip calls (#166257)
cyyever Nov 1, 2025
3dc92d6
Remove setup-env instructions; it's confusing (#166749)
ezyang Nov 1, 2025
60333de
Revert "Remove setup-env instructions; it's confusing (#166749)"
pytorchmergebot Nov 1, 2025
e8fadba
[pytree] add `treespec_{leaf,tuple,dict}` functions for args_spec mod…
XuehaiPan Nov 1, 2025
9d6597b
Correctly use test parameters (#166726)
cyyever Nov 1, 2025
4316df8
[3.14] Fix torch.package.importer (#166767)
malfet Oct 31, 2025
f0745dd
Replace c10::call_once with static initialization (#166381)
cyyever Nov 1, 2025
1aef88c
Avoid DDE in narrow with unbacked start (#166361)
laithsakka Oct 29, 2025
4cc64d6
[inductor] pre grad graph bisecting (#166344)
shunting314 Nov 1, 2025
b3861ac
[reland] Warn if AccumulateGrad stream does not match producer node s…
soulitzer Nov 1, 2025
84776e1
Make PT2 compile backprop through custom op without autograd key a ha…
ezyang Nov 1, 2025
3b5d38a
Fix comparing inductor actual strides vs bw graph for activations sho…
laithsakka Oct 29, 2025
82d86ba
[inductor] track reduction before splitting (#166053)
shunting314 Oct 31, 2025
13549e0
Revert "Avoid DDE in narrow with unbacked start (#166361)"
pytorchmergebot Nov 1, 2025
401c2f9
[FP8][H100][TF32] Disable tf32 for emulated reference computation in …
eqy Nov 1, 2025
82fafb3
Revert "Make PT2 compile backprop through custom op without autograd …
pytorchmergebot Nov 1, 2025
0d81bb7
[3/N] Use 'is' in callable comparisons (#166780)
cyyever Nov 1, 2025
764c54e
[DebugMode] dispatch call hooks (#166348)
pianpwk Oct 31, 2025
a663eb9
[FlexFlash] CuteDSL flat indexer needs to be colexigraphic in coordin…
drisspg Nov 1, 2025
0573747
[inductor] more aggressive mix order reduction (#166382)
shunting314 Oct 31, 2025
04d6a6f
[inductor] Make mix-order-reduction split size not depends on split-r…
shunting314 Oct 31, 2025
c3dc0c7
[Inductor] mix order reduction heuristics and tuning (#166585)
shunting314 Oct 31, 2025
a19e92d
report geomean for norm bwd benchmarking (#166675)
shunting314 Oct 31, 2025
9f9dbe0
add a curve for customized compilation in the kernel benchmarking scr…
shunting314 Oct 31, 2025
b7d348a
[vision hash update] update the pinned vision hash (#166771)
pytorchupdatebot Nov 2, 2025
0674e0a
Fix: list index out of range with softmax when using 0 dim (#166547)
krastogi-in Nov 2, 2025
f013e80
[user-streams] Fix stream graph output semantics (#164819)
mlazos Nov 2, 2025
bc03d7c
[user-streams] Add current stream source (#165211)
mlazos Nov 2, 2025
cee0363
[user-streams] Track symbolic current stream (#165212)
mlazos Nov 2, 2025
76780b1
[user-streams] Handle returning the current stream with/without devic…
mlazos Nov 2, 2025
d962bed
[user-streams] Add basic stream tests (#164523)
mlazos Nov 2, 2025
18f4259
[dynamo] Remove retrieving objects by ID (#162905)
mlazos Nov 2, 2025
e471800
[user-streams] cleanup StreamVariable signature (#166471)
mlazos Nov 2, 2025
2986666
[user-streams] Switch to fx annotations at trace time (#166472)
mlazos Nov 2, 2025
5e05a0a
Revert "Fix: list index out of range with softmax when using 0 dim (#…
pytorchmergebot Nov 2, 2025
bb54296
Fix source_fn_stack being None (#166728)
tugsbayasgalan Nov 2, 2025
6c7cad6
Use Python 3.10 typing (#148418)
cyyever Nov 2, 2025
23b57a4
Remove setup-env instructions; it's confusing (#166749)
ezyang Nov 2, 2025
c8adc08
[Fix] Optimize max unpooling index validation using aminmax (#165394)
lingebeng Nov 2, 2025
16212f0
[Sparse] support for exp op (#166801)
Isalia20 Nov 2, 2025
6268883
[MPS] Refactor `torch.cat` and add fast path for contiguous inputs (#…
kurtamohler Oct 30, 2025
9c22bbb
Add min/max support for barebones uint types (#166813)
ezyang Nov 2, 2025
3ca216a
Add claude skills for uint support and AT_DISPATCH_V2 (#166814)
ezyang Nov 2, 2025
7c203b8
[BE] Using std::move to reduce copy constructor calls by one. (#163599)
thenumberouscode Nov 2, 2025
3eddf04
Revert "Add min/max support for barebones uint types (#166813)"
pytorchmergebot Nov 2, 2025
3b43159
[export] Fix static_input_indices for aot_export_joint (#166761)
angelayi Nov 3, 2025
4a7fefd
[dynamo] fix pos-only names should can be collected in `**kwargs` (#1…
XuehaiPan Nov 2, 2025
fee1ac9
[DebugMode] add stack traces (#166440)
pianpwk Nov 2, 2025
392acee
[6/N] Remove unused loop variables in tests (#166785)
cyyever Nov 3, 2025
1c4ced2
[2/N] Correctly use test parameters (#166783)
cyyever Nov 3, 2025
69fb3eb
Fix: type promotion in FakeTensor (#166522)
krastogi-in Nov 3, 2025
a5f0007
torch.cond supports autograd now (#165908)
ezyang Nov 3, 2025
5a3930a
Revert "Back out "Do not decompose in functionalization/proxy tensor …
ezyang Nov 2, 2025
3f54010
[3/N] Add clang-tidy readability checks (#164692)
cyyever Nov 3, 2025
e1d011d
[2/N] Change C-style casts to static_cast or reinterpret_cast (#165891)
cyyever Nov 3, 2025
e0791fc
Give full Dynamo stack traces in CI (#160417)
ezyang Aug 31, 2025
9501405
[caffe2] Ignore -Wswitch-enum warnings (#166760)
NSProgrammer Nov 3, 2025
061fa73
Reapply "Back out "Do not decompose in functionalization/proxy tensor…
pytorchmergebot Nov 3, 2025
defac66
[xla hash update] update the pinned xla hash (#166845)
pytorchupdatebot Nov 3, 2025
ae038f8
[inductor] Collectives estimations: option to use nccl estimator for …
IvanKobzarev Nov 3, 2025
a4077b5
Revert "[MPS] Error out when BatchNorm is called for Complex (#166215)"
pytorchmergebot Nov 3, 2025
5d62307
Revert "Give full Dynamo stack traces in CI (#160417)"
pytorchmergebot Nov 3, 2025
1656b25
Revert "[MPS] Fix `smooth_l1_loss` backward for fp16 (#166687)"
pytorchmergebot Nov 3, 2025
61bcc8d
Revert "Fixes torch.compile(nn.ModuleList()) changes bool() behavior …
pytorchmergebot Nov 3, 2025
d177900
[Code Clean] Clean asserts in torch/ao/quantization (root, quantizer,…
zhudada0120 Nov 3, 2025
a2da693
Remove nightly pth check from pyrefly (#166857)
ezyang Nov 3, 2025
76bb27e
Revert "Back out "Do not decompose in functionalization/proxy tensor …
ezyang Nov 2, 2025
335b5c7
Avoid std::copy_n in CopyKernel and IndexKernel (#143544)
cyyever Nov 3, 2025
73da7a4
[MPS] Error out when BatchNorm is called for Complex (#166215)
malfet Nov 3, 2025
f33abae
Switch to pyrefly as only type checker (#166197)
maggiemoss Nov 3, 2025
3f6538f
Remove tools from BC linter (#166858)
ezyang Nov 3, 2025
94f2657
[Inductor] addmm with bias -> unfuse bias if there is a pointwise/red…
nikitaved Nov 3, 2025
104b868
Fix build error by checking cuda version in CUDAGreenContext (#166800)
irshadcc Nov 3, 2025
984b096
[ROCm][CI] Change rocm.yml and inductor-rocm.yml cron schedule to run…
amdfaa Nov 3, 2025
f3fa560
Integrate NVIDIA cuSolver backend into ATen/Linalg (initial implement…
johannesz-codes Nov 3, 2025
7b29926
Update test jobs in pull workflow to c7i (#165646)
zxiiro Nov 3, 2025
5b17ef3
Update docs-build to c7i (#166727)
zxiiro Nov 3, 2025
bcad4f2
[FSDP][Replicate] final version integrating 1D device mesh replicate …
anshul-si Oct 28, 2025
d67d807
[FSDP][Replicate] added two replicate overload declarations and chang…
anshul-si Oct 28, 2025
2f3f88f
Revert "[FSDP][Replicate] added two replicate overload declarations a…
pytorchmergebot Nov 3, 2025
fa0fd6b
Revert "[FSDP][Replicate] final version integrating 1D device mesh re…
pytorchmergebot Nov 3, 2025
aa4a8c9
[Inductor][Triton][FP8] Support tile-wise (1x128) scaling in Inductor…
jananisriram Nov 3, 2025
e3bd7bd
[FP8] Enable FP16 output support for torch scaled_mm when using CUTLA…
oyye Nov 3, 2025
c761999
Avoid DDE in narrow with unbacked start (#166361)
laithsakka Nov 2, 2025
71a2e93
[cuDNN][SDPA] Check-in test for #166211 (#166570)
eqy Nov 3, 2025
3af1f7b
[easy][MTIAGraph][Pytorch] clang-format files (#166805)
andyanwang Nov 3, 2025
612ead1
[distributed] Replace assert statements with AssertionError exception…
RohitRathore1 Nov 3, 2025
ee1bc3f
Manylinux ROCm docker images. use devtoolset-13 (#166764)
atalman Nov 3, 2025
68e31e2
[CUDA] Skip pynvml test on platforms that don't have complete support…
eqy Nov 3, 2025
c10975d
Revert "Avoid DDE in narrow with unbacked start (#166361)"
pytorchmergebot Nov 3, 2025
5125872
Fix unused assignments (#166791)
cyyever Nov 3, 2025
83cd626
[opaque_obj_v2] make_fx support (#165005)
angelayi Nov 3, 2025
77b9399
[random] Add `generator` arg to `rand*_like` APIs (#166160)
KarhouTam Nov 3, 2025
3a38ec7
[inductor] Expand use of generic benchmark function (#164938)
kundaMwiza Nov 3, 2025
6725ee8
Fix cuda blas build error due to extra && (#166811)
irshadcc Nov 3, 2025
b8855e7
Add conv ops to operator microbenchmark (#166331)
jainapurva Nov 3, 2025
01d8d85
[MTIAGraph][Pytorch][2.1/n] Add API to destroy graph C++ instance (#1…
andyanwang Nov 3, 2025
27cfdd9
[export] Return more information from tracing context in graph captur…
zhxchen17 Nov 1, 2025
7d1b976
[export] Make dict_keys_getitem tracable. (#166776)
zhxchen17 Nov 1, 2025
11f73d7
[export] Downgrade captured buffers as normal constants. (#166777)
zhxchen17 Nov 1, 2025
eea8ff2
Fix torch.full with dynamic tensor fill_value in torch.compile (#166554)
amaldevh Nov 3, 2025
86b2d82
Revert "[Inductor] addmm with bias -> unfuse bias if there is a point…
pytorchmergebot Nov 3, 2025
6c98657
Add some Triton related suppressions that don't show on CI (#166868)
ezyang Nov 3, 2025
2b7e4c3
[DCP] Add option to use PrefixStore to create checkpoint background p…
kevinmtang Nov 3, 2025
616314c
[FSDP][Replicate] final version integrating 1D device mesh replicate …
anshul-si Nov 3, 2025
5048e47
explicitly remove call_mod_node_to_replace after inlining the submodu…
jazlyn5 Nov 3, 2025
d944279
[FSDP][Replicate] added two replicate overload declarations and chang…
anshul-si Nov 3, 2025
7b64ad9
[FSDP][Replicate] got rid of reshard_after_forward and updated test c…
anshul-si Nov 3, 2025
5c89bdb
[MPS] Fix `smooth_l1_loss` backward for fp16 (#166687)
malfet Nov 3, 2025
665a411
Revert "[CUDA] Skip pynvml test on platforms that don't have complete…
pytorchmergebot Nov 4, 2025
79ff2c6
Revert "Fix unused assignments (#166791)"
pytorchmergebot Nov 4, 2025
64819e3
[Pytorch] Improve conversion from bf16 on aarch64/NEON (#166880)
Nicoshev Nov 4, 2025
ee708ea
fix test_type_hints (#163150)
parsshar-RH Nov 4, 2025
22a7457
Remove ifndef C10_MOBILE around aoti_torch_abi_version impl (#166882)
janeyx99 Nov 4, 2025
e1fc01b
Enable clang-tidy on some excluded headers (#166835)
cyyever Nov 4, 2025
f92834d
Fix unused assignments (#166791)
cyyever Nov 4, 2025
7551507
[BE][Typing][Dynamo] Type torch/_dynamo/variables/builtin.py (#166745)
Lucaskabela Nov 4, 2025
0958f30
Add `_heapq` polyfill (#161093)
guilhermeleobas Nov 3, 2025
a0a8eca
Fixes torch.compile(nn.ModuleList()) changes bool() behavior (#159208)
i3hz Nov 4, 2025
c21868b
[inductor] require shape in TritonCSEVariable (#162275)
isuruf Sep 30, 2025
864633f
[xpu][test] Enable test_fxir_backend tests for XPU (#166493)
Stonepia Nov 4, 2025
f288433
[dynamo] Raise on as_python_constant error on getattr (#166909)
anijain2305 Nov 3, 2025
40133fe
Fix MSCV C++ compilation error of `pycore_stackref.h` header (#165686)
Nov 4, 2025
eec3749
[DebugMode] .fwd_stack_trace for autograd bwd ops (#166842)
pianpwk Nov 4, 2025
875b18d
[xpu][feature] Introduce ExpandableSegment for XPU (#166299)
guangyey Oct 31, 2025
167e64b
[xpu][feature] Support expandable segment feature for XPU (#166292)
guangyey Oct 31, 2025
f70faf2
[xpu][feature] Introduce PeerToPeerAccess API for XPU (#166424)
guangyey Oct 31, 2025
24aa9a2
[ROCm][CI] Add distributed testing back to trunk.yml (#166915)
jithunnair-amd Nov 4, 2025
888efcc
[dynamo, 3.14] support tracing type.__dict__[__annotations__].__get__…
williamwen42 Nov 3, 2025
ba72c6b
[dynamo, 3.14] fix dynamo error message test for 3.14 (#166894)
williamwen42 Nov 3, 2025
344cebd
[dynamo, 3.14] disable cpython dynamo unittests if 3.14 (#166895)
williamwen42 Nov 3, 2025
55be1cc
[dynamo, 3.14] add explicit SymFloat int conversion (#166902)
williamwen42 Nov 3, 2025
a6c6ace
[11/N] Apply ruff UP035 rule (#166225)
cyyever Nov 4, 2025
3232caa
[XPU][Fix] Register convolution_overrideable for flops count (#166839)
Stonepia Nov 4, 2025
0e1a889
[Inductor][Grouped Gemm] Add Blackwell CuTeDSL Kernel (#165036)
NikhilAPatel Nov 4, 2025
d3cf90a
Revert "[inductor] require shape in TritonCSEVariable (#162275)"
pytorchmergebot Nov 4, 2025
c7d00de
[xpu][fix] Fix XPU oneDNN memory query bug: pointer to array (#166830)
guangyey Nov 4, 2025
d980d8d
[dynamo] Implement __sym_float__ for SymBool to fix multiplication Ty…
Flink-ddd Nov 4, 2025
09e0285
[xpu][feature][inductor] Enable decompose_mm_pass and UT on Intel GPU…
jianyizh Nov 4, 2025
82fa2aa
DTensor: Fix trivial as_strided case, add alias support (#166867)
ezyang Nov 3, 2025
8fff7e3
[xpu][test] Add UT for expandable segments (#166495)
guangyey Oct 31, 2025
c45b156
Fix DeepSeek scaling tensor handling (#166752)
slayton58 Nov 3, 2025
cc8bfd1
Docker release build: Use 13.0.0 nvidia docker (#166904)
atalman Nov 4, 2025
24db5c4
[inductor] do not hard fail on FakePG with nccl estimator (#166869)
IvanKobzarev Nov 3, 2025
223b9c5
Merge remote-tracking branch 'upstream/main' into develop_IFU_20251104
github-actions[bot] Nov 4, 2025
b4c1e1e
Fix merge conflict
pragupta Nov 4, 2025
1 change: 1 addition & 0 deletions .bc-linter.yml
@@ -13,3 +13,4 @@ exclude:
- "**/benchmarks/**"
- "**/test_*.py"
- "**/*_test.py"
- "tools/**"
5 changes: 4 additions & 1 deletion .ci/docker/build.sh
@@ -195,13 +195,16 @@ case "$tag" in
NINJA_VERSION=1.9.0
TRITON=yes
;;
pytorch-linux-jammy-xpu-n-py3)
pytorch-linux-jammy-xpu-n-py3 | pytorch-linux-jammy-xpu-n-py3-inductor-benchmarks)
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=11
VISION=yes
XPU_VERSION=2025.2
NINJA_VERSION=1.9.0
TRITON=yes
if [[ $tag =~ "benchmarks" ]]; then
INDUCTOR_BENCHMARKS=yes
fi
;;
pytorch-linux-jammy-py3-gcc11-inductor-benchmarks)
ANACONDA_PYTHON_VERSION=3.10
2 changes: 1 addition & 1 deletion .ci/docker/common/install_acl.sh
@@ -3,7 +3,7 @@

set -eux

ACL_VERSION=${ACL_VERSION:-"v25.02"}
ACL_VERSION=${ACL_VERSION:-"v52.6.0"}
ACL_INSTALL_DIR="/acl"

# Clone ACL
10 changes: 9 additions & 1 deletion .ci/docker/common/install_conda.sh
@@ -49,12 +49,20 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
export SYSROOT_DEP="sysroot_linux-64=2.17"
fi

# Install correct Python version
# Also ensure sysroot is using a modern GLIBC to match system compilers
if [ "$ANACONDA_PYTHON_VERSION" = "3.14" ]; then
as_jenkins conda create -n py_$ANACONDA_PYTHON_VERSION -y\
python="3.14.0" \
${SYSROOT_DEP} \
-c conda-forge
else
# Install correct Python version
# Also ensure sysroot is using a modern GLIBC to match system compilers
as_jenkins conda create -n py_$ANACONDA_PYTHON_VERSION -y\
python="$ANACONDA_PYTHON_VERSION" \
${SYSROOT_DEP}

fi
# libstdcxx from conda default channels are too old, we need GLIBCXX_3.4.30
# which is provided in libstdcxx 12 and up.
conda_install libstdcxx-ng=12.3.0 --update-deps -c conda-forge
4 changes: 0 additions & 4 deletions .ci/docker/common/install_rocm.sh
@@ -40,11 +40,7 @@ EOF

# Default url values
rocm_baseurl="http://repo.radeon.com/rocm/apt/${ROCM_VERSION}"
amdgpu_baseurl="https://repo.radeon.com/amdgpu/${ROCM_VERSION}/ubuntu"

# Add amdgpu repository
UBUNTU_VERSION_NAME=`cat /etc/os-release | grep UBUNTU_CODENAME | awk -F= '{print $2}'`
echo "deb [arch=amd64] ${amdgpu_baseurl} ${UBUNTU_VERSION_NAME} main" > /etc/apt/sources.list.d/amdgpu.list

# Add rocm repository
wget -qO - http://repo.radeon.com/rocm/rocm.gpg.key | apt-key add -
4 changes: 2 additions & 2 deletions .ci/docker/common/install_rocm_magma.sh
@@ -12,8 +12,8 @@ function do_install() {

rocm_version_nodot=${rocm_version//./}

# https://github.com/icl-utk-edu/magma/pull/65
MAGMA_VERSION=d6e4117bc88e73f06d26c6c2e14f064e8fc3d1ec
# post merge of https://github.com/icl-utk-edu/magma/pull/65
MAGMA_VERSION=c0792ae825fb36872784892ea643dd6f3456bc5f
magma_archive="magma-rocm${rocm_version_nodot}-${MAGMA_VERSION}-1.tar.bz2"

rocm_dir="/opt/rocm"
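`install_rocm_magma.sh` builds the expected tarball name from the ROCm version with its dots removed (`${rocm_version//./}` in bash; `.ci/magma-rocm/Makefile` does the same with `$(subst .,,$(DESIRED_ROCM))`). Because the name also embeds `MAGMA_VERSION`, bumping that commit pin changes the filename the installer looks for. The following is a minimal Python sketch of the name construction, assuming `rocm_version` has the `7.1` form used in the Makefile. `magma_archive_name` is a hypothetical helper, not part of the repository.

```python
def magma_archive_name(rocm_version: str, magma_version: str) -> str:
    """Hypothetical helper mirroring the magma_archive line in install_rocm_magma.sh."""
    # Same effect as bash ${rocm_version//./}: remove every "." ("7.1" -> "71").
    rocm_version_nodot = rocm_version.replace(".", "")
    return f"magma-rocm{rocm_version_nodot}-{magma_version}-1.tar.bz2"


MAGMA_VERSION = "c0792ae825fb36872784892ea643dd6f3456bc5f"
print(magma_archive_name("7.1", MAGMA_VERSION))
# magma-rocm71-c0792ae825fb36872784892ea643dd6f3456bc5f-1.tar.bz2
```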
2 changes: 1 addition & 1 deletion .ci/docker/manywheel/Dockerfile_2_28
@@ -149,7 +149,7 @@ FROM cpu_final as rocm_final
ARG ROCM_VERSION=6.0
ARG PYTORCH_ROCM_ARCH
ENV PYTORCH_ROCM_ARCH ${PYTORCH_ROCM_ARCH}
ARG DEVTOOLSET_VERSION=11
ARG DEVTOOLSET_VERSION=13
ENV LDFLAGS="-Wl,-rpath=/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/lib64 -Wl,-rpath=/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/lib"
# Somewhere in ROCm stack, we still use non-existing /opt/rocm/hip path,
# below workaround helps avoid error
2 changes: 1 addition & 1 deletion .ci/docker/manywheel/build.sh
@@ -97,7 +97,7 @@ case ${image} in
manylinux2_28-builder:xpu)
TARGET=xpu_final
GPU_IMAGE=amd64/almalinux:8
DOCKER_GPU_BUILD_ARG=" --build-arg DEVTOOLSET_VERSION=11"
DOCKER_GPU_BUILD_ARG=" --build-arg DEVTOOLSET_VERSION=13"
MANY_LINUX_VERSION="2_28"
;;
*)
23 changes: 14 additions & 9 deletions .ci/docker/requirements-ci.txt
@@ -136,10 +136,11 @@ numba==0.61.2 ; python_version > "3.9"
#test_nn.py, test_namedtensor.py, test_linalg.py, test_jit_cuda_fuser.py,
#test_jit.py, test_indexing.py, test_datapipe.py, test_dataloader.py,
#test_binary_ufuncs.py
numpy==2.0.2 ; python_version == "3.9"
numpy==2.1.2 ; python_version > "3.9"
numpy==2.1.2; python_version > "3.9" and python_version < "3.14"
numpy==2.3.4; python_version >= "3.14"

pandas==2.2.3
pandas==2.2.3; python_version >= "3.9" and python_version < "3.14"
pandas==2.3.3; python_version >= "3.14"

#onnxruntime
#Description: scoring engine for Open Neural Network Exchange (ONNX) models
@@ -151,7 +152,8 @@ opt-einsum==3.3
#Pinned versions: 3.3
#test that import: test_linalg.py

optree==0.13.0
optree==0.13.0 ; python_version < "3.14"
optree==0.17.0 ; python_version >= "3.14"
#Description: A library for tree manipulation
#Pinned versions: 0.13.0
#test that import: test_vmap.py, test_aotdispatch.py, test_dynamic_shapes.py,
@@ -249,8 +251,8 @@ scikit-image==0.22.0
#Pinned versions: 0.20.3
#test that import:

scipy==1.13.1 ; python_version == "3.9"
scipy==1.14.1 ; python_version > "3.9"
scipy==1.14.1 ; python_version > "3.9" and python_version < "3.14"
scipy==1.16.2 ; python_version >= "3.14"
# Pin SciPy because of failing distribution tests (see #60347)
#Description: scientific python
#Pinned versions: 1.10.1
@@ -321,7 +323,8 @@ pywavelets==1.7.0 ; python_version >= "3.12"
#Pinned versions: 1.4.1
#test that import:

lxml==5.3.0
lxml==5.3.0 ; python_version < "3.14"
lxml==6.0.2 ; python_version >= "3.14"
#Description: This is a requirement of unittest-xml-reporting

PyGithub==2.3.0
@@ -331,7 +334,9 @@ sympy==1.13.3
#Pinned versions:
#test that import:

onnx==1.19.1
onnx==1.19.1 ; python_version < "3.14"
# Unpin once Python 3.14 is supported. See onnxruntime issue 26309.
onnx==1.18.0 ; python_version == "3.14"
#Description: Required by onnx tests, and mypy and test_public_bindings.py when checking torch.onnx._internal
#Pinned versions:
#test that import:
@@ -356,7 +361,7 @@ pwlf==2.2.1
#test that import: test_sac_estimator.py

# To build PyTorch itself
pyyaml==6.0.2
pyyaml==6.0.3
pyzstd
setuptools==78.1.1
packaging==23.1
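Several `requirements-ci.txt` pins in this diff are split by Python version using PEP 508 environment markers (`; python_version ...`). As a rough illustration of how those markers partition interpreter versions, here is a minimal Python sketch. It is not how pip evaluates markers; pip applies the full PEP 508 rules through the `packaging` library. The helper names `resolve_pin` and `_marker_matches` are hypothetical, and the sketch supports only `python_version` comparisons joined by `and`, which is all these pins use.

```python
from __future__ import annotations

import operator

# Hypothetical helpers, not part of the PyTorch repo.
_OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def _version(text: str) -> tuple[int, ...]:
    # '"3.14"' -> (3, 14). Tuple comparison orders 3.9 before 3.14,
    # whereas plain string comparison would put "3.9" after "3.14".
    return tuple(int(part) for part in text.strip().strip('"').split("."))


def _marker_matches(marker: str, python_version: str) -> bool:
    current = _version(python_version)
    for clause in marker.split(" and "):
        clause = clause.strip()
        # Try two-character operators first so ">=" is not read as ">".
        for symbol in sorted(_OPS, key=len, reverse=True):
            if symbol in clause:
                name, value = clause.split(symbol, 1)
                if name.strip() != "python_version":
                    raise ValueError(f"unsupported marker variable: {clause!r}")
                if not _OPS[symbol](current, _version(value)):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


def resolve_pin(lines: list[str], python_version: str) -> str | None:
    """Return the first requirement whose marker matches, or None."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        requirement, _, marker = line.partition(";")
        if not marker or _marker_matches(marker, python_version):
            return requirement.strip()
    return None


NUMPY_PINS = [
    'numpy==2.1.2; python_version > "3.9" and python_version < "3.14"',
    'numpy==2.3.4; python_version >= "3.14"',
]
ONNX_PINS = [
    'onnx==1.19.1 ; python_version < "3.14"',
    'onnx==1.18.0 ; python_version == "3.14"',
]

print(resolve_pin(NUMPY_PINS, "3.13"))  # numpy==2.1.2
print(resolve_pin(NUMPY_PINS, "3.14"))  # numpy==2.3.4
print(resolve_pin(ONNX_PINS, "3.15"))   # None
```

The version-tuple comparison matters here. PEP 508 compares `python_version` with PEP 440 version semantics, so `"3.14"` sorts after `"3.9"`. Plain string comparison would reverse that order. The last line also shows that, with the `onnx` pins as written, a Python 3.15 interpreter would match neither line.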
5 changes: 4 additions & 1 deletion .ci/docker/ubuntu-xpu/Dockerfile
@@ -54,12 +54,15 @@ ENV OPENSSL_DIR /opt/openssl
RUN rm install_openssl.sh

ARG INDUCTOR_BENCHMARKS
ARG ANACONDA_PYTHON_VERSION
ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
COPY ./common/install_inductor_benchmark_deps.sh install_inductor_benchmark_deps.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/huggingface-requirements.txt huggingface-requirements.txt
COPY ci_commit_pins/timm.txt timm.txt
COPY ci_commit_pins/torchbench.txt torchbench.txt
RUN if [ -n "${INDUCTOR_BENCHMARKS}" ]; then bash ./install_inductor_benchmark_deps.sh; fi
RUN rm install_inductor_benchmark_deps.sh common_utils.sh timm.txt huggingface-requirements.txt
RUN rm install_inductor_benchmark_deps.sh common_utils.sh timm.txt huggingface-requirements.txt torchbench.txt

# Install XPU Dependencies
ARG XPU_VERSION
2 changes: 1 addition & 1 deletion .ci/lumen_cli/pyproject.toml
@@ -6,7 +6,7 @@ dependencies = [
"GitPython==3.1.45",
"docker==7.1.0",
"pytest==7.3.2",
"uv==0.9.5"
"uv==0.9.6"
]

[tool.setuptools]
8 changes: 7 additions & 1 deletion .ci/magma-rocm/Makefile
@@ -1,7 +1,7 @@
SHELL=/usr/bin/env bash

DOCKER_CMD ?= docker
DESIRED_ROCM ?= 7.0
DESIRED_ROCM ?= 7.1
DESIRED_ROCM_SHORT = $(subst .,,$(DESIRED_ROCM))
PACKAGE_NAME = magma-rocm
# inherit this from underlying docker image, do not pass this env var to docker
@@ -16,6 +16,7 @@ DOCKER_RUN = set -eou pipefail; ${DOCKER_CMD} run --rm -i \
magma-rocm/build_magma.sh

.PHONY: all
all: magma-rocm71
all: magma-rocm70
all: magma-rocm64

@@ -24,6 +25,11 @@ clean:
$(RM) -r magma-*
$(RM) -r output

.PHONY: magma-rocm71
magma-rocm71: DESIRED_ROCM := 7.1
magma-rocm71:
$(DOCKER_RUN)

.PHONY: magma-rocm70
magma-rocm70: DESIRED_ROCM := 7.0
magma-rocm70:
6 changes: 3 additions & 3 deletions .ci/magma-rocm/build_magma.sh
@@ -6,8 +6,8 @@ set -eou pipefail
# The script expects DESIRED_CUDA and PACKAGE_NAME to be set
ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"

# https://github.com/icl-utk-edu/magma/pull/65
MAGMA_VERSION=d6e4117bc88e73f06d26c6c2e14f064e8fc3d1ec
# post merge of https://github.com/icl-utk-edu/magma/pull/65
MAGMA_VERSION=c0792ae825fb36872784892ea643dd6f3456bc5f

# Folders for the build
PACKAGE_FILES=${ROOT_DIR}/magma-rocm/package_files # metadata
@@ -20,7 +20,7 @@ mkdir -p ${PACKAGE_DIR} ${PACKAGE_OUTPUT}/linux-64 ${PACKAGE_BUILD} ${PACKAGE_RE

# Fetch magma sources and verify checksum
pushd ${PACKAGE_DIR}
git clone https://github.com/jeffdaily/magma
git clone https://github.com/icl-utk-edu/magma
pushd magma
git checkout ${MAGMA_VERSION}
popd
2 changes: 1 addition & 1 deletion .ci/pytorch/build.sh
@@ -426,7 +426,7 @@ fi
if [[ "$BUILD_ENVIRONMENT" != *libtorch* && "$BUILD_ENVIRONMENT" != *bazel* ]]; then
# export test times so that potential sharded tests that'll branch off this build will use consistent data
# don't do this for libtorch as libtorch is C++ only and thus won't have python tests run on its build
python tools/stats/export_test_times.py
PYTHONPATH=. python tools/stats/export_test_times.py
fi
# don't do this for bazel or s390x or riscv64 as they don't use sccache
if [[ "$BUILD_ENVIRONMENT" != *s390x* && "$BUILD_ENVIRONMENT" != *riscv64* && "$BUILD_ENVIRONMENT" != *-bazel-* ]]; then
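The `build.sh` change prefixes the `export_test_times.py` call with `PYTHONPATH=.`. A plausible reason, assumed here rather than stated in the diff, is that the script imports through the top-level `tools` package: running `python path/to/script.py` puts the script's own directory, not the working directory, at the front of `sys.path`. The sketch below reproduces that effect in a throwaway tree; `make_demo_repo`, `helper.py`, and `script.py` are hypothetical names.

```shell
#!/usr/bin/env bash
# Build a throwaway tree mimicking the assumed layout: a script under
# tools/stats/ that imports its sibling via the top-level "tools" package.
make_demo_repo() {
  local root
  root="$(mktemp -d)"
  mkdir -p "${root}/tools/stats"
  : > "${root}/tools/__init__.py"
  : > "${root}/tools/stats/__init__.py"
  printf 'VALUE = 42\n' > "${root}/tools/stats/helper.py"
  printf 'from tools.stats.helper import VALUE\nprint(VALUE)\n' > "${root}/tools/stats/script.py"
  echo "${root}"
}

demo_root="$(make_demo_repo)"
cd "${demo_root}"
# Without PYTHONPATH, sys.path[0] is tools/stats, so "import tools" fails.
python3 tools/stats/script.py 2>/dev/null || echo "import failed without PYTHONPATH"
# With PYTHONPATH=., the repository root is searched and the import resolves.
PYTHONPATH=. python3 tools/stats/script.py
```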
10 changes: 7 additions & 3 deletions .ci/pytorch/test.sh
@@ -337,7 +337,7 @@ test_python() {

test_python_smoke() {
# Smoke tests for H100/B200
time python test/run_test.py --include test_matmul_cuda test_scaled_matmul_cuda inductor/test_fp8 inductor/test_max_autotune $PYTHON_TEST_EXTRA_OPTION --upload-artifacts-while-running
time python test/run_test.py --include test_matmul_cuda test_scaled_matmul_cuda inductor/test_fp8 inductor/test_max_autotune inductor/test_cutedsl_grouped_mm $PYTHON_TEST_EXTRA_OPTION --upload-artifacts-while-running
assert_git_not_dirty
}

@@ -572,6 +572,8 @@ fi

if [[ "${TEST_CONFIG}" == *cpu* ]]; then
DYNAMO_BENCHMARK_FLAGS+=(--device cpu)
elif [[ "${TEST_CONFIG}" == *xpu* ]]; then
DYNAMO_BENCHMARK_FLAGS+=(--device xpu)
else
DYNAMO_BENCHMARK_FLAGS+=(--device cuda)
fi
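The new `*xpu*` branch extends glob-based dispatch on `TEST_CONFIG`. The sketch below isolates the branch order used for `DYNAMO_BENCHMARK_FLAGS` in a hypothetical `benchmark_device` function: `*cpu*` is tested first, then `*xpu*`, and anything else falls back to `cuda`. The config names used in the example are illustrative, not taken from CI.

```shell
#!/usr/bin/env bash
# Mirror of the if/elif/else chain above: substring globs are tested in order,
# so the first matching keyword decides the --device value.
benchmark_device() {
  local test_config="$1"
  if [[ "${test_config}" == *cpu* ]]; then
    echo cpu
  elif [[ "${test_config}" == *xpu* ]]; then
    echo xpu
  else
    echo cuda
  fi
}

benchmark_device inductor_huggingface_perf_xpu  # prints xpu
```

Because `[[ ... == *pat* ]]` is a substring match, branch order decides the device if a config name ever contains more than one keyword.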
@@ -665,6 +667,8 @@ test_perf_for_dashboard() {
device=cuda_b200
elif [[ "${TEST_CONFIG}" == *rocm* ]]; then
device=rocm
elif [[ "${TEST_CONFIG}" == *xpu* ]]; then
device=xpu
fi

for mode in "${modes[@]}"; do
@@ -1649,7 +1653,7 @@ test_operator_microbenchmark() {

cd "${TEST_DIR}"/benchmarks/operator_benchmark

for OP_BENCHMARK_TESTS in matmul mm addmm bmm; do
for OP_BENCHMARK_TESTS in matmul mm addmm bmm conv; do
$TASKSET python -m pt.${OP_BENCHMARK_TESTS}_test --tag-filter long \
--output-json-for-dashboard "${TEST_REPORTS_DIR}/operator_microbenchmark_${OP_BENCHMARK_TESTS}_compile.json" \
--benchmark-name "PyTorch operator microbenchmark" --use-compile
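Adding `conv` to the loop list is enough to schedule one more compiled run, because both the module name and the report path are derived from the loop variable. The dry-run sketch below only prints what each iteration would use; `op_benchmark_plan` is a hypothetical name, and the real loop invokes `python -m pt.<op>_test` with further flags.

```shell
#!/usr/bin/env bash
# Dry run: for each operator, print the module `python -m` would run and the
# JSON path passed to --output-json-for-dashboard.
op_benchmark_plan() {
  local reports_dir="$1"
  local op
  for op in matmul mm addmm bmm conv; do
    echo "pt.${op}_test ${reports_dir}/operator_microbenchmark_${op}_compile.json"
  done
}

op_benchmark_plan /tmp/test-reports
```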
@@ -1757,7 +1761,7 @@ elif [[ "${TEST_CONFIG}" == *torchbench* ]]; then
else
# Do this after checkout_install_torchbench to ensure we clobber any
# nightlies that torchbench may pull in
if [[ "${TEST_CONFIG}" != *cpu* ]]; then
if [[ "${TEST_CONFIG}" != *cpu* && "${TEST_CONFIG}" != *xpu* ]]; then
install_torchrec_and_fbgemm
fi
PYTHONPATH=/torchbench test_dynamo_benchmark torchbench "$id"
2 changes: 2 additions & 0 deletions .clang-tidy
@@ -60,9 +60,11 @@ performance-*,
readability-container-size-empty,
readability-delete-null-pointer,
readability-duplicate-include,
readability-named-parameter,
readability-misplaced-array-index,
readability-redundant*,
readability-simplify-subscript-expr,
readability-static-definition-in-anonymous-namespace
readability-string-compare,
-readability-redundant-access-specifiers,
-readability-redundant-control-flow,