Merged
231 commits
d91269e
Revert "[ROCm] enable fastSpecializedAtomicAdd for gfx950 (#167661)"
pytorchmergebot Nov 18, 2025
57927a6
[Profiler] Deprecate export_memory_timeline method (#168036)
sraikund16 Nov 18, 2025
20cae80
`ComplexTensor` subclass (#167621)
hameerabbasi Nov 18, 2025
0e13964
[CI] Disable ET tests (again) (#168090)
malfet Nov 18, 2025
5333e51
[CUDA][Thor] Enable CUTLASS matmuls on Thor (#164836)
Aidyn-A Nov 18, 2025
d1f6dd6
distributed/debug: add an HTTP server for debugging running jobs (#16…
d4l3k Nov 18, 2025
aa22d41
[refcycle-logger] Output tensor size in the refcycle visualization (#…
czardoz Nov 18, 2025
14f370f
[xpu][test] port some distributed tensor test files for Intel GPU (#1…
wincent8 Nov 18, 2025
e3c5b78
small changes (#167852)
eellison Nov 18, 2025
4c5042b
Fix all gather bucketing fusion in of dtype casts (#167853)
eellison Nov 18, 2025
dda2cb3
Handled erased hiding nodes from dtype bucketing (#167863)
eellison Nov 18, 2025
7921c0e
[ROCm][CI] Limit caching to ROCm jammy docker images (#168088)
jithunnair-amd Nov 18, 2025
ae85307
huber_loss numerical issue (#166952)
jturney Nov 18, 2025
ebb2001
[codemod][lowrisk] Remove unused exception parameter from caffe2/torc…
r-barnes Nov 18, 2025
41999a5
Fix Tensor use_count check in VariableType.cpp (#168060)
colesbury Nov 18, 2025
e8970ba
[CI] Migrate all gcc9 jobs to gcc11 (#167933)
malfet Nov 18, 2025
dc4f3c7
[MPS] Move `elu` impl to Metal (#166903)
kurtamohler Nov 4, 2025
1efc14a
[ROCm][CI] Update concurrency setting for docker-cache-rocm.yml (#168…
jithunnair-amd Nov 19, 2025
a4e0720
typo corrected in type.cpp (#167907)
RajeshvShiyal Nov 19, 2025
a369a56
[ROCm][CI] forward fix libtorch agnostic tests (#168087)
jeffdaily Nov 19, 2025
878757c
[CI][CUDA] Unskip nvshmem triton tests (#167760)
nWEIdia Nov 19, 2025
c8d790b
[xpu][fix] Fix empty cache on mempool (#168074)
guangyey Nov 18, 2025
8f16199
Fix stable ABI to/from deprecation warnings. Add my_shape test. (#167…
pearu Nov 18, 2025
b8a3165
[2/3][XPU][feature] The implementation of MemPool for XPU (#166833)
majing921201 Nov 19, 2025
cdca10b
[AOTI] Fix a GPU memory leak caused by reference circle (#168063)
desertfire Nov 18, 2025
cea8678
[CD] Add `cuda-bindings` dependency to CUDA wheels (#167769)
malfet Nov 13, 2025
13ec55d
Update AGENTS.md (#168111)
oulgen Nov 19, 2025
65f08ee
[MPS][1/N] Fix unsupported dtypes error checking for some MPS ops (#1…
malfet Nov 19, 2025
d48cae9
Shrink binary size (#168080)
colesbury Nov 19, 2025
6fc4306
Improve build logic in activities for kineto (#167204)
guangyey Nov 6, 2025
28c7602
[vision hash update] update the pinned vision hash (#168130)
pytorchupdatebot Nov 19, 2025
f49833d
[hoo] Invoke subgraph + effect (#167231)
angelayi Nov 18, 2025
789240b
[invoke_subgraph] Don't run the graph twice when autograd enabled (#1…
angelayi Nov 18, 2025
9abc9aa
fix: use grad div factor when fsdp_degree=1 (#167178)
garrett361 Nov 19, 2025
1c0bf2a
[CUDA][Complex] Bump tolerances for `TestFFTCUDA.test_reference_nd__r…
eqy Nov 19, 2025
a5f36a8
[DTensor] Fix deadlock after fast cache clear (#168069)
zpcore Nov 19, 2025
e5a766e
[user-streams] Insert backward syncs (#167747)
mlazos Nov 18, 2025
9f94c7b
[fix] Assign CUDAEvent external member properly (#167711)
guangyey Nov 19, 2025
7a963ff
LocalTensor for random_ops tests (#166540)
dolpm Nov 19, 2025
be33b7f
[DeviceMemory] Add Basic Statistics to Device Memory in OpenReg (#166…
licy666 Nov 19, 2025
8f4dc30
Hide all symbols (except stable/headeronly/shim) if TORCH_STABLE_ONLY…
mikaylagawarecki Nov 18, 2025
a0ccd3e
Error when non stable/headeronly/shim headers are included by stable …
mikaylagawarecki Nov 18, 2025
5abb7bf
Revert "[SymmMem] Skip multicast init if any CUDA call fails (#168049)"
pytorchmergebot Nov 19, 2025
c7cf3fb
Revert "[pytree][compile] Slightly faster TreeSpec init (#168024)"
pytorchmergebot Nov 19, 2025
eefc0f8
Fix link for core maintainers request form (#168089)
albanD Nov 19, 2025
962f13f
[compile][to_local] Support Sequence-like placement user defined obje…
anijain2305 Nov 19, 2025
fb6af11
GroupNorm: include offending values in error message; add test (#167925)
abhitorch81 Nov 19, 2025
0d7ba97
[dynamo][compile time] Special case for torch.utils._pytree._get_node…
anijain2305 Nov 19, 2025
7a92839
[MPS] permute op for sparse tensors (#168154)
Isalia20 Nov 19, 2025
a097e16
Revert "Error when non stable/headeronly/shim headers are included by…
pytorchmergebot Nov 19, 2025
ce9377d
[BE] Remove erroneous `const_cast` (#168165)
malfet Nov 19, 2025
a8ccc4e
[dynamo][pytree][compile time] Specialize tree_is_leaf (#168070)
anijain2305 Nov 19, 2025
2e1821b
Support AC in default partitioner when functionalization is enabled (…
soulitzer Nov 18, 2025
acf5b20
Revert "Hide all symbols (except stable/headeronly/shim) if TORCH_STA…
pytorchmergebot Nov 19, 2025
6c02dde
Introduce missing collectives and small fixes to support local tensor…
dzmitry-huba Nov 19, 2025
f9724db
[torch.onnx.export] Fix onnx export on big endian machines (#167816)
tungld Nov 19, 2025
607e2e7
[Distributed] Fix @parametrize on unordered iterable in distributed t…
eqy Nov 19, 2025
5b35cf1
[DTensor][ops] adding aten.std.correction propagation rule (#168057)
anshul-si Nov 18, 2025
f6cde6e
Fix tensor -> scalar variant swap (#168007)
georgiaphillips Nov 19, 2025
c566552
[DebugMode] wait before hashing collectives by default (#168119)
pianpwk Nov 19, 2025
84a7a34
[FlexFlash] Specify lowering w/ new `BACKEND` kernel option (#168017)
drisspg Nov 19, 2025
3ecc137
[Caffe2] Improve AddMomentsVec and UpdateMomentsVec (#167664)
Nicoshev Nov 19, 2025
9c811b1
[export] Enable context manager returns for dynamo graph capture. (#1…
zhxchen17 Nov 19, 2025
fcc7841
[pytree][compile] Slightly faster TreeSpec init (#168024)
anijain2305 Nov 19, 2025
159aa44
Replace 2**31 with explicit int (#168046)
WongJohnson Nov 19, 2025
7bfe8b0
[codemod][lowrisk] Remove unused exception parameter from caffe2/torc…
r-barnes Nov 19, 2025
a5e9dce
[DTensor] Fix mypy on register_op_strategy (#167673)
wconstab Nov 19, 2025
c9d944b
[DTensor] Document some utils (#168113)
wconstab Nov 19, 2025
a4a5d03
Update linalg.norm to match numpy's handling of degenerate inputs (#1…
rtimpe Nov 19, 2025
90c57aa
conv: refactor for lookup table support (#167179)
coconutruben Nov 18, 2025
6461548
[vLLM] Update xformers and remove flashinfer-python (#168141)
huydhn Nov 20, 2025
cda1b8d
[FlexFlash] Blackwell fwd support (#167040)
drisspg Nov 19, 2025
a6bfe2d
Revert "[invoke_subgraph] Don't run the graph twice when autograd ena…
pytorchmergebot Nov 20, 2025
ca6175c
Revert "[hoo] Invoke subgraph + effect (#167231)"
pytorchmergebot Nov 20, 2025
771be8c
Revert "[inductor] fix the decision of inner reduction (#167697)"
pytorchmergebot Nov 20, 2025
f890837
Revert "dist: add list_keys to Store API (#167883)"
pytorchmergebot Nov 20, 2025
bc8da63
Move MemoryFormat/Layout to headeronly (#168034)
janeyx99 Nov 19, 2025
c055ebe
Change NamedTupleVariable implementation to subclass UserDefinedTuple…
morrison-turnansky Nov 20, 2025
192b96e
Revert "[AOTI] Fix a GPU memory leak caused by reference circle (#168…
pytorchmergebot Nov 20, 2025
9e9e8fa
[torch/utils/data] Update CODEOWNERS (#168172)
divyanshk Nov 20, 2025
bb4009a
[Inductor] Naive foreach autotune support (#162053)
jataylo Nov 20, 2025
c3320ed
[3.14] Add python version adjustment for frame count changes (#168190)
fxdawnn Nov 19, 2025
7a064ed
Revert "Change NamedTupleVariable implementation to subclass UserDefi…
pytorchmergebot Nov 20, 2025
34bb9c4
[AOTI] Fix unknown constant type for device-moved constants (#168138)
sevenEng Nov 20, 2025
9177d6e
[ROCm][CI] Add ROCm noble image caching to docker-cache-rocm.yml (#16…
jithunnair-amd Nov 20, 2025
9bca3c1
[ROCm][CI] Expand trunk.yml coverage for ROCm (#168162)
jithunnair-amd Nov 20, 2025
c614128
[DTensor] support Replicate -> Partial("avg") + support distribute_te…
tianyu-l Nov 20, 2025
6fa7791
Reland"Fix different seq length (#167481)" (#168144)
Microve Nov 20, 2025
25a64df
[ROCm] add torch.version.rocm, distinct from torch.version.hip (#168…
amd-sriram Nov 20, 2025
7ffa511
[Distributed] Optimize ND shard overlap detection (#167073)
mansiag05 Nov 20, 2025
21c11da
Improve OpenReg test coverage (#167819)
hipudding Nov 20, 2025
a6b6383
[ARM] Improve LLM performance & mem usage using int4-bf16 KleidiAI ke…
usamahz Nov 20, 2025
ae142ab
s390x: fix periodic tests build (#168001)
AlekseiNikiforovIBM Nov 20, 2025
6edf2aa
Revert "Improve build logic in activities for kineto (#167204)"
pytorchmergebot Nov 20, 2025
762273e
Move pointwise_scatter optimization to joint_graph stage from post_gr…
vinithakv Nov 20, 2025
bd883bb
Add basic spin linting documentation (#167227)
zklaus Nov 20, 2025
9d7f983
Add workflow regeneration to spin (#167551)
zklaus Nov 20, 2025
43acddb
Move c10/util/Deprecated.h to headeronly (#168173)
pearu Nov 19, 2025
7fff317
Revise stableivalue from/to deprecation (#168155)
pearu Nov 19, 2025
a01e8a2
[BE] Update xpu driver repo for CD used almalinux 8.10 (#157356)
chuanqi129 Nov 20, 2025
ba68238
[Inductor] Freeze layout for potentially padded strides in template a…
PaulZhang12 Nov 17, 2025
f4382d7
Fixes floor divide int min overflow issue (#166127)
arkadip-maitra Nov 20, 2025
dd89d2c
[DTensor] Document fast-path dispatch (#168192)
wconstab Nov 19, 2025
32b9260
Fixes remainder and fmod operation and makes it same as cuda (#165833)
arkadip-maitra Nov 20, 2025
29bd2dd
Fix: Remove incorrect non-negative validation for correction paramete…
parsshar-RH Nov 20, 2025
88d635c
Remove useless super() delegation (#168235)
cyyever Nov 20, 2025
f97c3fc
Re-enable ConvTranspose operator benchmarks for AArch64 (#166731)
fadara01 Oct 31, 2025
53a4b49
[Pipelining] Fix error log (#167668)
wconstab Nov 12, 2025
803d94b
Revert "[dynamo][pytree][compile time] Specialize tree_is_leaf (#1680…
pytorchmergebot Nov 20, 2025
9396e69
Revert "[dynamo][compile time] Special case for torch.utils._pytree._…
pytorchmergebot Nov 20, 2025
7bbbbca
Fix debug assertion in autograd_not_implemented_fallback.cpp (#168280)
colesbury Nov 20, 2025
2eccaf9
[submodule][inductor]Fix an AMD CPU max-autotune breakage (#168079)
desertfire Nov 18, 2025
4887c46
[ROCm] Fix HIP document url. (#168220)
jagadish-amd Nov 20, 2025
02df234
[varlen attn] batch invariance testing (#167865)
liangel-02 Nov 20, 2025
6644fd7
[inductor] make mix order reduction work with dynamic shapes (#168117)
shunting314 Nov 19, 2025
f7fc634
Revert "Allow BlockDescriptorOptions classes to be overridden In Trit…
pytorchmergebot Nov 20, 2025
b4f5472
[BE][Inductor] Move mm templates into separate files (#168179)
NikhilAPatel Nov 20, 2025
da7c609
[pytorch] Make clamp kernel branchless (#167889)
stashuk-olek Nov 20, 2025
e7a8520
Revert "Remove useless super() delegation (#168235)"
pytorchmergebot Nov 20, 2025
05b1119
Revert "conv: refactor for lookup table support (#167179)"
pytorchmergebot Nov 20, 2025
0ea545b
Add support to enable the oneDNN backend for RISC-V (#166602)
zhangfeiv0 Nov 20, 2025
c81f696
Skip _assert_scalar in default partitioner (#168289)
soulitzer Nov 20, 2025
a64613a
[doc] README add cmake prefix for non-conda env (#167714)
ai-easy-cpu Nov 20, 2025
ddde4b7
[user-streams] Refactor out event insertion for record_stream handlin…
mlazos Nov 20, 2025
2ca51b7
[user-streams] Refactor runtime estimation to reuse internal function…
mlazos Nov 20, 2025
3145177
Fix smoke test failure due to numpy import in Local Tensor (#168271)
dzmitry-huba Nov 20, 2025
0bd3f51
[3.14] Fix module 'torch' has no attribute 'f' (#168152)
azahed98 Nov 20, 2025
247f822
[Fix] Add generator and tensor variant signatures for `rand*_like()` …
KarhouTam Nov 20, 2025
4525340
[3.14] Use refcount difference for TestNumPyInterop.test_from_numpy_n…
azahed98 Nov 20, 2025
ed6d5ff
[precompile] nicer error message when caches are disabled (#168274)
bobrenjc93 Nov 20, 2025
b4f3c52
[dynamo][compile time] Special case for torch.utils._pytree._get_node…
anijain2305 Nov 20, 2025
63ce1fb
Improve build logic in activities for kineto (#167204)
guangyey Nov 20, 2025
c4a9414
overlap on non mms (#167864)
eellison Nov 20, 2025
7641553
better use of mem tracking (#168121)
eellison Nov 20, 2025
1328a02
bucketing compile time improve (#168122)
eellison Nov 20, 2025
5cb5718
Add public grouped_mm (#168298)
drisspg Nov 20, 2025
64904c2
[7/N] Use Python 3.10 typing (#167790)
cyyever Nov 21, 2025
9f10cb0
[BugFix] Fix incorrect usage of const_data_ptr in memcpy (#168233)
lingebeng Nov 21, 2025
064f80d
Smoke test numpy coverage in nightlies (#168270)
atalman Nov 21, 2025
7ebca68
[ROCm][CI] Move periodic-rocm-mi300 and inductor-rocm-mi300 to Ubuntu…
jithunnair-amd Nov 21, 2025
a7f3b10
[Full Inductor][Pytorch] Prevent decomposition and enable fallback of…
andyanwang Nov 21, 2025
2ae4b85
[3.14] Update profiler test (#168205)
rtimpe Nov 20, 2025
a60eb2d
fix philoxstate bad cast (#168310)
dolpm Nov 21, 2025
8ad78bb
Revert C++ fastpath dispatch path for DTensor (#168264)
ezyang Nov 20, 2025
d3ccb8f
Remove c10::is_pod (#166383)
cyyever Nov 21, 2025
056d263
Update numpy tests for python 3.11/3.12 (#168299)
rtimpe Nov 20, 2025
65b9892
Replace string with char for output (#168215)
cyyever Nov 21, 2025
6707dc8
Revert #154859 (#168297)
ngimel Nov 21, 2025
b026eb9
Fix EmbeddingBag when input is 2D and include_last_offset is True (#1…
BartlomiejStemborowski Nov 21, 2025
61cdf87
dist: add list_keys to Store API (#167883)
d4l3k Nov 21, 2025
6038c59
[simplefsdp] fix simplefsdp llama3 run (#168311)
ruisizhang123 Nov 21, 2025
2865672
Revert "Revert #154859 (#168297)"
pytorchmergebot Nov 21, 2025
4ee6b3d
[inductor] Use custom triton kernel subclass when available (#167456)
kundaMwiza Nov 21, 2025
8b0314d
Fix edge-data handling in cudaGraphNodeGetDependencies for CUDA 13 in…
eee4017 Nov 21, 2025
265a8bc
adding kwarg inputs handling in register sharding (#168249)
arkadip-maitra Nov 21, 2025
3b19eca
Fix cublasLtMatmul failure (#167873)
gderossi Nov 21, 2025
29eca30
Remove useless super() delegation (#168235)
cyyever Nov 21, 2025
7c57ee3
Add Pylint checks to linterrunner (#167421)
cyyever Nov 21, 2025
cf6d089
[3.14] Add check for __module__ to _SysImporter.whichmodule (#168189)
azahed98 Nov 21, 2025
c23a900
Revert "[Full Inductor][Pytorch] Prevent decomposition and enable fal…
pytorchmergebot Nov 21, 2025
d4de871
Revert #168264 + Python-side LRU cache when native op schema is not s…
ezyang Nov 21, 2025
5d34e5e
Fix unused gradient tracking to respect create_graph (#168295)
dsashidh Nov 21, 2025
a69d3cf
[BE] C++20 template instantiation adjustments (#168132)
malfet Nov 21, 2025
f6fb8dd
Use r7i.4xlarge for B200 build (#167078)
zxiiro Nov 21, 2025
008ac43
[Inductor XPU GEMM] Step 1/N: Refactor cutlass configuration. (#160174)
etaf Nov 21, 2025
7556637
[Inductor XPU GEMM] Step 2/N: Move out cutlass files from torch/_indu…
etaf Nov 21, 2025
f3b0686
Skipping few distributed tests for 2 GPU setups (#168265)
chinmaydk99 Nov 21, 2025
2d7ea6c
Add rocm-navi31 to the upload test stats file (#168359)
amdfaa Nov 21, 2025
1871a24
Add shim for getCurrentBlasHandle (#168276)
janeyx99 Nov 20, 2025
107ab1c
control_plane: add handler for WaitCounters (#167871)
d4l3k Nov 21, 2025
2f90404
[MPS] fix broadcasting issues for mul on sparse tensors (#168112)
Isalia20 Nov 21, 2025
28e8803
[MPS] enable sparse mm test (#168156)
Isalia20 Nov 21, 2025
e13220b
[CUDA] Update minimum NVIDIA driver version requirement in Green Cont…
eqy Nov 21, 2025
7717bba
Add template for add_overflows (#168035)
lucylq Nov 21, 2025
402968e
[cuDNN][TF32][DTensor][TEST] Turn off TF32 for DTensor conv test (#16…
eqy Nov 21, 2025
80b57a6
Add allgather_base and reduce_scatter_base collective implementations…
dzmitry-huba Nov 21, 2025
08bfadf
[DTensor] compute shape and offset for arbitrary _StridedShard (#168146)
weifengpy Nov 21, 2025
8f8082d
Fix memory leak test for SDPA op call (#168040)
jhavukainen Nov 21, 2025
739acb8
[dynamo, nested graph breaks] fix FOR_ITER iterator push and zip stri…
williamwen42 Nov 18, 2025
044143a
Fix `hash(Size([SymInt, ...]))` on Python 3.14+ (#168256)
guilhermeleobas Nov 20, 2025
82e9ae9
Forward fix numpy binary check after #168270 (#168374)
atalman Nov 21, 2025
b8e6823
Fix arg parser one pos arg (#163081)
cleonard530 Nov 21, 2025
9141f03
20x less memory use and 37.25% speedup in min_cut_rematerialization_p…
jmaczan Nov 21, 2025
a2d11eb
[FlexFlash] Add wiring for backwards (#168319)
drisspg Nov 21, 2025
81bfd50
Add warning for clearing profiler events at the end of each cycle (#1…
jiannanWang Nov 21, 2025
b1cd563
Revert #154859 (#168297)
ngimel Nov 21, 2025
c8b265f
[dynamo, nested graph breaks] Fix-nested-graph-break-suppression (#16…
parsshar-RH Nov 21, 2025
38b5a5e
Narrow the return type annotation in 'VariableTracker::call_obj_hasat…
krastogi-in Nov 21, 2025
d419a2f
[inductor] find benchmark scripts for r2r determinism unit test (#168…
shunting314 Nov 21, 2025
d4493c5
Add dynamic config generation for custom op autotuning (#167193)
tianrengao Nov 21, 2025
6c8c03c
Fix aot_compile typing. (#168320)
yyetim Nov 21, 2025
6d22819
[Inductor] Properly enlarge XBLOCK/set num_warps=1 for B200 inner per…
PaulZhang12 Nov 22, 2025
976abd8
[Inductor] Mix Order Reduction Heuristics (#168361)
PaulZhang12 Nov 22, 2025
5e4ca87
feat(pallas): add Pallas TPU backend (#167774)
yarongmu-google Nov 22, 2025
57d4e49
[inductor] Fix a user-defined Triton kernel output + .cpu() correctne…
desertfire Nov 21, 2025
7ec5c16
[inductor] Reduce cold compilation time caused by duplicated user-def…
desertfire Nov 21, 2025
69bcac8
[triton] Enable Triton kernel serialization for AOTI by adding dict a…
XueningXu Nov 22, 2025
24e1958
[dynamo] add torch._dynamo.set_recursion_limit to fix 3.12/3.13 Recur…
williamwen42 Nov 21, 2025
68921ac
[dynamo][guards] Log backend match recompilation reason (#168387)
anijain2305 Nov 21, 2025
95ae5a4
[dynamo][pytree][compile time] Specialize tree_is_leaf (#168070)
anijain2305 Nov 21, 2025
a9184a0
[DTensor] update redistribute_cost, add disable_graph_based_transform…
mori360 Nov 22, 2025
4909fd8
Move CUDAEvent to c10 (#158219)
guangyey Nov 21, 2025
9c5d972
[NativeRT] Fix out_t index handling in TritonKernel (#168384)
minjang Nov 22, 2025
112a4fa
Add string support for ABI stable custom ops (#168370)
janeyx99 Nov 21, 2025
9301432
Fix lints with newer triton (#168340)
jansel Nov 21, 2025
b565593
[dynamo] Add optree.tree_map microbenchmark (#168341)
jansel Nov 21, 2025
322ad30
[Flex] Fix symbolic shapes lowering (#168383)
drisspg Nov 22, 2025
a9cb5bc
[user-streams] Move some estimator utilities outside of distributed (…
mlazos Nov 22, 2025
1048ac9
Fix exit code condition for test_nan_assert (#167971)
Flamefire Nov 22, 2025
a3cc252
[dynamo] Special case handling for tree_map (#168342)
jansel Nov 22, 2025
9fa3e6e
[BugFix] Fix incorrect type hint. (#168892)
lingebeng Nov 22, 2025
2c204e6
Revert "[Inductor XPU GEMM] Step 2/N: Move out cutlass files from tor…
pytorchmergebot Nov 23, 2025
3f0d46c
Revert "[Inductor XPU GEMM] Step 1/N: Refactor cutlass configuration.…
pytorchmergebot Nov 23, 2025
1f34961
Revert "[inductor] Use custom triton kernel subclass when available (…
pytorchmergebot Nov 23, 2025
4fd97b4
Revert "[dynamo] add torch._dynamo.set_recursion_limit to fix 3.12/3.…
pytorchmergebot Nov 23, 2025
a1ab3a0
[audio hash update] update the pinned audio hash (#168315)
pytorchupdatebot Nov 23, 2025
19c34dd
[dynamo] Special case handling for tree_map_only (#168365)
jansel Nov 22, 2025
d3f61c1
[dynamo] Fix local test failures for dynamo/test_repros.py (#168893)
jansel Nov 22, 2025
cb3754f
[DTensor] Refactor strategy/rule registration into dedicated module (…
wconstab Nov 20, 2025
c740e85
[BE] Delete `missing_vXXX_neon headers (#168909)
malfet Nov 23, 2025
c9c8a85
Add optimizer tests in operator microbenchmarks (#168101)
jainapurva Nov 23, 2025
9a38bb8
[CUDA] Fix truncated error messages in cudaMallocAsync Allocator (#16…
galv Nov 24, 2025
dbe6124
[tutorial] typo fix, update torch.compiler_cudagraph_trees.md (#167713)
luyaor Nov 24, 2025
7833690
Removed deprecated `split_cat_fx_passes` (#167738)
hinriksnaer Nov 24, 2025
c91c92f
Replace thrust::tie with structure binding (#168943)
cyyever Nov 24, 2025
265397e
Remove unnecessary uses of thrust::tuple (#168936)
cyyever Nov 24, 2025
f1c49c9
Checking if the input is finite before calculation in lowering of pow…
krastogi-in Nov 24, 2025
1aaedbc
[dynamo][hops] Add xfail tests for side effects (#168394)
anijain2305 Nov 22, 2025
5ff187d
[Intel GPU] Update Intel Triton commit pin (#166436)
anmyachev Nov 24, 2025
654c5fb
Revert "bucketing compile time improve (#168122)"
pytorchmergebot Nov 24, 2025
ecdea86
Merge remote-tracking branch 'upstream/main' into develop_IFU_20251124
github-actions[bot] Nov 24, 2025
14 changes: 2 additions & 12 deletions .ci/docker/build.sh
@@ -125,10 +125,10 @@ case "$tag" in
     UCC_COMMIT=${_UCC_COMMIT}
     TRITON=yes
     ;;
-  pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks)
+  pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-inductor-benchmarks)
     CUDA_VERSION=12.8.1
     ANACONDA_PYTHON_VERSION=3.10
-    GCC_VERSION=9
+    GCC_VERSION=11
     VISION=yes
     KATEX=yes
     UCX_COMMIT=${_UCX_COMMIT}
@@ -146,16 +146,6 @@ case "$tag" in
     UCC_COMMIT=${_UCC_COMMIT}
     TRITON=yes
     ;;
-  pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9)
-    CUDA_VERSION=12.8.1
-    ANACONDA_PYTHON_VERSION=3.10
-    GCC_VERSION=9
-    VISION=yes
-    KATEX=yes
-    UCX_COMMIT=${_UCX_COMMIT}
-    UCC_COMMIT=${_UCC_COMMIT}
-    TRITON=yes
-    ;;
   pytorch-linux-jammy-py3-clang12-onnx)
     ANACONDA_PYTHON_VERSION=3.10
     CLANG_VERSION=12
2 changes: 1 addition & 1 deletion .ci/docker/ci_commit_pins/triton-xpu.txt
@@ -1 +1 @@
-1b0418a9a454b2b93ab8d71f40e59d2297157fae
+aa01f5c2cd4db2b7bfa53ea98a1a8dfbd6d77c92
15 changes: 7 additions & 8 deletions .ci/docker/common/install_xpu.sh
@@ -64,14 +64,13 @@ function install_ubuntu() {

 function install_rhel() {
     . /etc/os-release
-    if [[ "${ID}" == "rhel" ]]; then
-        if [[ ! " 8.8 8.10 9.0 9.2 9.3 " =~ " ${VERSION_ID} " ]]; then
-            echo "RHEL version ${VERSION_ID} not supported"
-            exit
-        fi
-    elif [[ "${ID}" == "almalinux" ]]; then
-        # Workaround for almalinux8 which used by quay.io/pypa/manylinux_2_28_x86_64
-        VERSION_ID="8.8"
+    if [[ ! " 8.8 8.10 9.0 9.2 9.3 " =~ " ${VERSION_ID} " ]]; then
+        echo "RHEL version ${VERSION_ID} not supported"
+        exit
+    fi
+    # Using testing channel for CD build
+    if [[ "${ID}" == "almalinux" ]]; then
+        XPU_DRIVER_VERSION="/testing"
     fi

     dnf install -y 'dnf-command(config-manager)'
3 changes: 3 additions & 0 deletions .ci/docker/requirements-ci.txt
@@ -397,3 +397,6 @@ scikit-build==0.18.1
 pyre-extensions==0.0.32
 tabulate==0.9.0
 #Description: These package are needed to build FBGEMM and torchrec on PyTorch CI
+
+Jinja2==3.1.6
+#Description: required for torch.distributed.debug
1 change: 0 additions & 1 deletion .ci/lumen_cli/cli/lib/core/vllm/vllm_test.py
@@ -84,7 +84,6 @@ def __init__(self, args: Any):
         self.VLLM_TEST_WHLS_REGEX = [
             "xformers/*.whl",
             "vllm/vllm*.whl",
-            "flashinfer-python/flashinfer*.whl",
         ]

     def prepare(self):
6 changes: 4 additions & 2 deletions .ci/pytorch/test.sh
@@ -1763,12 +1763,14 @@ test_operator_microbenchmark() {
   mkdir -p "$TEST_REPORTS_DIR"
   TEST_DIR=$(pwd)

+  test_inductor_set_cpu_affinity
+
   cd benchmarks/operator_benchmark/pt_extension
-  python -m pip install .
+  python -m pip install . -v --no-build-isolation

   cd "${TEST_DIR}"/benchmarks/operator_benchmark

-  for OP_BENCHMARK_TESTS in matmul mm addmm bmm conv; do
+  for OP_BENCHMARK_TESTS in matmul mm addmm bmm conv optimizer; do
     $TASKSET python -m pt.${OP_BENCHMARK_TESTS}_test --tag-filter long \
       --output-json-for-dashboard "${TEST_REPORTS_DIR}/operator_microbenchmark_${OP_BENCHMARK_TESTS}_compile.json" \
       --benchmark-name "PyTorch operator microbenchmark" --use-compile
30 changes: 12 additions & 18 deletions .circleci/scripts/binary_linux_test.sh
@@ -31,23 +31,6 @@ if [[ "$PACKAGE_TYPE" != libtorch ]]; then
   export PATH="\${python_path}/bin:\$PATH"
 fi

-EXTRA_CONDA_FLAGS=""
-NUMPY_PIN=""
-PROTOBUF_PACKAGE="defaults::protobuf"
-
-if [[ "\$python_nodot" = *310* ]]; then
-  # There's an issue with conda channel priority where it'll randomly pick 1.19 over 1.20
-  # we set a lower boundary here just to be safe
-  NUMPY_PIN=">=1.21.2"
-  PROTOBUF_PACKAGE="protobuf>=3.19.0"
-fi
-
-if [[ "\$python_nodot" = *39* ]]; then
-  # There's an issue with conda channel priority where it'll randomly pick 1.19 over 1.20
-  # we set a lower boundary here just to be safe
-  NUMPY_PIN=">=1.20"
-fi
-
 # Move debug wheels out of the package dir so they don't get installed
 mkdir -p /tmp/debug_final_pkgs
 mv /final_pkgs/debug-*.zip /tmp/debug_final_pkgs || echo "no debug packages to move"
@@ -66,12 +49,23 @@ fi
 if [[ "$PACKAGE_TYPE" != libtorch ]]; then
   if [[ "\$BUILD_ENVIRONMENT" != *s390x* ]]; then
     pip install "\$pkg" --index-url "https://download.pytorch.org/whl/\${CHANNEL}/${DESIRED_CUDA}"
-    retry pip install -q numpy protobuf typing-extensions
+
+    # numpy tests:
+    # We test 1 version no numpy. 1 version with numpy 1.x and rest with numpy 2.x
+    if [[ "\$python_nodot" = *311* ]]; then
+      retry pip install -q numpy==1.23.5 protobuf typing-extensions
+    elif [[ "\$python_nodot" = *312* ]]; then
+      retry pip install -q protobuf typing-extensions
+    else
+      retry pip install -q numpy protobuf typing-extensions
+    fi
+
   else
     pip install "\$pkg"
     retry pip install -q numpy protobuf typing-extensions
   fi
 fi

 if [[ "$PACKAGE_TYPE" == libtorch ]]; then
   pkg="\$(ls /final_pkgs/*-latest.zip)"
   unzip "\$pkg" -d /tmp
2 changes: 1 addition & 1 deletion .github/ci_commit_pins/audio.txt
@@ -1 +1 @@
-ee1a1350eb37804b94334768f328144f058f14e9
+32ce8c011855adb15438ddc9bf6c139d23f8cee5
2 changes: 1 addition & 1 deletion .github/ci_commit_pins/vision.txt
@@ -1 +1 @@
-2d82dc5caa336d179d9b46ac4a0fb8c43d84c5cc
+617079d944b0e72632311c30ae2bbdf1168b901e
35 changes: 5 additions & 30 deletions .github/ci_configs/vllm/Dockerfile
@@ -1,4 +1,4 @@
-ARG CUDA_VERSION=12.8.1
+ARG CUDA_VERSION=12.9.1
 ARG PYTHON_VERSION=3.12

 # BUILD_BASE_IMAGE: used to setup python build xformers, and vllm wheels, It can be replaced with a different base image from local machine,
@@ -124,7 +124,7 @@ RUN --mount=type=cache,target=/root/.cache/uv bash - <<'BASH'
     git clone https://github.com/facebookresearch/xformers.git

     pushd xformers
-    git checkout v0.0.32.post2
+    git checkout v0.0.33.post1
     git submodule update --init --recursive
     python3 setup.py bdist_wheel --dist-dir=../xformers-dist --verbose
     popd
@@ -256,7 +256,7 @@ ENV UV_INDEX_STRATEGY="unsafe-best-match"
 # Use copy mode to avoid hardlink failures with Docker cache mounts
 ENV UV_LINK_MODE=copy

-# Install build and runtime dependencies, this is needed for flashinfer install
+# Install build and runtime dependencies
 COPY requirements/build.txt requirements/build.txt
 COPY use_existing_torch.py use_existing_torch.py
 RUN python3 use_existing_torch.py
@@ -294,33 +294,9 @@ RUN --mount=type=cache,target=/root/.cache/uv \
 RUN --mount=type=cache,target=/root/.cache/uv \
     uv pip install --system /wheels/xformers/*.whl --verbose

-# Build FlashInfer from source
-ARG torch_cuda_arch_list='8.0;8.9;9.0a;10.0a;12.0'
-ENV TORCH_CUDA_ARCH_LIST=${torch_cuda_arch_list}
-
-# TODO(elainewy): remove this once vllm commit is updated, and install flashinfer from pip
-# see https://github.com/pytorch/pytorch/pull/165274#issuecomment-3408531784
-ARG FLASHINFER_GIT_REPO="https://github.com/flashinfer-ai/flashinfer.git"
-ARG FLASHINFER_GIT_REF="v0.2.14.post1"
-
-RUN --mount=type=cache,target=/root/.cache/uv \
-    git clone --depth 1 --recursive --shallow-submodules \
-        --branch ${FLASHINFER_GIT_REF} \
-        ${FLASHINFER_GIT_REPO} flashinfer \
-    && echo "Building FlashInfer with AOT for arches: ${torch_cuda_arch_list}" \
-    && cd flashinfer \
-    && python3 -m flashinfer.aot \
-    && python3 -m build --no-isolation --wheel --outdir ../wheels/flashinfer \
-    && cd .. \
-    && rm -rf flashinfer
-
-# Install FlashInfer
-RUN --mount=type=cache,target=/root/.cache/uv \
-    uv pip install --system wheels/flashinfer/*.whl --verbose
-
 # Logging to confirm the torch versions
-RUN pip freeze | grep -E 'torch|xformers|vllm|flashinfer'
-RUN uv pip freeze | grep -i '^torch\|^torchvision\|^torchaudio\|^xformers\|^vllm\|^flashinfer' > build_summary.txt
+RUN pip freeze | grep -E 'torch|xformers|vllm'
+RUN uv pip freeze | grep -i '^torch\|^torchvision\|^torchaudio\|^xformers\|^vllm' > build_summary.txt
 ################### VLLM INSTALLED IMAGE ####################


@@ -331,4 +307,3 @@ FROM scratch as export-wheels
 COPY --from=base /workspace/xformers-dist /wheels/xformers
 COPY --from=build /workspace/vllm-dist /wheels/vllm
 COPY --from=vllm-base /workspace/build_summary.txt /wheels/build_summary.txt
-COPY --from=vllm-base /workspace/wheels/flashinfer /wheels/flashinfer-python
4 changes: 4 additions & 0 deletions .github/scripts/generate_binary_build_matrix.py
@@ -50,6 +50,7 @@

 PYTORCH_EXTRA_INSTALL_REQUIREMENTS = {
     "12.6": (
+        "cuda-bindings==12.9.4; platform_system == 'Linux' | "
         "nvidia-cuda-nvrtc-cu12==12.6.77; platform_system == 'Linux' | "
         "nvidia-cuda-runtime-cu12==12.6.77; platform_system == 'Linux' | "
         "nvidia-cuda-cupti-cu12==12.6.80; platform_system == 'Linux' | "
@@ -67,6 +68,7 @@
         "nvidia-cufile-cu12==1.11.1.6; platform_system == 'Linux'"
     ),
     "12.8": (
+        "cuda-bindings==12.9.4; platform_system == 'Linux' | "
         "nvidia-cuda-nvrtc-cu12==12.8.93; platform_system == 'Linux' | "
         "nvidia-cuda-runtime-cu12==12.8.90; platform_system == 'Linux' | "
         "nvidia-cuda-cupti-cu12==12.8.90; platform_system == 'Linux' | "
@@ -84,6 +86,7 @@
         "nvidia-cufile-cu12==1.13.1.3; platform_system == 'Linux'"
     ),
     "12.9": (
+        "cuda-bindings==12.9.4; platform_system == 'Linux' | "
         "nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' | "
         "nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' | "
         "nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' | "
@@ -101,6 +104,7 @@
         "nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux'"
     ),
     "13.0": (
+        "cuda-bindings==13.0.3; platform_system == 'Linux' | "
         "nvidia-cuda-nvrtc==13.0.88; platform_system == 'Linux' | "
         "nvidia-cuda-runtime==13.0.96; platform_system == 'Linux' | "
         "nvidia-cuda-cupti==13.0.85; platform_system == 'Linux' | "
2 changes: 1 addition & 1 deletion .github/scripts/prepare_vllm_wheels.sh
@@ -88,7 +88,7 @@ repackage_wheel() {
 ${PYTHON_EXECUTABLE} -mpip install wheel==0.45.1

 pushd externals/vllm/wheels
-for package in xformers flashinfer-python vllm; do
+for package in xformers vllm; do
   repackage_wheel $package
 done
 popd
5 changes: 4 additions & 1 deletion .github/workflows/_linux-test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -327,6 +327,7 @@ jobs:
SCCACHE_REGION: ${{ !contains(matrix.runner, 'b200') && 'us-east-1' || '' }}
SHM_SIZE: ${{ contains(inputs.build-environment, 'cuda') && '2g' || '1g' }}
DOCKER_IMAGE: ${{ steps.calculate-docker-image.outputs.docker-image }}
DOCKER_IMAGE_S390X: ${{ inputs.docker-image }}
XLA_CUDA: ${{ contains(inputs.build-environment, 'xla') && '0' || '' }}
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
PYTORCH_TEST_CUDA_MEM_LEAK_CHECK: ${{ matrix.mem_leak_check && '1' || '0' }}
@@ -360,10 +361,12 @@
# if for some reason cleanup action doesn't stop container
# when job is cancelled
DOCKER_SHELL_CMD="sleep 12h"
USED_IMAGE="${DOCKER_IMAGE_S390X}"
else
SHM_OPTS="--shm-size=${SHM_SIZE}"
JENKINS_USER="--user jenkins"
DOCKER_SHELL_CMD=
USED_IMAGE="${DOCKER_IMAGE}"
fi

# detached container should get cleaned up by teardown_ec2_linux
@@ -426,7 +429,7 @@
${JENKINS_USER} \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}" \
"${USED_IMAGE}" \
${DOCKER_SHELL_CMD}
)
echo "DOCKER_CONTAINER_ID=${container_name}" >> "${GITHUB_ENV}"
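The hunks above make the workflow launch the container from `USED_IMAGE`, which is selected per architecture: s390x runners use the caller-supplied image and a long-lived `sleep`, other runners use the calculated image. A sketch of that branch with placeholder image names (not the workflow's real values):

```shell
#!/usr/bin/env bash
# Placeholder values; the real workflow fills these from step outputs/inputs.
DOCKER_IMAGE="registry.example/ci-image:calculated"   # hypothetical
DOCKER_IMAGE_S390X="registry.example/ci-image:s390x"  # hypothetical
RUNNER_ARCH="s390x"

if [ "$RUNNER_ARCH" = "s390x" ]; then
  # Long sleep so the container survives until teardown even if the job
  # is cancelled before the cleanup action runs.
  DOCKER_SHELL_CMD="sleep 12h"
  USED_IMAGE="$DOCKER_IMAGE_S390X"
else
  SHM_OPTS="--shm-size=2g"
  JENKINS_USER="--user jenkins"
  DOCKER_SHELL_CMD=""
  USED_IMAGE="$DOCKER_IMAGE"
fi

echo "$USED_IMAGE"
```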
8 changes: 4 additions & 4 deletions .github/workflows/attention_op_microbenchmark.yml
@@ -23,7 +23,7 @@ jobs:
uses: ./.github/workflows/_linux-build.yml
with:
runner: linux.12xlarge.memory
build-environment: linux-jammy-cuda12.8-py3.10-gcc9-sm80
build-environment: linux-jammy-cuda12.8-py3.10-gcc11-sm80
docker-image-name: ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11
cuda-arch-list: '8.0 9.0'
test-matrix: |
@@ -39,7 +39,7 @@
needs: attn-microbenchmark-build
with:
timeout-minutes: 500
build-environment: linux-jammy-cuda12.8-py3.10-gcc9-sm80
build-environment: linux-jammy-cuda12.8-py3.10-gcc11-sm80
docker-image: ${{ needs.attn-microbenchmark-build.outputs.docker-image }}
test-matrix: ${{ needs.attn-microbenchmark-build.outputs.test-matrix }}
secrets: inherit
@@ -51,7 +51,7 @@
uses: ./.github/workflows/_linux-build.yml
with:
runner: linux.12xlarge.memory
build-environment: linux-jammy-cuda12.8-py3.10-gcc9-sm100
build-environment: linux-jammy-cuda12.8-py3.10-gcc11-sm100
docker-image-name: ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11
cuda-arch-list: '10.0'
test-matrix: |
@@ -66,7 +66,7 @@
needs: opmicrobenchmark-build-b200
with:
timeout-minutes: 500
build-environment: linux-jammy-cuda12.8-py3.10-gcc9-sm100
build-environment: linux-jammy-cuda12.8-py3.10-gcc11-sm100
docker-image: ${{ needs.opmicrobenchmark-build-b200.outputs.docker-image }}
test-matrix: ${{ needs.opmicrobenchmark-build-b200.outputs.test-matrix }}
aws-role-to-assume: arn:aws:iam::308535385114:role/gha_workflow_s3_and_ecr_read_only
2 changes: 1 addition & 1 deletion .github/workflows/b200-distributed.yml
@@ -37,7 +37,7 @@ jobs:
needs: get-label-type
with:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
runner: linux.12xlarge.memory
runner: linux.r7i.4xlarge
build-environment: linux-jammy-cuda12.8-py3.10-gcc11-distributed-b200
docker-image-name: ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11
cuda-arch-list: '10.0'
2 changes: 1 addition & 1 deletion .github/workflows/b200-symm-mem.yml
@@ -37,7 +37,7 @@ jobs:
needs: get-label-type
with:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
runner: linux.12xlarge.memory
runner: linux.r7i.4xlarge
build-environment: linux-jammy-cuda12.8-py3.10-gcc11-sm100-symm
docker-image-name: ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11
cuda-arch-list: '10.0'
6 changes: 3 additions & 3 deletions .github/workflows/docker-builds.yml
@@ -52,8 +52,7 @@ jobs:
pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11,
pytorch-linux-jammy-cuda13.0-cudnn9-py3-gcc11,
pytorch-linux-jammy-cuda12.8-cudnn9-py3.12-gcc11-vllm,
pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks,
pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9,
pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-inductor-benchmarks,
pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11,
pytorch-linux-jammy-py3.10-clang12,
pytorch-linux-jammy-py3.11-clang12,
@@ -75,7 +74,8 @@
pytorch-linux-jammy-py3-clang12-onnx,
pytorch-linux-jammy-linter,
pytorch-linux-jammy-cuda12.8-cudnn9-py3.10-linter,
pytorch-linux-jammy-py3-clang12-executorch,
# TODO: Re-enable me when docker pin update happens
# pytorch-linux-jammy-py3-clang12-executorch,
pytorch-linux-jammy-py3.12-triton-cpu,
pytorch-linux-noble-riscv64-py3.12-gcc14
]