[AUTOGENERATED] develop_IFU_20260211 by pragupta · Pull Request #2969 · ROCm/pytorch

pragupta · 2026-02-11T16:10:38Z

rocm_base: fe101ec

Implements ONNX export for `torch.ops.higher_order.invoke_subgraph`, which is created by `torch.compiler.nested_compile_region`. Actual function preservation needs an update in the onnxscript optimizer and version converter to prevent inlining.

## Example

```python
import torch


class Model(torch.nn.Module):
    def forward(self, x, y):
        def inner_fn(a, b):
            return torch.mul(a, b) + a

        # Function preserved as separate entity in ONNX graph, not inlined (when onnxscript is updated)
        return torch.compiler.nested_compile_region(inner_fn)(x, y)


# x and y are example input tensors supplied by the caller.
onnx_program = torch.onnx.export(Model(), (x, y), dynamo=True)
```

Replaces pytorch#172715 Fixes pytorch#172459 Pull Request resolved: pytorch#174283 Approved by: https://github.com/titaiwangms Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>

On riscv64, installing lintrunner 0.12.7 from sdist fails because its build dependency `maturin<0.13` cannot be installed: `pip install "maturin>=0.12,<0.13"` fails with `BackendUnavailable: Cannot import 'setuptools.build_meta'`. This can also be reproduced on x86. Upgrading to maturin >= 1.0 (as done in lintrunner 0.12.11) resolves the issue. Pull Request resolved: pytorch#173658 Approved by: https://github.com/malfet

…173558)

**Context**

Previously, list / dict comprehensions were treated as a function call and would add a new frame to the Python stack. As a result, if there was a graph break in the comprehension, Dynamo would only skip tracing the comprehension code. In Python 3.12, comprehensions are inlined into their surrounding function, so when we graph break, the entire function is skipped. This PR handles list / dict comprehensions in Dynamo by only skipping tracing for the bytecode related to the comprehension.

References
- PEP 709: https://peps.python.org/pep-0709/

**Solution**

1. Ops BUILD_LIST and BUILD_MAP are always at the beginning of list or dict comprehensions, respectively. When processing these ops in Dynamo, we check if the preceding instructions indicate a comprehension. This is done by `_is_comprehension_start`. `_is_comprehension_start` dynamically retrieves the bytecode prefix for a comprehension using `get_comprehension_bytecode_prefix`. `get_comprehension_bytecode_prefix` builds a dummy list comprehension and gets the associated instruction opnames.
2. If we identify that we are in a comprehension, we check if we can speculate and if we are not in a nested comprehension. If these checks pass and speculation has not failed, we set a checkpoint via speculation.
3. If a graph break is triggered, we handle it in the normal way by restarting tracing. Once we reach the checkpoint set in BUILD_LIST / BUILD_MAP, we handle the graph break in `_handle_comprehension_graph_break`.
4. At a high level, this function compiles the graph up to the comprehension, adds the comprehension bytecode to be run eagerly, generates code to load any locals created in the comprehension, and creates a resume function for code after the comprehension.
5. Handling the comprehension graph break involves analysis of the bytecode to determine the instruction that ends the comprehension bytecode, the result variable (if there is one), whether the result should stay on the stack, what happens to the result, iterator variables that need to be restored, other locals produced / modified in the comprehension, and vars read from the outer scope. To help with this, we create the dataclass `ComprehensionAnalysis`, which is returned by `_analyze_comprehension`. This function also dynamically retrieves example bytecode sequences during analysis, ensuring that it is resilient to bytecode changes across Python versions.
6. Finally, we resume tracing as usual.

**Edge Cases Handled**

1. Multiple comprehensions with a graph break in only one
2. Multiple comprehensions with graph breaks in all
3. Comprehension that calls a function that produces a graph break
4. Nested comprehensions with graph breaks
5. Comprehensions with multiple iterators
6. Comprehensions discarded without usage (as opposed to being assigned to a variable)
7. Comprehensions that are used in an expression before being stored to a variable
8. Comprehensions that are directly returned
9. One or more walrus operators (creating side effects) in a comprehension
10. Side effects nested in comprehensions
11. Comprehensions that mutate or read outer variables
12. Comprehensions that mutate or read global variables
13. Comprehensions that modify closure variables
14. List and dict comprehensions together

**Edge Cases Unimplemented**

1. Comprehension graph break in resume function with captured variables (e.g. test_torch.py::TestTorchDeviceTypeCPU::test_cauchy_kstest_cpu_bfloat16)
2. Comprehension with captured tensor not in local slot (e.g. test_autograd.py::TestAutograd::test_pickle)

**Test Cases**

New test cases are added in test_comprehensions.py. These cases test for the production of the correct number of graphs and the correct number of specific operators in each graph.

**Misc Notes**

1. One extension of this system is to skip tracing for arbitrary sequences of bytecode such as in loops, try blocks, generic context managers, etc. This code is currently highly specific to comprehensions and would need significant refactoring for this purpose.

**Next Steps**

1. Add support for torch._dynamo.config.nested_graph_breaks=True. In the current implementation, we fall back to skipping the entire frame when nested_graph_breaks=True. As a follow-up, we would like to have this functionality supported.
2. Add support for set comprehensions. We currently only support list and dict comprehensions.

Fixes pytorch#171822 Pull Request resolved: pytorch#173558 Approved by: https://github.com/williamwen42
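The PEP 709 inlining that motivates this change can be observed with the standard `dis` module. The sketch below is not Dynamo's implementation: `comprehension_is_inlined` and `opnames_before_build_list` are hypothetical helpers that only illustrate the idea of building a dummy comprehension and reading the opnames that precede `BUILD_LIST`.

```python
import dis
import sys
import types


def dummy(xs):
    # A throwaway list comprehension whose bytecode we inspect.
    return [x * 2 for x in xs]


def comprehension_is_inlined(fn):
    """True when fn carries no nested code object for its comprehension."""
    return not any(isinstance(c, types.CodeType) for c in fn.__code__.co_consts)


def opnames_before_build_list(fn):
    """Opnames preceding the first BUILD_LIST in fn, or None if there is none."""
    names = [ins.opname for ins in dis.get_instructions(fn)]
    if "BUILD_LIST" not in names:
        # Before Python 3.12 the BUILD_LIST lives in a separate <listcomp> code object.
        return None
    return names[: names.index("BUILD_LIST")]


inlined = comprehension_is_inlined(dummy)
prefix = opnames_before_build_list(dummy)
print(sys.version_info[:2], "inlined:", inlined, "prefix:", prefix)
```

Reading the prefix from a freshly compiled dummy function, rather than hard-coding opnames, is what keeps this kind of check tolerant of bytecode changes between Python versions.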

Also updated test logic, as OpSchema was replaced by OpSignature for onnx functions. Required for pytorch#165083 Pull Request resolved: pytorch#173828 Approved by: https://github.com/titaiwangms, https://github.com/malfet

Pull Request resolved: pytorch#174213 Approved by: https://github.com/liangel-02

Pull Request resolved: pytorch#174214 Approved by: https://github.com/liangel-02 ghstack dependencies: pytorch#174213

Pull Request resolved: pytorch#174215 Approved by: https://github.com/liangel-02 ghstack dependencies: pytorch#174213, pytorch#174214

Pull Request resolved: pytorch#174216 Approved by: https://github.com/liangel-02 ghstack dependencies: pytorch#174213, pytorch#174214, pytorch#174215

Pull Request resolved: pytorch#174217 Approved by: https://github.com/bdhirsh, https://github.com/atalman ghstack dependencies: pytorch#174213, pytorch#174214, pytorch#174215, pytorch#174216

…els (pytorch#174316) Pull Request resolved: pytorch#174316 Approved by: https://github.com/albanD

…t a security issues (pytorch#174318) Pull Request resolved: pytorch#174318 Approved by: https://github.com/albanD ghstack dependencies: pytorch#174316

Add a test for pytorch#158029 Pull Request resolved: pytorch#174225 Approved by: https://github.com/eqy, https://github.com/galv, https://github.com/eellison, https://github.com/BoyuanFeng

…172160) The `same_meta` function was missing checks for `is_conj()` and `is_neg()` tensor flags. This caused `remove_noop_ops` to incorrectly remove `clone` operations that were resolving conjugation (from `resolve_conj()`).

When complex convolution is compiled, the C++ implementation calls `resolve_conj()` before `view_as_real()`. The `resolve_conj()` traces to a `clone` operation. Without the conjugate bit check, this clone was being removed as a "no-op", causing `view_as_real` to be called on a still-conjugated tensor, which fails with: "view_as_real doesn't work on unresolved conjugated tensors"

Added regression tests:
- test_complex_real_imag_conj: tests real/imag extraction from conj tensors
- test_complex_conv2d_conj: tests complex convolution with conj inputs

Fixes pytorch#171665 Pull Request resolved: pytorch#172160 Approved by: https://github.com/eellison

Fixes pytorch#134173

NOTE: Uncommenting the following

https://github.com/pytorch/pytorch/blob/d8039170f00cf084e4af91f1db84497bfccdf149/test/inductor/test_compiled_autograd.py#L5215
https://github.com/pytorch/pytorch/blob/d8039170f00cf084e4af91f1db84497bfccdf149/test/inductor/test_compiled_autograd.py#L5313

and running `python test/inductor/test_compiled_autograd.py TestAutogradWithCompiledAutograd.test_graph_save_on_cpu` fails for a different reason:

```
torch._dynamo.exc.Unsupported: Attempted to call function marked as skipped
  Explanation: Dynamo developers have intentionally marked that the function `save_on_cpu.__init__.<locals>.unpack_from_cpu` in file `/opt/pytorch/pytorch/torch/autograd/graph.py` should not be traced.
  Hint: Avoid calling the function `save_on_cpu.__init__.<locals>.unpack_from_cpu`.
  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `save_on_cpu.__init__.<locals>.unpack_from_cpu` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.
  Hint: Please file an issue to PyTorch.
```

Pull Request resolved: pytorch#172578 Approved by: https://github.com/ezyang

Changes:
- Add `launch_pdl: True` to combo kernel triton_meta when PDL is enabled
- Fix missing `shape=()` parameter in `_handle_pdl_after_load()`
- Add tests for PDL + combo kernel integration

See example kernel: https://gist.github.com/eellison/50fea54d1096b0ece3c97f6e8ee02d5b

Written with Claude.

Pull Request resolved: pytorch#174232 Approved by: https://github.com/karthickai, https://github.com/v0i0

# Motivation

Move EmptyTensor to PyTorch for better maintenance.

# Additional Context

The pin commit intel/torch-xpu-ops@83c9813 is from a viable strict [branch](https://github.com/intel/torch-xpu-ops/commits/viable/strict/). The flow is to first land this PR, then land intel/torch-xpu-ops#2836, and finally update the pin commit from the main branch.

Pull Request resolved: pytorch#174194 Approved by: https://github.com/EikanWang

Fixes pytorch#173995 Pull Request resolved: pytorch#174009 Approved by: https://github.com/malfet, https://github.com/ngimel, https://github.com/albanD

lintrunner now provides official riscv64 wheels from 0.13.0, so it can be safely enabled on riscv64 Pull Request resolved: pytorch#173993 Approved by: https://github.com/Skylion007, https://github.com/cyyever

… for external template buffers (pytorch#174148) Design doc: pytorch/helion#1346. Add two extension points for external template buffers (e.g. Helion kernel): - `codegen_template_override()` in SIMDKernel - allows custom template code generation - `emit_kernel_override()` in Kernel - allows custom kernel emission to wrapper These hooks enable external template buffers to integrate with Inductor's template fusion without modifying core Inductor code. After this PR, we will add Helion dynamo variable and HOP handling in pytorch/helion#1351. Pull Request resolved: pytorch#174148 Approved by: https://github.com/jansel

…#174077) Addresses the TODO in `test_local_tensor.py` by adding view ops testing for LocalTensor Pull Request resolved: pytorch#174077 Approved by: https://github.com/dzmitry-huba

Fixes pytorch#166387

Since PyTorch moved to the [new API](https://docs.pytorch.org/docs/main/notes/cuda.html#tensorfloat-32-tf32-on-ampere-and-later-devices) for TF32, many libraries, such as transformers, have started using it. Inductor still reads the legacy `allow_tf32` flag, so when the flag is set through the new API and then read through the old one, we see an error like:

> ERROR: PyTorch is checking whether allow_tf32_new is enabled for cuBlas matmul,Current status indicate that you have used mix of the legacy and new APIs to set the TF32 status for cublas matmul. We suggest only using the new API to set the TF32 flag. See also: https://pytorch.org/docs/main/notes/cuda.html#tensorfloat-32-tf32-on-ampere-and-later-devices

This PR addresses the problem by using the new API in Inductor. So far I have only changed the code paths involved in the PyTorch issue linked above and in transformers; the traceback is in this [comment](huggingface/transformers#42371 (comment)) on huggingface/transformers#42371. Can I start changing more `allow_tf32` uses in Inductor?

Pull Request resolved: pytorch#173731 Approved by: https://github.com/jansel, https://github.com/isuruf Co-authored-by: Isuru Fernando <isuruf@gmail.com>

This fixes pytorch#173879 by using the proposed formula and indicating the shape of the result. Examples showing that the shape indication is correct:

```python
>>> import torch
>>> a = torch.randn(2, 3, 4, 5, 6)
>>> b = torch.randn(5, 6, 7)
>>> torch.tensordot(a, b, dims=2).shape
torch.Size([2, 3, 4, 7])
>>> a.shape[:-2]
torch.Size([2, 3, 4])
>>> b.shape[2:]
torch.Size([7])
>>> a = torch.randn(2, 3, 4)
>>> b = torch.randn(2, 3, 4, 5, 6)
>>> torch.tensordot(a, b, dims=3).shape
torch.Size([5, 6])
>>> a.shape[:-3]
torch.Size([])
>>> b.shape[3:]
torch.Size([5, 6])
```

Pull Request resolved: pytorch#173893 Approved by: https://github.com/mikaylagawarecki
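The shape rule those examples check can also be written without any tensors. This is a minimal sketch for an integer `dims` only; `tensordot_result_shape` is a hypothetical helper, not part of PyTorch.

```python
def tensordot_result_shape(a_shape, b_shape, dims):
    """Result shape of tensordot(a, b, dims=dims) for an integer dims."""
    a_shape, b_shape = tuple(a_shape), tuple(b_shape)
    if dims < 0 or dims > min(len(a_shape), len(b_shape)):
        raise ValueError("dims must be between 0 and the smaller rank")
    keep_a = len(a_shape) - dims
    # The last `dims` axes of a are contracted with the first `dims` axes of b.
    if a_shape[keep_a:] != b_shape[:dims]:
        raise ValueError("contracted dimensions do not match")
    # Slicing with keep_a (instead of -dims) also handles dims == 0 correctly.
    return a_shape[:keep_a] + b_shape[dims:]


print(tensordot_result_shape((2, 3, 4, 5, 6), (5, 6, 7), 2))
print(tensordot_result_shape((2, 3, 4), (2, 3, 4, 5, 6), 3))
```

The two calls reuse the shapes from the examples above, so their results can be compared directly with the `torch.Size` values shown there.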

As the title Pull Request resolved: pytorch#173199 Approved by: https://github.com/atalman, https://github.com/malfet

Use torch._check instead of direct comparison in squareCheckInputs to defer validation to runtime for unbacked symbolic dimensions. Also use sym_min/sym_max in linalg_lu_factor_ex_meta and make_contiguous_strides_for to handle symbolic dimensions properly.

This enables the following 18 ops to work with unbacked symbolic dimensions:
- cholesky_inverse
- linalg.cholesky, linalg.cholesky_ex
- linalg.det, linalg.slogdet
- linalg.eig, linalg.eigh, linalg.eigvals, linalg.eigvalsh
- linalg.inv, linalg.inv_ex
- linalg.ldl_factor, linalg.ldl_factor_ex
- linalg.lu_factor, linalg.lu_factor_ex
- lu, triangular_solve
- matrix_exp

Pull Request resolved: pytorch#173399 Approved by: https://github.com/aorenste

…ests (pytorch#171625) Otherwise some paddings seem to fail pattern-match Pull Request resolved: pytorch#171625 Approved by: https://github.com/ngimel, https://github.com/eellison

…k_size (pytorch#174285)

# Motivation

Fix pytorch#174268 introduced by pytorch#171671, which breaks XPU CI.

Pull Request resolved: pytorch#174285 Approved by: https://github.com/desertfire, https://github.com/jansel

… inputs (pytorch#174334) When caching an AOTAutograd entry for a model where an output is a view of an input with dynamic shapes, pickle fails because view_meta_sequence contains SymInt references that create a chain to unpicklable objects (WeakValueDictionary). The fix clears view_meta_sequence in make_runtime_safe() when it has symbolic inputs. This is safe because gen_alias_from_base() already skips view replay for symbolic inputs and falls back to as_strided(). This PR was authored with Claude. Fixes: pytorch#174299 Pull Request resolved: pytorch#174334 Approved by: https://github.com/aorenste
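The failure mode and the fix can be reproduced in miniature with plain Python objects. The sketch below is an illustration under stated assumptions, not the AOTAutograd code: the entry is modelled as a dict, the `make_runtime_safe` function here is a hypothetical stand-in for the method of the same name, and a bare `WeakValueDictionary` stands in for the SymInt reference chain.

```python
import pickle
import weakref


def is_picklable(obj):
    """Report whether pickle can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


def make_runtime_safe(entry):
    """Return a copy of entry with view replay metadata dropped when symbolic."""
    safe = dict(entry)
    if safe["has_symbolic_inputs"]:
        # Assumed safe to drop: the source states that view replay is skipped
        # for symbolic inputs anyway, in favor of as_strided().
        safe["view_meta_sequence"] = None
    return safe


# Stand-in for a view_meta_sequence that reaches an unpicklable object.
entry = {
    "name": "out0",
    "has_symbolic_inputs": True,
    "view_meta_sequence": [weakref.WeakValueDictionary()],
}

print("picklable before:", is_picklable(entry))
print("picklable after: ", is_picklable(make_runtime_safe(entry)))
```

Clearing the field only when symbolic inputs are present keeps view replay available for the static-shape case.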

Reduced Dynamo compile time from 14.71 seconds to 13.896 seconds.

1) Cache only on the Source object, which makes lookup faster.
2) Extend the variable tracker cache to lazy variable trackers. Earlier, we were creating duplicate copies of the VT and unnecessarily calling the `__call__` method of LazyVT to construct the variable tracker many times. Now the cache just returns the cached lazy VT, and if it is realized, we just use the realized VT.

Pull Request resolved: pytorch#174242 Approved by: https://github.com/Lucaskabela, https://github.com/williamwen42
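The caching pattern described here can be sketched with ordinary Python classes. `LazyTracker` and `TrackerCache` are hypothetical names for illustration, and the string key stands in for a Source object; none of this is Dynamo's actual code.

```python
class LazyTracker:
    """Hypothetical lazy wrapper that builds its value on first realize()."""

    def __init__(self, builder):
        self._builder = builder
        self._value = None
        self.realized = False

    def realize(self):
        if not self.realized:
            self._value = self._builder()
            self.realized = True
        return self._value


class TrackerCache:
    """Cache keyed only by source, holding one lazy tracker per source."""

    def __init__(self):
        self._by_source = {}
        self.builds = 0  # counts how often a tracker was actually constructed

    def lookup(self, source, builder):
        lazy = self._by_source.get(source)
        if lazy is None:
            def counted_builder():
                self.builds += 1
                return builder()

            lazy = LazyTracker(counted_builder)
            self._by_source[source] = lazy
        # Hand back the realized tracker when available, else the shared lazy one.
        return lazy.realize() if lazy.realized else lazy


cache = TrackerCache()
first = cache.lookup("L['x']", lambda: {"kind": "tensor"})
second = cache.lookup("L['x']", lambda: {"kind": "tensor"})
value = first.realize()
third = cache.lookup("L['x']", lambda: {"kind": "tensor"})
print(first is second, third is value, cache.builds)
```

Because both lookups before realization return the same lazy object, the builder runs at most once per source.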

Fixes pytorch#174296. Pull Request resolved: pytorch#174300 Approved by: https://github.com/atalman, https://github.com/malfet

Differential Revision: D92291036 Pull Request resolved: pytorch#174302 Approved by: https://github.com/zhxchen17

Purely claude-coded using metal-kernel writing skill.

Performance comparison collected using `python test/bench_mps_ops.py grid_sampler_2d`

| Benchmark | MPSGraph (us) | Metal Shader (us) |
|---|---|---|
| grid_sample-bilinear-64x64 (torch.float16) | 114.7 | 120.0 |
| grid_sample-bilinear-128x128 (torch.float16) | 180.4 | 151.1 |
| grid_sample-bilinear-256x256 (torch.float16) | 423.5 | 364.9 |
| grid_sample-bilinear-512x512 (torch.float16) | 2393.1 | 1145.3 |
| grid_sample-nearest-64x64 (torch.float16) | 107.7 | 112.3 |
| grid_sample-nearest-128x128 (torch.float16) | 131.6 | 124.3 |
| grid_sample-nearest-256x256 (torch.float16) | 215.1 | 204.2 |
| grid_sample-nearest-512x512 (torch.float16) | 1089.2 | 565.0 |
| grid_sample-bilinear-64x64 (torch.float32) | 117.4 | 139.5 |
| grid_sample-bilinear-128x128 (torch.float32) | 165.4 | 188.9 |
| grid_sample-bilinear-256x256 (torch.float32) | 462.0 | 398.8 |
| grid_sample-bilinear-512x512 (torch.float32) | 4311.3 | 1483.5 |
| grid_sample-nearest-64x64 (torch.float32) | 113.6 | 100.3 |
| grid_sample-nearest-128x128 (torch.float32) | 134.6 | 122.1 |
| grid_sample-nearest-256x256 (torch.float32) | 263.4 | 208.6 |
| grid_sample-nearest-512x512 (torch.float32) | 2289.0 | 896.6 |
| grid_sample-bilinear-64x64 (torch.bfloat16) | 114.3 | 132.9 |
| grid_sample-bilinear-128x128 (torch.bfloat16) | 152.4 | 182.5 |
| grid_sample-bilinear-256x256 (torch.bfloat16) | 343.4 | 369.3 |
| grid_sample-bilinear-512x512 (torch.bfloat16) | 2333.9 | 1155.2 |
| grid_sample-nearest-64x64 (torch.bfloat16) | 107.5 | 106.1 |
| grid_sample-nearest-128x128 (torch.bfloat16) | 130.4 | 114.0 |
| grid_sample-nearest-256x256 (torch.bfloat16) | 211.9 | 190.3 |
| grid_sample-nearest-512x512 (torch.bfloat16) | 795.9 | 540.7 |

TODOs:
- Code sharing for interpolation mode between upsample and grid-sampler

Fixes pytorch#174339 and pytorch#125098 Pull Request resolved: pytorch#174343 Approved by: https://github.com/manuelcandales ghstack dependencies: pytorch#174676, pytorch#174677, pytorch#174678

Pull Request resolved: pytorch#174606 Approved by: https://github.com/jansel

As the title. Pull Request resolved: pytorch#174059 Approved by: https://github.com/EikanWang, https://github.com/albanD

Pull Request resolved: pytorch#173481 Approved by: https://github.com/soulitzer, https://github.com/fegin

Add the XPU_DRIVER activity to the profiler so that it reports XPU L0 driver activities. It is the counterpart to the CUDA_DRIVER activity. Updates the third_party/kineto submodule and adds a test. Pull Request resolved: pytorch#172940 Approved by: https://github.com/guangyey, https://github.com/sraikund16

As the title suggests, for better documentation. Pull Request resolved: pytorch#174453 Approved by: https://github.com/EikanWang

…ch#174705) This is to fix the CI: https://github.com/pytorch/pytorch/actions/runs/21844168344/job/63036440860?pr=174628 Pull Request resolved: pytorch#174705 Approved by: https://github.com/oulgen

fix pytorch#168329 Pull Request resolved: pytorch#174675 Approved by: https://github.com/Skylion007

More optimizations will follow. This one is simple: if we are evaluating `a + b + c + ... > 0` and all terms are symbols or constants whose value range is > 0, then return true before calling into the expensive static evaluator.

**Results:** export time 5m4.868s -> 3m4.165s (two minutes saved)

Pull Request resolved: pytorch#174615 Approved by: https://github.com/Lucaskabela
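The fast path amounts to a cheap sufficient check that runs before the general evaluator. The sketch below assumes terms are either numeric constants or symbol names with known lower bounds; `sum_is_trivially_positive` and `evaluate_gt_zero` are hypothetical names, and the real code operates on symbolic expressions rather than Python lists.

```python
def sum_is_trivially_positive(terms, lower_bounds):
    """Return True if every term is provably > 0, else None (unknown)."""
    if not terms:
        return None
    for term in terms:
        if isinstance(term, (int, float)):
            lower = term
        else:
            lower = lower_bounds.get(term)  # None when the symbol has no known range
        if lower is None or lower <= 0:
            return None  # cannot decide cheaply; defer to the slow path
    return True


def evaluate_gt_zero(terms, lower_bounds, slow_evaluator):
    """Try the cheap check first and fall back to the expensive evaluator."""
    fast = sum_is_trivially_positive(terms, lower_bounds)
    if fast is not None:
        return fast
    return slow_evaluator(terms)


calls = []


def slow(terms):
    # Stand-in for the expensive static evaluator; records that it was used.
    calls.append(tuple(terms))
    return False


bounds = {"s0": 1, "s1": 2, "u0": 0}
print(evaluate_gt_zero(["s0", "s1", 3], bounds, slow), len(calls))
print(evaluate_gt_zero(["s0", "u0"], bounds, slow), len(calls))
```

Returning `None` rather than `False` for the undecided case matters: a term with lower bound 0 does not make the sum non-positive, it only means the shortcut cannot answer.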

…4610) There is an interesting use case I need to call out here: FlexAttention BlockMask's pytree registration contains an arbitrary user-defined mask_mod function. This becomes problematic when we export via dynamo_graph_capture_for_export, because we re-run the model code multiple times and the output bytecode contains logic to reconstruct the user-defined mask_mod. This doesn't work with aot_export's pytree thunkify logic, as it would receive a spec that has a different id for the mask_mod (because we reconstructed it multiple times). This was not a problem for torch.compile because we always just re-run the inner graph module without input/output processing. I think this is a result of our independent APIs working correctly while the integration point between them (torch IR API + aot_autograd) is a little awkward.

The way we fix it is to wrap the user-defined function in a _MaskMod wrapper that does value-based checking instead of identity checking, so that two different reconstructions of mask_mod still compare equal. I had to special-case _MaskMod for the old export path since torch.export.export is still on the _dynamo_graph_capture_for_export path.

Pull Request resolved: pytorch#174610 Approved by: https://github.com/zhxchen17, https://github.com/drisspg
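Value-based equality for reconstructed functions can be illustrated with a small wrapper. The source does not say what `_MaskMod` compares, so the sketch below is an assumption: `ValueComparedFn` is a hypothetical class that treats two functions as equal when they share a code object and have equal closure values.

```python
class ValueComparedFn:
    """Hypothetical wrapper comparing functions by code and closure values."""

    def __init__(self, fn):
        self.fn = fn

    def _key(self):
        closure = self.fn.__closure__ or ()
        return (self.fn.__code__, tuple(cell.cell_contents for cell in closure))

    def __eq__(self, other):
        if not isinstance(other, ValueComparedFn):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Hash on the code object only, so unhashable closure values are allowed.
        return hash(self.fn.__code__)

    def __call__(self, *args, **kwargs):
        return self.fn(*args, **kwargs)


def make_window_mask_mod(window):
    # Each call creates a new function object, like a bytecode reconstruction.
    def mask_mod(b, h, q_idx, kv_idx):
        return q_idx >= kv_idx and q_idx - kv_idx <= window

    return mask_mod


a = make_window_mask_mod(4)
b = make_window_mask_mod(4)
print(a is b, a == b, ValueComparedFn(a) == ValueComparedFn(b))
```

Plain function objects compare by identity, which is why the unwrapped comparison differs from the wrapped one. Closure values that do not support a boolean `==` (tensors, for example) would need extra handling that this sketch omits.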

fix hipify import Differential Revision: D92366141 Pull Request resolved: pytorch#174706 Approved by: https://github.com/drisspg, https://github.com/shunting314, https://github.com/mlazos

Update the torch-xpu-ops commit to [intel/torch-xpu-ops@077a6c](intel/torch-xpu-ops@077a6ce), which includes:
- Adjust layer_norm_backward_kernel interface to match that of PyTorch
- Fix incorrect Tensor Size for NestedTensor QKV Transform
- Support calling oneCCL AllToAll API directly
- Add NaN input checks to prevent false singular matrix errors in oneMKL linear algebra operations

Pull Request resolved: pytorch#174591 Approved by: https://github.com/EikanWang

Pull Request resolved: pytorch#174466 Approved by: https://github.com/Skylion007, https://github.com/zpcore, https://github.com/wconstab

The recursion limit has to be unset before exiting `subTest`, or it may fail inside of pytest due to the low limit set in the test. Pull Request resolved: pytorch#174693 Approved by: https://github.com/Lucaskabela, https://github.com/Skylion007
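One way to guarantee that ordering is a context manager nested inside `subTest`, so the limit is restored before the test framework's own exit code runs. This is a sketch of the general pattern, not the change made in the PR; `temporary_recursion_limit` and the example test are hypothetical.

```python
import contextlib
import sys
import unittest


@contextlib.contextmanager
def temporary_recursion_limit(limit):
    """Set the recursion limit for the block and always restore the old one."""
    previous = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        yield
    finally:
        # Restore first, so framework code that runs afterwards (subTest exit,
        # pytest reporting) is not constrained by the low test limit.
        sys.setrecursionlimit(previous)


def depth(n):
    return 0 if n == 0 else 1 + depth(n - 1)


class RecursionLimitExample(unittest.TestCase):
    def test_deep_recursion_fails(self):
        with self.subTest(limit=200):
            # The limit context is the inner block, so it unwinds before subTest.
            with temporary_recursion_limit(200):
                with self.assertRaises(RecursionError):
                    depth(10_000)
```

The `try`/`finally` also covers the case where the body raises, which is the situation most likely to leave a low limit in place by accident.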

copies the step in _sharding_prop.py to properly cache? https://github.com/pytorch/pytorch/blob/ed0b1fec7e3b9e3b8d767506696506059b1ad2b0/torch/distributed/tensor/_sharding_prop.py#L500 Pull Request resolved: pytorch#174616 Approved by: https://github.com/wconstab

…74447) Pull Request resolved: pytorch#174447 Approved by: https://github.com/jansel ghstack dependencies: pytorch#173685

Pull Request resolved: pytorch#174155 Approved by: https://github.com/jansel ghstack dependencies: pytorch#174154

This PR fixes `RuntimeError: CUDA driver error: invalid argument` when combo kernels have large ynumels that exceed the grid.y limit. It adds y/z grid overflow handling similar to `Grid2DWithYZOverflow`. The issue happens when `combo_kernel_per_subkernel_blocks = False` (which is the default). After the flatten dispatch PR pytorch#172527 lands, `combo_kernel_per_subkernel_blocks = True` will make this issue obsolete. Pull Request resolved: pytorch#174354 Approved by: https://github.com/mlazos
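The idea behind y/z overflow handling is to spill the excess y blocks into the z dimension of the launch grid. The sketch below shows only the arithmetic, assuming the CUDA limit of 65535 for `gridDim.y` and `gridDim.z`; `yz_grid` and `linear_y_block` are hypothetical helpers, not Inductor functions.

```python
MAX_GRID_Y = 65535  # CUDA limit for gridDim.y (and gridDim.z)


def ceildiv(a, b):
    return -(-a // b)


def yz_grid(ynumel, yblock, max_y_grid=MAX_GRID_Y):
    """Split the required number of y blocks across the y and z grid dims."""
    y_raw = ceildiv(ynumel, yblock)
    z = max(1, ceildiv(y_raw, max_y_grid))
    y = ceildiv(y_raw, z)
    return y, z


def linear_y_block(pid_y, pid_z, y_grid):
    """Recover the logical y block index inside the kernel."""
    return pid_z * y_grid + pid_y


for ynumel in (1_000, 100_000, 65_535 * 3):
    y, z = yz_grid(ynumel, yblock=1)
    print(ynumel, "->", (y, z))
```

Since `y * z` can exceed the number of blocks actually needed, a kernel using this scheme still has to mask out logical block indices past the end.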

…170575) Pull Request resolved: pytorch#170575 Approved by: https://github.com/eellison

…#174533) Differential Revision: D92629416 Support tlparse's fx_graph_runnable with nested user defined triton kernels and constexprs. Also fixes some edge cases with user defined triton kernels. Pull Request resolved: pytorch#174533 Approved by: https://github.com/eellison

This converts NanCheck into an op so it can be used from outside of ProcessGroupNCCL. This can be used from torchcomms.

Misc changes:
* add CPU implementation
* use CUDA_KERNEL_ASSERT macro so it logs a more helpful message when nancheck fires

Test plan: CI

```
$ python -c "import torch; torch.ops.c10d.check_for_nan(torch.tensor(float('nan'), device='cuda')); torch.cuda.synchronize()"
(pytorch-3.12) /home/tristanr/pytorch/torch/csrc/distributed/c10d/NanCheck.cu:217: checkForNaN: block: [0,0,0], thread: [0,0,0] Assertion `!isnan(tailPtr[threadIdx.x])` failed.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/tristanr/pytorch/torch/cuda/__init__.py", line 1165, in synchronize
    return torch._C._cuda_synchronize()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.AcceleratorError: CUDA error: device-side assert triggered
Search for `cudaErrorAssert' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```

Pull Request resolved: pytorch#174736 Approved by: https://github.com/kwen2501, https://github.com/allenwang28

# Conflicts: # .ci/docker/requirements-ci.txt # requirements-build.txt # torch/utils/hipify/cuda_to_hip_mappings.py

rocm-repo-management-api · 2026-02-11T16:16:50Z

Jenkins build for 241aa87f0fde758bc85bd988fb3812d02a1f43a2 commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results

rocm-repo-management-api · 2026-02-12T04:02:00Z

Jenkins build for 3ee04a9830bea722779f6591ffb9a2386afcfc14 commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results

justinchuby and others added 30 commits February 4, 2026 22:23

[CI] Update onnxscript version to 0.6.0 (pytorch#173828)

7567287

Also updated test logic, as OpSchema was replaced by OpSignature for onnx functions. Required for pytorch#165083 Pull Request resolved: pytorch#173828 Approved by: https://github.com/titaiwangms, https://github.com/malfet

Start removing assert in benchmark folder (pytorch#174213)

0153c72

Pull Request resolved: pytorch#174213 Approved by: https://github.com/liangel-02

More benchmark assert removal (pytorch#174214)

217d7d4

Pull Request resolved: pytorch#174214 Approved by: https://github.com/liangel-02 ghstack dependencies: pytorch#174213

Finish benchmark and start tests (pytorch#174215)

a0bddb9

Pull Request resolved: pytorch#174215 Approved by: https://github.com/liangel-02 ghstack dependencies: pytorch#174213, pytorch#174214

More test migration (pytorch#174216)

b062ca7

Pull Request resolved: pytorch#174216 Approved by: https://github.com/liangel-02 ghstack dependencies: pytorch#174213, pytorch#174214, pytorch#174215

More assert removal for test (pytorch#174217)

66badbb

Pull Request resolved: pytorch#174217 Approved by: https://github.com/bdhirsh, https://github.com/atalman ghstack dependencies: pytorch#174213, pytorch#174214, pytorch#174215, pytorch#174216

[Docs] Clarification that PTL models are as secure as TorchScript mod…

34db629

…els (pytorch#174316) Pull Request resolved: pytorch#174316 Approved by: https://github.com/albanD

[Docs] Clarify that numerical stability/incorrect calculations are no…

ff20609

…t a security issues (pytorch#174318) Pull Request resolved: pytorch#174318 Approved by: https://github.com/albanD ghstack dependencies: pytorch#174316

Add unittest test_nccl_cudagraph_multisegment (pytorch#174225)

460a3f6

Add a test for pytorch#158029 Pull Request resolved: pytorch#174225 Approved by: https://github.com/eqy, https://github.com/galv, https://github.com/eellison, https://github.com/BoyuanFeng

Fix: index.Tensor behavior with empty indices (pytorch#174009)

94e84d8

Fixes pytorch#173995 Pull Request resolved: pytorch#174009 Approved by: https://github.com/malfet, https://github.com/ngimel, https://github.com/albanD

[RISCV] enable lintrunner on riscv64 build (pytorch#173993)

a578571

lintrunner now provides official riscv64 wheels from 0.13.0, so it can be safely enabled on riscv64 Pull Request resolved: pytorch#173993 Approved by: https://github.com/Skylion007, https://github.com/cyyever

[distributed][LocalTensor] add view ops test for LocalTensor (pytorch…

85628a9

…#174077) Addresses the TODO in `test_local_tensor.py` by adding view ops testing for LocalTensor Pull Request resolved: pytorch#174077 Approved by: https://github.com/dzmitry-huba

[CD] Update xpu support package version to 2025.3.2 (pytorch#173199)

3f82318

As the title Pull Request resolved: pytorch#173199 Approved by: https://github.com/atalman, https://github.com/malfet

[CUDA][Inductor] Disable pad_mm fx pass in test_int8_woq_mm_gpu t…

0e87273

…ests (pytorch#171625) Otherwise some paddings seem to fail pattern-match Pull Request resolved: pytorch#171625 Approved by: https://github.com/ngimel, https://github.com/eellison

[ROCm] forward fix pytorch#174087 (pytorch#174300)

bbc2f76

Fixes pytorch#174296. Pull Request resolved: pytorch#174300 Approved by: https://github.com/atalman, https://github.com/malfet

[find_triton_kernels] update logging level (pytorch#174302)

9a0c104

Differential Revision: D92291036 Pull Request resolved: pytorch#174302 Approved by: https://github.com/zhxchen17

malfet and others added 23 commits February 11, 2026 01:08

support strobelight profiling export (pytorch#174606)

eb2dc8b

Pull Request resolved: pytorch#174606 Approved by: https://github.com/jansel

[XPU] Add XPUGraph related stubs to __init__.pyi.in (pytorch#174059)

5ef2e50

As the title. Pull Request resolved: pytorch#174059 Approved by: https://github.com/EikanWang, https://github.com/albanD

[Flex] Dont materialize lse grad (pytorch#173481)

e0719de

Pull Request resolved: pytorch#173481 Approved by: https://github.com/soulitzer, https://github.com/fegin

add xpu previous installation commands (pytorch#174453)

022108d

As the title suggests, for better documentation. Pull Request resolved: pytorch#174453 Approved by: https://github.com/EikanWang

[FSDP2] improve _get_param_to_fqns from O(N^2) to O(N) (pytorch#174675)

acc10bf

fix pytorch#168329 Pull Request resolved: pytorch#174675 Approved by: https://github.com/Skylion007

[pytorch] Fix hipify import for non-HIP CUDA builds (pytorch#174706)

5b53948

fix hipify import Differential Revision: D92366141 Pull Request resolved: pytorch#174706 Approved by: https://github.com/drisspg, https://github.com/shunting314, https://github.com/mlazos

[DTensor] tests for uneven/zero-size shards (pytorch#174466)

37eb151

Pull Request resolved: pytorch#174466 Approved by: https://github.com/Skylion007, https://github.com/zpcore, https://github.com/wconstab

Upgrade Troubleshooting GuardOnDataDependentSymNode Errors (pytorch#1…

e94a63e

…74447) Pull Request resolved: pytorch#174447 Approved by: https://github.com/jansel ghstack dependencies: pytorch#173685

use optimizaiton_hint to choose ReductionHint (pytorch#174155)

76922d5

Pull Request resolved: pytorch#174155 Approved by: https://github.com/jansel ghstack dependencies: pytorch#174154

[inductor] bucketing prioritize bucketing during scheduling (pytorch#…

6b27eca

…170575) Pull Request resolved: pytorch#170575 Approved by: https://github.com/eellison

Merge remote-tracking branch 'upstream/main' into develop_IFU_20260211

241aa87

# Conflicts: # .ci/docker/requirements-ci.txt # requirements-build.txt # torch/utils/hipify/cuda_to_hip_mappings.py

pragupta requested review from jeffdaily and jithunnair-amd as code owners February 11, 2026 16:10

Fix merge conflicts

3ee04a9

pragupta merged commit cc3acaf into develop Feb 12, 2026
79 of 83 checks passed

pragupta deleted the develop_IFU_20260211 branch February 12, 2026 16:26

pragupta merged 1017 commits into develop from develop_IFU_20260211


20 participants
