[pull] master from tensorflow:master#2575
Merged
pull[bot] merged 16 commits into Mu-L:master from tensorflow:master on Jan 14, 2026
Conversation
PiperOrigin-RevId: 856063702
Reverts 68f1213 PiperOrigin-RevId: 856065178
PiperOrigin-RevId: 856068335
This change updates the `opaque` field in `CustomCallThunkProto` and the `str` fields in XLA FFI attribute protos from `string` to `bytes`. This allows these fields to contain arbitrary byte sequences, including non-UTF8 data, without causing proto parsing errors. New tests are added to verify that parsing succeeds with non-UTF8 content. PiperOrigin-RevId: 856069026
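The motivation is easy to see with plain UTF-8 validation: proto3 requires `string` fields to hold valid UTF-8, so an opaque payload containing arbitrary bytes would fail to parse, while `bytes` fields impose no such constraint. A minimal Python sketch of the check (the payload is made up):

```python
def is_valid_utf8(data: bytes) -> bool:
    """Mirror the UTF-8 check that proto3 applies to `string` fields."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# An opaque custom-call payload can legitimately contain non-UTF8 bytes,
# which is why the field was changed from `string` to `bytes`.
opaque_payload = b"\xff\xfe\x00raw opaque data"
```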
The functionality of the RealImagExpander, which simplifies `real(x)` and `imag(x)` when the input `x` is not a complex type, has been integrated into the AlgebraicSimplifier. PiperOrigin-RevId: 856075359
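The rewrite being folded in is simple: for a non-complex input `x`, `real(x)` is `x` itself and `imag(x)` is zero. A sketch of the scalar rule in Python (not the actual HLO pass):

```python
def simplify_real(x):
    # real(x) -> x when x is not complex
    return x.real if isinstance(x, complex) else x

def simplify_imag(x):
    # imag(x) -> 0 when x is not complex
    return x.imag if isinstance(x, complex) else 0.0
```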
Imported from GitHub PR openxla/xla#34715

📝 Summary of Changes
Update the custom_call op_name with the target so that different oneDNN custom ops can be easily distinguished.

🎯 Justification
This helps with debugging and with viewing the ops in the profiler trace view. Instead of `custom-call.2993.clone`, the op will show as `custom-call.2993.clone__onednn$matmul`, making the timeline trace easier to read.

📊 Benchmark (for Performance Improvements)
This doesn't affect performance.

Copybara import of the project:

-- cec648197096c80c694b41247b973b2747e1e45e by Gauri Deshpande <gauri1.deshpande@intel.com>:

Update onednn custom call name

Merging this change closes #34715

PiperOrigin-RevId: 856080728
Have the autotuner control the register spilling strategy rather than the PTX compiler. This is a continuation of the previous work that added register spilling information to Executable so it can be accessed by the caller. This CL now uses that information to discard (or keep) executable candidates. This approach is both more logical from the caller's perspective and allows for more fine-tuning of the behavior in the future. Also fixed in autotuner_compile_util.cc: out.value() was accessed before checking for an error status; a return on failure was added before the register spilling check. Reverts 31a591a PiperOrigin-RevId: 856086795
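The selection policy described above can be sketched as: prefer candidates that do not spill registers, and fall back to spilling candidates only if every candidate spills. A minimal Python sketch under assumed data shapes (the candidate tuples are hypothetical, not the real autotuner API):

```python
def pick_best_candidate(candidates):
    """candidates: list of (name, runtime_ms, spilled_registers) tuples."""
    no_spill = [c for c in candidates if c[2] == 0]
    pool = no_spill or candidates  # fall back only if everything spills
    return min(pool, key=lambda c: c[1])
```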
Adds a function to check whether a `riegeli::Reader` points to a split proto file. This will be used to determine whether a serialized ExecutableAndOptionsProto or GpuExecutable is in the old or new format when deserializing. PiperOrigin-RevId: 856099943
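Format sniffing of this kind typically peeks at a leading marker without consuming it. A hedged Python sketch — the magic constant is invented for illustration and is not the actual split-proto header:

```python
import io

SPLIT_PROTO_MAGIC = b"SPLT"  # hypothetical marker, not the real header bytes

def looks_like_split_proto(reader: io.BufferedReader) -> bool:
    """Peek at the first bytes without advancing the reader."""
    head = reader.peek(len(SPLIT_PROTO_MAGIC))[: len(SPLIT_PROTO_MAGIC)]
    return head == SPLIT_PROTO_MAGIC
```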
The code was assuming that if the output shape is a tuple, there will be users, and that those users will be GetTupleElement ops. However, all that is needed is to determine the buffer slices, and we can do that via the tuple itself by passing the right ShapeIndex. This bug was found when also calling RunBackend() for a gpu_compiler_test. The modified test failed before and passes now. PiperOrigin-RevId: 856108591
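The key idea is that a tuple shape can be walked directly: each leaf buffer is addressed by a ShapeIndex (a path of tuple element positions), so no GetTupleElement users are required. A toy Python sketch using nested tuples in place of XLA shapes:

```python
def leaf_indices(shape):
    """Yield the ShapeIndex-like path to every leaf of a nested tuple shape."""
    if isinstance(shape, tuple):
        for i, sub in enumerate(shape):
            for path in leaf_indices(sub):
                yield (i,) + path
    else:
        yield ()  # a leaf array: empty path
```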
The CL enhances the error reporting in Triton support checks by providing more specific messages when a conversion or operation is not supported. This includes detailing the types involved and the reasons for the unsupported decision. Additionally, test failure messages are improved by including the explanation from the CodegenDecision when an instruction is unexpectedly supported. PiperOrigin-RevId: 856110556
The table contains info we cannot get from the CUDA API. It is used to fill the recently added execution unit description in the device description. This change also includes immediate usage of this table:
* The FPU count is updated to use it if the info is present in the table.
* The performance model base gets a new method to estimate the peak scalar performance for a given datatype.
PiperOrigin-RevId: 856117428
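A peak-scalar-performance estimate of this kind is ordinarily just counting: execution units per SM, times SM count, times clock, times FLOPs per cycle. A hedged sketch with made-up numbers (not the actual performance-model method):

```python
def peak_scalar_gflops(sm_count, units_per_sm, clock_ghz, flops_per_cycle=2):
    """flops_per_cycle=2 assumes one FMA per unit per cycle, counted as 2 FLOPs."""
    return sm_count * units_per_sm * clock_ghz * flops_per_cycle
```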
This change adds support for scaled-dot operations where the input operands are of type F4E2M1FN and the scales are of type F8E8M0FNU. It includes:
- Adding a test case in JAX's scaled_dot_test.py for F4 types.
- Adding a device test in Triton's fusion_emitter_device_test.cc to verify Triton's handling of F4 scaled-dot on Hopper GPUs.
- Disabling cuBLAS autotuning for F4 types in gemm_fusion_autotuner.cc, as cuBLAS does not support them.
- Updating composite_rewriter.cc to recognize F4E2M1FN as a valid operand type for scaled-dot when paired with F8E8M0FNU scales.
- Improving logging in composite_rewriter.cc for unsupported scaled-dot cases.

Note: FP4 has an error in the MMAv2 lowering path. The line `auto dotOpA = cast<DotOperandEncodingAttr>(aTensorTy.getEncoding());` is wrong because the attribute is not of type DotOperandEncodingAttr. As a result, FP4 scaled-dot lowering crashes on some tile sizes on B200 and always crashes on H100. PiperOrigin-RevId: 856123857
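Numerically, a scaled dot applies per-operand scales before the contraction. A pure-Python reference under an assumed layout (one scale per row of A and per column of B; the real block-scaled formats use finer granularity):

```python
def scaled_dot(a, b, scale_a, scale_b):
    """a: m x k, b: k x n; scale_a[i] scales row i of a, scale_b[j] column j of b."""
    m, k, n = len(a), len(b), len(b[0])
    return [
        [
            sum(a[i][p] * scale_a[i] * b[p][j] * scale_b[j] for p in range(k))
            for j in range(n)
        ]
        for i in range(m)
    ]
```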
[XLA:GPU] Fuse shmem write loops for transposes in PackedTranspose

Imported from GitHub PR openxla/xla#34633

Replace per-transpose loops with a single unified loop that processes all transposes simultaneously, computing indices once and reusing them across all operations. Update the `packed_transpose_multiple_heroes.hlo` test to verify the single-loop structure with multiple iter_args. This reduces the execution time for `fused_convert_transpose_3.hlo` ([attached](https://github.com/user-attachments/files/23861262/fused_convert_transpose_3.txt)) from Llama 3 8B FP8 by ~30% on MI300 and MI355.

Copybara import of the project:

-- 37e4ed1dd96afb39bb2b4a958800842d12545fa5 by Aleksei Nurmukhametov <anurmukh@amd.com>:

[XLA:GPU] Fuse shmem write loops for transposes in PackedTranspose

Replace per-transpose loops with a single unified loop that processes all transposes simultaneously, computing indices once and reusing them across all operations. Update packed_transpose_multiple_heroes.hlo test to verify the single-loop structure with multiple iter_args.

Merging this change closes #34633

PiperOrigin-RevId: 856127072
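The fusion idea: instead of one loop per transpose, run one loop over the shared index space and write every transpose's element inside it, so the indices are computed once. A toy Python sketch over 2D lists:

```python
def fused_transposes(inputs):
    """Transpose several equally-shaped 2D lists in a single index loop."""
    rows, cols = len(inputs[0]), len(inputs[0][0])
    outs = [[[None] * rows for _ in range(cols)] for _ in inputs]
    for r in range(rows):
        for c in range(cols):
            # (r, c) computed once, reused for every transpose
            for t, m in enumerate(inputs):
                outs[t][c][r] = m[r][c]
    return outs
```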
PiperOrigin-RevId: 856130233
We had to infer TC clock scales since the info is not officially available. This change also corrects the B200 test device info based on available sources. A couple of interesting points: lower throughput on F64 and on non-TC F16 vs. H100. gpu_fusible_test has been updated to account for the corrected, larger SM count in the test data (cores * threads: 132 * 2048 = 270336 -> 148 * 2048 = 303104). PiperOrigin-RevId: 856144053
Imported from GitHub PR openxla/xla#36346

Hi, I was testing TensorFlow code with the Svace static analyzer and found a possible null dereference in XLA. The null dereference may occur because of a missing return in MsaAlgorithm::UpdateAllocationRequirementForUseAliases(): there is a case where the `aliased_allocation` variable can be null, and it would then be dereferenced in AddAliasedRequiredAssignment(). #99907

Copybara import of the project:

-- 969d905e81751fdf92581ad5fc5289ecc798d727 by Daniil Kutz <kutz@ispras.ru>:

[XLA] Add missing return to prevent nullptr dereference

Merging this change closes #36346

PiperOrigin-RevId: 856151503
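The bug pattern is a lookup that can yield null followed by an unconditional use. A Python analogue of the guard that was added (the names are illustrative, not the MSA code):

```python
def update_allocation_requirements(uses, aliased_allocations):
    """aliased_allocations: dict from use -> allocation; misses are possible."""
    required = []
    for use in uses:
        alloc = aliased_allocations.get(use)
        if alloc is None:
            # The missing early return: without it, `alloc` would be used
            # below (an AttributeError, the analogue of a null dereference).
            return None
        required.append(alloc)
    return required
```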
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)