
[pull] master from tensorflow:master #775

Merged
pull[bot] merged 14 commits into barkpixels:master from tensorflow:master
Nov 5, 2025

Conversation


@pull pull bot commented Nov 5, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


shawnwang18 and others added 14 commits November 5, 2025 01:50
Dump command buffer contents to folder specified by --xla-dump-to through dump.h

Imported from GitHub PR openxla/xla#33505

📝 Summary of Changes
This PR routes the dumping of command buffer contents through dump.h.

Copybara import of the project:

--
0b870f229a40435766613743d47f274e06af4763 by Shawn Wang <shawnw@nvidia.com>:

Dump command buffer contents to folder specified by --xla-dump-to through dump.h

--
123c2541701f4d65771b522fbb68e1e0edb7a6e4 by Shawn Wang <shawnw@nvidia.com>:

clang format fix

Merging this change closes #33505

PiperOrigin-RevId: 828357446
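For context, XLA's debug dumps are usually directed at a folder via the user-facing flag `--xla_dump_to`, passed through the `XLA_FLAGS` environment variable. A minimal sketch of how a user would point dumps (now including command buffer contents) at a directory; the directory path here is illustrative:

```python
import os

# Point XLA's debug dumps at a directory. XLA_FLAGS must be set before the
# XLA-backed framework (e.g. JAX or TensorFlow) initializes its backend.
dump_dir = "/tmp/xla_dump"
os.environ["XLA_FLAGS"] = f"--xla_dump_to={dump_dir}"

print(os.environ["XLA_FLAGS"])
```

After a program runs with this set, the dump directory contains the per-module debug artifacts.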
Imported from GitHub PR openxla/xla#32812

Co-author: @kxxt

📝 Summary of Changes:

This pull request adds support for RISC-V 64 architecture across the build system, code generation, and Python packaging infrastructure.

🎯 Justification:

The changes ensure that riscv64 is recognized as a valid target in Bazel build configurations, LLVM toolchain selection, Python manylinux compliance checks, and related tests and patches. This allows the project to build and test components for riscv64 alongside other supported architectures.

🚀 Kind of Contribution: ✨ New Feature

Copybara import of the project:

--
0d02393a6335fb43d67678d0cd15d671e77dc089 by gns <root@infi.wang>:

[XLA:CPU] Add support for riscv64

Co-authored-by: Levi Zim <rsworktech@outlook.com>

--
5d95fb479e45524299ff4193b99bb4db0d74483b by gns <root@infi.wang>:

Refresh `rules_python` riscv64 patch

Co-authored-by: Levi Zim <rsworktech@outlook.com>

Merging this change closes #32812

PiperOrigin-RevId: 828379922
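The essence of the change is that `riscv64` must be treated as a first-class entry in the various architecture allowlists (Bazel configs, LLVM toolchain selection, manylinux compliance checks). A hedged Python sketch of that kind of check; the set below is illustrative, not the project's actual list:

```python
# Illustrative architecture allowlist, as extended by this PR: "riscv64"
# now sits alongside the other supported architectures.
SUPPORTED_ARCHES = {"x86_64", "aarch64", "ppc64le", "s390x", "riscv64"}

def is_supported(arch: str) -> bool:
    """Return True if the given machine architecture is a valid build target."""
    return arch in SUPPORTED_ARCHES

print(is_supported("riscv64"))
print(is_supported("mips64"))
```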
The previous logic emitted code specific to Triton; the CPU -> vector lowering works directly, so this code is moved to the Triton-specific lowering.

It also means that we add another op that supports 0D tensors.

PiperOrigin-RevId: 828381998
The kernel could be unstable and produce NaN values.
To detect such cases, set the flag xla_gpu_experimental_enable_nan_counter_on_thunks to true.
The NaN counter will then be active, and if any NaN values
are found, the execution will crash.

PiperOrigin-RevId: 828389430
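A hedged sketch of the described opt-in behavior: count NaNs in a kernel's output and abort if any are found while the flag is set. The function and flag names below are illustrative stand-ins, not XLA's actual implementation:

```python
import math

# Illustrative stand-in for the opt-in debug flag described above.
xla_gpu_experimental_enable_nan_counter_on_thunks = True

def check_output(values):
    """Count NaNs in a result buffer and crash if the flag is enabled."""
    nan_count = sum(1 for v in values if math.isnan(v))
    if xla_gpu_experimental_enable_nan_counter_on_thunks and nan_count > 0:
        raise RuntimeError(f"NaN counter triggered: {nan_count} NaN value(s)")
    return values

print(len(check_output([1.0, 2.0, 3.0])))
try:
    check_output([1.0, float("nan")])
except RuntimeError as e:
    print("crashed:", e)
```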
…roadcast and dot helpers.

PiperOrigin-RevId: 828398536
We can't easily depend on SYCL, so for now let's disable all SYCL specific targets in the internal build.

PiperOrigin-RevId: 828414906
Use human-readable units in "Out of memory" errors (e.g. 7.34GB)

Imported from GitHub PR openxla/xla#33504

  📝 Summary of Changes

  Improved memory allocation error messages in tf_allocator_adapter.cc to display byte counts with comma separators
  and human-readable units.

  Before:
  Out of memory while trying to allocate 7450374152 bytes.

  After:
  Out of memory while trying to allocate 7,450,374,152 bytes (6.94GiB).

  Changes:
  - Added FormatByteSize() helper function that formats byte counts with commas for readability
  - Leverages existing tsl::strings::HumanReadableNumBytes() utility to append human-readable size (MiB/GiB/TiB)
  - Updated MemoryAllocationError() to use the new formatting

  🎯 Justification

  When debugging out-of-memory errors, users need to quickly understand allocation sizes. Without formatting, it's
  difficult to distinguish between millions and billions at a glance (e.g., is 7450374152 closer to 7 million or 7
  billion?).

  This change improves the developer experience by:
  1. Making large numbers easier to parse with comma separators
  2. Providing immediate intuition about size magnitude (MB vs GB vs TB)
  3. Maintaining exact byte precision for detailed debugging

  This benefits all workloads that encounter memory allocation failures, making error messages more actionable.

  🚀 Kind of Contribution

  ♻️ Cleanup

  🧪 Unit Tests

  No new unit tests added. This change only affects error message formatting and does not alter program logic or
  behavior. The existing allocation failure paths remain unchanged.

  🧪 Execution Tests

  No new execution tests needed. This is a cosmetic improvement to error messages that does not affect execution
  correctness or trigger any new code paths.
Copybara import of the project:

--
f738426970e6f067df3dde2f93bd2736294e7e5d by Ram Rachum <ram@rachum.com>:

Use human-readable units in "Out of memory" errors (e.g. 7.34GB)

Merging this change closes #33504

PiperOrigin-RevId: 828416974
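The formatting described above can be sketched as follows. This is a Python stand-in, not the actual C++ change (which lives in tf_allocator_adapter.cc and reuses `tsl::strings::HumanReadableNumBytes`):

```python
def format_byte_size(n: int) -> str:
    """Format a byte count with comma separators plus a human-readable unit."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    size = float(n)
    for unit in units:
        if size < 1024.0 or unit == units[-1]:
            human = f"{size:.2f}{unit}"
            break
        size /= 1024.0
    return f"{n:,} bytes ({human})"

# Reproduces the PR's before/after example value.
print(format_byte_size(7450374152))
```

Applied to the example above, 7450374152 bytes renders as `7,450,374,152 bytes (6.94GiB)`.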
Also adds a platform_name parameter to the `DeserializeThunkProto` function, since we need it to deserialize this thunk (and it doesn't make sense to store it in the proto).

PiperOrigin-RevId: 828417473
…xing and in the gather emitter

PiperOrigin-RevId: 828439076
Also removes an unused import in `gpu_executable.proto`

PiperOrigin-RevId: 828445423
Imported from GitHub PR openxla/xla#33212

This PR enables the HloEvaluator to handle complex numbers in more operations (trigonometric and hyperbolic). Tests are added to check if folding works for these operations.

It also disables `constant_folding` in the `complex_unary_op_test.cc` test, as that test was intended to check the accuracy of backend implementations. If constant folding remains enabled, the test ends up checking the accuracy of the `libstdc++` implementations of `tan`, `asin`, and `asinh` instead.
Copybara import of the project:

--
af6ae1d31b7d6bab2401a99dc397945c824e8631 by Aleksei Nurmukhametov <anurmukh@amd.com>:

Do not run constant_folding on complex_unary_op_test.cc

--
d3074cc139f07ef6e91723f6290b8294dcc38717 by Aleksei Nurmukhametov <anurmukh@amd.com>:

Enable HloEvaluator for more complex ops

Merging this change closes #33212

PiperOrigin-RevId: 828446740
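A hedged illustration of what constant folding of complex trigonometric and hyperbolic ops effectively computes; Python's `cmath` stands in for the evaluator's own complex math here:

```python
import cmath

# A complex constant operand, folded through trig/hyperbolic ops at
# "compile time" in this stand-in.
z = 0.5 + 0.25j
folded = {
    "tan": cmath.tan(z),
    "asin": cmath.asin(z),
    "asinh": cmath.asinh(z),
}

# Round-trip sanity check: sin(asin(z)) should recover z (principal branch).
roundtrip = cmath.sin(folded["asin"])
print(abs(roundtrip - z) < 1e-12)
```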
…cast.

This will enable us to rewrite the lowering patterns at a higher level if needed and also allow us to reuse buffers. It will also make it possible to vectorize using target-length vectors, e.g. vector<8xf32>, rather than the current scheme of emitting super-vectors and relying on LLVM to split them.

PiperOrigin-RevId: 828448837
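The distinction above can be illustrated with a pure-Python stand-in: instead of one "super-vector" covering the whole row, data is processed in fixed target-length lanes (8 x f32, mirroring vector<8xf32>). The lane width and reduction below are illustrative only:

```python
# Process data in fixed 8-wide lanes rather than one whole-row super-vector.
LANES = 8

def lane_sums(data):
    """Reduce each contiguous 8-element lane to its sum."""
    assert len(data) % LANES == 0, "illustration assumes a multiple of the lane width"
    return [sum(data[i:i + LANES]) for i in range(0, len(data), LANES)]

print(lane_sums([1.0] * 16))
```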
@pull pull bot locked and limited conversation to collaborators Nov 5, 2025
@pull pull bot added the ⤵️ pull label Nov 5, 2025
@pull pull bot merged commit 1f95e41 into barkpixels:master Nov 5, 2025