
Conversation

@rnitin1908

No description provided.

tensorflower-gardener and others added 30 commits May 9, 2025 12:12
PiperOrigin-RevId: 756853517
…ngs::StrAppend`.

`strings::StrCat` should eventually forward to `absl::StrCat`. Some references need to be rewritten as `absl::StrCat(absl::LegacyPrecision(...))` to avoid loss of precision.
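A hedged sketch of the rewrite pattern described above (the `FormatLoss` helper and the header path are illustrative assumptions; only the `absl::StrCat(absl::LegacyPrecision(...))` form comes from the commit message):

```cpp
#include <string>

#include "absl/strings/str_cat.h"

// Old call site (TensorFlow's strings::StrCat):
//   std::string s = strings::StrCat("loss=", loss);
// Rewritten form: wrapping the floating-point argument in LegacyPrecision
// keeps the existing formatting/precision of the output.
std::string FormatLoss(float loss) {
  return absl::StrCat("loss=", absl::LegacyPrecision(loss));
}
```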

PiperOrigin-RevId: 756865936
PiperOrigin-RevId: 756867042
…ation environments

PiperOrigin-RevId: 756881629
`.size()` and `operator[]` have a race condition. With this fix, the
threads won't access the container itself; they just write the elements
they need to modify.
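A minimal sketch of that pattern (`Result` and `Compute` are hypothetical stand-ins): the main thread resolves element addresses up front, so worker threads never touch the container itself.

```cpp
#include <thread>
#include <vector>

struct Result { int value = 0; };
int Compute(int i) { return i * i; }

void RunTasks(int num_tasks) {
  std::vector<Result> results(num_tasks);  // sized once, never resized below
  std::vector<std::thread> threads;
  threads.reserve(num_tasks);
  for (int i = 0; i < num_tasks; ++i) {
    Result* out = &results[i];  // .size()/operator[] used only on this thread
    threads.emplace_back([out, i] { out->value = Compute(i); });
  }
  for (std::thread& t : threads) t.join();
}
```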

PiperOrigin-RevId: 756902174
PiperOrigin-RevId: 756907571
…tion given.

Currently, you have to specify the return type on ArrayTypeSwitch, but this is often redundant since it can be inferred from the provided functor.
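As a generic illustration (not the actual `ArrayTypeSwitch` implementation; `Tag` and `TagSwitch` are hypothetical), a switch helper can deduce its return type from the functor via `std::invoke_result_t`, which is what makes the explicit template argument redundant:

```cpp
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a type tag; the real code dispatches on
// xla::PrimitiveType.
enum class Tag { kF32, kF64 };

// The return type is deduced from the functor, so callers no longer have to
// spell it out explicitly.
template <typename F>
auto TagSwitch(F&& f, Tag tag) -> std::invoke_result_t<F, Tag> {
  return std::forward<F>(f)(tag);
}

// Usage: `width` is deduced as int64_t from the lambda's return type.
// int64_t width = TagSwitch(
//     [](Tag t) -> int64_t { return t == Tag::kF32 ? 4 : 8; }, Tag::kF32);
```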

PiperOrigin-RevId: 756909206
…t::Compile` before MLIR -> XlaComputation conversion

PjRt GPU applies additional transformations to preserve input/output layout, which is only implemented in `StreamExecutorGpuClient::CompileAndLoad` and not in `StreamExecutorGpuCompiler::Compile`. Doing the MLIR -> XlaComputation conversion without this logic causes custom layouts to be dropped.

PiperOrigin-RevId: 756914304
CpuClient::CreateUninitializedBuffer.

PiperOrigin-RevId: 756916747
The version of `upb` used in tensorflow and XLA is incompatible with Clang.
In particular, it generates a warning that the code uses a non-standard
C++ feature. Since this version of `upb` has `-Werror` in its build opts,
the warning breaks the build.

We want to be able to compile PyTorch/XLA with Clang, and PyTorch/XLA depends
on `upb`. Therefore we need to make `upb` buildable with Clang.

In this change, we remove `-Werror` from `upb`'s build opts so that the warnings
generated by Clang no longer break the build. In general, we should never
use `-Werror` on code that we don't directly control, as our ability to fix
the warnings in such code is limited.

PiperOrigin-RevId: 756927870
We should never crash when printing an XLA construct, even when it's invalid.

PiperOrigin-RevId: 756968212
…ow the initial fusion worklist is formed for a current computation.

PiperOrigin-RevId: 756997596
…fs using `tsl::SerializeToStringDeterministic`

PiperOrigin-RevId: 757021740
PiperOrigin-RevId: 757035793
PiperOrigin-RevId: 757044032
PiperOrigin-RevId: 757044520
PiperOrigin-RevId: 757054230
default memory type.

The configuration option `legacy_memory_space_behavior`, which currently
defaults to true, controls whether the old or the new behavior is followed.

PiperOrigin-RevId: 757057567
PiperOrigin-RevId: 757084827
PiperOrigin-RevId: 757091461
…mic shape

The previous CL that changed this logic to use the on-device shape from the device buffer generates literals with invalid sizes when the PjRt buffer has a dynamic shape.

PiperOrigin-RevId: 757091876
PiperOrigin-RevId: 757110017
PiperOrigin-RevId: 757152089
PiperOrigin-RevId: 757198443
PiperOrigin-RevId: 757209184
akuegel and others added 30 commits May 14, 2025 23:53
With Triton multi-output fusions, we can have tuple results for fusions. Adjust
the buffer comparison logic accordingly.
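A hedged sketch of that adjustment (all names are hypothetical stand-ins, not the autotuner's actual types): with a tuple result, every leaf buffer is compared rather than a single output buffer.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical buffer handle and element-wise check.
struct Buffer {
  std::vector<float> data;
};

bool BuffersMatch(const Buffer& a, const Buffer& b) {
  return a.data == b.data;  // placeholder for a tolerance-aware comparison
}

// For tuple-result (multi-output) fusions, iterate over all leaf buffers.
bool TupleResultsMatch(const std::vector<Buffer>& lhs,
                       const std::vector<Buffer>& rhs) {
  if (lhs.size() != rhs.size()) return false;
  for (size_t i = 0; i < lhs.size(); ++i) {
    if (!BuffersMatch(lhs[i], rhs[i])) return false;
  }
  return true;
}
```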

PiperOrigin-RevId: 759002766
PiperOrigin-RevId: 759010423
PiperOrigin-RevId: 759024702
PiperOrigin-RevId: 759027121
PiperOrigin-RevId: 759045624
PiperOrigin-RevId: 759046243
Also fix the BUILD file, so we do not skip testing this on H100.

PiperOrigin-RevId: 759050327
propagate broadcast multiplier upwards through all the ops up to the parameter.

The broadcast either adds a new dim or, together with a bitcast, expands an existing dim.
When the expansion happens we set the broadcast multiplier to the source instruction. But currently, if there is more than one instruction before the broadcast, we reset the broadcast multiplier back to one. Let's not do that.

PiperOrigin-RevId: 759051178
PiperOrigin-RevId: 759054070
PiperOrigin-RevId: 759068702
We weren't handling them correctly, meaning you couldn't use a `shard_map`/`ManualComputationOp` that has callbacks inside.

PiperOrigin-RevId: 759072597
The autotuner compile util does not run any HLO passes, so disabling the Triton
softmax pass is a no-op. Instead, we get rid of the Triton fusion by taking the fusion
computation and running just a few dedicated passes (like PriorityFusion).

PiperOrigin-RevId: 759075358
PiperOrigin-RevId: 759082316
We fixed the underlying issue with the subchannel dequantize op sequence below:
param->transpose->broadcast->bitcast->multiply->dot

Now we can remove the flag-flip from the tests.

PiperOrigin-RevId: 759095250
…ne_parallelism_opt_level

PiperOrigin-RevId: 759104549
…y on zero termination

There's nothing guaranteeing that these references are null-terminated.
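A generic illustration of the hazard (not the actual call sites; `PrintUnsafe` and `PrintSafe` are hypothetical): a string-view style reference carries a pointer and a length, with no trailing NUL guaranteed.

```cpp
#include <cstdio>
#include <string>

#include "absl/strings/string_view.h"

// Unsafe: name.data() is not guaranteed to be null-terminated, so a C-string
// API may read past name.size().
void PrintUnsafe(absl::string_view name) {
  std::printf("%s\n", name.data());  // potential out-of-bounds read
}

// Safe: make an owned, null-terminated copy (or pass the length explicitly).
void PrintSafe(absl::string_view name) {
  std::string owned(name.data(), name.size());
  std::printf("%s\n", owned.c_str());
}
```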

PiperOrigin-RevId: 759111351
PiperOrigin-RevId: 759118994
PiperOrigin-RevId: 759127256
This should make the swizzle mode more readable and avoids misleading the reader by showing it as "0" before a mode has even been chosen. `swizzle_mode` is intentionally left out when it is unset.
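A generic sketch of that printing choice (names are hypothetical, not the actual emitter code): the field is emitted only once a mode has actually been chosen.

```cpp
#include <optional>
#include <string>

#include "absl/strings/str_cat.h"

// swizzle_mode is omitted entirely while unset, instead of being shown as a
// misleading default of "0".
std::string DescribeTile(std::optional<int> swizzle_mode) {
  std::string out = "tile{";
  if (swizzle_mode.has_value()) {
    absl::StrAppend(&out, "swizzle_mode=", *swizzle_mode);
  }
  absl::StrAppend(&out, "}");
  return out;
}
```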

PiperOrigin-RevId: 759134652
PiperOrigin-RevId: 759137056