
Conversation

@rnitin1908

No description provided.

tensorflower-gardener and others added 30 commits May 9, 2025 12:12
PiperOrigin-RevId: 756853517
…ngs::StrAppend`.

`strings::StrCat` should eventually forward to `absl::StrCat`. Some references need to be rewritten as `absl::StrCat(absl::LegacyPrecision(...))` to avoid loss of precision.
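A hedged sketch of the rewrite pattern described above (the `FormatLoss` helper and the header path are illustrative assumptions; only the `absl::StrCat(absl::LegacyPrecision(...))` form comes from the commit message):

```cpp
#include <string>

#include "absl/strings/str_cat.h"

// Old call site (TensorFlow's strings::StrCat):
//   std::string s = strings::StrCat("loss=", loss);
// Rewritten form: wrapping the floating-point argument in LegacyPrecision
// keeps the existing formatting/precision of the output.
std::string FormatLoss(float loss) {
  return absl::StrCat("loss=", absl::LegacyPrecision(loss));
}
```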

PiperOrigin-RevId: 756865936
PiperOrigin-RevId: 756867042
…ation environments

PiperOrigin-RevId: 756881629
`.size()` and `operator[]` have a race condition. With this fix, the
threads won't access the container itself; they just write the elements
they need to modify.
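A minimal sketch of that pattern (`Result` and `Compute` are hypothetical stand-ins): the main thread resolves element addresses up front, so worker threads never touch the container itself.

```cpp
#include <thread>
#include <vector>

struct Result { int value = 0; };
int Compute(int i) { return i * i; }

void RunTasks(int num_tasks) {
  std::vector<Result> results(num_tasks);  // sized once, never resized below
  std::vector<std::thread> threads;
  threads.reserve(num_tasks);
  for (int i = 0; i < num_tasks; ++i) {
    Result* out = &results[i];  // .size()/operator[] used only on this thread
    threads.emplace_back([out, i] { out->value = Compute(i); });
  }
  for (std::thread& t : threads) t.join();
}
```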

PiperOrigin-RevId: 756902174
PiperOrigin-RevId: 756907571
…tion given.

Currently, you have to specify the return type on ArrayTypeSwitch, but this is often redundant since it can be inferred from the provided functor.
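As a generic illustration (not the actual `ArrayTypeSwitch` implementation; `Tag` and `TagSwitch` are hypothetical), a switch helper can deduce its return type from the functor via `std::invoke_result_t`, which is what makes the explicit template argument redundant:

```cpp
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a type tag; the real code dispatches on
// xla::PrimitiveType.
enum class Tag { kF32, kF64 };

// The return type is deduced from the functor, so callers no longer have to
// spell it out explicitly.
template <typename F>
auto TagSwitch(F&& f, Tag tag) -> std::invoke_result_t<F, Tag> {
  return std::forward<F>(f)(tag);
}

// Usage: `width` is deduced as int64_t from the lambda's return type.
// int64_t width = TagSwitch(
//     [](Tag t) -> int64_t { return t == Tag::kF32 ? 4 : 8; }, Tag::kF32);
```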

PiperOrigin-RevId: 756909206
…t::Compile` before MLIR -> XlaComputation conversion

PjRt GPU applies additional transformations to preserve input/output layout, which is only implemented in `StreamExecutorGpuClient::CompileAndLoad` and not in `StreamExecutorGpuCompiler::Compile`. Doing the MLIR -> XlaComputation conversion without this logic causes custom layouts to be dropped.

PiperOrigin-RevId: 756914304
CpuClient::CreateUninitializedBuffer.

PiperOrigin-RevId: 756916747
The version of `upb` used in tensorflow and XLA is incompatible with Clang.
In particular, it generates a warning that the code uses a non-standard
C++ feature. Since this version of `upb` has `-Werror` in its build opts,
the warning breaks the build.

We want to be able to compile PyTorch/XLA with Clang, and PyTorch/XLA depends
on `upb`. Therefore we need to make `upb` buildable with Clang.

In this change, we remove `-Werror` from `upb`'s build opts so that the warnings
generated by Clang no longer break the build. In general, we should never
use `-Werror` on code that we don't directly control, as our ability to fix
the warnings in such code is limited.

PiperOrigin-RevId: 756927870
We should never crash when printing an XLA construct, even when it's invalid.

PiperOrigin-RevId: 756968212
…ow the initial fusion worklist is formed for a current computation.

PiperOrigin-RevId: 756997596
…fs using `tsl::SerializeToStringDeterministic`

PiperOrigin-RevId: 757021740
PiperOrigin-RevId: 757035793
PiperOrigin-RevId: 757044032
PiperOrigin-RevId: 757044520
PiperOrigin-RevId: 757054230
default memory type.

The configuration option `legacy_memory_space_behavior`, which currently
defaults to true, controls whether the old or the new behavior is followed.

PiperOrigin-RevId: 757057567
PiperOrigin-RevId: 757084827
PiperOrigin-RevId: 757091461
…mic shape

The previous CL that changed this logic to use the on-device shape from the device buffer generates literals with invalid sizes when the PjRt buffer has a dynamic shape.

PiperOrigin-RevId: 757091876
PiperOrigin-RevId: 757110017
PiperOrigin-RevId: 757152089
PiperOrigin-RevId: 757198443
PiperOrigin-RevId: 757209184
akuegel and others added 30 commits May 14, 2025 23:53
With Triton multi-output fusions, we can have tuple results for fusions. Adjust
the buffer comparison logic accordingly.
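A hedged sketch of that adjustment (all names are hypothetical stand-ins, not the autotuner's actual types): with a tuple result, every leaf buffer is compared rather than a single output buffer.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical buffer handle and element-wise check.
struct Buffer {
  std::vector<float> data;
};

bool BuffersMatch(const Buffer& a, const Buffer& b) {
  return a.data == b.data;  // placeholder for a tolerance-aware comparison
}

// For tuple-result (multi-output) fusions, iterate over all leaf buffers.
bool TupleResultsMatch(const std::vector<Buffer>& lhs,
                       const std::vector<Buffer>& rhs) {
  if (lhs.size() != rhs.size()) return false;
  for (size_t i = 0; i < lhs.size(); ++i) {
    if (!BuffersMatch(lhs[i], rhs[i])) return false;
  }
  return true;
}
```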

PiperOrigin-RevId: 759002766
PiperOrigin-RevId: 759010423
PiperOrigin-RevId: 759024702
PiperOrigin-RevId: 759027121
PiperOrigin-RevId: 759045624
PiperOrigin-RevId: 759046243
Also fix the BUILD file, so we do not skip testing this on H100.

PiperOrigin-RevId: 759050327
propagate broadcast multiplier upwards through all the ops up to the parameter.

The broadcast either adds a new dim or, together with a bitcast, expands an existing dim.
When the expansion happens we set the broadcast multiplier to the source instruction. But currently, if there is more than one instruction before the broadcast, we reset the broadcast multiplier back to one. Let's not do that.

PiperOrigin-RevId: 759051178
PiperOrigin-RevId: 759054070
PiperOrigin-RevId: 759068702
We weren't handling them correctly, meaning you couldn't use a `shard_map`/`ManualComputationOp` that has callbacks inside.

PiperOrigin-RevId: 759072597
The autotuner compile util does not run any HLO passes, so disabling the Triton
softmax pass is a no-op. Instead, we get rid of the Triton fusion by taking the fusion
computation and running just a few dedicated passes (like PriorityFusion).

PiperOrigin-RevId: 759075358
PiperOrigin-RevId: 759082316
We fixed the underlying issue with the subchannel dequantize op sequence below:
param->transpose->broadcast->bitcast->multiply->dot

Now we can remove the flag-flip from the tests.

PiperOrigin-RevId: 759095250
…ne_parallelism_opt_level

PiperOrigin-RevId: 759104549
…y on zero termination

There's nothing guaranteeing that these references are null-terminated.
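A generic illustration of the hazard (not the actual call sites; `PrintUnsafe` and `PrintSafe` are hypothetical): a string-view style reference carries a pointer and a length, with no trailing NUL guaranteed.

```cpp
#include <cstdio>
#include <string>

#include "absl/strings/string_view.h"

// Unsafe: name.data() is not guaranteed to be null-terminated, so a C-string
// API may read past name.size().
void PrintUnsafe(absl::string_view name) {
  std::printf("%s\n", name.data());  // potential out-of-bounds read
}

// Safe: make an owned, null-terminated copy (or pass the length explicitly).
void PrintSafe(absl::string_view name) {
  std::string owned(name.data(), name.size());
  std::printf("%s\n", owned.c_str());
}
```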

PiperOrigin-RevId: 759111351
PiperOrigin-RevId: 759118994
PiperOrigin-RevId: 759127256
This should make the swizzle mode more readable and avoids misleading the reader by showing it as "0" before a mode has even been chosen. `swizzle_mode` is intentionally left out when it is unset.
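A generic sketch of that printing choice (names are hypothetical, not the actual emitter code): the field is emitted only once a mode has actually been chosen.

```cpp
#include <optional>
#include <string>

#include "absl/strings/str_cat.h"

// swizzle_mode is omitted entirely while unset, instead of being shown as a
// misleading default of "0".
std::string DescribeTile(std::optional<int> swizzle_mode) {
  std::string out = "tile{";
  if (swizzle_mode.has_value()) {
    absl::StrAppend(&out, "swizzle_mode=", *swizzle_mode);
  }
  absl::StrAppend(&out, "}");
  return out;
}
```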

PiperOrigin-RevId: 759134652
PiperOrigin-RevId: 759137056