@github-actions

Daily sync with upstream

mar-muel and others added 30 commits March 14, 2025 19:02
…ighted sampling without replacement in `jax.random.choice`
This is an exact port of the current Python implementation to C++ for speed.

I am being careful not to change the topological order we return in any way in this change, although we may do so in a future change.

PiperOrigin-RevId: 737014989
PiperOrigin-RevId: 737051146
There is no reason why two custom vmappable types cannot share the same spec_type. However, spec_types was a set, which can cause bugs and exceptions.

Suppose that I register two vmappable data_types sharing the same spec_type, and then unregister one of the two. The spec_type is then removed from the set, even though the second data_type still needs it. In addition, an exception is raised if I try to unregister both vmappable types (on the second call to spec_types.remove).

When unregistering a data type, instead of removing its spec_type from the set, we regenerate the set from the remaining vmappable types.
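
The failure mode and the fix described above can be sketched in plain Python. This is an illustrative stand-in, not the actual JAX code: the class `VmappableRegistry`, its `vmappables` dict, and the method names `unregister_buggy` / `unregister_fixed` are all hypothetical. Only the `spec_types` set and the idea of regenerating it from the remaining types come from the commit message.

```python
class VmappableRegistry:
    """Hypothetical registry mapping data types to their spec types."""

    def __init__(self):
        self.vmappables = {}     # data_type -> spec_type (assumed layout)
        self.spec_types = set()  # spec types with at least one registered data type

    def register(self, data_type, spec_type):
        self.vmappables[data_type] = spec_type
        self.spec_types.add(spec_type)

    def unregister_buggy(self, data_type):
        # Old behaviour as described: remove the spec type directly.
        # If another data type shares it, the set is now wrong, and a
        # second removal of the same spec type raises KeyError.
        spec_type = self.vmappables.pop(data_type)
        self.spec_types.remove(spec_type)

    def unregister_fixed(self, data_type):
        # New behaviour as described: rebuild the set from whatever
        # vmappable types remain registered.
        del self.vmappables[data_type]
        self.spec_types = set(self.vmappables.values())
```

Rebuilding the set costs time linear in the number of registered types, which is presumably acceptable because unregistering is rare; a reference count per spec type would be an alternative design.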

PiperOrigin-RevId: 737280270
This lets us save on 2 ALU instructions (3x select becomes 1x prmt).

PiperOrigin-RevId: 737550598
These docstrings do not make the tests any clearer and typically just duplicate the test module name.

PiperOrigin-RevId: 737611977
…o vector length

We can now perform the conversion in groups of 2, 4 or even 8 elements at a time.

PiperOrigin-RevId: 737626600
This no longer appears to be used.

PiperOrigin-RevId: 737715578
Small cleanup, no functional changes intended.

PiperOrigin-RevId: 737727727
PiperOrigin-RevId: 737727935
This reduces the chances of overflowing a 32-bit integer when computing tile indices.
Add unit test to reproduce the overflow with the previous implementation of `blocked_fold_in`.
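
To make the kind of overflow concrete, here is a generic sketch of how a flattened tile index can exceed the signed 32-bit range. It does not reproduce `blocked_fold_in` or its fix, whose details are not given in this commit message. The helpers `wrap_int32` and `flat_tile_index_int32`, and the row-major indexing scheme, are assumptions made purely for illustration.

```python
INT32_MAX = 2**31 - 1

def wrap_int32(x):
    """Emulate two's-complement wraparound of a signed 32-bit integer."""
    return (x + 2**31) % 2**32 - 2**31

def flat_tile_index_exact(row, col, num_cols):
    """Row-major flat index using Python's arbitrary-precision integers."""
    return row * num_cols + col

def flat_tile_index_int32(row, col, num_cols):
    """The same index with every intermediate result wrapped to int32."""
    return wrap_int32(wrap_int32(row * num_cols) + col)

# A hypothetical tile grid large enough that the flat index no longer fits.
row, col, num_cols = 70_000, 0, 70_000
exact = flat_tile_index_exact(row, col, num_cols)
wrapped = flat_tile_index_int32(row, col, num_cols)
overflowed = exact > INT32_MAX and wrapped != exact
```

With these sizes the exact index is larger than `INT32_MAX`, so the 32-bit computation silently yields a different value; a regression test for such a bug would typically pick inputs just past that boundary.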

PiperOrigin-RevId: 737778853
The mesh is necessary to add support for clusters to the Mosaic GPU backend.

PiperOrigin-RevId: 737792129
emilyfertig and others added 10 commits March 17, 2025 16:49
…AndLoad()`.

This is to prepare for updating `PjRtClient::Compile()` to return an unloaded executable [1/N]

PiperOrigin-RevId: 737805623
XLA:GPU recently changed its endianness to little endian to better match LLVM
and the rest of the CUDA ecosystem, so we can lift the earlier restrictions.
PiperOrigin-RevId: 737934373
With default flushing, it is possible for events to be missed. We should only unsubscribe after we are finished with CUPTI.

PiperOrigin-RevId: 737939327
…MA friendly layouts

PiperOrigin-RevId: 737956598
This allows us to significantly simplify the generated PTX/SASS,
which is currently cluttered with LLVM trying to align slices to
start at bit 0 and failing to CSE the right shifts.

PiperOrigin-RevId: 737967890
Unswizzled MMAs don't lower correctly, and are not currently intended to be
supported.

PiperOrigin-RevId: 737981373
@github-actions github-actions bot requested a review from a team as a code owner March 18, 2025 15:18
@github-actions github-actions bot enabled auto-merge March 18, 2025 15:18
@charleshofer charleshofer disabled auto-merge March 18, 2025 15:19
@charleshofer charleshofer merged commit c46b4fc into rocm-main Mar 18, 2025
8 checks passed
@charleshofer charleshofer deleted the ci-upstream-sync-151_1 branch March 18, 2025 16:10
