Skip to content

Bundle specific and generic arch build targets together.#780

Open
tsrw2048 wants to merge 1 commit into
rocm-jaxlib-v0.10.0from
bundle-specific-arch
Open

Bundle specific and generic arch build targets together.#780
tsrw2048 wants to merge 1 commit into
rocm-jaxlib-v0.10.0from
bundle-specific-arch

Conversation

@tsrw2048
Copy link
Copy Markdown

This PR is needed in order to build release wheels with generic targets while also passing all unit tests. ROCPrim currently has an issue with building cub sort for generic arch targets so the specific arch target should be used here instead.

This PR keeps the specific arch build targets used in upstream wheel building but also adds the generic arch build targets that were used in rocm-jax for release wheels.

This is needed in order to build release wheels with generic targets
while also passing all unit tests. ROCPrim currently has an issue with
building cub sort for generic arch targets so the specific arch target
should be used here instead.
Copy link
Copy Markdown

@magaonka-amd magaonka-amd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks okay to me. just keep an eye on plugin wheel size.

Comment thread build/rocm/rocm.bazelrc
common:rocm --copt=-Wno-gnu-offsetof-extensions
common:rocm --copt=-Qunused-arguments
common:rocm --repo_env=TF_ROCM_AMDGPU_TARGETS="gfx908,gfx90a,gfx942,gfx950,gfx1030,gfx1100,gfx1101,gfx1200,gfx1201"
common:rocm --repo_env=TF_ROCM_AMDGPU_TARGETS="gfx908,gfx90a,gfx942,gfx950,gfx1030,gfx1100,gfx1101,gfx1200,gfx1201,gfx9-4-generic,gfx10-3-generic,gfx11-generic,gfx12-generic"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why we need all of these. Doesn't for example gfx9-4-generic cover gfx950 already?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC this most lilkely fix sort kernel issues , with max_k unit tests???

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC this most lilkely fix sort kernel issues , with max_k unit tests???

if the sort kernel issues are there, why do we add generic targets? why don't we keep specific targets until those issues are fixed?

Copy link
Copy Markdown
Author

@tsrw2048 tsrw2048 May 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gulsumgudukbay We can drop generic arch build targets if needed. I was under the impression that we wanted to add them in and test with them so we would have a smoother transition to pure generic build targets in the future. As mentioned earlier, the main issue with generic build targets right now is the max_k tests that use the sort kernel.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gulsumgudukbay We can drop generic arch build targets if needed. I was under the impression that we wanted to add them in and test with them so it would we would have a smoother transition to pure generic build targets in the future. As mentioned earlier, the main issue with generic build targets right now is the max_k tests that use the sort kernel.

maybe it is better to keep specific targets until the issue is fixed. did we have any doc to track theRock issue @magaonka-amd?
if there's a bug we shouldn't use the generic targets. originally I proposed the generic targets to rocm-jax to reduce wheel size. but if they are buggy with TheRock and since this is the downstream repo (changes like these are candidates to upstream) we should be careful (and wait) in what we upstream.

Or if our goal is to test/track if the bug is fixed in downstream CI, we can keep only generic targets. Right now it will bloat the wheel size because we have duplicate targets

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gulsumgudukbay That makes sense. I will leave these as specific targets for now and close the PR then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants