

@jerrymannil (Collaborator) commented Jul 15, 2025

cherry-pick of pytorch@e4adf5d

We need -fgpu-rdc for projects such as DeepEP + rocSHMEM. The default, -fno-gpu-rdc, does not work for such cases.

As per pytorch#152432 (comment): "rocshmem shares the same global variable in different files, as deepEP uses CUDAExtension to build the project https://github.com/deepseek-ai/DeepEP/blob/65e2a700f0330f3fb1c26f49a0250d1f9d0ac1e3/setup.py#L51 and depends on rocshmem, this -fgpu-rdc is needed. The current logic in PyTorch prevents users from overriding this flag."
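To make the override problem concrete, here is a minimal Python sketch. It is not the torch.utils.cpp_extension code: build_hip_flags, build_hip_flags_forced and effective_rdc are hypothetical names, and the sketch assumes clang's usual rule that when both -fgpu-rdc and -fno-gpu-rdc appear on a command line, the last one wins. It contrasts a merge that always appends the default (which silently cancels a user's -fgpu-rdc) with one that adds the default only when the user has not picked an RDC mode.

```python
from typing import List, Optional

# Hypothetical default a build helper might add for ROCm/HIP compiles.
DEFAULT_RDC_FLAG = "-fno-gpu-rdc"
RDC_FLAGS = ("-fgpu-rdc", "-fno-gpu-rdc")


def effective_rdc(flags: List[str]) -> bool:
    """Return True if relocatable device code would end up enabled.

    Assumption: paired -f/-fno- flags follow "last occurrence wins";
    if neither flag appears, RDC is off.
    """
    enabled = False
    for flag in flags:
        if flag == "-fgpu-rdc":
            enabled = True
        elif flag == "-fno-gpu-rdc":
            enabled = False
    return enabled


def build_hip_flags_forced(user_flags: Optional[List[str]] = None) -> List[str]:
    """Problematic pattern: the default is appended after the user's
    flags, so it overrides a user-supplied -fgpu-rdc."""
    return list(user_flags or []) + [DEFAULT_RDC_FLAG]


def build_hip_flags(user_flags: Optional[List[str]] = None) -> List[str]:
    """Override-friendly pattern: keep the user's flags and add the
    default only if the user has not chosen an RDC mode."""
    flags = list(user_flags or [])
    if any(flag in RDC_FLAGS for flag in flags):
        return flags
    return flags + [DEFAULT_RDC_FLAG]


if __name__ == "__main__":
    user = ["-O3", "-fgpu-rdc"]
    print(effective_rdc(build_hip_flags_forced(user)))  # False: user's choice lost
    print(effective_rdc(build_hip_flags(user)))         # True: user's choice kept
    print(build_hip_flags(["-O3"]))                     # ['-O3', '-fno-gpu-rdc']
```

Checking for the user's flag before adding the default keeps the default behavior unchanged for extensions that never mention RDC, while letting projects like the DeepEP + rocSHMEM case opt in.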

Pull Request resolved: pytorch#152432
Approved by: https://github.com/jeffdaily

Cherry-picked to release/2.6 branch via #2378

Cherry-picked to rocm7.0_internal_testing branch via #2379

…2432)


Co-authored-by: Jeff Daily <jeff.daily@amd.com>
@rocm-repo-management-api

Jenkins build for 3e255dfa8335cc9c40699571b5afbdc586331274 commit is in progress

@jithunnair-amd merged commit cd0f7aa into release/2.7 on Jul 16, 2025
0 of 2 checks passed
@jithunnair-amd deleted the 2.7_gpu_rdc_flag_support branch on July 16, 2025 03:10
@jithunnair-amd changed the title from "[ROCm] cpp_extension allow user to override default flags (#152432)" to "[release/2.7] [ROCm] cpp_extension allow user to override default flags (#152432)" on Jul 16, 2025
@jerrymannil (Collaborator, Author)

! cherry-pick --onto release/2.6 rocm7.0_internal_testing

okakarpa pushed a commit that referenced this pull request Jul 16, 2025
…2432) (#2374)

cherry-pick of
pytorch@e4adf5d

We need -fgpu-rdc for projects such as DeepEP + rocSHMEM. The default,
-fno-gpu-rdc, does not work for such cases.

As per
pytorch#152432 (comment):
"rocshmem shares the same global variable in different files, as deepEP
uses CUDAExtension to build the project
https://github.com/deepseek-ai/DeepEP/blob/65e2a700f0330f3fb1c26f49a0250d1f9d0ac1e3/setup.py#L51
and depends on rocshmem, this -fgpu-rdc is needed. The current logic in
PyTorch prevents users from overriding this flag."

Pull Request resolved: pytorch#152432
Approved by: https://github.com/jeffdaily

Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
okakarpa pushed a commit that referenced this pull request Jul 16, 2025
@okakarpa (Collaborator)

jerrymannil added a commit that referenced this pull request Jul 16, 2025
jerrymannil added a commit that referenced this pull request Jul 16, 2025

4 participants