Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add supports for AMD GPU #6161

Merged
merged 7 commits into from
May 24, 2023
Merged

Add supports for AMD GPU #6161

merged 7 commits into from
May 24, 2023

Conversation

yaomingamd
Copy link
Contributor

@yaomingamd yaomingamd commented Mar 16, 2023

Fixes # .

Description

Add supports for AMD GPU:
1. add docker file "Dockerfile.amd " for creating docker image to support AMD GPU. The file is based on "Dockerfile" for NVIDIA GPU
2. In monai/_extensions/gmm/gmm_cuda.cu , replaces __shfl_down_sync and __shfl_xor_sync with __shfl_down and __shfl_xor when build for AMD GPU.

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

yaomingamd and others added 3 commits March 15, 2023 22:18
     1. replace __shfl_down_sync(0xffffffff, ...) with __shfl_down(...)
                __shfl_xor_sync(0xffffffff,...) with __shfl_xor(...)
        for AMD ROCm HIP
     2. replace dim3 integer vector initialization { , , } with dim3( , , )
        for AMD ROCm HIP
@wyli
Copy link
Member

wyli commented Mar 21, 2023

thanks for the PR, we don't currently have the ci/cd resources to verify/test the changes, is there any ci runners we can use to test and for future maintenance?

@yaomingamd
Copy link
Contributor Author

Thanks for the quick response. Currently we are considering different options to run the ci/cd on AMD hardware. We would like to discuss with you at your convenience, so that we can setup ci to fit current ci pipeline of MONAI project better.

@japarada
Copy link

japarada commented May 8, 2023

thanks for the PR, we don't currently have the ci/cd resources to verify/test the changes, is there any ci runners we can use to test and for future maintenance?

HI @wyli Would it be possible for us to setup a call to discuss how we can integrate this PR ? We would like to understand how we can help in setting up the CI/CD resources for validating the changes. Thank you.

@wyli
Copy link
Member

wyli commented May 15, 2023

/black

HI @wyli Would it be possible for us to setup a call to discuss how we can integrate this PR ? We would like to understand how we can help in setting up the CI/CD resources for validating the changes. Thank you.

sure @yaomingamd @japarada, let's have a call, please send your contact details to wenqil@nvidia.com I'll make a meeting time poll and invite colleagues.

@ericspod
Copy link
Member

As we discussed let's remove the Dockerfile from this PR for now since we'll refactor the existing file to allow for selection of image by variable which will let us choose AMD images. The changes otherwise are fine since they pass our tests and we'll have to assume for now our notional AMD tests will pass also. One thing that does have to be done is to sign your commits to get DCO to pass, clicking on details will explain how to do a remedial commit for that. Thanks!

yaomingamd and others added 4 commits May 24, 2023 10:48
I, Yaoming Mu <yaoming.mu@amd.com>, hereby add my Signed-off-by to this commit: 018609c
I, Yaoming Mu <yaoming.mu@amd.com>, hereby add my Signed-off-by to this commit: a8cfcf6

As discussed with MONAI team, moves DOckerfile.amd from this PR. The existing file "Dockerfile" will be refactored to allow build images for different GPUs.

Signed-off-by: Yaoming Mu <yaoming.mu@amd.com>
…duce the amount of codes. Currently AMD HIP does not support __shfl_xxx_sync functions as NVIDIA CUDA, but when mask is 0xffffffff, __shfl_xxx_sync would be replaced by __shfl_xxx for the codes in gmm_cuda.cu

run unittest tests/test_gmm.py for NVIDIA and AMD GPUs and tests passed.

Signed-off-by: Yaoming Mu <yaoming.mu@amd.com>
Signed-off-by: monai-bot <monai.miccai2019@gmail.com>
@wyli
Copy link
Member

wyli commented May 24, 2023

/build

Copy link
Member

@wyli wyli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, it looks good to me, merging for more tests..

@wyli wyli enabled auto-merge (squash) May 24, 2023 20:41
@wyli wyli merged commit b60f69e into Project-MONAI:dev May 24, 2023
33 of 37 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants