Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenMPTarget: Use OpenMP atomics for atomic_add. #6970

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

rgayatri23
Copy link
Contributor

The PR uses atomic operation from OpenMP spec instead of DESUL atomics for llvm/18 as they cause compiler segfaults.

@rgayatri23 rgayatri23 added Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos) Backend - OpenMPTarget labels Apr 30, 2024
@rgayatri23 rgayatri23 self-assigned this Apr 30, 2024
Copy link
Member

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please confirm that the issue is only affecting atomic_add and none of the other atomic functions we provide.
Also please elaborate how you got Clang to segfault. What were you building? Was it Kokkos test suite? Are you able to reproduce with a simple Kokkos::parallel_for? etc.

Copy link
Contributor

@masterleinad masterleinad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make more sense to fix this in desul directly?

@rgayatri23
Copy link
Contributor Author

Please confirm that the issue is only affecting atomic_add and none of the other atomic functions we provide. Also please elaborate how you got Clang to segfault. What were you building? Was it Kokkos test suite? Are you able to reproduce with a simple Kokkos::parallel_for? etc.

Did not try anything other than atomic_add yet.
I am trying to come up with a small example where this fails but of-course it passes for simple cases :-/
This might also be due to the use of new dynamic shared memory extensions that are used with llvm/18. The failing kernel was using that feature to atomically update main memory by reading value from scratch memory.

@rgayatri23
Copy link
Contributor Author

Wouldn't it make more sense to fix this in desul directly?

Yeah that can also work. I am fine with either.

@rgayatri23
Copy link
Contributor Author

I guess the first issue is do we want to do this if the problem only arises in very specific cases and with just one version of the compiler?
As mentioned before, the upstream llvm seems to work fine which will be the basis for llvm/19

@masterleinad
Copy link
Contributor

I guess the first issue is do we want to do this if the problem only arises in very specific cases and with just one version of the compiler?

We should try fixing for the minimum compiler version supported for OpenMPTarget or raise the minimum compiler version.

@rgayatri23
Copy link
Contributor Author

Yeah we are setting the minimum version to 17 as of now https://github.com/kokkos/kokkos/blob/develop/cmake/kokkos_enable_devices.cmake#L95-L98

17 passes most of the enabled tests on both V100 and A100. With 18, I saw a few failing tests on A100.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Backend - OpenMPTarget Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants