Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce KOKKOS_DEDUCTION_GUIDE macro to allow user-defined deduction guide in device code for clang compiler #6954

Merged
merged 7 commits into from
May 31, 2024

Conversation

tpadioleau
Copy link
Contributor

@tpadioleau tpadioleau commented Apr 18, 2024

We recently came across a compiler error when using the deduction guide of TeamThreadMDRange on a HIP compilation error

error: reference to __host__ function '<deduction guide for TeamThreadMDRange><Kokkos::Impl::HIPTeamMember, int, int>' in __host__ __device__ function

I could reproduce it on godbolt https://godbolt.org/z/GTdeY77KT. My understanding is that clang requires the user-defined deduction guide to be annotated to be used inside a device function.

Based on the suggestions on slack, this PR annotates the user-defined deduction guides using a new macro KOKKOS_DEDUCTION_GUIDE. I have used this macro to annotate guides associated to constructors annotated KOKKOS_FUNCTION.

Should we introduce tests for that ?

cc @etiennemlb

@dalg24 dalg24 changed the title Fix user-defined deduction guide used in device code Introduce KOKKOS_DEDUCTION_GUIDE macro to suppress warnings about user-defined deduction guide used in device code in HIP Apr 18, 2024
@@ -27,6 +27,8 @@
#define KOKKOS_LAMBDA [=] __host__ __device__
#define KOKKOS_CLASS_LAMBDA [ =, *this ] __host__ __device__

#define KOKKOS_DEDUCTION_GUIDE __host__ __device__
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation here is the same as for KOKKOS_FUNCTION and the deduction guides changed here are for KOKKOS_FUNCTIONs anyway. In my opinion, using KOKKOS_FUNCTION instead of introducing a new macro matches more closely the intent of deduction guides (by replicating the constructor's signature).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your point, just wondering if annotating something that is not a function by KOKKOS_FUNCTION would be confusing ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just follows the syntax for deduction guides so it's as confusing and more consistent.

edit: fixed typo syntax vs semantics

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we should limit the number of (public) macros we define since they are a pain to deal with later if we consider removing them since we can't deprecate them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

, using KOKKOS_FUNCTION instead of introducing a new macro matches more closely the intent of deduction guides (by replicating the constructor's signature).

I agree with @masterleinad on this.

I don't really see how introducing a new macro improves the code. Worse it makes a bug more likely. Both this PR and mdspan fix the issue for HIP but not for CUDA clang...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe though to assume that compilers need the same annotations between deduction guides and constructors ?

That's what we are observing, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far yes i agree

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just follows the semantics for deduction guides

I don't understand that argument. Did you mean syntax? Please clarify.

To the consistency argument, I would like to point out there is prior art in Kokkos, with KOKKOS_DEFAULTED_FUNCTION and KOKKOS_INLINE_DELETED_FUNCTION

(Blame game for naming that `KOKKOS_INLINE_DELETED_FUNCTION` macro) https://github.com/kokkos/kokkos/commit/3a439317142573f0f6b39e2cd61934fb5e56ca72

You could argue that these are not needed any more and we are stuck with them, but I also want to look back and see that these were also special syntaxes for the compiler for which it was also not clear what to do about __host__ __device__ annotations (especially for deleted functions!). Experience has shown that compilers behavior changed and we had to discriminate workarounds based on vendor toolchain versions.

If we use KOKKOS_FUNCTION to decorate our deduction guides, teach our users to do so, and there is even only one compiler that we need to support and that does not like the host device annotation, then we will have to do the distinction. We will be able to fix it in our codebase but it will be harder to propagate downstream.
That said I have no evidence it will break and I haven't surveyed what currently works.

Worse it makes a bug more likely. Both this PR and mdspan fix the issue for HIP but not for CUDA clang...

I don't see how it makes it more bug prone. Please elaborate.
Regarding Clang+CUDA, did you check that we see the issue when the annotation are missing. I would be surprised if that was not considered in the reference implementation of mdspan. That is possible but I expect someone would have asked the question. @crtrott do you remember?

I don't like the proliferation of public macros either but I tend to think it is justified in this case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how it makes it more bug prone. Please elaborate.

Different people wrote different versions of this PR and they both forgot to fix CUDA clang. If you didn't introduce the new macro and instead use KOKKOS_FUNCTION, the PR would have fixed the issue.

Regarding Clang+CUDA, did you check that we see the issue when the annotation are missing

If you look at the godbolt code that @tpadioleau posted. You can see that he is using CUDA clang to reproduce the error not HIP.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding Clang+CUDA, did you check that we see the issue when the annotation are missing

If you look at the godbolt code that @tpadioleau posted. You can see that he is using CUDA clang to reproduce the error not HIP.

Haha damn. Missed the compiler explorer link, I was on my cell.

@tpadioleau
Copy link
Contributor Author

tpadioleau commented Apr 18, 2024

@dalg24 Note that it is not just a warning but an error, I have updated the compiler message.

@dalg24
Copy link
Member

dalg24 commented Apr 18, 2024 via email

@tpadioleau tpadioleau changed the title Introduce KOKKOS_DEDUCTION_GUIDE macro to suppress warnings about user-defined deduction guide used in device code in HIP Introduce KOKKOS_DEDUCTION_GUIDE macro to allow user-defined deduction guide in device code in HIP Apr 18, 2024
@tpadioleau tpadioleau marked this pull request as draft April 18, 2024 13:50
@tpadioleau
Copy link
Contributor Author

Back in draft the time to add tests

@crtrott
Copy link
Member

crtrott commented Apr 19, 2024

I believe a new macro is warranted. NVCC currently does not require the markup on constructors (though it works with them too). But my experience with markup on defaulted ctors is a thing which I believe warrants caution here. NVCC changed its requirements regarding requiring/allowing markup on defaulted functions over time. And generally speaking: deduction guides are NOT functions. In fact the signatures don't even need to match the signatures of any existing CTOR.

E.g. look at this: https://godbolt.org/z/fYPhK5Ec5

template <class T>
struct Foo {
    __host__ __device__ Foo(float ,int = 0) {}
};

template <class T>
__host__ __device__ Foo(T) -> Foo<T>; 

The deduction guide looks nothing like the ctor, but it will work as long as you try to construct from something convertible to float.

So it's not a function: it's a rule. Introducing a new macro for that seems warranted to me.

@Rombur
Copy link
Member

Rombur commented Apr 19, 2024

In fact the signatures don't even need to match the signatures of any existing CTOR.

In that case, I agree that we should use two different macros.

@tpadioleau tpadioleau force-pushed the fix-device-deduction-guide-error branch 6 times, most recently from 40f8bed to da61c12 Compare April 23, 2024 12:49
@tpadioleau tpadioleau marked this pull request as ready for review April 23, 2024 15:30
@tpadioleau tpadioleau changed the title Introduce KOKKOS_DEDUCTION_GUIDE macro to allow user-defined deduction guide in device code in HIP Introduce KOKKOS_DEDUCTION_GUIDE macro to allow user-defined deduction guide in device code for clang compiler Apr 23, 2024
@crtrott
Copy link
Member

crtrott commented Apr 23, 2024

There is a real issue with OpenMPTarget @rgayatri23 can you take a look too?

@crtrott
Copy link
Member

crtrott commented Apr 23, 2024

@tpadioleau the OpenMPTarget failure looks real. Not sure when @rgayatri23 will have time so if you can track it down that would be great. Possibly we can just disable the test for now for OpenMPTarget.

Comment on lines 159 to 167
using ReducerTypeExpected =
Kokkos::MaxFirstLocCustomComparator<ScalarType, IndexType,
FakeComparator, MemorySpace>;
using ValueType = ReducerTypeExpected::value_type;
Kokkos::View<ValueType, MemorySpace> view;
FakeComparator comparator;
Kokkos::MaxFirstLocCustomComparator reducer(view, comparator);
check_types<ReducerTypeExpected>(team_handle, reducer);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenMPTarget only supports

struct OpenMPTargetReducerWrapper<Sum<Scalar, Space>> {
struct OpenMPTargetReducerWrapper<Prod<Scalar, Space>> {
struct OpenMPTargetReducerWrapper<Min<Scalar, Space>> {
struct OpenMPTargetReducerWrapper<Max<Scalar, Space>> {
struct OpenMPTargetReducerWrapper<LAnd<Scalar, Space>> {
struct OpenMPTargetReducerWrapper<LOr<Scalar, Space>> {
struct OpenMPTargetReducerWrapper<BAnd<Scalar, Space>> {
struct OpenMPTargetReducerWrapper<BOr<Scalar, Space>> {
struct OpenMPTargetReducerWrapper<MinLoc<Scalar, Index, Space>> {
struct OpenMPTargetReducerWrapper<MaxLoc<Scalar, Index, Space>> {
struct OpenMPTargetReducerWrapper<MinMax<Scalar, Space>> {
struct OpenMPTargetReducerWrapper<MinMaxLoc<Scalar, Index, Space>> {
struct OpenMPTargetReducerWrapper<MaxFirstLoc<Scalar, Index, Space>> {
struct OpenMPTargetReducerWrapper<MinFirstLoc<Scalar, Index, Space>> {
struct OpenMPTargetReducerWrapper<MinMaxFirstLastLoc<Scalar, Index, Space>> {
struct OpenMPTargetReducerWrapper<FirstLoc<Index, Space>> {
struct OpenMPTargetReducerWrapper<LastLoc<Index, Space>> {
struct OpenMPTargetReducerWrapper<StdIsPartitioned<Index, Space>> {
struct OpenMPTargetReducerWrapper<StdPartitionPoint<Index, Space>> {

at this point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, we will have to implement MaxFirstLocCustomComparator for the OpenMPTarget backend.

@rgayatri23
Copy link
Contributor

@tpadioleau - Can you disable that test for the OpenMPTarget backend for now?

@tpadioleau
Copy link
Contributor Author

@tpadioleau - Can you disable that test for the OpenMPTarget backend for now?

Done in d954dc4

I also notice a failure on the OpenACC backend, should I disable it ?

Comment on lines 59 to 61
#if defined(__clang__)
#define KOKKOS_DEDUCTION_GUIDE __host__ __device__
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer if we removed the guard. nvcc seems to be fine with it as well, see https://godbolt.org/z/fY1T7TE6T.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with both solutions though this introduces a change for a working compiler.

@masterleinad
Copy link
Contributor

masterleinad commented May 9, 2024

I also notice a failure on the OpenACC backend, should I disable it ?

The errors look like

/var/jenkins/workspace/Kokkos_PR-6954/core/src/OpenACC/Kokkos_OpenACC_ParallelReduce_Team.hpp", line 242: error: no instance of constructor "Kokkos::Sum<Scalar, Space>::Sum [with Scalar=<unnamed>::TestNestedReducerCTAD::ScalarType, Space=<unnamed>::TestNestedReducerCTAD::MemorySpace]" matches the argument list
    ValueType tmp = ValueType();
                    ^
          detected during:
            instantiation of "void Kokkos::parallel_reduce(const Kokkos::Impl::TeamVectorRangeBoundariesStruct<iType, Kokkos::Impl::OpenACCTeamMember> &, const Lambda &, ValueType &) [with iType=int, Lambda=<unnamed>::TestNestedReducerCTAD::FakeFunctor<int>, ValueType=const Kokkos::Sum<<unnamed>::TestNestedReducerCTAD::ScalarType, <unnamed>::TestNestedReducerCTAD::MemorySpace>]" at line 46 of "/var/jenkins/workspace/Kokkos_PR-6954/core/unit_test/TestNestedReducerCTAD.cpp"
            instantiation of "void <unnamed>::TestNestedReducerCTAD::check_types<ReducerTypeExpected,TeamHandle,ReducerTypeToCheck>(const TeamHandle &, const ReducerTypeToCheck &) [with ReducerTypeExpected=Kokkos::Sum<<unnamed>::TestNestedReducerCTAD::ScalarType, <unnamed>::TestNestedReducerCTAD::MemorySpace>, TeamHandle=<unnamed>::TestNestedReducerCTAD::TeamHandle, ReducerTypeToCheck=Kokkos::Sum<<unnamed>::TestNestedReducerCTAD::ScalarType, <unnamed>::TestNestedReducerCTAD::MemorySpace>]" at line 55 of "/var/jenkins/workspace/Kokkos_PR-6954/core/unit_test/TestNestedReducerCTAD.cpp"

and we have to do something about it. If we can't figure it out, we could consider disabling the test for OpenACC.

@nliber
Copy link
Contributor

nliber commented May 29, 2024

@masterleinad @tpadioleau I think the error in #6954 (comment) may be a legitimate bug unrelated to deduction guides.

ValueType is deducing to Sum<...> (which doesn't have a default constructor), but there is no OpenACC parallel_reduce for TeamVectorRangeBoundariesStruct with a constrained overload for a reducer (at least not one that I found in core/src/OpenACC/Kokkos_OpenACC_ParallelReduce_Team.hpp).

Contrast this with the parallel_reduce overloads for TeamThreadRangeBoundariesStruct and ThreadVectorRangeBoundariesStruct which do have different overloads for reducers and non-reducers.

(I'm approving the P/R as I don't think the error is related to deduction guides, but it still needs to be addressed before merging.)

@masterleinad
Copy link
Contributor

@masterleinad @tpadioleau I think the error in #6954 (comment) may be a legitimate bug unrelated to deduction guides.

I was sure that it was unrelated to this pull request. 🙂 We still have to fix the CI by disabling the test with a FIXME for OpenACC.

@tpadioleau
Copy link
Contributor Author

@masterleinad @nliber Sorry for my late reply and thank you for looking at the issue.

In the meantime the OpenACC issue is solved I could have a look at how to slightly change the test. I have in mind to remove the inner parallel_reduce. I know that I could not remove the outer parallel_for without breaking the test.

@masterleinad
Copy link
Contributor

In the meantime the OpenACC issue is solved I could have a look at how to slightly change the test. I have in mind to remove the inner parallel_reduce. I know that I could not remove the outer parallel_for without breaking the test.

I don't think anyone is working on a fix for the OpenACC issue.

@tpadioleau tpadioleau force-pushed the fix-device-deduction-guide-error branch from ca5a08a to 99134c6 Compare May 30, 2024 07:06
@tpadioleau tpadioleau force-pushed the fix-device-deduction-guide-error branch from 2f9e42a to 2910db4 Compare May 30, 2024 07:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants