[cudax] Implement _this_ hierarchy groups (#7603)
Conversation
Force-pushed from 552790d to f7ed1c5.
```cpp
_CCCL_TEMPLATE(class _HierarchyLike)
_CCCL_REQUIRES(__is_or_has_hierarchy_member_v<_HierarchyLike>)
[[nodiscard]] _CCCL_DEVICE_API auto this_thread(const _HierarchyLike& __hier_like) noexcept
```
Stream of thoughts: technically, we don't have to carry over the choice to use `this_X` functions.
Options include:
- `this_group(level)`, `level.group()`, or other variations along those lines
- Just constructors

But there is also no issue with it staying as is.
Even though I'd love to use something more generic (like the options you listed), I find

```cpp
auto warp = cuda::this_warp(hierarchy);
```

more readable than any of:

```cpp
// sounds like we are grouping all warps in the hierarchy
auto warp = cuda::warp.group(hierarchy);
// I don't know what's going on at first glance
auto warp = cuda::this_group(cuda::warp, hierarchy);
// it's not clear that it's the group of the current warp
cuda::warp_group warp{hierarchy};
```

I would love to make `level.group(h)` work, but I don't find it easy to read and understand. Even `level.this_group(h)` doesn't seem better to me.
I would keep it as is for now :)
```cpp
};

template <class _Hierarchy>
class thread_group<_Hierarchy, __this_hierarchy_group_kind> : __this_hierarchy_group_base<thread_level, _Hierarchy>
```
Technically, `thread_group` in CG was a general polymorphic type covering all kinds of groups. It fits the naming scheme better for it to mean a single thread, but that will be confusing. We don't need to resolve it right now, but I wanted to note it down.
In general, I don't know whether we still need that polymorphic type, or whether we need a single-thread group.
That's a good point, but we will need `thread_group` for the `cub::ThreadMeow` algorithms in the future.
As you say, I would deal with this later :)
🥳 CI Workflow Results: 🟩 Finished in 14m 24s. Pass: 100%/48 | Total: 3h 24m | Max: 14m 03s | Hits: 99%/23529
```cpp
_CCCL_DEVICE_API inline void __grid_sync()
{
  const auto __bar_ptr = &::cuda::experimental::__cg_imported::__grid_workspace_ptr()->__barrier_;
```
Should we store this in the group?
CG also validated it in case someone forgot to use a cooperative launch, so the kernel wouldn't just fault on the device. We could do that validation too, but maybe skip it when the group is constructed from a config object that carries the cooperative-launch option?
I don't see a reason why we should store the pointer when it's available through the envregs.
Regarding the validation, I moved it to the place where we obtain the pointer from the envregs :)
We would need to check the generated code, but I believe the compiler can't keep the result in registers across calls, so it will generate a full constant-bank read every time you call a sync, and repeat the validity check every time as well. I am pretty sure storing it in a register will end up being faster, at the cost of extra registers.
* [cudax] Implement this hierarchy groups
* fix grid sync
* fixes
Closes #7618.