
Generalizing GPU Indexing PR changes #1422

Closed
MrBurmark opened this issue Jan 10, 2023 · 2 comments

MrBurmark commented Jan 10, 2023

This is a capture of some of the discussion in our meeting on 1/10.
This refers to #1334.
Please add anything I forgot to mention.

  • Break the work into separate PRs for the forall changes and the kernel changes.
  • Add separate policies for scan and sort, because the new forall policies contain thread-mapping information that is not relevant to scan and sort (though it could become relevant if we ever implemented them ourselves).
  • Kernel policies can currently specify thread mapping without block mapping, but both should always be mapped.
    • Use global with grid_size set to 1 to specify a single block, instead of a thread-only policy.
  • The kernel launch policy can be inconsistent with how it is used.
    • The policies inside the launch may require a larger block size than the launch policy specifies.
  • Kernel policies can contradict each other, leading to incorrect mapping behavior.
    • For example, this policy snippet would launch a kernel with block size 256, but the first `for` would map incorrectly because it expects a block size of 128:
      `for< direct_thread_x<128>, lambda<0> >, for< direct_thread_x<256>, lambda<0> >`
  • Optimize policies with grid_size or block_size set to 1, since the index is then always 0.
@rhornung67

This covers everything I can think of. Thanks.

@rhornung67

@MrBurmark can this be closed?
