
Generalizing GPU Indexing PR changes #1422

Closed
MrBurmark opened this issue Jan 10, 2023 · 2 comments

MrBurmark commented Jan 10, 2023

This is a capture of some of the discussion in our meeting on 1/10.
This refers to #1334.
Please add anything I forgot to mention.

  • Break the work into separate PRs for the forall changes and the kernel changes.
  • Add separate policies for scan and sort, because the new forall policies contain thread-mapping information that is not relevant to scan and sort (though it could become relevant if we ever implemented them ourselves).
  • Kernel policies can currently specify thread mapping without block mapping, but both should always be mapped.
    • Use global with grid_size set to 1 to specify a single block, instead of a thread-only policy.
  • The kernel launch policy can be inconsistent with how it is used.
    • The policies inside the launch may require a larger block size than the launch policy specifies.
  • Kernel policies can contradict each other, leading to incorrect mapping behavior.
    • For example, this policy snippet would launch a kernel with block size 256, but the first `for` would map incorrectly because it expects a block size of 128:
      `for< direct_thread_x<128>, lambda<0> >, for< direct_thread_x<256>, lambda<0> >`
  • Optimize policies with grid_size or block_size set to 1, since the index is then always 0.
@rhornung67

This covers everything I can think of. Thanks.

@rhornung67

@MrBurmark can this be closed?
