Implement register stealing #3584

@zasdfgbnm

Change our warp-specialized kernel to something like:

if load-warp:
  asm volatile("{setmaxnreg.dec.sync.aligned.u32 56; \n\t}");
  do the work
  return; # Super important!
else:
  asm volatile("{setmaxnreg.inc.sync.aligned.u32 224; \n\t}");
  do the work

This seems to improve performance; see the experiment in #3566.
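
For intuition on why 56 and 224 could be a sensible pair, here is a small host-side sketch of the register budget. It rests on assumptions the issue does not state: one CTA per SM, made of one load warp group and two compute warp groups of 128 threads each, on an SM with 64K 32-bit registers. The helper names (`is_valid_setmaxnreg`, `registers_used`, `uniform_budget`) are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>

// 64K 32-bit registers per SM (true for Hopper-class SMs).
constexpr std::uint32_t kRegistersPerSm = 64 * 1024;
// A warp group is 4 warps of 32 threads.
constexpr std::uint32_t kThreadsPerWarpGroup = 128;

// ASSUMPTION (not from the issue): 1 load + 2 compute warp groups per CTA,
// and one CTA resident per SM.
constexpr std::uint32_t kLoadWarpGroups = 1;
constexpr std::uint32_t kComputeWarpGroups = 2;

// Register counts taken from the pseudocode in the issue.
constexpr std::uint32_t kLoadRegs = 56;
constexpr std::uint32_t kComputeRegs = 224;

// setmaxnreg accepts an immediate in [24, 256] that is a multiple of 8.
constexpr bool is_valid_setmaxnreg(std::uint32_t n) {
  return n >= 24 && n <= 256 && n % 8 == 0;
}

// Total registers held by the CTA after every warp group has resized.
constexpr std::uint32_t registers_used(std::uint32_t load_groups,
                                       std::uint32_t load_regs,
                                       std::uint32_t compute_groups,
                                       std::uint32_t compute_regs) {
  return kThreadsPerWarpGroup *
         (load_groups * load_regs + compute_groups * compute_regs);
}

// Per-thread budget if registers were split evenly, rounded down to a
// multiple of 8.
constexpr std::uint32_t uniform_budget(std::uint32_t warp_groups) {
  return (kRegistersPerSm / (warp_groups * kThreadsPerWarpGroup)) / 8 * 8;
}

static_assert(is_valid_setmaxnreg(kLoadRegs), "56 must be a legal count");
static_assert(is_valid_setmaxnreg(kComputeRegs), "224 must be a legal count");
// The resized CTA must still fit in the SM's register file.
static_assert(registers_used(kLoadWarpGroups, kLoadRegs, kComputeWarpGroups,
                             kComputeRegs) <= kRegistersPerSm,
              "register budget exceeds the SM register file");
```

Under these assumptions an even split gives each thread 168 registers, and 56 + 2 * 224 equals 3 * 168, so the load warp group gives up exactly the registers that the two compute warp groups take. With a different warp-group layout, the same check can be rerun with other counts.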
