
Align shared memory in fold & scan (only shuffle) #96

Draft

Wants to merge 1 commit into master
ivogabe (Contributor) commented Nov 17, 2023

Description

This PR ensures that allocations of shared memory are properly aligned.

Motivation and context

Previously, shared memory was allocated without any padding. As a result, loads and stores could be misaligned, for instance when scanning an array containing (Bool, Int).

In particular, this can occur in the implementation of segmented scans. Segmented scans are typically implemented by pairing each value with a flag, as (Bool, a). However, suppose the pair is encoded as (a, Bool) instead, and the size of the allocated array is not a multiple of the alignment of a. In that case this bug is triggered, and reads from the array of a values will be misaligned.

This PR only fixes the issue for folds and scans that use warp shuffle instructions.
Fixing it for the folds and scans used on older hardware is possible, but probably not worth it given the age of that hardware and the complexity of the fix. I therefore propose dropping support for compute capabilities below 3.0, which lack warp shuffle instructions.

How has this been tested?

Using various applications of scans, including segmented scans defined with (a, Bool), on our RTX 4090.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
