Feature/byte array #1573

maddyscientist · 2025-06-10T19:06:03Z

Small feature PR that adds a new class, byte_array, which is an indexable array type holding four 8-bit values (signed / unsigned chars). This is useful because it's exactly what we use in the various kernels that do path tracing around the links of the lattice.

This class can be used in place of thread_array. The advantages of byte_array are that it is lower latency and does not consume any shared memory, which leaves more L1 cache available and can be helpful for performance. On the kernels I tested, I saw either performance neutrality or a perf uplift.
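To make the idea concrete, here is a minimal sketch of how such a type could work, assuming the four bytes are packed into a single 32-bit word (matching the length-4 description in the commit list). The names packed_byte_array and step are hypothetical and this is not the implementation added by this PR; it only illustrates why run-time indexing of such a type needs neither local memory nor shared memory.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical sketch, not QUDA's actual byte_array: four 8-bit values
// packed into one 32-bit word, so the whole array fits in a single register.
template <typename T>
class packed_byte_array {
  static_assert(std::is_integral<T>::value && sizeof(T) == 1,
                "T must be an 8-bit integer type");
  std::uint32_t data = 0;

public:
  static constexpr int size() { return 4; }

  // Read element i by shifting its byte down; i may be a run-time value.
  T get(int i) const { return static_cast<T>((data >> (8 * i)) & 0xffu); }

  // Write element i by clearing its byte and OR-ing in the new value.
  void set(int i, T value) {
    const std::uint32_t shift = 8u * static_cast<std::uint32_t>(i);
    const std::uint32_t byte = static_cast<std::uint8_t>(value);
    data = (data & ~(0xffu << shift)) | (byte << shift);
  }
};

// Hypothetical usage: move a 4-d coordinate by delta in direction mu.
inline packed_byte_array<signed char> step(packed_byte_array<signed char> x,
                                           int mu, signed char delta) {
  x.set(mu, static_cast<signed char>(x.get(mu) + delta));
  return x;
}
```

Because every access reduces to shifts and masks on one word, a run-time index never forces the compiler to spill an array to memory, which is the property the description above relies on.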

Also included here:

Remove use of KernelOps where possible, since using byte_array instead of thread_array removes the need for shared-memory storage
Fix stack frame induced in DeGrandRossi contraction kernel (was due to some run-time indexing)
Fix stack frame induced in pure-gauge heat bath kernel (was due to run-time indexing in a local array)
Fix stack frame in plaquette rectangle action (was due to run-time indexing and running out of registers)
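For readers unfamiliar with why run-time indexing induces a stack frame: a local array subscripted with a value unknown at compile time generally cannot be kept in registers, so the compiler places it in local memory. One common workaround, sketched below, is to use only compile-time subscripts and select the wanted element by comparison. The helper get_element is hypothetical, and this is not necessarily the approach taken in these commits.

```cpp
#include <cassert>

// Hypothetical helper: read element i of a small fixed-size array using only
// compile-time subscripts once the loop is unrolled, so the array can stay
// in registers instead of being spilled to a stack frame.
template <int N>
inline double get_element(const double (&arr)[N], int i) {
  double result = arr[0];
  // Fixed trip count: the compiler can fully unroll this loop
  // (in CUDA code one would typically add "#pragma unroll" here).
  for (int j = 1; j < N; j++) {
    if (i == j) result = arr[j];
  }
  return result;
}
```

The trade-off is N - 1 comparisons per access, which is usually cheap for the very short arrays involved, whereas a local-memory round trip is not.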

maddyscientist · 2025-06-10T22:21:43Z

@jcosborn FYI, this PR removes the use of thread_array (and thus KernelOps) from a whole bunch of kernels. This approach is faster, though, and uses fewer resources.

include/kernels/contraction.cuh

weinbe2

This looks good and passes my various tests (including correctness) with large (> 255) local dimensions, focusing on fat/long link construction. I'm seeing ~perf parity on H100-80GB-SXM5.

This has my approval conditional on csci tests passing, but (last I checked) it seems like it's having domain resolution issues...

weinbe2 · 2025-06-16T15:59:23Z

Update: I see that these issues are identical to the ones being tracked in #1572 and are orthogonal to this PR. All other tests are passing, so I'm going to merge this PR.

maddyscientist added 7 commits June 10, 2025 11:36

Add new byte_array type which is an indexable length-4 array of bytes.

b03f241

Replace use of thread_array with byte_array

9ceec1f

Use byte_array for gauge_heatbath: this fixes the stackframe

3c3be12

Fix stack frame with DeGrandRossi contraction

e2b3118

llfat compute staple kernel now uses byte_array

e9e76c5

Fix clang warning

5ae55ed

Fix clang warning

55bb2d5

maddyscientist added this to the QUDA 2.0 milestone Jun 10, 2025

maddyscientist assigned weinbe2 Jun 10, 2025

maddyscientist requested a review from a team as a code owner June 10, 2025 19:06

weinbe2 reviewed Jun 16, 2025

include/kernels/contraction.cuh (resolved)

weinbe2 approved these changes Jun 16, 2025

weinbe2 merged commit ffc5b94 into develop Jun 16, 2025
5 of 6 checks passed

weinbe2 deleted the feature/byte_array branch June 16, 2025 15:59
