What should the memory operand size be for SVE predicated contiguous loads and stores #6561

Open
jackgallagher-arm opened this issue Jan 12, 2024 · 0 comments

jackgallagher-arm (Collaborator) commented Jan 12, 2024

Unpredicated loads and stores use the full access size (the size of the whole vector register) as the memory operand size, whereas scatter/gather instructions use the per-element transfer size.

We treat the predicated contiguous load/store instructions the same way as scatter/gather instructions in drx_expand_scatter_gather(), and drcachesim handles them in a similar way. It is therefore more convenient if they follow the same convention as the scatter/gather instructions, but other tools might have different needs.
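To make the two candidate conventions concrete, here is a minimal sketch in plain C. It does not use the DynamoRIO API: `sve_access_t` and all three functions are hypothetical names, and the predicate is simplified to one bit per element (real SVE predicate registers hold one bit per vector byte). It only illustrates that neither convention by itself reports the number of bytes a predicated access actually touches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one predicated contiguous access (not DynamoRIO API). */
typedef struct {
    size_t vl_bytes;    /* vector length in bytes, e.g. 32 for a 256-bit vector */
    size_t elem_bytes;  /* per-element transfer size, e.g. 4 for a word access */
    uint64_t pred_mask; /* simplified predicate: bit i set means element i is active */
} sve_access_t;

/* Convention used for unpredicated accesses: the maximum total transfer size. */
static size_t
opnd_size_full_vector(const sve_access_t *a)
{
    return a->vl_bytes;
}

/* Convention used for scatter/gather: the per-element transfer size. */
static size_t
opnd_size_per_element(const sve_access_t *a)
{
    return a->elem_bytes;
}

/* Bytes actually transferred: one element's worth per active predicate bit. */
static size_t
bytes_accessed(const sve_access_t *a)
{
    assert(a->elem_bytes > 0 && a->vl_bytes % a->elem_bytes == 0);
    size_t num_elems = a->vl_bytes / a->elem_bytes;
    assert(num_elems <= 64); /* limit of the simplified 64-bit mask */
    size_t total = 0;
    for (size_t i = 0; i < num_elems; i++) {
        if ((a->pred_mask >> i) & 1)
            total += a->elem_bytes;
    }
    return total;
}
```

For a 32-byte vector of 4-byte elements with only elements 0 and 2 active, the full-vector convention reports 32, the per-element convention reports 4, and the access itself touches 8 bytes, so a tool needs the predicate as well as the operand size under either convention.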

@derekbruening commented:

Hmm, it's seeming like we want dr_opnd_query_flags_t for the size now. For predicated contiguous, a taint-tracking tool (such as Dr. Memory) might want the max size for loads when checking taint bits (on a fastpath anyway) but would have to loop over the per-element for stores when setting taint bits. Hmm. I guess the slowpath would loop too. Maybe file an issue on this predicated size problem covering all platforms, and go ahead w/ your current plan for now?

This is similar to the half-register and holes-in-register complex SIMD interleaving operations: xref #1382, #6218.

Xref #5638.

Originally posted by @derekbruening in #6544 (comment)
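The asymmetry between loads and stores described in the quoted comment can be sketched as follows. This is a toy model under stated assumptions, not Dr. Memory's implementation: the `shadow` array, `load_range_maybe_tainted()` and `store_set_taint()` are hypothetical, shadow state is one byte per application byte, and the predicate is simplified to one bit per element.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHADOW_BYTES 256 /* hypothetical toy address space */

/* One shadow byte per application byte: 1 = tainted, 0 = clean. */
static uint8_t shadow[SHADOW_BYTES];

/* Load fastpath: a conservative check over the maximum possible range.
 * A false result means no active element can read tainted data; a true
 * result means a per-element slowpath would still be needed.
 */
static bool
load_range_maybe_tainted(size_t base, size_t vl_bytes)
{
    assert(base + vl_bytes <= SHADOW_BYTES);
    for (size_t i = 0; i < vl_bytes; i++) {
        if (shadow[base + i] != 0)
            return true;
    }
    return false;
}

/* Store: must loop per element, because inactive elements leave memory
 * (and therefore its shadow state) unchanged.
 */
static void
store_set_taint(size_t base, size_t vl_bytes, size_t elem_bytes,
                uint64_t pred_mask, bool tainted)
{
    assert(elem_bytes > 0 && vl_bytes % elem_bytes == 0);
    assert(vl_bytes / elem_bytes <= 64); /* limit of the simplified mask */
    assert(base + vl_bytes <= SHADOW_BYTES);
    for (size_t e = 0; e < vl_bytes / elem_bytes; e++) {
        if ((pred_mask >> e) & 1)
            memset(&shadow[base + e * elem_bytes], tainted ? 1 : 0, elem_bytes);
    }
}
```

In this model the load check wants the maximum total size while the store update wants the per-element size, which is why a single fixed operand size cannot serve both uses equally well.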

jackgallagher-arm added a commit that referenced this issue Jan 23, 2024
This makes the IR consistent with x86 which already uses the per-element
transfer size for the scatter/gather memory operand size.

Issues: #5365, #5036, #6561
jackgallagher-arm added a commit that referenced this issue Jan 24, 2024
#6574)

Make the AArch64 IR consistent with x86 which already uses the
per-element transfer size for the scatter/gather memory operand
size.

This changes the AArch64 codec for the scatter/gather and predicated
contiguous load/store instructions to use the per-element access size
for the memory operand instead of the maximum total transfer size
that it used previously, and updates the tests accordingly.
    
Issues: #5365, #5036, #6561