i#6760 AArch64: Use smaller data types for SVE P and FFR registers #6774

Merged: 9 commits, Apr 15, 2024

jackgallagher-arm
Collaborator

PR #6757 fixed the way we read and write SVE register slots, but unfortunately the fix is broken on systems with a 128-bit vector length.

Both SVE vector and predicate registers use dr_simd_t slots. dr_simd_t is a 64-byte type meant to store vector registers of up to 512 bits. SVE predicate registers are always 1/8 the size of the vector registers, so on systems with a 512-bit vector length we only need 64 / 8 = 8 bytes to store each predicate register.

The ldr/str instructions we use to read and write the predicate register slots take a base+offset memory operand whose offset is a value in the range [-256, 255] scaled by the predicate register length. We read and write the registers by setting the base address to the address of the first slot and setting the offset to n * sizeof(dr_simd_t) for each register Pn.
On systems with a 128-bit vector length the predicate registers are 16 / 8 = 2 bytes, so the maximum offset we can reach is 2 * 255 = 510 bytes. This means that on 128-bit VL systems we can only reach the first 8 predicate registers (P0 to P7), because the slot for P8 sits at offset 8 * sizeof(dr_simd_t) = 512.
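The reach arithmetic above can be sketched as a small C model. This is only an illustration of the numbers in the description, not DynamoRIO code: the function names and the `IMM_MAX` constant are hypothetical, and the model assumes the offset must be an exact multiple of the predicate register length.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: largest positive immediate, from the [-256, 255] range quoted above. */
enum { IMM_MAX = 255, NUM_P_REGS = 16 };

/* A predicate register is 1/8 the size of a vector register. */
size_t
pred_reg_bytes(size_t vl_bits)
{
    assert(vl_bits >= 128 && vl_bits % 128 == 0);
    return (vl_bits / 8) / 8;
}

/* Largest positive byte offset the scaled immediate can express. */
size_t
max_reach_bytes(size_t vl_bits)
{
    return (size_t)IMM_MAX * pred_reg_bytes(vl_bits);
}

/* Counts how many of P0..P15 can be reached from the first slot's address
 * when slot n lives at byte offset n * slot_bytes. */
int
reachable_pred_regs(size_t vl_bits, size_t slot_bytes)
{
    size_t pl = pred_reg_bytes(vl_bits);
    int n = 0;
    while (n < NUM_P_REGS) {
        size_t offset = (size_t)n * slot_bytes;
        /* The offset must be a multiple of the predicate length and fit the immediate. */
        if (offset % pl != 0 || offset / pl > (size_t)IMM_MAX)
            break;
        n++;
    }
    return n;
}
```

Calling `reachable_pred_regs(128, 64)` models the broken case (64-byte slots on a 128-bit VL system), and `reachable_pred_regs(128, 8)` models the same system with 8-byte slots.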

By changing the predicate register and FFR slots to use a new type, dr_svep_t, which is 1/8 the size of dr_simd_t, we can fix this bug and save space.
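As a rough sketch of the space saving, the two layouts can be compared with stand-in types. The `sketch_*` types and the two structs below are hypothetical and only mirror the sizes given in the description (64 bytes and 64 / 8 = 8 bytes); they are not the real dr_simd_t, dr_svep_t, or dr_mcontext_t definitions.

```c
#include <assert.h>
#include <stddef.h>

/* SVE has 16 predicate registers, P0..P15, plus one FFR. */
#define SKETCH_NUM_P_REGS 16

/* Hypothetical stand-ins with the sizes quoted in the description. */
typedef union { unsigned char b[64]; } sketch_simd_t;     /* 64-byte slot */
typedef union { unsigned char b[64 / 8]; } sketch_svep_t; /* 1/8 of that: 8 bytes */

/* Predicate and FFR slots before the change: vector-sized slots. */
struct sketch_old_pred_slots {
    sketch_simd_t p[SKETCH_NUM_P_REGS];
    sketch_simd_t ffr;
};

/* Predicate and FFR slots after the change: predicate-sized slots. */
struct sketch_new_pred_slots {
    sketch_svep_t p[SKETCH_NUM_P_REGS];
    sketch_svep_t ffr;
};

/* Byte offset of Pn's slot relative to the first predicate slot. */
size_t
sketch_pred_slot_offset(int n)
{
    assert(n >= 0 && n < SKETCH_NUM_P_REGS);
    return (size_t)n * sizeof(sketch_svep_t);
}
```

Under these assumptions the 17 slots shrink from 17 * 64 = 1088 bytes to 17 * 8 = 136 bytes, and the slot for P15 sits at offset 15 * 8 = 120, well inside the 510-byte reach of a 128-bit VL system.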

dr_svep_t is currently 8 bytes, corresponding to 64-byte vectors. Even if we extend DynamoRIO to support the maximum SVE vector length of 2048 bits (256 bytes), dr_svep_t will only need to grow to 256 / 8 = 32 bytes, so the maximum offset (15 * 32 = 480 bytes) stays within the 510-byte reach of 128-bit VL systems and will always be in range.
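The "always in range" claim can be checked by brute force over vector lengths. The sketch below is a hypothetical model rather than DynamoRIO code: it assumes power-of-two vector lengths from 128 bits upwards, assumes the offset must be an exact multiple of the predicate register length, and uses made-up function names.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 16 predicate registers and a maximum positive immediate of 255. */
enum { NUM_P_REGS = 16, IMM_MAX = 255 };

/* True if the slot for Pn, at byte offset n * slot_bytes from the first slot,
 * can be expressed as an immediate scaled by the predicate register length. */
int
slot_offset_encodable(size_t vl_bits, size_t slot_bytes, int n)
{
    size_t pl_bytes = vl_bits / 64; /* (VL / 8 bytes) / 8 */
    size_t offset = (size_t)n * slot_bytes;
    return offset % pl_bytes == 0 && offset / pl_bytes <= (size_t)IMM_MAX;
}

/* Checks every power-of-two VL from 128 bits up to max_vl_bits and every
 * register P0..P15. Returns 1 if all slots are encodable, 0 otherwise. */
int
all_pred_slots_encodable(size_t max_vl_bits, size_t slot_bytes)
{
    /* The slot must be able to hold a predicate register at the largest VL. */
    assert(slot_bytes * 64 >= max_vl_bits);
    for (size_t vl = 128; vl <= max_vl_bits; vl *= 2) {
        for (int n = 0; n < NUM_P_REGS; n++) {
            if (!slot_offset_encodable(vl, slot_bytes, n))
                return 0;
        }
    }
    return 1;
}
```

With these assumptions, `all_pred_slots_encodable(2048, 32)` models a future 32-byte dr_svep_t, `all_pred_slots_encodable(512, 8)` models the current 8-byte one, and `all_pred_slots_encodable(512, 64)` models the old 64-byte slots that fail on 128-bit VL systems.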

Because this changes the size of the predicate register and FFR slots, it also changes the size of the dr_mcontext_t structure and breaks backwards compatibility with earlier versions of DynamoRIO, so the version number is increased to 10.90.

Issues: #6760, #5365
Fixes: #6760

@jackgallagher-arm
Collaborator Author

vs2019-32 failure looks like #6764

Contributor

@derekbruening left a comment

The most important thing IMHO is having a test that reproduces the 1st-8-pred-reg bug and shows that this PR fixes it (and will be a regression test for future changes).

.github/workflows/ci-docs.yml (resolved)
core/arch/aarch64/emit_utils.c (outdated, resolved)
core/arch/arch.c (outdated, resolved)
core/arch/arch.c (outdated, resolved)
core/arch/arch.c (outdated, resolved)
core/arch/arm/arm.asm (outdated, resolved)
core/unix/signal_linux_aarch64.c (outdated, resolved)
core/unix/signal_linux_aarch64.c (outdated, resolved)
suite/tests/client-interface/cleancall-opt-shared.h (outdated, resolved)
@jackgallagher-arm jackgallagher-arm merged commit f45eeba into master Apr 15, 2024
16 checks passed
@jackgallagher-arm jackgallagher-arm deleted the i6760-aarch64-svep-ffr-data-type branch April 15, 2024 17:02
Successfully merging this pull request may close these issues.

AArch64: Fix P register save/restore on 128-bit vector length systems