Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AArch64: Fix P register save/restore on 128-bit vector length systems #6760

Closed
AssadHashmi opened this issue Apr 5, 2024 · 1 comment · Fixed by #6774
Closed

AArch64: Fix P register save/restore on 128-bit vector length systems #6760

AssadHashmi opened this issue Apr 5, 2024 · 1 comment · Fixed by #6774
Assignees

Comments

@AssadHashmi
Copy link
Contributor

AssadHashmi commented Apr 5, 2024

Currently predicate and FFR state is stored in dr_simd_t.
This wastes space as predicates and FFR are an 1/8th of the size of the implementation's scalable vector register length.

Issue raised during review of #6757.

SVE master issue #5365.

@AssadHashmi AssadHashmi changed the title AArch64: Use smaller data types for SVE predicate and FFR registers AArch64: use smaller data types for SVE predicate and FFR registers Apr 5, 2024
jackgallagher-arm added a commit that referenced this issue Apr 12, 2024
PR #6757 fixed the way we read/write SVE register slots but
unfortunately it is now broken on systems with 128-bit vector length.

Both SVE vector and predicate registers use dr_simd_t slots which is a
64-byte type meant to store up to 512-bit vector registers. SVE
predicate registers are always 1/8 the size of the vector register so
for 512-bit vector length systems we only really need 64 / 8 = 8 bytes
to store predicate registers.

The ldr/str instructions we use to read and write the predicate
register slots have a base+offset memory operand where the offset is
a value in the range [-256, 255] scaled based by predicate register
length. We read and write the registers by setting the base address
to the address of the first slot, and setting the offset to
n * sizeof(dr_simd_t) for each register Pn.
For systems with 128-bit vector length, this means the predicate
registers are 16 / 8 = 2 bytes so the maximum offset we can reach
is 2 * 255 = 510 bytes. This means on 128-bit VL systems we can only
reach the first 8 predicate registers (8 * sizeof(dr_simd_t) = 512).

By changing the predicate register and FFR slots to use a new type
dr_svep_t which is 1/8 the size of dr_simd_t we can fix this bug and
save space.

dr_svep_t is currently 8 bytes to correspond to 64 byte vectors, but
even if we extend DynamoRIO to support the maximum SVE vector length of
2048-bits (256 bytes) dr_svep_t will only need to be increased to
256 / 8 = 32 bytes so the maximum offset (15 * 32 = 480 bytes) will
always be in range.

As this changes the size of the predicate register and FFR slots, this
changes the size of the dr_mcontext_t structure and breaks backwards
compatibility with earlier versions of DynamoRIO so the version number
is increased to 10.90.

Issues: #6760, #5365
Fixes: #6760
@jackgallagher-arm jackgallagher-arm self-assigned this Apr 15, 2024
@jackgallagher-arm
Copy link
Collaborator

It turns out that #6757 broke slot save/restore on systems with 128-bit vector length because sizeof(dr_simd_t)*8 is out of range for Pn register ldr/str instructions (see #6774 for details) so this is a bug fix in addition to saving space.

@jackgallagher-arm jackgallagher-arm changed the title AArch64: use smaller data types for SVE predicate and FFR registers AArch64: Fix P register save/restore on 128-bit vector length systems Apr 15, 2024
jackgallagher-arm added a commit that referenced this issue Apr 15, 2024
…6774)

PR #6757 fixed the way we read/write SVE register slots but
unfortunately it is now broken on systems with 128-bit vector length.

Both SVE vector and predicate registers use dr_simd_t slots which is a
64-byte type meant to store up to 512-bit vector registers. SVE
predicate registers are always 1/8 the size of the vector register so
for 512-bit vector length systems we only really need 64 / 8 = 8 bytes
to store predicate registers.

The ldr/str instructions we use to read and write the predicate register
slots have a base+offset memory operand where the offset is a value in
the range [-256, 255] scaled based by predicate register length. We read
and write the registers by setting the base address to the address of
the first slot, and setting the offset to n * sizeof(dr_simd_t) for each
register Pn.
For systems with 128-bit vector length, this means the predicate
registers are 16 / 8 = 2 bytes so the maximum offset we can reach is 2 *
255 = 510 bytes. This means on 128-bit VL systems we can only reach the
first 8 predicate registers (8 * sizeof(dr_simd_t) = 512).

By changing the predicate register and FFR slots to use a new type
dr_svep_t which is 1/8 the size of dr_simd_t we can fix this bug and
save space.

dr_svep_t is currently 8 bytes to correspond to 64 byte vectors, but
even if we extend DynamoRIO to support the maximum SVE vector length of
2048-bits (256 bytes) dr_svep_t will only need to be increased to 256 /
8 = 32 bytes so the maximum offset (15 * 32 = 480 bytes) will always be
in range.

As this changes the size of the predicate register and FFR slots, this
changes the size of the dr_mcontext_t structure and breaks backwards
compatibility with earlier versions of DynamoRIO so the version number
is increased to 10.90.

Issues: #6760, #5365
Fixes: #6760
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants