Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i#3801,#3823: Make drreg state restoration more robust #4944

Merged
merged 24 commits into from
Jun 14, 2021

Conversation

abhinav92003
Copy link
Contributor

@abhinav92003 abhinav92003 commented Jun 9, 2021

Revamps the drreg state restoration logic in drreg_event_restore_state to correctly
handle various corner cases like multiple restores (i#4939), aflags in reg (i#4933)
and multi-phase use (i#3823).

Makes the reconstructed ilist available to drreg_event_restore_state. This is needed
for determining whether some instr is app or tool.

The new logic looks at the complete ilist to track the movement of the app value of
gpr/aflags across spills and restores. The native value can be in the gpr/aflags
itself or one or more spill slots.
We do not forget a spill slot on seeing one restore (which caused i#4939), but
remember that the app value can still be found in the spill slot, in addition to
the gpr itself. Instead, a spill slot is forgotten when it is used to spill some
other reg.
We also track whether gprs/aflags contain their native app value or not. This helps
in handling cases of overlapping or nested spill regions arising out of multi-phase
use of drreg. Tool writes (except restores) to gprs/aflags clobber the native value.
App writes to gprs/aflags install a native value and invalidate the value in the
reg's spill slot. Restores from spill slots also install a native value.
We also track movement of aflags to registers more robustly (to fix i#4933).

Adds tests for the i#4939, i#4933, i#3823 (gpr case, not aflags). Test for i#3823
aflags case will be added in PR #4917.

Issue: #4933, #4939, #3823, #3801
Fixes: #4933
Fixes: #4939

Revamps the drreg state restoration logic in drreg_event_restore_state to correctly
handle various corner cases like multiple restores (i#4939), aflags in reg (i#4933)
and multi-phase use (i#3823).

Makes the reconstructed ilist available to drreg_event_restore_state. This is needed
for determining whether some instr is app or tool.

The new logic looks at the complete ilist to track the movement of the app value of
gpr/aflags across spills and restores.
We do not forget a spill slot on seeing one restore (which caused i#4939), but
remember that the app value can still be found in the spill slot, in addition to
the gpr itself. Instead, a spill slot is forgetten when it is used to spill some
other reg.
We also track whether gprs/aflags contain their native app value or not. This helps
in handling cases of overlapping or nested spill regions arising out of multi-phase
use of drreg. Tool writes (except restores) to gprs/aflags clobbers the native value.
App writes to gprs/aflags installs a native value and clobbers the value in the
reg's spill slot.
We also track movement of aflags to registers more robustly (to fix i#4933).

Adds tests for the i#4939 and i#4933 cases. Test for i#3823 will be added in a
separate PR.

Issue: #4933, #4939, #3823
Fixes: #4933
Fixes: #4939
Also fix uninit variable issue and an assert condition.
@abhinav92003
Copy link
Contributor Author

Still working on root-causing the failures on drx-test in AArch64:

<Internal Error: Failed to encode instruction: 'b      $0x0000ffff90bd9da0'
>

This seems to occur during the instr_get_eflags(inst, DR_QUERY_INCLUDE_ALL) call in drreg_event_restore_state, for the above-mentioned instr. The instr is part of the syscall mangling jumps, and the target looks slightly different than other branch instrs: b @0x0000fffd40a0bec0[8byte] vs b $0x0000ffff846aa22c

@derekbruening
Copy link
Contributor

Still marked as Draft

ext/drreg/drreg.c Outdated Show resolved Hide resolved
ext/drreg/drreg.c Outdated Show resolved Hide resolved
@abhinav92003 abhinav92003 marked this pull request as ready for review June 10, 2021 14:31
Copy link
Contributor

@derekbruening derekbruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments from 1st two files (didn't look at drreg yet).

core/lib/dr_events.h Show resolved Hide resolved
core/lib/dr_events.h Outdated Show resolved Hide resolved
core/lib/dr_events.h Outdated Show resolved Hide resolved
core/lib/dr_events.h Outdated Show resolved Hide resolved
core/lib/dr_events.h Outdated Show resolved Hide resolved
core/lib/dr_events.h Show resolved Hide resolved
core/lib/dr_events.h Show resolved Hide resolved
ext/drreg/drreg.c Show resolved Hide resolved
ext/drreg/drreg.c Outdated Show resolved Hide resolved
ext/drreg/drreg.c Outdated Show resolved Hide resolved
ext/drreg/drreg.c Outdated Show resolved Hide resolved
ext/drreg/drreg.c Outdated Show resolved Hide resolved
ext/drreg/drreg.c Outdated Show resolved Hide resolved
ext/drreg/drreg.c Outdated Show resolved Hide resolved
ext/drreg/drreg.c Show resolved Hide resolved
suite/tests/client-interface/drreg-test.c Outdated Show resolved Hide resolved
suite/tests/client-interface/drreg-test.c Outdated Show resolved Hide resolved
@abhinav92003
Copy link
Contributor Author

Still working on root-causing the failures on drx-test in AArch64:

<Internal Error: Failed to encode instruction: 'b      $0x0000ffff90bd9da0'
>

This seems to occur during the instr_get_eflags(inst, DR_QUERY_INCLUDE_ALL) call in drreg_event_restore_state, for the above-mentioned instr. The instr is part of the syscall mangling jumps, and the target looks slightly different than other branch instrs: b @0x0000fffd40a0bec0[8byte] vs b $0x0000ffff846aa22c

The encoding error itself seems because the immediate is too big for b (which can handle 28-bit offset only). private_instr_encode tries to encode the branch instr in some buf and the offset (target - buf-address) is perhaps more than 28-bit.

byte *nxt = instr_encode_check_reachability(dcontext, instr, buf, &has_instr_opnds);

The branch shouldn't affect state restoration at all. For state restoration purpose, maybe we'll need to somehow recognise that this branch is the syscall skip one, and then set its target to some nearby instr?

@derekbruening
Copy link
Contributor

This seems to occur during the instr_get_eflags(inst, DR_QUERY_INCLUDE_ALL) call in drreg_event_restore_state, for the above-mentioned instr. The instr is part of the syscall mangling jumps, and the target looks slightly different than other branch instrs: b @0x0000fffd40a0bec0[8byte] vs b $0x0000ffff846aa22c

The encoding error itself seems because the immediate is too big for b (which can handle 28-bit offset only). private_instr_encode tries to encode the branch instr in some buf and the offset (target - buf-address) is perhaps more than 28-bit.

byte *nxt = instr_encode_check_reachability(dcontext, instr, buf, &has_instr_opnds);

Not sure I understand: what is the callstack? Somebody calls instr_length() and it fails b/c of a temp buffer reachability issue? I thought that kind of thing was all handled, using a flag or param saying to ignore reachability problems for temp encodes.

@derekbruening
Copy link
Contributor

Not sure I understand: what is the callstack? Somebody calls instr_length() and it fails b/c of a temp buffer reachability issue? I thought that kind of thing was all handled, using a flag or param saying to ignore reachability problems for temp encodes.

instr_encode_ignore_reachability(), that's it.

api/docs/bt.dox Outdated Show resolved Hide resolved
api/docs/bt.dox Outdated Show resolved Hide resolved
api/docs/bt.dox Outdated Show resolved Hide resolved
api/docs/bt.dox Outdated Show resolved Hide resolved
api/docs/release.dox Outdated Show resolved Hide resolved
core/lib/dr_events.h Outdated Show resolved Hide resolved
core/lib/dr_events.h Outdated Show resolved Hide resolved
core/lib/dr_events.h Outdated Show resolved Hide resolved
core/lib/dr_events.h Outdated Show resolved Hide resolved
@abhinav92003
Copy link
Contributor Author

Thanks for the review! Apologies for the many doc style issues at the end.

instr_encode_ignore_reachability(), that's it.

I see that private_instr_encode indeed uses instr_encode_ignore_reachability to work around this issue. The encoding failure log was essentially a warning that gets printed when the first attempt fails. I considered skipping the warning completely for check_reachable (6aaef52) but decided against it as encoding may fail due to some other issue too. Instead, I just added the warning to expected output pattern for AArchXX.

@derekbruening
Copy link
Contributor

derekbruening commented Jun 11, 2021 via email

This is to handle the case where multiple phases spill the native app
value of some reg. In this case, the state restoration logic should
remember both slots, instead of just the latest one, because the
latest one may be recycled later.

Also adds test for this scenario.

Also adds test for the scenario when some app instr uses the stolen
reg x28 on AArch64 for some store, which looks like a spill and
corrupts the state restoration logic. Added a check that the
instr should be a meta-instr to be considered a spill/restore.
@abhinav92003
Copy link
Contributor Author

abhinav92003 commented Jun 12, 2021

Ah this needs to be documented if the restore event is passed a post-mangling instrlist. Normally clients never see mangling and this may confuse someone and cause errors as they won’t see the same thing as in the bb event. I guess decoding from the cache is similar but we should make this clear.

Added the additional information in documentation -- that the reconstructed ilist may contain DR's own mangling too.

@abhinav92003
Copy link
Contributor Author

I found another corner case -- when there are two spills from different phases that spill the native app value without any write in between them. E.g.:

<app2app spills r1 to s1>
<insertion spills r1 to s2>
<insertion restores r1 from s2>
<insertion spills r2 to s2>
<fault>

This second spill may also be from outside-drreg spills/restores like restore_app_value_to_stolen_reg

In this case, the restore state logic should remember s1 and s2 both for r1. If it remembers only s2, then it won't be able to restore r1 as s2 is overwritten later. To handle this, I simply maintain a spill_slot -> reg mapping instead of reg -> spill_slot. Added the fix and a test case too.

I think I have handled all cases. I'll give it another look and re-request review when ready.

@abhinav92003 abhinav92003 changed the title i#4939,#4933: Make drreg state restoration more robust i#3801,#3823: Make drreg state restoration more robust Jun 14, 2021
ext/drreg/drreg.c Outdated Show resolved Hide resolved
ext/drreg/drreg.c Show resolved Hide resolved
@derekbruening
Copy link
Contributor

I suspect that this PR caused a regression in drreg restoration: #4963

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants