Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drcachesim.invariants test failing sometimes on win32 #5636

Closed
derekbruening opened this issue Sep 1, 2022 · 3 comments · Fixed by #5643
Closed

drcachesim.invariants test failing sometimes on win32 #5636

derekbruening opened this issue Sep 1, 2022 · 3 comments · Fixed by #5643

Comments

@derekbruening
Copy link
Contributor

The drcachesim.invariants test failed in PR #5633 on 32-bit Windows with this error:

222: Trace invariant failure in T3248 at ref # 46841: Non-explicit control flow has no marker

On a re-run it went away.

@derekbruening
Copy link
Contributor Author

This has started showing up on 64-bit Windows too. All the recent changes are to offline tracing, not online. This is happening more and more frequently so I'm going to try to repro and figure it out.

@derekbruening derekbruening self-assigned this Sep 8, 2022
@derekbruening
Copy link
Contributor Author

I did repro and it looks like this interleaved:

02:18 PM ~/dr/git/build_x64_dbg_tests
% tail OUT
::10084:9428::  @00007FFDFFC09601 instr x3
IPC RECV: type=13, size=2, addr=0x7ffdffc09604
::10084.9428::  @00007FFDFFC09604 instr x2
::10084:9428::  @00007FFDFFC09604 instr x2
IPC RECV: type=10, size=7, addr=0x7ffdffc09606
::10084.9428::  @00007FFDFFC09606 instr x7
::10084:9428::  @00007FFDFFC09606 instr x7
IPC RECV: type=22, size=8, addr=0x298c
IPC RECV: type=28, size=2, addr=0x2f46c4bbd9b003
IPC RECV: type=28, size=3, addr=0x1a
IPC RECV: type=30, size=3, addr=0x7ffdffed288f
::10084.10636::  @00007FFDFFED288F instr x3
::10084:10636::  @00007FFDFFED288F instr x3
Trace invariant failure in T10636 at ref # 2106737: Non-explicit control flow has no marker
abort() has been calledFatal error: failed to write to pipe
Fatal error: failed to write to pipe

02:19 PM ~/dr/git/build_x64_dbg_tests
% tail -5000 OUT | grep 10636 | tail -20
::10084:10636::  type 1
::10084.10636::  @00007FFDFFED288F non-fetched instr x3
::10084:10636::  @00007FFDFFED288F non-fetched instr x3
::10084.10636::  @00007FFDFFED288F write 0000029CC8ABCD70 x8
::10084:10636::  type 1
::10084.10636::  @00007FFDFFED288F non-fetched instr x3
::10084:10636::  @00007FFDFFED288F non-fetched instr x3
::10084.10636::  @00007FFDFFED288F write 0000029CC8ABCD78 x8
::10084:10636::  type 1
::10084.10636::  @00007FFDFFED288F non-fetched instr x3
::10084:10636::  @00007FFDFFED288F non-fetched instr x3
::10084.10636::  @00007FFDFFED288F write 0000029CC8ABCD80 x8
::10084:10636::  type 1
::10084.10636::  @00007FFDFFED288F non-fetched instr x3
::10084:10636::  @00007FFDFFED288F non-fetched instr x3
::10084.10636::  @00007FFDFFED288F write 0000029CC8ABCD88 x8
::10084:10636::  type 1
::10084.10636::  @00007FFDFFED288F instr x3
::10084:10636::  @00007FFDFFED288F instr x3
Trace invariant failure in T10636 at ref # 2106737: Non-explicit control flow has no marker

Looks like it's the part of the check that tries to exclude the online bug of marking the 1st non-fetched after a context switch as fetched:
#4915 (comment)

invariant_checker should have printed out the timestamp and cpuid markers:
but it's not even seeing them.
Could it be a bug in the chunk dup timestamp skipping?

@derekbruening
Copy link
Contributor Author

Two identical timestamps?
Yes:

% grep 10636 OUT | grep timestamp | tail
::10084:10636::  timestamp 13307134641650235
::10084:10636::  timestamp 13307134641759604
::10084:10636::  timestamp 13307134648806403
::10084:10636::  timestamp 13307134648837652
::10084:10636::  timestamp 13307134655947087
::10084:10636::  timestamp 13307134655947087
::10084:10636::  timestamp 13307134663197049
::10084:10636::  timestamp 13307134670197072
::10084:10636::  timestamp 13307134670197072
::10084:10636::  timestamp 13307134677306409
::10084:10636::  timestamp 13307134684540931

(gdb) p/x 13307134684540931
$1 = 0x2f46c4bbd9b003

I had assumed that would never happen. A very small buffer
could explain it.

derekbruening added a commit that referenced this issue Sep 8, 2022
PR #5633 added code to skip the duplicate timestamps at the top of
each chunk: but its logic assumed there would never be two legitimate
timestamps with identical values.  That does happen, in particular in
our online-drcachesim tests on Windows.  This resulted in the
invariant_checker not seeing some timestamp entries, causing its
exception for non-fetched instrs across thread switches to not apply
and resulting in invariant error reports.

We fix this by skipping the first timestamp in each chunk by
instruction count instead.  We'll want the insruction and chunk
counts for #5538 and I was about to add those fields in any case.

Fixes #5636
derekbruening added a commit that referenced this issue Sep 9, 2022
PR #5633 added code to skip the duplicate timestamps at the top of
each chunk: but its logic assumed there would never be two legitimate
timestamps with identical values.  That does happen, in particular in
our online-drcachesim tests on Windows.  This resulted in the
invariant_checker not seeing some timestamp entries, causing its
exception for non-fetched instrs across thread switches to not apply
and resulting in invariant error reports.

We fix this by skipping the first timestamp in each chunk by
instruction count instead.  We'll want the insruction and chunk
counts for #5538 and I was about to add those fields in any case.

Fixes #5636
dolanzhao pushed a commit that referenced this issue Sep 12, 2022
PR #5633 added code to skip the duplicate timestamps at the top of
each chunk: but its logic assumed there would never be two legitimate
timestamps with identical values.  That does happen, in particular in
our online-drcachesim tests on Windows.  This resulted in the
invariant_checker not seeing some timestamp entries, causing its
exception for non-fetched instrs across thread switches to not apply
and resulting in invariant error reports.

We fix this by skipping the first timestamp in each chunk by
instruction count instead.  We'll want the insruction and chunk
counts for #5538 and I was about to add those fields in any case.

Fixes #5636
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant