drmemtrace -max_global_trace_refs fails to flush threads when the max is hit #5021

Closed
derekbruening opened this issue Jul 21, 2021 · 0 comments · Fixed by #5029

The -max_global_trace_refs feature in drcachesim's drmemtrace fails to flush thread buffers when the max is hit. Because execution continues, the thread exits and the unflushed data in the thread buffers are not emitted into the trace until the app exits much later, and they carry timestamps from that later time. This produces a confusing time gap in the trace.

For our use case, what we really want is to detach when the max is hit, but we need #2644 for that. Barring that, we could add a synchall when the max is hit and then flush and exit all the thread buffers, which requires adding logic for accessing another thread's buffer. Another option is to record the timestamp when the max is hit and use it when the app finally exits.
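
The last option above (record the timestamp once when the max is hit, then reuse it for everything emitted later) can be sketched as follows. This is a minimal illustration, not the drmemtrace code: `timestamp_source_t`, `freeze()`, and `get()` are hypothetical names, and the clock used here is simply `std::chrono::system_clock`.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>

// Hypothetical helper, not part of drmemtrace: hands out timestamps that can
// be frozen at the moment a global limit is reached.
class timestamp_source_t {
public:
    // Returns the frozen value if one has been set, else the current time.
    uint64_t get() const
    {
        uint64_t frozen = frozen_.load(std::memory_order_acquire);
        return frozen != 0 ? frozen : now_micros();
    }

    // Freezes the timestamp. Only the first caller's value is kept, so
    // several threads hitting the limit at once all agree on one value.
    uint64_t freeze()
    {
        uint64_t expected = 0;
        uint64_t desired = now_micros();
        if (frozen_.compare_exchange_strong(expected, desired,
                                            std::memory_order_acq_rel))
            return desired;
        return expected; // Another thread froze first; use its value.
    }

private:
    // Wall-clock time in microseconds since the system_clock epoch.
    static uint64_t now_micros()
    {
        using namespace std::chrono;
        return static_cast<uint64_t>(
            duration_cast<microseconds>(system_clock::now().time_since_epoch())
                .count());
    }

    std::atomic<uint64_t> frozen_{ 0 }; // 0 means "not frozen".
};
```

In this sketch, the thread that observes the global reference count crossing the max would call `freeze()`, and every later buffer flush or thread exit would stamp its entries with `get()`, so data emitted at app exit carries the time at which tracing actually stopped.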

If we implement -detach_after_tracing, would we make -max_global_trace_refs an alias of it (with compatibility breakage notes), or fix it in addition?

@derekbruening derekbruening self-assigned this Jul 21, 2021
derekbruening added a commit that referenced this issue Jul 28, 2021
Adds a new drmemtrace mechanism to set a frozen timestamp for all
future entries, to avoid huge time gaps when -max_global_trace_refs is
reached but existing thread buffers and exits are not emitted until
much later when the app exits.

Adding a small regression test seems difficult without flakiness as
this involves real-time gaps.  Tested manually:

Pre-fix we see a 2s gap before the thread exit:

    $ ninja && ctest -V -R max-global
    $ bin64/drrun -t drcachesim -simulator_type view -indir suite/tests/tool.drcacheoff.max-global*.dir 2>&1 | grep timestamp
    T3427537 <marker: timestamp 13271354817417504>
    T3427537 <marker: timestamp 13271354817418955>
    T3427537 <marker: timestamp 13271354817624561>
    T3427537 <marker: timestamp 13271354817657186>
    T3427537 <marker: timestamp 13271354817659466>
    T3427537 <marker: timestamp 13271354817664998>
    T3427537 <marker: timestamp 13271354817667972>
    T3427537 <marker: timestamp 13271354817701187>
    T3427537 <marker: timestamp 13271354819175717>

After adding a frozen timestamp:

    $ bin64/drrun -t drcachesim -simulator_type view -indir suite/tests/tool.drcacheoff.max-global*.dir 2>&1 | grep timestamp
    T3429223 <marker: timestamp 13271355614195474>
    T3429223 <marker: timestamp 13271355614196983>
    T3429223 <marker: timestamp 13271355614399149>
    T3429223 <marker: timestamp 13271355614432167>
    T3429223 <marker: timestamp 13271355614434675>
    T3429223 <marker: timestamp 13271355614438892>
    T3429223 <marker: timestamp 13271355614441260>
    T3429223 <marker: timestamp 13271355614474587>
    T3429223 <marker: timestamp 13271355614475843>

Fixes #5021
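
The manual check above can also be done mechanically. The sketch below parses lines in the `<marker: timestamp N>` form shown in the view output and computes the gap between consecutive timestamps; `parse_timestamp` and `consecutive_gaps` are hypothetical helper names, not part of drcachesim. Applied to the listings above, the final gap is 1474530 before the fix and 1256 after it. If the values are in microseconds (an assumption, since the commit message does not state the unit), that is roughly 1.47 s versus 1.3 ms.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Extracts N from a line containing "timestamp N". Returns false if the
// line has no such marker. Hypothetical helper for illustration only.
inline bool
parse_timestamp(const std::string &line, uint64_t *out)
{
    static const std::string key = "timestamp ";
    size_t pos = line.find(key);
    if (pos == std::string::npos)
        return false;
    pos += key.size();
    size_t end = pos;
    while (end < line.size() &&
           std::isdigit(static_cast<unsigned char>(line[end])))
        ++end;
    if (end == pos)
        return false;
    *out = std::stoull(line.substr(pos, end - pos));
    return true;
}

// Returns the differences between consecutive timestamps, in input order.
// Assumes the timestamps are non-decreasing, as in the listings above.
inline std::vector<uint64_t>
consecutive_gaps(const std::vector<uint64_t> &stamps)
{
    std::vector<uint64_t> gaps;
    for (size_t i = 1; i < stamps.size(); ++i)
        gaps.push_back(stamps[i] - stamps[i - 1]);
    return gaps;
}
```

A caller would feed each line of the `grep timestamp` output through `parse_timestamp`, collect the values per thread, and inspect the last element of `consecutive_gaps` to see how far the thread-exit timestamp sits from the preceding one.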