Consider removing non-fetched instrs for repstr #4915

abhinav92003 · 2021-05-18T05:42:55Z

Xref #2051 and discussion on PR #4912: #4912 (comment)

We should consider removing non-fetched instrs for repstr. Today, if the string loop is executed N times, we'll have the rep stringop instr N times in the instr stream. The last N-1 instances are marked as TRACE_TYPE_INSTR_NO_FETCH so that cache simulators can choose to ignore these duplicate entries.

The thinking today is that maybe core simulators also don't need these duplicate instrs, and we should just have only one single instr entry and then multiple memref entries based on loop count.

The text was updated successfully, but these errors were encountered:

derekbruening · 2021-06-16T16:50:53Z

If we implemented #4948 and changed the underlying rep string expansion, the non-fetched instrs would disappear on their own and we'd just change the docs for the update.

derekbruening · 2021-06-28T20:47:01Z

Online traces incorrectly mark a continuing rep string loop as "fetched" for the first instr after a thread switch. Lumping here rather than a separate issue we want to fix b/c we plan to remove non-fetched. I hit this in PR #4974:

::14112:13616::  @76F68FA5 non-fetched instr x2
::14112.13616::  @76F68FA5 write 00EA9FBC x4
::14112:13616::  @76F68FA5 non-fetched instr x2
::14112.13616::  @76F68FA5 write 00EA9FC0 x4
::14112.14428:: marker type 2 value 3083420504
::14112.14428:: marker type 3 value 2
::14112.14428::  @76F62D1C instr x3
...
::14112.14428::  @754D3B02 write 00CFFE48 x4
::14112.13616:: marker type 2 value 3083420504
::14112.13616:: marker type 3 value 10
::14112.13616::  @76F68FA5 instr x2
Assertion failed: TESTALL(OFFLINE_FILE_TYPE_FILTERED, file_type_) || (prev_instr_[memref.data.tid].instr.addr + prev_instr_[memref.data.tid].instr.size == memref.instr.addr) || (prev_instr_[memref.data.tid].instr.addr == memref.instr.addr && memref.instr.type == TRACE_TYPE_INSTR_NO_FETCH) || (prev_xfer_marker_[memref.data.tid].instr.tid != 0 && (prev_xfer_marker_[memref.data.tid].instr.tid != memref.instr.tid || prev_xfer_marker_[memref.data.tid].marker.marker_type == TRACE_MARKER_TYPE_KERNEL_EVENT || prev_xfer_marker_[memref.data.tid].marker.marker_type == TRACE_MARKER_TYPE_KERNEL_XFER)) || prev_instr_[memref.data.tid].instr.type == TRACE_TYPE_INSTR_SYSENTER, file D:\derek\dr\git\src\clients\drcachesim\tests\trace_invariants.cpp, line 167

derekbruening · 2022-03-10T16:19:49Z

Trying to re-write the choices here:

A) On entry if TLS flag not set, record instr fetch and set flag; on final
iter, clear flag. I assume have to try to clear flag on abnormal loop
break too: a signal. So adds overhead and extra branches.

B) Keep entire loop inside block and unroll 1st iter to do fetch. Has all
the downsides of a loop inside a block: messes up DR's trace building,
signal delivery delay bounds, unlink flushing, shared fragment deletion
promptness, etc.

C) Try to split across two blocks, one for 1st iter and 2nd for rest. Use
PC+1 (skipping rep byte) for tag for 2nd and hope app never targets that
PC. When see a plain string op have to decode backward I guess or have
some recorded state so know it's this artificial loop body.

D) #4915: Remove all but 1st fetch in raw2trace, and add analysis to do the same
when dumping a trace buffer to get the live instr counts to match too --
which also fixes online. But then instead of a single contiguous buffer
being written/transmitted you have to do it in many pieces to skip the
extra fetches, or waste time memcpying it on top of itself back to
contiguous.

derekbruening · 2022-03-10T16:23:50Z

Whoops that last comment was meant for #4948 with this issue being choice D

abhinav92003 · 2022-03-14T16:01:14Z

We had an offline chat where we discussed that option (A) is doable. Note that ensuring a single instr entry gets trickier if the repstr execution is interrupted by a signal; this is because we'll need to save and restore the mentioned TLS flag on entry and exit from the signal handler, and there can be nested signals too. But we decided that it's actually better to emit a new instr entry when the repstr expansion resumes after such an interruption; this is because the repstr instruction would actually be re-fetched by the processor pipeline. So all we need to do on an signal is to clear the mentioned TLS flag.

Question: do we want the same behavior on interruption of the scatter/gather expansion by a fault? Note that the scatter/gather expansion cannot be interrupted by an async signal (DR would delay the async signal, as the expansion is a single fragment that's executed just once per dynamic scatter/gather instruction, unlike the repstr where the same fragment is executed N times), so this is only for a scatter/gather instr resuming after a fault. Today we add a single pc entry for the scatter/gather instruction, even if some store/load faults. Note that the scatter/gather expansion does not use a loop, unlike the repstr expansion. So, it is actually more complex to add a new pc entry on resumption after a fault. We'll need to have a TLS-flag-protected instr-entry emit before each memref-emit.

derekbruening · 2022-03-14T18:00:20Z

Not sure I understand the concern about scatter-gather: there is no problem today. We don't have to do anything extra for proper resumption of a rep-string or scatter-gather after a signal that is delivered to the app: that will always re-enter the original block from the start but now with a different resumption state (rcx for rep string; mask for scatter-gatther) because we translate the expansion to the original instruction.

abhinav92003 · 2022-03-14T18:07:37Z

that will always re-enter the original block from the start but now with a different resumption state

Oh, so after the fault, DR will not resume at the faulting cache pc, but at the faulting app pc (which will take it to the top of the scatter/gather fragment). Yes in that case we're fine, as the scatter/gather instr entry will indeed be emitted again.

derekbruening · 2023-09-01T23:29:25Z

The online failure to identify non-fetched after a thread switch lumped into this issue at #4915 (comment) can result in a PC discontinuity error:

     1184144      760077:        9972 ifetch       3 byte(s) @ 0x76266f9f non-branch
     1184145      760078:        9972 ifetch       2 byte(s) @ 0x76266fa2 non-branch
     1184146      760078:        9972 write        4 byte(s) @ 0x009bfb08 by PC 0x76266fa2
     1184147      760078:        9972 ifetch       2 byte(s) @ 0x76266fa2 non-fetched instruction
     1184148      760078:        9972 write        4 byte(s) @ 0x009bfb0c by PC 0x76266fa2
     1184149      760078:        9972 ifetch       2 byte(s) @ 0x76266fa2 non-fetched instruction
     1184150      760078:        9972 write        4 byte(s) @ 0x009bfb10 by PC 0x76266fa2
2
------------------------------------------------------------
     1184151      760079:        4424 ifetch       5 byte(s) @ 0x773beaf7 non-branch
<...>
     1184206      760107:        4424 write        4 byte(s) @ 0x02c2f454 by PC 0x773beb0f
------------------------------------------------------------
     1184207      760108:        9972 ifetch       2 byte(s) @ 0x76266fa2 non-branch
Trace invariant failure in T9972 at ref # 1184207 (69 instrs since timestamp 1660902515): Non-explicit control flow has no marker

Adds a timestamp+cpuid pair at the end of normal buffers, to help separate trace output i/o time. Adds a timestamp+cpuid pair before and after each application syscall. It appears around the syscall marker. The trace buffer is no longer output pre-syscall (this addresses #3113) except for i-filtered traces or small-window traces where we need a frequent trace size check. Adds a timestamp+cpuid pair when a kernel transfer event occurs. The trace buffer is no longer output here. Fixes the existing behavior where the post-syscall-buffer's timestamp contains the pre-syscall time. Bumps the trace version for this change as the contents of post-syscall timestamps are now different. Augments the documentation around timestamps and adds a missing entry on syscall markers. Adds some invariant checks, and improve existing syscall marker checks. Further checks are more difficult as the post-syscall time is not present when there is no post-syscall event, such as with exit or sigreturn. Did some manual testing with the view tool. Applies several auxiliary changes required to get all modes to work with these additional timestamps: + Removes the timestamp+cpuid from the split-pipe header, as that timestamp is delayed by i/o and results in misleading out-of-order timestamps. + Prevents pipe-splitting before a cpuid, to avoid a confusing sequence. + Fixes a bug where a syscall marker is missing when the prior block wrote out its buffer (#6291). + Fixes #6245 by re-instating the footer for non-split window files and removing the template `.*` from PR #6165 which was hiding the warning (this one could possibly be split into its own PR). + Adjusts the PC discontinuity relaxation for online unfetched instrs (#4915) to check the size now that there's no timestamp. Issue: #4915 Fixes #6289 Fixes #3113 Fixes #6291 Fixes #6245

abhinav92003 added the Component-DrMemtrace label May 18, 2021

abhinav92003 mentioned this issue May 18, 2021

i#2985 scatter-gather: Fix drcachesim and raw2trace issues. #4912

Merged

derekbruening mentioned this issue Jun 16, 2021

Redo rep string expansion to avoid the instruction fetch on subsequent iterations #4948

Open

derekbruening mentioned this issue Jun 21, 2022

Add fast-forward feature to skip N instructions in drmemtrace analysis tools #5538

Open

derekbruening mentioned this issue Sep 8, 2022

drcachesim.invariants test failing sometimes on win32 #5636

Closed

derekbruening mentioned this issue Aug 18, 2023

i#5036 A64 scatter/gather, part 1: Expand scalar+vector loads #6267

Merged

derekbruening mentioned this issue Sep 1, 2023

i#6289: Add timestamps at buffer end and around syscalls #6290

Merged

derekbruening mentioned this issue Oct 12, 2023

seeing more than one cpu id on one output stream with MAP_TO_RECORDED_OUTPUT #6354

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider removing non-fetched instrs for repstr #4915

Consider removing non-fetched instrs for repstr #4915

abhinav92003 commented May 18, 2021

derekbruening commented Jun 16, 2021

derekbruening commented Jun 28, 2021

derekbruening commented Mar 10, 2022

derekbruening commented Mar 10, 2022

abhinav92003 commented Mar 14, 2022

derekbruening commented Mar 14, 2022

abhinav92003 commented Mar 14, 2022

derekbruening commented Sep 1, 2023

Consider removing non-fetched instrs for repstr #4915

Consider removing non-fetched instrs for repstr #4915

Comments

abhinav92003 commented May 18, 2021

derekbruening commented Jun 16, 2021

derekbruening commented Jun 28, 2021

derekbruening commented Mar 10, 2022

derekbruening commented Mar 10, 2022

abhinav92003 commented Mar 14, 2022

derekbruening commented Mar 14, 2022

abhinav92003 commented Mar 14, 2022

derekbruening commented Sep 1, 2023