-
Notifications
You must be signed in to change notification settings - Fork 546
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider removing non-fetched instrs for repstr #4915
Comments
If we implemented #4948 and changed the underlying rep string expansion, the non-fetched instrs would disappear on their own and we'd just change the docs for the update. |
Online traces incorrectly mark a continuing rep string loop as "fetched" for the first instr after a thread switch. Lumping here rather than a separate issue we want to fix b/c we plan to remove non-fetched. I hit this in PR #4974:
|
Trying to re-write the choices here: A) On entry if TLS flag not set, record instr fetch and set flag; on final B) Keep entire loop inside block and unroll 1st iter to do fetch. Has all C) Try to split across two blocks, one for 1st iter and 2nd for rest. Use D) #4915: Remove all but 1st fetch in raw2trace, and add analysis to do the same |
Whoops that last comment was meant for #4948 with this issue being choice D |
We had an offline chat where we discussed that option (A) is doable. Note that ensuring a single instr entry gets trickier if the repstr execution is interrupted by a signal; this is because we'll need to save and restore the mentioned TLS flag on entry and exit from the signal handler, and there can be nested signals too. But we decided that it's actually better to emit a new instr entry when the repstr expansion resumes after such an interruption; this is because the repstr instruction would actually be re-fetched by the processor pipeline. So all we need to do on an signal is to clear the mentioned TLS flag. Question: do we want the same behavior on interruption of the scatter/gather expansion by a fault? Note that the scatter/gather expansion cannot be interrupted by an async signal (DR would delay the async signal, as the expansion is a single fragment that's executed just once per dynamic scatter/gather instruction, unlike the repstr where the same fragment is executed N times), so this is only for a scatter/gather instr resuming after a fault. Today we add a single pc entry for the scatter/gather instruction, even if some store/load faults. Note that the scatter/gather expansion does not use a loop, unlike the repstr expansion. So, it is actually more complex to add a new pc entry on resumption after a fault. We'll need to have a TLS-flag-protected instr-entry emit before each memref-emit. |
Not sure I understand the concern about scatter-gather: there is no problem today. We don't have to do anything extra for proper resumption of a rep-string or scatter-gather after a signal that is delivered to the app: that will always re-enter the original block from the start but now with a different resumption state (rcx for rep string; mask for scatter-gatther) because we translate the expansion to the original instruction. |
Oh, so after the fault, DR will not resume at the faulting cache pc, but at the faulting app pc (which will take it to the top of the scatter/gather fragment). Yes in that case we're fine, as the scatter/gather instr entry will indeed be emitted again. |
The online failure to identify non-fetched after a thread switch lumped into this issue at #4915 (comment) can result in a PC discontinuity error:
|
Adds a timestamp+cpuid pair at the end of normal buffers, to help separate trace output i/o time. Adds a timestamp+cpuid pair before and after each application syscall. It appears around the syscall marker. The trace buffer is no longer output pre-syscall (this addresses #3113) except for i-filtered traces or small-window traces where we need a frequent trace size check. Adds a timestamp+cpuid pair when a kernel transfer event occurs. The trace buffer is no longer output here. Fixes the existing behavior where the post-syscall-buffer's timestamp contains the pre-syscall time. Bumps the trace version for this change as the contents of post-syscall timestamps are now different. Augments the documentation around timestamps and adds a missing entry on syscall markers. Adds some invariant checks, and improve existing syscall marker checks. Further checks are more difficult as the post-syscall time is not present when there is no post-syscall event, such as with exit or sigreturn. Did some manual testing with the view tool. Applies several auxiliary changes required to get all modes to work with these additional timestamps: + Removes the timestamp+cpuid from the split-pipe header, as that timestamp is delayed by i/o and results in misleading out-of-order timestamps. + Prevents pipe-splitting before a cpuid, to avoid a confusing sequence. + Fixes a bug where a syscall marker is missing when the prior block wrote out its buffer (#6291). + Fixes #6245 by re-instating the footer for non-split window files and removing the template `.*` from PR #6165 which was hiding the warning (this one could possibly be split into its own PR). + Adjusts the PC discontinuity relaxation for online unfetched instrs (#4915) to check the size now that there's no timestamp. Issue: #4915 Fixes #6289 Fixes #3113 Fixes #6291 Fixes #6245
Xref #2051 and discussion on PR #4912: #4912 (comment)
We should consider removing non-fetched instrs for repstr. Today, if the string loop is executed N times, we'll have the
rep stringop
instr N times in the instr stream. The last N-1 instances are marked asTRACE_TYPE_INSTR_NO_FETCH
so that cache simulators can choose to ignore these duplicate entries.The thinking today is that maybe core simulators also don't need these duplicate instrs, and we should just have only one single instr entry and then multiple memref entries based on loop count.
The text was updated successfully, but these errors were encountered: