drcachesim incorrectly counts each rep string iter as an ifetch #2051
Offline now looks like this:
We'd like to support feeding our traces to core simulators as well as cache simulators. Middle ground would be to include the count with the first rep instr. There could be a thread switch in the middle of a rep loop? Yes, but
Since it’s easier to ignore the instr if it matches the prior instr than to
I looked at some other tools to see how they handle rep string instructions. I made a tiny app with no crt that just executes some assembly:
No matter how many iterations I give the rep stos loop, the hardware perfctrs count it as one instruction:
Other stats, all with :u in the command line:
This is as expected, and where L1-icache-loads is supported we'd expect to see only one load per loop, not one per iter. Yet cachegrind has an L1i ref per iter:
Simple simulators like Pin's
That doesn't mean that we should follow suit and be inaccurate in our simulator, but it wouldn't be unprecedented to do so.
My plan is:
To satisfy both cache and core simulators we mark subsequent iterations of rep string loops with a new trace entry type TRACE_TYPE_INSTR_NO_FETCH, which cache simulators can ignore.

For offline traces, raw2trace does this for us. Since online traces would need extra overhead to distinguish the first from subsequent iters, they use a new internal type TRACE_TYPE_INSTR_MAYBE_FETCH which is converted by reader_t.

Adds instr_is_string_op() and instr_is_rep_string_op() to DR's API to facilitate this.

Adds no-fetch stats to the basic_counts tool and updates the basic_counts tests.

Fixes #2051
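As a rough illustration of the scheme described in the commit message, the sketch below resolves "maybe fetch" entries into fetch or no-fetch entries and then counts them separately. It is not the actual reader_t or basic_counts code: the `entry_type_t`, `entry_t`, `counts_t`, `resolve_maybe_fetch`, and `count_instrs` names are hypothetical stand-ins, and the "same pc as the prior instruction in this thread" rule is an assumption based on the heuristic mentioned earlier in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-ins for the trace entry types named above.
enum class entry_type_t {
    INSTR,             // An ordinary instruction: one fetch.
    INSTR_MAYBE_FETCH, // Online only: a rep string iteration, not yet classified.
    INSTR_NO_FETCH,    // A non-first rep string iteration: no new fetch.
};

struct entry_t {
    entry_type_t type;
    int tid;     // Thread id.
    uint64_t pc; // Instruction address.
};

// Assumed heuristic: within a thread, a MAYBE_FETCH entry whose pc equals the
// pc of the immediately preceding instruction is a later iteration of the same
// rep loop and becomes NO_FETCH; otherwise it is the first iteration and
// becomes a regular INSTR.
std::vector<entry_t>
resolve_maybe_fetch(const std::vector<entry_t> &in)
{
    std::unordered_map<int, uint64_t> prev_pc; // Last instruction pc per thread.
    std::vector<entry_t> out;
    out.reserve(in.size());
    for (entry_t e : in) {
        if (e.type == entry_type_t::INSTR_MAYBE_FETCH) {
            auto it = prev_pc.find(e.tid);
            bool repeat = it != prev_pc.end() && it->second == e.pc;
            e.type = repeat ? entry_type_t::INSTR_NO_FETCH : entry_type_t::INSTR;
        }
        prev_pc[e.tid] = e.pc;
        out.push_back(e);
    }
    return out;
}

struct counts_t {
    uint64_t fetched = 0;  // Entries a cache simulator would treat as ifetches.
    uint64_t no_fetch = 0; // Entries a cache simulator would skip.
};

// Counts resolved entries; a core simulator could use both totals, while a
// cache simulator would only look at the fetched ones.
counts_t
count_instrs(const std::vector<entry_t> &trace)
{
    counts_t c;
    for (const entry_t &e : trace) {
        // MAYBE_FETCH is internal and must be resolved before analysis.
        assert(e.type != entry_type_t::INSTR_MAYBE_FETCH);
        if (e.type == entry_type_t::INSTR_NO_FETCH)
            ++c.no_fetch;
        else
            ++c.fetched;
    }
    return c;
}
```

Tracking the previous pc per thread means entries from another thread interleaved into the middle of a rep loop do not cause the next iteration to be counted as a new fetch.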
Xref #2011
The original rep string loop only has one ifetch for the whole loop, while the drutil-expanded instru in drcachesim has an ifetch per iteration.
This will be easy to solve for offline, but harder for online: in fact it seems that some kind of explicit iter count check is needed.