Add fast-forward feature to skip N instructions in drmemtrace analysis tools #5538

derekbruening · 2022-06-21T16:14:14Z

This is a feature request to make it easier to simulate a subset of a long
trace. Today, this would be done by running a trace splitter offline. The
proposal is to support skipping forward N instructions by seeking in the
trace file, to make it usable online during simulation.

The records are fixed-size, but the instruction type density is not
uniform, so we'd have to do something like embed markers with instruction
counts every N records so the seeking can find the proper instruction
boundary. We could limit the fast-forward jumps to every N instructions.
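
To make the checkpoint idea concrete, here is a minimal sketch, not the drmemtrace implementation. It assumes a flat list of record kinds and builds an index of instruction-count checkpoints, then seeks by jumping to the nearest checkpoint and walking linearly from there. The names `build_checkpoints` and `seek`, and the two record kinds, are hypothetical.

```python
from bisect import bisect_right

# Hypothetical record kinds; the real trace format has many more.
INSTR, DATA = "instr", "data"


def build_checkpoints(records, every_n_instrs):
    """Return (instr_count, record_index) pairs, one per every_n_instrs instructions."""
    checkpoints = [(0, 0)]
    instrs = 0
    for idx, kind in enumerate(records):
        if kind == INSTR:
            if instrs > 0 and instrs % every_n_instrs == 0:
                # Exactly `instrs` instructions precede the record at idx.
                checkpoints.append((instrs, idx))
            instrs += 1
    return checkpoints


def seek(records, checkpoints, skip_instrs):
    """Return the index of the first instruction record after skipping skip_instrs."""
    counts = [count for count, _ in checkpoints]
    instrs, idx = checkpoints[bisect_right(counts, skip_instrs) - 1]
    # Walk linearly from the checkpoint to the exact instruction boundary.
    while idx < len(records):
        if records[idx] == INSTR:
            if instrs == skip_instrs:
                return idx
            instrs += 1
        idx += 1
    return len(records)
```

The linear walk is bounded by the checkpoint spacing, which is the trade-off behind limiting jumps to every N instructions.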

There are several features which may not interact well with this feature,
where we want to only emit non-changing data once early in the trace and
assume a trace reader can cache the data:

+ Instruction encodings (Add instruction encoding entries to drmemtrace #5520),
  which are only included for the first dynamic instance of a static
  instruction, to save space. One solution is to re-emit these at the
  fast-forward point.
+ Physical address translation markers (Add support for -use_physical with
  drcachesim -offline #4014), which are only emitted on the first instance of
  a new page (or when mappings change, if that can be detected). Again, a
  solution would be to re-emit these at the fast-forward "checkpoints".
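
A rough sketch of the re-emit idea for encodings, under the assumption that the writer forgets what it has already emitted at every checkpoint boundary. The function name, the tuple layout, and the `encodings` dictionary are all hypothetical and only illustrate the bookkeeping.

```python
def emit_with_encodings(instr_pcs, encodings, chunk_instr_count):
    """Emit an encoding entry before the first instance of each pc in each chunk."""
    out = []
    seen_in_chunk = set()
    for ordinal, pc in enumerate(instr_pcs):
        if ordinal % chunk_instr_count == 0:
            # New checkpoint: forget what was emitted so that a reader
            # seeking here sees every encoding it needs.
            seen_in_chunk.clear()
        if pc not in seen_in_chunk:
            out.append(("encoding", pc, encodings[pc]))
            seen_in_chunk.add(pc)
        out.append(("instr", pc))
    return out
```

The cost is one extra encoding entry per static instruction per chunk, so a larger chunk size keeps more of the original space savings.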

derekbruening · 2022-06-21T16:15:24Z

A related feature is having an instruction count in the view tool in addition to the total record count used today, and corresponding -skip_instrs and -sim_instrs to go with today's -skip_refs and -sim_refs (for view and simulator tools).

Also, as part of this feature we should probably solve #4915 / #4948: what about non-fetched instrs?

Splits post-processed offline drmemtrace files into chunks of a fixed instruction count. These chunks are combined inside one zipfile per thread, maintaining the current file-per-thread invariant.

The minizip library, a contributed part of the zlib sources, is added as a submodule and used to write and read the zipfile (via new zipfile_ostream_t and zipfile_file_reader_t classes, respectively). If the submodule is not present, we fall back to a gzipped single file (if we have a system zlib).

Adds a new marker type holding the chunk instr count. Adds a new -chunk_instr_count option for specifying the count. We pass a small value to the drcacheoff.simple and invariant checker tests to test multiple chunks in one zip file. Documents the change.

Refactors drmemtrace_get_timestamp_from_offline_trace to handle new markers. Updates tests to handle the new marker.

Adds a new marker type as a chunk footer, to identify truncation.

Re-emits the last timestamp+cpu at the top of each new chunk. The reader skips them in linear walking.

Refactors delayed branch handling so we can identify instr entries.

Removes the histogram.gzip test as it is superfluous since we've been gzipping files for a long time; plus, it blindly assumes it can gzip any output, including a .zip.

Issue: #5538
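
As an illustration of the chunking scheme only, the following sketch uses Python's standard zipfile module rather than minizip. It writes one zip component per fixed number of instructions and closes each full chunk with a footer record. The component names, the textual record format, and the `write_chunked` function are assumptions made for the example. The re-emitted timestamp+cpu and the chunk instr count marker are omitted, and the final partial chunk gets no footer here.

```python
import io
import zipfile


def write_chunked(records, chunk_instr_count):
    """Write records to an in-memory zip, one component per chunk_instr_count instrs."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        chunk, instrs, ordinal = [], 0, 0
        for rec in records:
            is_instr = rec.startswith("instr")
            if is_instr and instrs == chunk_instr_count:
                # The chunk is full: close it with a footer before starting the next.
                chunk.append(f"marker chunk_footer {ordinal}")
                zf.writestr(f"chunk.{ordinal:04d}", "\n".join(chunk))
                chunk, instrs, ordinal = [], 0, ordinal + 1
            if is_instr:
                instrs += 1
            chunk.append(rec)
        if chunk:
            zf.writestr(f"chunk.{ordinal:04d}", "\n".join(chunk))
    return buf.getvalue()
```

Because the split happens only when the next instruction arrives, data records stay in the same chunk as the instruction they follow.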

Cleans up the drmemtrace i/o classes by removing redundant "virtual" keywords and adding missing "explicit" constructor qualifiers. Issue: #5538

Adds buffering to the zipfile reader, which eliminates the 60% slowdown it showed compared to our gzip reader and in fact results in faster read times than the gzip reader. Adds similar buffering to the gzip reader, which results in an 18% speedup and matches the new zipfile speed. Issue: #5538

PR #5633 added code to skip the duplicate timestamps at the top of each chunk, but its logic assumed there would never be two legitimate timestamps with identical values. That does happen, in particular in our online-drcachesim tests on Windows. This resulted in the invariant_checker not seeing some timestamp entries, causing its exception for non-fetched instrs across thread switches to not apply and resulting in invariant error reports. We fix this by skipping the first timestamp in each chunk by instruction count instead. We'll want the instruction and chunk counts for #5538 and I was about to add those fields in any case. Fixes #5636
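
A simplified sketch of eliding by instruction count rather than by value. It assumes the re-emitted timestamp is the first timestamp seen once the instruction count is an exact nonzero multiple of the chunk size. This is not the reader_t code, and `visible_records` and the tuple layout are hypothetical.

```python
def visible_records(records, chunk_instr_count):
    """Drop the duplicated timestamp that opens each chunk after the first."""
    out = []
    instrs = 0
    skipped_for_chunk = 0  # Chunk ordinal whose leading timestamp was already dropped.
    for kind, value in records:
        if kind == "timestamp":
            chunk = instrs // chunk_instr_count
            at_boundary = instrs > 0 and instrs % chunk_instr_count == 0
            if at_boundary and chunk != skipped_for_chunk:
                # First timestamp after exactly k * N instructions: the re-emitted copy.
                skipped_for_chunk = chunk
                continue
        elif kind == "instr":
            instrs += 1
        out.append((kind, value))
    return out
```

A second timestamp at the same boundary is kept even when its value matches the elided one, which is the case the value-based check got wrong.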

Adds an invariant check that chunk boundaries contain the proper number of instructions. Fixes off-by-one errors in chunk counts due to not checking at end-of-loop, found by visual inspection (with a local view tool that prints instr counts; that will be committed later) and confirmed to break this new check without the fix in the tool.drcacheoff.invariant_checker test. Issue: #5538
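
The kind of check described can be sketched as follows, assuming each chunk footer carries its chunk ordinal and that every completed chunk must hold exactly the configured number of instructions. The function and record names are hypothetical and this is not the invariant_checker code.

```python
def check_chunk_counts(records, chunk_instr_count):
    """Return error strings for footers that are not at a multiple of the chunk size."""
    errors = []
    instrs = 0
    for kind, value in records:
        if kind == "instr":
            instrs += 1
        elif kind == "chunk_footer":
            # Footer number k must come after exactly (k + 1) * N instructions.
            expected = (value + 1) * chunk_instr_count
            if instrs != expected:
                errors.append(f"chunk {value}: {instrs} instrs, expected {expected}")
    return errors
```

An off-by-one in the writer shows up as a footer one instruction too late, which this check reports on the first affected chunk.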

Adds two new files generated by raw2trace, each containing 4-entry records of <tid, timestamp, cpuid, instr_count>. These are used to take a global or per-cpu target instruction count and determine the corresponding instruction count within each software thread, for seeking within each trace file. One file is globally sorted by timestamp, and the other is a zip file with a separate component for each cpuid holding entries sorted by timestamp for that cpu. Adds new invariant_checker tests for each file by passing them in for test_mode. The invariant_checker re-constructs the same sorted sequences using the trace data and confirms it matches the data in the files. This involves adding a new zipfile_stream_t interface for a simple continuous stream of each zipfile component in turn. Switches raw2trace's thread_data_ to a protected field using unique_ptr heap indirection so subclasses can extend raw2trace_thread_data_t and use the new schedule generation code directly. Issue: #5538
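
One way such records could be used is sketched below. It is only a sketch with strong assumptions: each record's instr_count is taken to be the thread-local count at the start of a scheduled segment, threads are treated as running one at a time in timestamp order, and the per-thread totals are passed in separately. The function name `per_thread_skips` is hypothetical.

```python
def per_thread_skips(schedule, thread_totals, global_target):
    """Map a global instruction count to per-thread instruction counts.

    schedule: (tid, timestamp, cpuid, instr_count) tuples; instr_count is
    assumed to be the thread-local count at the start of that segment.
    thread_totals: total instructions per tid, bounding each thread's last segment.
    """
    ordered = sorted(schedule, key=lambda entry: entry[1])
    skips = {tid: 0 for tid in thread_totals}
    remaining = global_target
    for i, (tid, _timestamp, _cpuid, start) in enumerate(ordered):
        # A segment ends where the same thread's next segment begins.
        end = next((e[3] for e in ordered[i + 1:] if e[0] == tid), thread_totals[tid])
        take = min(end - start, remaining)
        skips[tid] = start + take
        remaining -= take
        if remaining == 0:
            break
    return skips
```

The resulting per-thread counts are what each per-thread trace file would then be asked to skip.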

Adds a missing piece from part 5 PR #5713 where raw2trace's thread_data_ was indirected to support a subclass extending it. However, its destructor was not virtual, which prevented such extension. We address that here. Issue: #5538

Adds a new memtrace_stream_t interface class which provides the record and instruction count to drmemtrace analysis tools. A pointer to this interface is passed to new extended-argument versions of analysis_tool_t's initialize() (initialize_stream()) and parallel_shard_init() (parallel_shard_init_stream()) functions, which are now what is called by the analyzer. The base class implementation of these new functions simply calls the old versions, which are now deprecated but will continue to work.

This new interface is not just for convenience: the tool itself cannot accurately count when the reader skips over records, as will happen with seeking. The counting must be done in the reader. (If the tool indeed wants to count only records/instrs that it actually sees, it can continue using its own counters.)

We had considered other avenues for analysis_tool_t to obtain things like the record and instruction ordinals within the stream, in the presence of skipping: we could add fields to memref but we'd either have to append and have them at different offsets for each type or we'd have to break compatibility to prepend every time we added more; or we could add parameters to process_memref(). Passing an interface to the init routines seems the simplest and most flexible.

Updates the view tool to use the new interface to obtain the record ordinal, replacing its own counter. The tool is expanded to print a new column with the instruction ordinal. The view tool test is updated along with example output in the docs.

Issue: #5538
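
The reason the counting has to live in the reader can be shown with a small Python model. This is an illustration, not the C++ API: the `Reader` and `ViewTool` classes and the two getter names are assumptions for the example, with only `initialize_stream` and `process_memref` taken from the description above.

```python
class Reader:
    """Stand-in for a reader that owns the ordinals, in the role of memtrace_stream_t."""

    def __init__(self, records):
        self._records = list(records)
        self._next = 0
        self._record_ordinal = 0
        self._instruction_ordinal = 0

    def get_record_ordinal(self):
        return self._record_ordinal

    def get_instruction_ordinal(self):
        return self._instruction_ordinal

    def _advance(self):
        rec = self._records[self._next]
        self._next += 1
        self._record_ordinal += 1
        if rec == "instr":
            self._instruction_ordinal += 1
        return rec

    def skip_instructions(self, count):
        # Skipped records are still counted here, which a tool cannot do itself.
        target = self._instruction_ordinal + count
        while self._next < len(self._records):
            at_instr = self._records[self._next] == "instr"
            if at_instr and self._instruction_ordinal == target:
                break
            self._advance()

    def __iter__(self):
        while self._next < len(self._records):
            yield self._advance()


class ViewTool:
    """Tool that asks the stream for ordinals and also keeps its own counter."""

    def initialize_stream(self, stream):
        self._stream = stream
        self.own_count = 0
        self.lines = []

    def process_memref(self, rec):
        self.own_count += 1
        self.lines.append(
            (self._stream.get_record_ordinal(), self._stream.get_instruction_ordinal(), rec)
        )
```

After a skip, the tool's own counter reflects only what it saw, while the stream's ordinals reflect the position in the full trace.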

Adds a new skip_instructions() reader iterator interface. It is a linear walk for every type of reader except a chunked zipfile walking a single thread. Adds a drcachesim command line option -skip_instrs which triggers the analyzer to skip from the start before passing anything to the tool. Refactors reader_t's operator++ to provide a process_input_entry to update state while skipping.

Adds a unit test with an added trace file with a small chunk size. The test checks the view output for every skip value from 0 to over double the chunk size.

Leaves several pieces for future work:
+ Full support for skipping from the middle: the timestamp,cpuid will not always be duplicated with the current code.
+ Recording the record count in each chunk so we have an accurate count after skipping.
+ Presenting global headers skipped over as memtrace_stream_t values that tools can query.
+ Reading the schedule files for serial skipping (or the planned cpu iterator and skipping).
+ Repeating the timestamp+cpu for non-zipfile skipping.

Issue: #5538
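
The difference between the linear walk and the chunked fast path can be sketched as below. The sketch assumes a non-empty list of chunks in which every chunk except the last holds exactly the configured number of instructions. The name `skip_in_chunks` and the list-of-lists layout are hypothetical.

```python
def skip_in_chunks(chunks, chunk_instr_count, skip_instrs):
    """Return (chunk_index, record_index) of the first instruction after the skip.

    chunks: non-empty list of lists of record kinds; every chunk except the
    last is assumed to hold exactly chunk_instr_count "instr" records.
    """
    # Fast path: jump directly over whole chunks without reading them.
    chunk_idx = min(skip_instrs // chunk_instr_count, len(chunks) - 1)
    remaining = skip_instrs - chunk_idx * chunk_instr_count
    # Slow path: walk linearly inside the target chunk.
    for rec_idx, kind in enumerate(chunks[chunk_idx]):
        if kind == "instr":
            if remaining == 0:
                return chunk_idx, rec_idx
            remaining -= 1
    return None  # Skipped past the end of the trace.
```

Checking every skip value from 0 to beyond twice the chunk size, as the unit test does, exercises both the in-chunk walk and the jump across boundaries.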

Adds a reader subclass to raw2trace for computing the memref_t record count for each chunk. A new marker is inserted in the chunk header which the zipfile skip code uses to obtain the correct ref count when skipping over chunks. Adds a count suppression feature where the view tool prints 0 for the record count for the synthetic timestamp+cpu added after a seek. Updates the seek test with a new trace and new expected output. Issue: #5538

Adds cached values of the 5 top-level headers to the memtrace_stream_t interface and implements this for the readers. Adds checks of these values to invariant_checker. Adds a test with skipped instructions using the skip_unit_tests checked-in trace. Reverses the order of the initial 2 markers and the tid,pid pair sent to the reader to avoid a 0 tid in tools. I am surprised this hasn't caused more problems and I thought it was already fixed this way. Long-term maybe we could swap them in the file itself but this is complex as it moves the version field. Issue: #5538

derekbruening · 2023-01-28T01:04:16Z

There is a bug in how reader_t is skipping the duplicate top-of-chunk timestamp,cpu header pair: it is assuming single-thread operation and completely fails for serial mode. We saw this in larger traces with serial mode and it can be reproduced in a small trace where the headers are skipped when there is no chunk:

        8        0: T3 <marker: timestamp 1001>
        9        0: T3 <marker: tid 3 on core 2>
       10        1: T3 ifetch       4 byte(s) @ 0x000000000000002a non-branch
       11        2: T3 ifetch       4 byte(s) @ 0x000000000000002a non-branch
------------------------------------------------------------
       12        3: T7 ifetch       4 byte(s) @ 0x000000000000002a non-branch
       13        4: T7 ifetch       4 byte(s) @ 0x000000000000002a non-branch
------------------------------------------------------------
       14        5: T3 ifetch       4 byte(s) @ 0x000000000000002a non-branch
------------------------------------------------------------
       15        5: T7 <marker: timestamp 1004>
       16        5: T7 <marker: tid 7 on core 3>
       17        6: T7 ifetch       4 byte(s) @ 0x000000000000002a non-branch

Fixes a bug where reader_t's detection of duplicated timestamp,cpuid headers at the start of a chunk assumed single-threaded mode. We switch to using a simple per-tid chunk footer trigger. Adds a test to view_test via a new serial mock which takes in trace_entry_t and allows testing of the interleaving code. Tests both proper chunk header elision as well as replicating the bug where elision should not happen. Fixes problems revealed in the drcacheoff.skip test by this change: do not increment the ref count for the hidden markers at the start of a chunk when skipping in a zipfile as well as in raw2trace. The test revealed a separate bug in the view tool where the version and filetype ordinals, for delaying, were not updated on new threads. That is fixed here as well as otherwise the new tests fail. Issue: #5538
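
A per-tid footer trigger can be sketched like this, assuming an interleaved stream of (tid, kind, value) records in which only the timestamp and cpu markers that directly follow a thread's own chunk footer are duplicates. Whether the footer itself is shown to tools is not modeled beyond passing it through. The name `elide_chunk_headers` is hypothetical.

```python
def elide_chunk_headers(interleaved):
    """Drop the re-emitted timestamp and cpu markers that follow each thread's footer."""
    out = []
    pending = {}  # tid -> number of duplicated header markers still to hide.
    for tid, kind, value in interleaved:
        if kind == "chunk_footer":
            # Only this thread's next timestamp and cpu markers are duplicates.
            pending[tid] = 2
            out.append((tid, kind, value))
            continue
        if pending.get(tid, 0) > 0 and kind in ("timestamp", "cpu"):
            pending[tid] -= 1
            continue
        pending[tid] = 0
        out.append((tid, kind, value))
    return out
```

Keying the state on the tid means a footer in one thread never hides a legitimate header in another thread that happens to be scheduled next.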

Removes multi-input support from file_reader_t and other readers now that the scheduler_t owns that. Specifically:
+ Removes read_next_thread_entry() and requires that read_next_entry() always check the queue (via a provided helper function).
+ Removes skip_thread_instructions() and refactors the pre-skip header reading and the post-skip walking while remembering timestamps. Places these latter two inside reader_t for use by all readers, with zipfile overriding just the fast skip in the middle and sharing all the other code. This refactoring and sharing solves the problem of missing timestamps when skipping from the middle.
+ Removes the arrays of data for multiple inputs from file_reader_t and all subclasses.

Updates the view_test to use a scheduler for its multiple-input mock reader. While at it, removes is_complete().

Issue: #5843, #5538

Fixes a boundary case of skipping 1 instruction when the scheduler has already read an instruction record but not yet passed it to the user. Fixes a boundary case of back-to-back regions of interest. Adds test cases. Issue: #5538

derekbruening · 2024-02-15T17:04:57Z

Split off repeating physical address markers as #6654.

derekbruening added Type-Feature Component-DrMemtrace labels Jun 21, 2022

derekbruening mentioned this issue Jun 21, 2022

Add instruction encoding entries to drmemtrace #5520

Closed

derekbruening mentioned this issue Aug 29, 2022

i#5505 kernel trace: Add drir2trace to convert DR's IR to trace entries. #5624

Closed

derekbruening mentioned this issue Aug 31, 2022

i#5538 memtrace seek, part 1: Write in pieces to zipfiles #5633

Merged

derekbruening added a commit that referenced this issue Sep 6, 2022

i#5538 memtrace seek, part 2: Remove redundant "virtual" keywords

f28d07b

derekbruening mentioned this issue Sep 6, 2022

i#5538 memtrace seek, part 2: Remove redundant "virtual" keywords #5639

Merged

derekbruening mentioned this issue Sep 8, 2022

i#5538 memtrace seek, part 3: Add reader buffering #5640

Merged

derekbruening mentioned this issue Sep 8, 2022

i#5636: Fix elided duplicate timestamps #5643

Merged

dolanzhao pushed a commit that referenced this issue Sep 12, 2022

i#5538 memtrace seek, part 2: Remove redundant "virtual" keywords (#5639)

6d7d062

derekbruening mentioned this issue Oct 20, 2022

Add core-oriented drmemtrace iterator #5694

Open

derekbruening mentioned this issue Oct 28, 2022

i#5538 memtrace seek, part 4: Check chunk boundaries #5710

Merged

derekbruening mentioned this issue Oct 31, 2022

i#5538 memtrace seek, part 5: Add schedule index files #5713

Merged

derekbruening mentioned this issue Nov 3, 2022

i#5538 memtrace seek, part 6: Virtualize raw2trace_thread_data #5715

Merged

derekbruening mentioned this issue Nov 8, 2022

i#5538 memtrace seek, part 7: Add instr count to tools #5721

Merged

derekbruening mentioned this issue Nov 11, 2022

i#5538 memtrace seek, part 8: Add skip_thread_instructions() #5731

Merged

derekbruening mentioned this issue Nov 15, 2022

i#5538 memtrace seek, part 9: Add accurate record count #5737

Merged

derekbruening mentioned this issue Nov 15, 2022

i#5538 memtrace seek, part 10: Top-level headers #5739

Merged

derekbruening mentioned this issue Jan 28, 2023

i#5538 memtrace seek, part 11: Fix omitted header problems #5840

Merged

derekbruening mentioned this issue Mar 9, 2023

i#5843 scheduler: Refactor readers to be single-input #5900

Merged

derekbruening mentioned this issue Aug 17, 2023

i#5538 skip: Fix skip-1-instr and skip-0-instr cases #6270

Merged

derekbruening mentioned this issue Feb 15, 2024

Repeat drmemtrace physical address markers in new chunks #6654

Open

derekbruening mentioned this issue Feb 15, 2024

i#6648,i#6593: Add drmemtrace trimming tool with re-chunking support #6651

Merged
