Add core-oriented drmemtrace iterator #5694

Open
derekbruening opened this issue Oct 20, 2022 · 0 comments

When drmemtrace tools want to analyze a trace using the recorded schedule, they may want to look at hardware threads rather than software threads. An iterator over each core would be useful for this purpose. To interact with fast seeking (#5538), the plan is to generate a schedule summary file that contains <tid, timestamp, cpuid, instr-count> tuples, which makes it possible to compute instruction counts on a core across multiple software threads.
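
As a rough illustration of how such a summary could be consumed, the sketch below sums instruction counts per core from <tid, timestamp, cpuid, instr-count> tuples. The names `schedule_tuple_t` and `instructions_per_core` are hypothetical, and the sketch assumes instr-count is the thread-relative instruction ordinal at which a scheduled segment begins; the real file format may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical in-memory form of one schedule summary tuple.
struct schedule_tuple_t {
    int64_t tid;
    uint64_t timestamp;
    int64_t cpuid;
    uint64_t start_instruction; // Assumed: thread-relative ordinal of segment start.
};

// Returns the number of instructions executed on each cpuid.
// thread_totals holds each thread's final instruction count, which is
// needed to size that thread's last segment.
std::map<int64_t, uint64_t>
instructions_per_core(std::vector<schedule_tuple_t> tuples,
                      const std::map<int64_t, uint64_t> &thread_totals)
{
    // Group by thread, ordered by start ordinal within each thread.
    std::sort(tuples.begin(), tuples.end(),
              [](const schedule_tuple_t &a, const schedule_tuple_t &b) {
                  if (a.tid != b.tid)
                      return a.tid < b.tid;
                  return a.start_instruction < b.start_instruction;
              });
    std::map<int64_t, uint64_t> per_core;
    for (size_t i = 0; i < tuples.size(); ++i) {
        const schedule_tuple_t &cur = tuples[i];
        // A segment ends where the same thread's next segment begins,
        // or at the thread's total count for its final segment.
        uint64_t end;
        if (i + 1 < tuples.size() && tuples[i + 1].tid == cur.tid)
            end = tuples[i + 1].start_instruction;
        else
            end = thread_totals.at(cur.tid);
        assert(end >= cur.start_instruction);
        per_core[cur.cpuid] += end - cur.start_instruction;
    }
    return per_core;
}
```

Passing the per-thread totals separately keeps the tuple layout identical to the four fields named above.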

derekbruening added a commit that referenced this issue May 12, 2023
Adds replay of the as-traced schedule using the cpu_schedule.zip file
written by raw2trace.  That file is converted into the record+replay
format for dynamic schedules and the existing replay code leveraged.

This implements as-traced cpu-oriented parallel iteration in the
scheduler which is part of #5694.  Adding analysis tool support for
this will be done separately.

Updates all replays (whether as-traced or as-previously) to consider
DEPENDENCY_TIMESTAMPS to indicate whether to have one output wait if
it gets ahead of another's timestamp.

Adds a missing check for output count mismatches on as-previously
replay.

Issue: #5843, #5694
derekbruening added a commit that referenced this issue May 15, 2023
Adds replay of the as-traced schedule using the cpu_schedule.zip file
written by raw2trace. That file is converted into the record+replay
format for dynamic schedules and the existing replay code leveraged.

This implements as-traced cpu-oriented parallel iteration in the
scheduler which is part of #5694. Adding analysis tool support for this
will be done separately.

Updates all replays (whether as-traced or as-previously) to consider
DEPENDENCY_TIMESTAMPS to indicate whether to have one output wait if it
gets ahead of another's timestamp.
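
The timestamp dependency can be pictured with a simplified model, shown below. This is only a sketch under assumptions: `must_wait` and the `next_timestamp` vector are hypothetical, and the rule used here (an output waits while any other output still has an earlier pending timestamp) is a simplification rather than the scheduler's actual logic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// next_timestamp[i] is the timestamp of the next segment that output i
// would run, or UINT64_MAX once output i has finished its schedule.
// Returns true if output `self` should wait (the STATUS_WAIT case in the
// message above) because another output still has earlier work pending.
bool
must_wait(size_t self, const std::vector<uint64_t> &next_timestamp)
{
    assert(self < next_timestamp.size());
    for (size_t i = 0; i < next_timestamp.size(); ++i) {
        if (i != self && next_timestamp[i] < next_timestamp[self])
            return true;
    }
    return false;
}
```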

Removes STATUS_IDLE since we always use STATUS_WAIT. We're between
releases so we can delete from the public enum for this
still-under-development component without worrying about binary
compatibility.

Renames schedule_entry_t.instr_count to start_instruction to make it clearer
it is an ordinal and not a duration.

Adds a missing check for output count mismatches on as-previously
replay.

Adds unit tests of synthetic and from-file replay.

Issue: #5843, #5694
@derekbruening derekbruening self-assigned this Aug 24, 2023
derekbruening added a commit that referenced this issue Aug 25, 2023
Adds a new type of sharding for drmemtrace analysis tools: by core
instead of by thread.

Introduces a shard_type_t enum (SHARD_BY_THREAD and SHARD_BY_CORE)
passed to a new analysis_tool_t::initialize_shard_type() function to
inform tools of the shard type (this cannot be easily added to the
stream interface as the scheduler is not aware of the shard type).

Adds a new memtrace_stream_t::get_output_cpuid() query to get the
output cpu ordinal, or for replaying as-traced the original traced
cpuid (#6262).  Implements this for the scheduler.

Generalizes analyzer_t to take in scheduler options for SHARD_BY_CORE
to support analysis tools using the full range of schedules.  In this
mode, the core count is the worker count.  Updates the shard index to
be the core ordinal.  Adds time-based scheduling support with
analyzer_t using wall-clock time as the current time.

Adds a number of options to set sharding mode (-core_sharding,
-core_serial (not yet implemented)) and control the schedule
(-sched_quantum, -sched_time, -sched_order_time, -record_file,
-replay_file, -cpu_schedule_file).

Updates the basic_counts tool to support core sharding.

Adds a new test core_sharded_test which leverages the analyzer_multi
and option parsing to test the top-level options within a framework
that can capture the output and run multiple tests sequentially in a
simpler framework than having a separate test with an output file for
each parameter being tested.

Left as future work:
+ Convert scheduler_launcher into a new schedule_stats tool
+ Add a new record to indicate STATUS_WAIT
+ Add -core_serial support
+ Convert drcachesim default and -cpu_scheduling to use
  get_output_cpuid()

Issue: #5694
derekbruening added a commit that referenced this issue Aug 31, 2023
Adds a new type of sharding for drmemtrace analysis tools: by core
instead of by thread.

Introduces a shard_type_t enum (SHARD_BY_THREAD and SHARD_BY_CORE)
passed to a new analysis_tool_t::initialize_shard_type() function to
inform tools of the shard type (this cannot be easily added to the
stream interface as the scheduler is not aware of the shard type).
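
A tool might record the shard type as sketched below. The declarations are stand-ins so that the example compiles on its own: `example_tool_t` and `shard_label` are hypothetical, and the return type of `initialize_shard_type()` (an error string that is empty on success) is an assumption rather than something this message states.

```cpp
#include <cassert>
#include <string>

// Stand-in for the enum named in the message above.
enum shard_type_t { SHARD_BY_THREAD, SHARD_BY_CORE };

// Hypothetical tool that remembers how its shards are organized.
class example_tool_t {
public:
    // Assumed convention: an empty string means success.
    std::string
    initialize_shard_type(shard_type_t shard_type)
    {
        shard_type_ = shard_type;
        return "";
    }

    // Labels output according to the shard type the tool was told about.
    std::string
    shard_label(int shard_index) const
    {
        assert(shard_index >= 0);
        std::string prefix = shard_type_ == SHARD_BY_CORE ? "core " : "thread ";
        return prefix + std::to_string(shard_index);
    }

private:
    shard_type_t shard_type_ = SHARD_BY_THREAD;
};
```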

Adds a new memtrace_stream_t::get_output_cpuid() query to get the output
cpu ordinal, or for replaying as-traced the original traced cpuid
(#6262). Implements this for the scheduler. This addresses #6262.

Generalizes analyzer_t to take in scheduler options for SHARD_BY_CORE to
support analysis tools using the full range of schedules. In this mode,
the core count is the worker count. Updates the shard index to be the
core ordinal. Adds time-based scheduling support with analyzer_t using
wall-clock time as the current time.

Adds a number of options to set sharding mode (-core_sharding,
-core_serial (not yet implemented)) and control the schedule
(-sched_quantum, -sched_time, -sched_order_time, -record_file,
-replay_file, -cpu_schedule_file).

Updates the basic_counts tool to support core sharding.

Adds a new test core_sharded_test which leverages the analyzer_multi and
option parsing to test the top-level options within a framework that can
capture the output and run multiple tests sequentially in a simpler
framework than having a separate test with an output file for each
parameter being tested.

Left as future work:
+ Convert scheduler_launcher into a new schedule_stats tool
+ Add a new record to indicate STATUS_WAIT
+ Add -core_serial support
+ Convert drcachesim default and -cpu_scheduling to use
get_output_cpuid()

Issue: #5694, #6262
Fixes #6262
derekbruening added a commit that referenced this issue Sep 21, 2023
Adds first-class counting of threads per shard in the basic_counts
tool, for use with core-sharded operation.

Updates the core-sharded tests with sanity checks.

Issue: #5694
derekbruening added a commit that referenced this issue Sep 22, 2023
Adds first-class counting of threads per shard in the basic_counts tool,
for use with core-sharded operation.

Updates the core-sharded tests with sanity checks.

Issue: #5694
derekbruening added a commit that referenced this issue Sep 29, 2023
Adds several routines to the memtrace_stream_t interface for
drmemtrace analysis tools in core-sharded mode:
+ get_workload_id()
+ get_input_id()
+ get_input_interface()

Adds a new analysis_unit_tests executable with some sanity tests.
Splits out the mock_reader_t and helpers used by scheduler_unit_tests
to share them with the new test.

Documents the additions.

Fixes a bug with core-sharded analysis tools where parallel_shard_exit
was called for every thread, resulting in use-after-frees when more than
one thread was on a core and the tool deletes its shard data structure
in the parallel_shard_exit routine (most of our tools do not, which is
why this was not noticed before).

Issue: #5694
derekbruening added a commit that referenced this issue Oct 2, 2023
Adds several routines to the memtrace_stream_t interface for drmemtrace
analysis tools in core-sharded mode:
+ get_workload_id()
+ get_input_id()
+ get_input_interface()

Adds a new analysis_unit_tests executable with some sanity tests. Splits
out the mock_reader_t and helpers used by scheduler_unit_tests to share
them with the new test.

Documents the additions.

Fixes a bug with core-sharded analysis tools where parallel_shard_exit
was called for every thread, resulting in use-after-frees when more than
one thread was on a core and the tool deletes its shard data structure
in the parallel_shard_exit routine (most of our tools do not, which is
why this was not noticed before).
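
The lifetime rule behind this fix can be sketched as follows: in core-sharded mode the shard data belongs to the core, so it is released once per core and not once per thread. The types `shard_data_t` and `counting_tool_t` and the driver `run_core_sharded` are hypothetical stand-ins, not the analyzer's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct shard_data_t {
    uint64_t threads_seen = 0;
};

// Hypothetical tool that frees its shard data in parallel_shard_exit.
class counting_tool_t {
public:
    void *
    parallel_shard_init(int /*shard_index*/)
    {
        ++live_shards;
        return new shard_data_t;
    }
    bool
    parallel_shard_exit(void *shard_data)
    {
        delete static_cast<shard_data_t *>(shard_data);
        --live_shards;
        return true;
    }
    int live_shards = 0;
};

// Hypothetical driver: one shard per core, however many threads ran on it.
// Returns the number of parallel_shard_exit calls made.
int
run_core_sharded(counting_tool_t &tool,
                 const std::vector<std::vector<int64_t>> &threads_per_core)
{
    int exits = 0;
    for (size_t core = 0; core < threads_per_core.size(); ++core) {
        void *shard = tool.parallel_shard_init(static_cast<int>(core));
        for (int64_t tid : threads_per_core[core]) {
            (void)tid;
            // A thread finishing must not free the shard: later threads
            // on the same core still use it.
            static_cast<shard_data_t *>(shard)->threads_seen++;
        }
        // Exit exactly once, after every thread on this core is done.
        tool.parallel_shard_exit(shard);
        ++exits;
    }
    return exits;
}
```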

Issue: #5694
derekbruening added a commit that referenced this issue Nov 7, 2023
Adds a new TRACE_MARKER_TYPE_WAIT marker which is a synthetic marker
inserted in core-sharded drmemtrace analysis tool mode when the
scheduler returns STATUS_WAIT.  This is meant for tools which analyze
schedules themselves.

Adds a unit test.

Issue: #5694
derekbruening added a commit that referenced this issue Nov 8, 2023
Adds a new TRACE_MARKER_TYPE_CORE_WAIT marker which is a synthetic
marker inserted in core-sharded drmemtrace analysis tool mode when the
scheduler returns STATUS_WAIT. This is meant for tools which analyze
schedules themselves.
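
A schedule-analyzing tool could use the marker roughly as sketched here. The record type is a deliberately reduced stand-in (`record_kind_t`, `wait_stats_t`, and `count_waits` are hypothetical); a real tool would inspect memref_t records for the marker type named above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Reduced stand-in for the records that one core's stream delivers.
enum record_kind_t { RECORD_INSTR, RECORD_MARKER_CORE_WAIT, RECORD_OTHER };

struct wait_stats_t {
    uint64_t instrs = 0;
    uint64_t waits = 0;
};

// Tallies instructions and synthetic wait markers seen on one core.
wait_stats_t
count_waits(const std::vector<record_kind_t> &records)
{
    wait_stats_t stats;
    for (record_kind_t kind : records) {
        if (kind == RECORD_INSTR)
            ++stats.instrs;
        else if (kind == RECORD_MARKER_CORE_WAIT)
            ++stats.waits;
    }
    return stats;
}
```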

Adds a unit test.

Issue: #5694
derekbruening added a commit that referenced this issue Dec 20, 2023
Adds a new scheduler option single_lockstep_output which multiplexes
the virtual core output streams onto a single global stream.  This is
simple to implement as the existing scheduler_t::stream_t class
already multiplexes inputs onto an output.

Hooks up the drcachesim launcher -core_serial option to this new
scheduler mode.

Updates the schedule_stats, basic_counts, and cache_simulator tools to
support core_serial.  For cache_simulator, the existing thread-to-core
mapping code for round-robin and for -cpu_scheduling is kept for when
in thread-sharded mode; in core-sharded mode, the scheduler's cpuid is
mapped to a core index.

Adds a core_serial test of schedule_stats and basic_counts and a test
of cache_simulator using the scheduler's -cpu_schedule_file as-traced
mode.

Adds some dr$sim unit tests for cpuid to core mapping and error modes.

Issue: #5694
derekbruening added a commit that referenced this issue Dec 21, 2023
Adds a new scheduler option single_lockstep_output which multiplexes the
virtual core output streams onto a single global stream. This is simple
to implement as the existing scheduler_t::stream_t class already
multiplexes inputs onto an output.
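
The multiplexing idea can be illustrated with the sketch below, which interleaves several per-core streams onto one stream and tags each record with the core it came from. The names `tagged_record_t` and `multiplex_lockstep` are hypothetical, and the strict one-record-per-core round-robin is an assumed reading of "lockstep", not a description of the scheduler's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One record on the merged stream, tagged with its originating core.
struct tagged_record_t {
    int output_ordinal;
    int value;
};

// Interleaves the per-core streams one record at a time, skipping cores
// that have run out of records.
std::vector<tagged_record_t>
multiplex_lockstep(const std::vector<std::vector<int>> &cores)
{
    size_t longest = 0;
    for (const std::vector<int> &core : cores)
        longest = std::max(longest, core.size());
    std::vector<tagged_record_t> merged;
    for (size_t step = 0; step < longest; ++step) {
        for (size_t core = 0; core < cores.size(); ++core) {
            if (step < cores[core].size())
                merged.push_back({ static_cast<int>(core), cores[core][step] });
        }
    }
    return merged;
}
```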

Hooks up the drcachesim launcher -core_serial option to this new
scheduler mode.

Updates the schedule_stats, basic_counts, and cache_simulator tools to
support core_serial. For cache_simulator, the existing thread-to-core
mapping code for round-robin and for -cpu_scheduling is kept for when in
thread-sharded mode; in core-sharded mode, the scheduler's cpuid is
mapped to a core index.

Adds a core_serial test of schedule_stats and basic_counts and a test of
cache_simulator using the scheduler's -cpu_schedule_file as-traced mode.

Adds some dr$sim unit tests for cpuid to core mapping and error modes.

Issue: #5694
derekbruening added a commit that referenced this issue Jan 19, 2024
Adds 2 new memtrace_stream_t interfaces to simplify generalizing
tools to handle either thread or core sharded operation:

+ get_shard_index() returns a 0-based shard ordinal regardless
  of whether core-sharded or thread-sharded.
+ get_input_tid() returns the thread id of the current input.
  This is a convenience method for use in parallel_shard_init_stream()
  prior to access to any memref_t records.

Changes an existing interface:

+ Guarantees that the shard_index passed to parallel_shard_init_stream()
  is a 0-based ordinal.

Implements the 2 new interfaces in the scheduler and adds two new
interfaces there:

+ get_output_stream_ordinal() to get the underlying output when using
  single_lockstep_output.
+ get_output_cpuid(ord) taking in an ordinal so the analyzer or other
  user can get the cpuids statically when using single_lockstep_output.

Removes dr$sim's manual mapping of cpuid to core index in favor of
using the new get_shard_index().

Updates all the analysis tools to use the new interfaces and to
generalize their code to either handle both thread and core shards
(reuse_time, reuse_distance, basic_counts, histogram, opcode_mix,
syscall_mix, record_filter) or explicitly return an error for
core-sharded modes (func_view, invariant_checker).  (schedule_stats
and record_filter needed no changes.)

Adds some sanity tests on the new interfaces.

Adds a new end-to-end test running the newly-updated tools as
-core_sharded.

Issue: #5694
derekbruening added a commit that referenced this issue Jan 25, 2024
Adds 2 new memtrace_stream_t interfaces to simplify generalizing tools
to handle either thread or core sharded operation:

+ get_shard_index() returns a 0-based shard ordinal regardless of
whether core-sharded or thread-sharded.
+ get_tid() returns the thread id of the current input. This is a
convenience method for use in parallel_shard_init_stream() prior to
access to any memref_t records.

For online analysis where there's a single input, the scheduler
remembers and returns the last memref.data.tid for get_tid() and uses
the dynamic tid discovery order for get_shard_index().
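
Assigning shard indices in discovery order could look like the sketch below, where each thread id receives the next 0-based ordinal the first time it is seen. The class `tid_ordinal_map_t` and its method are hypothetical and only illustrate the ordering rule.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Maps thread ids to 0-based ordinals in the order they are first seen.
class tid_ordinal_map_t {
public:
    int
    shard_index_for(int64_t tid)
    {
        auto it = ordinals_.find(tid);
        if (it != ordinals_.end())
            return it->second;
        // First sighting: the next ordinal is the count seen so far.
        int next = static_cast<int>(ordinals_.size());
        ordinals_.emplace(tid, next);
        return next;
    }

private:
    std::unordered_map<int64_t, int> ordinals_;
};
```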

Changes an existing interface:

+ Guarantees that the shard_index passed to parallel_shard_init_stream()
is a 0-based ordinal.

Implements the 2 new interfaces in the scheduler and adds two new
interfaces there:

+ get_output_stream_ordinal() to get the underlying output when using
single_lockstep_output.
+ get_output_cpuid(ord) taking in an ordinal so the analyzer or other
user can get the cpuids statically when using single_lockstep_output.
Analysis tools must dynamically discover the cpuids (stopped short of
making this a memtrace_stream_t interface, as analysis tools in general
must dynamically discover most things already).

Removes dr$sim's manual mapping of cpuid to core index in favor of using
the new get_shard_index().

Updates all the analysis tools to use the new interfaces and to
generalize their code to either handle both thread and core shards
(reuse_time, reuse_distance, basic_counts, histogram, opcode_mix,
syscall_mix, record_filter) or explicitly return an error for
core-sharded modes (func_view, invariant_checker). (schedule_stats and
record_filter needed no changes.)

Updates several unit tests to handle these changes:
+ Expands the default_memtrace_stream_t to be suitable as a mock stream
for unit tests with the new interfaces.
+ Skips invariant stream checks for the mock stream by checking its
input interface, since the stream itself is no longer null.
+ Fixes drcachesim unit tests which were not initializing tid.

Adds some sanity tests on the new interfaces.

Adds a new end-to-end test running the newly-updated tools as
-core_sharded. Limits the reuse_time histogram printing output to avoid
hanging CMake's regex matcher in this test.

Issue: #5694