Vis2 (ROSS-org#104)

* Proof of concept. Prints ROSS parameters to given log file at each GVT which is a percent completion of the end time
* Two new functions 'tw_stats_log()' and 'tw_gvt_log()' to facilitate output of event and gvt data respectively throughout the simulation
* pulled out stats collection from tw_stats() for use in collecting time series data
* some changes to printing stats to file
* changed stats output so it only reports for each time interval instead of running totals
* added in command line option for changing name of stats output file
* set up stats output files with header line
* whoops, used local PE id instead of global for printing file headers
* command line param to turn stats on/off
* cleaned up stats output
* made some minor changes to the way the stats are collected
* added tree for collecting ross level stats over specified time interval
* adding parallel io for stats collection
* fixed some bugs, added clock cycle counters
* changed increment_stat() to increment by a variable amount
* print out only 100 files max per directory
* fixed some bugs in stats output
* cleaning up some code
* minor change to the stats output filenames
* some code clean up
* fixed mistake in directory name for stats output
* fixed bug in tree node deletion for stats collection
* finished adding in all stat increment calls
* fixed some minor issues
* fixed data collection cycle counters
* fixed bug that caused ROSS to segfault when not running data collection
* removed some unnecessary stats from gvt data collection
* added a readme for the vis component
* starting to add real time sampling for ROSS stats
* added collection for time ahead of GVT
* buffer added for storing data being collected throughout sim
* fixed some minor issues with buffer implementation
* pulling data collection related functions into its own file
* better implementation for the buffer
* cleaning up/better organizing the data collection code
* fixed bug in data collection
* fixed a bug in the buffer
* integrating gvt collection with buffer
* added option to disable stats output (but not any computation)
* fixed bug in offsets for writing out buffer
* fixed configurable filename to use with buffer
* changing naming style for data collection to be more consistent
* added in collection for LP time ahead of GVT (instead of just KPs)
* added collection of cycle counters to real time collection
* collecting more data with real time sampling
* Updated README and changed to a markdown file
* changed name for readme file for each sim
* added some runtime options for buffer configuration
* fixed a couple of small bugs in the real-time data collection
* minor fix in data collection and reordering output
* fixed bugs in writing GVT data collection to buffer
* added a couple of extra counters to real time collection
* enable gvt and real time data collections to be run together
* minor adjustments to real time data collection
* some minor changes to data collection
* fix for issue with data type for real time sampling on BGQ
* updated vis README
* adding some basic event level collection
* fixed some minor issues in the real time data collection
* removed some redundant collection
* commented out some calls related to the virtual time sampling
* collect sending LP instead of just sending PE for event collection
* adding in support for models to add their own event collection through function pointers
* adding support for data collection of all events for specified LP types
* adding documentation for event data collection
* some minor changes for debugging
* adding in KP and LP level stats for GVT data collection
* adding in KP/LP level stats for RT data collection
* collecting info from each PE about the number of LPs it has in order to correctly read binary output
* fixed minor issues with buffer
* trying to make binary output smaller for gvt collection
* making binary output smaller for real time sampling
* fixed minor issue related to src LP for event tracing
* cleaning up the instrumentation code
* Update README-vis.md
* Forgot a detail in the vis README
* minor changes to naming and README-vis
* minor update to README-vis
* adding in metadata to instrumentation README output for binary reader
* updating subproject commit for template-model
* subproject commits for IO and ROSS-Models
* removing old virtual time sampling cruft
* missed some of the virtual time remains
* moved instrumentation src into its own directory
* added in error handling for event tracing setup
* some more event tracing error handling
* added ability for model to choose not to collect specific events
* updates reflecting event tracing changes
* added in real time duration of event to tracing
* better handling when output file already exists
* getting rid of some magic numbers and some minor cleanups
* fixed cycle counters for stats related work
* removed some redundancy from data being collected
* divide cycle counters by clock rate before storing in buffer/writing out
* enabled instrumentation for optimistic real time scheduling
* enabling instrumentation for conservative
* fixing issue in file creation
* updates to documentation
* fixing error reported by Travis
* and another error reported by Travis
* a couple more minor fixes
* minor error fix
* fixed issue with using PRIu64
* adding tests to PHOLD for instrumentation
* added a couple more tests to PHOLD
* one more test
* Removed some unused code and added some more tests
* fixed issue with using event tracing
* removing some unused code
* some minor edits to warnings/edits in instrumentation
carothersc-zz · Dec 5, 2016 · ec35ef2
1 parent e8086b8
commit ec35ef2
Showing 24 changed files with 1,454 additions and 62 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1,3 +1,4 @@
 # ignore models that aren't already in tree
 # (must manually add models to override this)
 models/*
+*.swp
diff --git a/README-vis.md b/README-vis.md
@@ -0,0 +1,173 @@
+## README for ROSS Instrumentation
+
+ROSS currently provides three types of instrumentation: GVT-based, real time sampling, and event tracing.  
+All three types can be used independently or together.  The options for 
+data collection are listed under the title "ROSS Stats" when you run a ROSS/CODES model with `--help`.  
+No instrumentation type requires changes to the model-level code in order to run. 
+Just make sure to update ROSS for the instrumentation and then rebuild your model (including CODES if necessary). 
+Event tracing can also run without changes to the model code, but some minor changes are necessary in order to collect event type information. 
+The details are described in the event tracing section below.
+
+For all instrumentation types, you can use the `--stats-filename` option to set a prefix for the output files.  
+All of the output files are stored in a directory named `stats-output` that is created in the running directory.
+
+### GVT-based Instrumentation
+This collects data immediately after each GVT and can be turned on with `--enable-gvt-stats=1` at runtime. By default, the data is collected on a PE basis, but some metrics can instead be tracked on a KP or LP basis (depending on the metric).  To turn on KP/LP granularity, use `--granularity=1`.    
+
+When collecting only on a PE basis (i.e., `--granularity=0`), this is the format of the data:
+
+```
+PE_ID, GVT, all_reduce_count, events_processed, events_aborted, events_rolled_back, event_ties, 
+total_rollbacks, secondary_rollbacks, fossil_collects, network_sends,
+network_recvs, remote_events, efficiency
+```
+
+When collecting on a KP/LP basis, this is the format:
+
+```
+PE_ID, GVT, all_reduce_count, events_aborted, event_ties, fossil_collect, efficiency, 
+total_rollbacks_KP0, secondary_rollbacks_KP0, ..., total_rollbacks_KPi, secondary_rollbacks_KPi, 
+events_processed_LP0, events_rolled_back_LP0, network_sends_LP0, network_recvs_LP0, remote_events_LP0, 
+..., events_processed_LPj, events_rolled_back_LPj, network_sends_LPj,network_recvs_LPj,
+remote_events_LPj
+```
+
+where i and j are the indices of the last KP and LP on each PE, respectively (i.e., one less than the number of KPs and LPs per PE).  
+
+### Real Time Sampling
+This collects data at real time intervals specified by the user.  
+It is turned on using 
+`--real-time-samp=n`, where n is the number of milliseconds per interval.  
+This collects all of the same data as the GVT-based instrumentation, plus two additional metrics: the difference between GVT and virtual time for each KP, and cycle counters for the PEs. 
+The difference between GVT and KP virtual time is always recorded per KP, regardless of how `--granularity` is set.
+The granularity can be switched as described in the section on the GVT-based instrumentation.
+
+When collecting on a PE basis, this is the data format:
+
+```
+PE_ID, current_real_time, current_GVT, time_ahead_GVT_KP0, ..., time_ahead_GVT_KPi,
+network_read_CC, gvt_CC, fossil_collect_CC, event_abort_CC, event_processing_CC,
+priority_queue_CC, rollbacks_CC, cancelq_CC, avl_CC, buddy_CC, lz4_CC,
+events_aborted, pq_size, remote_events, network_sends, network_recvs,
+event_ties, fossil_collects, num_GVT, events_processed, events_rolled_back,
+total_rollbacks, secondary_rollbacks
+```
+
+For the KP/LP granularity:
+```
+PE_ID, current_real_time, current_GVT, time_ahead_GVT_KP0, ..., time_ahead_GVT_KPi,
+network_read_CC, gvt_CC, fossil_collect_CC, event_abort_CC, event_processing_CC,
+priority_queue_CC, rollbacks_CC, cancelq_CC, avl_CC, buddy_CC, lz4_CC,
+events_aborted, pq_size, event_ties, fossil_collects, num_GVT,
+total_rollbacks_KP0, secondary_rollbacks_KP0, ..., total_rollbacks_KPi, secondary_rollbacks_KPi,
+events_processed_LP0, events_rolled_back_LP0, network_sends_LP0, network_recvs_LP0, remote_events_LP0,
+..., events_processed_LPj, events_rolled_back_LPj, network_sends_LPj,network_recvs_LPj,
+remote_events_LPj
+```
+
+
+### Event Tracing
+There are two ways to collect an event trace.  The first collects data only about events that cause rollbacks:
+when an event that should have been processed in the past is received, data about this event is collected (described below).  The second collects data about all events.  
+For this collection, ROSS can directly access the source and destination LP IDs for each event, as well as the 
+received virtual timestamp and real time duration of the event.  It also records the real time at which the event is computed.
+Because event types are defined by the model developer, ROSS cannot access them directly.
+Instead, the user can provide a callback function that ROSS uses to collect the event type; ROSS then handles storing
+the data in the buffer and the I/O.  A benefit of this approach is that the user can also choose to collect other data about the event,
+and ROSS will handle that as well.  Keep in mind that this data collection produces a lot of output, since it happens per event.  
+
+This is implemented similarly to how the LP type function pointers are implemented.  
+(As a side note, it is a different
+struct, so that non-instrumented ROSS can still be run without requiring any additions to models).  
+Once you have functions implemented and their pointers set to be registered with ROSS, you can turn on the event tracing with either `--event-trace=1` for full event trace or `--event-trace=2` for tracing only events that cause rollbacks. 
+
+##### Function pointers:
+```C
+typedef void (*rbev_trace_f) (void *msg, tw_lp *lp, char *buffer, int *collect_flag);
+typedef void (*ev_trace_f) (void *msg, tw_lp *lp, char *buffer, int *collect_flag);
+```
+`msg` is the message being passed, `lp` is the LP pointer.  `buffer` is the pointer to where the data needs to be copied for ROSS to manage.
+`collect_flag` is a pointer to a ROSS flag.  By default `*collect_flag == 1`.  This means that the event will be collected.  
+Change the value to 0 for any events you do not want to show in the trace.  This means even the ROSS level data will not be collected for that event (e.g., src_LP, dest_LP, etc).  
+You can use this feature to turn off event tracing for certain events, reducing the amount of data you will be storing.
+
+For instance, in the dragonfly CODES model, we can do:
+```C
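+void dragonfly_event_trace(terminal_message *msg, tw_lp *lp, char *buffer, int *collect_flag)
+{
+    /* collect_flag is left at its default of 1, so every event is traced */
+    int type = (int) msg->type;
+    /* copy the event type into the buffer that ROSS manages */
+    memcpy(buffer, &type, sizeof(type));
+}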
+```
+This is just a simple example; we could of course get more complicated with this and save other data.
+
+##### Event type struct for function pointers
+```C
+typedef struct st_trace_type st_trace_type;
+struct st_trace_type {
+    rbev_trace_f rbev_trace; /* function pointer to collect data about events causing rollbacks */
+    size_t rbev_sz;          /* size of data collected from model about events causing rollbacks */
+    ev_trace_f ev_trace;     /* function pointer to collect data about all events for given LP */
+    size_t ev_sz;            /* size of data collected from model for each event */
+};
+```
+This is the struct where we provide the function pointers to ROSS.  
+`rbev_trace` is the pointer for collecting only events that cause rollbacks, while `ev_trace` is for
+the full event collection.  
+`rbev_sz` and `ev_sz` are the sizes of the data that need to be pushed to the buffer for the rollback-causing events and for all events, respectively.
+
+Going back to the CODES dragonfly example, we could implement the event tracing functions with:
+
+```C
+st_trace_type trace_types[] = {
+    {(rbev_trace_f) NULL,
+    0,
+    (ev_trace_f) dragonfly_event_trace,
+    sizeof(int)},
+    {0}
+};
+```
+This example assumes that we want to use the same `dragonfly_event_trace()` for both the terminal and router LPs in the dragonfly model, so we use `trace_types[0]` when registering the trace types for both LP types.  
+
+To register the function pointers with ROSS, call `st_evtrace_settype(tw_lpid i, st_trace_type *trace_types)` right after you call the `tw_lp_settype()` function when initializing your LPs.  You can also turn event tracing on for only certain LPs: call `st_evtrace_settype()` with the appropriate arguments only for the LPs you want traced.
+
+If your model is a part of CODES, the CODES mapping will handle this for you.  Right now the model net base LPs, the dragonfly router and terminal LPs, and dragonfly synthetic workload LPs have this implemented, but it's in my [forked CODES repo](https://xgitlab.cels.anl.gov/caitlinross/codes) (event-collection branch) at the moment.  
+It should be merged into the main CODES repo soon after this is merged into ROSS master.
+See that repo for more details on making event tracing changes on CODES models.  
+
+
+
+### Output formatting
+All collected data is pushed to a buffer as it is collected, in order to reduce 
+the number of I/O accesses.  Currently the buffer is per PE.  If multiple instrumentation types
+are used, each has its own buffer.
+The default buffer size is 8 MB, but this can be changed using `--buffer-size=n`, where n is the size 
+of the buffer in bytes. 
+After GVT, the buffer's free space is checked.  By default, if there is less than 15% free space, 
+then it is dumped to file in a binary format.  This can be changed using `--buffer-free=n`, where n 
+is the percentage of free space it checks for before writing out.  
+
+The output is in binary, and right now ROSS writes one file per simulation per instrumentation type 
+(e.g., if you run both GVT and real time instrumentation, you get one file with the GVT data and a second file
+for the real time sampling). ROSS creates a directory called `stats-output` in which these files are
+placed.
+
+There is a basic reader for all of the instrumentation modes being developed in the 
+[CODES-vis repo](https://xgitlab.cels.anl.gov/codes/codes-vis) (ross-reader branch).  
+In the future we may switch to an already established file format (perhaps something like XDMF), 
+or just further develop what is being used currently.  For the time being, ROSS will output a README file in 
+the stats-output directory with the given filename prefix.  The file contains some general information about 
+values of input parameters, but also has data that the reader in the CODES-vis repo can use to correctly read the
+instrumentation data.
+
+### Other notes
+There are a couple of other options that show up in the ROSS stats options.
+One is `--disable-output=1`, which is meant for examining how much the data collection perturbs 
+the simulation.  
+With it, data (for GVT and real time collections) is still pushed to the buffer, but the buffer 
+is never dumped, so it just keeps overwriting data.  
+This lets us measure the cost of the data collection computation itself, without the I/O; otherwise,
+you'll want to leave this turned off.  At some point in the future, this will probably be converted into an option for
+streaming data to an in situ analysis system.  
+
+
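The event tracing section of README-vis.md above describes a callback that copies model data into a ROSS-managed buffer and may clear `collect_flag` to skip an event. The following is a minimal sketch of that pattern, not code from this commit. The typedefs and `st_trace_type` are copied from the README; everything prefixed `example_` or `EXAMPLE_`, and the reduced `tw_lp` and `tw_lpid` stand-ins, are assumptions made so the sketch compiles without ROSS.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: reduced stand-ins so the sketch compiles on its own.
 * A real model would include ross.h instead of defining these. */
typedef unsigned long long tw_lpid;
typedef struct tw_lp { tw_lpid gid; } tw_lp;

/* Callback typedefs and struct as shown in README-vis.md. */
typedef void (*rbev_trace_f) (void *msg, tw_lp *lp, char *buffer, int *collect_flag);
typedef void (*ev_trace_f) (void *msg, tw_lp *lp, char *buffer, int *collect_flag);

typedef struct st_trace_type st_trace_type;
struct st_trace_type {
    rbev_trace_f rbev_trace; /* events causing rollbacks */
    size_t rbev_sz;
    ev_trace_f ev_trace;     /* all events */
    size_t ev_sz;
};

/* Hypothetical model message with two event types. */
enum { EXAMPLE_EV_ARRIVAL = 1, EXAMPLE_EV_HEARTBEAT = 2 };
typedef struct example_message { int type; } example_message;

/* Full-trace callback: drop heartbeat events, record the type of all others. */
void example_event_trace(void *msg, tw_lp *lp, char *buffer, int *collect_flag)
{
    example_message *m = (example_message *) msg;
    (void) lp; /* unused in this sketch */

    if (m->type == EXAMPLE_EV_HEARTBEAT) {
        *collect_flag = 0; /* this event will not appear in the trace */
        return;
    }
    memcpy(buffer, &m->type, sizeof(m->type));
}

/* One entry, no rollback-only callback, terminated by a zeroed entry. */
st_trace_type example_trace_types[] = {
    { NULL, 0, example_event_trace, sizeof(int) },
    { 0 }
};

/* Hypothetical driver that mimics the documented calling convention:
 * the flag starts at 1 and the model may clear it. Returns the final flag. */
int example_invoke_trace(const st_trace_type *t, void *msg, tw_lp *lp, char *buffer)
{
    int collect_flag = 1;
    t->ev_trace(msg, lp, buffer, &collect_flag);
    return collect_flag;
}
```

In a real model, the table entry would be registered for LP `i` with `st_evtrace_settype(i, &example_trace_types[0])` right after `tw_lp_settype()`, as the README describes. `example_invoke_trace()` only exists here to exercise the callback outside of ROSS.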
diff --git a/core/CMakeLists.txt b/core/CMakeLists.txt
@@ -56,7 +56,14 @@ tw-sched.c
 tw-setup.c
 tw-signal.c
 tw-stats.c
-tw-util.c)
+tw-util.c
+
+instrumentation/st-stats-buffer.h
+instrumentation/st-stats-buffer.c
+instrumentation/st-data-collection.h
+instrumentation/st-data-collection.c
+instrumentation/st-event-collection.h
+instrumentation/st-event-collection.c)
 
 
 # ROSS VERSION INFORMATION

diff --git a/core/gvt/mpi_allreduce.c b/core/gvt/mpi_allreduce.c
@@ -109,7 +109,6 @@ tw_gvt_step2(tw_pe *me)
 
 	if(me->gvt_status != TW_GVT_COMPUTE)
 		return;
-
 	while(1)
 	  {
 	    tw_net_read(me);
@@ -173,7 +172,8 @@ tw_gvt_step2(tw_pe *me)
 				me->id, me->GVT, gvt);
 	}
 
-	if (gvt / g_tw_ts_end > percent_complete && (g_tw_mynode == g_tw_masternode)) {
+	if (gvt / g_tw_ts_end > percent_complete && (g_tw_mynode == g_tw_masternode))
+	{
 		gvt_print(gvt);
 	}
 
@@ -198,6 +198,23 @@ tw_gvt_step2(tw_pe *me)
 	    me->stats.s_fossil_collect += tw_clock_read() - start;
 	  }
 
+    if (g_st_stats_enabled && gvt <= g_tw_ts_end)
+    {
+        tw_clock start_cycle_time = tw_clock_read();
+        tw_statistics s;
+        bzero(&s, sizeof(s));
+        tw_get_stats(me, &s);
+		st_gvt_log(me, gvt, &s, all_reduce_cnt);
+        g_st_stat_comp_ctr += tw_clock_read() - start_cycle_time;
+    }
+
+    if (!g_st_disable_out && g_st_stats_enabled)
+        st_buffer_write(g_st_buffer_gvt, 0, GVT_COL);
+    if (!g_st_disable_out && g_st_real_time_samp)
+        st_buffer_write(g_st_buffer_rt, 0, RT_COL);
+    if (!g_st_disable_out && (g_st_ev_trace))
+        st_buffer_write(g_st_buffer_evrb, 0, EV_TRACE);
+
 	g_tw_gvt_done++;
 
 	// reset for the next gvt round -- for use in realtime GVT mode only!!

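The hunk above calls `st_buffer_write()` after every GVT, while README-vis.md says the buffer is dumped only when less than a given percentage of it is free (15% by default). The check itself is not part of this excerpt, so the following is only a sketch of how such a threshold test could look. `example_buffer`, `example_should_flush()`, and `EXAMPLE_DEFAULT_SIZE` are hypothetical names, and reading "8 MB" as 8 * 1024 * 1024 bytes is an assumption.

```c
#include <stddef.h>

/* Assumption: "8 MB" taken as 8 * 1024 * 1024 bytes. */
#define EXAMPLE_DEFAULT_SIZE (8u * 1024u * 1024u)

/* Hypothetical, reduced view of a per-PE instrumentation buffer. */
typedef struct example_buffer {
    size_t size;  /* total capacity in bytes */
    size_t count; /* bytes currently stored */
} example_buffer;

/* Returns 1 when strictly less than free_percent of the buffer is free,
 * i.e. when the buffer should be written out, and 0 otherwise.
 * The comparison free/size < free_percent/100 is done in integers;
 * this assumes size * 100 does not overflow size_t. */
int example_should_flush(const example_buffer *b, int free_percent)
{
    size_t free_bytes = b->size - b->count;
    return free_bytes * 100 < b->size * (size_t) free_percent;
}
```

With the default settings, a buffer holding 7,200,000 of its 8,388,608 bytes has about 14.2% free and would be flushed, while one holding 7,000,000 bytes (about 16.6% free) would not.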
diff --git a/core/gvt/mpi_allreduce.h b/core/gvt/mpi_allreduce.h
@@ -22,18 +22,18 @@ gvt_print(tw_stime gvt)
 		return;
 	}
 
-	printf("GVT #%d: simulation %d%% complete, max event queue size %u (",
+    printf("GVT #%d: simulation %d%% complete, max event queue size %u (",
                g_tw_gvt_done,
                (int) ROSS_MIN(100, floor(100 * (gvt/g_tw_ts_end))),
                tw_pq_max_size(g_tw_pe[0]->pq));
 
-	if (gvt == DBL_MAX)
-		printf("GVT = %s", "MAX");
-	else
-		printf("GVT = %.4f", gvt);
+    if (gvt == DBL_MAX)
+        printf("GVT = %s", "MAX");
+    else
+        printf("GVT = %.4f", gvt);
 
-	printf(").\n");
-    
+    printf(").\n");
+		
 #ifdef AVL_TREE
     printf("AVL tree size: %d\n", g_tw_pe[0]->avl_tree_size);
 #endif