# Interactive quick start guide for the scarab stats library
This Jupyter notebook shows how to use the scarab stats library to obtain and graph statistics from scarab runs.

## Required data
To use this library, you need to have run an experiment using a JSON experiment file (found in the dcworkloads repository). You will need the path to the experiment file used, the path to the files generated by the simulations (likely allbench_home/simpoint_flow/simulations), and the path to the traces used (likely /soe/hlitz/lab/traces on bohr3).
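
Before loading anything, it can help to confirm that all three paths exist. The following is a minimal sketch using only the Python standard library; `missing_paths` is a hypothetical helper, not part of the scarab stats library, and the paths shown are placeholders.

```python
from pathlib import Path

def missing_paths(paths):
    """Return the labels of entries in `paths` (label -> path) that do not exist on disk."""
    return [label for label, p in paths.items() if not Path(p).exists()]

# Placeholder locations; substitute the paths from your own setup
required = {
    "experiment file": "/path/to/docker_home/exp.json",
    "simulations": "/path/to/docker_home/simpoint_flow/simulations/",
    "traces": "/soe/hlitz/lab/traces/",
}

# Report every required path that cannot be found
for label in missing_paths(required):
    print(f"Missing {label}: {required[label]}")
```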

## Loading an experiment
To load an experiment from a JSON file, import the stats library and create a stat aggregator object to load the data. Then use its `load_experiment_json` method to load the file.

In [None]:
from scarab_stats import stat_aggregator

# Create a stat aggregator object
aggregator = stat_aggregator()

# Paths to required data to load an experiment
TRACE_PATH = "/soe/hlitz/lab/traces/"
EXPERIMENT_PATH = "/path/to/docker_home/exp.json"
SIMULATION_PATH = "/path/to/docker_home/simpoint_flow/simulations/"

# Load the experiment file at EXPERIMENT_PATH, using the simulations created at SIMULATION_PATH by running scarab
# and the traces located at TRACE_PATH.
# Uncomment the line below once the paths above point to real data; later cells need `experiment` to be defined.
# experiment = aggregator.load_experiment_json(EXPERIMENT_PATH, SIMULATION_PATH, TRACE_PATH)

Alternatively, you can load an experiment directly from a CSV file. CSV representations can be saved using `experiment.to_csv(path)`.

In [None]:
# Path for experiment to be saved to
SAVED_PATH = "saved_experiment.csv"

# Save loaded experiment
experiment.to_csv(SAVED_PATH)

# Load experiment from saved CSV
experiment = aggregator.load_experiment_csv(SAVED_PATH)

# Or equivalently:
# from scarab_stats import Experiment
# experiment = Experiment(SAVED_PATH)

## Visualizing data
First, you may want to check which statistics, configurations, and workloads are available for graphing. The stats library provides the following functions to do this:

In [None]:
# Get the experiments stored in the experiment file (usually just one)
print("Experiments:", ", ".join(experiment.get_experiments()))

# Get the different configurations stored in the experiment file
print("Configurations:", ", ".join(experiment.get_configurations()))

# Get the workloads that were run
print("Workloads:", ", ".join(experiment.get_workloads()))

# Get the first 5 statistics stored in the file
print("Statistics:", ", ".join(experiment.get_stats()[:5]))
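
Experiments often contain many statistics, so it can be convenient to search the names rather than scan them by eye. A minimal sketch, assuming `experiment.get_stats()` returns a list of strings as the cell above suggests; `find_stats` is a hypothetical helper, and the sample names stand in for real output.

```python
import re

def find_stats(stat_names, pattern):
    """Return the names in `stat_names` that match the regular expression `pattern`."""
    regex = re.compile(pattern)
    return [name for name in stat_names if regex.search(name)]

# Stand-in for experiment.get_stats(); replace with the real call in the notebook
sample_stats = [
    "Periodic IPC",
    "MAP_STAGE_RECEIVED_OPS_0_count",
    "MAP_STAGE_RECEIVED_OPS_1_count",
]

# Select every numbered MAP_STAGE_RECEIVED_OPS counter
print(find_stats(sample_stats, r"^MAP_STAGE_RECEIVED_OPS_\d+_count$"))
```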

You can now use the experiment to create graphs to visualize the data. There are several graphing functions to plot statistics aggregated at different levels, and to plot stats by their relative proportions.

In [None]:
### This code cell creates bare-minimum graphs of a statistic: raw values first, then speedups

# Reload the stat library if it is modified
from importlib import reload
import scarab_stats
reload(scarab_stats)
from scarab_stats import stat_aggregator

# Create a stat aggregator object
aggregator = stat_aggregator()

# Path for experiment to be saved to
SAVED_PATH = "saved_experiment.csv"

# Save loaded experiment
experiment.to_csv(SAVED_PATH)

# Load experiment from saved CSV
experiment = aggregator.load_experiment_csv(SAVED_PATH)

# Get the name of the experiment as a string, and select workloads and config(s) to plot
experiment_name = experiment.get_experiments()[0]
workloads_to_plot = ["clang", "gcc", "mongodb", "mysql", "postgres", "verilator", "xgboost"]
configs_to_plot = ["baseline", "perfect_fdip_lookahead_10000", "perfect_fdip_lookahead_50000", "perfect_fdip_lookahead_100000"]

# A statistic where you want to plot raw numbers
stat_to_plot = ['Periodic IPC']

# Call the plot function
aggregator.plot_workloads(experiment, stat_to_plot, workloads_to_plot, configs_to_plot, y_label="IPC", x_label="Workloads", average=True)

# A statistic where you want to plot speedups
stat_to_plot = ['Periodic IPC']

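# Call the plot function for speedups. speedup_baseline names the configuration that speedups are measured against;
# replace "bp1" with the name of your own baseline configuration if it differs
aggregator.plot_workloads_speedup(experiment, stat_to_plot, workloads_to_plot, configs_to_plot, y_label="IPC Speedup (%)", x_label="Workloads", speedup_baseline="bp1", average=True)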
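
For reference, a speedup in percent is commonly computed relative to a baseline as sketched below. This shows the usual definition with made-up IPC values; `speedup_percent` is a hypothetical helper, and the library's `plot_workloads_speedup` may compute or aggregate speedups differently.

```python
def speedup_percent(value, baseline):
    """Percent improvement of `value` over `baseline` (0 means no change)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (value / baseline - 1.0) * 100.0

# Made-up IPC values for illustration only
baseline_ipc = 2.0
improved_ipc = 2.5
print(f"Speedup: {speedup_percent(improved_ipc, baseline_ipc):.1f}%")
```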


In [None]:
### This creates the same graph as above, but with custom colors

# Create a custom list of colors to use
colors = [(1,0,0), (0,1,0), (0,0,1), (1,1,0)]

# Call the plot function
aggregator.plot_workloads(experiment, stat_to_plot, workloads_to_plot, configs_to_plot, y_label="IPC", x_label="Benchmarks", average=True, colors=colors)
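
Hand-picking colors gets tedious as the number of configurations grows. The sketch below generates evenly spaced hues as RGB tuples in the same 0-to-1 format as the list above, using the standard library's `colorsys` module; `distinct_colors` is a hypothetical helper, not part of the stats library.

```python
import colorsys

def distinct_colors(n, saturation=0.8, value=0.9):
    """Return `n` RGB tuples (components in [0, 1]) with evenly spaced hues."""
    return [colorsys.hsv_to_rgb(i / n, saturation, value) for i in range(n)]

# One color per configuration; 4 matches the length of configs_to_plot above
colors = distinct_colors(4)
print(colors)
```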

In [None]:
### This code cell plots the relative proportions for MAP_STAGE_RECEIVED_OPS_[0-8]_count

# Reload the stat library if it is modified
from importlib import reload
import scarab_stats
reload(scarab_stats)
from scarab_stats import stat_aggregator

# Create a stat aggregator object
aggregator = stat_aggregator()

# Path for experiment to be saved to
SAVED_PATH = "saved_experiment.csv"

# Save loaded experiment
experiment.to_csv(SAVED_PATH)

# Load experiment from saved CSV
experiment = aggregator.load_experiment_csv(SAVED_PATH)

# Get the name of the experiment as a string, and select workloads and config(s) to plot
experiment_name = experiment.get_experiments()[0]
workloads_to_plot = ["clang", "gcc", "mongodb", "mysql", "postgres", "verilator", "xgboost"]
configs_to_plot = ["baseline", "perfect_fdip_lookahead_10000", "perfect_fdip_lookahead_50000", "perfect_fdip_lookahead_100000"]

# Get desired statistics
stats_to_plot = [f"MAP_STAGE_RECEIVED_OPS_{x}_count" for x in range(0,9)]
print("Plotting:", stats_to_plot)

# Print an error for any stat that is not in the experiment
available_stats = experiment.get_stats()
for stat in stats_to_plot:
    if stat not in available_stats:
        print(f"ERROR: Stat not in experiment: {stat}")

# Get the experiment's name as a string, along with all available workloads and configs to plot
experiment_name = experiment.get_experiments()[0]
workloads_to_plot = experiment.get_workloads()
configs_to_plot = experiment.get_configurations()

# Plot the stats from the workloads/configs/experiment in a bar graph, as their proportion of the sum of all stats_to_plot
# (like a pie chart, but in bar graph form)
aggregator.plot_stacked(experiment, stats_to_plot, workloads_to_plot, configs_to_plot, y_label="MAP_STAGE_RECEIVED_OPS count")
aggregator.plot_stacked_fraction(experiment, stats_to_plot, workloads_to_plot, configs_to_plot)
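
The fractions in the second plot can be thought of as each stat divided by the sum of all selected stats. The sketch below works through that arithmetic with made-up counts; `stat_fractions` is a hypothetical helper, and the library's internal computation may differ in detail.

```python
def stat_fractions(counts):
    """Map each stat name to its share of the total across all stats in `counts`."""
    total = sum(counts.values())
    if total == 0:
        # Avoid dividing by zero when every count is zero
        return {name: 0.0 for name in counts}
    return {name: value / total for name, value in counts.items()}

# Made-up counts for a single workload/configuration pair
counts = {
    "MAP_STAGE_RECEIVED_OPS_0_count": 50,
    "MAP_STAGE_RECEIVED_OPS_1_count": 30,
    "MAP_STAGE_RECEIVED_OPS_2_count": 20,
}
for name, fraction in stat_fractions(counts).items():
    print(f"{name}: {fraction:.2f}")
```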

## Derived stats
You can create your own stats using parentheses, addition, subtraction, multiplication, and division of columns and scalars. To use variables, use a format string to inject their values as scalars. Equations follow this format: `new_stat_name = stat_name_1 + stat_name_2`
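
As an illustration of injecting a variable with a format string, the sketch below only builds the equation string. The derived stat name and the scalar are hypothetical; the result would then be passed to `experiment.derive_stat`.

```python
# Hypothetical scalar to inject; any Python number works
ops_per_thousand = 1000

# Hypothetical derived stat: ops received in slots 0 and 1, divided by the scalar
equation = (
    "MAP_STAGE_RECEIVED_OPS_LOW_K = "
    f"(MAP_STAGE_RECEIVED_OPS_0_count + MAP_STAGE_RECEIVED_OPS_1_count) / {ops_per_thousand}"
)
print(equation)
```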

In [None]:
### This code cell creates a derived stat that is the sum of all the stats plotted in the previous plot

# Create equation that sums all of the stats
equation = f"MAP_STAGE_RECEIVED_OPS_ALL_COUNT = {' + '.join(stats_to_plot)}"
print("Equation:", equation)

# Print whether the stat exists before deriving it
stat_exists = "MAP_STAGE_RECEIVED_OPS_ALL_COUNT" in experiment.get_stats()
print(f"Stat does {'' if stat_exists else 'not '}exist")

# Add stat as new entry
experiment.derive_stat(equation)

# Print whether it exists now
stat_exists = "MAP_STAGE_RECEIVED_OPS_ALL_COUNT" in experiment.get_stats()
print(f"Stat does {'' if stat_exists else 'not '}exist")

# Plot previous stats and the new derived stat. To plot only the derived stat, remove stats_to_plot
new_stats_to_plot = stats_to_plot + ["MAP_STAGE_RECEIVED_OPS_ALL_COUNT"]
aggregator.plot_workloads(experiment, new_stats_to_plot, workloads_to_plot, configs_to_plot, label_method=0)