# Using JupyterNB to explore collected data

This Jupyter Notebook works through a series of examples to show how you can explore, display and analyze data collected by `run_experiment.py` (and stored in `output_database.sqlite`).

It is **strongly** recommended that you view this notebook in VSCode.

## Generating some data to analyze

To *produce* data that you can analyze here, you should:

1. execute `run_experiment.py` from the command line (with suitable arguments).
2. copy the resulting `data` folder into this directory.

Note that you can learn about the arguments to `run_experiment.py` by running:

`./run_experiment.py -h`

To generate a small amount of data to analyze (should take one or two minutes), try running the following *in directory* `../example`:

`./run_testing.sh`

Note that this runs a set of experiments defined by `_user_experiment.py` in `testing` mode. In `testing` mode, only a subset of the experimental trials are run, and each trial runs for a greatly reduced time. Thus, the results are essentially meaningless, but they can be generated quickly. (This `testing` mode is primarily included to allow you to quickly run through many different experiments and check that they don't crash immediately or produce incomplete output.)

(You can `cat run_testing.sh` to see how it does this. It's quite simple...)

To run a more comprehensive test (which should take an hour or less):

`./run_production.sh`

## Alternatively, you can run the following code (3-5 min):

In [None]:
def define_experiment(exp_dict, args):
    set_dir_compile  (exp_dict, os.getcwd() + '/../../microbench')
    set_dir_tools    (exp_dict, os.getcwd() + '/../../tools')
    set_dir_run      (exp_dict, os.getcwd() + '/bin')

    add_run_param    (exp_dict, 'INS_DEL_FRAC'    , ['0.0 0.0', '0.5 0.5', '20.0 10.0'])
    add_run_param    (exp_dict, 'MAXKEY'          , [2000000, 20000000])
    add_run_param    (exp_dict, 'DS_TYPENAME'     , ['brown_ext_ist_lf', 'brown_ext_abtree_lf', 'bronson_pext_bst_occ'])
    add_run_param    (exp_dict, 'thread_pinning'  , ['-pin ' + shell_to_str('cd ' + get_dir_tools(exp_dict) + ' ; ./get_pinning_cluster.sh', exit_on_error=True)])
    add_run_param    (exp_dict, '__trials'        , [1])
    add_run_param    (exp_dict, 'TOTAL_THREADS'   , [1] + shell_to_listi('cd ' + get_dir_tools(exp_dict) + ' ; ./get_thread_counts_numa_nodes.sh', exit_on_error=True))

    set_cmd_compile  (exp_dict, 'make -j8 bin_dir={__dir_run}')
    set_cmd_run      (exp_dict, 'LD_PRELOAD=../../../lib/libjemalloc.so timeout 300 numactl --interleave=all time ./{DS_TYPENAME}.debra -nwork {TOTAL_THREADS} -insdel {INS_DEL_FRAC} -k {MAXKEY} -t 500 {thread_pinning} -rq 0 -rqsize 1 -nrq 0')

    add_data_field   (exp_dict, 'total_throughput'  , coltype='INTEGER', validator=is_positive)
    add_data_field   (exp_dict, 'PAPI_L3_TCM'       , coltype='REAL')
    add_data_field   (exp_dict, 'PAPI_L2_TCM'       , coltype='REAL')
    add_data_field   (exp_dict, 'PAPI_TOT_CYC'      , coltype='REAL')
    add_data_field   (exp_dict, 'PAPI_TOT_INS'      , coltype='REAL')
    add_data_field   (exp_dict, 'maxresident_mb'    , coltype='REAL', extractor=get_maxres)
    add_data_field   (exp_dict, 'tree_stats_height' , coltype='INTEGER')
    add_data_field   (exp_dict, 'validate_result'   , validator=is_equal('success'))
    add_data_field   (exp_dict, 'MILLIS_TO_RUN'     , validator=is_positive)
    add_data_field   (exp_dict, 'RECLAIM')
    add_data_field   (exp_dict, 'POOL')

def get_maxres(exp_dict, file_name, field_name):
    ## manually parse the maximum resident size from the output of `time` and add it to the data file
    maxres_kb_str = shell_to_str('grep "maxres" {} | cut -d" " -f6 | cut -d"m" -f1'.format(file_name))
    return float(maxres_kb_str) / 1000

import sys ; sys.path.append('../../tools/data_framework') ; from run_experiment import *
enable_tee_stdout()
run_in_jupyter(define_experiment, cmdline_args='--testing -crd')
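
The `get_maxres` extractor in the cell above shells out to `grep` and `cut` to pull the peak resident memory out of the output of `time`. As a rough sketch of the same parsing in pure Python, assuming GNU `time`'s default output format (the helper name `parse_maxres_mb` and the sample text are made up for illustration, and are not part of `run_experiment.py`):

```python
import re

def parse_maxres_mb(time_output):
    """Return peak resident memory in MB parsed from GNU time's default output.

    Hypothetical helper for illustration; not part of run_experiment.py.
    """
    # GNU time's default format contains a token like "20480maxresident)k",
    # where the number is the peak resident set size in kilobytes.
    match = re.search(r'(\d+)maxresident', time_output)
    if match is None:
        raise ValueError('no maxresident field found')
    # Same kB-to-MB conversion as get_maxres (divide by 1000).
    return float(match.group(1)) / 1000

# Made-up sample of what `time` might append to a data file.
sample = ('0.10user 0.02system 0:00.12elapsed 99%CPU '
          '(0avgtext+0avgdata 20480maxresident)k\n'
          '0inputs+0outputs (0major+512minor)pagefaults 0swaps\n')
maxres_mb = parse_maxres_mb(sample)
print(maxres_mb)
```

Unlike the shell pipeline, this version raises an error when the field is missing, rather than failing later on an empty string.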

## Loading the data into this notebook

Once you have run one of the commands above to generate data, you can load it into this notebook to work with it. We start by appending the path `../../tools/data_framework` to `sys.path` (Python's module search path), so that Python can find `run_experiment.py`. This file contains many functions for analyzing data and producing plots.

We then run `init_for_jupyter('_user_experiment.py')` which actually **runs** `run_experiment.py` with specific arguments that cause it to simply **load** the existing sqlite database (without running any experiments, or modifying any stored data).

If you are running VSCode and want to see how `init_for_jupyter` works, you can hold CTRL (or CMD on OSX) and click the function name in the code below.

While we're at it, we can set the style for plots in this notebook to use a dark theme.

*Note that you must run the following initialization code cell before any cells below it will work...*

In [None]:
import sys ; sys.path.append('../../tools/data_framework') ; from run_experiment import *
init_for_jupyter('../example/_user_experiment.py')
plt.style.use('dark_background')
print("Initialized.")
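
If the path manipulation in the cell above looks mysterious: appending a directory to `sys.path` makes any module in that directory importable. Here is a small self-contained sketch of the mechanism (the module name `demo_framework` and the temporary directory are invented for this example):

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway directory holding a tiny module
# (hypothetical name, for illustration only).
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, 'demo_framework.py'), 'w') as f:
    f.write('def greet():\n    return "loaded"\n')

# Appending the directory to sys.path lets `import` find the module,
# in the same way the cell above exposes ../../tools/data_framework.
sys.path.append(module_dir)
importlib.invalidate_caches()
import demo_framework

result = demo_framework.greet()
print(result)
```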

## Listing available data columns

Next let's list the available data columns. We use a function `select_to_dataframe()` provided in `run_experiment.py` to query the sqlite database to fetch the names of columns, along with the IPython `display()` function to pretty print the output in this notebook.

In [None]:
## the following query will select ZERO rows because of the WHERE clause.
## however, it will still fetch all columns.
df = select_to_dataframe('select * from data where 1==0') 
display(df)

In [None]:
## we can also retrieve the columns as a list and print as plain text
df = select_to_dataframe('select * from data where 1==0') 
col_list = df.columns.values
print(col_list)

In [None]:
## run_experiment.py also provides a convenience function for this:
print(get_headers())
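
To see why a query that matches zero rows still reveals every column, here is a minimal sketch using Python's built-in `sqlite3` module on an in-memory table. The table is a tiny stand-in for `output_database.sqlite`, and its three columns are chosen only for illustration:

```python
import sqlite3

# In-memory stand-in for output_database.sqlite (columns are illustrative).
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE data (DS_TYPENAME TEXT, TOTAL_THREADS INTEGER, total_throughput INTEGER)')
conn.execute("INSERT INTO data VALUES ('brown_ext_abtree_lf', 4, 1000)")

# The WHERE clause matches no rows, but the cursor still describes every column.
cursor = conn.execute('SELECT * FROM data WHERE 1 == 0')
rows = cursor.fetchall()
columns = [desc[0] for desc in cursor.description]
print(rows)
print(columns)
conn.close()
```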

## Querying the DATA table

Let's try to query some columns from the DATA table. We will again use `select_to_dataframe`. We will also demonstrate the use of the WHERE clause to filter data. (If you need to brush up on your SQL, you might check out the sqlite documentation.)

In [None]:
df = select_to_dataframe('select DS_TYPENAME, TOTAL_THREADS, total_throughput from DATA where MAXKEY == 2000000 and INS_DEL_FRAC == "0.5 0.5"')
display(df)

In [None]:
select_to_dataframe('select * from data')
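
A side note on quoting: the queries above write the string value as `"0.5 0.5"`. Most SQLite builds accept a double-quoted string like this for compatibility reasons, but standard SQL uses single quotes for strings. If you ever query the database directly, placeholders avoid the quoting question altogether. A minimal sketch with the built-in `sqlite3` module, using an invented in-memory table and made-up throughput values:

```python
import sqlite3

# In-memory stand-in for the DATA table (rows are made up).
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE data (DS_TYPENAME TEXT, MAXKEY INTEGER, '
             'INS_DEL_FRAC TEXT, TOTAL_THREADS INTEGER, total_throughput INTEGER)')
conn.executemany('INSERT INTO data VALUES (?, ?, ?, ?, ?)', [
    ('brown_ext_abtree_lf', 2000000, '0.5 0.5', 1, 100),
    ('brown_ext_abtree_lf', 2000000, '0.0 0.0', 1, 300),
    ('brown_ext_abtree_lf', 20000000, '0.5 0.5', 1, 80),
    ('bronson_pext_bst_occ', 2000000, '0.5 0.5', 1, 90),
])

# Placeholders (?) keep the values out of the SQL text, so quoting is not an issue.
query = ('SELECT DS_TYPENAME, TOTAL_THREADS, total_throughput FROM data '
         'WHERE MAXKEY = ? AND INS_DEL_FRAC = ? ORDER BY DS_TYPENAME')
matching_rows = conn.execute(query, (2000000, '0.5 0.5')).fetchall()
for row in matching_rows:
    print(row)
conn.close()
```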

## Showing bar charts *inline* in a data frame

`Pandas` `DataFrame` offers a very cool `style.bar` function, which lets you turn your columns into a sort of simple bar chart. For example, we can create bar charts out of `total_throughput` and `PAPI_TOT_CYC`.

This feature can make it *much* easier to visually parse numeric table data!

In [None]:
df = select_to_dataframe('select * from data where MAXKEY==2000000 and INS_DEL_FRAC=="0.5 0.5" order by DS_TYPENAME, TOTAL_THREADS')
df.style.bar(subset=['total_throughput', 'PAPI_TOT_CYC'], color='#5fba7d')

## Plotting the data

We can also easily create a *plot* from this data, by calling the `plot_to_axes` function provided by the data framework in `_jupyter_libs`.

In [None]:
plot_to_axes(series='DS_TYPENAME', x='TOTAL_THREADS', y='total_throughput', where='WHERE MAXKEY == 2000000 AND INS_DEL_FRAC == "0.5 0.5"')

If you prefer to provide a python dictionary containing field / value pairs instead of an explicit where clause, then use `plot_to_axes_dict`, and we will construct the WHERE clause for you.

In [None]:
plot_to_axes_dict(series='DS_TYPENAME', x='TOTAL_THREADS', y='total_throughput', filter_values=dict({'MAXKEY': 2000000, 'INS_DEL_FRAC': '0.5 0.5'}))
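
To give a feel for what "constructing the WHERE clause for you" might involve, here is a rough sketch of turning a dictionary of field / value pairs into a clause. The helper `build_where_clause` is hypothetical, and `plot_to_axes_dict` may well build its clause differently:

```python
def build_where_clause(filter_values):
    """Build a WHERE clause from a dict of field/value pairs.

    Hypothetical helper for illustration; plot_to_axes_dict may construct
    its clause differently.
    """
    conditions = []
    for field, value in filter_values.items():
        if isinstance(value, str):
            # Quote strings, doubling any embedded single quotes.
            literal = "'" + value.replace("'", "''") + "'"
        else:
            literal = str(value)
        conditions.append('{} == {}'.format(field, literal))
    if not conditions:
        return ''
    return 'WHERE ' + ' AND '.join(conditions)

where = build_where_clause({'MAXKEY': 2000000, 'INS_DEL_FRAC': '0.5 0.5'})
print(where)
```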

If you'd like to **also** display the **data** being plotted, you can specify argument `display_data=True`.

In [None]:
plot_to_axes(series='DS_TYPENAME', x='TOTAL_THREADS', y='total_throughput', where='WHERE MAXKEY == 2000000 AND INS_DEL_FRAC == "0.5 0.5"', display_data=True)

## A quick sanity check

The ability to quickly display data for a plot is especially useful when you want to take a quick glance at one slice of your data as a sanity check to ensure that, for example, you aren't taking an inappropriate average over rows which differ in an important column you've forgotten to query.

You might spot such a scenario by observing that the number of rows being aggregated looks wrong, or by noticing that very different looking values are being averaged.

For example, consider the following bar plot, where we're inappropriately averaging `total_throughput` values over rows that differ in the `DS_TYPENAME` column, but we've failed to notice this, because we simply haven't included `DS_TYPENAME` in our query.

It's not too hard to see that we're averaging several `total_throughput`s that look quite different from one another, as we group by `TOTAL_THREADS`.

In [None]:
plot_to_axes(x='TOTAL_THREADS', y='total_throughput', where='WHERE MAXKEY == 2000000 AND INS_DEL_FRAC == "0.5 0.5"', display_data=True)
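
Another way to catch this kind of mistake is to count the rows behind each plotted point. The sketch below uses the built-in `sqlite3` module on an invented in-memory table (the throughput numbers are made up) to show what such a count looks like when three data structures are silently averaged together:

```python
import sqlite3

# In-memory stand-in for the DATA table (values are made up).
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE data (DS_TYPENAME TEXT, TOTAL_THREADS INTEGER, total_throughput INTEGER)')
conn.executemany('INSERT INTO data VALUES (?, ?, ?)', [
    ('brown_ext_ist_lf', 1, 100),
    ('brown_ext_abtree_lf', 1, 300),
    ('bronson_pext_bst_occ', 1, 200),
    ('brown_ext_ist_lf', 2, 180),
    ('brown_ext_abtree_lf', 2, 560),
    ('bronson_pext_bst_occ', 2, 400),
])

# With one trial per configuration we expect ONE row per plotted point.
# A count of 3, with a wide min/max spread, signals that rows differing
# in some forgotten column are being averaged together.
summary = conn.execute(
    'SELECT TOTAL_THREADS, COUNT(*), MIN(total_throughput), '
    'MAX(total_throughput), AVG(total_throughput) '
    'FROM data GROUP BY TOTAL_THREADS ORDER BY TOTAL_THREADS').fetchall()
for row in summary:
    print(row)
conn.close()
```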

To help us determine *which field we're missing* from our query, we can change our argument to `display_data='full'`.

In [None]:
plot_to_axes(x='TOTAL_THREADS', y='total_throughput', where='WHERE MAXKEY == 2000000 AND INS_DEL_FRAC == "0.5 0.5"', display_data='full')

In fact, to make it even easier to tell which field we've neglected, you can change the argument to `display_data='diff'`. This only displays columns that:
- are not prefixed with '__'
  (as those are added by `run_experiment.py` rather than the user)
- do not contain identical data in all rows

In [None]:
plot_to_axes(x='TOTAL_THREADS', y='total_throughput', where='WHERE MAXKEY == 2000000 AND INS_DEL_FRAC == "0.5 0.5"', display_data='diff')
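
The two `'diff'` rules are easy to state in code. Here is a minimal sketch on plain Python dictionaries; the function `varying_user_columns` and the sample rows are invented for illustration, and the framework's own implementation may differ:

```python
def varying_user_columns(rows):
    """Return names of columns that are not '__'-prefixed and whose values
    differ across rows. Hypothetical sketch of the 'diff' rules."""
    if not rows:
        return []
    kept = []
    for name in rows[0]:
        if name.startswith('__'):
            continue  # skip framework-added columns
        distinct = {row[name] for row in rows}
        if len(distinct) > 1:
            kept.append(name)  # keep only columns that actually vary
    return kept

# Two made-up rows that would be averaged into the same plotted point.
rows = [
    {'__step': 1, 'MAXKEY': 2000000, 'DS_TYPENAME': 'brown_ext_ist_lf',
     'TOTAL_THREADS': 1, 'total_throughput': 100},
    {'__step': 2, 'MAXKEY': 2000000, 'DS_TYPENAME': 'brown_ext_abtree_lf',
     'TOTAL_THREADS': 1, 'total_throughput': 300},
]
diff_columns = varying_user_columns(rows)
print(diff_columns)
```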

## We can filter the returned data more aggressively
Just specify `display_data='diff2'`. In addition to the filtering in `'diff'` mode, this displays only the *subset* of columns in `'diff'` that were specified by the user via `add_run_param()` in `_user_experiment.py`, and also only displays fields that we haven't included in our plot. That is, we only display columns that:

- are not already in our plot
- were explicitly added as a varying experimental parameter using `add_run_param()`
- are not prefixed with '__'
- do not contain identical data in all rows

As we can see in the example below, `display_data='diff2'` immediately highlights the column we forgot.

In [None]:
plot_to_axes(x='TOTAL_THREADS', y='total_throughput', where='WHERE MAXKEY == 2000000 AND INS_DEL_FRAC == "0.5 0.5"', display_data='diff2')

If we add `DS_TYPENAME` to the plot as our `series` field, and rerun with `display_data='diff2'`, we can see that the filtered `diff2` data frame is completely empty!

This is the expected result when using `diff2` and aggregating row values appropriately.

In [None]:
plot_to_axes(series='DS_TYPENAME', x='TOTAL_THREADS', y='total_throughput', where='WHERE MAXKEY == 2000000 AND INS_DEL_FRAC == "0.5 0.5"', display_data='diff2')

## Manually placing multiple plots in a grid

The function `plot_to_axes` takes a matplotlib/pandas/seaborn `axes` object as an optional argument called `ax`. This allows us to easily render plots in columns/rows of a grid.

In [None]:
## create a 2x2 grid of plots
fig, axes = plt.subplots(nrows=2, ncols=2, squeeze=False, figsize=(12, 6))
## fill in the bottom left grid cell
plot_to_axes(ax=axes[1,0], series='DS_TYPENAME', x='TOTAL_THREADS', y='total_throughput', where='WHERE MAXKEY == 2000000 AND INS_DEL_FRAC == "0.5 0.5"')
## fill in the bottom right grid cell
plot_to_axes(ax=axes[1,1], series='DS_TYPENAME', x='TOTAL_THREADS', y='total_throughput', where='WHERE MAXKEY == 2000000 AND INS_DEL_FRAC == "0.0 0.0"')

## Automatically producing plots to fill a grid

Tools like the `Seaborn` library's `factorplot` (renamed `catplot` in Seaborn 0.9) make it easy to plot **5-dimensional data** as *rows* and *columns* of plots, each containing *series*, *x* and *y* dimensions.

In [None]:
df = select_to_dataframe('SELECT MAXKEY, INS_DEL_FRAC, DS_TYPENAME, TOTAL_THREADS, total_throughput FROM DATA')
g = sns.factorplot(kind='bar', data=df, col='INS_DEL_FRAC', row='MAXKEY', hue='DS_TYPENAME', x='TOTAL_THREADS', y='total_throughput', margin_titles=True)

## Using FacetGrid for more control

You can produce the same sort of plot using Seaborn's `FacetGrid` function. `FacetGrid` creates a grid of plots backed by a dataframe.

You specify `col` and `row` fields, and it will filter the data appropriately for each cell in the grid, and make it available to you.

However, unlike `factorplot`, `FacetGrid` does not plot the data for you. Instead, you have to use the `map` function to specify how the data in each grid cell should be plotted. This gives you more control over the end result. (For example, although we don't demonstrate this here, you could use different plot styles in different grid cells.)

In [None]:
df = select_to_dataframe('SELECT MAXKEY, INS_DEL_FRAC, DS_TYPENAME, TOTAL_THREADS, total_throughput FROM DATA')
g = sns.FacetGrid(data=df, col='INS_DEL_FRAC', row='MAXKEY', margin_titles=True)
g.map(sns.barplot, 'TOTAL_THREADS', 'total_throughput', 'DS_TYPENAME')
g.add_legend()

## Customizing a FacetGrid: changing the legend and resizing the figure

In [None]:
df = select_to_dataframe('SELECT MAXKEY, INS_DEL_FRAC, DS_TYPENAME, TOTAL_THREADS, total_throughput FROM DATA')
g = sns.FacetGrid(data=df, col='INS_DEL_FRAC', row='MAXKEY', margin_titles=True)
g.map(sns.barplot, 'TOTAL_THREADS', 'total_throughput', 'DS_TYPENAME')

## five-column legend at the bottom
g.add_legend(loc='lower center', ncol=5)
## shift the bottom of the plots upwards (by 14% of the figure)
## to avoid the axis title text overlapping the legend
g.fig.subplots_adjust(bottom=0.14, top=1, left=0, right=1)

Of course, the bottom legend leaves us with less vertical space to see our plots. We can set our figure size manually to correct the size/shape as we desire.

In [None]:
df = select_to_dataframe('SELECT MAXKEY, INS_DEL_FRAC, DS_TYPENAME, TOTAL_THREADS, total_throughput FROM DATA')
g = sns.FacetGrid(data=df, col='INS_DEL_FRAC', row='MAXKEY', margin_titles=True)
g.map(sns.barplot, 'TOTAL_THREADS', 'total_throughput', 'DS_TYPENAME')

g.add_legend(loc='lower center', ncol=5)
g.fig.subplots_adjust(bottom=0.14, top=1, left=0, right=1)

## set the figure to a desired size
g.fig.set_size_inches(8, 6)

## Heavily customizing the style of a FacetGrid

Here, we define a style for each series, by constructing explicit maps from series values to marker types, colors, line dashing styles, and line sizes. This way, we can ensure a **consistent** mapping from individual series values to style (line/marker/color/size). 

Once we have dictionaries mapping series values to each of these style parameters, we can feed them carefully to `sns.lineplot()` by running the `FacetGrid` `map()` function on our own intermediate function called `plot_facet()`.

In [None]:
df = select_to_dataframe('SELECT MAXKEY, INS_DEL_FRAC, DS_TYPENAME, TOTAL_THREADS, total_throughput FROM DATA')

## define lists of values that we will consume for each series,
## wrapping around to the beginning of each list if we run out of elements.
markers = [ '^', 'o', 's', '+', 'x', 'v', '*', 'X', '|', '.', 'd' ]
palette = [ 'red', 'blue', 'yellow', 'green' ]
dashes = [ '' ]
sizes = [ 1 ]

## construct mappings from each series value to round-robin choices from the above
plot_style_kwargs = dict(markers=dict(), palette=dict(), sizes=dict(), dashes=dict())
distinct_series = select_distinct_field('DS_TYPENAME')
for i, series in enumerate(distinct_series):
    plot_style_kwargs['markers'][series] = markers[i % len(markers)]
    plot_style_kwargs['palette'][series] = palette[i % len(palette)]
    plot_style_kwargs['sizes'][series] = sizes[i % len(sizes)]
    plot_style_kwargs['dashes'][series] = dashes[i % len(dashes)]

## feed those mappings into seaborn lineplot via our own plot_facet function
def plot_facet(x, y, series, **kwargs):
    sns.lineplot(x=x, y=y, hue=series, style=series, **plot_style_kwargs, **kwargs)

## pair our own plot_facet function with map()
g = sns.FacetGrid(data=df, col='INS_DEL_FRAC', row='MAXKEY', margin_titles=True)
g.map(plot_facet, 'TOTAL_THREADS', 'total_throughput', 'DS_TYPENAME')

## customize legend and overall figure size
g.add_legend(loc='lower center', ncol=5)
g.fig.subplots_adjust(bottom=0.14, top=1, left=0, right=1)
g.fig.set_size_inches(8, 6)
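
The round-robin assignment in the cell above (indexing with `i % len(...)`) can also be written with `itertools.cycle`. This is only a sketch: the helper `assign_round_robin` and the series names are placeholders, not part of the data framework:

```python
from itertools import cycle

def assign_round_robin(series_values, choices):
    """Map each series value to a choice, wrapping around when choices run out."""
    # zip stops at the end of series_values; cycle repeats choices forever.
    return dict(zip(series_values, cycle(choices)))

series_values = ['a', 'b', 'c', 'd', 'e']   # placeholder series names
palette = ['red', 'blue', 'yellow', 'green']
colors = assign_round_robin(series_values, palette)
print(colors)
```

With five series and four colors, the fifth series wraps around to the first color, which is the same behavior as the modulo indexing above.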

## Making this simpler...

Of course a lot of the complexity in the above code cell can be wrapped up in helper functions. For example, we encapsulate the setup for plot markers, dashes, palette and sizes in the `_jupyter_libs` function `get_seaborn_series_styles()`.

Note that argument `series` can either be a list of series values, or the name of the series column in the sqlite database's `DATA` table. If it is a column name, then `get_seaborn_series_styles()` will query all distinct values from that column to derive the series values. Otherwise, if it is a list, then styles will be assigned to its elements **in the order they appear in the list**.

In [None]:
df = select_to_dataframe('SELECT MAXKEY, INS_DEL_FRAC, DS_TYPENAME, TOTAL_THREADS, total_throughput FROM DATA')

style_kwargs = get_seaborn_series_styles('DS_TYPENAME', markers=['^', 'o', 's', '+', 'x', 'v', '*', 'X', '|', '.', 'd'], palette=['red', 'blue', 'yellow', 'green'], dashes=[''], sizes=[1])
def plot_facet(x, y, series, **kwargs): ## feed our series styles into sns.lineplot
    sns.lineplot(x=x, y=y, hue=series, style=series, **style_kwargs, **kwargs)

g = sns.FacetGrid(data=df, col='INS_DEL_FRAC', row='MAXKEY', margin_titles=True)
g.map(plot_facet, 'TOTAL_THREADS', 'total_throughput', 'DS_TYPENAME')

g.add_legend(loc='lower center', ncol=5)
g.fig.subplots_adjust(bottom=0.14, top=1, left=0, right=1)
g.fig.set_size_inches(8, 6)

Also note: if all you want is a **consistent** mapping from series values to a style (rather than a **particular** choice of markers, etc.), then you should know that `get_seaborn_series_styles()` has fairly sane *default* values for markers, palette, dashes and sizes.



In [None]:
df = select_to_dataframe('SELECT MAXKEY, INS_DEL_FRAC, DS_TYPENAME, TOTAL_THREADS, total_throughput FROM DATA')

style_kwargs = get_seaborn_series_styles('DS_TYPENAME')
def plot_facet(x, y, series, **kwargs): ## feed our series styles into sns.lineplot
    sns.lineplot(x=x, y=y, hue=series, style=series, **style_kwargs, **kwargs)

g = sns.FacetGrid(data=df, col='INS_DEL_FRAC', row='MAXKEY', margin_titles=True)
g.map(plot_facet, 'TOTAL_THREADS', 'total_throughput', 'DS_TYPENAME')
g.add_legend(loc='lower center', ncol=5)
g.fig.subplots_adjust(bottom=0.14, top=1, left=0, right=1)
g.fig.set_size_inches(8, 6)

We can further simplify by encapsulating the `add_legend`, `subplots_adjust` and `set_size_inches` calls in an `add_legend_and_reshape()` function. This new function takes optional arguments `loc`, `ncol`, `bottom` and `size` (with sane **default values** taken from the code cell above).

In [None]:
df = select_to_dataframe('SELECT MAXKEY, INS_DEL_FRAC, DS_TYPENAME, TOTAL_THREADS, total_throughput FROM DATA')
g = sns.FacetGrid(data=df, col='INS_DEL_FRAC', row='MAXKEY', margin_titles=True)
style_kwargs = get_seaborn_series_styles('DS_TYPENAME')
def plot_facet(x, y, series, **kwargs):
    sns.lineplot(x=x, y=y, hue=series, style=series, **style_kwargs, **kwargs)
g.map(plot_facet, 'TOTAL_THREADS', 'total_throughput', 'DS_TYPENAME')
add_legend_and_reshape(g)

## Doing it all with a single function: `plot_rc`

In fact, we think you might want to generate this kind of figure (as a fast way of exploring five dimensions of data) often enough that this entire code cell should be a function:

`plot_rc(row, col, series, x, y)`

with OPTIONAL arguments:
- `where` (default value `''`)
- `series_styles` (default value `get_seaborn_series_styles(series)`)
- `plot_func` (default value `sns.lineplot`)
- `facetgrid_kwargs` (default value `dict()`)
- `data` (default value None -- see several code blocks below for usage)

The `where` argument (if used) should be a complete sqlite WHERE clause (including the word 'WHERE'). The purpose is to allow you to filter your dataset to ensure that you're aggregating (averaging) values in a sensible way, over a set of rows that it actually makes sense to aggregate.

It should be possible to use most `Seaborn` plot functions as the `plot_func` argument.

The `facetgrid_kwargs` argument allows you to provide any keyword arguments to the `sns.FacetGrid()` call. (I.e., we will call `sns.FacetGrid(..., **facetgrid_kwargs)`.)

In [None]:
plot_rc(row='MAXKEY', col='INS_DEL_FRAC', series='DS_TYPENAME', x='TOTAL_THREADS', y='total_throughput')
plot_rc(row='MAXKEY', col='INS_DEL_FRAC', series='DS_TYPENAME', x='TOTAL_THREADS', y='PAPI_TOT_CYC')
plot_rc(row='MAXKEY', col='INS_DEL_FRAC', series='DS_TYPENAME', x='TOTAL_THREADS', y='PAPI_TOT_INS')
plot_rc(row='MAXKEY', col='INS_DEL_FRAC', series='DS_TYPENAME', x='TOTAL_THREADS', y='PAPI_L3_TCM')

We can use the `where` argument to filter before plotting...

In [None]:
plot_rc(row='MAXKEY', col='INS_DEL_FRAC', series='DS_TYPENAME', x='TOTAL_THREADS', y='total_throughput', where='WHERE DS_TYPENAME != "brown_ext_ist_lf" AND MAXKEY <= 10000000')

We can also produce other types of plots using the same function...

In [None]:
plot_rc(row='MAXKEY', col='INS_DEL_FRAC', series='DS_TYPENAME', x='TOTAL_THREADS', y='maxresident_mb', where='WHERE DS_TYPENAME != "brown_ext_ist_lf"', plot_func=sns.barplot)

## Passing a DataFrame to `plot_rc`

You can provide a `pandas` `DataFrame` via an optional `data` argument. If you do this, `plot_rc` will use this data instead of querying the `sqlite` database to obtain all of its data.

(It may still query the database to get all unique values in the series column, but it will only do this to ensure that series are styled consistently, regardless of *which* series are present in the queried data.)

**Why would you want to do this?** Well, as an example, we might want to plot a computed column, such as the average of the `maxresident_mb` column MINUS the minimum value of the same column. 

To do this, you can compute a suitable column using `sqlite` (likely via a subquery), give it a name, and reference it in the arguments to `plot_rc`.

In [None]:
where = 'DS_TYPENAME != "brown_ext_ist_lf"'
df = select_to_dataframe('''
    select
        MAXKEY
      , INS_DEL_FRAC
      , DS_TYPENAME
      , TOTAL_THREADS
      , (maxresident_mb - (select min(maxresident_mb) from DATA where {}))
        as maxresident_mb_over_minimum
    from DATA
    where {}
'''.format(where, where))

plot_rc(data=df, row='MAXKEY', col='INS_DEL_FRAC', series='DS_TYPENAME', x='TOTAL_THREADS', y='maxresident_mb_over_minimum', plot_func=sns.barplot)
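
To see what the subquery in the cell above computes on concrete numbers, here is a small sketch using the built-in `sqlite3` module. The in-memory table and its memory values are made up; only the shape of the query mirrors the cell above:

```python
import sqlite3

# In-memory stand-in for the DATA table (values are made up).
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE data (DS_TYPENAME TEXT, TOTAL_THREADS INTEGER, maxresident_mb REAL)')
conn.executemany('INSERT INTO data VALUES (?, ?, ?)', [
    ('brown_ext_abtree_lf', 1, 120.0),
    ('brown_ext_abtree_lf', 2, 150.0),
    ('bronson_pext_bst_occ', 1, 100.0),
    ('brown_ext_ist_lf', 1, 50.0),   # excluded by the filter below
])

# The same filter is applied to the outer query and to the subquery, so the
# minimum is taken over exactly the rows that will be plotted.
where = "DS_TYPENAME != 'brown_ext_ist_lf'"
query = '''
    SELECT DS_TYPENAME, TOTAL_THREADS,
           maxresident_mb - (SELECT MIN(maxresident_mb) FROM data WHERE {0})
           AS maxresident_mb_over_minimum
    FROM data
    WHERE {0}
    ORDER BY DS_TYPENAME, TOTAL_THREADS
'''.format(where)
over_minimum = conn.execute(query).fetchall()
for row in over_minimum:
    print(row)
conn.close()
```

Note that the excluded row (50.0) does not drag the minimum down, because the subquery uses the same filter as the outer query.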


Note: if, to compute your custom column(s), you need to perform subqueries on data that is **already filtered** to the particular **grid facet** being rendered, then you will have to query for any relevant data *inside* a **custom function** called by `map()`.

## Concatenating two columns to form a single series field

(This is done via a more complex SQL query string internally.)

In [None]:
import sys ; sys.path.append('../../tools/data_framework') ; from run_experiment import *
init_for_jupyter('_user_experiment.py')

get_dataframe_and_call(
      plot_df_line
    , series=['DS_TYPENAME', 'RECLAIM']
    , x='TOTAL_THREADS'
    , y='total_throughput'
    , where='where MAXKEY == 2000000 and INS_DEL_FRAC == "0.5 0.5"'
    , all_possible_series=select_distinct_field(['DS_TYPENAME', 'RECLAIM'])
    , legend_call_kwargs=dict(ncol=2)
)
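
For intuition about the kind of SQL this involves: in SQLite, `||` concatenates strings, so two columns can be fused into one series label inside the query. The sketch below is a guess at the general idea, not the framework's actual query; the in-memory table, the `RECLAIM` values and the space separator are all assumptions made for illustration:

```python
import sqlite3

# In-memory stand-in for the DATA table (RECLAIM values are made up).
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE data (DS_TYPENAME TEXT, RECLAIM TEXT, '
             'TOTAL_THREADS INTEGER, total_throughput INTEGER)')
conn.executemany('INSERT INTO data VALUES (?, ?, ?, ?)', [
    ('brown_ext_abtree_lf', 'debra', 1, 100),
    ('brown_ext_abtree_lf', 'none', 1, 120),
    ('bronson_pext_bst_occ', 'debra', 1, 90),
])

# || is SQL string concatenation; each distinct pair becomes one series label.
cursor = conn.execute(
    "SELECT DISTINCT DS_TYPENAME || ' ' || RECLAIM AS series "
    "FROM data ORDER BY series")
series_values = [row[0] for row in cursor]
print(series_values)
conn.close()
```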

## Line plot with error regions

Note: the error regions in this plot actually reveal a **failure** to filter appropriately: each data point aggregates six rows, covering two different `MAXKEY` values. To fix this, try setting `row='MAXKEY'` instead of `row='RECLAIM'`.

The error regions show min/max value ranges for each data point, because we have set `ci=100`. By default, that parameter is `ci=95`. It can also be set to `'sd'` for standard deviation, or disabled.

In [None]:
import sys; sys.path.append('../../tools/data_framework'); from run_experiment import *
init_for_jupyter('_user_experiment.py') ; plt.style.use('dark_background') ## initialize

df = select_to_dataframe('select * from data')
plot_rc(data=df, row='RECLAIM', col='INS_DEL_FRAC', series='DS_TYPENAME', x='TOTAL_THREADS', y='total_throughput', plot_func=sns.lineplot, facetgrid_kwargs=dict(sharey=False), plot_kwargs=dict(ci=100))

#df = select_to_dataframe('select * from data where DS_TYPENAME="brown_ext_ist_lf" and INS_DEL_FRAC="0.5 0.5" and TOTAL_THREADS=190 order by __step asc')
#display(df)