# BBH End-to-End Inference Analysis Plots

The key questions I'm trying to address with the plots below are:
- How *quickly* can I process a given volume of data at a given scale of compute?
- How much does processing that data at that scale *cost* me?
- How can I *optimize* my deployment to minimize the cost incurred at a given scale?

Before describing how the data below was collected, it will help to establish some more precise terminology.
- A **server** will refer to one instance of the Triton Inference Server deployed on a single GPU-equipped cluster **node**.
- A **client** will refer to one instance of a Triton Inference Server client deployed on a single CPU-based GCP **VM** instance. Each client has a unique server to which it sends requests, and each server is associated with the same number of client VMs.
- The **scale** of a deployment will refer to the total number of clients and servers it uses, along with the resources allocated to each.
- A **frame** will refer to a [gravitational wave frame file](https://www.gw-openscience.org/read_in_c/), a format for encoding multiple **channels** of concurrent timeseries of a fixed length and their associated metadata. The data below was collected by processing 24 frames of length 4096 seconds.
- Samples in each frame are mapped onto a fixed time grid at read time according to the desired **sampling rate** (fixed at train time).
- In order to parallelize processing across multiple frames, each client is assigned a contiguous (in time) subset of the frames. Each client's subset of data will be referred to as a **stream**. I distinguish the two terms because, in principle, a single client could process multiple streams.
- DeepClean and BBH both expect as input fixed-length **kernels** of data, in this case one second long. Kernels are sampled from streams at a fixed interval termed the **kernel stride**, which is selected at inference time and parameterizes the inference deployment.
- In order to minimize network I/O, servers maintain the most recently inferred-upon kernel for each stream as a **state**, which the client updates by sending its stream to the server in kernel-stride-length packets. Each of these **requests** triggers a new **inference** for that stream, which returns a single BBH event probability estimate as its **response**.
- Each server hosts multiple models, which it uses to produce its response.
    - `snapshotter` is the model responsible for updating the state and producing a new kernel. It does this for all the witness channels as well as for the strain channels at once.
    - `deepclean_h` and `deepclean_l` take a kernel of multiple witness channel measurements and produce an estimate of the noise at either the Hanford or Livingston detectors, respectively.
    - `postproc` subtracts the noise estimates from the strain channels.
    - `bbh` uses the cleaned strain channels to produce a single scalar estimate of the likelihood that the signature of a binary black hole merger occurred in the kernel.
    - The end-to-end execution of these models is handled by an **ensemble model** called `gwe2e`. The execution of this model is not associated with any particular GPU and so is excluded from the data collected below.
- Each model is hosted on every GPU, and a single GPU can even execute the same model concurrently, up to a user-defined level of concurrency called the number of model **instances** per GPU.
- The number of `snapshotter` instances is equal to the number of streams (since each stream needs its own state to maintain and update).
- A model's measured **throughput** is the number of inferences it performed in a given interval on a single GPU, divided by the length of that interval. Its **aggregate throughput** is the sum of that model's throughput across _all_ GPUs. The unit for both of these quantities is **inferences per second**.
- A model's **queue time** as measured on a given interval is the _average_ amount of time a request had to wait before inference was executed on it during that interval.
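As a rough illustration of the state mechanism described above, here is a minimal NumPy sketch of a rolling-buffer update. It assumes the state is a `(num_channels, kernel_size)` array and that each request carries `kernel_stride * sampling_rate` new samples per channel. `update_state`, the 1 kHz sampling rate, and the channel count are hypothetical, and this is not the actual `snapshotter` implementation.

```python
import numpy as np


def update_state(state: np.ndarray, packet: np.ndarray) -> np.ndarray:
    """Hypothetical rolling-buffer update for a single stream's state.

    state:  (num_channels, kernel_size) array holding the most recent kernel
    packet: (num_channels, stride_size) array of newly arrived samples
    Returns the next kernel: the oldest stride_size samples are dropped
    and the packet is appended at the end.
    """
    num_channels, kernel_size = state.shape
    if packet.shape[0] != num_channels:
        raise ValueError("packet and state must have the same number of channels")
    stride_size = packet.shape[1]
    if stride_size > kernel_size:
        raise ValueError("a packet cannot be longer than the kernel")
    return np.concatenate([state[:, stride_size:], packet], axis=1)


# Illustrative numbers: 1 s kernels and a 50 ms stride, as in the text,
# at a hypothetical 1 kHz sampling rate with 3 channels.
sampling_rate = 1000
kernel_size = round(1.0 * sampling_rate)    # 1000 samples per kernel
stride_size = round(0.05 * sampling_rate)   # 50 new samples per request

state = np.zeros((3, kernel_size))
packet = np.ones((3, stride_size))
state = update_state(state, packet)
```

With these numbers each request ships 50 samples per channel instead of a full 1000-sample kernel, a 20x reduction in the data sent per inference.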

The data below was collected by requesting inference metrics from each server in a round-robin fashion during the inference run. The collected data is available in CSV format with the following columns:
- `ip` - the IP address of the server. Used to index different servers.
- `step` - indexes the metrics request that a chunk of measurements comes from. Also used to roughly align in time the metrics requests made serially to different servers.
- `gpu_id` - the ID of the GPU executing the model's inference.
- `model` - the model that the metrics correspond to.
- `process` - subdivides each model's inference into multiple steps. Of particular importance are the `request` process, which measures the end-to-end execution of an inference, and the `queue` process, which measures how long requests spent waiting before execution.
- `time (us)` - the _total_ time spent executing the corresponding process over the previous interval, in microseconds.
- `interval` - the time since the last metrics request was made, in seconds.
- `count` - the number of times the corresponding process was executed during the previous interval.
- `utilization` - the current GPU utilization fraction at the time of the metrics request.
- `average_time` - the average time spent executing the process during the previous interval, in microseconds (`time (us)` / `count`).
- `throughput` - the rate at which inferences were executed during the interval, in inferences per second (`count` / `interval`).
- `time_since_start` - the time since the start of the run, in seconds, at which the metrics were sampled.
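For reference, the derived columns can be recomputed from the raw ones. The pandas sketch below uses two `bbh`/`compute_infer` rows copied from the `In [3]` output further down; since those printed values are rounded, the recomputed numbers agree only to a few decimal places. Treating `time_since_start` as a running sum of `interval` is an assumption that happens to match the rows shown, not something the metrics documentation states.

```python
import pandas as pd

# Two bbh / compute_infer rows copied from the In [3] output (values are rounded there).
df = pd.DataFrame({
    "time (us)": [2.450350e05, 3.692800e05],
    "interval": [0.400098, 0.407985],
    "count": [38, 329],
})

# Derived columns, following the definitions above.
df["average_time"] = df["time (us)"] / df["count"]   # microseconds per execution
df["throughput"] = df["count"] / df["interval"]      # inferences per second

# Assumption: time_since_start is a running sum of interval for each server.
df["time_since_start"] = df["interval"].cumsum()     # seconds
```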

Each CSV is associated with a `RunConfig` object that encodes the unique parameters of the associated inference run, as well as with a short (up to 8-character) hexadecimal string, `RunConfig.id`, that identifies the config and names the directory where the config and its metrics are stored. The configs for which data is available are stored in `plot_utils.configs`, and the metrics collected for a given config can be loaded via `plot_utils.load_stats_for_config`.

In [1]:
import plot_utils

print(plot_utils.configs)

[RunConfig(num_nodes=2, gpus_per_node=4, clients_per_node=4, instance_config=InstanceConfig(deepclean_h=6, deepclean_l=6, postproc=1, bbh=1), vcpus_per_gpu=16, kernel_stride=0.05, generation_rate=750), RunConfig(num_nodes=2, gpus_per_node=4, clients_per_node=5, instance_config=InstanceConfig(deepclean_h=6, deepclean_l=6, postproc=1, bbh=1), vcpus_per_gpu=16, kernel_stride=0.05, generation_rate=750), RunConfig(num_nodes=4, gpus_per_node=4, clients_per_node=4, instance_config=InstanceConfig(deepclean_h=6, deepclean_l=6, postproc=1, bbh=1), vcpus_per_gpu=16, kernel_stride=0.05, generation_rate=750.0), RunConfig(num_nodes=2, gpus_per_node=4, clients_per_node=3, instance_config=InstanceConfig(deepclean_h=4, deepclean_l=4, postproc=1, bbh=1), vcpus_per_gpu=16, kernel_stride=0.1, generation_rate=750.0), RunConfig(num_nodes=2, gpus_per_node=4, clients_per_node=3, instance_config=InstanceConfig(deepclean_h=6, deepclean_l=6, postproc=1, bbh=1), vcpus_per_gpu=16, kernel_stride=0.1, generation_rat

In [2]:
print(plot_utils.configs[0])

RunConfig 26b4090 {
	num_nodes: 2,
	gpus_per_node: 4,
	clients_per_node: 4,
	instance_config: {
		deepclean_h: 6,
		deepclean_l: 6,
		postproc: 1,
		bbh: 1
	},
	vcpus_per_gpu: 16,
	kernel_stride: 0.05,
	generation_rate: 750.0
}


In [3]:
plot_utils.load_stats_for_config(plot_utils.configs[0])

Unnamed: 0,ip,step,gpu_id,model,process,time (us),interval,count,utilization,average_time,throughput,time_since_start
0,34.121.47.234,4,06a79eaf-6ae5-2265-f59d-ba028a6e77a4,bbh,compute_infer,2.450350e+05,0.400098,38,0.26,6.448289e+03,94.976710,0.400098
1,34.121.47.234,5,06a79eaf-6ae5-2265-f59d-ba028a6e77a4,bbh,compute_infer,3.692800e+05,0.407985,329,0.26,1.122432e+03,806.402723,0.808083
2,34.121.47.234,6,06a79eaf-6ae5-2265-f59d-ba028a6e77a4,bbh,compute_infer,2.471900e+04,0.422225,67,0.26,3.689403e+02,158.683167,1.230308
3,34.121.47.234,7,06a79eaf-6ae5-2265-f59d-ba028a6e77a4,bbh,compute_infer,1.965000e+04,0.382613,63,0.26,3.119048e+02,164.657058,1.612921
4,34.121.47.234,8,06a79eaf-6ae5-2265-f59d-ba028a6e77a4,bbh,compute_infer,2.193000e+04,0.467288,81,0.12,2.707407e+02,173.340631,2.080209
...,...,...,...,...,...,...,...,...,...,...,...,...
241875,34.123.210.2,1179,9f083de5-7601-2c3c-aa01-cf9cbc9c8091,snapshotter,request,9.675158e+09,0.380345,218,0.39,4.438146e+07,573.163678,465.313346
241876,34.123.210.2,1180,9f083de5-7601-2c3c-aa01-cf9cbc9c8091,snapshotter,request,9.587877e+09,0.385737,220,0.39,4.358126e+07,570.336517,465.699083
241877,34.123.210.2,1181,9f083de5-7601-2c3c-aa01-cf9cbc9c8091,snapshotter,request,9.061254e+09,0.383075,212,0.39,4.274177e+07,553.416434,466.082158
241878,34.123.210.2,1182,9f083de5-7601-2c3c-aa01-cf9cbc9c8091,snapshotter,request,9.347354e+09,0.394586,223,0.39,4.191639e+07,565.149172,466.476744


You can retrieve all the configs matching some desired set of criteria by using `plot_utils.get_configs`, e.g.

In [4]:
for config in plot_utils.get_configs(num_nodes=4):
    print(config)

RunConfig 3f14092 {
	num_nodes: 4,
	gpus_per_node: 4,
	clients_per_node: 4,
	instance_config: {
		deepclean_h: 6,
		deepclean_l: 6,
		postproc: 1,
		bbh: 1
	},
	vcpus_per_gpu: 16,
	kernel_stride: 0.05,
	generation_rate: 750.0
}
RunConfig c58d405d {
	num_nodes: 4,
	gpus_per_node: 4,
	clients_per_node: 3,
	instance_config: {
		deepclean_h: 6,
		deepclean_l: 6,
		postproc: 1,
		bbh: 1
	},
	vcpus_per_gpu: 16,
	kernel_stride: 0.1,
	generation_rate: 750.0
}
RunConfig c5ec405e {
	num_nodes: 4,
	gpus_per_node: 4,
	clients_per_node: 4,
	instance_config: {
		deepclean_h: 5,
		deepclean_l: 5,
		postproc: 2,
		bbh: 2
	},
	vcpus_per_gpu: 16,
	kernel_stride: 0.1,
	generation_rate: 750.0
}
RunConfig c626405e {
	num_nodes: 4,
	gpus_per_node: 4,
	clients_per_node: 4,
	instance_config: {
		deepclean_h: 6,
		deepclean_l: 6,
		postproc: 1,
		bbh: 1
	},
	vcpus_per_gpu: 16,
	kernel_stride: 0.1,
	generation_rate: 750.0
}
RunConfig c6bf405f {
	num_nodes: 4,
	gpus_per_node: 4,
	clients_per_node: 5,
	instance_c

In [5]:
for config in plot_utils.get_configs(num_nodes=4, clients_per_node=5):
    print(config)

RunConfig c6bf405f {
	num_nodes: 4,
	gpus_per_node: 4,
	clients_per_node: 5,
	instance_config: {
		deepclean_h: 6,
		deepclean_l: 6,
		postproc: 1,
		bbh: 1
	},
	vcpus_per_gpu: 16,
	kernel_stride: 0.1,
	generation_rate: 750.0
}
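The filtering that `plot_utils.get_configs` performs can be sketched as simple attribute matching. `MiniRunConfig` and `filter_configs` below are hypothetical stand-ins (the real `RunConfig` also carries `instance_config`, `vcpus_per_gpu`, and `generation_rate`), and defining `total_clients` as `num_nodes * clients_per_node` is an assumption based on every server node having the same number of clients.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniRunConfig:
    """Hypothetical stand-in holding a subset of RunConfig's fields."""
    num_nodes: int
    gpus_per_node: int
    clients_per_node: int
    kernel_stride: float

    @property
    def total_clients(self) -> int:
        # Assumption: every server node has clients_per_node clients.
        return self.num_nodes * self.clients_per_node


def filter_configs(configs, **criteria):
    """Return the configs whose attributes equal every keyword criterion.

    An unknown field name raises AttributeError via getattr.
    """
    return [
        config
        for config in configs
        if all(getattr(config, name) == value for name, value in criteria.items())
    ]


# Field values taken from a few of the configs printed above.
configs = [
    MiniRunConfig(2, 4, 4, 0.05),
    MiniRunConfig(4, 4, 4, 0.05),
    MiniRunConfig(4, 4, 3, 0.1),
    MiniRunConfig(4, 4, 5, 0.1),
]
matches = filter_configs(configs, num_nodes=4, clients_per_node=5)
```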


Let's start with the first question: how long does it take to process a given amount of data at a given scale? To borrow Erik's terminology, we can look at this in terms of a multipole expansion around the parameter space.

At the simplest level, we can search over all the runs done at a given scale, find the configuration that ran in the shortest time, and then compare these best times across scales. We'll also compare across kernel strides, since each inference covers one kernel stride of data, so the total number of inferences we need to perform scales inversely with the stride. (At the time of writing, I've only generated data for a few kernel strides, so this comparison isn't very interesting yet.)

I'll note up front that much of this code could be made cleaner and more modular, which I hope to do sometime this week. My focus here was just on getting these plots together, so I apologize if the code is difficult to follow.

In [6]:
import numpy as np
from collections import defaultdict
from bokeh.models import ColumnDataSource, FactorRange
from bokeh.io import show

times_to_run = defaultdict(lambda : np.inf)
color_map = {}
color_iter = iter(plot_utils.palette)
for config in plot_utils.configs:
    df = plot_utils.load_stats_for_config(config)
    df = df[(df.model == "bbh") & (df.process == "request")]
    time_to_run = df["time_since_start"].max()

    index = (config.num_nodes, config.total_clients, config.kernel_stride)
    times_to_run[index] = min(times_to_run[index], time_to_run)
    if config.kernel_stride not in color_map:
        color_map[config.kernel_stride] = next(color_iter)

x = sorted(times_to_run.keys())
counts = [times_to_run[key] for key in x]
colors = [color_map[key[2]] for key in x]

def _make_range(nodes, clients, stride):
    return (f"{nodes} nodes", f"{clients} streams", f"{stride} s")
x = [_make_range(*key) for key in x]

p = plot_utils.make_figure(
    title="Time to Run vs. Scale",
    x_axis_label="Configuration",
    y_axis_label="Time to run (s)",
    x_range=FactorRange(*x)
)

source = ColumnDataSource({"x": x, "counts": counts, "colors": colors})
p.vbar(
    x="x",
    top="counts",
    width=0.9,
    fill_color="colors",
    line_color="colors",
    fill_alpha=0.8,
    source=source
)

p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None

show(p)

Of course, raw run time isn't a very useful metric if you need to process a different volume of data. It's more informative to normalize by the volume of data, measured in seconds, so that the y-axis is in units of *seconds per second of data*.
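Concretely, the normalization divides by the 24 frames of 4096 s each described at the top of the notebook. The sketch below applies it to one illustrative run time, the last `time_since_start` value visible in the `In [3]` output, which is not necessarily that run's maximum. The reciprocal, seconds of data processed per wall-clock second, can be an easier number to reason about.

```python
# Every run in this notebook processes 24 frames of 4096 s each.
num_frames = 24
frame_length = 4096                              # seconds of data per frame
total_time_of_data = num_frames * frame_length   # 98304 s of data


def time_density(time_to_run: float) -> float:
    """Wall-clock seconds spent per second of data (s / s')."""
    return time_to_run / total_time_of_data


# Illustration only: the last time_since_start visible in the In [3] output.
example_time_to_run = 466.476744                # seconds
density = time_density(example_time_to_run)    # roughly 4.75e-3 s / s'
data_rate = 1 / density                         # roughly 211 s of data per wall-clock second
```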

In [7]:
from bokeh.models import PrintfTickFormatter

seconds_per_seconds = defaultdict(lambda : np.inf)
total_time_of_data = 4096 * 24
for config in plot_utils.configs:
    df = plot_utils.load_stats_for_config(config)
    df = df[(df.model == "bbh") & (df.process == "request")]
    time_to_run = df["time_since_start"].max()

    index = (config.num_nodes, config.total_clients, config.kernel_stride)
    seconds_per_second = time_to_run / total_time_of_data
    seconds_per_seconds[index] = min(seconds_per_seconds[index], seconds_per_second)

x = sorted(seconds_per_seconds.keys())
counts = [seconds_per_seconds[key] for key in x]
colors = [color_map[key[2]] for key in x]

def _make_range(nodes, clients, stride):
    return (f"{nodes} nodes", f"{clients} streams", f"{stride} s")
x = [_make_range(*key) for key in x]

p = plot_utils.make_figure(
    title="Time to Run Per Second of Data vs. Scale",
    x_axis_label="Configuration",
    y_axis_label="Time to run per second of data (s / s')",
    x_range=FactorRange(*x)
)
p.yaxis[0].formatter = PrintfTickFormatter(format="%4.0e")

source = ColumnDataSource({"x": x, "counts": counts, "colors": colors})
p.vbar(
    x="x",
    top="counts",
    width=0.9,
    fill_color="colors",
    line_color="black",
    fill_alpha=0.8,
    source=source
)

p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None

show(p)

If these point estimates are just the monopole term, we can get to the next order by sampling more densely from the distributions that generated them and plotting the resulting empirical distributions as violin plots. Unfortunately, reaching a sampling density high enough to make such a plot useful would be time- and cost-intensive.

However, we can think of each of these normalized run times (let's call this quantity the _time density_) as the result of summing many draws from a _throughput_ distribution, and that is a distribution we _do_ already have plenty of samples from.
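One way to make that connection concrete, assuming each inference of a model that runs once per request (such as `bbh`) advances one stream by exactly `kernel_stride` seconds of data: a deployment sustaining an aggregate throughput of `R` inferences per second processes `R * kernel_stride` seconds of data per wall-clock second, so each throughput sample maps to a time-density sample of `1 / (R * kernel_stride)`. The function name is hypothetical, and the throughput values in the usage line are illustrative, not measurements.

```python
import numpy as np


def time_density_samples(aggregate_throughput, kernel_stride):
    """Map aggregate-throughput samples (inferences / s) to time densities (s / s').

    Assumes each inference advances one stream by kernel_stride seconds of data,
    so sustaining R inferences per second processes R * kernel_stride seconds of
    data per wall-clock second, i.e. a time density of 1 / (R * kernel_stride).
    """
    r = np.asarray(aggregate_throughput, dtype=float)
    if np.any(r <= 0):
        raise ValueError("drop zero-throughput intervals (e.g. start-up) first")
    return 1.0 / (r * kernel_stride)


# Illustrative throughput values, not measurements from these runs.
densities = time_density_samples([4000.0, 5000.0], kernel_stride=0.05)
```

Intervals at the very start or end of a run, where throughput can be zero, need to be filtered out before the mapping, which is why the sketch raises on non-positive values.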

A quick note: Bokeh won't support grouping factors here, so the plot isn't quite as readable, but I think it still gets the point across.

In [8]:
seconds_per_seconds = defaultdict(lambda : np.inf)
best_configs = {}
for config in plot_utils.configs:
    df = plot_utils.load_stats_for_config(config)
    df = df[(df.model == "bbh") & (df.process == "request")]
    time_to_run = df["time_since_start"].max()

    index = (config.num_nodes, config.total_clients, config.kernel_stride)
    seconds_per_second = time_to_run / total_time_of_data
    if seconds_per_second < seconds_per_seconds[index]:
        seconds_per_seconds[index] = seconds_per_second
        best_configs[index] = config

x = sorted(seconds_per_seconds.keys())
configs = [best_configs[key] for key in x]
colors = [color_map[key[2]] for key in x]

def _make_range(nodes, streams, stride):
    return f"{nodes} nodes\n{streams} streams\n{stride} stride"
x = [_make_range(*key) for key in x]

p = plot_utils.make_figure(
    title="Time to Run Per Second of Data vs. Scale",
    x_axis_label="Configuration",
    y_axis_label="Time to run per second of data (s / s')",
    x_range=FactorRange(*x)
)
p.yaxis[0].formatter = PrintfTickFormatter(format="%4.0e")

source = ColumnDataSource()
for i, (key, color, config) in enumerate(zip(x, colors, configs)):
    x_col, y_col = f"x_{i}", f"y_{i}"
    xs, ys = plot_utils.make_violin_patch(config, y_axis="time", percentile=5)
    source.add(ys, y_col)

    xs = list(zip([key]*len(xs), xs))
    source.add(xs, x_col)

    p.patch(
        x_col,
        y_col,
        color=color,
        fill_alpha=0.6,
        line_color="black",
        source=source
    )

show(p)
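`plot_utils.make_violin_patch` isn't shown in this notebook. As an assumption about what such a helper might do, the sketch below builds a closed violin outline from raw samples using a NumPy-only Gaussian kernel density estimate, trimming samples outside the given percentiles. `violin_outline`, its parameters, and the bandwidth rule are illustrative choices, not the helper's actual behavior.

```python
import numpy as np


def violin_outline(samples, percentile=5, num_points=100, width=0.4):
    """Hypothetical violin outline from a Gaussian kernel density estimate.

    Samples outside the [percentile, 100 - percentile] range are trimmed.
    Returns (offsets, ys) describing a closed polygon: offsets are horizontal
    distances from the category center, proportional to the estimated density
    and scaled so the widest point is `width`.
    """
    samples = np.asarray(samples, dtype=float)
    lo, hi = np.percentile(samples, [percentile, 100 - percentile])
    kept = samples[(samples >= lo) & (samples <= hi)]
    if kept.size < 2 or kept.std() == 0:
        raise ValueError("need at least two distinct samples after trimming")

    # Normal-reference rule-of-thumb bandwidth: 1.06 * sigma * n^(-1/5)
    bandwidth = 1.06 * kept.std(ddof=1) * kept.size ** (-1 / 5)

    # Sum of Gaussian kernels on an even grid (unnormalized; only the shape matters)
    grid = np.linspace(lo, hi, num_points)
    z = (grid[:, None] - kept[None, :]) / bandwidth
    density = np.exp(-0.5 * z**2).sum(axis=1)

    half_width = width * density / density.max()
    # Right edge bottom-to-top, then left edge top-to-bottom, closing the shape
    offsets = np.concatenate([half_width, -half_width[::-1]])
    ys = np.concatenate([grid, grid[::-1]])
    return offsets, ys


rng = np.random.default_rng(0)
offsets, ys = violin_outline(rng.normal(loc=5e-3, scale=5e-4, size=500))
```

The offsets are relative to the category center, so they can be paired with a factor the same way the cell above does, e.g. `list(zip([factor] * len(offsets), offsets))`.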

Getting from the first question (time) to the second (cost) is reasonably straightforward: take the cost per resource per unit time, sum it over the resources in a deployment, and multiply by the time density. We'll normalize by the cost of a single CPU-hour so that the units are more broadly useful.
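A minimal sketch of that mapping, assuming the only billed resources are GPUs and vCPUs (on both server nodes and client VMs), each billed per hour. The GPU-to-CPU price ratio and the client vCPU count in the usage are placeholders, not real GCP prices or the actual client VM shape, the client count assumes `num_nodes * clients_per_node` clients, and `cost_per_second_of_data` is not necessarily what `plot_utils.map_to_cost` does.

```python
def cost_per_second_of_data(time_density, num_gpus, num_vcpus, gpu_hour_in_cpu_hours):
    """Cost per second of data, in units of one CPU-hour's cost per s'.

    time_density:           wall-clock seconds per second of data (s / s')
    num_gpus:               GPUs across all server nodes
    num_vcpus:              vCPUs across all server nodes and client VMs
    gpu_hour_in_cpu_hours:  price of one GPU-hour divided by that of one CPU-hour
    """
    # Cost of running the whole deployment for one hour, in CPU-hours.
    hourly_rate = num_gpus * gpu_hour_in_cpu_hours + num_vcpus
    # time_density is in seconds, so convert to hours before applying the rate.
    return time_density / 3600 * hourly_rate


# Resource counts from config 26b4090: 2 nodes x 4 GPUs, 16 vCPUs per GPU.
num_gpus = 2 * 4
server_vcpus = num_gpus * 16
client_vcpus = (2 * 4) * 8   # hypothetical: 8 clients with 8 vCPUs each
cost = cost_per_second_of_data(
    time_density=5e-3,             # illustrative value
    num_gpus=num_gpus,
    num_vcpus=server_vcpus + client_vcpus,
    gpu_hour_in_cpu_hours=10.0,    # placeholder price ratio, not a real quote
)
```

Because the hourly rate is fixed for a given configuration, cost is just the time density multiplied by a per-configuration constant.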

In [9]:
cost_per_seconds = defaultdict(lambda : np.inf)
for config in plot_utils.configs:
    df = plot_utils.load_stats_for_config(config)
    df = df[(df.model == "bbh") & (df.process == "request")]
    time_to_run = df["time_since_start"].max()

    index = (config.num_nodes, config.total_clients, config.kernel_stride)
    seconds_per_second = time_to_run / total_time_of_data
    cost_per_second = plot_utils.map_to_cost(seconds_per_second, config)

    # track the lowest-cost config for each index as we go,
    # so there's no need for a second pass later
    if cost_per_second < cost_per_seconds[index]:
        cost_per_seconds[index] = cost_per_second
        best_configs[index] = config

x = sorted(cost_per_seconds.keys())
costs = [cost_per_seconds[key] for key in x]
colors = [color_map[key[2]] for key in x]

def _make_range(nodes, clients, stride):
    return (f"{nodes} nodes", f"{clients} streams", f"{stride} s")

cost_unit = "CPU-hour cost / s'"
factors = [_make_range(*key) for key in x]
p = plot_utils.make_figure(
    title="Cost Per Second of Data vs. Scale",
    x_axis_label="Configuration",
    y_axis_label=f"Cost per second of data ({cost_unit})",
    x_range=FactorRange(*factors)
)
p.yaxis[0].formatter = PrintfTickFormatter(format="%4.0e")

source = ColumnDataSource({"x": factors, "costs": costs, "colors": colors})
p.vbar(
    x="x",
    top="costs",
    width=0.9,
    fill_color="colors",
    line_color="black",
    fill_alpha=0.8,
    source=source
)

p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None

show(p)

In [10]:
configs = [best_configs[key] for key in x]

def _make_range(nodes, streams, stride):
    return f"{nodes} nodes\n{streams} streams\n{stride} stride"

p = plot_utils.make_figure(
    title="Cost Per Second of Data vs. Scale",
    x_axis_label="Configuration",
    y_axis_label=f"Cost per second of data ({cost_unit})",
    x_range=FactorRange(*[_make_range(*key) for key in x])
)
p.yaxis[0].formatter = PrintfTickFormatter(format="%4.0e")

source = ColumnDataSource()
for i, (key, color, config) in enumerate(zip(x, colors, configs)):
    x_col, y_col = f"x_{i}", f"y_{i}"
    xs, ys = plot_utils.make_violin_patch(config, y_axis="cost", percentile=5)
    source.add(ys, y_col)

    key = _make_range(*key)
    xs = list(zip([key]*len(xs), xs))
    source.add(xs, x_col)

    p.patch(
        x_col,
        y_col,
        color=color,
        alpha=0.6,
        line_color="black",
        source=source
    )

show(p)

As we might have expected, this plot looks an awful lot like the violin plots for the time density, since each violin is just the corresponding time-density violin scaled by a constant factor. That might suggest simply adding a second y-axis and keeping everything else the same, but each violin is scaled by a *different* factor (the cost rate of its configuration), and those factors will diverge further as the configurations become more varied. What this does allow is plotting split violins, with one half showing the cost distribution and the other half showing the time distribution. To do this, I'll stop using colors to denote kernel strides and instead use them to match each half of the violin to its axis.

In [11]:
from bokeh.models import LinearAxis, Range1d
from bokeh.models import Legend, LegendItem

def _make_range(nodes, streams):
    if nodes is not None:
        return f"{nodes} nodes\n{streams} streams"
    return " " * streams

factors = sorted(list(set([key[:-1] for key in x])))
switch_indices = []
last_nodes = factors[0][0]
for n, f in enumerate(factors):
    if f[0] != last_nodes:
        switch_indices.append(n + len(switch_indices))
    last_nodes = f[0]
for n, idx in enumerate(switch_indices):
    factors.insert(idx, (None, n))

strides = sorted(list(set([key[-1] for key in x])))[::-1]

p = plot_utils.make_figure(
    title="Time to Run and Cost Per Second of Data vs. Scale",
    x_axis_label="Configuration",
    y_axis_label="Time to run per second of data (s / s')",
    x_range=FactorRange(*[_make_range(*key) for key in factors]),
)
p.yaxis[0].formatter = PrintfTickFormatter(format="%4.0e")
p.yaxis[0].axis_label_text_color = plot_utils.palette[0]

max_cost = max([max(v) for k, v in source.data.items() if k.startswith("y")])
p.extra_y_ranges = {"cost": Range1d(start=0, end=0.4*max_cost)}
p.add_layout(
    LinearAxis(
        y_range_name="cost",
        axis_label=f"Cost per second of data ({cost_unit})",
        axis_label_text_color=plot_utils.palette[1],
        formatter=p.yaxis[0].formatter
    ),
    "right"
)

hatches = [" ", "o", "x"]
hatches = {stride: hatch for stride, hatch in zip(strides, hatches)}

hatch_kwargs = {
    "hatch_weight": 0.5,
    "hatch_alpha": 0.9,
    "hatch_scale": 2.7
}
source = ColumnDataSource()
for i, key in enumerate(x):
    config = best_configs[key]
    x_time_col, y_time_col = f"x_time_{i}", f"y_time_{i}"
    x_cost_col, y_cost_col = f"x_cost_{i}", f"y_cost_{i}"

    (x_time, y_time), (x_cost, y_cost) = plot_utils.make_violin_patch(config)
    source.add(y_time, y_time_col)
    source.add(y_cost, y_cost_col)

    key = _make_range(*key[:-1])
    x_time = list(zip([key]*len(x_time), x_time))
    x_cost = list(zip([key]*len(x_cost), x_cost))
    source.add(x_time, x_time_col)
    source.add(x_cost, x_cost_col)

    p.patch(
        x_time_col,
        y_time_col,
        color=plot_utils.palette[0],
        fill_alpha=0.4,
        line_color="black",
        hatch_pattern=hatches[config.kernel_stride],
        source=source,
        **hatch_kwargs
    )
    p.patch(
        x_cost_col,
        y_cost_col,
        color=plot_utils.palette[1],
        fill_alpha=0.4,
        line_color="black",
        hatch_pattern=hatches[config.kernel_stride],
        source=source,
        y_range_name="cost",
        **hatch_kwargs
    )

items = []
for kernel_stride in strides:
    hatch_kwargs["hatch_pattern"] = hatches[kernel_stride]
    r = p.patch(
        [0, 0], [0, 0], fill_alpha=0.0, line_alpha=0.0, **hatch_kwargs
    )
    stride = round(kernel_stride * 1000)  # round() avoids float truncation from int()
    item = LegendItem(label=f"{stride} ms", renderers=[r])
    items.append(item)

p.add_layout(
    Legend(
        items=items,
        title="Stride",
        orientation="horizontal",
        border_line_alpha=0.0,
        margin=2,
        label_text_line_height=1.0
    ),
    "below"
)

p.y_range.start = 0
show(p)
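The spacer-insertion bookkeeping at the top of the cell above (blank factors that visually separate node counts) can also be done in a single pass. Below is a sketch with a hypothetical `factors_with_spacers` helper; it builds the same kind of labels as `_make_range`, but makes the n-th spacer n + 1 spaces long so that no spacer is an empty string.

```python
from itertools import groupby


def factors_with_spacers(keys):
    """Hypothetical helper: FactorRange labels for (nodes, streams) keys,
    with a blank spacer factor between groups that differ in node count.

    Factors must be unique, so the n-th spacer is n + 1 spaces long.
    """
    labels = []
    num_spacers = 0
    groups = groupby(sorted(set(keys)), key=lambda key: key[0])
    for i, (nodes, group) in enumerate(groups):
        if i > 0:
            num_spacers += 1
            labels.append(" " * num_spacers)
        labels.extend(f"{nodes} nodes\n{streams} streams" for _, streams in group)
    return labels


labels = factors_with_spacers([(4, 16), (2, 8), (2, 10)])
```

The result can be passed straight to `FactorRange(*labels)`.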

In [12]:
p = plot_utils.make_figure(
    title="Time to Run and Cost Per Second of Data vs. Scale",
    x_axis_label="Configuration",
    y_axis_label="Time to run per second of data (s / s')",
    x_range=FactorRange(*[_make_range(*key) for key in factors]),
)
p.yaxis[0].formatter = PrintfTickFormatter(format="%4.0e")
p.extra_y_ranges = {"cost": Range1d(start=0, end=0.4*max_cost)}
p.add_layout(
    LinearAxis(
        y_range_name="cost",
        axis_label=f"Cost per second of data ({cost_unit})",
        formatter=p.yaxis[0].formatter
    ),
    "right"
)

colors = {stride: color for stride, color in zip(strides, plot_utils.palette)}
source = ColumnDataSource()
renderers = {}
for i, key in enumerate(x):
    config = best_configs[key]
    x_time_col, y_time_col = f"x_time_{i}", f"y_time_{i}"
    x_cost_col, y_cost_col = f"x_cost_{i}", f"y_cost_{i}"

    (x_time, y_time), (x_cost, y_cost) = plot_utils.make_violin_patch(config)
    source.add(y_time, y_time_col)
    source.add(y_cost, y_cost_col)

    key = _make_range(*key[:-1])
    x_time = list(zip([key]*len(x_time), x_time))
    x_cost = list(zip([key]*len(x_cost), x_cost))
    source.add(x_time, x_time_col)
    source.add(x_cost, x_cost_col)

    r1 = p.patch(
        x_time_col,
        y_time_col,
        color=colors[config.kernel_stride],
        fill_alpha=0.6,
        line_color="black",
        source=source
    )
    r2 = p.patch(
        x_cost_col,
        y_cost_col,
        color=colors[config.kernel_stride],
        fill_alpha=0.3,
        line_color="black",
        source=source,
        y_range_name="cost"
    )

    # group both halves of each violin under its kernel stride for the legend
    renderers.setdefault(config.kernel_stride, []).extend([r1, r2])

items = []
for kernel_stride, rs in renderers.items():
    stride = round(kernel_stride * 1000)  # round() avoids float truncation from int()
    item = LegendItem(label=f"{stride} ms", renderers=rs)
    items.append(item)

p.add_layout(
    Legend(
        items=items,
        title="Stride",
        orientation="horizontal",
        border_line_alpha=0.0,
        margin=2,
        label_text_line_height=1.0
    ),
    "below"
)
p.y_range.start = 0
show(p)

I think this plot, once it's filled out with more configurations, should capture all the second-order information relevant to the first two questions.

Over the next couple of days, I'll start putting together the plots that I think help answer the optimization question.

In [13]:
config = plot_utils.get_configs(num_nodes=4, clients_per_node=5, kernel_stride=0.1)[0]
stats = plot_utils.load_stats_for_config(config)
for ip, df in stats.groupby("ip"):
    plot_utils.plot_throughput_vs_time(df)