# Reading End-to-End results

In this notebook, we read results from the end-to-end tests run by the `run-benchmarks.yml` GitHub workflow, which uses Locust to make requests against the titiler-xarray API endpoint. See the README for details on how to run the benchmarks.

## Plotting before and after

Run benchmarks before and after deploying a change to the tiling service to determine whether performance has changed. When a new feature is added, the comparison shows whether performance has degraded; when a change is meant to improve performance, it demonstrates the improvement. In this notebook, the most recent test run is labeled "after" and all earlier test runs are labeled "before".


In [1]:
import os
import sys
import warnings

import holoviews as hv
import hvplot.pandas  # noqa: F401 (registers the .hvplot accessor on pandas objects)
import numpy as np
import pandas as pd

pd.options.plotting.backend = 'holoviews'
warnings.filterwarnings('ignore')

# Make the helpers package in the parent directory importable.
sys.path.append('..')
from helpers.dataframe import csv_to_pandas, load_all_into_dataframe
from helpers.s3helpers import list_s3_files
from helpers.eodc_hub_role import fetch_and_set_credentials

In [2]:
dataset_specs_all = csv_to_pandas('zarr_info.csv')
dataset_specs_all.index = dataset_specs_all['collection_name']
ds_specs = dataset_specs_all[['chunks', 'shape_dict', 'chunk_size_mb', 'number_coordinate_chunks', 'number_of_spatial_chunks']]
#ds_specs

In [3]:
credentials = fetch_and_set_credentials()
bucket = 'nasa-eodc-data-store'
s3_list_response = list_s3_files(credentials, bucket_name=bucket, s3_prefix='tile-benchmarking-results')
s3_urls = [f"s3://{bucket}/{file['Key']}" for file in s3_list_response['Contents'] if 'urls_stats.csv' in file['Key']]


In [4]:
df = load_all_into_dataframe(credentials, s3_urls, 'csv')

In [5]:
# The second-to-last path segment of each URL is the test run's timestamp.
df['test_time'] = [pd.to_datetime(path.split('/')[-2], format='%Y-%m-%d_%H-%M-%S') for path in df['s3_url']]

In [6]:
df['dataset'] = [path.split('/')[-1].replace('_urls_stats.csv', '') for path in df['s3_url']]
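
The two cells above assume that each result file sits at a key of the form `<prefix>/<timestamp>/<dataset>_urls_stats.csv`. The following is a minimal sketch of that parsing on a made-up URL; the timestamp and the dataset name `example_dataset` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical S3 URL following the layout the parsing code assumes:
# s3://<bucket>/<prefix>/<YYYY-MM-DD_HH-MM-SS>/<dataset>_urls_stats.csv
example_url = (
    "s3://nasa-eodc-data-store/tile-benchmarking-results/"
    "2023-10-01_12-30-00/example_dataset_urls_stats.csv"
)

parts = example_url.split("/")
# The second-to-last segment is the test run's timestamp.
example_test_time = pd.to_datetime(parts[-2], format="%Y-%m-%d_%H-%M-%S")
# The last segment is the file name; removing the suffix leaves the dataset name.
example_dataset = parts[-1].replace("_urls_stats.csv", "")

print(example_test_time)
print(example_dataset)
```

If a key does not follow this layout, `pd.to_datetime` raises an error on the timestamp segment, which is a useful early signal that an unexpected file matched the `urls_stats.csv` filter.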

In [8]:
# Label the most recent run "after" and every earlier run "before".
most_recent_test_time = df['test_time'].max()
df['before_or_after'] = 'before'
df.loc[df['test_time'] == most_recent_test_time, 'before_or_after'] = 'after'
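
To make the labeling rule concrete, here is a small sketch on a toy DataFrame with three made-up test runs (the timestamps and response times are invented for illustration, not real benchmark results). It shows that "before" pools every run except the latest one.

```python
import pandas as pd

# Toy data: three hypothetical test runs, one row each.
toy = pd.DataFrame(
    {
        "test_time": pd.to_datetime(
            ["2023-09-01 10:00:00", "2023-09-15 10:00:00", "2023-10-01 10:00:00"]
        ),
        "Average Response Time": [120.0, 110.0, 95.0],
    }
)

# Same rule as the notebook: only the latest run is "after".
toy["before_or_after"] = "before"
toy.loc[toy["test_time"] == toy["test_time"].max(), "before_or_after"] = "after"

label_counts = toy["before_or_after"].value_counts().to_dict()
print(label_counts)
```

Because the "before" group can mix several earlier runs, its box plot may show more spread than the single "after" run.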

In [9]:
aggregated_df = df[df['Name'] == 'Aggregated']

In [10]:
# Sort the dataset names so the plots appear in a stable order across runs.
datasets = sorted(set(aggregated_df['dataset']))

In [12]:
cmap = ["#994F00", "#006CD1"]

plt_opts = {"width": 400, "height": 300}

plts = []

for dataset in datasets:
    dataset_results = aggregated_df[aggregated_df['dataset'] == dataset]
    plts.append(
        dataset_results.hvplot.box(
            y="Average Response Time",
            by=["before_or_after"],
            c="before_or_after",
            cmap=cmap,
            ylabel="Average Response Time",
            xlabel="Before or After",
            legend=False,
            title=f"Dataset {dataset}",
        ).opts(**plt_opts)
    )
hv.Layout(plts).cols(2)
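
The box plots can be complemented by a numeric summary. The sketch below is one possible approach, not part of the original notebook: it computes the median of "Average Response Time" per dataset for each label and the percent change from "before" to "after". The toy data and the `percent_change` column name are assumptions made for illustration; in the notebook, `aggregated_df` would take the place of `toy_results`.

```python
import pandas as pd

# Toy stand-in for aggregated_df with made-up values.
toy_results = pd.DataFrame(
    {
        "dataset": ["a", "a", "a", "b", "b", "b"],
        "before_or_after": ["before", "before", "after", "before", "before", "after"],
        "Average Response Time": [100.0, 120.0, 88.0, 200.0, 220.0, 231.0],
    }
)

# Median response time per dataset and label, with one column per label.
summary = (
    toy_results.groupby(["dataset", "before_or_after"])["Average Response Time"]
    .median()
    .unstack("before_or_after")
)

# Negative values mean the latest run was faster than the earlier runs.
summary["percent_change"] = (
    100 * (summary["after"] - summary["before"]) / summary["before"]
)
print(summary.round(1))
```

The median is used here because it is less sensitive than the mean to a single unusually slow run in the "before" group.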